Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2505.23266

PDF

https://arxiv.org/pdf/2505.23266

Paper Information

Author: Chunlong Xie, Jialing He, Shangwei Guo, Jiacheng Wang, Shudong Zhang, Tianwei Zhang, Tao Xiang
Published: 5-29-2025
Affiliation: College of Computer Science, Chongqing University
Country: China
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Adversarial Object Generation, Optimization Methods, Alignment

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

We present Adversarial Object Fusion (AdvOF), a novel attack framework that targets vision-and-language navigation (VLN) agents in service-oriented environments by generating adversarial 3D objects. While foundation models such as Large Language Models (LLMs) and Vision Language Models (VLMs) have enhanced service-oriented navigation systems through improved perception and decision-making, their integration introduces vulnerabilities in mission-critical service workflows. Existing adversarial attacks fail to address service computing contexts, where reliability and quality-of-service (QoS) are paramount. We use AdvOF to investigate the impact of adversarial environments on the VLM-based perception module of VLN agents. In particular, AdvOF first precisely aggregates and aligns the victim object positions in both 2D and 3D space, defining and rendering adversarial objects. Then, we collaboratively optimize the adversarial object with regularization between the adversarial and victim objects across physical properties and VLM perceptions. By assigning importance weights to different views, the optimization proceeds stably across multiple views through iterative fusion of local updates and justifications. Our extensive evaluations demonstrate that AdvOF can effectively degrade agent performance under adversarial conditions while maintaining minimal interference with normal navigation tasks. This work advances the understanding of service security in VLM-powered navigation systems, providing computational foundations for robust service composition in physical-world deployments.
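
The abstract describes fusing per-view local updates with importance weights but does not give the formula. The sketch below is only an illustration of that general idea, not the authors' implementation: the function names (`softmax`, `fuse_view_updates`, `fused_step`), the use of a softmax to turn per-view scores into weights, and the plain gradient-style step are all assumptions made here for clarity.

```python
import math


def softmax(scores):
    """Turn raw per-view scores into weights that sum to 1 (assumed weighting scheme)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def fuse_view_updates(local_updates, view_scores):
    """Combine one update vector per view into a single importance-weighted update."""
    if not local_updates or len(local_updates) != len(view_scores):
        raise ValueError("need one score per view and at least one view")
    weights = softmax(view_scores)
    fused = [0.0] * len(local_updates[0])
    for weight, update in zip(weights, local_updates):
        for i, value in enumerate(update):
            fused[i] += weight * value
    return fused, weights


def fused_step(params, local_updates, view_scores, lr=0.1):
    """Apply one fused step to hypothetical adversarial-object parameters."""
    fused, _ = fuse_view_updates(local_updates, view_scores)
    return [p - lr * g for p, g in zip(params, fused)]
```

For example, `fused_step([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` weights the two views equally and moves both parameters by the same amount; raising one view's score shifts the step toward that view's update.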

External Datasets

Matterport3D (MP3D)

Habitat-Matterport 3D (HM3D)