These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
We present Adversarial Object Fusion (AdvOF), a novel attack framework
targeting vision-and-language navigation (VLN) agents in service-oriented
environments by generating adversarial 3D objects. While foundational models
like Large Language Models (LLMs) and Vision Language Models (VLMs) have
enhanced service-oriented navigation systems through improved perception and
decision-making, their integration introduces vulnerabilities in
mission-critical service workflows. Existing adversarial attacks fail to
address service computing contexts, where reliability and quality-of-service
(QoS) are paramount. We utilize AdvOF to investigate and explore the impact of
adversarial environments on the VLM-based perception module of VLN agents. In
particular, AdvOF first precisely aggregates and aligns the victim object
positions in both 2D and 3D space, defining and rendering adversarial objects.
Then, we collaboratively optimize the adversarial object with regularization
between the adversarial and victim object across physical properties and VLM
perceptions. Through assigning importance weights to varying views, the
optimization is processed stably and multi-viewedly by iterative fusions from
local updates and justifications. Our extensive evaluations demonstrate AdvOF
can effectively degrade agent performance under adversarial conditions while
maintaining minimal interference with normal navigation tasks. This work
advances the understanding of service security in VLM-powered navigation
systems, providing computational foundations for robust service composition in
physical-world deployments.