Abstract
A Large Language Model (LLM)-powered GUI agent is a specialized autonomous
system that performs tasks on the user's behalf according to high-level
instructions. It does so by perceiving and interpreting the graphical user
interfaces (GUIs) of relevant apps, often visually, inferring the necessary
sequence of actions, and then interacting with the GUIs by executing actions
such as clicking, typing, and tapping. To complete real-world tasks, such as
filling forms or booking services, GUI agents often need to process and act on
sensitive user data. However, this autonomy introduces new privacy and security
risks. Adversaries can inject malicious content into GUIs that alters agent
behavior or induces the unintended disclosure of private information. These
attacks often exploit the discrepancy between what is visually salient to
agents and what is salient to human users, or the agent's limited ability to
detect violations of contextual integrity during task automation. In this
paper, we characterized six types of such attacks and conducted an
experimental study testing them against six state-of-the-art GUI agents, 234
adversarial webpages, and 39 human
participants. Our findings suggest that GUI agents are highly vulnerable,
particularly to contextually embedded threats. Moreover, human users are also
susceptible to many of these attacks, indicating that simple human oversight
may not reliably prevent failures. This misalignment between agent and human
perception highlights the need for privacy-aware agent design. We propose
practical defense strategies to inform the development of safer and more
reliable GUI agents.