Abstract
Task arithmetic in large-scale pre-trained models enables flexible adaptation
to diverse downstream tasks without extensive re-training. By leveraging task
vectors (TVs), users can perform modular updates to pre-trained models through
simple arithmetic operations like addition and subtraction. However, this
flexibility introduces new security vulnerabilities. In this paper, we identify
and evaluate the susceptibility of TVs to backdoor attacks, demonstrating how
malicious actors can exploit TVs to compromise model integrity. By developing
composite backdoors and eliminating redundant clean tasks, we introduce BadTV, a
novel backdoor attack specifically designed to remain effective under task
learning, forgetting, and analogy operations. Our extensive experiments
reveal that BadTV achieves near-perfect attack success rates across various
scenarios, significantly impacting the security of models using task
arithmetic. We also explore existing defenses, showing that current methods
fail to detect or mitigate BadTV. Our findings highlight the need for robust
defense mechanisms to secure TVs in real-world applications, especially as TV
services become more popular in machine-learning ecosystems.
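For readers unfamiliar with task arithmetic, the following is a minimal sketch of the three operations the abstract refers to (task learning, forgetting, and analogies). It assumes model weights are plain PyTorch state dicts; the function names and the analogy formulation are illustrative conventions, not definitions taken from this paper.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector: element-wise difference of fine-tuned and pre-trained weights."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vector(pretrained: dict, tv: dict, scale: float = 1.0) -> dict:
    """Task learning (scale > 0) or task forgetting (scale < 0) by
    adding a scaled task vector back onto the pre-trained weights."""
    return {k: pretrained[k] + scale * tv[k] for k in pretrained}

def task_analogy(tv_a: dict, tv_b: dict, tv_c: dict) -> dict:
    """Task analogy A : B :: C : D, approximating the vector for an
    unseen task D as tv_c + (tv_b - tv_a)."""
    return {k: tv_c[k] + (tv_b[k] - tv_a[k]) for k in tv_a}
```

A backdoor embedded in a shared task vector would thus propagate through any of these compositions, which is the attack surface the abstract describes.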