Adversarially Pretrained Transformers may be Universally Robust In-Context Learners

arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2505.14042

PDF

https://arxiv.org/pdf/2505.14042

Paper Information

Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
Published: May 20, 2025
Affiliation: The University of Tokyo
Country: Japan
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Adversarial Learning

Relationship between Robustness and Privacy

Certified Robustness

These labels were automatically added by AI and may be inaccurate.

Abstract

Adversarial training is one of the most effective adversarial defenses, but it incurs a high computational cost. In this study, we show that transformers adversarially pretrained on diverse tasks can serve as robust foundation models and eliminate the need for adversarial training in downstream tasks. Specifically, we theoretically demonstrate that through in-context learning, a single adversarially pretrained transformer can robustly generalize to multiple unseen tasks without any additional training, i.e., without any parameter updates. This robustness stems from the model's focus on robust features and its resistance to attacks that exploit non-predictive features. Besides these positive findings, we also identify several limitations. Under certain conditions (though unrealistic), no universally robust single-layer transformers exist. Moreover, robust transformers exhibit an accuracy-robustness trade-off and require a large number of in-context demonstrations. The code is available at https://github.com/s-kumano/universally-robust-in-context-learner.
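To make the idea of in-context prediction under attack more concrete, here is a minimal sketch in plain Python. It is not the paper's model or its released code. It assumes a simplified linear-attention-style readout with identity attention weights, a toy task with one strongly predictive feature and two weakly predictive ones, and an l_inf-bounded attacker. All function names and parameters (make_toy_demos, in_context_weights, linf_attack, and so on) are hypothetical and chosen only for illustration. The classifier is built from the demonstrations alone, with no parameter updates, and the worst-case margin of a linear score under an l_inf perturbation of size eps is the clean margin minus eps times the l1 norm of the weights.

```python
import random


def make_toy_demos(n, seed=0, mu=(1.0, 0.1, 0.1), noise=0.1):
    """Toy binary task (an assumption): x = y * mu + Gaussian noise, y in {-1, +1}.

    The first feature is strongly predictive; the others are only weakly so.
    """
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [y * m + rng.gauss(0.0, noise) for m in mu]
        demos.append((x, y))
    return demos


def in_context_weights(demos):
    """Average y_i * x_i over the demonstrations.

    This mimics a linear-attention readout with identity weights: the
    classifier is formed from the context only, with no parameter updates.
    """
    n = len(demos)
    d = len(demos[0][0])
    return [sum(y * x[j] for x, y in demos) / n for j in range(d)]


def margin(w, x, y):
    """Signed margin y * <w, x>; positive means a correct prediction."""
    return y * sum(wj * xj for wj, xj in zip(w, x))


def linf_attack(w, x, y, eps):
    """Worst-case l_inf perturbation of size eps against the linear score w."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [xj - eps * y * sign(wj) for xj, wj in zip(x, w)]


def worst_case_margin(w, x, y, eps):
    """Closed form: clean margin minus eps times the l1 norm of w."""
    return margin(w, x, y) - eps * sum(abs(wj) for wj in w)


if __name__ == "__main__":
    demos = make_toy_demos(n=200)
    w = in_context_weights(demos)
    x_query, y_query = [1.0, 0.1, 0.1], 1
    x_adv = linf_attack(w, x_query, y_query, eps=0.1)
    print("clean margin:   ", margin(w, x_query, y_query))
    print("attacked margin:", margin(w, x_adv, y_query))
```

In this toy setting, every feature that receives weight adds to the l1 norm and therefore to the attacker's leverage, even if it contributes little to the clean margin. That is one informal way to read the abstract's remark about attacks that exploit non-predictive features.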

External Datasets

MNIST

Fashion-MNIST

CIFAR-10