Recently, prefix-tuning has gained increasing attention as a
parameter-efficient finetuning method for large-scale pretrained language
models. The method keeps the pretrained model frozen and updates only the
prefix token parameters for each downstream task. Despite being lightweight and
modular, prefix-tuning still lacks robustness to textual adversarial attacks.
Unfortunately, most currently developed defense techniques require auxiliary
model updates and additional storage, which inevitably hampers the modularity and
low storage cost of prefix-tuning. In this work, we propose a robust prefix-tuning
framework that preserves the efficiency and modularity of prefix-tuning. The
core idea of our framework is to leverage the layerwise activations of the
language model on correctly classified training data as the standard for
tuning an additional prefix. During the test phase, an extra batch-level
prefix is tuned for each batch and added to the original prefix for robustness
enhancement. Extensive experiments on three text classification benchmarks show
that our framework substantially improves robustness over several strong
baselines against five textual attacks of different types while maintaining
comparable accuracy on clean texts. We also interpret our robust prefix-tuning
framework from the optimal control perspective and outline several directions for
future research.
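
A minimal PyTorch sketch of the test-time procedure described above, under several simplifying assumptions: the per-layer "standard" is taken to be a mean hidden state precomputed from correctly classified training data, the prefix is injected as prepended input embeddings rather than past key-values, and the names (forward_hidden_states, tune_batch_prefix, layer_standards) are illustrative rather than the paper's actual implementation.

```python
# Illustrative sketch only: tunes an extra batch-level prefix at test time so
# that the frozen LM's layerwise activations stay close to precomputed standards.
import torch
import torch.nn.functional as F
from transformers import GPT2Model, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
lm = GPT2Model.from_pretrained("gpt2").to(device).eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token

PREFIX_LEN = 10
HIDDEN = lm.config.n_embd

# Original task prefix, assumed already trained and kept frozen here.
original_prefix = torch.randn(PREFIX_LEN, HIDDEN, device=device) * 0.02

# Hypothetical per-layer standards: mean hidden states of correctly classified
# training data (placeholders here; one vector per layer, including embeddings).
layer_standards = [torch.zeros(HIDDEN, device=device)
                   for _ in range(lm.config.n_layer + 1)]

def forward_hidden_states(extra_prefix, input_ids, attention_mask):
    """Run the frozen LM with (original + extra) prefix prepended as embeddings."""
    batch = input_ids.size(0)
    prefix = (original_prefix + extra_prefix).unsqueeze(0).expand(batch, -1, -1)
    token_embeds = lm.wte(input_ids)
    inputs_embeds = torch.cat([prefix, token_embeds], dim=1)
    prefix_mask = torch.ones(batch, PREFIX_LEN, device=device)
    mask = torch.cat([prefix_mask, attention_mask], dim=1)
    out = lm(inputs_embeds=inputs_embeds, attention_mask=mask,
             output_hidden_states=True)
    return out.hidden_states  # tuple of (n_layer + 1) tensors, shape [B, T, H]

def tune_batch_prefix(texts, steps=10, lr=1e-3):
    """Tune one extra prefix for this test batch by matching layerwise standards."""
    enc = tok(texts, return_tensors="pt", padding=True).to(device)
    extra_prefix = torch.zeros(PREFIX_LEN, HIDDEN, device=device,
                               requires_grad=True)
    opt = torch.optim.Adam([extra_prefix], lr=lr)
    for _ in range(steps):
        hidden_states = forward_hidden_states(extra_prefix, enc.input_ids,
                                              enc.attention_mask)
        # L2 distance between each layer's mean activation and its standard.
        loss = sum(F.mse_loss(h.mean(dim=1), std.expand(h.size(0), -1))
                   for h, std in zip(hidden_states, layer_standards))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return extra_prefix.detach()

# Usage: tune an extra prefix per incoming test batch, then classify with
# original_prefix + extra_prefix while the language model stays frozen.
extra = tune_batch_prefix(["a possibly perturbed review", "another test input"])
```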