These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
As ML models become increasingly complex and integral to high-stakes domains
such as finance and healthcare, they also become more susceptible to
sophisticated adversarial attacks. We investigate the threat posed by
undetectable backdoors, as defined in Goldwasser et al. (FOCS '22), in models
developed by insidious external expert firms. When such backdoors exist, they
allow the designer of the model to sell information on how to slightly perturb
their input to change the outcome of the model.
We develop a general strategy to plant backdoors to obfuscated neural
networks, that satisfy the security properties of the celebrated notion of
indistinguishability obfuscation. Applying obfuscation before releasing neural
networks is a strategy that is well motivated to protect sensitive information
of the external expert firm. Our method to plant backdoors ensures that even if
the weights and architecture of the obfuscated model are accessible, the
existence of the backdoor is still undetectable.
Finally, we introduce the notion of undetectable backdoors to language models
and extend our neural network backdoor attacks to such models based on the
existence of steganographic functions.