These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
In this work, we study the computational complexity of determining whether a
machine learning model that perfectly fits the training data will generalizes
to unseen data. In particular, we study the power of a malicious agent whose
goal is to construct a model g that fits its training data and nothing else,
but is indistinguishable from an accurate model f. We say that g strongly
spoofs f if no polynomial-time algorithm can tell them apart. If instead we
restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that g
c-weakly spoofs f. Our main results are
1. Under cryptographic assumptions, strong spoofing is possible and
2. For any c> 0, c-weak spoofing is possible unconditionally
While the assumption of a malicious agent is an extreme scenario (hopefully
companies training large models are not malicious), we believe that it sheds
light on the inherent difficulties of blindly trusting large proprietary models
or data.