Abstract
We investigate the theoretical foundations of data poisoning attacks in
machine learning models. Our analysis reveals that the Hessian with respect to
the input serves as a diagnostic tool for detecting poisoning, exhibiting
spectral signatures that characterize compromised datasets. We use random
matrix theory (RMT) to develop a theory for the impact of poisoning proportion
and regularisation on attack efficacy in linear regression. Through QR stepwise
regression, we study the spectral signatures of the Hessian in multi-output
regression. Experiments on deep networks show empirically that
this theory extends to modern convolutional and transformer networks under the
cross-entropy loss. Based on these insights, we develop preliminary algorithms
to determine whether a network has been poisoned, together with remedies that
do not require further training.
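
As a rough illustration of the central object, the sketch below computes the spectrum of the input Hessian for multi-output ridge regression on clean and on poisoned data. It is a toy example under stated assumptions, not the authors' algorithm: the poisoning scheme (an additive trigger on one feature with overwritten targets), the helper names `ridge_fit`, `input_hessian` and `poison`, and all numerical settings are hypothetical. For a linear model with weights W and squared loss 0.5 * ||W^T x - y||^2, the Hessian with respect to the input x is W W^T, so its spectrum can be read off directly from the fitted weights.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge solution W (d x k) minimising ||XW - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def input_hessian(W):
    """Hessian of 0.5 * ||W^T x - y||^2 with respect to the input x, which is W W^T."""
    return W @ W.T

def poison(X, Y, fraction, trigger, target, rng):
    """Hypothetical poisoning: add a trigger to a random fraction of inputs
    and overwrite the corresponding targets."""
    Xp, Yp = X.copy(), Y.copy()
    n_poison = int(round(fraction * X.shape[0]))
    idx = rng.choice(X.shape[0], size=n_poison, replace=False)
    Xp[idx] += trigger
    Yp[idx] = target
    return Xp, Yp

# Synthetic multi-output regression problem (all sizes are illustrative).
rng = np.random.default_rng(0)
n, d, k = 500, 20, 3
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, k)) / np.sqrt(d)
Y = X @ W_true + 0.1 * rng.normal(size=(n, k))

# Assumed trigger: a large offset on the first feature, mapped to a fixed target.
trigger = np.zeros(d)
trigger[0] = 5.0
target = np.full(k, 10.0)
Xp, Yp = poison(X, Y, fraction=0.1, trigger=trigger, target=target, rng=rng)

lam = 1.0  # regularisation strength
H_clean = input_hessian(ridge_fit(X, Y, lam))
H_poisoned = input_hessian(ridge_fit(Xp, Yp, lam))

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
eig_clean = np.linalg.eigvalsh(H_clean)
eig_poisoned = np.linalg.eigvalsh(H_poisoned)

print("top clean eigenvalues:   ", eig_clean[-k:][::-1])
print("top poisoned eigenvalues:", eig_poisoned[-k:][::-1])
```

Varying `fraction` and `lam` in this sketch gives a simple way to explore how the poisoning proportion and the regularisation strength affect the spectrum. For deep networks the input Hessian is no longer available in closed form and would have to be obtained by automatic differentiation.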