Research in adversarial machine learning has shown how the performance of
machine learning models can be seriously compromised by injecting even a small
fraction of poisoning points into the training data. While the effects of such
poisoning attacks on model accuracy have been widely studied, their potential
effects on other model performance metrics remain to be evaluated. In this
work, we introduce an optimization framework for poisoning attacks against
algorithmic fairness, and develop a gradient-based poisoning attack aimed at
introducing classification disparities among different groups in the data. We
empirically show that our attack is effective not only in the white-box
setting, in which the attacker has full access to the target model, but also in
a more challenging black-box scenario, in which the attack is optimized
against a substitute model and then transferred to the target model. We believe
that our findings pave the way for an entirely novel class of adversarial
attacks targeting algorithmic fairness in different scenarios,
and that investigating such vulnerabilities will help design more robust
algorithms and countermeasures in the future.
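
As a rough sketch (in our own notation, not necessarily the exact formulation adopted in this work), fairness-targeted poisoning can be cast as a bilevel optimization problem: the attacker chooses poisoning points that maximize a group-disparity objective evaluated on the model obtained by training on the contaminated data,
\[
\max_{D_p}\; \mathcal{A}\bigl(D_{\mathrm{val}}, \theta^\star(D_p)\bigr)
\quad \text{s.t.} \quad
\theta^\star(D_p) \in \operatorname*{arg\,min}_{\theta}\; L\bigl(D_{\mathrm{tr}} \cup D_p, \theta\bigr),
\]
where $D_{\mathrm{tr}}$ denotes the clean training set, $D_p$ the injected poisoning points, $D_{\mathrm{val}}$ a validation set held by the attacker, $L$ the training loss, and $\mathcal{A}$ an attacker objective measuring the disparity in loss (or error) between a privileged and an unprivileged group. A gradient-based attack would then iteratively update each poisoning point $x_p$ along $\nabla_{x_p}\mathcal{A}$, accounting for the implicit dependence of $\theta^\star$ on $x_p$, e.g., through the stationarity (KKT) conditions of the inner training problem.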