Deep learning models suffer from a phenomenon called adversarial attacks: small
changes to a model's input can fool a classifier on a particular example. The
literature mostly considers adversarial attacks on models with images and other
structured inputs, but adversarial attacks on categorical sequences can also be
harmful. A successful attack on inputs in the form of categorical sequences
must address the following challenges: (1) non-differentiability of the target
function, (2) constraints on transformations of the initial sequences, and
(3) the diversity of possible problems. We handle these challenges with two
black-box adversarial attacks.
The first approach adopts a Monte-Carlo method and can be used in any scenario;
the second uses a continuous relaxation of the model and the target metric,
which makes state-of-the-art adversarial attack methods applicable with little
additional effort. Results on money transaction, medical fraud, and NLP
datasets suggest that the proposed methods generate reasonable adversarial
sequences that stay close to the original ones yet fool machine learning
models.
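
For illustration only, the sketch below shows the general flavor of a black-box
Monte-Carlo attack on a categorical sequence: random single-token substitutions
are proposed and kept when they lower the classifier's score, under a bound on
how many positions may differ from the original sequence. The names `score_fn`,
`vocab`, and `max_changes` are assumptions for this example, not the paper's
actual interface.

```python
import random


def monte_carlo_attack(seq, score_fn, vocab, n_iter=200, max_changes=2, seed=0):
    """Black-box Monte-Carlo attack sketch: randomly propose single-token
    substitutions and keep the candidate that most lowers the target score
    (e.g. the classifier's probability of the true class), while changing
    at most `max_changes` positions of the original sequence.

    Illustrative assumption, not the paper's method or interface.
    """
    rng = random.Random(seed)
    best = list(seq)
    best_score = score_fn(best)
    for _ in range(n_iter):
        cand = list(best)
        pos = rng.randrange(len(cand))
        cand[pos] = rng.choice(vocab)
        # Enforce the edit-distance constraint w.r.t. the original sequence.
        if sum(a != b for a, b in zip(cand, seq)) > max_changes:
            continue
        s = score_fn(cand)
        if s < best_score:  # greedy accept: lower score = stronger attack
            best, best_score = cand, s
    return best, best_score


if __name__ == "__main__":
    # Toy black-box "classifier": score is the fraction of token 1 in the sequence.
    score = lambda s: sum(t == 1 for t in s) / len(s)
    original = [1, 1, 0, 1, 0, 1]
    adv, adv_score = monte_carlo_attack(original, score, vocab=[0, 1, 2])
    print(original, "->", adv, f"score {score(original):.2f} -> {adv_score:.2f}")
```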