Few-shot classifiers have been shown to exhibit promising results in use
cases where user-provided labels are scarce. These models are able to learn to
predict novel classes simply by training on a non-overlapping set of classes.
This can be largely attributed to the differences in their mechanisms as
compared to conventional deep networks. However, this also opens new
opportunities for attackers to mount integrity attacks against such models,
attacks which are not present in other machine learning setups. In this work,
we aim to close this gap by studying a conceptually simple approach to defend
few-shot classifiers against adversarial attacks. More specifically, we propose
a simple attack-agnostic detection method, based on the concept of self-similarity
and filtering, to flag adversarial support sets that destroy a victim
classifier's understanding of a certain class. Our extended
evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack
detection performance across three different few-shot classifiers and across
different attack strengths, outperforming baselines. These results establish
our approach as a strong detection method for support set poisoning attacks.
We also show that our approach constitutes a generalizable
concept, as it can be paired with other filtering functions. Finally, we
provide an analysis of our results when varying two components of our
detection approach.
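
For intuition only, the following is a minimal sketch of the self-similarity-and-filtering idea: embed a class's support images, apply a filtering function, and flag the support set when its mean pairwise self-similarity shifts too much after filtering. The embedding function, the smoothing filter, and the threshold below are placeholder assumptions for illustration, not the exact components of the proposed method.

```python
import torch
import torch.nn.functional as F

def self_similarity(features: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine similarity over a set of feature vectors (n, d)."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                                      # (n, n) cosine similarities
    n = sim.size(0)
    return sim[~torch.eye(n, dtype=torch.bool)].mean()   # average off-diagonal entries

def detect_adversarial_support(embed, filter_fn, support_images, threshold=0.15):
    """Flag a support set if filtering changes its self-similarity by more than
    `threshold`. `embed`, `filter_fn`, and `threshold` are illustrative stand-ins."""
    with torch.no_grad():
        s_raw = self_similarity(embed(support_images))
        s_filtered = self_similarity(embed(filter_fn(support_images)))
    return (s_filtered - s_raw).abs().item() > threshold

# Toy usage with stand-in components (assumptions, not the actual experimental setup).
embed = lambda x: x.flatten(1)                   # placeholder feature extractor
filter_fn = lambda x: F.avg_pool2d(x, 3, 1, 1)   # placeholder smoothing filter
support = torch.rand(5, 3, 84, 84)               # one 5-shot support set, MI-sized images
print(detect_adversarial_support(embed, filter_fn, support))
```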