Few-shot classifiers have been shown to exhibit promising results in use
cases where user-provided labels are scarce. These models are able to learn to
predict novel classes simply by training on a non-overlapping set of classes.
This can be largely attributed to the differences in their mechanisms as
compared to conventional deep networks. However, this also opens new
opportunities for attackers to mount integrity attacks against such models,
attacks which are not present in other machine learning setups. In this work,
we aim to close this gap by studying a conceptually simple approach to defend
few-shot classifiers against adversarial attacks. More specifically, we propose
a simple attack-agnostic detection method, based on the concept of self-similarity
and filtering, to flag adversarial support sets that destroy a victim
classifier's understanding of a certain class. Our extended
evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack
detection performance across three different few-shot classifiers and across
different attack strengths, outperforming baselines. These results establish
our approach as a strong detection method for support set poisoning attacks.
We also show that our approach constitutes a generalizable
concept, as it can be paired with other filtering functions. Finally, we
provide an analysis of our results when varying two components of our
detection approach.
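
For intuition only, the following is a minimal sketch of the self-similarity-and-filtering idea: embed a class's support images, apply a filtering function, and flag the support set when its mean pairwise self-similarity shifts too much after filtering. The embedding function, the smoothing filter, and the threshold below are placeholder assumptions for illustration, not the exact components of the proposed method.

```python
import torch
import torch.nn.functional as F

def self_similarity(features: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine similarity over a set of feature vectors (n, d)."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                                      # (n, n) cosine similarities
    n = sim.size(0)
    return sim[~torch.eye(n, dtype=torch.bool)].mean()   # average off-diagonal entries

def detect_adversarial_support(embed, filter_fn, support_images, threshold=0.15):
    """Flag a support set if filtering changes its self-similarity by more than
    `threshold`. `embed`, `filter_fn`, and `threshold` are illustrative stand-ins."""
    with torch.no_grad():
        s_raw = self_similarity(embed(support_images))
        s_filtered = self_similarity(embed(filter_fn(support_images)))
    return (s_filtered - s_raw).abs().item() > threshold

# Toy usage with stand-in components (assumptions, not the actual experimental setup).
embed = lambda x: x.flatten(1)                   # placeholder feature extractor
filter_fn = lambda x: F.avg_pool2d(x, 3, 1, 1)   # placeholder smoothing filter
support = torch.rand(5, 3, 84, 84)               # one 5-shot support set, MI-sized images
print(detect_adversarial_support(embed, filter_fn, support))
```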