Abstract
Recent work has investigated the vulnerability of local surrogate methods to
adversarial perturbations of a machine learning (ML) model's inputs, where the
explanation is manipulated while the meaning and structure of the original
input remain similar under the complex model. Although weaknesses have been
demonstrated across many methods, the reasons behind them remain little
explored. Central to the concept of adversarial attacks on explainable AI (XAI)
is the similarity measure used to calculate how one explanation differs from
another. A poor choice of similarity measure can lead to erroneous conclusions
about the efficacy of an XAI method. An overly sensitive measure exaggerates a
method's vulnerability, while an overly coarse one understates it. We
investigate a variety of similarity measures designed for text-based ranked
lists, including Kendall's Tau, Spearman's Footrule, and Rank-biased Overlap,
to determine how changes in the choice of measure or in the threshold of
success affect the conclusions generated from common adversarial attack
processes. Certain measures are found to be overly sensitive, resulting in
erroneous estimates of stability.
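
To make the comparison concrete, the sketch below (not the authors' implementation) computes the three measures named above on two hypothetical ranked feature-importance lists, such as those produced by a local surrogate explainer. The feature names, the RBO persistence parameter p = 0.9, and the truncation of RBO at list length are illustrative assumptions.

```python
# Minimal sketch: comparing two ranked explanation lists with Kendall's Tau,
# Spearman's Footrule, and (truncated) Rank-biased Overlap. The feature names
# and parameter values below are hypothetical, chosen only for illustration.
from scipy.stats import kendalltau


def rank_vectors(list_a, list_b):
    """Map each shared item to its 0-based rank in each list."""
    items = sorted(set(list_a) & set(list_b))
    pos_a = {item: i for i, item in enumerate(list_a)}
    pos_b = {item: i for i, item in enumerate(list_b)}
    return [pos_a[x] for x in items], [pos_b[x] for x in items]


def spearman_footrule(list_a, list_b):
    """Normalized footrule distance: 0 = identical order, 1 = maximal displacement."""
    ra, rb = rank_vectors(list_a, list_b)
    n = len(ra)
    max_disp = (n * n) // 2  # maximum possible total displacement for a permutation
    return sum(abs(a - b) for a, b in zip(ra, rb)) / max_disp


def rank_biased_overlap(list_a, list_b, p=0.9):
    """Truncated RBO (Webber et al., 2010): top-weighted overlap, higher = more similar."""
    depth = min(len(list_a), len(list_b))
    score, seen_a, seen_b = 0.0, set(), set()
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        score += p ** (d - 1) * len(seen_a & seen_b) / d
    return (1 - p) * score


if __name__ == "__main__":
    # Hypothetical explanations before and after an adversarial perturbation.
    original = ["price", "rooms", "age", "area", "tax"]
    perturbed = ["rooms", "price", "age", "tax", "area"]
    ra, rb = rank_vectors(original, perturbed)
    tau, _ = kendalltau(ra, rb)
    print(f"Kendall's Tau:       {tau:.3f}")
    print(f"Footrule distance:   {spearman_footrule(original, perturbed):.3f}")
    print(f"Rank-biased Overlap: {rank_biased_overlap(original, perturbed):.3f}")
```

Because RBO weights the top of the list most heavily while Kendall's Tau counts all pairwise inversions equally, the same perturbation can appear benign under one measure and severe under another, which is precisely why the choice of measure shapes any conclusion about an explainer's stability.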