Interpretability

SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

Authors: João Vitorino, Eva Maia, Isabel Praça, Carlos Soares | Published: 2025-09-30
Privacy-Preserving Machine Learning
Adversarial Learning
Interpretability

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Authors: Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen | Published: 2025-09-26 | Updated: 2025-09-30
Disabling Safety Mechanisms of LLM
Self-Attention Mechanism
Interpretability

CyberRAG: An agentic RAG cyber attack classification and reporting tool

Authors: Francesco Blefari, Cristian Cosentino, Francesco Aurelio Pironti, Angelo Furfaro, Fabrizio Marozzo | Published: 2025-07-03
Poisoning attack on RAG
Vulnerability Analysis
Interpretability

ExpProof : Operationalizing Explanations for Confidential Models with ZKPs

Authors: Chhavi Yadav, Evan Monroe Laufer, Dan Boneh, Kamalika Chaudhuri | Published: 2025-02-06 | Updated: 2025-05-27
XAI (Explainable AI)
Model evaluation methods
Interpretability

The Price of Interpretability

Authors: Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin | Published: 2019-07-08
Model Selection
Optimization Strategy
Interpretability

Bridging Adversarial Robustness and Gradient Interpretability

Authors: Beomsu Kim, Junghoon Seo, Taegyun Jeon | Published: 2019-03-27 | Updated: 2019-04-19
Certified Robustness
Adversarial Learning
Interpretability