Interpretability

CAFE-GB: Scalable and Stable Feature Selection for Malware Detection via Chunk-wise Aggregated Gradient Boosting

Authors: Ajvad Haneef K, Karan Kuwar Singh, Madhu Kumar S D | Published: 2026-01-22
Machine Learning Algorithms
Feature Selection Methods
Interpretability

Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models

Authors: Manish Bhatt | Published: 2026-01-22
Hallucination Detection
Framework
Interpretability

Adversarially Robust and Interpretable Magecart Malware Detection

Authors: Pedro Pereira, José Gouveia, João Vitorino, Eva Maia, Isabel Praça | Published: 2025-11-06
Dynamic Analysis
Adversarial Learning
Interpretability

SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

Authors: João Vitorino, Eva Maia, Isabel Praça, Carlos Soares | Published: 2025-09-30
Privacy-Preserving Machine Learning
Adversarial Learning
Interpretability

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Authors: Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen | Published: 2025-09-26 | Updated: 2025-09-30
Disabling LLM Safety Mechanisms
Self-Attention Mechanism
Interpretability

CyberRAG: An agentic RAG cyber attack classification and reporting tool

Authors: Francesco Blefari, Cristian Cosentino, Francesco Aurelio Pironti, Angelo Furfaro, Fabrizio Marozzo | Published: 2025-07-03
Poisoning Attacks on RAG
Vulnerability Analysis
Interpretability

ExpProof : Operationalizing Explanations for Confidential Models with ZKPs

Authors: Chhavi Yadav, Evan Monroe Laufer, Dan Boneh, Kamalika Chaudhuri | Published: 2025-02-06 | Updated: 2025-05-27
XAI (Explainable AI)
Model Evaluation Methods
Interpretability

The Price of Interpretability

Authors: Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin | Published: 2019-07-08
Model Selection
Optimization Strategies
Interpretability

Bridging Adversarial Robustness and Gradient Interpretability

Authors: Beomsu Kim, Junghoon Seo, Taegyun Jeon | Published: 2019-03-27 | Updated: 2019-04-19
Model Robustness Guarantees
Adversarial Learning
Interpretability