Model Interpretability

Towards LLM Guardrails via Sparse Representation Steering

Authors: Zeqing He, Zhibo Wang, Huiyu Xu, Kui Ren | Published: 2025-03-21
Sparse Representation Methods
Model Interpretability
Role of Machine Learning

A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI

Authors: Alice Bizzarri, Chung-En Yu, Brian Jalaian, Fabrizio Riguzzi, Nathaniel D. Bastian | Published: 2024-06-03
NSAI Integration
Model Interpretability
Unknown Attack Detection

Explainable Malware Detection with Tailored Logic Explained Networks

Authors: Peter Anthony, Francesco Giannini, Michelangelo Diligenti, Martin Homola, Marco Gori, Stefan Balogh, Jan Mojzis | Published: 2024-05-05
Malware Classification
Model Interpretability
Evaluation Methods

Explainability Guided Adversarial Evasion Attacks on Malware Detectors

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh | Published: 2024-05-02
Watermarking
Malware Classification
Model Interpretability

Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots

Authors: Xi Xin, Giles Hooker, Fei Huang | Published: 2024-04-29 | Updated: 2024-05-01
Model Interpretability
Adversarial Training
Watermark Evaluation

MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception

Authors: Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar | Published: 2024-04-24 | Updated: 2024-05-02
Model Interpretability
Attack Methods
Adversarial Training

Decomposing and Editing Predictions by Modeling Model Computation

Authors: Harshay Shah, Andrew Ilyas, Aleksander Madry | Published: 2024-04-17
Watermarking
Model Interpretability
Model Editing Methods

Conformal Predictions for Probabilistically Robust Scalable Machine Learning Classification

Authors: Alberto Carlevaro, Teodoro Alamo Cantarero, Fabrizio Dabbene, Maurizio Mongelli | Published: 2024-03-15
Watermarking
Model Interpretability
Uncertainty Quantification

An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach

Authors: Mohammad Amaz Uddin, Iqbal H. Sarker | Published: 2024-02-21
Phishing Detection
Model Interpretability
Model Performance Evaluation

LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking

Authors: Yeganeh Aghamohammadi, Amin Rezaei | Published: 2024-02-06
GNN
Model Interpretability
Watermark Evaluation