Model Interpretability

Towards LLM Guardrails via Sparse Representation Steering

Authors: Zeqing He, Zhibo Wang, Huiyu Xu, Kui Ren | Published: 2025-03-21

Sparse Representation Methods

Model Interpretability

Role of Machine Learning

2025.03.21 2025.04.03

Literature Database

A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI

Authors: Alice Bizzarri, Chung-En Yu, Brian Jalaian, Fabrizio Riguzzi, Nathaniel D. Bastian | Published: 2024-06-03

NSAI Integration

Model Interpretability

Unknown Attack Detection

2024.06.03 2025.04.03

Literature Database

Explainable Malware Detection with Tailored Logic Explained Networks

Authors: Peter Anthony, Francesco Giannini, Michelangelo Diligenti, Martin Homola, Marco Gori, Stefan Balogh, Jan Mojzis | Published: 2024-05-05

Malware Classification

Model Interpretability

Evaluation Methods

2024.05.05 2025.04.03

Literature Database

Explainability Guided Adversarial Evasion Attacks on Malware Detectors

Authors: Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Moustafa Saleh | Published: 2024-05-02

Watermarking

Malware Classification

Model Interpretability

2024.05.02 2025.04.03

Literature Database

Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots

Authors: Xi Xin, Giles Hooker, Fei Huang | Published: 2024-04-29 | Updated: 2024-05-01

Model Interpretability

Adversarial Training

Watermark Evaluation

2024.04.29 2025.04.03

Literature Database

MISLEAD: Manipulating Importance of Selected features for Learning Epsilon in Evasion Attack Deception

Authors: Vidit Khazanchi, Pavan Kulkarni, Yuvaraj Govindarajulu, Manojkumar Parmar | Published: 2024-04-24 | Updated: 2024-05-02

Model Interpretability

Attack Methods

Adversarial Training

2024.04.24 2025.04.03

Literature Database

Decomposing and Editing Predictions by Modeling Model Computation

Authors: Harshay Shah, Andrew Ilyas, Aleksander Madry | Published: 2024-04-17

Watermarking

Model Interpretability

Model Editing Methods

2024.04.17 2025.04.03

Literature Database

Conformal Predictions for Probabilistically Robust Scalable Machine Learning Classification

Authors: Alberto Carlevaro, Teodoro Alamo Cantarero, Fabrizio Dabbene, Maurizio Mongelli | Published: 2024-03-15

Watermarking

Model Interpretability

Uncertainty Quantification

2024.03.15 2025.04.03

Literature Database

An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach

Authors: Mohammad Amaz Uddin, Iqbal H. Sarker | Published: 2024-02-21

Phishing Detection

Model Interpretability

Model Performance Evaluation

2024.02.21 2025.04.03

Literature Database

LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking

Authors: Yeganeh Aghamohammadi, Amin Rezaei | Published: 2024-02-06

GNN

Model Interpretability

Watermark Evaluation

2024.02.06 2025.04.03

Literature Database