Evaluation Methods

Rotated Robustness: A Training-Free Defense against Bit-Flip Attacks on Large Language Models

Authors: Deng Liu, Song Chen | Published: 2026-03-17
Adversarial Learning
Vulnerability Management
Evaluation Methods

Exponential-Family Membership Inference: From LiRA and RMIA to BaVarIA

Authors: Rickard Brännvall | Published: 2026-03-12
Attack Planning Methods
Machine Learning Algorithms
Evaluation Methods

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Authors: Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi, Angela Makhanu, Gaëtan Peter, Roos Wensveen | Published: 2026-03-11
LLM Performance Evaluation
Prompt Injection
Evaluation Methods

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

Authors: Eirik Høyheim, Magnus Wiik Eckhoff, Gudmund Grov, Robert Flood, David Aspinall | Published: 2026-03-11
Data Poisoning
Backdoor Attacks
Evaluation Methods

Enhancing Network Intrusion Detection Systems: A Multi-Layer Ensemble Approach to Mitigate Adversarial Attacks

Authors: Nasim Soltani, Shayan Nejadshamsi, Zakaria Abou El Houda, Raphael Khoury, Kelton A. P. Costa, Tiago H. Falk, Anderson R. Avila | Published: 2026-03-11
Model Robustness Assurance
Machine Learning Algorithms
Evaluation Methods

DeepSight: An All-in-One LM Safety Toolkit

Authors: Bo Zhang, Jiaxuan Guo, Lijun Li, Dongrui Liu, Sujin Chen, Guanxu Chen, Zhijie Zheng, Qihao Lin, Lewen Yan, Chen Qian, Yijin Zhou, Yuyao Wu, Shaoxiong Guo, Tianyi Du, Jingyi Yang, Xuhao Hu, Ziqi Miao, Xiaoya Lu, Jing Shao, Xia Hu | Published: 2026-02-12
Prompt Injection
Large Language Models
Evaluation Methods

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

Authors: Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis | Published: 2026-02-12
Prompt Injection
Experimental Validation
Evaluation Methods

TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection

Authors: Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Zou, Ling Lo, Sheng-Ping Yang, Yu-Wen Tseng, Kun-Hsiang Lin, Chia-Ling Chen, Yu-Ting Ta, Yan-Tsung Wang, Po-Ching Chen, Hongxia Xie, Hong-Han Shuai, Wen-Huang Cheng | Published: 2025-12-11
Hallucination Detection
Model DoS
Evaluation Methods

LLM-Assisted AHP for Explainable Cyber Range Evaluation

Authors: Vyron Kampourakis, Georgios Kavallieratos, Georgios Spathoulas, Vasileios Gkioulos, Sokratis Katsikas | Published: 2025-12-11
XAI (Explainable AI)
Reliability Evaluation
Evaluation Methods

From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection

Authors: Chaomeng Lu, Bert Lagaisse | Published: 2025-12-11
Model Robustness Assurance
Output Harmfulness Scoring
Evaluation Methods