Literature Database

Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs

Authors: Wenyu Chen, Xiangtao Meng, Chuanchao Zang, Li Wang, Xinyu Gao, Jianing Wang, Peng Zhan, Zheng Li, Shanqing Guo | Published: 2026-03-24
LLM Performance Evaluation
Prompt Injection
Evaluation Methods

Robust Safety Monitoring of Language Models via Activation Watermarking

Authors: Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas | Published: 2026-03-24
Watermarking
Data Generation Safety
Prompt Injection

A Critical Review on the Effectiveness and Privacy Threats of Membership Inference Attacks

Authors: Najeeb Jebreel, David Sánchez, Josep Domingo-Ferrer | Published: 2026-03-24
Privacy Leakage
Membership Inference
Evaluation Methods

Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Authors: Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver, Mark Dras | Published: 2026-03-24
Dataset Evaluation
Differential Privacy
Evaluation Methods

Privacy-Preserving EHR Data Transformation via Geometric Operators: A Human-AI Co-Design Technical Report

Authors: Maolin Wang, Beining Bao, Gan Yuan, Hongyu Chen, Bingkun Zhao, Baoshuo Kan, Jiming Xu, Qi Shi, Yinggong Zhao, Yao Wang, Wei Ying Ma, Jun Yan | Published: 2026-03-24
Data Privacy Evaluation
Privacy Leakage
Evaluation Methods

SoK: The Attack Surface of Agentic AI — Tools, and Autonomy

Authors: Ali Dehghantanha, Sajad Homayoun | Published: 2026-03-24
RAG
Poisoning Attacks on RAG
Risk Management

Explainable Threat Attribution for IoT Networks Using Conditional SHAP and Flow Behavior Modelling

Authors: Samuel Ozechi, Jennifer Okonkwoabutu | Published: 2026-03-24
Model DoS
Feature Extraction
Threat Actor Support

CIPL: A Target-Independent Framework for Channel-Inversion Privacy Leakage in Agents

Authors: Tao Huang, Chen Hou, Jiayang Meng | Published: 2026-03-24
Privacy Leakage
Model Inversion
Evaluation Methods

Does Teaming-Up LLMs Improve Secure Code Generation? A Comprehensive Evaluation with Multi-LLMSecCodeEval

Authors: Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba, Yansong Gao, Kristen Moore, SangCheol Kim, Hyoungshick Kim, Surya Nepal | Published: 2026-03-24
LLM Performance Evaluation
Model DoS
Adversarial Learning

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Authors: Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee | Published: 2026-03-23
Multimodal Safety
Large Language Models
Evaluation Methods