Literature Database

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Authors: Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping | Published: 2025-09-22
Hallucination
Weapon design methods
Fraud techniques

Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis

Authors: Joshua Ward, Xiaofeng Lin, Chi-Hua Wang, Guang Cheng | Published: 2025-09-22
Privacy analysis
Membership inference
Differential privacy

Federated Learning in the Wild: A Comparative Study for Cybersecurity under Non-IID and Unbalanced Settings

Authors: Roberto Doriguzzi-Corin, Petr Sabel, Silvio Cretti, Silvio Ranise | Published: 2025-09-22
Client selection methods
Adversarial learning
Federated learning

SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models

Authors: Haotian Xu, Qingsong Peng, Jie Shi, Huadi Zheng, Yu Li, Cheng Zhuo | Published: 2025-09-22
Indirect prompt injection
Model DoS
Evaluation metrics

LLM-Driven SAST-Genius: A Hybrid Static Analysis Framework for Comprehensive and Actionable Security

Authors: Vaibhav Agrawal, Kiarash Ahi | Published: 2025-09-18 | Updated: 2025-09-23
Prompt injection
Vulnerability assessment methods
Static analysis

Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems

Authors: Yicheng Zhang, Zijian Huang, Sophie Chen, Erfan Shayegani, Jiasi Chen, Nael Abu-Ghazaleh | Published: 2025-09-18
Security analysis
Prompt injection
Attack action model

Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction

Authors: Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu | Published: 2025-09-18
Prompt injection
Safety alignment
Refusal mechanism

Variables Ordering Optimization in Boolean Characteristic Set Method Using Simulated Annealing and Machine Learning-based Time Prediction

Authors: Minzhong Luo, Yudong Sun, Yin Long | Published: 2025-09-18
Algorithm
Optimization methods
Evaluation methods

Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection

Authors: Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu | Published: 2025-09-18
Poisoning attacks on RAG
Online learning
Robustness

Enterprise AI Must Enforce Participant-Aware Access Control

Authors: Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, Teijia Zhao | Published: 2025-09-18
Security analysis
Privacy management
Prompt leaking