Disabling LLM Safety Mechanisms

Decoupling Reconnaissance and Exploitation: Measuring the Capability Boundaries of LLM-Based Web Penetration Testing

Authors: Liwei Yu, Shuo Li, Ming Zhou, Ge Chu, Yan Guo | Published: 2026-06-24
Disabling LLM Safety Mechanisms
Agent Design
Automated Penetration Testing

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

Authors: Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas | Published: 2026-06-08
Disabling LLM Safety Mechanisms
Ethical Standards Compliance
Research Methodology

Steganography Without Modification: Hidden Communication via LLM Seeds

Authors: Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth | Published: 2026-06-08
Disabling LLM Safety Mechanisms
Token Identification Methods
Probability Distribution

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Authors: Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann | Published: 2026-05-28
Disabling LLM Safety Mechanisms
Model Architecture
Interpretation Methods

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Authors: Almene De Meran Meguimtsop, Maria Leonor Pacheco, Daniel E. Acuna | Published: 2026-05-28
Disabling LLM Safety Mechanisms
Indirect Prompt Injection
Author Contributions

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Authors: Zedian Shao, Charles Fleming, Teodora Baluta | Published: 2026-05-26
Disabling LLM Safety Mechanisms
Robustness Evaluation
Watermark Durability

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

Authors: Kevin Kuo, Chhavi Yadav, Virginia Smith | Published: 2026-05-26
Disabling LLM Safety Mechanisms
Robustness Evaluation
Integration of Defense Methods

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Authors: Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan, Bingyu Yan, Qiwei Ye, Haoliang Li | Published: 2026-05-13
Disabling LLM Safety Mechanisms
Alignment
Behavior Analysis Methods

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

Authors: Zheng Lin, Zhenxing Niu, Haoxuan Ji, Haichang Gao | Published: 2026-05-11
Disabling LLM Safety Mechanisms
Prompt Injection
Model Robustness

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

Authors: Yue Li, Xiao Li, Hao Wu, Yue Zhang, Yechao Zhang, Yating Liu, Fengyuan Xu, Sheng Zhong | Published: 2026-05-11
Disabling LLM Safety Mechanisms
Security-Usability Trade-off
Attack Evaluation