Disabling Safety Mechanisms of LLM

Decoupling Reconnaissance and Exploitation: Measuring the Capability Boundaries of LLM-Based Web Penetration Testing

Authors: Liwei Yu, Shuo Li, Ming Zhou, Ge Chu, Yan Guo | Published: 2026-06-24
Disabling Safety Mechanisms of LLM
Agent Design
Automated Penetration Testing

Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs

Authors: Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas | Published: 2026-06-08
Disabling Safety Mechanisms of LLM
Ethical Standards Compliance
Research Methodology

Steganography Without Modification: Hidden Communication via LLM Seeds

Authors: Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth | Published: 2026-06-08
Disabling Safety Mechanisms of LLM
Token Identification Method
Probability distribution

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Authors: Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann | Published: 2026-05-28
Disabling Safety Mechanisms of LLM
Model Architecture
Interpretation Method

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Authors: Almene De Meran Meguimtsop, Maria Leonor Pacheco, Daniel E. Acuna | Published: 2026-05-28
Disabling Safety Mechanisms of LLM
Indirect Prompt Injection
Author Contribution

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Authors: Zedian Shao, Charles Fleming, Teodora Baluta | Published: 2026-05-26
Disabling Safety Mechanisms of LLM
Robustness Evaluation
Watermark Robustness

Open-Weight LLM Fine-Tuning Defenses are Susceptible to Simple Attacks

Authors: Kevin Kuo, Chhavi Yadav, Virginia Smith | Published: 2026-05-26
Disabling Safety Mechanisms of LLM
Robustness Evaluation
Integration of Defense Methods

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Authors: Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan, Bingyu Yan, Qiwei Ye, Haoliang Li | Published: 2026-05-13
Disabling Safety Mechanisms of LLM
Alignment
Behavior Analysis Method

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

Authors: Zheng Lin, Zhenxing Niu, Haoxuan Ji, Haichang Gao | Published: 2026-05-11
Disabling Safety Mechanisms of LLM
Prompt Injection
Model Robustness

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

Authors: Yue Li, Xiao Li, Hao Wu, Yue Zhang, Yechao Zhang, Yating Liu, Fengyuan Xu, Sheng Zhong | Published: 2026-05-11
Disabling Safety Mechanisms of LLM
Security-Usability Trade-off
Attack Evaluation