Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

Authors: Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma | Published: 2026-01-08

Decision-Aware Trust Signal Alignment for SOC Alert Triage

Authors: Israt Jahan Chowdhury, Md Abu Yousuf Tanvir | Published: 2026-01-08

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li | Published: 2026-01-07

SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems

Authors: Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, Florian Matthes | Published: 2026-01-07

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He | Published: 2026-01-07

Full-Stack Knowledge Graph and LLM Framework for Post-Quantum Cyber Readiness

Authors: Rasmus Erlemann, Charles Colyer Morris, Sanjyot Sathe | Published: 2026-01-07

SLIM: Stealthy Low-Coverage Black-Box Watermarking via Latent-Space Confusion Zones

Authors: Hengyu Wu, Yang Cao | Published: 2026-01-06

LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers

Authors: Hiroyuki Okada, Tatsumi Oba, Naoto Yanai | Published: 2026-01-06

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Authors: Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu | Published: 2026-01-06

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Authors: Yuetian Chen, Yuntao Du, Kaiyuan Zhang, Ashish Kundu, Charles Fleming, Bruno Ribeiro, Ninghui Li | Published: 2026-01-06