Defeating Prompt Injections by Design

Authors: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr | Published: 2025-03-24

Large Language Models powered Network Attack Detection: Architecture, Opportunities and Case Study

Authors: Xinggong Zhang, Qingyang Li, Yunpeng Tan, Zongming Guo, Lei Zhang, Yong Cui | Published: 2025-03-24

Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection

Authors: Fei Zuo, Junghwan Rhee, Yung Ryn Choe | Published: 2025-03-24

Model-Guardian: Protecting against Data-Free Model Stealing Using Gradient Representations and Deceptive Predictions

Authors: Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao | Published: 2025-03-23

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models

Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23

NVBleed: Covert and Side-Channel Attacks on NVIDIA Multi-GPU Interconnect

Authors: Yicheng Zhang, Ravan Nazaraliyev, Sankha Baran Dutta, Andres Marquez, Kevin Barker, Nael Abu-Ghazaleh | Published: 2025-03-22

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Authors: Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot | Published: 2025-03-21 | Updated: 2025-03-25

CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities

Authors: Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, Adarsh Danda, Richard Fang, Conner Jensen, Eric Ihli, Jason Benn, Jet Geronimo, Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang | Published: 2025-03-21

Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests

Authors: John Naulty, Eason Chen, Joy Wang, George Digkas, Kostas Chalkias | Published: 2025-03-21

Towards LLM Guardrails via Sparse Representation Steering

Authors: Zeqing He, Zhibo Wang, Huiyu Xu, Kui Ren | Published: 2025-03-21