Literature Database

From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions

Authors: Zhihang Deng, Jiaping Gui, Weinan Zhang | Published: 2026-03-02
Indirect Prompt Injection
安全性評価
Threat Model

Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)

Authors: Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong | Published: 2026-03-02
Disabling Safety Mechanisms of LLM
LLM Performance Evaluation
Differential Privacy

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Authors: Manisha Mukherjee, Vincent J. Hellendoorn | Published: 2026-03-02
Indirect Prompt Injection
セキュリティに関連する知識を活用した手法
Prompt leaking

LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

Authors: Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael | Published: 2026-02-26
LLM Performance Evaluation
Model evaluation methods
Educational Data Mining

A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring

Authors: Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger | Published: 2026-02-26
Watermarking
Data Management System
Model evaluation methods

Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent

Authors: Boyang Zhang, Yang Zhang | Published: 2026-02-26
Disabling Safety Mechanisms of LLM
Data Privacy Assessment
Prompt leaking

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Authors: Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia | Published: 2026-02-26
Prompt Injection
Large Language Model
脱獄手法

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Authors: Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, Hongxin Hu | Published: 2026-02-26
Indirect Prompt Injection
Counterfactual Explanation
Data Management System

IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

Authors: Yanpei Guo, Wenjie Qu, Linyu Wu, Shengfang Zhai, Lionel Z. Wang, Ming Xu, Yue Liu, Binhang Yuan, Dawn Song, Jiaheng Zhang | Published: 2026-02-26
LLM Performance Evaluation
Model evaluation methods
監査手法

Layer-Targeted Multilingual Knowledge Erasure in Large Language Models

Authors: Taoran Li, Varun Chandrasekaran, Zhiyuan Yu | Published: 2026-02-26
Alignment
Machine learning
Machine Learning Method