Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
Authors: Rui Pu, Chaozhuo Li, Rui Ha, Zejian Chen, Litian Zhang, Zheng Liu, Lirong Qiu, Zaisheng Ye | Published: 2024-10-18 | Updated: 2025-07-08
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Prompt validation
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method
Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-05-21
Tags: Disabling Safety Mechanisms of LLM, Model Performance Evaluation, Information Extraction