On Calibration of LLM-based Guard Models for Reliable Content Moderation | Authors: Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang | Published: 2024-10-14 | Tags: LLM Performance Evaluation, Content Moderation, Prompt Injection
Can LLMs be Scammed? A Baseline Measurement Study | Authors: Udari Madhushani Sehwag, Kelly Patel, Francesca Mosca, Vineeth Ravi, Jessica Staddon | Published: 2024-10-14 | Tags: LLM Performance Evaluation, Prompt Injection, Evaluation Method
Survival of the Safest: Towards Secure Prompt Optimization through Interleaved Multi-Objective Evolution | Authors: Ankita Sinha, Wendi Cui, Kamalika Das, Jiaxin Zhang | Published: 2024-10-12 | Tags: Prompt Injection, Multi-Objective Prompt Optimization
Can a large language model be a gaslighter? | Authors: Wei Li, Luyao Zhu, Yang Song, Ruixi Lin, Rui Mao, Yang You | Published: 2024-10-11 | Tags: Prompt Injection, Safety Alignment, Attack Method
F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents | Authors: Yupeng Ren | Published: 2024-10-11 | Updated: 2024-10-14 | Tags: Prompt Injection, Attack Evaluation, Attack Method
PILLAR: an AI-Powered Privacy Threat Modeling Tool | Authors: Majid Mollaeefar, Andrea Bissoli, Silvio Ranise | Published: 2024-10-11 | Tags: Privacy Protection, Privacy Protection Method, Prompt Injection
APOLLO: A GPT-based tool to detect phishing emails and generate explanations that warn users | Authors: Giuseppe Desolda, Francesco Greco, Luca Viganò | Published: 2024-10-10 | Tags: Phishing Detection, Prompt Injection, User Education
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy | Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou | Published: 2024-10-09 | Tags: LLM Performance Evaluation, Prompt Injection
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems | Authors: Donghyun Lee, Mo Tiwari | Published: 2024-10-09 | Tags: Prompt Injection, Attack Method, Defense Method
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | Authors: David Noever, Forrest McKee | Published: 2024-10-09 | Tags: Cybersecurity, Prompt Injection, Attack Method