MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Authors: Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, Yang Liu | Published: 2023-07-16 | Updated: 2023-10-25
Tags: Data Leakage, Prompt Injection, Watermark Robustness
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco | Published: 2023-07-14
Tags: Dataset Generation, Prompt Injection, Attack Pattern Extraction
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan | Published: 2023-07-14
Tags: Prompt Injection, Dialogue System, Attack Evaluation
Effective Prompt Extraction from Language Models
Authors: Yiming Zhang, Nicholas Carlini, Daphne Ippolito | Published: 2023-07-13 | Updated: 2024-08-07
Tags: Prompt Injection, Prompt Leaking, Dialogue System
Jailbroken: How Does LLM Safety Training Fail?
Authors: Alexander Wei, Nika Haghtalab, Jacob Steinhardt | Published: 2023-07-05
Tags: Security Assurance, Prompt Injection, Adversarial Attack Methods
On the Exploitability of Instruction Tuning
Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein | Published: 2023-06-28 | Updated: 2023-10-28
Tags: Prompt Injection, Poisoning, Adversarial Attack Detection
Are aligned neural networks adversarially aligned?
Authors: Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt | Published: 2023-06-26 | Updated: 2024-05-06
Tags: Prompt Injection, Adversarial Example, Adversarial Attack Methods
ChatIDS: Explainable Cybersecurity Using Generative AI
Authors: Victor Jüttner, Martin Grimmer, Erik Buchmann | Published: 2023-06-26
Tags: Online Safety Advice, Prompt Injection, Expert Opinion Collection
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Authors: Reza Fayyazi, Shanchieh Jay Yang | Published: 2023-06-24 | Updated: 2023-08-22
Tags: Prompt Injection, Malware Classification, Natural Language Processing
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Authors: Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal | Published: 2023-06-22 | Updated: 2023-08-16
Tags: Prompt Injection, Inappropriate Content Generation, Adversarial Attack