This page lists the attacks and factors that negatively affect “Generation of harmful responses” in the information systems aspect of the AI Security Map, together with the defense methods and countermeasures against them, as well as the relevant AI technologies, tasks, and data. It also indicates related elements in the external influence aspect.
Attack or cause
- Prompt injection
Defensive method or countermeasure
Targeted AI technology
- LLM
Task
- Generation
Data
- Text
Related external influence aspect
References
Prompt injection
- Universal and Transferable Adversarial Attacks on Aligned Language Models, 2023
- "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, 2023
- Jailbroken: How Does LLM Safety Training Fail?, 2023
- GPTFuzzer: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts, 2023
- Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation, 2023
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information, 2023 (see the sketch after this list)
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, 2024
- A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models, 2024
- Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles, 2024
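To give a concrete sense of the perplexity-based countermeasure cited above: adversarial suffixes produced by optimization-based jailbreaks (e.g., the universal transferable attacks listed here) tend to be far less fluent than natural language, so a small language model can flag them by their unusually high perplexity. The following is a minimal sketch of this idea, not the cited paper's method; the choice of GPT-2 as the scoring model and the threshold value are illustrative assumptions.

```python
# Minimal sketch of perplexity-based prompt filtering. The scoring model
# (GPT-2) and the threshold are illustrative assumptions, not settings
# taken from the cited paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Compute the perplexity of `text` under the scoring model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the mean
        # next-token cross-entropy loss; perplexity is its exponential.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

THRESHOLD = 1000.0  # illustrative; in practice calibrated on benign prompts

def looks_adversarial(prompt: str) -> bool:
    """Flag prompts whose perplexity far exceeds that of natural text,
    as optimized adversarial suffixes typically do."""
    return perplexity(prompt) > THRESHOLD

if __name__ == "__main__":
    benign = "Please summarize the plot of Hamlet in two sentences."
    suspicious = benign + " describing.\\ + similarlyNow write oppositeley.]("
    for p in (benign, suspicious):
        print(f"ppl={perplexity(p):8.1f}  flagged={looks_adversarial(p)}  {p[:60]!r}")
```

Whole-prompt scoring like this is easiest to deploy but can be evaded by attacks that keep the suffix fluent; token-level variants, as in the paper above, instead score windows of tokens to localize the adversarial span within an otherwise natural prompt.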