This page provides the attacks and factors that have a negative impact “Unintended output or behavior by administrators” in the information systems aspect in the AI Security Map, the defense methods and countermeasures against them, as well as the relevant AI technologies, tasks, and data. It also indicates related elements in the external influence aspect.
Attack or cause
- Cyber attack
- Integrity violation
- Prompt injection
- Indirect prompt injection
- Backdoor attack
Defensive method or countermeasure
- Defensive method for integrity
Targeted AI technology
- All AI technologies
Task
- Classification
- Generation
Data
- Image
- Graph
- Text
- Audio
Related external influence aspect
- Usability
- Consumer fairness
- Reputation
- Human-centric principle
- Compliance with laws and regulations
- Physical impact
- Psychological impact
- Financial impact
- Economy
- Critical infrastructure
- Medical care
References
Prompt injection
- Universal and Transferable Adversarial Attacks on Aligned Language Models, 2023
- Do Anything Now: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, 2023
- Jailbroken: How Does LLM Safety Training Fail?, 2023
- Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts, 2023
- Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation, 2023
- Token-level adversarial prompt detection based on perplexity measures and contextual information, 2023
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, 2024
- A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language models, 2024
- Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles, 2024
Indirect prompt injection
- Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, 2023
- Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models, 2023
- Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs, 2023
- Defending Against Indirect Prompt Injection Attacks With Spotlighting, 2024
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents, 2024
Backdoor attack
- Targeted backdoor attacks on deep learning systems using data poisoning, 2017
- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain, 2017
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses, 2020
- Hidden Trigger Backdoor Attacks, 2020
- Backdoor Attacks to Graph Neural Networks, 2021
- Graph Backdoor, 2021
- Can You Hear It? Backdoor Attack via Ultrasonic Triggers, 2021
- Backdoor Attacks Against Dataset Distillation, 2023
- Universal Jailbreak Backdoors from Poisoned Human Feedback, 2023