AIセキュリティマップにマッピングされた情報システム的側面における負の影響「管理者が意図しない出力や動作」をもたらす攻撃・要因、それに対する防御手法・対策、および対象のAI技術・タスク・データを示しています。また、関連する外部作用的側面の要素も示しています。
攻撃・要因
- サイバー攻撃
- 完全性の毀損
- プロンプトインジェクション
- インダイレクトプロンプトインジェクション
- バックドア攻撃
防御手法・対策
- 完全性の防御手法
対象のAI技術
- 全てのAI技術
タスク
- 分類
- 生成
対象のデータ
- 画像
- グラフ
- テキスト
- 音声
関連する外部作用的側面
参考文献
プロンプトインジェクション
- Universal and Transferable Adversarial Attacks on Aligned Language Models, 2023
- Do Anything Now: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, 2023
- Jailbroken: How Does LLM Safety Training Fail?, 2023
- Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts, 2023
- Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation, 2023
- Token-level adversarial prompt detection based on perplexity measures and contextual information, 2023
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models, 2024
- A novel and universal fuzzing framework for proactively discovering jailbreak vulnerabilities in large language models, 2024
- Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles, 2024
インダイレクトプロンプトインジェクション
- Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, 2023
- Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models, 2023
- Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs, 2023
- Defending Against Indirect Prompt Injection Attacks With Spotlighting, 2024
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents, 2024
バックドア攻撃
- Targeted backdoor attacks on deep learning systems using data poisoning, 2017
- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain, 2017
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses, 2020
- Hidden Trigger Backdoor Attacks, 2020
- Backdoor Attacks to Graph Neural Networks, 2021
- Graph Backdoor, 2021
- Can You Hear It? Backdoor Attack via Ultrasonic Triggers, 2021
- Backdoor Attacks Against Dataset Distillation, 2023
- Universal Jailbreak Backdoors from Poisoned Human Feedback, 2023