This page shows the security targets for the negative impact "AI outputs misinformation" in the external-influence aspect mapped on the AI Security Map, together with the attacks and factors that bring it about and the corresponding defense methods and countermeasures.
Security targets
- Consumers
Attacks and factors
- Loss of integrity
- Loss of accuracy
- Loss of controllability
- Loss of explainability
- Loss of reliability
- Poisoning attacks against RAG
- Hallucination
Defense methods and countermeasures
Applicable phases in development and use
1. Data collection and preprocessing
- Data curation
2. Model selection, training, and validation
- Uncertainty quantification
3. System implementation
- RAG (retrieval-augmented generation)
4. System provision, operation, and maintenance
- XAI (explainable AI)
- Uncertainty quantification
5. System use
- Hallucination detection
- Education and follow-up
References
Poisoning attacks against RAG
- Poisoning Retrieval Corpora by Injecting Adversarial Passages, 2023
- BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models, 2024
- PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models, 2024
- Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications, 2024
- Poison-RAG: Adversarial Data Poisoning Attacks on Retrieval-Augmented Generation in Recommender Systems, 2025
Hallucination
- The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”, 2023
- Why Does ChatGPT Fall Short in Providing Truthful Answers?, 2023
- DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation, 2024
- LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples, 2024
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, 2024
Data curation
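As a concrete illustration of what a curation pass in phase 1 can look like, here is a minimal sketch assuming exact-duplicate removal plus two simple quality heuristics; the `curate` helper and its thresholds are hypothetical and not drawn from any reference.

```python
# Minimal data-curation sketch: exact-duplicate removal plus simple quality
# heuristics. The thresholds and heuristics are illustrative assumptions.
import hashlib

def curate(docs, min_chars=200, max_symbol_ratio=0.3):
    """Return documents that pass dedup and basic quality filters."""
    seen, kept = set(), []
    for text in docs:
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:               # drop exact (normalized) duplicates
            continue
        seen.add(digest)
        if len(normalized) < min_chars:  # drop fragments too short to be useful
            continue
        symbols = sum(not c.isalnum() and not c.isspace() for c in normalized)
        if symbols / len(normalized) > max_symbol_ratio:  # likely boilerplate
            continue
        kept.append(text)
    return kept

if __name__ == "__main__":
    corpus = ["A short fragment.",
              "Clean article text ... " * 20,
              "Clean article text ... " * 20]  # duplicate of the previous entry
    print(len(curate(corpus)))  # -> 1
```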
RAG
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020
- REALM: Retrieval-Augmented Language Model Pre-Training, 2020
- In-Context Retrieval-Augmented Language Models, 2023
- Active Retrieval Augmented Generation, 2023
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, 2023
- Query Rewriting for Retrieval-Augmented Large Language Models, 2023
- Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering, 2023
- Generate rather than Retrieve: Large Language Models are Strong Context Generators, 2023
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy, 2023
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization, 2024
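To make the RAG pattern of phase 3 concrete, here is a minimal sketch: retrieve the top-k passages for a query and prepend them to the prompt, so the model grounds its answer in retrieved text rather than parametric memory. The term-overlap scoring and the `score`, `retrieve`, and `rag_prompt` helpers are illustrative assumptions; real systems use trained dense retrievers and an actual LLM call.

```python
# Minimal RAG sketch: rank passages by cosine similarity over term counts,
# then stuff the top-k into the prompt. Scoring is a toy stand-in for a
# trained retriever; generation is left to the caller.
from collections import Counter
import math

def score(query, passage):
    """Cosine similarity over simple term counts."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def rag_prompt(query, corpus, k=2):
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

corpus = [
    "The capital of France is Paris.",
    "Mount Fuji is the highest mountain in Japan.",
    "Paris hosted the 2024 Summer Olympics.",
]
print(rag_prompt("What is the capital of France?", corpus))
```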
XAI (explainable AI)
- Visualizing and Understanding Convolutional Networks, 2014
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2014
- Understanding Deep Image Representations by Inverting Them, 2014
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, 2016
- A Unified Approach to Interpreting Model Predictions, 2017
- Learning Important Features Through Propagating Activation Differences, 2017
- Understanding Black-box Predictions via Influence Functions, 2017
- Interpretable Explanations of Black Boxes by Meaningful Perturbation, 2017
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), 2018
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, 2019
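The perturbation-style methods among the papers above share one intuition: measure how the model's output changes when part of the input is removed. Here is a minimal sketch of that idea, with a toy linear model standing in for a real network; the `occlusion_attribution` helper and the weights are illustrative assumptions, not an implementation from the references.

```python
# Minimal perturbation-based attribution sketch: the importance of feature i
# is the drop in the model's output when that feature is zeroed out.

def occlusion_attribution(model, x):
    """Importance of feature i = f(x) - f(x with feature i zeroed)."""
    baseline = model(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0
        attributions.append(baseline - model(occluded))
    return attributions

# Toy model: a fixed linear scorer (weights are illustrative assumptions).
weights = [0.8, -0.2, 0.5]
model = lambda x: sum(w * v for w, v in zip(weights, x))

print(occlusion_attribution(model, [1.0, 1.0, 2.0]))  # approx. [0.8, -0.2, 1.0]
```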
Hallucination detection
- Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis, 2023
- Cost-Effective Hallucination Detection for LLMs, 2024
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, 2024
- Measuring and Reducing LLM Hallucination without Gold-Standard Answers, 2024
- On Large Language Models’ Hallucination with Regard to Known Facts, 2024
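A common strategy in this detection literature is consistency checking: sample several answers to the same question and flag the output when the samples disagree. A minimal sketch of that idea follows; the agreement threshold and the `likely_hallucination` helper are illustrative assumptions, and the hard-coded sample lists stand in for repeated LLM calls.

```python
# Minimal sampling-based hallucination check: if no single answer dominates
# across independent samples, treat the output as suspect. The threshold is
# an illustrative assumption.
from collections import Counter

def likely_hallucination(samples, threshold=0.6):
    """Flag if no single answer reaches the agreement threshold."""
    counts = Counter(s.strip().lower() for s in samples)
    top_fraction = counts.most_common(1)[0][1] / len(samples)
    return top_fraction < threshold

consistent = ["Paris", "paris", "Paris", "Paris"]        # stand-in LLM samples
inconsistent = ["Paris", "Lyon", "Marseille", "Paris"]   # stand-in LLM samples
print(likely_hallucination(consistent))    # False: answers agree
print(likely_hallucination(inconsistent))  # True: answers disagree
```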
Uncertainty quantification
- Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding, 2015
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2016
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2017
- Predictive Uncertainty Estimation via Prior Networks, 2018
- Evidential Deep Learning to Quantify Classification Uncertainty, 2018
- Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, 2019
- Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods, 2021
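Ensemble disagreement is one of the simplest recipes in the papers above (e.g., deep ensembles): run several independently trained models and treat the spread of their predictions as the uncertainty estimate. A minimal sketch follows, with perturbed linear models standing in for trained ensemble members; all weights and helper names are illustrative assumptions.

```python
# Minimal ensemble-based uncertainty sketch: the mean over members is the
# prediction, the spread is the uncertainty. Trained members are faked here
# with perturbed linear weights.
import statistics

def ensemble_predict(models, x):
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.stdev(preds)

def make_member(w):                      # one "trained" model per weight setting
    return lambda x: w * x

models = [make_member(w) for w in (0.9, 1.0, 1.1, 1.05)]  # stand-in ensemble
mean, spread = ensemble_predict(models, x=2.0)
print(f"prediction={mean:.2f}, uncertainty={spread:.2f}")
# A large spread signals low confidence; downstream logic can abstain or defer.
```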
Education and follow-up
- What Students Can Learn About Artificial Intelligence — Recommendations for K-12 Computing Education, 2022
- Learning to Prompt in the Classroom to Understand AI Limits: A pilot study, 2023
- Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study, 2024
- The Essentials of AI for Life and Society: An AI Literacy Course for the University Community, 2025
