This page lists the security targets affected by the negative impact “Adverse impact on human perception and judgment caused by disinformation” in the external influence aspect of the AI Security Map, together with the attacks and factors that cause it and the corresponding defensive methods and countermeasures.
Security target
- Non-consumer
Attack or cause
- Integrity violation
- Availability breach
- Explainability violation
- Degradation of output fairness
- Reliability violation
- Deepfake
- Social engineering attack
Defensive method or countermeasure
- AI alignment
- Watermarking for generative AI
- Identification of AI-generated output
- Detection of disinformation
- Deepfake detection
- Education and follow-up
References
Deepfake
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos, 2016
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
- AttGAN: Facial Attribute Editing by Only Changing What You Want, 2017
- FSGAN: Subject Agnostic Face Swapping and Reenactment, 2019
- STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing, 2019
- A Style-Based Generator Architecture for Generative Adversarial Networks, 2019
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models, 2019
Social engineering attack
AI alignment
- Training language models to follow instructions with human feedback, 2022
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022
- Constitutional AI: Harmlessness from AI Feedback, 2022
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023
- A General Theoretical Paradigm to Understand Learning from Human Preferences, 2023
- RRHF: Rank Responses to Align Language Models with Human Feedback without tears, 2023
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations, 2023
- Self-Rewarding Language Models, 2024
- KTO: Model Alignment as Prospect Theoretic Optimization, 2024
- SimPO: Simple Preference Optimization with a Reference-Free Reward, 2024
Watermarking for generative AI
Deepfake detection
- Two-Stream Neural Networks for Tampered Face Detection, 2017
- Exposing DeepFake Videos By Detecting Face Warping Artifacts, 2019
- Exposing Deep Fakes Using Inconsistent Head Poses, 2019
- CNN-generated images are surprisingly easy to spot… for now, 2020
- Face X-ray for More General Face Forgery Detection, 2020
- FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals, 2020
- End-to-end anti-spoofing with RawNet2, 2021
Education and follow-up
- What Students Can Learn About Artificial Intelligence — Recommendations for K-12 Computing Education, 2022
- Learning to Prompt in the Classroom to Understand AI Limits: A pilot study, 2023
- Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study, 2024
- The Essentials of AI for Life and Society: An AI Literacy Course for the University Community, 2025