This page lists the security targets affected by the negative impact “Creating disinformation using AI” (the external influence aspect of the AI Security Map), the attacks and factors that cause it, and the corresponding defensive methods and countermeasures.
Security target
- Non-consumer
- Society
Attack or cause
- Abuse of availability
- Abuse of accuracy
- Degradation of controllability
- Deepfake
- Social engineering attack
Defensive method or countermeasure
- AI alignment
- Watermarking for generative AI
- Encryption technology
- Identification of AI-generated output
- Detection of disinformation
- Deepfake detection
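One of the listed countermeasures, watermarking for generative AI, can be illustrated with a minimal sketch of a hash-seeded “green list” scheme in the style of statistical LLM watermarks (a pseudorandom partition of the vocabulary per step, plus a z-test at detection time). The vocabulary size, green fraction, and SHA-256-based partition below are illustrative assumptions, not part of the map:

```python
import hashlib
import math

VOCAB_SIZE = 50_000   # illustrative vocabulary size
GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def is_green(prev_token: int, token: int) -> bool:
    """A token is 'green' if a hash seeded by the previous token places it
    in the green partition of the vocabulary (illustrative partition rule)."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GREEN_FRACTION

def watermark_z_score(tokens: list[int]) -> float:
    """One-proportion z-test: how far the observed green-token count lies
    above what GREEN_FRACTION predicts for unwatermarked text."""
    n = len(tokens) - 1  # number of bigrams scored
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    variance = GREEN_FRACTION * (1 - GREEN_FRACTION) * n
    return (greens - expected) / math.sqrt(variance)
```

A generator that biases sampling toward green tokens yields text whose z-score grows roughly with the square root of its length, while unwatermarked text stays near zero; detection needs only the hash key, not the model.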
References
Deepfake
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos, 2016
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
- AttGAN: Facial Attribute Editing by Only Changing What You Want, 2017
- FSGAN: Subject Agnostic Face Swapping and Reenactment, 2019
- STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing, 2019
- A Style-Based Generator Architecture for Generative Adversarial Networks, 2019
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models, 2019
Social engineering attack
AI alignment
- Training language models to follow instructions with human feedback, 2022
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022
- Constitutional AI: Harmlessness from AI Feedback, 2022
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023
- A General Theoretical Paradigm to Understand Learning from Human Preferences, 2023
- RRHF: Rank Responses to Align Language Models with Human Feedback without tears, 2023
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations, 2023
- Self-Rewarding Language Models, 2024
- KTO: Model Alignment as Prospect Theoretic Optimization, 2024
- SimPO: Simple Preference Optimization with a Reference-Free Reward, 2024
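The Direct Preference Optimization paper listed above trains the policy directly on preference pairs with a logistic loss over log-probability ratios against a frozen reference model, removing the separate reward model used in RLHF. A minimal sketch of that per-pair loss (the β value and the scalar log-probabilities in the usage below are illustrative):

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair. Inputs are the summed
    log-probabilities of the chosen/rejected responses under the policy
    being trained and under the frozen reference model."""
    chosen_ratio = policy_chosen_lp - ref_chosen_lp
    rejected_ratio = policy_rejected_lp - ref_rejected_lp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): shrinks as the policy prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and the reference agree exactly, the margin is zero and the loss is ln 2; pushing probability mass toward the chosen response drives it toward zero.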
Watermarking for generative AI
Encryption technology
- Gazelle: A Low Latency Framework for Secure Neural Network Inference, 2018
- Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference, 2018
- nGraph-HE2: A High-Throughput Framework for Neural Network Inference on Encrypted Data, 2019
- Privacy-Preserving Machine Learning with Fully Homomorphic Encryption for Deep Neural Network, 2021
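The papers above run neural-network inference on encrypted data using lattice-based (leveled) homomorphic encryption. As a toy illustration of the underlying homomorphic property only — not the schemes those papers use — here is Paillier additively homomorphic encryption with deliberately tiny, insecure parameters:

```python
import math
import secrets

# Toy Paillier keypair with small fixed primes (illustrative only;
# real deployments use moduli of 2048 bits or more).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1

def encrypt(m: int) -> int:
    """c = (1 + N)^m * r^N mod N^2, with fresh randomness r per call."""
    r = secrets.randbelow(N - 2) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 2) + 1
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    l = (pow(c, LAM, N2) - 1) // N
    return (l * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2
```

This additive property is what lets a server accumulate weighted sums over encrypted inputs without ever seeing the plaintexts; the cited works extend the idea to full network layers with lattice-based schemes.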
Deepfake detection
- Two-Stream Neural Networks for Tampered Face Detection, 2017
- Exposing DeepFake Videos By Detecting Face Warping Artifacts, 2019
- Exposing Deep Fakes Using Inconsistent Head Poses, 2019
- CNN-generated images are surprisingly easy to spot… for now, 2020
- Face X-ray for More General Face Forgery Detection, 2020
- FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals, 2020
- End-to-end anti-spoofing with RawNet2, 2021