Negative impact “Using AI for military purposes”

This page lists the security targets affected by the negative impact “Using AI for military purposes” in the external influence aspect of the AI Security Map, along with the attacks and factors that cause it and the corresponding defensive methods and countermeasures.

Security target

Non-consumer
Society

Attack or cause

Abuse of availability
Abuse of accuracy
Degradation of controllability
Abuse of explainability

Defensive method or countermeasure

AI alignment

References

AI alignment

Training language models to follow instructions with human feedback, 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022
Constitutional AI: Harmlessness from AI Feedback, 2022
Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences, 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears, 2023
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations, 2023
Self-Rewarding Language Models, 2024
KTO: Model Alignment as Prospect Theoretic Optimization, 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward, 2024