Negative impact “Using AI for purposes that violate the law”

AI Security Portal Editorial Team
2025.05.15

This page describes the security target affected by the negative impact “Using AI for purposes that violate the law” in the external influence aspect of the AI Security Map, the attacks and factors that cause it, and the corresponding defensive methods and countermeasures.

Security target

  • Society

Attack or cause

  • Confidentiality breach
  • Abuse of availability
  • Abuse of accuracy
  • Degradation of controllability

Defensive method or countermeasure

  • AI alignment
  • AI access control (see the sketch below)
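
To illustrate the access-control countermeasure, the following is a minimal sketch in Python of gating model capabilities by caller role under a restrictive policy. It is not part of the AI Security Map itself; the roles, capability labels, and names (Role, POLICY, gate_request) are assumptions made only for this example.

# Illustrative sketch of AI access control: requests to a model are gated by the
# caller's role and by a policy that restricts high-risk capabilities. All names
# (Role, POLICY, gate_request, the capability labels) are hypothetical.
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    ANONYMOUS = auto()
    VERIFIED_USER = auto()
    AUDITED_OPERATOR = auto()


# Capabilities that could facilitate unlawful use if exposed without control.
RESTRICTED_CAPABILITIES = {"code_execution", "web_automation", "bulk_generation"}

# Hypothetical policy: which roles may invoke which restricted capabilities.
POLICY = {
    "code_execution": {Role.AUDITED_OPERATOR},
    "web_automation": {Role.AUDITED_OPERATOR},
    "bulk_generation": {Role.VERIFIED_USER, Role.AUDITED_OPERATOR},
}


@dataclass
class Request:
    user_id: str
    role: Role
    capability: str  # the model capability the caller wants to use
    prompt: str


def gate_request(req: Request) -> tuple[bool, str]:
    """Return (allowed, reason). Restricted capabilities are denied unless the
    caller's role is explicitly listed in the policy."""
    if req.capability in RESTRICTED_CAPABILITIES:
        allowed_roles = POLICY.get(req.capability, set())
        if req.role not in allowed_roles:
            return False, f"role {req.role.name} may not use {req.capability}"
    return True, "allowed"


if __name__ == "__main__":
    demo = Request("u-123", Role.ANONYMOUS, "code_execution", "run this script")
    allowed, reason = gate_request(demo)
    print(f"allowed={allowed}: {reason}")  # allowed=False: role ANONYMOUS may not ...

In practice such a gate would sit in front of the model interface and log every decision, so that restricted capabilities are only reachable by accountable, audited callers.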

References

AI alignment

  • Training language models to follow instructions with human feedback, 2022
  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022
  • Constitutional AI: Harmlessness from AI Feedback, 2022
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023
  • A General Theoretical Paradigm to Understand Learning from Human Preferences, 2023
  • RRHF: Rank Responses to Align Language Models with Human Feedback without tears, 2023
  • Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations, 2023
  • Self-Rewarding Language Models, 2024
  • KTO: Model Alignment as Prospect Theoretic Optimization, 2024
  • SimPO: Simple Preference Optimization with a Reference-Free Reward, 2024
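
Several of the alignment methods cited above fine-tune a model directly on human preference data. As a point of reference, the objective optimized by Direct Preference Optimization (2023, listed above) can be written as follows, where \pi_\theta is the policy being fine-tuned, \pi_{\mathrm{ref}} a frozen reference model, (x, y_w, y_l) a prompt with preferred and dispreferred responses, \beta a temperature hyperparameter, and \sigma the logistic function:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]

In the context of this page, such preference-based alignment is one way to make a model less likely to comply with requests that would use it for unlawful purposes, complementing access control at the interface.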