Defense Method Integration

Proactive defense against LLM Jailbreak

Authors: Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, Junfeng Yang | Published: 2025-10-06
Disabling Safety Mechanisms of LLM
Prompt Injection
Defense Method Integration

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

Authors: Santhosh Kumar Ravindran | Published: 2025-10-06
Indirect Prompt Injection
Bias Mitigation Techniques
Defense Method Integration

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

Authors: Shuai Zhao, Xinyi Wu, Shiqian Zhao, Xiaobao Wu, Zhongliang Guo, Yanhao Jia, Anh Tuan Luu | Published: 2025-10-06
Prompt Injection
Prompt Validation
Defense Method Integration

UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models

Authors: Yuhao Sun, Zhuoer Xu, Shiwen Cui, Kun Yang, Lingyun Yu, Yongdong Zhang, Hongtao Xie | Published: 2025-10-02
Relationship of AI Systems
Improvement of Learning
Defense Method Integration

A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives

Authors: Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong | Published: 2025-08-20 | Updated: 2025-08-27
Model Extraction Attack
Intellectual Property Protection
Defense Method Integration

Combining Machine Learning Defenses without Conflicts

Authors: Vasisht Duddu, Rui Zhang, N. Asokan | Published: 2024-11-14 | Updated: 2025-08-14
Certified Robustness
Watermark Evaluation
Defense Method Integration