Proactive defense against LLM Jailbreak | Authors: Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, Junfeng Yang | Published: 2025-10-06 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Defense Method Integration
Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers | Authors: Santhosh Kumar Ravindran | Published: 2025-10-06 | Tags: Indirect Prompt Injection, Bias Mitigation Techniques, Defense Method Integration
P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs | Authors: Shuai Zhao, Xinyi Wu, Shiqian Zhao, Xiaobao Wu, Zhongliang Guo, Yanhao Jia, Anh Tuan Luu | Published: 2025-10-06 | Tags: Prompt Injection, Prompt Validation, Defense Method Integration
UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models | Authors: Yuhao Sun, Zhuoer Xu, Shiwen Cui, Kun Yang, Lingyun Yu, Yongdong Zhang, Hongtao Xie | Published: 2025-10-02 | Tags: Relationship of AI Systems, Improvement of Learning, Defense Method Integration
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives | Authors: Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong | Published: 2025-08-20 | Updated: 2025-08-27 | Tags: Model Extraction Attack, Intellectual Property Protection, Defense Method Integration
Combining Machine Learning Defenses without Conflicts | Authors: Vasisht Duddu, Rui Zhang, N. Asokan | Published: 2024-11-14 | Updated: 2025-08-14 | Tags: Certified Robustness, Watermark Evaluation, Defense Method Integration