大規模言語モデル

Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models

Authors: Arjun Krishna, Aaditya Rastogi, Erick Galinkin | Published: 2025-06-16
プロンプトインジェクション
大規模言語モデル
敵対的攻撃手法

Can We Infer Confidential Properties of Training Data from LLMs?

Authors: Penguin Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri | Published: 2025-06-12
プライバシー保護技術
医療診断属性
大規模言語モデル

Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures

Authors: Yukai Zhou, Sibei Yang, Wenjie Wang | Published: 2025-06-09
LLMとの協力効果
サイバー脅威
大規模言語モデル

A Red Teaming Roadmap Towards System-Level Safety

Authors: Zifan Wang, Christina Q. Knight, Jeremy Kritz, Willow E. Primack, Julian Michael | Published: 2025-05-30 | Updated: 2025-06-09
モデルDoS
大規模言語モデル
製品安全性

Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models

Authors: Yongcan Yu, Yanbo Wang, Ran He, Jian Liang | Published: 2025-05-28
LLMセキュリティ
プロンプトインジェクション
大規模言語モデル

Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities

Authors: Anton Tkachenko, Dmitrij Suskevic, Benjamin Adolphi | Published: 2025-05-26
モデル評価手法
大規模言語モデル
透かし技術

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

Authors: Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee | Published: 2025-05-26
プロンプトインジェクション
モデル性能評価
大規模言語モデル

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval

Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21
RAG
大規模言語モデル
防御メカニズム

sudoLLM : On Multi-role Alignment of Language Models

Authors: Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain | Published: 2025-05-20
アライメント
プロンプトインジェクション
大規模言語モデル

Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration

Authors: Tatia Tsmindashvili, Ana Kolkhidashvili, Dachi Kurtskhalia, Nino Maghlakelidze, Elene Mekvabishvili, Guram Dentoshvili, Orkhan Shamilov, Zaal Gachechiladze, Steven Saporta, David Dachi Choladze | Published: 2025-05-18 | Updated: 2025-08-11
プロンプトインジェクション
大規模言語モデル
性能評価手法