AIセキュリティポータルbot

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Authors: Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang | Published: 2026-01-08
LLMの安全機構の解除
インダイレクトプロンプトインジェクション
プライバシー保護手法

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

Authors: Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma | Published: 2026-01-08
プロンプトインジェクション
ロバスト性分析
深層ネットワークの堅牢性

Decision-Aware Trust Signal Alignment for SOC Alert Triage

Authors: Israt Jahan Chowdhury, Md Abu Yousuf Tanvir | Published: 2026-01-08
コスト感度閾値
信号処理技術
機械学習技術

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li | Published: 2026-01-07
プロンプトインジェクション
大規模言語モデル
敵対的攻撃検出

SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems

Authors: Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, Florian Matthes | Published: 2026-01-07
RAG
RAGへのポイズニング攻撃
プライバシー保護技術

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He | Published: 2026-01-07
プロンプトインジェクション
大規模言語モデル
敵対的攻撃検出

Full-Stack Knowledge Graph and LLM Framework for Post-Quantum Cyber Readiness

Authors: Rasmus Erlemann, Charles Colyer Morris, Sanjyot Sathe | Published: 2026-01-07
データ駆動型脆弱性評価
知識グラフ設計
脆弱性優先順位付け

SLIM: Stealthy Low-Coverage Black-Box Watermarking via Latent-Space Confusion Zones

Authors: Hengyu Wu, Yang Cao | Published: 2026-01-06
プロンプトの検証
生成AI向け電子透かし
透かし評価

LLMs, You Can Evaluate It! Design of Multi-perspective Report Evaluation for Security Operation Centers

Authors: Hiroyuki Okada, Tatsumi Oba, Naoto Yanai | Published: 2026-01-06
LLM活用
セキュリティ分析手法
ユーザー体験評価

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Authors: Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu | Published: 2026-01-06
プロンプトインジェクション
モデル抽出攻撃
敵対的攻撃検出