Literature Database

LLM Jailbreak Detection for (Almost) Free!

Authors: Guorui Chen, Yifan Xia, Xiaojun Jia, Zhijiang Li, Philip Torr, Jindong Gu | Published: 2025-09-18
Large Language Models
Evaluation Methods
Watermarking Techniques

BEACON: Behavioral Malware Classification with Large Language Model Embeddings and Deep Learning

Authors: Wadduwage Shanika Perera, Haodi Jiang | Published: 2025-09-18
Malware Detection Scenarios
Behavioral Analysis Methods
Evaluation Methods

Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics

Authors: Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo | Published: 2025-09-17
Privacy Analysis
Diffusion Models
Generative Model Characteristics

Differential Privacy in Federated Learning: Mitigating Inference Attacks with Randomized Response

Authors: Ozer Ozturk, Busra Buyuktanir, Gozde Karatas Baydogmus, Kazim Yildiz | Published: 2025-09-17
Algorithms
Privacy Analysis
Differential Privacy

Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation

Authors: Baolei Zhang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang | Published: 2025-09-17
Poisoning Attacks on RAG
Evaluation Methods
Responsibility Attribution System Design

Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs

Authors: Md Bokhtiar Al Zami, Md Raihan Uddin, Dinh C. Nguyen | Published: 2025-09-17
Energy Management
Digital Twin Technology
Federated Learning

Privacy-Aware In-Context Learning for Large Language Models

Authors: Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha | Published: 2025-09-17
Differential Privacy
Information Extraction
Watermarking

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

Authors: S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin | Published: 2025-09-16 | Updated: 2025-10-01
Indirect Prompt Injection
Prompt Injection
Distributed LLM Architecture

Sy-FAR: Symmetry-based Fair Adversarial Robustness

Authors: Haneen Najjar, Eyal Ronen, Mahmood Sharif | Published: 2025-09-16
Ensuring Fairness
Adversarial Training
Evaluation Metrics

Jailbreaking Large Language Models Through Content Concretization

Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos | Published: 2025-09-16
Prompt Injection
Model Evaluation
Evaluation Metrics