文献データベース

One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

Authors: Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin | Published: 2025-05-12
LLMセキュリティ
LLMの安全機構の解除
プロンプトインジェクション

I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference

Authors: Zibo Gao, Junjie Hu, Feng Guo, Yixin Zhang, Yinglong Han, Siyuan Liu, Haiyang Li, Zhiqiang Lv | Published: 2025-05-10 | Updated: 2025-05-14
LLMの安全機構の解除
プロンプトリーキング
攻撃検出手法

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

Authors: Haoqi Wu, Wei Dai, Li Wang, Qiang Yan | Published: 2025-05-09 | Updated: 2025-05-15
トークン識別手法
プライバシー設計原則
評価手法

AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents

Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song | Published: 2025-05-09 | Updated: 2025-05-21
インダイレクトプロンプトインジェクション
ファジング
攻撃タイプ

LLM-Text Watermarking based on Lagrange Interpolation

Authors: Jarosław Janas, Paweł Morawiecki, Josef Pieprzyk | Published: 2025-05-09 | Updated: 2025-05-12
LLMセキュリティ
プロンプトリーキング
生成AI向け電子透かし

Defending against Indirect Prompt Injection by Instruction Detection

Authors: Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu | Published: 2025-05-08 | Updated: 2025-09-17
プロンプトの検証
評価手法
透かし技術

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal | Published: 2025-05-08
プロンプトリーキング
攻撃手法
透かし技術

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal | Published: 2025-05-08
プロンプトリーキング
攻撃手法
透かし技術

FedTDP: A Privacy-Preserving and Unified Framework for Trajectory Data Preparation via Federated Learning

Authors: Zhihao Zeng, Ziquan Fang, Wei Shao, Lu Chen, Yunjun Gao | Published: 2025-05-08
プライバシー設計原則
モデル設計
機械学習技術

A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network

Authors: Haoxiang Luo, Gang Sun, Yinqiu Liu, Dongcheng Zhao, Dusit Niyato, Hongfang Yu, Schahram Dustdar | Published: 2025-05-08
ビザンチン合意メカニズム
モデルDoS
信頼性評価