LLM Performance Evaluation

Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs

Authors: Pan Suo, Yu-Ming Shang, San-Chuan Guo, Xi Zhang | Published: 2025-04-30

LLM Performance Evaluation

Poisoning Attacks on RAG

Attack Type

2025.04.30

Literature Database

Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code

Authors: Md. Azizul Hakim Bappy, Hossen A Mustafa, Prottoy Saha, Rajinus Salehat | Published: 2025-04-23

LLM Performance Evaluation

Training Method

Prompt Leaking

2025.04.23


aiXamine: LLM Safety and Security Simplified

Authors: Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, Minhaj Ahmad, Sanjay Chawla, Issa Khalil | Published: 2025-04-21

LLM Performance Evaluation

Alignment

Performance Evaluation

2025.04.21


Watermarking Needs Input Repetition Masking

Authors: David Khachaturov, Robert Mullins, Ilia Shumailov, Sumanth Dathathri | Published: 2025-04-16

LLM Performance Evaluation

Prompt Validation

Watermark Design

2025.04.16


The Digital Cybersecurity Expert: How Far Have We Come?

Authors: Dawei Wang, Geng Zhou, Xianglong Li, Yu Bai, Li Chen, Ting Qin, Jian Sun, Dan Li | Published: 2025-04-16

LLM Performance Evaluation

Poisoning Attacks on RAG

Prompt Injection

2025.04.16


Progent: Programmable Privilege Control for LLM Agents

Authors: Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, Dawn Song | Published: 2025-04-16

LLM Performance Evaluation

Indirect Prompt Injection

Privacy Protection Mechanism

2025.04.16


Exploring Backdoor Attack and Defense for LLM-empowered Recommendations

Authors: Liangbo Ning, Wenqi Fan, Qing Li | Published: 2025-04-15

LLM Performance Evaluation

Poisoning Attacks on RAG

Adversarial Attack Analysis

2025.04.15


Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails

Authors: William Hackett, Lewis Birch, Stefan Trawicki, Neeraj Suri, Peter Garraghan | Published: 2025-04-15

LLM Performance Evaluation

Prompt Injection

Adversarial Attack Analysis

2025.04.15


StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models

Authors: Yang Feng, Xudong Pan | Published: 2025-04-14

LLM Performance Evaluation

Indirect Prompt Injection

Malicious Website Detection

2025.04.14


An Investigation of Large Language Models and Their Vulnerabilities in Spam Detection

Authors: Qiyao Tang, Xiangyang Li | Published: 2025-04-14

LLM Performance Evaluation

Prompt Injection

Model DoS

2025.04.14
