評価手法

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

Authors: Haoqi Wu, Wei Dai, Li Wang, Qiang Yan | Published: 2025-05-09 | Updated: 2025-05-15
トークン識別手法
プライバシー設計原則
評価手法

Defending against Indirect Prompt Injection by Instruction Detection

Authors: Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu | Published: 2025-05-08 | Updated: 2025-09-17
プロンプトの検証
評価手法
透かし技術

Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Authors: Hudan Studiawan, Frank Breitinger, Mark Scanlon | Published: 2025-05-06
LLM性能評価
大規模言語モデル
評価手法

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods

Authors: Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang | Published: 2025-02-24 | Updated: 2025-07-09
プロンプトインジェクション
脱獄手法
評価手法

Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs

Authors: Samuele Pasini, Jinhan Kim, Tommaso Aiello, Rocio Cabrera Lozoya, Antonino Sabetta, Paolo Tonella | Published: 2024-11-27 | Updated: 2025-09-17
RAG
RAGへのポイズニング攻撃
評価手法

Variational Bayesian Bow tie Neural Networks with Shrinkage

Authors: Alisa Sheinkman, Sara Wade | Published: 2024-11-17 | Updated: 2024-11-19
スパースモデル
最適化問題
評価手法

FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses

Authors: Isaac Baglin, Xiatian Zhu, Simon Hadfield | Published: 2024-11-05 | Updated: 2025-01-05
ポイズニング
攻撃の評価
評価手法

Can LLMs be Scammed? A Baseline Measurement Study

Authors: Udari Madhushani Sehwag, Kelly Patel, Francesca Mosca, Vineeth Ravi, Jessica Staddon | Published: 2024-10-14
LLM性能評価
プロンプトインジェクション
評価手法

FedCert: Federated Accuracy Certification

Authors: Minh Hieu Nguyen, Huu Tien Nguyen, Trung Thanh Nguyen, Manh Duong Nguyen, Trong Nghia Hoang, Truong Thao Nguyen, Phi Le Nguyen | Published: 2024-10-04
評価手法

A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network

Authors: Lauren Elborough, Duncan Taylor, Melissa Humphries | Published: 2024-09-26
アルゴリズム
ウォーターマーキング
評価手法