評価指標

PromptLocate: Localizing Prompt Injection Attacks

Authors: Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong | Published: 2025-10-14
プロンプトの検証
大規模言語モデル
評価指標

Sy-FAR: Symmetry-based Fair Adversarial Robustness

Authors: Haneen Najjar, Eyal Ronen, Mahmood Sharif | Published: 2025-09-16
公平性の確保
敵対的学習
評価指標

Jailbreaking Large Language Models Through Content Concretization

Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos | Published: 2025-09-16
プロンプトインジェクション
モデル評価
評価指標

Unsupervised anomaly detection on cybersecurity data streams: a case with BETH dataset

Authors: Evgeniy Eremin | Published: 2025-03-06 | Updated: 2025-06-16
サイバーセキュリティ
パフォーマンス評価
評価指標

Deceptive Fairness Attacks on Graphs via Meta Learning

Authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong | Published: 2023-10-24
GNN
攻撃手法
評価指標

Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights

Authors: Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Qian Wang, Chao Shen | Published: 2023-10-18 | Updated: 2025-09-16
モデルインバージョン
敵対的学習
評価指標

Private Synthetic Data Meets Ensemble Learning

Authors: Haoyuan Sun, Navid Azizan, Akash Srivastava, Hao Wang | Published: 2023-10-15
データ生成
プライバシー保護手法
評価指標

AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation

Authors: Filippo Perrina, Francesco Marchiori, Mauro Conti, Nino Vincenzo Verde | Published: 2023-10-04
データ生成
脅威モデリング
評価指標

Jailbreaker in Jail: Moving Target Defense for Large Language Models

Authors: Bocheng Chen, Advait Paliwal, Qiben Yan | Published: 2023-10-03
LLM性能評価
プロンプトインジェクション
評価指標

Beyond Labeling Oracles: What does it mean to steal ML models?

Authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot | Published: 2023-10-03 | Updated: 2024-06-13
データ収集
知識抽出手法
評価指標