Evaluation Methods

Rethinking How to Evaluate Language Model Jailbreak

Authors: Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik | Published: 2024-04-09 | Updated: 2024-05-07
Prompt Injection
Classification of Malicious Actors
Evaluation Methods

Case Study: Neural Network Malware Detection Verification for Feature and Image Datasets

Authors: Preston K. Robinette, Diego Manzanas Lopez, Serena Serbinowska, Kevin Leach, Taylor T. Johnson | Published: 2024-04-08
Watermarking
Malware Classification
Evaluation Methods

Contextual Chart Generation for Cyber Deception

Authors: David D. Nguyen, David Liebowitz, Surya Nepal, Salil S. Kanhere, Sharif Abuadbba | Published: 2024-04-07
Data Preprocessing
Model Design
Evaluation Methods

PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Authors: Derui Zhu, Dingfan Chen, Qing Li, Zongxiong Chen, Lei Ma, Jens Grossklags, Mario Fritz | Published: 2024-04-06
LLM Security
LLM Performance Evaluation
Evaluation Methods

SSCAE — Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator

Authors: Javad Rafiei Asl, Mohammad H. Rafiei, Manar Alohaly, Daniel Takabi | Published: 2024-03-18
Dynamic Threshold Calculation
Adversarial Examples
Evaluation Methods

An Extensive Comparison of Static Application Security Testing Tools

Authors: Matteo Esposito, Valentina Falaschi, Davide Falessi | Published: 2024-03-14
Toolkit Comparison
Vulnerability Management
Evaluation Methods

CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation

Authors: Jueon Eom, Seyeon Jeong, Taekyoung Kwon | Published: 2024-02-19
Fuzzing
Reinforcement Learning
Evaluation Methods

Maatphor: Automated Variant Analysis for Prompt Injection Attacks

Authors: Ahmed Salem, Andrew Paverd, Boris Köpf | Published: 2023-12-12
LLM Security
Prompt Injection
Evaluation Methods

Automated discovery of trade-off between utility, privacy and fairness in machine learning models

Authors: Bogdan Ficiu, Neil D. Lawrence, Andrei Paleyes | Published: 2023-11-27
Trade-off Analysis
Privacy-Preserving Methods
Evaluation Methods

DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release

Authors: Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, Xun Ran | Published: 2023-11-23 | Updated: 2023-11-29
Privacy Protection
Optimization Methods
Evaluation Methods