Evaluation Methods

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

Authors: Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang | Published: 2024-06-11 | Updated: 2024-06-13
LLM Performance Evaluation
Dataset Generation
Evaluation Methods

Ollabench: Evaluating LLMs’ Reasoning for Human-centric Interdependent Cybersecurity

Authors: Tam N. Nguyen | Published: 2024-06-11
LLM Performance Evaluation
Cybersecurity
Evaluation Methods

Robust Distribution Learning with Local and Global Adversarial Corruptions

Authors: Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee | Published: 2024-06-10 | Updated: 2024-06-24
Watermarking
Loss Function
Evaluation Methods

Auditing Differential Privacy Guarantees Using Density Estimation

Authors: Antti Koskela, Jafar Mohammadi | Published: 2024-06-07 | Updated: 2024-10-11
Privacy Protection Methods
Evaluation Methods
Watermark Evaluation

ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning

Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran | Published: 2024-05-31 | Updated: 2024-06-05
Poisoning
Evaluation Methods
Defense Methods

Revisit, Extend, and Enhance Hessian-Free Influence Functions

Authors: Ziao Yang, Han Yue, Jian Chen, Hongfu Liu | Published: 2024-05-25 | Updated: 2024-10-20
Poisoning
Model Performance Evaluation
Evaluation Methods

Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

Authors: Florent Guépin, Nataša Krčo, Matthieu Meeus, Yves-Alexandre de Montjoye | Published: 2024-05-24
Membership Inference
Evaluation Methods

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Authors: Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz, Philip H. S. Torr, Adel Bibi | Published: 2024-05-22
Evaluation Methods
Watermark Evaluation
Difficulty Calibration

Geometry-Aware Instrumental Variable Regression

Authors: Heiner Kremer, Bernhard Schölkopf | Published: 2024-05-19
Watermarking
Optimization Problem
Evaluation Methods

A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks

Authors: Udi Aharon, Ran Dubin, Amit Dvir, Chen Hajaj | Published: 2024-05-18 | Updated: 2024-09-15
Model Performance Evaluation
Anomaly Detection Methods
Evaluation Methods