Evaluation Method

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Authors: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger | Published: 2025-06-08 | Updated: 2025-06-11
API Security
Evaluation Method
Selection Method

DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

Authors: Bilel Cherif, Tamas Bisztray, Richard A. Dubniczky, Aaesha Aldahmani, Saeed Alshehhi, Norbert Tihanyi | Published: 2025-05-26
Hallucination
Model Performance Evaluation
Evaluation Method

Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

Authors: Haoqi Wu, Wei Dai, Li Wang, Qiang Yan | Published: 2025-05-09 | Updated: 2025-05-15
Token Identification Method
Privacy Design Principles
Evaluation Method

Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Authors: Hudan Studiawan, Frank Breitinger, Mark Scanlon | Published: 2025-05-06
LLM Performance Evaluation
Large Language Model
Evaluation Method

Variational Bayesian Bow tie Neural Networks with Shrinkage

Authors: Alisa Sheinkman, Sara Wade | Published: 2024-11-17 | Updated: 2024-11-19
Sparse Model
Optimization Problem
Evaluation Method

FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses

Authors: Isaac Baglin, Xiatian Zhu, Simon Hadfield | Published: 2024-11-05 | Updated: 2025-01-05
Poisoning
Attack Evaluation
Evaluation Method

Can LLMs be Scammed? A Baseline Measurement Study

Authors: Udari Madhushani Sehwag, Kelly Patel, Francesca Mosca, Vineeth Ravi, Jessica Staddon | Published: 2024-10-14
LLM Performance Evaluation
Prompt Injection
Evaluation Method

FedCert: Federated Accuracy Certification

Authors: Minh Hieu Nguyen, Huu Tien Nguyen, Trung Thanh Nguyen, Manh Duong Nguyen, Trong Nghia Hoang, Truong Thao Nguyen, Phi Le Nguyen | Published: 2024-10-04
Evaluation Method

A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network

Authors: Lauren Elborough, Duncan Taylor, Melissa Humphries | Published: 2024-09-26
Algorithm
Watermarking
Evaluation Method

LLM-Enhanced Software Patch Localization

Authors: Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang | Published: 2024-09-10 | Updated: 2024-09-13
LLM Performance Evaluation
Understanding Commit Content
Evaluation Method