Large Language Model

S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models

Authors: Xiaohan Yuan, Jinfeng Li, Dongxia Wang, Yuefeng Chen, Xiaofeng Mao, Longtao Huang, Jialuo Chen, Hui Xue, Xiaoxia Liu, Wenhai Wang, Kui Ren, Jingyi Wang | Published: 2024-05-23 | Updated: 2025-04-07
Risk Analysis Method
Large Language Model
Safety Alignment

Watermark Stealing in Large Language Models

Authors: Nikola Jovanović, Robin Staab, Martin Vechev | Published: 2024-02-29 | Updated: 2024-06-24
Model Extraction Attack
Large Language Model
Taxonomy of Attacks

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Authors: Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, Thomas L. Griffiths | Published: 2024-02-06 | Updated: 2024-05-23
Bias Detection in AI Output
Algorithm Fairness
Large Language Model

Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models

Authors: Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos Psounis | Published: 2023-12-13
Prompting Strategy
Calculation of Output Harmfulness
Large Language Model

Gender bias and stereotypes in Large Language Models

Authors: Hadas Kotek, Rikker Dockum, David Q. Sun | Published: 2023-08-28
Bias Detection in AI Output
Algorithm Fairness
Large Language Model

Toxicity Detection with Generative Prompt-based Inference

Authors: Yau-Shian Wang, Yingshan Chang | Published: 2022-05-24
Prompting Strategy
Calculation of Output Harmfulness
Large Language Model

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Authors: Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro | Published: 2021-12-15 | Updated: 2022-04-15
Bias Detection in AI Output
Few-Shot Learning
Large Language Model

Measuring Bias in Contextualized Word Representations

Authors: Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, Yulia Tsvetkov | Published: 2019-06-18
Bias Detection in AI Output
Algorithm Fairness
Large Language Model

A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks

Authors: Huichen Li, Xiaojun Xu, Chang Liu, Teng Ren, Kun Wu, Xuezhi Cao, Weinan Zhang, Yong Yu, Dawn Song | Published: 2018-04-07
Large Language Model
Time-Related Features
Statistical Analysis