Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | Authors: Arshiya Khan, Guannan Liu, Xing Gao | Published: 2024-09-27 | Updated: 2025-06-11 | Tags: Code Vulnerability Repair, Security Context Integration, Large Language Model
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles | Authors: Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu | Published: 2024-08-20 | Updated: 2025-02-07 | Tags: Prompt Injection, Large Language Model, Attack Scenario Analysis
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks | Authors: Zhexin Zhang, Junxiao Yang, Yida Lu, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang | Published: 2024-07-03 | Updated: 2025-05-20 | Tags: Prompt Injection, Large Language Model, Law Enforcement Evasion
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models | Authors: Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li | Published: 2024-06-17 | Updated: 2025-06-09 | Tags: Cooperative Effects with LLM, Prompt Injection, Large Language Model
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models | Authors: Xiaohan Yuan, Jinfeng Li, Dongxia Wang, Yuefeng Chen, Xiaofeng Mao, Longtao Huang, Jialuo Chen, Hui Xue, Xiaoxia Liu, Wenhai Wang, Kui Ren, Jingyi Wang | Published: 2024-05-23 | Updated: 2025-04-07 | Tags: Risk Analysis Method, Large Language Model, Safety Alignment
Watermark Stealing in Large Language Models | Authors: Nikola Jovanović, Robin Staab, Martin Vechev | Published: 2024-02-29 | Updated: 2024-06-24 | Tags: Model Extraction Attack, Large Language Model, Taxonomy of Attacks
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Authors: Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, Thomas L. Griffiths | Published: 2024-02-06 | Updated: 2024-05-23 | Tags: Bias Detection in AI Output, Algorithm Fairness, Large Language Model
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models | Authors: Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos Psounis | Published: 2023-12-13 | Tags: Prompting Strategy, Calculation of Output Harmfulness, Large Language Model
Gender bias and stereotypes in Large Language Models | Authors: Hadas Kotek, Rikker Dockum, David Q. Sun | Published: 2023-08-28 | Tags: Bias Detection in AI Output, Algorithm Fairness, Large Language Model
Toxicity Detection with Generative Prompt-based Inference | Authors: Yau-Shian Wang, Yingshan Chang | Published: 2022-05-24 | Tags: Prompting Strategy, Calculation of Output Harmfulness, Large Language Model