Literature Database

On Calibration of LLM-based Guard Models for Reliable Content Moderation
Authors: Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang | Published: 2024-10-14
Tags: LLM Performance Evaluation, Content Moderation, Prompt Injection

Can LLMs be Scammed? A Baseline Measurement Study
Authors: Udari Madhushani Sehwag, Kelly Patel, Francesca Mosca, Vineeth Ravi, Jessica Staddon | Published: 2024-10-14
Tags: LLM Performance Evaluation, Prompt Injection, Evaluation Method

Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
Authors: Yuqing Nie, Chong Wang, Kailong Wang, Guoai Xu, Guosheng Xu, Haoyu Wang | Published: 2024-10-11
Tags: LLM Performance Evaluation, Privacy Protection

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Authors: Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. Cohen, David Krueger, Fazl Barez | Published: 2024-10-11
Tags: LLM Performance Evaluation, Backdoor Attack, Poisoning

Detecting Training Data of Large Language Models via Expectation Maximization
Authors: Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang | Published: 2024-10-10 | Updated: 2025-04-21
Tags: LLM Performance Evaluation, Membership Inference

RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?
Authors: Di Cao, Yong Liao, Xiuwei Shang | Published: 2024-10-10
Tags: LLM Performance Evaluation, Vulnerability Management

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou | Published: 2024-10-09
Tags: LLM Performance Evaluation, Prompt Injection

Signal Watermark on Large Language Models
Authors: Zhenyu Xu, Victor S. Sheng | Published: 2024-10-09
Tags: LLM Performance Evaluation, Watermarking, Watermark Evaluation

Superficial Safety Alignment Hypothesis
Authors: Jianwei Li, Jung-Eun Kim | Published: 2024-10-07
Tags: LLM Performance Evaluation, Safety Alignment

DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech
Authors: Dominika Woszczyk, Soteris Demetriou | Published: 2024-10-05
Tags: LLM Performance Evaluation, Privacy Protection