Jailbreak Distillation: Renewable Safety Benchmarking | Authors: Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson | Published: 2025-05-28 | Tags: Prompt Injection, Model Evaluation, Attack Evaluation | 2025.05.28 2025.05.30 Literature Database
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling | Authors: Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong | Published: 2025-05-27 | Tags: Model Evaluation, Experimental Validation, Attack Evaluation | 2025.05.27 2025.05.29 Literature Database
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Authors: Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Haoyang Li | Published: 2024-08-05 | Updated: 2025-02-12 | Tags: Prompt Injection, Prompt leaking, Model Evaluation | 2024.08.05 2025.05.27 Literature Database
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Authors: A B M Ashikur Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian | Published: 2024-06-13 | Tags: Hallucination, Model Evaluation, Bias in Training Data | 2024.06.13 2025.05.27 Literature Database
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images | Authors: Kuofeng Gao, Yang Bai, Jindong Gu, Shu-Tao Xia, Philip Torr, Zhifeng Li, Wei Liu | Published: 2024-01-20 | Updated: 2024-03-22 | Tags: Model DoS, Model Evaluation, Resource Scarcity Issues | 2024.01.20 2025.05.27 Literature Database
Language Model Inversion | Authors: John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush | Published: 2023-11-22 | Tags: Prompt leaking, Model Inversion, Model Evaluation | 2023.11.22 2025.05.28 Literature Database
Text Embeddings Reveal (Almost) As Much As Text | Authors: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush | Published: 2023-10-10 | Tags: Membership Inference, Model Inversion, Model Evaluation | 2023.10.10 2025.05.28 Literature Database
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” | Authors: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans | Published: 2023-09-21 | Updated: 2024-05-26 | Tags: Hallucination, Model Evaluation, Bias in Training Data | 2023.09.21 2025.05.28 Literature Database
Fast Yet Effective Machine Unlearning | Authors: Ayush K Tarun, Vikram S Chundawat, Murari Mandal, Mohan Kankanhalli | Published: 2021-11-17 | Updated: 2023-05-31 | Tags: Machine learning, Model Evaluation, Robustness Evaluation | 2021.11.17 2025.05.28 Literature Database
End-to-end anti-spoofing with RawNet2 | Authors: Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher | Published: 2020-11-02 | Updated: 2021-12-16 | Tags: Detection of Deepfakes, Model Evaluation, Speech Recognition Process | 2020.11.02 2025.05.28 Literature Database