LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature | Authors: Maxime Würsch, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud | Published: 2023-12-12 | Tags: LLM Performance Evaluation, Data Preprocessing, Knowledge Extraction Method
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks | Authors: Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas | Published: 2023-10-05 | Updated: 2024-06-11 | Tags: LLM Performance Evaluation, Prompt Injection, Defense Method
Misusing Tools in Large Language Models With Visual Adversarial Examples | Authors: Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes | Published: 2023-10-04 | Tags: LLM Performance Evaluation, Prompt Injection, Adversarial Example
Jailbreaker in Jail: Moving Target Defense for Large Language Models | Authors: Bocheng Chen, Advait Paliwal, Qiben Yan | Published: 2023-10-03 | Tags: LLM Performance Evaluation, Prompt Injection, Evaluation Metrics
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused? | Authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu | Published: 2023-10-02 | Tags: LLM Performance Evaluation, Prompt Injection, Classification of Malicious Actors
Watch Your Language: Investigating Content Moderation with Large Language Models | Authors: Deepak Kumar, Yousef AbuHashem, Zakir Durumeric | Published: 2023-09-25 | Updated: 2024-01-17 | Tags: LLM Performance Evaluation, Prompt Injection, Inappropriate Content Generation
Can LLM-Generated Misinformation Be Detected? | Authors: Canyu Chen, Kai Shu | Published: 2023-09-25 | Updated: 2024-04-23 | Tags: LLM Performance Evaluation, Prompt Injection, Inappropriate Content Generation
Recovering from Privacy-Preserving Masking with Large Language Models | Authors: Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli | Published: 2023-09-12 | Updated: 2023-12-14 | Tags: LLM Performance Evaluation, Data Protection Method, Privacy Technique
Evaluating Superhuman Models with Consistency Checks | Authors: Lukas Fluri, Daniel Paleka, Florian Tramèr | Published: 2023-06-16 | Updated: 2023-10-19 | Tags: LLM Performance Evaluation, Algorithm, Evaluation Method
Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models | Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo | Published: 2023-06-15 | Tags: LLM Performance Evaluation, Algorithm, Prompt Injection