Literature Database The Literature Database categorizes and aggregates literature related to AI security. For more details, please see About Literature Database.
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal | Published: 2025-06-16 | Tags: Alignment, Prompt Injection, Large Language Model
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models
Authors: Arjun Krishna, Aaditya Rastogi, Erick Galinkin | Published: 2025-06-16 | Tags: Prompt Injection, Large Language Model, Adversarial Attack Methods
Watermarking LLM-Generated Datasets in Downstream Tasks
Authors: Yugeng Liu, Tianshuo Cong, Michael Backes, Zheng Li, Yang Zhang | Published: 2025-06-16 | Tags: Prompt Leaking, Model Protection Methods, Digital Watermarking for Generative AI
From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs
Authors: Alsharif Abuadbba, Chris Hicks, Kristen Moore, Vasilios Mavroudis, Burak Hasircioglu, Diksha Goel, Piers Jennings | Published: 2025-06-16 | Tags: Indirect Prompt Injection, Cybersecurity, Education and Follow-up
Using LLMs for Security Advisory Investigations: How Far Are We?
Authors: Bayu Fedra Abdullah, Yusuf Sulistyo Nugroho, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto | Published: 2025-06-16 | Tags: Advice Provision, Hallucination, Prompt Leaking
Detecting Hard-Coded Credentials in Software Repositories via LLMs
Authors: Chidera Biringa, Gokhan Kul | Published: 2025-06-16 | Tags: Software Security, Performance Evaluation, Prompt Leaking
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng | Published: 2025-06-12 | Tags: Data Collection Method, Prompt Leaking, Calculation of Output Harmfulness
Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework
Authors: Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Zhe Chen, Wei Ni, Jun Luo | Published: 2025-06-12 | Tags: Certified Robustness, Adversarial Learning, Adversarial Attack Detection
SoK: Evaluating Jailbreak Guardrails for Large Language Models
Authors: Xunguang Wang, Zhenlan Ji, Wenxuan Wang, Zongjie Li, Daoyuan Wu, Shuai Wang | Published: 2025-06-12 | Tags: Prompt Injection, Trade-Off Between Safety and Usability, Jailbreak Attack Methods
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks
Authors: Kaiyuan Zhang, Siyuan Cheng, Hanxi Guo, Yuetian Chen, Zian Su, Shengwei An, Yuntao Du, Charles Fleming, Ashish Kundu, Xiangyu Zhang, Ninghui Li | Published: 2025-06-12 | Tags: Privacy Protection Method, Prompt Injection, Prompt Leaking