LLM Security

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Authors: Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, Joshua Saxe | Published: 2023-12-07

LLM Security

Cybersecurity

Prompt Injection

2023.12.07 2025.04.03

Literature Database

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Authors: Dongyu Yao, Jianshu Zhang, Ian G. Harris, Marcel Carlsson | Published: 2023-09-11 | Updated: 2024-04-14

LLM Security

Watermarking

Prompt Injection

2023.09.11 2025.04.03

Literature Database

Detecting Language Model Attacks with Perplexity

Authors: Gabriel Alon, Michael Kamfonas | Published: 2023-08-27 | Updated: 2023-11-07

LLM Security

Prompt Injection

Malicious Prompts

2023.08.27 2025.04.03

Literature Database

ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching

Authors: M. Caner Tol, Berk Sunar | Published: 2023-08-24

LLM Security

Vulnerability Mitigation Techniques

Watermark Robustness

2023.08.24 2025.04.03

Literature Database

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

Authors: Maria Rigaki, Ondřej Lukáš, Carlos A. Catania, Sebastian Garcia | Published: 2023-08-23 | Updated: 2023-08-28

LLM Security

Experimental Validation

Reinforcement Learning Environments

2023.08.23 2025.04.03

Literature Database

DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection

Authors: Sudipta Paria, Aritra Dasgupta, Swarup Bhunia | Published: 2023-08-14

LLM Security

Security Assurance

Vulnerability Mitigation Techniques

2023.08.14 2025.04.03

Literature Database

“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Authors: Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang | Published: 2023-08-07 | Updated: 2024-05-15

LLM Security

Character Role-Playing

Prompt Injection

2023.08.07 2025.04.03

Literature Database

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin | Published: 2023-07-31 | Updated: 2024-04-03

LLM Security

System Prompt Generation

Prompt Injection

2023.07.31 2025.04.03

Literature Database

Universal and Transferable Adversarial Attacks on Aligned Language Models

Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson | Published: 2023-07-27 | Updated: 2023-12-20

LLM Security

Prompt Injection

Inappropriate Content Generation

2023.07.27 2025.04.03

Literature Database

Backdoor Attacks for In-Context Learning with Language Models

Authors: Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini | Published: 2023-07-27

LLM Security

Backdoor Attacks

Prompt Injection

2023.07.27 2025.04.03

Literature Database