Prompt Injection

Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild

Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco | Published: 2023-07-14

Dataset Generation

Prompt Injection

Attack Pattern Extraction

2023.07.14 2025.04.03

Literature Database

Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan | Published: 2023-07-14

Prompt Injection

Dialogue Systems

Attack Evaluation

2023.07.14 2025.04.03

Literature Database

Effective Prompt Extraction from Language Models

Authors: Yiming Zhang, Nicholas Carlini, Daphne Ippolito | Published: 2023-07-13 | Updated: 2024-08-07

Prompt Injection

Prompt Leaking

Dialogue Systems

2023.07.13 2025.04.03

Literature Database

Jailbroken: How Does LLM Safety Training Fail?

Authors: Alexander Wei, Nika Haghtalab, Jacob Steinhardt | Published: 2023-07-05

Security Assurance

Prompt Injection

Adversarial Attack Methods

2023.07.05 2025.04.03

Literature Database

On the Exploitability of Instruction Tuning

Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein | Published: 2023-06-28 | Updated: 2023-10-28

Prompt Injection

Poisoning

Adversarial Attack Detection

2023.06.28 2025.04.03

Literature Database

Are aligned neural networks adversarially aligned?

Authors: Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt | Published: 2023-06-26 | Updated: 2024-05-06

Prompt Injection

Adversarial Examples

Adversarial Attack Methods

2023.06.26 2025.04.03

Literature Database

ChatIDS: Explainable Cybersecurity Using Generative AI

Authors: Victor Jüttner, Martin Grimmer, Erik Buchmann | Published: 2023-06-26

Online Safety Advice

Prompt Injection

Expert Opinion Collection

2023.06.26 2025.04.03

Literature Database

On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions

Authors: Reza Fayyazi, Shanchieh Jay Yang | Published: 2023-06-24 | Updated: 2023-08-22

Prompt Injection

Malware Classification

Natural Language Processing

2023.06.24 2025.04.03

Literature Database

Visual Adversarial Examples Jailbreak Aligned Large Language Models

Authors: Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal | Published: 2023-06-22 | Updated: 2023-08-16

Prompt Injection

Inappropriate Content Generation

Adversarial Attacks

2023.06.22 2025.04.03

Literature Database

Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo | Published: 2023-06-15

LLM Performance Evaluation

Algorithms

Prompt Injection

2023.06.15 2025.04.03

Literature Database