LLM Security

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Authors: Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian | Published: 2024-04-21

LLM Security

Prompt Injection

Prompt Engineering

2024.04.21 2025.04.03

Literature Database

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Authors: Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe | Published: 2024-04-19

LLM Security

Cybersecurity

Prompt Injection

2024.04.19 2025.04.03

Literature Database

LLMs for Cyber Security: New Opportunities

Authors: Dinil Mon Divakaran, Sai Teja Peddinti | Published: 2024-04-17

LLM Security

Cybersecurity

2024.04.17 2025.04.03

Literature Database

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

Authors: Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma | Published: 2024-04-12

LLM Security

LLM Performance Evaluation

Prompt Injection

2024.04.12 2025.04.03

Literature Database

Subtoxic Questions: Dive Into Attitude Change of LLM’s Response in Jailbreak Attempts

Authors: Tianyu Zhang, Zixuan Zhao, Jiaqi Huang, Jingyu Hua, Sheng Zhong | Published: 2024-04-12

LLM Security

Prompt Injection

Prompt Engineering

2024.04.12 2025.04.03

Literature Database

Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs

Authors: Bibek Upadhayay, Vahid Behzadan | Published: 2024-04-09

LLM Security

Prompt Injection

Attack Methods

2024.04.09 2025.04.03

Literature Database

Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

Authors: Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li | Published: 2024-04-08 | Updated: 2024-08-11

LLM Security

Prompt Injection

Threat Modeling

2024.04.08 2025.04.03

Literature Database

PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Authors: Derui Zhu, Dingfan Chen, Qing Li, Zongxiong Chen, Lei Ma, Jens Grossklags, Mario Fritz | Published: 2024-04-06

LLM Security

LLM Performance Evaluation

Evaluation Methods

2024.04.06 2025.04.03

Literature Database

Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes

Authors: Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi | Published: 2024-04-05 | Updated: 2024-09-09

LLM Security

Prompt Injection

Safety Alignment

2024.04.05 2025.04.03

Literature Database

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Authors: Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | Published: 2024-04-02 | Updated: 2024-10-07

LLM Security

Prompt Injection

Attack Methods

2024.04.02 2025.04.03

Literature Database