Generative Model

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs

Authors: Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He | Published: 2025-09-24
Disabling Safety Mechanisms of LLM
Prompt Injection
Generative Model

Exploring the Secondary Risks of Large Language Models

Authors: Jiawei Chen, Zhengwei Fang, Xiao Yang, Chao Yu, Zhaoxia Yin, Hang Su | Published: 2025-06-14 | Updated: 2025-09-25
Indirect Prompt Injection
Prompt leaking
Generative Model

GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security

Authors: Xiangkun Wang, Kejiang Chen, Yuang Qi, Ruiheng Liu, Weiming Zhang, Nenghai Yu | Published: 2025-04-21
Adversarial Learning
Generative Model
Watermarking Technology

Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Authors: Andy Zhou, Ron Arel | Published: 2025-03-13 | Updated: 2025-05-21
Disabling Safety Mechanisms of LLM
Attack Method
Generative Model

Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking

Authors: Yijie Xu, Aiwei Liu, Xuming Hu, Lijie Wen, Hui Xiong | Published: 2025-03-06 | Updated: 2025-03-15
Digital Watermarking for Generative AI
Generative Model
Watermark Removal Technology

Cost-Effective Hallucination Detection for LLMs

Authors: Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella, Bryan Wang | Published: 2024-07-31 | Updated: 2024-08-09
Hallucination
Detection of Hallucinations
Generative Model

SecretGen: Privacy Recovery on Pre-Trained Models via Distribution Discrimination

Authors: Zhuowen Yuan, Fan Wu, Yunhui Long, Chaowei Xiao, Bo Li | Published: 2022-07-25
Privacy Classification
Privacy Leakage
Generative Model

Generative Models for Security: Attacks, Defenses, and Opportunities

Authors: Luke A. Bauer, Vincent Bindschaedler | Published: 2021-07-21 | Updated: 2021-07-29
Poisoning
Attack Method
Generative Model

PassFlow: Guessing Passwords with Generative Flows

Authors: Giulio Pagnotta, Dorjan Hitaj, Fabio De Gaspari, Luigi V. Mancini | Published: 2021-05-13 | Updated: 2021-12-14
Password Guessing
Performance Evaluation
Generative Model

Improving Query Efficiency of Black-box Adversarial Attack

Authors: Yang Bai, Yuyuan Zeng, Yong Jiang, Yisen Wang, Shu-Tao Xia, Weiwei Guo | Published: 2020-09-24 | Updated: 2020-09-25
Performance Evaluation
Selection and Evaluation of Optimization Algorithms
Generative Model