AI Security Portal Bot

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Authors: Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu | Published: 2024-05-29
Algorithm
Attack Method
Optimization Problem

Toxicity Detection for Free

Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08
Indirect Prompt Injection
Prompt Validation
Malicious Prompt

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Authors: Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie | Published: 2024-05-28 | Updated: 2024-06-02
Watermarking
Backdoor Attack
Poisoning

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

Authors: Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, Qi Zhu | Published: 2024-05-27 | Updated: 2025-04-30
LLM Security
Backdoor Attack
Prompt Injection

Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Authors: Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan | Published: 2024-05-26 | Updated: 2024-08-21
Prompt Injection
Threats of Medical AI
Attack Method

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

Authors: Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu | Published: 2024-05-25 | Updated: 2024-06-12
LLM Security
Prompt Injection
Attack Method

Revisit, Extend, and Enhance Hessian-Free Influence Functions

Authors: Ziao Yang, Han Yue, Jian Chen, Hongfu Liu | Published: 2024-05-25 | Updated: 2024-10-20
Poisoning
Model Performance Evaluation
Evaluation Method

BadGD: A unified data-centric framework to identify gradient descent vulnerabilities

Authors: Chi-Hua Wang, Guang Cheng | Published: 2024-05-24
Backdoor Attack
Poisoning

Can Implicit Bias Imply Adversarial Robustness?

Authors: Hancheng Min, René Vidal | Published: 2024-05-24 | Updated: 2024-06-05
Algorithm
Bias
Adversarial Training

$\mathbf{L^2\cdot M = C^2}$ Large Language Models are Covert Channels

Authors: Simen Gaure, Stefanos Koffas, Stjepan Picek, Sondre Rønjom | Published: 2024-05-24 | Updated: 2024-10-07
LLM Performance Evaluation
Watermarking
Secure Communication Channel