AI Security Portal Bot

TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

Authors: Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn | Published: 2024-04-30 | Updated: 2025-03-17
Content Moderation
Backdoor Attack
Prompt Injection

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong | Published: 2024-04-30
Watermarking
Model Performance Evaluation
Personalization Method

Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots

Authors: Xi Xin, Giles Hooker, Fei Huang | Published: 2024-04-29 | Updated: 2024-05-01
Model Interpretability
Adversarial Training
Watermark Evaluation

Evaluating and Mitigating Linguistic Discrimination in Large Language Models

Authors: Guoliang Dong, Haoyu Wang, Jun Sun, Xinyu Wang | Published: 2024-04-29 | Updated: 2024-05-10
LLM Performance Evaluation
Bias
Prompt Injection

Exploring the Robustness of In-Context Learning with Noisy Labels

Authors: Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei | Published: 2024-04-28 | Updated: 2024-05-01
Model Performance Evaluation
Workshop Survey
Convergence Analysis

Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe | Published: 2024-04-27
Quantification of Uncertainty
Adversarial Example
Watermark Evaluation

Evaluations of Machine Learning Privacy Defenses are Misleading

Authors: Michael Aerni, Jie Zhang, Florian Tramèr | Published: 2024-04-26 | Updated: 2024-09-05
Privacy Protection Method
Membership Inference
Adversarial Example

Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Authors: Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang | Published: 2024-04-26
Poisoning Attack on RAG
Prompt Leaking
Poisoning

An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape

Authors: Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath | Published: 2024-04-24
Poisoning
Watermark Evaluation
Defense Method

Attacks on Third-Party APIs of Large Language Models

Authors: Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane | Published: 2024-04-24
LLM Security
Prompt Injection
Attack Method