Enhancing Robustness of Graph Neural Networks through p-Laplacian

Authors: Anuj Kumar Sirohi, Subhanu Halder, Kabir Kumar, Sandeep Kumar | Published: 2024-09-27

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Authors: Fangzhou Wu, Ethan Cecchetti, Chaowei Xiao | Published: 2024-09-27 | Updated: 2024-10-10

An Adversarial Perspective on Machine Unlearning for AI Safety

Authors: Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando | Published: 2024-09-26 | Updated: 2025-04-10

Weak-to-Strong Backdoor Attack for Large Language Models

Authors: Shuai Zhao, Leilei Gan, Zhongliang Guo, Xiaobao Wu, Luwei Xiao, Xiaoyu Xu, Cong-Duy Nguyen, Luu Anh Tuan | Published: 2024-09-26 | Updated: 2024-10-13

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks

Authors: Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell | Published: 2024-09-26 | Updated: 2024-10-04

A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network

Authors: Lauren Elborough, Duncan Taylor, Melissa Humphries | Published: 2024-09-26

Multi-Designated Detector Watermarking for Language Models

Authors: Zhengan Huang, Gongxian Zeng, Xin Mu, Yu Wang, Yue Yu | Published: 2024-09-26 | Updated: 2024-10-01

The poison of dimensionality

Authors: Lê-Nguyên Hoang | Published: 2024-09-25

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | Published: 2024-09-23 | Updated: 2025-04-01

Order of Magnitude Speedups for LLM Membership Inference

Authors: Rongting Zhang, Martin Bertran, Aaron Roth | Published: 2024-09-22 | Updated: 2024-09-24