RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs

Authors: Xuan Chen, Yuzhou Nie, Lu Yan, Yunshu Mao, Wenbo Guo, Xiangyu Zhang | Published: 2024-06-13

Noise-Aware Differentially Private Regression via Meta-Learning

Authors: Ossi Räisä, Stratis Markou, Matthew Ashman, Wessel P. Bruinsma, Marlon Tobaben, Antti Honkela, Richard E. Turner | Published: 2024-06-12

Malicious URL Detection using optimized Hist Gradient Boosting Classifier based on grid search method

Authors: Mohammad Maftoun, Nima Shadkam, Seyedeh Somayeh Salehi Komamardakhi, Zulkefli Mansor, Javad Hassannataj Joloudari | Published: 2024-06-12

Efficient Network Traffic Feature Sets for IoT Intrusion Detection

Authors: Miguel Silva, João Vitorino, Eva Maia, Isabel Praça | Published: 2024-06-12

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Authors: Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr | Published: 2024-06-12

A Study of Backdoors in Instruction Fine-tuned Language Models

Authors: Jayaram Raghuram, George Kesidis, David J. Miller | Published: 2024-06-12 | Updated: 2024-08-21

Knowledge Return Oriented Prompting (KROP)

Authors: Jason Martin, Kenneth Yeung | Published: 2024-06-11

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Authors: Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen | Published: 2024-06-11 | Updated: 2024-06-13

Adversarial Machine Unlearning

Authors: Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu | Published: 2024-06-11

Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis

Authors: Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi, Davide Taibi | Published: 2024-06-11 | Updated: 2024-09-06