Privately Aligning Language Models with Reinforcement Learning

Authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim | Published: 2023-10-25 | Updated: 2024-05-03

Detecting Pretraining Data from Large Language Models

Authors: Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer | Published: 2023-10-25 | Updated: 2024-03-09

Robust and Actively Secure Serverless Collaborative Learning

Authors: Olive Franzese, Adam Dziedzic, Christopher A. Choquette-Choo, Mark R. Thomas, Muhammad Ahmad Kaleem, Stephan Rabanser, Congyu Fang, Somesh Jha, Nicolas Papernot, Xiao Wang | Published: 2023-10-25

Radio Frequency Fingerprinting via Deep Learning: Challenges and Opportunities

Authors: Saeif Al-Hazbi, Ahmed Hussain, Savio Sciancalepore, Gabriele Oligeri, Panos Papadimitratos | Published: 2023-10-25 | Updated: 2024-04-15

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Authors: Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai | Published: 2023-10-25

Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks

Authors: Xinglong Chang, Katharina Dost, Gillian Dobbie, Jörg Wicker | Published: 2023-10-24

Locally Differentially Private Document Generation Using Zero Shot Prompting

Authors: Saiteja Utpala, Sara Hooker, Pin Yu Chen | Published: 2023-10-24 | Updated: 2023-11-30

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Authors: Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber | Published: 2023-10-24 | Updated: 2024-03-03

SoK: Memorization in General-Purpose Large Language Models

Authors: Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West | Published: 2023-10-24

Deceptive Fairness Attacks on Graphs via Meta Learning

Authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong | Published: 2023-10-24