Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn | Published: 2023-05-29 | Updated: 2024-07-29

Membership Inference Attacks against Language Models via Neighbourhood Comparison

Authors: Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick | Published: 2023-05-29 | Updated: 2023-08-07

LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

Authors: Xuanqi Liu, Zhuotao Liu | Published: 2023-05-28 | Updated: 2023-12-15

The Curse of Recursion: Training on Generated Data Makes Models Forget

Authors: Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson | Published: 2023-05-27 | Updated: 2024-04-14

Improved Privacy-Preserving PCA Using Optimized Homomorphic Matrix Multiplication

Authors: Xirong Ma | Published: 2023-05-27 | Updated: 2023-08-17

On Evaluating Adversarial Robustness of Large Vision-Language Models

Authors: Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin | Published: 2023-05-26 | Updated: 2023-10-29

CyPhERS: A Cyber-Physical Event Reasoning System providing real-time situational awareness for attack and fault response

Authors: Nils Müller, Kaibin Bao, Jörg Matthes, Kai Heussen | Published: 2023-05-26

Undetectable Watermarks for Language Models

Authors: Miranda Christ, Sam Gunn, Or Zamir | Published: 2023-05-25

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Authors: Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen | Published: 2023-05-24 | Updated: 2023-10-23

Investigating Adversarial Vulnerability and Implicit Bias through Frequency Analysis

Authors: Lorenzo Basile, Nikos Karantzas, Alberto D'Onofrio, Luca Bortolussi, Alex Rodriguez, Fabio Anselmi | Published: 2023-05-24 | Updated: 2024-07-17