Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2507.01752

PDF

https://arxiv.org/pdf/2507.01752

Paper Information

Author: Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud
Published: 2025-07-02
Affiliation: Meta FAIR
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Differential Privacy, Privacy Assurance, RAG

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, its reliance on large volumes of labeled data raises privacy and security concerns such as susceptibility to data poisoning attacks and the risk of overfitting. In contrast, black box optimization methods, which treat the model as an opaque function, relying solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. However, black box methods also pose significant challenges, including poor scalability to high-dimensional parameter spaces, as prevalent in large language models (LLMs), and high computational costs due to reliance on numerous model evaluations. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide strong theoretical bounds on generalization, differential privacy, susceptibility to data poisoning attacks, and robustness to extraction attacks. BBoxER operates on top of pre-trained LLMs, offering a lightweight and modular enhancement suitable for deployment in restricted or privacy-sensitive environments, in addition to non-vacuous generalization guarantees. In experiments with LLMs, we demonstrate empirically that Retrofitting methods are able to learn, showing how a few iterations of BBoxER improve performance and generalize well on a benchmark of reasoning datasets. This positions BBoxER as an attractive add-on on top of gradient-based optimization.
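
To make the "information bottleneck" idea in the abstract more concrete, here is a minimal sketch of a generic (1+1) evolution strategy on a toy problem. It is not BBoxER itself and does not reproduce the paper's algorithms or bounds; the function names (`one_plus_one_es`, `replay`, `toy_loss`), the step size `sigma`, and the quadratic loss are all hypothetical choices made for illustration. The point it shows is that the optimizer only ever learns one accept/reject bit per iteration from the loss, so the final parameters can be rebuilt from the seed and those bits alone.

```python
import random


def one_plus_one_es(loss, x0, budget, sigma=0.1, seed=0):
    """Generic (1+1) evolution strategy; `loss` is queried as a black box."""
    rng = random.Random(seed)
    parent = list(x0)
    parent_loss = loss(parent)
    bits = []  # one accept/reject bit per iteration
    for _ in range(budget):
        # Mutate the parent with Gaussian noise (no gradients involved).
        child = [p + sigma * rng.gauss(0.0, 1.0) for p in parent]
        child_loss = loss(child)
        accepted = child_loss < parent_loss
        bits.append(int(accepted))
        if accepted:
            parent, parent_loss = child, child_loss
    return parent, parent_loss, bits


def replay(x0, bits, sigma=0.1, seed=0):
    """Rebuild the final parameters from the seed and the bits, without the loss."""
    rng = random.Random(seed)
    parent = list(x0)
    for bit in bits:
        child = [p + sigma * rng.gauss(0.0, 1.0) for p in parent]
        if bit:
            parent = child
    return parent


def toy_loss(x):
    # Hypothetical stand-in for "evaluate the model on the training data".
    return sum((xi - 1.0) ** 2 for xi in x)


start = [0.0, 0.0, 0.0]
best, best_loss, bits = one_plus_one_es(toy_loss, start, budget=200)
rebuilt = replay(start, bits)
print(f"final loss: {best_loss:.4f}, bits used: {len(bits)}")
```

In this sketch the data influences the result only through the `bits` list, so a run with a budget of 200 evaluations passes at most 200 bits from the data to the parameters. That is the kind of tractable information flow the abstract refers to, shown here under simplified assumptions.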

External Datasets

GSM8K

MATH500

HellaSwag

GSM+

ARC Easy

ARC Challenge

AMC23

SVAMP