Detecting Functional Memorization in Code Language Models

Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi

Published: 2024

Advances in Neural Information Processing Systems (NeurIPS)

What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

V. Feldman, C. Zhang

Published: 2020

CodeBERT: A pre-trained model for programming and natural languages

Zhangyin Feng, Daya Guo, Duyu Tang, et al.

Published: 2020

Updates to GitHub Copilot interaction data usage policy

Published: 2026

The Llama 3 herd of models

LLaMa-Team

Published: 2024

The Thirty-ninth Annual Conference on Neural Information Processing Systems

Exploring the limits of strong membership inference attacks on large language models

Jamie Hayes, Ilia Shumailov, Christopher A Choquette-Choo, Matthew Jagielski, Georgios Kaissis, Milad Nasr, Meenatchi Sundaram Muthu Selva Annamalai, Niloofar Mireshghallah, Igor Shilov, Matthieu Meeus

Published: 2025

The Thirteenth International Conference on Learning Representations

Measuring memorization in RLHF for code completion

Jamie Hayes, Ilia Shumailov, William P Porter, Aneesh Pappu

Published: 2025

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Measuring memorization in language models via probabilistic extraction

Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A Choquette-Choo, Katherine Lee, A Feder Cooper

Published: 2025

Proceedings of the ACM on Software Engineering

Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials

Yizhan Huang, Yichen Li, Weibin Wu, Jianping Zhang, Michael R Lyu

Published: 2024

Preventing verbatim memorization in language models gives a false sense of privacy

Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A Choquette-Choo, Nicholas Carlini

Published: 2022

International Conference on Machine Learning

Deduplicating training data mitigates privacy risks in language models

Nikhil Kandpal, Eric Wallace, Colin Raffel

Published: 2022

Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering

An exploratory investigation into code license infringements in large language model training datasets

Jonathan Katzy, Razvan Popescu, Arie Van Deursen, Maliheh Izadi

Published: 2024

Science

Competition-level code generation with AlphaCode

Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al.

Published: 2022

HyClone: Bridging LLM understanding and dynamic execution for semantic code clone detection

Yunhao Liang, Ruixuan Ying, Takuya Taniguchi, Guwen Lyu, Zhe Cui

Published: 2025

arXiv

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot

Published: 2025.3.22

An important question today is whether a given text was used to train a large language model (LLM). A completion test is often employed: check if the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, it is defined as a member based on the n-gram overlap between the target text and any text in the dataset. In this work, we demonstrate that this n-gram based membership definition can be effectively gamed. We study scenarios where sequences are non-members for a given n and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps. They showcase that it is difficult to find a single viable choice of n for membership definitions. Using these insights, we design adversarial datasets that can cause a given target sequence to be completed without containing it, for any reasonable choice of n. Our findings highlight the inadequacy of n-gram membership, suggesting membership definitions fail to account for auxiliary information available to the training algorithm.
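
The n-gram membership definition described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and treats a target as a member when it shares at least one n-gram with any training document; the function names `ngrams` and `is_ngram_member` and the toy corpus are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_ngram_member(target: str, corpus: Iterable[str], n: int) -> bool:
    """Assumed definition: member if any n-gram of the target occurs in any document."""
    target_grams = ngrams(target.split(), n)
    return any(target_grams & ngrams(doc.split(), n) for doc in corpus)


# Toy data (hypothetical): the target is split across two training documents.
corpus = ["the quick brown fox jumps", "over the lazy dog"]
target = "the quick brown fox jumps over the lazy dog"

member_at_3 = is_ngram_member(target, corpus, 3)  # short overlaps exist
member_at_6 = is_ngram_member(target, corpus, 6)  # no document holds a 6-gram
```

With this toy corpus the same target counts as a member for n = 3 and as a non-member for n = 6, which mirrors the abstract's point that the verdict depends on the choice of n.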

RAG, membership disclosure risk, adversarial attacks

Anthropic users face a new choice: Opt out or share your data for AI training

Connie Loizos

Published: 2025

StarCoder 2 and The Stack v2: The next generation

Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Munoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Published: 2024

arXiv

Annual Meeting of the Association for Computational Linguistics (ACL)

Membership Inference Attacks against Language Models via Neighbourhood Comparison

Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick

Published: 2023.5.29

Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not, and are widely used for assessing the privacy risks of language models. Most existing attacks rely on the observation that models tend to assign higher probabilities to their training samples than non-training points. However, simple thresholding of the model score in isolation tends to lead to high false-positive rates as it does not account for the intrinsic complexity of a sample. Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs. However, in order to train reference models, attacks of this kind make the strong and arguably unrealistic assumption that an adversary has access to samples closely resembling the original training data. Therefore, we investigate their performance in more realistic scenarios and find that they are highly fragile in relation to the data distribution used to train reference models. To investigate whether this fragility provides a layer of safety, we propose and evaluate neighbourhood attacks, which compare model scores for a given sample to scores of synthetically generated neighbour texts and therefore eliminate the need for access to the training data distribution. We show that, in addition to being competitive with reference-based attacks that have perfect knowledge about the training data distribution, our attack clearly outperforms existing reference-free attacks as well as reference-based attacks with imperfect knowledge, which demonstrates the need for a reevaluation of the threat model of adversarial attacks.
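
The neighbourhood comparison described in this abstract can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: the model score is taken to be a loss (lower means more likely), neighbours are supplied by the caller rather than generated, and the zero threshold and the names `neighbourhood_score`, `predict_member`, and `toy_losses` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def neighbourhood_score(sample: str,
                        neighbours: Sequence[str],
                        loss_fn: Callable[[str], float]) -> float:
    """Loss of the sample minus the mean loss of its neighbour texts."""
    if not neighbours:
        raise ValueError("at least one neighbour text is required")
    return loss_fn(sample) - mean(loss_fn(text) for text in neighbours)


def predict_member(sample: str,
                   neighbours: Sequence[str],
                   loss_fn: Callable[[str], float],
                   threshold: float = 0.0) -> bool:
    """Assumed decision rule: flag as member when the score falls below the threshold."""
    return neighbourhood_score(sample, neighbours, loss_fn) < threshold


# Toy stand-in for a model's per-text loss (hypothetical numbers).
toy_losses = {
    "the cat sat on the mat": 1.0,
    "the cat sat on a mat": 2.5,
    "the cat lay on the mat": 3.5,
}
sample = "the cat sat on the mat"
neighbours = ["the cat sat on a mat", "the cat lay on the mat"]

score = neighbourhood_score(sample, neighbours, toy_losses.__getitem__)
flagged = predict_member(sample, neighbours, toy_losses.__getitem__)
```

Comparing against neighbours rather than a fixed loss cut-off is what lets the score account for the intrinsic complexity of a sample, which the abstract names as the weakness of simple thresholding.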

Privacy-preserving methods, defense methods, LLM performance evaluation

Findings of the Association for Computational Linguistics: NAACL 2025

What can large language models capture about code functional equivalence?

Nickil Maveli, Antonio Vergari, Shay B Cohen

Published: 2025

33rd USENIX Security Symposium (USENIX Security 24)

Did the neurons read your book? document-level membership inference for large language models

Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye

Published: 2024

2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

SoK: Membership inference attacks on LLMs are rushing nowhere (and how to fix it)

Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, Yves-Alexandre de Montjoye

Published: 2025

How much do language models memorize?

John X Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G Edward Suh, Alexander M Rush, Kamalika Chaudhuri, Saeed Mahloujifar

Published: 2025

The Thirteenth International Conference on Learning Representations

Scalable extraction of training data from aligned, production language models

Milad Nasr, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A Feder Cooper, Daphne Ippolito, Christopher A Choquette-Choo, Florian Tramèr, Katherine Lee

Published: 2025

arXiv

International Conference on Software Engineering (ICSE)

Decoding Secret Memorization in Code LLMs Through Token-Level Characterization

Yuqing Nie, Chong Wang, Kailong Wang, Guoai Xu, Guosheng Xu, Haoyu Wang

Published: 2024.10.11

Code Large Language Models (LLMs) have demonstrated remarkable capabilities in generating, understanding, and manipulating programming code. However, their training process inadvertently leads to the memorization of sensitive information, posing severe privacy risks. Existing studies on memorization in LLMs primarily rely on prompt engineering techniques, which suffer from limitations such as widespread hallucination and inefficient extraction of the target sensitive information. In this paper, we present a novel approach to characterize real and fake secrets generated by Code LLMs based on token probabilities. We identify four key characteristics that differentiate genuine secrets from hallucinated ones, providing insights into distinguishing real and fake secrets. To overcome the limitations of existing works, we propose DESEC, a two-stage method that leverages token-level features derived from the identified characteristics to guide the token decoding process. DESEC consists of constructing an offline token scoring model using a proxy Code LLM and employing the scoring model to guide the decoding process by reassigning token likelihoods. Through extensive experiments on four state-of-the-art Code LLMs using a diverse dataset, we demonstrate the superior performance of DESEC in achieving a higher plausible rate and extracting more real secrets compared to existing baselines. Our findings highlight the effectiveness of our token-level approach in enabling an extensive assessment of the privacy leakage risks associated with Code LLMs.