Machine Unlearning in Large Language Models

IEEE Trans. Dependable Secur. Comput.

Stealthy and flexible trojan in deep learning framework

Y. Wang, K. Chen, Y. Tan, S. Huang, W. Ma, Y. Li

Published: 2023

Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models

Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen

Published: 2023

ISA Transactions

Backdoor attacks against distributed swarm learning

K. Chen, H. Zhang, X. Feng, X. Zhang, B. Mi, Z. Jin

Published: 2023

arXiv

Quantifying privacy risks of masked language models using membership inference attacks

Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri

Published: 2022

Science China Information Sciences

Model architecture level privacy leakage in neural networks

Y. Li, H. Yan, T. Huang, Z. Pan, J. Lai, X. Zhang, K. Chen, J. Li

Published: 2024

Pattern Recognition

Revisiting the transferability of adversarial examples via source-agnostic adversarial feature inducing method

Y. Xiao, J. Zhou, K. Chen, Z. Liu

Published: 2023

arXiv

Cited by 1

Computing Research Repository (CoRR)

Adversarial Demonstration Attacks on Large Language Models

Jiongxiao Wang, Zichen Liu, Keun Hee Park, Zhuojun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, Chaowei Xiao

Published: 5.24.2023

With the emergence of more powerful large language models (LLMs), such as ChatGPT and GPT-4, in-context learning (ICL) has gained significant prominence in leveraging these models for specific tasks by utilizing data-label pairs as precondition prompts. While incorporating demonstrations can greatly enhance the performance of LLMs across various tasks, it may introduce a new security concern: attackers can manipulate only the demonstrations without changing the input to perform an attack. In this paper, we investigate the security concern of ICL from an adversarial perspective, focusing on the impact of demonstrations. We propose a novel attack method named advICL, which aims to manipulate only the demonstration without changing the input to mislead the models. Our results demonstrate that as the number of demonstrations increases, the robustness of in-context learning decreases. Additionally, we identify an intrinsic property of demonstrations: they can be used (prepended) with different inputs. As a result, this introduces a more practical threat model in which an attacker can attack a test input example even without knowing or manipulating it. To achieve this, we propose a transferable version of advICL, named Transferable-advICL. Our experiments show that the adversarial demonstrations generated by Transferable-advICL can successfully attack unseen test input examples. We hope that our study reveals the critical security risks associated with ICL and underscores the need for extensive research on the robustness of ICL, particularly given its increasing significance in the advancement of LLMs.

Adversarial attack Adversarial Example Malicious Demo Construction

Comput. Stand. Interfaces

An efficient adversarial example generation algorithm based on an accelerated gradient iterative fast gradient

J. Liu, Q. Zhang, K. Mo, X. Xiang, J. Li, D. Cheng, R. Gao, B. Liu, K. Chen, G. Wei

Published: 2022

Internet of Things and Cyber-Physical Systems

ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope

P. P. Ray

SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models

P. Manakul, A. Liusie, M. J. Gales

arXiv

Cited by 3

Conference on Neural Information Processing Systems (NeurIPS)

ProPILE: Probing Privacy Leakage in Large Language Models

Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, Seong Joon Oh

Published: 7.5.2023

The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood that their PII, if included in the Pile dataset, is revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards giving data subjects awareness of and control over their own data on the web.

Data Leakage Privacy Violation Prompting Strategy

arXiv

Cited by 12

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu

Published: 6.9.2023

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

Prompt Injection Malicious Prompt

A comprehensive overview of backdoor attacks in large language models within communication networks

H. Yang, K. Xiang, H. Li, R. Lu

A survey of hallucination in large foundation models

V. Rawte, A. Sheth, A. Das

Nature

Quantum computational advantage with a programmable photonic processor

L. S. Madsen, F. Laudenbach, M. F. Askarani, F. Rortais, T. Vincent, J. F. Bulmer, F. M. Miatto, L. Neuhaus, L. G. Helt, M. J. Collins

Published: 2022

Addressing documentation debt in machine learning research: A retrospective datasheet for BookCorpus

J. Bandy, N. Vincent

Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

Data governance in the age of large-scale data-driven language technology

Y. Jernite, H. Nguyen, S. Biderman, A. Rogers, M. Masoud, V. Danchev, S. Tan, A. S. Luccioni, N. Subramani, I. Johnson

Published: 2022

Pazopanib in locally advanced or metastatic renal cell carcinoma: results of a randomized phase III trial

C. N. Sternberg, I. D. Davis, J. Mardiak, C. Szczylik, E. Lee, J. Wagstaff, C. H. Barrios, P. Salman, O. A. Gladkov, A. Kavina

Cultural adaptation of recipes

Y. Cao, Y. Kementchedjhieva, R. Cui, A. Karamolegkou, L. Zhou, M. Dare, L. Donatelli, D. Hershcovich

Indiscriminate poisoning attacks on unsupervised contrastive learning

H. He, K. Zha, D. Katabi

arXiv

Cited by 1

ECCV Workshops

Deep k-NN Defense against Clean-label Data Poisoning Attacks

Neehar Peri, Neal Gupta, W. Ronny Huang, Liam Fowl, Chen Zhu, Soheil Feizi, Tom Goldstein, John P. Dickerson

Published: 9.30.2019

Targeted clean-label data poisoning is a type of adversarial attack on machine learning systems in which an adversary injects a few correctly-labeled, minimally-perturbed samples into the training data, causing a model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks, no reliable defense for clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic applications. In this work, we propose a simple, yet highly-effective Deep k-NN defense against both feature collision and convex polytope clean-label attacks on the CIFAR-10 dataset. We demonstrate that our proposed strategy is able to detect over 99% of poisoned examples in both attacks and remove them without compromising model performance. Additionally, through ablation studies, we discover simple guidelines for selecting the value of k as well as for implementing the Deep k-NN defense on real-world datasets with class imbalance. Our proposed defense shows that current clean-label poisoning attack strategies can be annulled, and serves as a strong yet simple-to-implement baseline defense to test future clean-label poisoning attacks. Our code is available at https://github.com/neeharperi/DeepKNNDefense

Detection of Poisonous Data Performance Evaluation Backdoor Attack

Policy Studies Journal

Institutional analysis with the institutional grammar

S. Siddiki, T. Heikkila, C. M. Weible, R. Pacheco-Vega, D. Carter, C. Curley, A. Deslatte, A. Bennett

Published: 2022

Journal of Consumer Policy

Does the GDPR enhance consumers’ control over personal data? An analysis from a behavioural perspective

I. Van Ooijen, H. U. Vrabec

Published: 2019

Brook. J. Corp. Fin. & Com. L.

The commodification of personal data and the road to consumer autonomy through the CCPA

B. Rose

Published: 2020

The Astrophysical Journal

FAST observations of FRB 20220912A: burst properties and polarization characteristics

Y.-K. Zhang, D. Li, B. Zhang, S. Cao, Y. Feng, W.-Y. Wang, Y. Qu, J.-R. Niu, W.-W. Zhu, J.-L. Han

Published: 2023

Partners Universal International Innovation Journal

Extending detection and response: How MXDR evolves cybersecurity

A. S. George, S. Sagayarajan, T. Baskar, A. H. George

Published: 2023

A survey of machine unlearning

T. T. Nguyen, T. T. Huynh, P. L. Nguyen, A. W.-C. Liew, H. Yin, Q. V. H. Nguyen

Annals of Telecommunications

Privacy preserving machine unlearning for smart cities

K. Chen, Y. Huang, Y. Wang, X. Zhang, B. Mi, Y. Wang

Published: 2024

Electronics

Fast and accurate SNN model strengthening for industrial applications

D. Zhou, W. Chen, K. Chen, B. Mi

Published: 2023

MetaRadiology

Summary of ChatGPT-related research and perspective towards the future of large language models

Y. Liu, T. Han, S. Ma, J. Zhang, Y. Yang, J. Tian, H. He, A. Li, M. He, Z. Liu

Published: 2023

arXiv preprint

Large language model unlearning

Y. Yao, X. Xu, Y. Liu

Published: 2023

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

NTIRE 2023 challenge on image super-resolution (x4): Methods and results

Y. Zhang, K. Zhang, Z. Chen, Y. Li, R. Timofte, J. Zhang, K. Zhang, R. Peng, Y. Ma, L. Jia

Published: 2023

Reward model DeBERTa v3 large v2

OpenAssistant

Published: 2021

hallucination evaluation model

vectara

Published: 2023

On the exploitability of reinforcement learning with human feedback for large language models

J. Wang, J. Wu, M. Chen, Y. Vorobeychik, C. Xiao

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

HaluEval: A large-scale hallucination evaluation benchmark for large language models

J. Li, X. Cheng, W. X. Zhao, J.-Y. Nie, J.-R. Wen

Published: 2023