HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data

Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle

Published: 2024

OpenAI Technical Report

Language models are few-shot learners

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei

Published: 2020

CoRR

MultiPL-E: A scalable and extensible approach to benchmarking neural code generation

F. Cassano, J. Gouwar, D. Nguyen, S. Nguyen, L. Phipps-Costin, D. Pinckney, M.-H. Yee, Y. Zi, C. J. Anderson, M. Q. Feldman, A. Guha, M. Greenberg, A. Jangda

Published: 2022

arXiv

A survey of data synthesis approaches

Hsin-Yu Chang, Pei-Yu Chen, Tun-Hsiang Chou, Chang-Sheng Kao, Hsuan-Yun Yu, Yen-Ting Lin, Yun-Nung Chen

Published: 2024

Code Alpaca: An instruction-following LLaMA model for code generation

Sahil Chaudhary

Published: 2023

arXiv

Evaluating large language models trained on code

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harrison Edwards, Yura Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, David W. Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, Igor Babuschkin, S. Arun Balaji, Shantanu Jain, Andrew Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew M. Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

Published: 2021

IEEE Security & Privacy

Static analysis for security

Brian Chess, Gary McGraw

Published: 2004

Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), ACM

What developers want and need from program analysis: An empirical study

M. Christakis, C. Bird

Published: 2016

arXiv

DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model

DeepSeek-AI

Published: 2024

Proceedings of NAACL-HLT

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published: 2019

difflib - helpers for computing deltas

difflib

Published: 2023

GitHub Copilot is generally available to all developers

Thomas Dohmke

Published: 2022

arXiv

The Llama 3 herd of models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan

Published: 2024

CodeBERT: A pre-trained model for programming and natural languages

Zhangyin Feng, Daya Guo, Duyu Tang, et al.

Published: 2020

Proceedings of the 14th USENIX Conference on Offensive Technologies

AFL++: Combining incremental steps of fuzzing research

A. Fioraldi, D. Maier, H. Eißfeldt, M. Heuse

Published: 2020

CCS

LibAFL: A framework to build modular and reusable fuzzers

Andrea Fioraldi, Dominik Christian Maier, Dongjia Zhang, Davide Balzarotti

Published: 2022

The Eleventh International Conference on Learning Representations

InCoder: A generative model for code infilling and synthesis

Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Scott Yih, Luke Zettlemoyer, Mike Lewis

Published: 2023

ICML

Scaling laws for reward model overoptimization

Leo Gao, John Schulman, Jacob Hilton

Published: 2023

Proceedings of the National Academy of Sciences

ChatGPT outperforms crowd workers for text-annotation tasks

F. Gilardi, M. Alizadeh, M. Kubli

Published: 2023

ICLR

GraphCodeBERT: Pre-training code representations with data flow

Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou

Published: 2021

DeepSeek-Coder: When the large language model meets programming – the rise of code intelligence

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang

Published: 2024

arXiv

Conference on Secure and Trustworthy Machine Learning (SaTML)

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Hossein Hajipour, Keno Hassler, Thorsten Holz, Lea Schönherr, Mario Fritz

Published: 2023.2.8

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair programming, and tools such as GitHub Copilot have emerged as part of the daily programming workflow used by millions of developers. The training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure. While these models have been extensively assessed for their ability to produce functionally correct programs, there remains a lack of comprehensive investigations and benchmarks addressing the security aspects of these models. In this work, we propose a method to systematically study the security issues of code language models to assess their susceptibility to generating vulnerable code. To this end, we introduce the first approach to automatically find generated code that contains vulnerabilities in black-box code generation models. To achieve this, we present an approach to approximate inversion of the black-box code generation models based on few-shot prompting. We evaluate the effectiveness of our approach by examining code language models in generating high-risk security weaknesses. Furthermore, we establish a collection of diverse non-secure prompts for various vulnerability scenarios using our method. This dataset forms a benchmark for evaluating and comparing the security weaknesses in code language models.

Prompt injection, vulnerability analysis, code generation

NAACL Findings

SimSCOOD: Systematic analysis of out-of-distribution generalization in fine-tuned source code models

Hossein Hajipour, Ning Yu, Cristian-Alexandru Staicu, Mario Fritz

Published: 2024

IEEE Symposium on Security and Privacy Workshops

Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers

Sivana Hamer, Marcelo d’Amorim, Laurie Williams

Published: 2024

arXiv

Annual ACM Conference on Computer and Communications Security (CCS)

Large Language Models for Code: Security Hardening and Adversarial Testing

Jingxuan He, Martin Vechev

Published: 2023.2.11

Large language models (large LMs) are increasingly trained on massive codebases and used to generate code. However, LMs lack awareness of security and are found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both of these by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset carefully curated by us. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.

Prompt injection, vulnerability analysis, security assurance

arXiv preprint

Instruction tuning for secure code generation

J. He, M. Vero, G. Krasnopolska, M. Vechev

Published: 2024

ICML

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

Published: 2019

CoRR

LoRA: Low-rank adaptation of large language models

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, W. Chen

Find bugs and reachable dependency vulnerabilities in code.

Semgrep Inc.

Published: 2024

arXiv

How Secure is Code Generated by ChatGPT?

Raphaël Khoury, Anderson R. Avila, Jacob Brunelle, Baba Mamadou Camara

Published: 2023.4.19

In recent years, large language models have been responsible for great advances in the field of artificial intelligence (AI). ChatGPT in particular, an AI chatbot developed and recently released by OpenAI, has taken the field to the next level. The conversational model is able not only to process human-like text, but also to translate natural language into code. However, the safety of programs generated by ChatGPT should not be overlooked. In this paper, we perform an experiment to address this issue. Specifically, we ask ChatGPT to generate a number of programs and evaluate the security of the resulting source code. We further investigate whether ChatGPT can be prodded to improve the security by appropriate prompts, and discuss the ethical aspects of using AI to generate code. Results suggest that ChatGPT is aware of potential vulnerabilities, but nonetheless often generates source code that is not robust to certain attacks.

Security analysis, vulnerability prediction, program verification

StarCoder: May the source be with you!

Raymond Li, Loubna Ben Allal, Yangtian Zi, et al.

Published: 2023

Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

An empirical study on the effectiveness of static C code analyzers for vulnerability detection

Stephan Lipp, Sebastian Banescu, Alexander Pretschner

Published: 2022

arXiv

Best practices and lessons learned on synthetic data for language models

Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou

Published: 2024

Communications of the ACM

In defense of soundness: a manifesto

Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, Dimitrios Vardoulakis

Published: 2015

arXiv

On LLMs-driven synthetic data generation, curation, and evaluation: A survey

Lin Long, Rui Wang, Ruixuan Xiao, Junbo Zhao, Xiao Ding, Gang Chen, Haobo Wang

Published: 2024

StarCoder 2 and The Stack v2: The next generation

Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Munoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Published: 2024

arXiv

WizardMath: Empowering mathematical reasoning for large language models via reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jian-guang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

Published: 2023

WizardCoder: Empowering code large language models with Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

Published: 2023

Proceedings of the 40th International Conference on Machine Learning

Tuning language models as training data generators for augmentation-enhanced few-shot learning

Y. Meng, M. Michalski, J. Huang, Y. Zhang, T. Abdelzaher, J. Han

Published: 2023

CWE - Common Weakness Enumeration

MITRE

Published: 2022

CodeGen: An open large language model for code with multi-turn program synthesis

Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

Published: 2022

ChatGPT: Optimizing language models for dialogue

OpenAI

Published: 2022

GPT-4 technical report

OpenAI

Published: 2023

ACM Computing Surveys (CSUR)

Dynamic malware analysis in the modern era—a state of the art survey

O. Or-Meir, N. Nissim, Y. Elovici, L. Rokach

Published: 2019

2022 IEEE Symposium on Security and Privacy (SP)

Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri

Published: 2022

OpenAI blog

Language models are unsupervised multitask learners

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever

Published: 2019

Code Llama: Open foundation models for code

Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, et al.

Published: 2023

arXiv

Security implications of large language model code assistants: A user study

Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Brendan Dolan-Gavitt, Siddharth Garg

Published: 2022

IEEE Symposium on Security and Privacy

SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis

Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, Giovanni Vigna

Published: 2016

MSR4P&S

SecurityEval dataset: Mining vulnerability examples to evaluate machine learning-based code generation techniques

Mohammed Latif Siddiq, Joanna CS Santos

Published: 2022

IEEE Symposium on Security and Privacy

SoK: Eternal War in Memory

László Szekeres, Mathias Payer, Tao Wei, Dawn Song

Published: 2013

PROMISE

The FormAI dataset: Generative AI in software security through the lens of formal verification

Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C Cordeiro, Vasileios Mavroeidis