UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing

Evaluating large language models trained on code

M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, W. Zaremba

Published: 2021

2018 IEEE Symposium on Security and Privacy (SP)

Angora: Efficient fuzzing by principled search

P. Chen, H. Chen

Published: 2018

Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security

Matryoshka: Fuzzing deeply nested branches

P. Chen, J. Liu, H. Chen

Published: 2019

SIGPLAN Notices

Coverage-directed differential testing of JVM implementations

Y. Chen, T. Su, C. Sun, Z. Su, J. Zhao

Published: 2016

Academic Press Ltd.

Chapter I: Notes on Structured Programming

E. W. Dijkstra

Published: 1972

Proceedings of the 44th International Conference on Software Engineering

Toga: A neural method for test oracle generation

E. Dinella, G. Ryan, T. Mytkowicz, S. K. Lahiri

Published: 2022

CodeBERT: A pre-trained model for programming and natural languages

Zhangyin Feng, Daya Guo, Duyu Tang, et al.

Published: 2020

Proceedings of the 14th USENIX Conference on Offensive Technologies

AFL++: Combining incremental steps of fuzzing research

A. Fioraldi, D. Maier, H. Eißfeldt, M. Heuse

Published: 2020

The Eleventh International Conference on Learning Representations

InCoder: A generative model for code infilling and synthesis

Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Scott Yih, Luke Zettlemoyer, Mike Lewis

Published: 2023

Springer

Language Server Protocol and Implementation

N. Gunasinghe, N. Marcus

Published: 2021

ACL

UniXcoder: Unified cross-modal pre-training for code representation

D. Guo, S. Lu, N. Duan, Y. Wang, M. Zhou, J. Yin

Published: 2022

Code representation pre-training with complements from program executions

J. Huang, J. Zhao, Y. Rong, Y. Guo, Y. He, H. Chen

Published: 2023

arXiv preprint

CodeSearchNet challenge: Evaluating the state of semantic code search

Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, Marc Brockschmidt

Published: 2019

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics

Summarizing source code using a neural attention model

S. Iyer, I. Konstas, A. Cheung, L. Zettlemoyer

Published: 2016

Adam: A method for stochastic optimization

Diederik P. Kingma, Jimmy Ba

Published: 2014

Proceedings of the National Academy of Sciences (PNAS)

Overcoming catastrophic forgetting in neural networks

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al.

Published: 2017

A neural model for generating natural language summaries of program subroutines

A. LeClair, S. Jiang, C. McMillan

Published: 2019

StarCoder: may the source be with you!

Raymond Li, Loubna Ben Allal, Yangtian Zi, et al.

Published: 2023

Science

Competition-level code generation with AlphaCode

Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al.

Published: 2022

CoRR

SGDR: stochastic gradient descent with restarts

I. Loshchilov, F. Hutter

Published: 2016

CoRR

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

Published: 2021

WizardCoder: Empowering code large language models with evol-instruct

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

Published: 2023

Language server protocol

Microsoft

Making Software

Evidence-based failure prediction

N. Nagappan, T. Ball

Published: 2010

Learning deep semantics for test completion

P. Nie, R. Banerjee, J. J. Li, R. J. Mooney, M. Gligoric

Published: 2023

CodeGen2: Lessons for training LLMs on programming and natural languages

E. Nijkamp, H. Hayashi, C. Xiong, S. Savarese, Y. Zhou

Published: 2023

CodeGen: An open large language model for code with multi-turn program synthesis

Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

Published: 2022

CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks

R. Puri, D. S. Kung, G. Janssen, W. Zhang, G. Domeniconi, V. Zolotov, J. Dolby, J. Chen, M. Choudhury, L. Decker, V. Thost, L. Buratti, S. Pujar, S. Ramji, U. Finkler, S. Malaika, F. Reiss

Published: 2021

OpenAI blog

Language models are unsupervised multitask learners

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever

Published: 2019

CAT-LM: Training language models on aligned code and tests

N. Rao, K. Jain, U. Alon, C. L. Goues, V. J. Hellendoorn

Published: 2023

The specification language server protocol: A proposal for standardised lsp extensions

J. K. Rask, F. P. Madsen, N. Battle, H. D. Macedo, P. G. Larsen

Published: 2021

International Conference on Security and Privacy in Communication Systems