AI Security Portal K Program
UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing
Abstract
The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often generate inaccurate or incomplete tests because they were trained on code snippets collected without distinguishing test code from other code. In this paper, we present UniTSyn, a large-scale dataset designed to improve LLMs at Unit Test Synthesis. Associating tests with the functions they test is crucial for LLMs to infer the expected behavior and the logic paths to be verified. By leveraging the Language Server Protocol, UniTSyn collects focal-test pairs without per-project execution setups or per-language heuristics, which tend to be fragile and difficult to scale. It contains 2.7 million focal-test pairs across five mainstream programming languages, which can be used to improve the test generation ability of LLMs. The details of UniTSyn can be found in Table 1. Our experiments demonstrate that an autoregressive model built on UniTSyn learns unit test representations more effectively, resulting in improved generation accuracy and code coverage across all evaluated programming languages. Code and data will be publicly available.
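To make the idea of pairing a test with its focal function through the Language Server Protocol more concrete, here is a minimal Python sketch. It is not the UniTSyn implementation: it assumes the standard `textDocument/definition` request is used to resolve a call site inside a test to the function it calls. The helper names (`lsp_frame`, `definition_request`, `focal_location`) and the file URIs are hypothetical and chosen only for illustration. The message framing and the request and response shapes follow the LSP specification.

```python
import json


def lsp_frame(payload: dict) -> bytes:
    """Encode a JSON-RPC message using the LSP base-protocol framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a textDocument/definition request (positions are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


def focal_location(result):
    """Return (uri, start_line) of the first definition in a response, or None.

    The spec allows the result to be null, a Location, a list of Location,
    or a list of LocationLink; this hypothetical helper handles all four.
    """
    if not result:
        return None
    first = result[0] if isinstance(result, list) else result
    uri = first.get("uri") or first.get("targetUri")
    rng = first.get("range") or first.get("targetRange")
    return uri, rng["start"]["line"]


# Hypothetical call site: line 4, column 11 of a test file.
request = definition_request(1, "file:///repo/tests/test_math.py", 4, 11)
frame = lsp_frame(request)

# Hypothetical server reply pointing at the focal function's definition.
reply = [{
    "uri": "file:///repo/src/math.py",
    "range": {"start": {"line": 9, "character": 4},
              "end": {"line": 9, "character": 7}},
}]
pair = {"test": "file:///repo/tests/test_math.py", "focal": focal_location(reply)}
```

Because the same request works against any language server, a collector built this way would not need language-specific parsing rules, which is the scaling property the abstract describes.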