Abstract
This paper tackles the challenge of teaching code semantics to Large Language
Models (LLMs) for program analysis by incorporating code symmetries into the
model architecture. We introduce a group-theoretic framework that defines code
symmetries as semantics-preserving transformations; when these transformations
form a code symmetry group, they enable precise and efficient reasoning about
code semantics. Our solution, SymC, develops a novel variant of self-attention
that is provably equivariant to code symmetries from the permutation group
defined over the program dependence graph. SymC achieves superior performance
on five program analysis tasks, outperforming state-of-the-art code models
without any pre-training. Our results suggest that code LLMs that encode the
structural prior of code via its symmetry group generalize better and faster.
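To make the equivariance claim concrete, the following minimal sketch (our own
illustration, not code from the paper; the adjacency-mask construction, the
weight matrices Wq/Wk/Wv, and the NumPy implementation are all assumptions)
shows a self-attention layer whose scores are masked by a dependence-graph
adjacency matrix A, and verifies the permutation-equivariance property
f(PX, PAPᵀ) = P·f(X, A) for a random permutation matrix P:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_masked_attention(X, A, Wq, Wk, Wv):
    """Self-attention restricted to the edges of a dependence graph.

    X: (n, d) node embeddings; A: (n, n) adjacency (nonzero = dependence edge).
    Scores of non-edges are set to -inf, so attention flows only along A.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(A > 0, scores, -np.inf)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.5).astype(float)
np.fill_diagonal(A, 1)  # self-edges keep every attention row non-empty
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

P = np.eye(n)[rng.permutation(n)]  # random permutation matrix

out = graph_masked_attention(X, A, Wq, Wk, Wv)
out_perm = graph_masked_attention(P @ X, P @ A @ P.T, Wq, Wk, Wv)

# Equivariance check: permuting inputs and graph permutes the output identically.
assert np.allclose(out_perm, P @ out)
```

Because attention is confined to dependence edges, the layer's output depends
only on the dependence structure: any relabeling of the nodes that preserves
the graph changes the output only by the same relabeling, which is the
equivariance property the abstract attributes to SymC's attention variant.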