AI Security Portal K Program
Exploiting Code Symmetries for Learning Program Semantics
Abstract
This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations; because these transformations form a code symmetry group, the framework enables precise and efficient reasoning about code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training. Our results suggest that code LLMs that encode the structural prior of code via the code symmetry group generalize better and faster.
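The equivariance property mentioned in the abstract can be illustrated with a small sketch. This is not SymC's implementation: it assumes a simplified single-head attention whose scores are shifted by the adjacency matrix of a toy dependence graph, and every name in it (`graph_biased_attention`, `equivariance_gap`, the four-statement program, the weight values) is hypothetical. It checks numerically that reordering statements by a permutation that preserves the graph (an automorphism) reorders the output rows in the same way, while a permutation that breaks the dependence structure does not.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def graph_biased_attention(x, adj, wq, wk, wv, bias=1.0):
    """Single-head self-attention with scores shifted by a graph adjacency matrix.

    Assumption: this additive bias is a stand-in for a graph-aware attention
    layer; it is not the attention variant defined in the paper.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [
        softmax([s / scale + bias * adj[i][j] for j, s in enumerate(row)])
        for i, row in enumerate(scores)
    ]
    return matmul(weights, v)

def permute_rows(m, perm):
    """Reorder rows so that new row i is old row perm[i]."""
    return [m[p] for p in perm]

def is_automorphism(adj, perm):
    """True if relabeling nodes by perm leaves the adjacency matrix unchanged."""
    n = len(adj)
    return all(adj[perm[i]][perm[j]] == adj[i][j] for i in range(n) for j in range(n))

def equivariance_gap(x, adj, perm, params):
    """Largest entrywise gap between f(P x) and P f(x) for the layer above."""
    wq, wk, wv = params
    lhs = graph_biased_attention(permute_rows(x, perm), adj, wq, wk, wv)
    rhs = permute_rows(graph_biased_attention(x, adj, wq, wk, wv), perm)
    return max(abs(a - b) for r1, r2 in zip(lhs, rhs) for a, b in zip(r1, r2))

# Hypothetical toy program: s0: a = x + 1; s1: b = y * 2; s2: c = a + b; s3: return c
# ADJ[i][j] == 1 means statement i depends on statement j.
ADJ = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
# Arbitrary statement embeddings and projection weights, chosen for illustration.
X = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 1.0], [-1.0, 0.3, 0.2]]
PARAMS = (
    [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]],   # query projection
    [[0.1, 0.3], [-0.2, 0.5], [0.7, -0.4]],   # key projection
    [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]],    # value projection
)

SWAP_INDEPENDENT = [1, 0, 2, 3]  # swaps s0 and s1, which do not depend on each other
SWAP_DEPENDENT = [2, 1, 0, 3]    # swaps s0 and s2, which breaks the dependence structure

for name, perm in [("independent", SWAP_INDEPENDENT), ("dependent", SWAP_DEPENDENT)]:
    print(name, is_automorphism(ADJ, perm), equivariance_gap(X, ADJ, perm, PARAMS))
```

In this sketch the gap for the automorphism is zero up to floating-point rounding, because the bias term is unchanged by the relabeling and the remaining computation is ordinary permutation-equivariant attention. The second permutation changes which statements the bias connects, so the two sides no longer agree.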