Abstract
This paper tackles the challenge of teaching code semantics to Large Language
Models (LLMs) for program analysis by incorporating code symmetries into the
model architecture. We introduce a group-theoretic framework that defines code
symmetries as semantics-preserving transformations; when these transformations
form a code symmetry group, they enable precise and efficient reasoning about
code semantics. Our solution, SymC, develops a novel variant of self-attention
that is provably equivariant to code symmetries from the permutation group
defined over the program dependence graph. SymC achieves superior performance
on five program analysis tasks, outperforming state-of-the-art code models
without any pre-training. Our results suggest that code LLMs that encode the
structural prior of code via its symmetry group generalize better and faster.
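To make the equivariance claim concrete, the following minimal sketch (our own
illustration, not code from the paper; the adjacency-mask construction, the
weight matrices Wq/Wk/Wv, and the NumPy implementation are all assumptions)
shows a self-attention layer whose scores are masked by a dependence-graph
adjacency matrix A, and verifies the permutation-equivariance property
f(PX, PAPᵀ) = P·f(X, A) for a random permutation matrix P:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_masked_attention(X, A, Wq, Wk, Wv):
    """Self-attention restricted to the edges of a dependence graph.

    X: (n, d) node embeddings; A: (n, n) adjacency (nonzero = dependence edge).
    Scores of non-edges are set to -inf, so attention flows only along A.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(A > 0, scores, -np.inf)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.5).astype(float)
np.fill_diagonal(A, 1)  # self-edges keep every attention row non-empty
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

P = np.eye(n)[rng.permutation(n)]  # random permutation matrix

out = graph_masked_attention(X, A, Wq, Wk, Wv)
out_perm = graph_masked_attention(P @ X, P @ A @ P.T, Wq, Wk, Wv)

# Equivariance check: permuting inputs and graph permutes the output identically.
assert np.allclose(out_perm, P @ out)
```

Because attention is confined to dependence edges, the layer's output depends
only on the dependence structure: any relabeling of the nodes that preserves
the graph changes the output only by the same relabeling, which is the
equivariance property the abstract attributes to SymC's attention variant.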