Abstract
Binary analysis plays a pivotal role in security domains such as malware
detection and vulnerability discovery, yet it remains labor-intensive and
heavily reliant on expert knowledge. General-purpose large language models
(LLMs) perform well at analyzing source code, whereas binary-specific LLMs
remain underexplored. In this work, we present ReCopilot, an
expert LLM designed for binary analysis tasks. ReCopilot acquires binary code
knowledge from a carefully constructed dataset that spans three training
stages: continued pretraining (CPT), supervised fine-tuning (SFT), and direct
preference optimization (DPO). It leverages variable data flow and call graphs
to enhance context awareness, and employs test-time scaling to improve its
reasoning capabilities. Evaluations on a comprehensive binary analysis benchmark
demonstrate that ReCopilot achieves state-of-the-art performance on tasks such
as function name recovery and variable type inference over decompiled
pseudocode, outperforming existing tools and LLMs by 13%. Our findings highlight
the effectiveness of domain-specific training and context enhancement, while
also revealing the challenges of building extremely long chains of thought.
ReCopilot represents a significant step toward automating binary analysis with
interpretable and scalable AI assistance.