Binary code analysis plays a pivotal role in the field of software security
and is widely used in tasks such as software maintenance, malware detection,
software vulnerability discovery, and patch analysis. However, unlike with source code, reverse engineers face significant challenges in understanding binary code due to its lack of intuitive semantic information. Although traditional reverse engineering tools can decompile binary code into C-like pseudo code, the absence of code comments and symbolic information such as function names still makes code comprehension difficult. In recent years, two lines of work have shown promise: (1) deep learning-based techniques have achieved competitive results on binary code understanding tasks, and (2) Large Language Models (LLMs) have been extensively pre-trained on source code for tasks such as code understanding and generation. This naturally raises the question of how well LLMs can perform in binary code
understanding. To this end, this work proposes a benchmark to evaluate the
effectiveness of LLMs in real-world reverse engineering scenarios, which covers
two key binary code understanding tasks, i.e., function name recovery and
binary code summarization. For a more comprehensive evaluation, our benchmark includes binaries built for multiple target architectures and with different optimization levels. Through extensive empirical studies of popular LLMs on this benchmark, we gain valuable insights into their capabilities and limitations. Our
evaluations reveal that existing LLMs can understand binary code to a certain
extent, thereby improving the efficiency of binary code analysis. Our results highlight the great potential of LLMs in advancing binary code understanding and suggest new directions for binary code analysis techniques.