MUST Scholar Publishes Breakthrough Research at Top-Tier International Software Engineering Conference, Revealing the Impact of Compiler Optimization on Binary Diffing Tools

2025/07/25

Assistant Professor Xiaolei Ren from the School of Computer Science and Engineering, Faculty of Innovative Engineering at Macau University of Science and Technology (MUST) has published a paper at the ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE) 2025, a premier conference in the software engineering field. The study is the first to systematically reveal the significant challenges that compiler peephole optimizations pose to binary diffing tools, reassessing the common assumption of their "optimization-resilience" and offering new perspectives and directions for improvement in the field of software security.

FSE (also known as ESEC/FSE) is a top-tier international conference in software engineering, recognized as a Class A conference by the China Computer Federation (CCF). CSRankings, a leading academic ranking system for computer science institutions worldwide, uses top conferences like FSE as key indicators to assess the research prowess of universities and scholars. This publication marks the first time Macau University of Science and Technology has published research as the primary institution at FSE, signifying its growing research contributions and international influence in core areas such as programming languages and program analysis.

[Figure: MUST’s ranking in CSRankings]

[Figure: Assistant Professor Xiaolei Ren presenting her research findings at the FSE 2025 conference in Trondheim, Norway, on June 24, 2025]

Binary diffing tools are critical for software security and are used extensively in vulnerability detection, malware analysis, code clone identification, and patch analysis. However, modern compiler optimizations, especially LLVM’s peephole optimization, can drastically alter binary code by replacing instruction sequences or simplifying control flow graphs (CFGs), which makes it difficult for existing tools to match the resulting code. The MUST team’s research examines these challenges in depth, showing why current tools struggle with optimized code and laying the groundwork for more robust solutions.

[Figure: Peephole Optimization: Examples and Effects]

Xiaolei Ren’s research team used a compiler test suite and a compiler verification tool to assess, both quantitatively and qualitatively, the role of peephole optimization in the compiler optimization process and its impact on binary code. The key findings are as follows:

  • Prevalence of Peephole Optimization: Peephole optimization is one of the most frequently invoked optimization techniques in the LLVM compiler. Across optimization levels O1 to O3, it accounts for 14% to 39.7% of optimization calls, ranking first among the ten most commonly used compiler optimization techniques and surpassing the second-ranked technique by at least 11 percentage points.

  • Effects on Code Structure: Peephole optimization significantly alters the instruction sequences and control flow of binary code. For instance, it can reduce a basic block containing 15 instructions to just 4, or simplify the control flow graph (CFG) by eliminating redundant branches. These transformations noticeably reduce the matching accuracy of traditional binary analysis tools.

  • Impact on Tool Performance: In evaluations of mainstream binary analysis tools (e.g., BinDiff and JTrans), the study found that even state-of-the-art tools lose significant accuracy when confronted with complex CFG changes induced by peephole optimization. BinDiff is relatively more robust because it integrates call graph topology, but it is still limited by intricate inter-block variations. JTrans, in contrast, identifies only some of the matching pairs when handling inter-block changes.
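To make the effect described in these findings concrete, here is a minimal sketch of a peephole pass shrinking a basic block. It is not LLVM’s implementation: the four-field instruction tuples, the register names, and the three rewrite rules are hypothetical, chosen only to illustrate textbook local rewrites. The opcode-sequence comparison at the end uses Python’s standard difflib as a stand-in for a naive diffing metric.

```python
from difflib import SequenceMatcher

def peephole(block):
    """Repeatedly apply local rewrite rules until none fires (toy sketch)."""
    changed = True
    while changed:
        changed = False
        out = []
        for op, dst, a, b in block:
            if op == "mul" and b == 2:
                # Strength reduction: x * 2 becomes x << 1.
                out.append(("shl", dst, a, 1))
                changed = True
            elif op == "add" and b == 0:
                # Adding zero is just a copy.
                out.append(("mov", dst, a, None))
                changed = True
            elif op == "mov" and dst == a:
                # A self-copy has no effect, so drop it.
                changed = True
            else:
                out.append((op, dst, a, b))
        block = out
    return block

# Hypothetical basic block in a made-up three-address format.
original = [
    ("add", "r1", "r1", 0),
    ("mul", "r2", "r1", 2),
    ("mov", "r3", "r3", None),
    ("sub", "r4", "r2", "r1"),
]
optimized = peephole(original)

# Naive opcode-sequence similarity between the two versions of the block.
before_ops = [ins[0] for ins in original]
after_ops = [ins[0] for ins in optimized]
similarity = SequenceMatcher(None, before_ops, after_ops).ratio()

print(optimized)
print(f"opcode similarity: {similarity:.2f}")
```

Both versions compute the same values in r2 and r4, yet the block is half as long and shares only one opcode with the original, which is the kind of purely local change that lowers the score of a syntax-based matcher.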

[Figure: Overall Workflow of the Proposed Method]

To improve the effectiveness of intra-procedural binary code comparison, the study proposes two solutions:

  • Call Graph Integration (CGI): This solution mitigates the impact of significant local code changes caused by peephole optimization on similarity calculations by integrating call graph topology.

  • Dynamic Boundary Detection (DBD): To address inter-basic-block changes resulting from optimization, this method dynamically identifies the boundaries of affected basic blocks. It does so by using specific instruction signatures and contextual information, such as shared predecessors and successors, to enable more accurate matching.
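The idea behind the first solution can be sketched as follows. This is an illustrative assumption about how call graph context might be blended into a similarity score, not the method from the paper: the function records, the Jaccard measure, and the weight alpha are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def combined_similarity(f, g, alpha=0.5):
    """Blend local opcode similarity with call graph context.

    alpha is a hypothetical weight; alpha=1.0 ignores the call graph.
    """
    local = jaccard(f["opcodes"], g["opcodes"])
    context = jaccard(f["callees"], g["callees"])
    return alpha * local + (1 - alpha) * context

# Hypothetical functions: `opt` is `unopt` after heavy local rewriting,
# while `other` is unrelated code that happens to use the same opcodes.
unopt = {"opcodes": ["add", "mul", "mov", "sub", "call"],
         "callees": ["parse", "log"]}
opt = {"opcodes": ["shl", "sub", "call"], "callees": ["parse", "log"]}
other = {"opcodes": ["shl", "sub", "call"], "callees": ["send"]}

local_only_opt = combined_similarity(unopt, opt, alpha=1.0)
local_only_other = combined_similarity(unopt, other, alpha=1.0)
with_cg_opt = combined_similarity(unopt, opt)
with_cg_other = combined_similarity(unopt, other)

print(local_only_opt, local_only_other)
print(with_cg_opt, with_cg_other)
```

With local features alone, the true match and the unrelated function score the same. Once the callee sets are taken into account, the true match scores higher, because peephole rewrites change the instructions inside a function but generally leave its calls in place.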

Looking forward, the research team aims to further investigate the impact of compiler optimizations on program analysis, with a focus on developing more robust and reliable binary analysis tools. These advancements are expected to enhance capabilities in vulnerability detection, reverse engineering, and malware analysis within the domain of software security.

This study not only offers significant insights into software engineering and program analysis but also establishes MUST as a prominent contributor to global computer science research. The university remains dedicated to fostering cutting-edge research, promoting innovation in software engineering and security, and contributing to global technological progress.

Paper publication URL: https://dl.acm.org/doi/abs/10.1145/3729389