Paper Review 13 - Learning Compiler Pass Orders using Coreset and Normalized Value Prediction

My thoughts on Learning Compiler Pass Orders using Coreset and Normalized Value Prediction


Today’s paper appeared in ICML ’23.

Overview

Ordering optimizations passes for a compiler is a hard problem. Real world compilers generally don’t attempt to find the most optimal set of passes for a given input, since that would be time consuming, and in many cases, provide little benefit over the default pass ordering. It is possible to convert code to some IR, and manually apply passes (e.g. compile to LLVM IR, and then use opt to apply passes), but as far as I know, no real-world project invests time in doing this - mainly because it’s hard to beat the default passes unless you’re able to exploit a specific structure in your code. This poses the question, can AI do better?

This paper attempts to explore this area by introducing an additional layer that they call a coreset, which encapsulates a pass sequence as opposed to individual passes, and then show that they were able to design and train a model (leveraging GNNs and using ProGraML to represent the program as a graph with sufficient features). Their findings show significant improvement in code size reduction, and they evaluated their system on a variety of benchmarks and a few real world large code bases (e.g. Linux). In order to bound the “vocabulary” (I assume this means vertex labels?), they also extended ProGraML to break down composite types into multiple nodes.

Thoughts

  • Does this system generalize to optimization as well? My gut instinct says that it might not.
  • Research in this area seems to be focused on representing programs as annotated IR graphs for learning. I wonder how it might look to have representations with higher level operators - combining “common” structures into a single node/edge. Maybe even retaining some aspects of the AST?
  • I wonder how well this technique applies to other domains, such as relational algebra rewriting. In many DBMS’ latency is critical, so it’s important for the compiler to both be optimal and fast. Another domain that might be interesting to look at is tool paths for 3D printers/CNC machines. From what I understand, it seems like there’s a similar phase of optimization/compilation that could benefit from AI assisted pass ordering.

AI assisted compilation is a very interesting field. I think it opens the door to building better compiler infrastructure more easily for specific domains, and I’m interested to see if there’s better results possible if the problem space is restricted to programs modeling a specific domain.

Written on September 20, 2024