January 17, 2012

Compiling by simplifying

When you read the original Lambda Papers carefully, one thing you notice is that Steele didn't view Scheme as just a programming language, but also (and I would argue, primarily) as a philosophy of constructing compilers.

This philosophy made clear and transparent the issues of translating a high-level language into Von Neumann-style register machine code, and provided mechanisms to radically simplify the transformation. Continuation-passing style reified issues of control transfer, temporary stack values, and register use. Function arguments become registers, function calls - jumps. Difficult issues like nested scopes, first-class functions, closures, and exceptions become easy.

The other great, simplifying thing about Steele's philosophy is the self-similarity of the code as it undergoes transformation. The input of each stage of the compiler is a Scheme program, and (until machine code generation), the output is a simpler Scheme program.

Many compilers miss out on the self-similarity aspect of Scheme in two major ways. Those compilers that do follow continuation-passing style usually implement the CPS portion of compilation using completely different data structures than the other portions. This adds needless boilerplate code and makes the CPS portion look more complicated than it is. This needless complexity shows up in some ML compilers - in particular, Andrew Appel's otherwise excellent and highly recommended Compiling with Continuations, and causes people to remark that implementing SSA is not all that much harder than doing CPS (Olin Shivers disagrees).

The subtler, but even more serious omission is ignoring Steele's advice to implement the parts of the language above the intermediate layers as macros. This spreads the complexity for implementing features like classes and objects throughout the different portions of the compiler, starting with the parser.

Following Steele's techniques, building a Scheme compiler that is several times faster (and many times smaller) than the compilers/virtual machines of popular dynamic languages like Python and Ruby is easy. Scheme 48 was originally written in 48 hours. If you don't have that much time, Marc Feeley can show you how to build a Scheme compiler in 90 minutes. And if you're really strapped for time and don't want to sit through CPS theory, Abdulaziz Ghuloum shows you how to dive right into generating x86 machine code in 10 pages.

No comments: