Compiling a Functional Language using C++
Series status: Complete
First post in 2019, last post in 2020
This series covers the implementation of a compiler for a lazily evaluated functional language using C++ and LLVM. The language is initially based on my project for a university course in compilers, but is extended with many additional features, such as polymorphism, let-in expressions, and polymorphism.
-
Intro
975 words, about 5 minutes to read.
During my last academic term, I was enrolled in a compilers course. We had a final project - develop a compiler for a basic Python subset, using LLVM. ...
-
Tokenizing
1955 words, about 10 minutes to read.
It makes sense to build a compiler bit by bit, following the stages we outlined in the first post of the series. ...
-
Parsing
3679 words, about 18 minutes to read.
In the previous post, we covered tokenizing. We learned how to convert an input string into logical segments, and even wrote up a tokenizer to do it according to the rules of our language. ...
-
Type Checking
5809 words, about 28 minutes to read.
I think tokenizing and parsing are boring. The thing is, looking at syntax is a pretty shallow measure of how interesting a language is. ...
-
Small Improvements
2150 words, about 11 minutes to read.
We’ve done quite a big push in the previous post. We defined type rules for our language, implemented unification, and then implemented unification to enforce these rules for our program. ...
-
Execution
4439 words, about 21 minutes to read.
We now have trees representing valid programs in our language, and it’s time to think about how to compile them into machine code, to be executed on hardware. ...
-
Compilation
4976 words, about 24 minutes to read.
In the previous post, we defined a machine for graph reduction, called a G-machine. However, this machine is still not particularly connected to our language. ...
-
Runtime
2168 words, about 11 minutes to read.
Wikipedia has the following definition for a runtime: A [runtime] primarily implements portions of an execution model. We know what our execution model is! ...
-
LLVM
4878 words, about 23 minutes to read.
We don’t want a compiler that can only generate code for a single platform. Our language should work on macOS, Windows, and Linux, on x86_64, ARM, and maybe some other architectures. ...
-
Garbage Collection
4832 words, about 23 minutes to read.
“When will you learn? When will you learn that your actions have consequences?” So far, we’ve entirely ignored the problem of memory management. ...
-
Polymorphism
8124 words, about 39 minutes to read.
In part 8, we wrote some pretty interesting programs in our little language. We successfully expressed arithmetic and recursion. But there’s one thing that we cannot express in our language without further changes: an if statement. ...
-
Polymorphic Data Types
4389 words, about 21 minutes to read.
In part 10, we managed to get our compiler to accept functions that were polymorphically typed. However, a piece of the puzzle is still missing: while our functions can handle values of different types, the same cannot be said for our data types. ...
-
Let/In and Lambdas
10322 words, about 49 minutes to read.
Now that our language’s type system is more fleshed out and pleasant to use, it’s time to shift our focus to the ergonomics of the language itself. ...
-
Cleanup
8297 words, about 39 minutes to read.
In part 12, we added let/in and lambda expressions to our compiler. At the end of that post, I mentioned that before we move on to bigger and better things, I wanted to take a step back and clean up the compiler. ...