Table of Contents

We now have trees representing valid programs in our language, and it’s time to think about how to compile them into machine code, to be executed on hardware. But how should we execute programs?

The programs we define are actually lists of definitions. But you can’t evaluate definitions - they just tell you, well, how things are defined. Expressions, on the other hand, can be simplified. So, let’s start by evaluating the body of the function called main, similarly to how C/C++ programs start.

Alright, we’ve made it past that hurdle. Next, to figure out how to evaluate expressions. It’s easy enough with binary operators: 3+2*6 becomes 3+12, and 3+12 becomes 15. Functions are when things get interesting. Consider:

double (160+3)

There’s many perfectly valid ways to evaluate the program. When we get to a function application, we can first evaluate the arguments, and then expand the function definition:

double (160+3)
double 163
163+163
326

Let’s come up with a more interesting program to illustrate execution. How about:

data Pair = { P Int Int }
defn fst p = {
    case p of {
        P x y -> { x }
    }
}
defn snd p = {
    case p of {
        P x y -> { y }
    }
}
defn slow x = { returns x after waiting for 1 second }
defn main = { fst (P (slow 320) (slow 6)) }

If we follow our rules for evaluating functions, the execution will follow the following steps:

fst (P (slow 320) (slow 6))
fst (P 320 (slow 6)) <- after 1 second
fst (P 320 6) <- after 1 second
320

We waited for two seconds, even though we really only needed to wait one. To avoid this, we could instead define our function application to substitute in the parameters of a function before evaluating them:

fst (P (slow 320) (slow 6))
(slow 320)
320 <- after 1 second

This seems good, until we try doubling an expression again:

double (slow 163)
(slow 163) + (slow 163)
163 + (slow 163) <- after 1 second
163 + 163 <- after 1 second
326

With ony one argument, we’ve actually spent two seconds on the evaluation! If we instead tried to triple using addition, we’d spend three seconds.

Observe that with these new rules (called “call by name” in programming language theory), we only waste time because we evaluate an expression that was passed in more than 1 time. What if we didn’t have to do that? Since we have a functional language, there’s no way that two expressions that are the same evaluate to a different value. Thus, once we know the result of an expression, we can replace all occurences of that expression with the result:

double (slow 163)
(slow 163) + (slow 163)
163 + 163 <- after 1 second
326

We’re back down to one second, and since we’re still substituting parameters before we evaluate them, we still only take one second.

Alright, this all sounds good. How do we go about implementing this? Since we’re substituting variables for whole expressions, we can’t just use values. Instead, because expressions are represented with trees, we might as well consider operating on trees. When we evaluate a tree, we can substitute it in-place with what it evaluates to. We’ll do this depth-first, replacing the children of a node with their reduced trees, and then moving on to the parent.

There’s only one problem with this: if we substitute a variable that occurs many times with the same expression tree, we no longer have a tree! Trees, by definition, have only one path from the root to any other node. Since we now have many ways to reach that expression we substituted, we instead have a graph. Indeed, the way we will be executing our functional code is called graph reduction.

Building Graphs

Naively, we might consider creating a tree for each function at the beginning of our program, and then, when that function is called, substituting the variables in it with the parameters of the application. But that approach quickly goes out the window when we realize that we could be applying a function multiple times - in fact, an arbitrary number of times. This means we can’t have a single tree, and we must build a new tree every time we call a function.

The question, then, is: how do we construct a new graph? We could reach into Plato’s Theory of Forms and have a “reference” tree which we then copy every time we apply the function. But how do you copy a tree? Copying a tree is usually a recursive function, and every time that we copy a tree, we’ll have to look at each node and decide whether or not to visit its children (or if it has any at all). If we copy a tree 100 times, we will have to look at each “reference” node 100 times. Since the reference tree doesn’t change, we’d be following the exact same sequence of decisions 100 times. That’s no good!

An alternative approach, one that we’ll use from now on, is to instead convert each function’s expression tree into a sequence of instructions that you can follow to build an identical tree. Every time we have to apply a function, we’ll follow the corresponding recipe for that function, and end up with a new tree that we continue evaluating.

G-machine

“Instructions” is a very generic term. Specifically, we will be creating instructions for a G-machine, an abstract architecture which we will use to reduce our graphs. The G-machine is stack-based - all operations push and pop items from a stack. The machine will also have a “dump”, which is a stack of stacks; this will help with separating evaluation of various graphs.

We will follow the same notation as Simon Peyton Jones in his book , which was my source of truth when implementing my compiler. The machine will be executing instructions that we give it, and as such, it must have an instruction queue, which we will reference as ii. We will write x:ix:i to mean “an instruction queue that starts with an instruction x and ends with instructions ii”. A stack machine obviously needs to have a stack - we will call it ss, and will adopt a similar notation to the instruction queue: a1,a2,a3:sa_1, a_2, a_3 : s will mean “a stack with the top values a1a_1, a2a_2, and a3a_3, and remaining instructions ss”. Finally, as we said, our stack machine has a dump, which we will write as dd. On this dump, we will push not only the current stack, but also the current instructions that we are executing, so we may resume execution later. We will write i,s:d\langle i, s \rangle : d to mean “a dump with instructions ii and stack ss on top, followed by instructions and stacks in dd”.

There’s one more thing the G-machine will have that we’ve not yet discussed at all, and it’s needed because of the following quip earlier in the post:

When we evaluate a tree, we can substitute it in-place with what it evaluates to.

How can we substitute a value in place? Surely we won’t iterate over the entire tree and look for an occurence of the tree we evaluted. Rather, wouldn’t it be nice if we could update all references to a tree to be something else? Indeed, we can achieve this effect by using pointers. I don’t mean specifically C/C++ pointers - I mean the more general concept of “an address in memory”. The G-machine has a heap, much like the heap of a C/C++ process. We can create a tree node on the heap, and then get an address of the node. We then have trees use these addresses to link their child nodes. If we want to replace a tree node with its reduced form, we keep its address the same, but change the value on the heap. This way, all trees that reference the node we change become updated, without us having to change them - their child address remains the same, but the child has now been updated. We represent the heap using hh. We write h[a:v]h[a : v] to say “the address aa points to value vv in the heap hh”. Now you also know why we used the letter aa when describing values on the stack - the stack contains addresses of (or pointers to) tree nodes.

Compiling Functional Languages: a tutorial also keeps another component of the G-machine, the global map, which maps function names to addresses of nodes that represent them. We’ll stick with this, and call this global map mm.

Finally, let’s talk about what kind of nodes our trees will be made of. We don’t have to include every node that we’ve defined as a subclass of ast - some nodes we can compile to instructions, without having to build them. We will also include nodes that we didn’t need for to represent expressions. Here’s the list of nodes types we’ll have:

With these nodes in mind, let’s try defining some instructions for the G-machine. We start with instructions we’ll use to assemble new version of function body trees as we discussed above. First up is PushInt:

PushInt
Before
PushInt  n:isdhm \text{PushInt} \; n : i \quad s \quad d \quad h \quad m
After
ia:sdh[a:NInt  n]m i \quad a : s \quad d \quad h[a : \text{NInt} \; n] \quad m
Description
Push an integer nn onto the stack.

Let’s go through this. We start with an instruction queue with PushInt n on top. We allocate a new NInt with the number n on the heap at address aa. We then push the address of the NInt node on top of the stack. Next, PushGlobal:

PushGlobal
Before
PushGlobal  f:isdhm[f:a] \text{PushGlobal} \; f : i \quad s \quad d \quad h \quad m[f : a]
After
ia:sdhm[f:a] i \quad a : s \quad d \quad h \quad m[f : a]
Description
Push a global function ff onto the stack.

We don’t allocate anything new on the heap for this one - we already have a node for the global function. Next up, Push:

Push
Before
Push  n:ia0,a1,...,an:sdhm \text{Push} \; n : i \quad a_0, a_1, ..., a_n : s \quad d \quad h \quad m
After
ian,a0,a1,...,an:sdhm i \quad a_n, a_0, a_1, ..., a_n : s \quad d \quad h \quad m
Description
Push a value at offset nn from the top of the stack onto the stack.

We define this instruction to work if and only if there exists an address on the stack at offset nn. We take the value at that offset, and push it onto the stack again. This can be helpful for something like f x x, where we use the same tree twice. Speaking of that - let’s define an instruction to combine two nodes into an application:

MkApp
Before
MkApp:ia0,a1:sdhm \text{MkApp} : i \quad a_0, a_1 : s \quad d \quad h \quad m
After
ia:sdh[a:NApp  a0  a1]m i \quad a : s \quad d \quad h[ a : \text{NApp} \; a_0 \; a_1] \quad m
Description
Apply a function at the top of the stack to a value after it.

We pop two things off the stack: first, the thing we’re applying, then the thing we apply it to. We then create a new node on the heap that is an NApp node, with its two children being the nodes we popped off. Finally, we push it onto the stack.

Let’s try use these instructions to get a feel for it. In order to conserve space, let’s use G\text{G} for PushGlobal, I\text{I} for PushInt, and A\text{A} for PushApp. Let’s say we want to construct a graph for double 326. We’ll use the instructions I  326\text{I} \; 326, G  double\text{G} \; \text{double}, and A\text{A}. Let’s watch these instructions play out: [I  326,G  double,A]sdhm[double:ad][G  double,A]a1:sdh[a1:NInt  326]m[double:ad][A]ad,a1:sdh[a1:NInt  326]m[double:ad][]a2:sdh[a1:NInt  326a2:NApp  ad  a1]m[double:ad] \begin{aligned} [\text{I} \; 326, \text{G} \; \text{double}, \text{A}] & \quad s \quad & d \quad & h \quad & m[\text{double} : a_d] \\ [\text{G} \; \text{double}, \text{A}] & \quad a_1 : s \quad & d \quad & h[a_1 : \text{NInt} \; 326] \quad & m[\text{double} : a_d] \\ [\text{A}] & \quad a_d, a_1 : s \quad & d \quad & h[a_1 : \text{NInt} \; 326] \quad & m[\text{double} : a_d] \\ [] & \quad a_2 : s \quad & d \quad & h[\substack{\begin{aligned}a_1 & : \text{NInt} \; 326 \\ a_2 & : \text{NApp} \; a_d \; a_1 \end{aligned}}] \quad & m[\text{double} : a_d] \\ \end{aligned} How did we come up with these instructions? We’ll answer this question with more generality later, but let’s take a look at this particular expression right now. We know it’s an application, so we’ll be using MkApp eventually. We also know that MkApp expects two values on top of the stack from which to make an application. The node on top has to be the function, and the next node is the value to be passed into that function. Since a stack is first-in-last-out, for the function (double, in our case) to be on top, we need to push it last. Thus, we push double first, then 326. Finally, we call MkApp now that the stack is in the right state.

Having defined instructions to build graphs, it’s now time to move on to instructions to reduce graphs - after all, we’re performing graph reduction. A crucial instruction for the G-machine is Unwind. What Unwind does depends on what nodes are on the stack. Its name comes from how it behaves when the top of the stack is an NApp node that is at the top of a potentially long chain of applications: given an application node, it pushes its left hand side onto the stack. It then continues to run Unwind. This is effectively a while loop: applications nodes continue to be expanded this way until the left hand side of an application is finally something that isn’t an application. Let’s write this rule as follows:

Unwind-App
Before
Unwind:ia:sdh[a:NApp  a0  a1]m \text{Unwind} : i \quad a : s \quad d \quad h[a : \text{NApp} \; a_0 \; a_1] \quad m
After
Unwind:ia0,a:sdh[a:NApp  a0  a1]m \text{Unwind} : i \quad a_0, a : s \quad d \quad h[ a : \text{NApp} \; a_0 \; a_1] \quad m
Description
Unwind an application by pushing its left node.

Let’s talk about what happens when Unwind hits a node that isn’t an application. Of all nodes we have described, NGlobal seems to be the most likely to be on top of the stack after an application chain has finished unwinding. In this case we want to run the instructions for building the referenced global function. Naturally, these instructions may reference the arguments of the application. We can find the first argument by looking at offset 1 on the stack, which will be an NApp node, and then going to its right child. The same can be done for the second and third arguments, if they exist. But this doesn’t feel right - we don’t want to constantly be looking at the right child of a node on the stack. Instead, we replace each application node on the stack with its right child. Once that’s done, we run the actual code for the global function:

Unwind-Global
Before
Unwind:ia,a0,a1,...,an1:sdh[a:NGlobalncak:NAppak1ak]m \text{Unwind} : i \quad a, a_0, a_1, ..., a_{n-1} : s \quad d \quad h[\substack{a : \text{NGlobal} \; n \; c \\ a_k : \text{NApp} \; a_{k-1} \; a_k'}] \quad m
After
ca0,a1,...,an1,an1:sdh[a:NGlobalncak:NAppak1ak]m c \quad a_0', a_1', ..., a_{n-1}', a_{n-1} : s \quad d \quad h[\substack{a : \text{NGlobal} \; n \; c \\ a_k : \text{NApp} \; a_{k-1} \; a_k'}] \quad m
Description
Call a global function.

In this rule, we used a general rule for aka_k, in which kk is any number between 1 and n1n-1. We also expect the NGlobal node to contain two parameters, nn and cc. nn is the arity of the function (the number of arguments it expects), and cc are the instructions to construct the function’s tree.

The attentive reader will have noticed a catch: we kept an1a_{n-1} on the stack! This once again goes back to replacing a node in-place. an1a_{n-1} is the address of the “root” of the whole expression we’re simplifying. Thus, to replace the value at this address, we need to keep the address until we have something to replace it with.

There’s one more thing that can be found at the leftmost end of a tree of applications: NInd. We simply replace NInd with the node it points to, and resume Unwind:

Unwind-Ind
Before
Unwind:ia:sdh[a:NInd  a]m \text{Unwind} : i \quad a : s \quad d \quad h[a : \text{NInd} \; a' ] \quad m
After
Unwind:ia:sdh[a:NInd  a]m \text{Unwind} : i \quad a' : s \quad d \quad h[a : \text{NInd} \; a' ] \quad m
Description
Replace indirection node with its target.

We’ve talked about replacing a node, and we’ve talked about indirection, but we haven’t yet an instruction to perform these actions. Let’s do so now:

Update
Before
Update  n:ia,a0,a1,...an:sdhm \text{Update} \; n : i \quad a,a_0,a_1,...a_n : s \quad d \quad h \quad m
After
ia0,a1,...,an:sdh[an:NInd  a]m i \quad a_0,a_1,...,a_n : s \quad d \quad h[a_n : \text{NInd} \; a ] \quad m
Description
Transform node at offset into an indirection.

This instruction pops an address from the top of the stack, and replaces a node at the given offset with an indirection to the popped node. After we evaluate a function call, we will use update to make sure it’s not evaluated again.

Now, let’s talk about data structures. We have mentioned an NData node, but we’ve given no explanation of how it will work. Obviously, we need to distinguish values of a type created by different constructors: If we have a value of type List, it could have been created either using Nil or Cons. Depending on which constructor was used to create a value of a type, we might treat it differently. Furthermore, it’s not always possible to know what constructor was used to create what value at compile time. So, we need a way to know, at runtime, how the value was constructed. We do this using a tag. A tag is an integer value that will be contained in the NData node. We assign a tag number to each constructor, and when we create a node with that constructor, we set the node’s tag accordingly. This way, we can easily tell if a List value is a Nil or a Cons, or if a Tree value is a Node or a Leaf.

To operate on NData nodes, we will need two primitive operations: Pack and Split. Pack will create an NData node with a tag from some number of nodes on the stack. These nodes will be placed into a dynamically allocated array:

Pack
Before
Pack  t  n:ia1,a2,...an:sdhm \text{Pack} \; t \; n : i \quad a_1,a_2,...a_n : s \quad d \quad h \quad m
After
ia:sdh[a:NData  t  [a1,a2,...,an]]m i \quad a : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m
Description
Pack nn nodes from the stack into a node with tag tt.

Split will do the opposite, by popping of an NData node and moving the contents of its array onto the stack:

Split
Before
Split:ia:sdh[a:NData  t  [a1,a2,...,an]]m \text{Split} : i \quad a : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m
After
ia1,a2,...,an:sdh[a:NData  t  [a1,a2,...,an]]m i \quad a_1, a_2, ...,a_n : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m
Description
Unpack a data node on top of the stack.

These two instructions are a good start, but we’re missing something fairly big: case analysis. After we’ve constructed a data type, to perform operations on it, we want to figure out what constructor and values which were used to create it. In order to implement patterns and case expressions, we’ll need another instruction that’s capable of making a decision based on the tag of an NData node. We’ll call this instruction Jump, and define it to contain a mapping from tags to instructions to be executed for a value of that tag. For instance, if the constructor Nil has tag 0, and Cons has tag 1, the mapping for the case expression of a length function could be written as [0[PushInt  0],1[PushGlobal  length,...]][0 \rightarrow [\text{PushInt} \; 0], 1 \rightarrow [\text{PushGlobal} \; \text{length}, ...] ]. Let’s define the rule for it:

Jump
Before
Jump[...,tit,...]:ia:sdh[a:NData  t  as]m \text{Jump} [..., t \rightarrow i_t, ...] : i \quad a : s \quad d \quad h[a : \text{NData} \; t \; as ] \quad m
After
it,ia:sdh[a:NData  t  as]m i_t, i \quad a : s \quad d \quad h[a : \text{NData} \; t \; as ] \quad m
Description
Execute instructions corresponding to a tag.

Alright, we’ve made it through the interesting instructions, but there’s still a few that are needed, but less shiny and cool. For instance: imagine we’ve made a function call. As per the rules for Unwind, we’ve placed the right hand sides of all applications on the stack, and ran the instructions provided by the function, creating a final graph. We then continue to reduce this final graph. But we’ve left the function parameters on the stack! This is untidy. We define a Slide instruction, which keeps the address at the top of the stack, but gets rid of the next nn addresses:

Slide
Before
Slide  n:ia0,a1,...,an:sdhm \text{Slide} \; n : i \quad a_0, a_1, ..., a_n : s \quad d \quad h \quad m
After
ia0:sdhm i \quad a_0 : s \quad d \quad h \quad m
Description
Remove nn addresses after the top from the stack.

Just a few more. Next up, we observe that we have not defined any way for our G-machine to perform arithmetic, or indeed, any primitive operations. Since we’ve not defined any built-in type for booleans, let’s avoid talking about operations like <, ==, and so on (in fact, we’ve omitted them from the grammar so far). So instead, let’s talk about the closed operations, namely +, -, *, and /. We’ll define a special instruction for them, called BinOp:

BinOp
Before
BinOp  op:ia0,a1:sdh[a0:NIntna1:NIntm]m \text{BinOp} \; \text{op} : i \quad a_0, a_1 : s \quad d \quad h[\substack{a_0 : \text{NInt} \; n \\ a_1 : \text{NInt} \; m}] \quad m
After
ia:sdh[a0:NIntna1:NIntma:NInt(opnm)]m i \quad a : s \quad d \quad h[\substack{a_0 : \text{NInt} \; n \\ a_1 : \text{NInt} \; m \\ a : \text{NInt} \; (\text{op} \; n \; m)}] \quad m
Description
Apply a binary operator on integers.

Nothing should be particularly surprising here: the instruction pops two integers off the stack, applies the given binary operation to them, and places the result on the stack.

We’re not yet done with primitive operations, though. We have a lazy graph reduction machine, which means something like the expression 3*(2+6) might not be a binary operator applied to two NInt nodes. We keep around graphs until they really need to be reduced. So now we need an instruction to trigger reducing a graph, to say, “we need this value now”. We call this instruction Eval. This is where the dump finally comes in!

When we execute Eval, another graph becomes our “focus”, and we switch to a new stack. We obviously want to return from this once we’ve finished evaluating what we “focused” on, so we must store the program state somewhere - on the dump. Here’s the rule:

Eval
Before
Eval:ia:sdhm \text{Eval} : i \quad a : s \quad d \quad h \quad m
After
[Unwind][a]i,s:dhm [\text{Unwind}] \quad [a] \quad \langle i, s\rangle : d \quad h \quad m
Description
Evaluate graph to its normal form.

We store the current set of instructions and the current stack on the dump, and start with only Unwind and the value we want to evaluate. That does the job, but we’re missing one thing - a way to return to the state we placed onto the dump. To do this, we add another rule to Unwind:

Unwind-Return
Before
Unwind:ia:si,s:dh[a:NInt  n]m \text{Unwind} : i \quad a : s \quad \langle i', s'\rangle : d \quad h[a : \text{NInt} \; n] \quad m
After
ia:sdh[a:NInt  n]m i' \quad a : s' \quad d \quad h[a : \text{NInt} \; n] \quad m
Description
Return from Eval instruction.

Just a couple more special-purpose instructions, and we’re done!

Sometimes, it’s possible for a tree node to reference itself. For instance, Haskell defines the fixpoint combinator as follows:

fix f = let x = f x in x

In order to do this, an address that references a node must be present while the node is being constructed. We define an instruction, Alloc, which helps with that:

Alloc
Before
Alloc  n:isdhm \text{Alloc} \; n : i \quad s \quad d \quad h \quad m
After
isdh[ak:NInd  null]m i \quad s \quad d \quad h[a_k : \text{NInd} \; \text{null}] \quad m
Description
Allocate indirection nodes.

We can allocate an indirection on the stack, and call Update on it when we’ve constructed a node. While we’re constructing the tree, we can refer to the indirection when a self-reference is required.

Lastly, we also define a Pop instruction, which just removes some number of nodes from the stack. We want this because calling Update at the end of a function modifies a node further up the stack, leaving anything on top of the stack after that node as scratch work. We get rid of that scratch work simply by popping it.

Pop
Before
Pop  n:ia1,a2,...,an:sdhm \text{Pop} \; n : i \quad a_1, a_2, ..., a_n : s \quad d \quad h \quad m
After
isdhm i \quad s \quad d \quad h \quad m
Description
Pop nn nodes from the stack.

That’s it for the instructions. Knowing them, however, doesn’t tell us what to do with our ast structs. We’ll need to define rules to translate trees into these instructions, and I’ve already alluded to this when we went over double 326. However, this has already gotten pretty long, so we’ll do it in the next post: Part 6 - Compilation.