Huh? We’re on day 8? What happened to days 2 through 7?

Well, for the most part, I didn’t think they were that interesting from the Coq point of view. Day 7 got close, but not close enough to inspire me to create a formalization. Day 8, on the other hand, is [note: Especially to someone like me who's interested in programming languages! ] and took quite some time to formalize.

As before, here’s an (abridged) description of the problem:

Given a tiny assembly-like language, determine the state of its accumulator when the same instruction is executed twice.

Before we start on the Coq formalization, let’s talk about an idea from Programming Language Theory (PLT), big step operational semantics.

Big Step Operational Semantics

What we have in Advent of Code’s Day 8 is, undeniably, a small programming language. We are tasked with executing this language, or, in PLT lingo, defining its semantics. There are many ways of doing this - at university, I’ve been taught of denotational, axiomatic, and operational semantics. I believe that Coq’s mechanism of inductive definitions lends itself very well to operational semantics, so we’ll take that route. But even “operational semantics” doesn’t refer to a concrete technique - we have a choice between small-step (structural) and big-step (natural) operational semantics. The former describe the minimal “steps” a program takes as it’s being evaluated, while the latter define the final results of evaluating a program. I decided to go with big-step operational semantics, since they’re more intutive (natural!).

So, how does one go about “[defining] the final results of evaluating a program?” Most commonly, we go about using inference rules. Let’s talk about those next.

Inference Rules

Inference rules are a very general notion. The describe how we can determine (infer) a conclusion from a set of assumptions. It helps to look at an example. Here’s a silly little inference rule:

I’m allergic to catsMy friend has a catI will not visit my friend very much \frac {\text{I'm allergic to cats} \quad \text{My friend has a cat}} {\text{I will not visit my friend very much}}

It reads, “if I’m allergic to cats, and if my friend has a cat, then I will not visit my friend very much”. Here, “I’m allergic to cats” and “my friend has a cat” are premises, and “I will not visit my friend very much” is a conclusion. An inference rule states that if all its premises are true, then its conclusion must be true. Here’s another inference rule, this time with some mathematical notation instead of words:

n<mn+1<m+1 \frac {n < m} {n + 1 < m + 1}

This one reads, “if nn is less than mm, then n+1n+1 is less than m+1m+1”. We can use inference rules to define various constructs. As an example, let’s define what it means for a natural number to be even. It takes two rules:

0  is evenn  is evenn+2  is even \frac {} {0 \; \text{is even}} \quad \frac {n \; \text{is even}} {n+2 \; \text{is even}}

First of all, zero is even. We take this as fact - there are no premises for the first rule, so they are all trivially true. Next, if a number is even, then adding 2 to that number results in another even number. Using the two of these rules together, we can correctly determine whether any number is or isn’t even. We start knowing that 0 is even. Adding 2 we learn that 2 is even, and adding 2 again we see that 4 is even, as well. We can continue this to determine that 6, 8, 10, and so on are even too. Never in this process will we visit the numbers 1 or 3 or 5, and that’s good - they’re not even!

Let’s now extend this notion to programming languages, starting with a simple arithmetic language. This language is made up of natural numbers and the \square operation, which represents the addition of two numbers. Again, we need two rules:

nNn  evaluates to  ne1  evaluates to  n1e2  evaluates to  n2e1e2  evaluates to  n1+n2 \frac {n \in \mathbb{N}} {n \; \text{evaluates to} \; n} \quad \frac {e_1 \; \text{evaluates to} \; n_1 \quad e_2 \; \text{evaluates to} \; n_2} {e_1 \square e_2 \; \text{evaluates to} \; n_1 + n_2}

First, let me explain myself. I used \square to demonstrate two important points. First, languages can be made of any kind of characters we want; it’s the rules that we define that give these languages meaning. Second, while \square is the addition operation in our language, ++ is the mathematical addition operator. They are not the same - we use the latter to define how the former works.

Finally, writing “evaluates to” gets quite tedious, especially for complex languages. Instead, PLT people use notation to make their semantics more concise. The symbol \Downarrow is commonly used to mean “evaluates to”; thus, eve \Downarrow v reads “the expression ee evaluates to the value vv. Using this notation, our rules start to look like the following:

nNnne1n1e2n2e1e2n1+n2 \frac {n \in \mathbb{N}} {n \Downarrow n} \quad \frac {e_1 \Downarrow n_1 \quad e_2 \Downarrow n_2} {e_1 \square e_2 \Downarrow n_1 + n_2}

If nothing else, these are way more compact! Though these may look intimidating at first, it helps to simply read each symbol as its English meaning.

Encoding Inference Rules in Coq

Now that we’ve seen what inference rules are, we can take a look at how they can be represented in Coq. We can use Coq’s Inductive mechanism to define the rules. Let’s start with our “is even” property.

Inductive is_even : nat -> Prop :=
    | zero_even : is_even 0
    | plustwo_even : is_even n -> is_even (n+2).

The first line declares the property is_even, which, given a natural number, returns proposition. This means that is_even is not a proposition itself, but is_even 0, is_even 1, and is_even 2 are all propositions.

The following two lines each encode one of our aforementioned inference rules. The first rule, zero_even, is of type is_even 0. The zero_even rule doesn’t require any arguments, and we can use it to create a proof that 0 is even. On the other hand, the plustwo_even rule does require an argument, is_even n. To construct a proof that a number n+2 is even using plustwo_even, we need to provide a proof that n itself is even. From this definition we can see a general principle: we encode each inference rule as constructor of an inductive Coq type. Each rule encoded in this manner takes as arguments the proofs of its premises, and returns a proof of its conclusion.

For another example, let’s encode our simple addition language. First, we have to define the language itself:

Inductive tinylang : Type :=
    | number (n : nat) : tinylang
    | box (e1 e2 : tinylang) : tinylang.

This defines the two elements of our example language: number n corresponds to nn, and box e1 e2 corresponds to e1e2e_1 \square e_2. Finally, we define the inference rules:

1
2
3
4
5
Inductive tinylang_sem : tinylang -> nat -> Prop :=
    | number_sem : forall (n : nat), tinylang_sem (number n) n
    | box_sem : forall (e1 e2 : tinylang) (n1 n2 : nat),
        tinylang_sem e1 n1 -> tinylang_sem e2 n2 ->
        tinylang_sem (box e1 e2) (n1 + n2).

When we wrote our rules earlier, by using arbitrary variables like e1e_1 and n1n_1, we implicitly meant that our rules work for any number or expression. When writing Coq we have to make this assumption explicit by using forall. For instance, the rule on line 2 reads, “for any number n, the expression n evaluates to n”.

Semantics of Our Language

We’ve now written some example big-step operational semantics, both “on paper” and in Coq. Now, it’s time to take a look at the specific semantics of the language from Day 8! Our language consists of a few parts.

First, there are three opcodes: jmp\texttt{jmp}, nop\texttt{nop}, and add\texttt{add}. Opcodes, combined with an integer, make up an instruction. For example, the instruction add  3\texttt{add} \; 3 will increase the content of the accumulator by three. Finally, a program consists of a sequence of instructions; They’re separated by newlines in the puzzle input, but we’ll instead separate them by semicolons. For example, here’s a complete program.

add  0;  nop  2;  jmp  2 \texttt{add} \; 0; \; \texttt{nop} \; 2; \; \texttt{jmp} \; -2

Now, let’s try evaluating this program. Starting at the beginning and with 0 in the accumulator, it will add 0 to the accumulator (keeping it the same), do nothing, and finally jump back to the beginning. At this point, it will try to run the addition instruction again, which is not allowed; thus, the program will terminate.

Did you catch that? The semantics of this language will require more information than just our program itself (which we’ll denote by pp).

Now we have all the elements of our evaluation. Let’s define some notation. A program starts at some state, and terminates in another, possibly different state. In the course of a regular evaluation, the program never changes; only the state does. So I propose this (rather unorthodox) notation:

(c,a,v)p(c,a,v) (c, a, v) \Rightarrow_p (c', a', v')

This reads, “after starting at program counter cc, accumulator aa, and set of valid addresses vv, the program pp terminates with program counter cc', accumulator aa', and set of valid addresses vv'”. Before creating the inference rules for this evaluation relation, let’s define the effect of evaluating a single instruction, using notation (c,a)i(c,a)(c, a) \rightarrow_i (c', a'). An addition instruction changes the accumulator, and increases the program counter by 1.

(c,a)add  n(c+1,a+n) \frac{} {(c, a) \rightarrow_{\texttt{add} \; n} (c+1, a+n)}

A no-op instruction does even less. All it does is increment the program counter.

(c,a)nop  n(c+1,a) \frac{} {(c, a) \rightarrow_{\texttt{nop} \; n} (c+1, a)}

Finally, a jump instruction leaves the accumulator intact, but adds a number to the program counter itself!

(c,a)jmp  n(c+n,a) \frac{} {(c, a) \rightarrow_{\texttt{jmp} \; n} (c+n, a)}

None of these rules have any premises, and they really are quite simple. Now, let’s define the rules for evaluating a program. First of all, a program starting in a state that is not considered “valid” is done evaluating, and is in a “failed” state.

c∉vclength(p)(c,a,v)p(c,a,v) \frac{c \not \in v \quad c \not= \text{length}(p)} {(c, a, v) \Rightarrow_{p} (c, a, v)}

We use length(p)\text{length}(p) to represent the number of instructions in pp. Note the second premise: even if our program counter cc is not included in the valid set, if it’s “past the end of the program”, the program terminates in an “ok” state. [note: In the presented rule, we don't use the variable c all that much, and we know its concrete value (from the equality premise). We could thus avoid introducing the name cc by replacing it with said known value: (length(p),a,v)p(length(p),a,v) \frac{} {(\text{length}(p), a, v) \Rightarrow_{p} (\text{length}(p), a, v)} This introduces some duplication, but that is really because all "base case" evaluation rules start and stop in the same state. To work around this, we could define a separate proposition to mean "program pp is done in state ss", then ss will really only need to occur once, and so will length(p)\text{length}(p). This is, in fact, what we will do later on, since being able to talk abut "programs being done" will help us with components of our proof. ]

c=length(p)(c,a,v)p(c,a,v) \frac{c = \text{length}(p)} {(c, a, v) \Rightarrow_{p} (c, a, v)}

When our program counter reaches the end of the program, we are also done evaluating it. Even though both rules [note: In fact, if the end of the program is never included in the valid set, the second rule is completely redundant. ] it helps to distinguish the two possible outcomes. Finally, if neither of the termination conditions are met, our program can take a step, and continue evaluating from there.

cvp[c]=i(c,a)i(c,a)(c,a,v{c})p(c,a,v)(c,a,v)p(c,a,v) \frac{c \in v \quad p[c] = i \quad (c, a) \rightarrow_i (c', a') \quad (c', a', v - \{c\}) \Rightarrow_p (c'', a'', v'')} {(c, a, v) \Rightarrow_{p} (c'', a'', v'')}

This is quite a rule. A lot of things need to work out for a program to evauate from a state that isn’t currently the final state:

If all of these conditions are met, our program, starting at (c,a,v)(c, a, v), will terminate in the state (c,a,v)(c'', a'', v''). This third rule completes our semantics; a program being executed will keep running instructions using the third rule, until it finally hits an invalid program counter (terminating with the first rule) or gets to the end of the program (terminating with the second rule).

Aside: Vectors and Finite N\mathbb{N}

We’ll be getting to the Coq implementation of our semantics soon, but before we do: what type should cc be? It’s entirely possible for an instruction like jmp  10000\texttt{jmp} \; -10000 to throw our program counter way before the first instruction of our program, so at first, it seems as though we should use an integer. But the prompt doesn’t even specify what should happen in this case - it only says an instruction shouldn’t be run twice. The “valid set”, although it may help resolve this debate, is our invention, and isn’t part of the original specification.

There is, however, something we can infer from this problem. Since the problem of jumping “too far behind” or “too far ahead” is never mentioned, we can assume that all jumps will lead either to an instruction, or right to the end of a program. This means that cc is a natural number, with

0clength(p) 0 \leq c \leq \text{length}(p)

In a language like Coq, it’s possible to represent such a number. Since we’ve gotten familliar with inference rules, let’s present two rules that define such a number:

nN+Z:Fin  nf:Fin  nSf:Fin  (n+1) \frac {n \in \mathbb{N}^+} {Z : \text{Fin} \; n} \quad \frac {f : \text{Fin} \; n} {S f : \text{Fin} \; (n+1)}

This is a variation of the Peano encoding of natural numbers. It reads as follows: zero (ZZ) is a finite natural number less than any positive natural number nn. Then, if a finite natural number ff is less than nn, then adding one to that number (using the successor function SS) will create a natural number less than n+1n+1. We encode this in Coq as follows (originally from here):

Inductive t : nat -> Set :=
    | F1 : forall {n}, t (S n)
    | FS : forall {n}, t n -> t (S n).

The F1 constructor here is equivalent to our ZZ, and FS is equivalent to our SS. To represent positive natural numbers N+\mathbb{N}^+, we simply take a regular natural number from N\mathbb{N} and find its successor using S (simply adding 1). Again, we have to explicitly use forall in our type signatures.

We can use a similar technique to represent a list with a known number of elements, known in the Idris and Coq world as a vector. Again, we only need two inference rules to define such a vector:

t:Type[]:Vec  t  0x:txs:Vec  t  n(x::xs):Vec  t  (n+1) \frac {t : \text{Type}} {[] : \text{Vec} \; t \; 0} \quad \frac {x : \text{t} \quad \textit{xs} : \text{Vec} \; t \; n} {(x::\textit{xs}) : \text{Vec} \; t \; (n+1)}

These rules read: the empty list [][] is zero-length vector of any type tt. Then, if we take an element xx of type tt, and an nn-long vector xs\textit{xs} of tt, then we can prepend xx to xs\textit{xs} and get an (n+1)(n+1)-long vector of tt. In Coq, we write this as follows (originally from here):

Inductive t A : nat -> Type :=
    | nil : t A 0
    | cons : forall (h:A) (n:nat), t A n -> t A (S n).

The nil constructor represents the empty list [][], and cons represents the operation of prepending an element (called h in the code and xx in our inference rules) to another vector of length nn, which remains unnamed in the code but is called xs\textit{xs} in our rules.

These two definitions work together quite well. For instance, suppose we have a vector of length nn. If we were to access its elements by indices starting at 0, we’d be allowed to access indices 0 through n1n-1. These are precisely the values of the finite natural numbers less than nn, Fin  n\text{Fin} \; n . Thus, given such an index Fin  n\text{Fin} \; n and a vector Vec  t  n\text{Vec} \; t \; n, we are guaranteed to be able to retrieve the element at the given index! In our code, we will not have to worry about bounds checking.

Of course, if our program has nn elements, our program counter will be a finite number less than n+1n+1, since there’s always the possibility of it pointing past the instructions, indicating that we’ve finished running the program. This leads to some minor complications: we can’t safely access the program instruction at index Fin  (n+1)\text{Fin} \; (n+1). We can solve this problem by considering two cases: either our index points one past the end of the program (in which case its value is exactly the finite representation of nn), or it’s less than nn, in which case we can “tighten” the upper bound, and convert that index into a Fin  n\text{Fin} \; n. We formalize it in a lemma:

From day8.v, lines 80 through 82
80
81
82
  Lemma fin_big_or_small : forall {n} (f : fin (S n)),
    (f = nat_to_fin n) \/ (exists (f' : fin n), f = weaken_one f').
  Proof.

There’s a little bit of a gotcha here. Instead of translating our above statement literally, and returning a value that’s the result of “tightening” our input f, we return a value f' that can be “weakened” to f. This is because “tightening” is not a total function - it’s not always possible to convert a Fin  (n+1)\text{Fin} \; (n+1) into a Fin  n\text{Fin} \; n. However, “weakening” Fin  n\text{Fin} \; n is a total function, since a number less than nn is, by the transitive property of a total order, also less than n+1n+1.

The Coq proof for this claim is as follows:

From day8.v, lines 88 through 97
88
89
90
91
92
93
94
95
96
97
    apply Fin.rectS.
    - intros n. destruct n.
      + left. reflexivity.
      + right. exists F1. auto.
    - intros n p IH.
      destruct IH.
      + left. rewrite H. reflexivity.
      + right. destruct H as [f' Heq]. 
        exists (FS f'). simpl. rewrite Heq.
        reflexivity.

The Fin.rectS function is a convenient way to perform inductive proofs over our finite natural numbers. Informally, our proof proceeds as follows:

Next, let’s talk about addition, specifically the kind of addition done by the jmp\texttt{jmp} instruction. We can always add an integer to a natural number, but we can at best guarantee that the result will be an integer. For instance, we can add -1000 to 1, and get -999, which is not a natural number. We implement this kind of addition in a function called jump_t:

From day8.v, line 56
56
  Definition jump_t {n} (pc : fin n) (off : t) : t :=

At the moment, its definition is not particularly important. What is important, though, is that it takes a bounded natural number pc (our program counter), an integer off (the offset provided by the jump instruction) and returns another integer representing the final offset. Why are integers of type t? Well, it so happens that Coq provides facilities for working with arbitrary implementations of integers, without relying on how they are implemented under the hood. This can be seen in its Coq.ZArith.Int module, which describes what functions and types an implementation of integers should provide. Among those is t, the type of an integer in such an arbitrary implementation. We too will not make an assumption about how the integers are implemented, and simply use this generic t from now on.

Now, suppose we wanted to write a function that does return a valid program counter after adding the offset to it. Since it’s possible for this function to fail (for instance, if the offset is very negative), it has to return option (fin (S n)). That is, this function may either fail (returning None) or succeed, returning Some f, where f is of type fin (S n), aka Fin  (n+1)\text{Fin} \; (n + 1). Here’s the function in Coq (again, don’t worry too much about the definition):

From day8.v, line 61
61
  Definition valid_jump_t {n} (pc : fin n) (off : t) : option (fin (S n)) := @clamp (S n) (jump_t pc off).

We will make use of this function when we define and verify our semantics. Let’s take a look at that next.

Semantics in Coq

Now that we’ve seen finite sets and vectors, it’s time to use them to encode our semantics in Coq. Before we do anything else, we need to provide Coq definitions for the various components of our language, much like what we did with tinylang. We can start with opcodes:

From day8.v, lines 20 through 23
20
21
22
23
  Inductive opcode : Type :=
    | add
    | nop
    | jmp.

Now we can define a few other parts of our language and semantics, namely states, instructions and programs (which I called “inputs” since, we’ll, they’re our puzzle input). A state is simply the 3-tuple of the program counter, the set of valid program counters, and the accumulator. We write it as follows:

From day8.v, line 33
33
  Definition state n : Type := (fin (S n) * set (fin n) * t).

The star * is used here to represent a product type rather than arithmetic multiplication. Our state type accepts an argument, n, much like a finite natural number or a vector. In fact, this n is passed on to the state’s program counter and set types. Rightly, a state for a program of length nn will not be of the same type as a state for a program of length n+1n+1.

An instruction is also a tuple, but this time containing only two elements: the opcode and the number. We write this as follows:

From day8.v, line 36
36
  Definition inst : Type := (opcode * t).

Finally, we have to define the type of a program. This type will also be indexed by n, the program’s length. A program of length n is simply a vector of instructions inst of length n. This leads to the following definition:

From day8.v, line 38
38
  Definition input (n : nat) := VectorDef.t inst n.

So far, so good! Finally, it’s time to get started on the semantics themselves. We begin with the inductive definition of (i)(\rightarrow_i). I think this is fairly straightforward. However, we do use t instead of nn from the rules, and we use FS instead of +1+1. Also, we make the formerly implicit assumption that c+nc+n is valid explicit, by providing a proof that valid_jump_t pc t = Some pc'.

From day8.v, lines 103 through 110
103
104
105
106
107
108
109
110
  Inductive step_noswap {n} : inst -> (fin n * t) -> (fin (S n) * t) -> Prop :=
    | step_noswap_add : forall pc acc t,
        step_noswap (add, t) (pc, acc) (FS pc, M.add acc t)
    | step_noswap_nop : forall pc acc t,
        step_noswap (nop, t) (pc, acc) (FS pc, acc)
    | step_noswap_jmp : forall pc pc' acc t,
        valid_jump_t pc t = Some pc' ->
        step_noswap (jmp, t) (pc, acc) (pc', acc).

Next, it will help us to combine the premises for “failed” and “ok” terminations into Coq data types. This will make it easier for us to formulate a lemma later on. Here are the definitions:

From day8.v, lines 112 through 117
112
113
114
115
116
117
  Inductive done {n} : input n -> state n -> Prop :=
    | done_prog : forall inp v acc, done inp (nat_to_fin n, v, acc).

  Inductive stuck {n} : input n -> state n -> Prop :=
    | stuck_prog : forall inp pc' v acc,
        ~ set_In pc' v -> stuck inp (weaken_one pc', v, acc).

Since all of out “termination” rules start and end in the same state, there’s no reason to write that state twice. Thus, both done and stuck only take the input inp, and the state, which includes the accumulator acc, the set of allowed program counters v, and the program counter at which the program came to an end. When the program terminates successfully, this program counter will be equal to the length of the program n, so we use nat_to_fin n. On the other hand, if the program terminates in as stuck state, it must be that it terminated at a program counter that points to an instruction. Thus, this program counter is actually a Fin  n\text{Fin} \; n, and not a Fin (n+1)\text{Fin} \ (n+1), and is not in the set of allowed program counters. We use the same “weakening” trick we saw earlier to represent this.

Finally, we encode the three inference rules we came up with:

From day8.v, lines 119 through 126
119
120
121
122
123
124
125
126
  Inductive run_noswap {n} : input n -> state n -> state n -> Prop :=
    | run_noswap_ok : forall inp st, done inp st -> run_noswap inp st st
    | run_noswap_fail : forall inp st, stuck inp st -> run_noswap inp st st
    | run_noswap_trans : forall inp pc pc' v acc acc' st',
        set_In pc v ->
        step_noswap (nth inp pc) (pc, acc) (pc', acc') ->
        run_noswap inp (pc', set_remove Fin.eq_dec pc v, acc') st' ->
        run_noswap inp (weaken_one pc, v, acc) st'.

Notice that we fused two of the premises in the last rule. Instead of naming the instruction at the current program counter (by writing p[c]=ip[c] = i) and using it in another premise, we simply use nth inp pc, which corresponds to p[c]p[c] in our “paper” semantics.

Before we go on writing some actual proofs, we have one more thing we have to address. Earlier, we said:

All jumps will lead either to an instruction, or right to the end of a program.

To make Coq aware of this constraint, we’ll have to formalize it. To start off, we’ll define the notion of a “valid instruction”, which is guaranteed to keep the program counter in the correct range. There are a couple of ways to do this, but we’ll use yet another definition based on inference rules. First, though, observe that the same instruction may be valid for one program, and invalid for another. For instance, jmp  100\texttt{jmp} \; 100 is perfectly valid for a program with thousands of instructions, but if it occurs in a program with only 3 instructions, it will certainly lead to disaster. Specifically, the validity of an instruction depends on the length of the program in which it resides, and the program counter at which it’s encountered. Thus, we refine our idea of validity to “being valid for a program of length nn at program counter ff”. For this, we can use the following two inference rules:

c:Fin  nadd  t  valid for  n,cc:Fin  no{nop,jmp}Jv(c,t)=Some  co  t  valid for  n,c \frac {c : \text{Fin} \; n} {\texttt{add} \; t \; \text{valid for} \; n, c } \quad \frac {c : \text{Fin} \; n \quad o \in \{\texttt{nop}, \texttt{jmp}\} \quad J_v(c, t) = \text{Some} \; c' } {o \; t \; \text{valid for} \; n, c }

The first rule states that if a program has length nn, then add\texttt{add} is valid at any program counter whose value is less than nn. This is because running add\texttt{add} will increment the program counter cc by 1, and thus, create a new program counter that’s less than n+1n+1, which, as we discussed above, is perfectly valid.

The second rule works for the other two instructions. It has an extra premise: the result of jump_valid_t (written as JvJ_v) has to be Some  c\text{Some} \; c', that is, jump_valid_t must succeed. Note that we require this even for no-ops, since it later turns out that one of the them may be a jump after all.

We now have our validity rules. If an instruction satisfies them for a given program and at a given program counter, evaluating it will always result in a program counter that has a proper value. We encode the rules in Coq as follows:

From day8.v, lines 152 through 157
152
153
154
155
156
157
  Inductive valid_inst {n} : inst -> fin n -> Prop :=
    | valid_inst_add : forall t f, valid_inst (add, t) f
    | valid_inst_nop : forall t f f',
        valid_jump_t f t = Some f' -> valid_inst (nop, t) f
    | valid_inst_jmp : forall t f f',
        valid_jump_t f t = Some f' -> valid_inst (jmp, t) f.

Note that we have three rules instead of two. This is because we “unfolded” oo from our second rule: rather than using set notation (or “or”), we just generated two rules that vary in nothing but the operation involved.

Of course, we must have that every instruction in a program is valid. We don’t really need inference rules for this, as much as a “forall” quantifier. A program pp of length nn is valid if the following holds:

(c:Fin  n).p[c]  valid for  n,c \forall (c : \text{Fin} \; n). p[c] \; \text{valid for} \; n, c

That is, for every possible in-bounds program counter cc, the instruction at the program counter is valid. We can now encode this in Coq, too:

From day8.v, lines 160 through 161
160
161
  Definition valid_input {n} (inp : input n) : Prop := forall (pc : fin n),
    valid_inst (nth inp pc) pc.

In the above, n is made implicit where possible. Since cc (called pc in the code) is of type Fin  n\text{Fin} \; n, there’s no need to write nn again. The curly braces tell Coq to infer that argument where possible.

Proving Termination

Here we go! It’s finally time to make some claims about our definitions. Who knows - maybe we wrote down total garbage! We will be creating several related lemmas and theorems. All of them share two common assumptions:

We represent these grouped assumptions by opening a Coq Section, which we call ValidInput, and listing our assumptions:

From day8.v, lines 163 through 166
163
164
165
166
  Section ValidInput.
    Variable n : nat.
    Variable inp : input n.
    Hypothesis Hv : valid_input inp.

We had to also explicitly mention the length n of our program. From now on, the variables n, inp, and Hv will be available to all of the proofs we write in this section. The first proof is rather simple. The claim is:

For our valid program, at any program counter pc and accumulator acc, there must exist another program counter pc' and accumulator acc' such that the instruction evaluation relation (i)(\rightarrow_i) connects the two. That is, valid addresses aside, we can always make a step.

Here is this claim encoded in Coq:

From day8.v, lines 168 through 169
168
169
    Theorem valid_input_can_step : forall pc acc, exists pc' acc',
      step_noswap (nth inp pc) (pc, acc) (pc', acc').

We start our proof by introducing all the relevant variables into the global context. I’ve mentioned this when I wrote about day 1, but here’s the gist: the intros keyword takes variables from a forall, and makes them concrete. In short, intros x is very much like saying “suppose we have an x”, and going on with the proof.

From day8.v, lines 170 through 171
170
171
    Proof.
      intros pc acc.

Here, we said “take any program counter pc and any accumulator acc”. Now what? Well, first of all, we want to take a look at the instruction at the current pc. We know that this instruction is a combination of an opcode and a number, so we use destruct to get access to both of these parts:

From day8.v, line 172
172
      destruct (nth inp pc) eqn:Hop.

Now, Coq reports the following proof state:

1 subgoal

n : nat
inp : input n
Hv : valid_input inp
pc : Fin.t n
acc : t
o : opcode
t0 : t
Hop : nth inp pc = (o, t0)

========================= (1 / 1)

exists (pc' : fin (S n)) (acc' : t),
  step_noswap (o, t0) (pc, acc) (pc', acc')

We have some opcode o, and some associated number t0, and we must show that there exist a pc' and acc' to which we can move on. To prove that something exists in Coq, we must provide an instance of that “something”. If we claim that there exists a dog that’s not a good boy, we better have this elusive creature in hand. In other words, proofs in Coq are constructive. Without knowing the kind of operation we’re dealing with, we can’t say for sure how the step will proceed. Thus, we proceed by case analysis on o.

From day8.v, line 173
173
      destruct o.

There are three possible cases we have to consider, one for each type of instruction.

This is how these three cases are translated to Coq:

From day8.v, lines 174 through 177
174
175
176
177
      - exists (FS pc). exists (M.add acc t0). apply step_noswap_add. 
      - exists (FS pc). exists acc. eapply step_noswap_nop.
      - specialize (Hv pc). rewrite Hop in Hv. inversion Hv; subst.
        exists f'. exists acc. eapply step_noswap_jmp. apply H0.

For the first two cases, we simply provide the values we expect for pc' and acc', and apply the corresponding inference rule that is satisfied by these values. For the third case, we have to invoke Hv, the hypothesis that our input is valid. In particular, we care about the instruction at pc, so we use specialize to plug pc into the more general hypothesis. We then replace nth inp pc with its known value, (jmp, t0). This tells us the following, in Coq’s words:

Hv : valid_inst (jmp, t0) pc

That is, (jmp, t0) is a valid instruction at pc. Then, using Coq’s inversion tactic, we ask: how is this possible? There is only one inference rule that gives us such a conclusion, and it is named valid_inst_jmp in our Coq code. Since we have a proof that our jmp is valid, it must mean that this rule was used. Furthermore, since this rule requires that valid_jump_t evaluates to Some f', we know that this must be the case here! Coq now has adds the following two lines to our proof state:

f' : fin (S n)
H0 : valid_jump_t pc t0 = Some f'

Finally, we specify, as mentioned earlier, that pc' = f' and acc' = acc. As before, we apply the corresponding step rule for jmp. When it asks for a proof that valid_jump_t produces a valid program counter, we hand it H0 using apply H0. And with that, Coq is happy!

Next, we prove a claim that a valid program can always do something, and that something is one of three things:

Alternatively, we could say that one of the inference rules for (p)(\Rightarrow_p) must apply. This is not the case if the input is not valid, since, as I said before, an arbitrary input program can lead us to jump to a negative address (or to an address way past the end of the program). Here’s the claim, translated to Coq:

From day8.v, lines 181 through 186
181
182
183
184
185
186
    Theorem valid_input_progress : forall pc v acc,
      (pc = nat_to_fin n /\ done inp (pc, v, acc)) \/
      (exists pcs, pc = weaken_one pcs /\
        ((~ set_In pcs v /\ stuck inp (pc, v, acc)) \/
         (exists pc' acc', set_In pcs v /\
         step_noswap (nth inp pcs) (pcs, acc) (pc', acc')))).

Informally, we can prove this as follows:

Below is the Coq translation of the above.

From day8.v, lines 187 through 203
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
    Proof.
      intros pc v acc.
      (* Have we reached the end? *)
      destruct (fin_big_or_small pc).
      (* We're at the end, so we're done. *)
      left. rewrite H. split. reflexivity. apply done_prog.
      (* We're not at the end. *)
      right. destruct H as [pcs H]. exists pcs. rewrite H. split. reflexivity.
      (* We're not at the end. Is the PC valid? *)
      destruct (set_In_dec Fin.eq_dec pcs v).
      - (* It is. *)
        right.
        destruct (valid_input_can_step pcs acc) as [pc' [acc' Hstep]].
        exists pc'; exists acc'; auto.
      - (* It is not. *)
        left. split; auto. apply stuck_prog; auto.
    Qed.

It doesn’t seem like we’re that far from being done now. A program can always take a step, and each time it does, the set of valid program counters decreases in size. Eventually, this set will become empty, so if nothing else, our program will eventually terminate in an “error” state. Thus, it will stop running no matter what.

This seems like a task for induction, in this case on the size of the valid set. In particular, strong mathematical induction [note: Why strong induction? If we remove a single element from a set, its size should decrease strictly by 1. Thus, why would we need to care about sets of all sizes less than the current set's size?

Unfortunately, we're not working with purely mathematical sets. Coq's default facility for sets is simply a layer on top of good old lists, and makes no effort to be "correct by construction". It is thus perfectly possible to have a "set" which inlcudes an element twice. Depending on the implementation of set_remove, we may end up removing the repeated element multiple times, thereby shrinking the length of our list by more than 1. I'd rather not worry about implementation details like that. ]
Someone on StackOverflow implemented this, so I’ll just use it. The Coq theorem corresonding to strong induction on the length of a list is as follows:

From day8.v, lines 205 through 207
205
206
207
    Theorem list_length_induction {X : Type} (P : list X -> Prop) :
      (forall l, (forall l', length l' < length l -> P l') -> P l) ->
      forall l, P l.

It reads,

If for some list l, the property P holding for all lists shorter than l means that it also holds for l itself, then P holds for all lists.

This is perhaps not particularly elucidating. We can alternatively think of this as trying to prove some property for all lists l. We start with all empty lists. Here, we have nothing else to rely on; there are no lists shorter than the empty list, and our property must hold for all empty lists. Then, we move on to proving the property for all lists of length 1, already knowing that it holds for all empty lists. Once we’re done there, we move on to proving that P holds for all lists of length 2, now knowing that it holds for all empty lists and all lists of length 1. We continue doing this, eventually covering lists of any length.

Before proving termination, there’s one last thing we have to take care off. Coq’s standard library does not come with a proof that removing an element from a set makes it smaller; we have to provide it ourselves. Here’s the claim encoded in Coq:

From day8.v, lines 217 through 219
217
218
219
    Theorem set_remove_length : forall (f : fin n) (s : set (fin n)),
      set_In f s ->
      length (set_remove Fin.eq_dec f s) < length s.

This reads, “if a set s contains a finite natural number f, removing f from s reduces the set’s size”. The details of the proof are not particularly interesting, and I hope that you understand intuitively why this is true. Finally, we make our termination claim.

From day8.v, lines 230 through 231
230
231
    Theorem valid_input_terminates : forall (pc : fin (S n)) (v : set (fin n)) (acc : t),
      (exists pc', run_noswap inp (pc, v, acc) pc').

It’s quite a strong claim - given any program counter, set of valid addresses, and accumulator, a valid input program will terminate. Let’s take a look at the proof.

From day8.v, lines 232 through 234
232
233
234
    Proof.
      intros pc v. generalize dependent pc.
      induction v using list_length_induction.

We use intros again. However, it brings in variables in order, and we really only care about the second variable. We thus intros the first two, and then “put back” the first one using generalize dependent. Then, we proceed by induction on length, as seen above.

From day8.v, lines 235 through 236
235
236
      intros pc acc.
      destruct (valid_input_progress pc l acc) as [[_ Hd]|[pc' [Hw [[_ Hst]|[pc'' [acc'' [Hin Hst]]]]]]].

Now we’re in the “inductive step”. Our inductive hypothesis is that any set of valid addresses smaller than the current one will guarantee that the program will terminate. We must show that using our set, too, will guarantee termination. We already know that a valid input, given a state, can have one of three possible outcomes: “ok” termination, “failed” termination, or a “step”. We use destruct to take a look at each of these in turn. The first two cases (“ok” termination and “failed” termination) are fairly trivial:

From day8.v, lines 237 through 240
237
238
239
240
      - (* We're done. *)
        eexists. apply run_noswap_ok. assumption.
      - (* We're stuck. *)
        eexists. apply run_noswap_fail. assumption.

We basically connect the dots between the premises (in a form like done) and the corresponding inference rule (run_noswap_ok). The more interesting case is when we can take a step.

From day8.v, lines 241 through 253
241
242
243
244
245
246
247
248
249
250
251
252
253
      - (* We can make a step. This will remove our current PC from the valid list, *)
        edestruct (H (set_remove Fin.eq_dec pc' l)).
        (* Since the PC must be in the list, removing it makes the list smaller. *)
        apply (set_remove_length _ _ Hin).
        (* Without the current PC, our valid set shrinks.
           Since this is the inductive step, we have assumed
           that programs with smaller sets of valid PCs always
           terminate. Thus, after we make the step, we're done. *)
        exists x. subst. eapply run_noswap_trans.
        + auto.
        + apply Hst.
        + apply H0.
    Qed.

Since we know we can take a step, we know that we’ll be removing the current program counter from the set of valid addresses. This set must currently contain the present program counter (since otherwise we’d have “failed”), and thus will shrink when we remove it. This, in turn, lets us use the inductive hypothesis: it tells us that no matter the program counter or accumulator, if we start with this new “shrunk” set, we will terminate in some state. Coq’s constructive nature helps us here: it doesn’t just tells us that there is some state in which we terminate - it gives us that state! We use edestruct to get a handle on this final state, which Coq automatically names x. At this time Coq still isn’t convinced that our new set is smaller, so we invoke our earlier set_remove_length theorem to placate it.

We now have all the pieces: we know that we can take a step, removing the current program counter from our current set. We also know that with that newly shrunken set, we’ll terminate in some final state x. Thus, all that’s left to say is to apply our “step” rule. It asks us for three things:

  1. That the current program counter is in the set. We’ve long since established this, and auto takes care of that.
  2. That a step is possible. We’ve already established this, too, since we’re in the “can take a step” case. We apply Hst, the hypothesis that confirms that we can, indeed, step.
  3. That we terminate after this. The x we got from our induction hypothesis came with a proof that running with the “next” program counter and accumulator will result in termination. We apply this proof, automatically named H0 by Coq.

And that’s it! We’ve proved that a program terminates no matter what. This has also (almost!) given us a solution to part 1. Consider the case in which we start with program counter 0, accumulator 0, and the “full” set of allowed program counters. Since our proof works for all configurations, it will also work for this one. Furthermore, since Coq proofs are constructive, this proof will return to us the final program counter and accumulator! This is precisely what we’d need to solve part 1.

But wait, almost? What’s missing? We’re missing a few implementation details:

Well, I seem to have covered all the implementation details. Why not just go ahead and solve the problem? I tried, and ran into two issues:

So, we “theoretically” have a solution to part 1, down to the algorithm used to compute it and a proof that our algorithm works. In “reality”, though, we can’t actually use this solution to procure an answer. Like we did with day 1, we’ll have to settle for only a proof.

Let’s wrap up for this post. It would be more interesting to devise and formally verify an algorithm for part 2, but this post has already gotten quite long and contains a lot of information. Perhaps I will revisit this at a later time. Thanks for reading!