The first puzzle of this year’s Advent of Code was quite simple, which gave me a thought: “Hey, this feels within reach for me to formally verify!” At first, I wanted to formalize and prove the correctness of the twopointer solution. However, I didn’t have the time to mess around with the various properties of sorted lists and their traversals. So, I settled for the brute force solution. Despite the simplicity of its implementation, there is plenty to talk about when proving its correctness using Coq. Let’s get right into it!
Before we start, in the interest of keeping the post selfcontained, here’s the (paraphrased) problem statement:
Given an unsorted list of numbers, find two distinct numbers that add up to 2020.
With this in mind, we can move on to writing some Coq!
Defining the Functions
The first step to proving our code correct is to actually write the code! To start with,
let’s write a helper function that, given a number x
, tries to find another number
y
such that x + y = 2020
. In fact, rather than hardcoding the desired
sum to 2020
, let’s just use another argument called total
. The code is quite simple:


Here, is
is the list of numbers that we want to search.
We proceed by case analysis: if the list is empty, we can’t
find a match, so we return None
(the Coq equivalent of Haskell’s Nothing
).
On the other hand, if the list has at least one element y
, we see if it adds
up to total
, and return Some y
(equivalent to Just y
in Haskell) if it does.
If it doesn’t, we continue our search into the rest of the list.
It’s somewhat unusual, in my experience, to put the list argument first when writing
functions in a language with currying. However,
it seems as though Coq’s simpl
tactic, which we will use later, works better
for our purposes when the argument being case analyzed is given first.
We can now use find_matching
to define our find_sum
function, which solves part 1.
Here’s the code:


For every x
that we encounter in our input list is
, we want to check if there’s
a matching number in the rest of the list. We only search the remainder of the list
because we can’t use x
twice: the x
and y
we return that add up to total
must be different elements. We use find_matching
to try find a complementary number
for x
. If we don’t find it, this x
isn’t it, so we recursively move on to xs
.
On the other hand, if we do find a matching y
, we’re done! We return (x,y)
,
wrapped in Some
to indicate that we found something useful.
What about that (* Was buggy! *)
line? Well, it so happens that my initial
implementation had a bug on this line, one that came up as I was proving
the correctness of my function. When I wasn’t able to prove a particular
behavior in one of the cases, I realized something was wrong. In short,
my proof actually helped me find and fix a bug!
This is all the code we’ll need to get our solution. Next, let’s talk about some properties of our two functions.
Our First Lemma
When we call find_matching
, we want to be sure that if we get a number,
it does indeed add up to our expected total. We can state it a little bit more
formally as follows:
For any numbers
k
andx
, and for any list of numberis
, iffind_matching is k x
returns a numbery
, thenx + y = k
.
And this is how we write it in Coq:


The arrow, >
, reads “implies”. Other than that, I think this
property reads pretty well. The proof, unfortunately, is a little bit more involved.
Here are the first few lines:


We start with the intros is
tactic, which is akin to saying
“consider a particular list of integers is
”. We do this without losing
generality: by simply examining a concrete list, we’ve said nothing about
what that list is like. We then proceed by induction on is
.
To prove something by induction for a list, we need to prove two things:
 The base case. Whatever property we want to hold, it must
hold for the empty list, which is the simplest possible list.
In our case, this means
find_matching
searching an empty list.  The inductive case. Assuming that a property holds for any list
[b, c, ...]
, we want to show that the property also holds for the list[a, b, c, ...]
. That is, the property must remain true if we prepend an element to a list for which this property holds.
These two things combined give us a proof for all lists, which is exactly
what we want! If you don’t belive me, here’s how it works. Suppose you want
to prove that some property P
holds for [1,2,3,4]
. Given the base
case, we know that P []
holds. Next, by the inductive case, since
P []
holds, we can prepend 4
to the list, and the property will
still hold. Thus, P [4]
. Now that P [4]
holds, we can again prepend
an element to the list, this time a 3
, and conclude that P [3,4]
.
Repeating this twice more, we arrive at our desired fact: P [1,2,3,4]
.
When we write induction is
, Coq will generate two proof goals for us,
one for the base case, and one for the inductive case. We will have to prove
each of them separately. Since we have
not yet introduced the variables k
, x
, and y
, they remain
inside a forall
quantifier at that time. To be able to refer
to them, we want to use intros
. We want to do this in both the
base and the inductive case. To quickly do this, we use Coq’s ;
operator. When we write a; b
, Coq runs the tactic a
, and then
runs the tactic b
in every proof goal generated by a
. This is
exactly what we want.
There’s one more variable inside our second intros
: Hev
.
This variable refers to the hypothesis of our statement:
that is, the part on the left of the >
. To prove that A
implies B
, we assume that A
holds, and try to argue B
from there.
Here is no different: when we use intros Hev
, we say, “suppose that you have
a proof that find_matching
evaluates to Some y
, called Hev
”. The thing
on the right of >
becomes our proof goal.
Now, it’s time to look at the cases. To focus on one case at a time,
we use 
. The first case is our base case. Here’s what Coq prints
out at this time:
k, x, y : nat
Hev : find_matching nil k x = Some y
========================= (1 / 1)
x + y = k
All the stuff above the ===
line are our hypotheses. We know
that we have some k
, x
, and y
, all of which are numbers.
We also have the assumption that find_matching
returns Some y
.
In the base case, is
is just []
, and this is reflected in the
type for Hev
. To make this more clear, we’ll simplify the call to find_matching
in Hev
, using simpl in Hev
. Now, here’s what Coq has to say about Hev
:
Hev : None = Some y
Well, this doesn’t make any sense. How can something be equal to nothing?
We ask Coq this question using inversion Hev
. Effectively, the question
that inversion
asks is: what are the possible ways we could have acquired Hev
?
Coq generates a proof goal for each of these possible ways. Alas, there are
no ways to arrive at this contradictory assumption: the number of proof subgoals
is zero. This means we’re done with the base case!
The inductive case is the meat of this proof. Here’s the corresponding part of the Coq source file:


This time, the proof state is more complicated:
a : nat
is : list nat
IHis : forall k x y : nat, find_matching is k x = Some y > x + y = k
k, x, y : nat
Hev : find_matching (a :: is) k x = Some y
========================= (1 / 1)
x + y = k
Following the footsteps of our informal description of the inductive case,
Coq has us prove our property for (a :: is)
, or the list is
to which
a
is being prepended. Like before, we assume that our property holds for is
.
This is represented in the induction hypothesis IHis
. It states that if
find_matching
finds a y
in is
, it must add up to k
. However, IHis
doesn’t tell us anything about a :: is
: that’s our job. We also still have
Hev
, which is our assumption that find_matching
finds a y
in (a :: is)
.
Running simpl in Hev
gives us the following:
Hev : (if x + a =? k then Some a else find_matching is k x) = Some y
The result of find_matching
now depends on whether or not the new element a
adds up to k
. If it does, then find_matching
will return a
, which means
that y
is the same as a
. If not, it must be that find_matching
finds
the y
in the rest of the list, is
. We’re not sure which of the possibilities
is the case. Fortunately, we don’t need to be!
If we can prove that the y
that find_matching
finds is correct regardless
of whether a
adds up to k
or not, we’re good to go! To do this,
we perform case analysis using destruct
.
Our particular use of destruct
says: check any possible value for x + a ?= k
,
and create an equation Heq
that tells us what that value is. ?=
returns a boolean
value, and so destruct
generates two new goals: one where the function returns true
,
and one where it returns false
. We start with the former. Here’s the proof state:
a : nat
is : list nat
IHis : forall k x y : nat, find_matching is k x = Some y > x + y = k
k, x, y : nat
Heq : (x + a =? k) = true
Hev : Some a = Some y
========================= (1 / 1)
x + y = k
There is a new hypothesis: Heq
. It tells us that we’re currently
considering the case where ?=
evaluates to true
. Also,
Hev
has been considerably simplified: now that we know the condition
of the if
expression, we can just replace it with the then
branch.
Looking at Hev
, we can see that our prediction was right: a
is equal to y
. After all,
if they weren’t, Some a
wouldn’t equal to Some y
. To make Coq
take this information into account, we use injection
. This will create
a new hypothesis, a = y
. But if one is equal to the other, why don’t we
just use only one of these variables everywhere? We do exactly that by using
subst
, which replaces a
with y
everywhere in our proof.
The proof state is now:
is : list nat
IHis : forall k x y : nat, find_matching is k x = Some y > x + y = k
k, x, y : nat
Heq : (x + y =? k) = true
========================= (1 / 1)
x + y = k
We’re close, but there’s one more detail to keep in mind. Our goal, x + y = k
,
is the proposition that x + y
is equal to k
. However, Heq
tells us
that the function ?=
evaluates to true
. These are fundamentally different.
One talks about mathematical equality, while the other about some function ?=
defined somewhere in Coq’s standard library. Who knows  maybe there’s a bug in
Coq’s implementation! Fortunately, Coq comes with a proof that if two numbers
are equal according to ?=
, they are mathematically equal. This proof is
called eqb_nat_eq
. We tell Coq to use this with apply
. Our proof goal changes to:
true = (x + y =? k)
This is almost like Heq
, but flipped. Instead of manually flipping it and using apply
with Heq
, I let Coq do the rest of the work using auto
.
Phew! All this for the true
case of ?=
. Next, what happens if x + a
does not equal k
?
Here’s the proof state at this time:
a : nat
is : list nat
IHis : forall k x y : nat, find_matching is k x = Some y > x + y = k
k, x, y : nat
Heq : (x + a =? k) = false
Hev : find_matching is k x = Some y
========================= (1 / 1)
x + y = k
Since a
was not what it was looking for, find_matching
moved on to is
. But hey,
we’re in the inductive case! We are assuming that find_matching
will work properly
with the list is
. Since find_matching
found its y
in is
, this should be all we need!
We use our induction hypothesis IHis
with apply
. IHis
itself does not know that
find_matching
moved on to is
, so it asks us to prove it. Fortunately, Hev
tells us
exactly that, so we use assumption
, and the proof is complete! Quod erat demonstrandum, QED!
The Rest of the Owl
Here are a couple of other properties of find_matching
. For brevity’s sake, I will
not go through their proofs stepbystep. I find that the best way to understand
Coq proofs is to actually step through them in the IDE!
First on the list is find_matching_skip
. Here’s the type:


It reads: if we correctly find a number in a small list is
, we can find that same number
even if another number is prepended to is
. That makes sense: adding a number to
a list doesn’t remove whatever we found in it! I used this lemma to prove another,
find_matching_works
:


This reads, if there is an element y
in is
that adds up to k
with x
, then
find_matching
will find it. This is an important property. After all, if it didn’t
hold, it would mean that find_matching
would occasionally fail to find a matching
number, even though it’s there! We can’t have that.
Finally, we want to specify what it means for find_sum
, our solution function, to actually
work. The naive definition would be:
Given a list of integers,
find_sum
always finds a pair of numbers that add up tok
.
Unfortunately, this is not true. What if, for instance, we give find_sum
an empty list?
There are no numbers from that list to find and add together. Even a nonempty list
may not include such a pair! We need a way to characterize valid input lists. I claim
that all lists from this Advent of Code puzzle are guaranteed to have two numbers that
add up to our goal, and that these numbers are not equal to each other. In Coq,
we state this as follows:


This defines a new property, has_pair t is
(read “is
has a pair of numbers that add to t
”),
which means:
There are two numbers
n1
andn2
such that, they are not equal to each other (n1<>n2
) and the numbern1
is an element ofis
(In n1 is
) and the numbern2
is an element ofis
(In n2 is
) and the two numbers add up tot
(n1 + n2 = t
).
When making claims about the correctness of our algorithm, we will assume that this property holds. Finally, here’s the theorem we want to prove:


It reads, “for any total k
and list is
, if is
has a pair of numbers that add to k
,
then find_sum
will return a pair of numbers x
and y
that add to k
”.
There’s some nuance here. We hardly reference the has_pair
property in this definition,
and for good reason. Our has_pair
hypothesis only says that there is at least one
pair of numbers in is
that meets our criteria. However, this pair need not be the only
one, nor does it need to be the one returned by find_sum
! However, if we have many pairs,
we want to confirm that find_sum
will find one of them. Finally, here is the proof.
I will not be able to go through it in detail in this post, but I did comment it to
make it easier to read:


Coq seems happy with it, and so am I! The bug I mentioned earlier popped up on line 96.
I had accidentally made find_sum
return None
if it couldn’t find a complement
for the x
it encountered. This meant that it never recursed into the remaining
list xs
, and thus, the pair was never found at all! It this became impossible
to prove that find_some
will return Some y
, and I had to double back
and check my definitions.
I hope you enjoyed this post! If you’re interested to learn more about Coq, I strongly recommend checking out Software Foundations, a series of books on Coq written as comments in a Coq source file! In particular, check out Logical Foundations for an introduction to using Coq. Thanks for reading!