It rained in Sunriver on New Year’s Eve, and it continued to rain for the next couple of days. So, instead of going skiing as planned, to the dismay of my family and friends, I spent the majority of those days working on the third language for homework 3. It was quite the language, too - the homework has three problems, each of which has a solution independent of the others. I invite you to join me in my descent into madness as we construct another language.
Let’s take a look at the three homework problems. The first two are related, but are solved using a different technique:
This problem requires us to find the
k numbers closest to some
query (which I will call
n) from a list
xs. The list isn’t sorted, and the
problem must run in linear time. Sorting the list would require
The in this expression is not the same as the query
but rather the length of the list. In fact, I have not yet assigned
the length of the input
xs to any variable. If we say that
is a number that denotes that length, the proper expression
for the complexity is .
Thus, we have to take another route, which should
already be familiar: quickselect. Using quickselect, we can find the
closest number, and then collect all the numbers that are closer than the
closest number. So, we need a language that:
- Supports quickselect (and thus, list partitioning and recursion).
- Supports iteration,
Why would we need to iterate multiple times? Note that we could have a list
of numbers that are all the same,
[1,1,1,1,1]. Then, we'll need to know how many of the numbers equally close as the
kth element we need to include, which will require another pass through the list. ]
That’s a good start. Let’s take a look at the second problem:
This problem really is easier. We have to find the position of the closest element, and then try expand towards either the left or right, depending on which end is better. This expansion will take several steps, and will likely require a way to “look” at a given part of the list. So let’s add two more rules. We need a language that also:
- Supports looping control flow, such as
- [note: We could, of course, simply use list indexing. But then, we'd just be making a simple imperative language, and that's boring. So let's play around with our design a little, and experimentally add such a "list view" component. ] (like an abstraction over indexing).
This is shaping up to be a fun language. Let’s take a look at the last problem:
This problem requires more iterations of a list. We have several [note: I always make the language before I write the post, since a lot of design decisions change mid-implementation. I realize now that "cursors" would've been a better name for this language feature, but alas, it is too late. ] looking into the list, and depending if the values at each of the cursors add up, we do or do not add a new tuple to a list. So, two more requirements:
- The “cursors” must be able to interact.
- The language can represent [note: We could, of course, hack some other way to return a list of tuples, but it turns out tuples are pretty simple to implement, and help make for nicer programming in our language. ]
I think we’ve gathered what we want from the homework. Let’s move on to the language!
As is now usual, let’s envision a solution to the problems in our language. There
are actually quite a lot of functions to look at, so let’s see them one by one.
First, let’s look at
After the early return, the first interesting part of the language is the use of what I have decided to call a list traverser. The list traverser is a generalization of a list index. Whenever we use a list index variable, we generally use the following operations:
- Initialize: we set the list index to some initial value, such as 0.
- Step: If we’re walking the list from left to right, we increment the index. If we’re walking the list from right to left, we decrement the index.
- Validity Check: We check if the index is still valid (that is, we haven’t gone past the edge of the list).
- Access: Get the element the cursor is pointing to.
A fun fact is that we've just rediscovered C++
containers and their iterators provide us with the operations I described:
We can initialize an iterator like
auto it = list.begin(). We
can step the iterator using
it++. We can check its validity
it != list.end(), and access what it's pointing to using
*it. While C++ uses templates and inheritance for this,
we define a language feature specifically for lists.
describes these operations. The declartion for the
traverser creates a “cursor” over the list
xs, that goes between the 0th
and last elements of
xs. The declaration for the
pivot traverser creates
a “cursor” over the list
xs that jumps around random locations in the list.
The next interesting part of the language is a traverser macro. This thing,
that looks like a function call (but isn’t), performs an operation on the
cursor. For instance,
pop! removes the element at the cursor from the list,
bisect! categorizes the remaining elements in the cursor’s list
into two lists, using a boolean-returning lambda (written in Java syntax).
Note that this implementation of
qselect takes a function
c, which it
uses to judge the actual value of the number. This is because our
won’t be finding the smallest number, but the number with the smallest difference
n will be factored in via the function.
Next up, let’s take a look at the function that uses
Like we discussed, it finds the
kth closest element (calling it
and counts how many elements that are equal need to be included,
by setting the number to
k at first, and subtracting 1 for every number
it encounters that’s closer than
min. Notice that we use the
step! macros, which implement the operations we described above. Notice
that the user doesn’t deal with adding and subtracting numbers, and doing
comparisons. All they have to do is ask “am I still good to iterate?”
Next, let’s take a look at
closestSorted, which will require more
The first new macro is
canstep!. This macro just verifies that
the traverser can make another step. We need this for the “reverse” iterator,
which indicates the lower bound of the range of numbers we want to return,
subset! (which itself is just Python’s slice, like
xs[a:b]), uses an inclusive bottom
index, and thus, we can’t afford to step it before knowing that we can, and that
it’s a better choice after the step.
Similarly, we have the
at!(t, i) macro, which looks at the
t, with offset
We have two loops. The first loop runs as long as we can expand the range in both directions, and picks the better direction at each iteration. The second loop runs as long as we still want more numbers, but have already hit the edge of the list on the left or on the right.
Finally, let’s look at the solution to
I won’t go in depth, but notice that the expression in the
traverser declaration can access another traverser. We treat
as a feature the fact that this expression isn’t immediately evaluated at the place
of the traverser declaration. Rather, every time that a comparison for a traverser
operation is performed, this expression is re-evaluated. This allows us to put
dynamic bounds on traversers
z, one of which must not exceed the other.
Note also a new keyword that was just used:
sorted. This is a harmless little
language feature that automatically calls
.sort() on the first argument of
This is more than enough to work with. Let’s move on to the implementation.
Again, let’s not go too far into the details of implementing the language from scratch. Instead, let’s take a look into specific parts of the language that deserve attention.
Revenge of the State Monad
Our previous language was, indeed, a respite from complexity. Translation was straightforward, and the resulting expressions and statements were plugged straight into a handwritten AST. We cannot get away with this here; the language is powerful enough to implement three list-based problems, which comes at the cost of increased complexity.
We need, once again, to generate temporary variables. We also need to keep track of
which variables are traversers, and the properties of these traversers, throughout
each function of the language. We thus fall back to using
There’s one part of the state tuple that we haven’t yet explained: the list of statements.
Recall that our translation function for expressions in the first homework had the type:
translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
We then had to use
do-notation, and explicitly concatenate lists
of emitted statements. In this language, I took an alternative route: I made
the statements part of the state. They are thus implicitly generated and
stored in the monad, and expression generators don’t have to worry about
concatenating them. When the program is ready to use the generated statements
(say, when an
if-statement needs to use the statements emitted by the condition
expression), we retrieve them from the monad:
I should note, for transparency, that there’s a bug in my use of this function.
When I compile
if-statements, I accidentally place statements generated by
the condition into the body of the
if. This bug doesn’t manifest
in the solutions to the homework problems, and so I decided not to spend any more
time on fixing it.
Validating Traverser Declarations
We declare two separate types that hold traverser data. The first is a kind of “draft”
This record holds all possible configurations of a traverser
that occur as the program is iterating through the various
key: value pairs in
the declaration. For instance, at the very beginning of processing a traverser declaration,
our program will use a “default”
TraverserData, with all fields set to
their default value. This value will then be modified by the first key/value pair,
changing, for instance, the list that the traverser operates on. This new modified
TraverserData will then be modified by the next key/value pair, and so on. Doing
this with every key/value pair (called an option in the below snippet)
is effectively a foldl operation.
The data may not have all the required fields until the very end, and its type
Maybe String here,
Maybe TraverserBounds there. We don’t
want to deal with unwrapping the
Maybe a values every time we use the traverser,
especially if we’ve done so before. So, we define a
that does not have
Maybe arguments, and thus, has all the required data. At the
end of a traverser declaration, we attempt to translate a
fail if we can’t, and storing the
into the state otherwise:
Then, every time we retrieve a traverser from the state, define a lookup monadic operation like this:
I didn’t call them macros for no reason. Clearly, we don’t want to generate
In fact, there's no easy way to do this at all. Python's integers (if we choose to
represent our traversers using integers), are immutable. Furthermore, unlike C++,
where passing by reference allows a function to change its parameters "outside"
the call, Python offers no way to reassign a different value to a variable given
to a function.
For an example use of C++'s pass-by-reference mechanic, consider
it's a function, but it modifies the two variables given to it. There's no
way to generically implement such a function in Python.
We also can’t allow arbitrary expressions to serve as traversers:
our translator keeps some context about which variables are traversers, what their
bounds are, and how they behave. Thus, calls to traverser macros are very much macros:
they operate on AST nodes, and require that their first argument is a variable,
named like the traverser. We use the
requireTraverser monadic operation
to get the traverser associated with the given variable name, and then perform
the operation as intended. The
at!(t) operation is straightforward:
at!(t,i) is less so, since it deals with the intricacies of accessing
the list at either a positive of negative offset, depending on the direction
of the traverser. We implement a function to properly generate an expression for the offset:
We then implement
at!(t,i) as follows:
The most complicated macro is
bisect!. It must be able to step the traverser,
and also return a tuple of two lists that the bisection yields. We also
prefer that it didn’t pollute the environment with extra variables. To
achieve this, we want
bisect! to be a function call. We want this
function to implement the iteration and list construction.
bisect!, by definition, takes a lambda. This lambda, in our language, is declared
in the lexical scope in which
bisect! is called. Thus, to guarantee correct translation,
we must do one of two things:
- Translate 1-to-1, and create a lambda, passing it to a fixed
bisectfunction declared elsewhere.
- Translate to a nested function declaration, [note: Inlining, in this case, means replacing a call to a function with the function's body. We do this to prevent the overhead of calling a function, which typically involves pushing on a stack and other extraneous work. If our function is simple, like a simple comparison, it doesn't make sense to spend the effort calling it. ]
Since I quite like the idea of inlining a lambda, let’s settle for that. To do this,
we pull a fresh temporary variable and declare a function, into which we place
the traverser iteration code, as well as the body of the lambda, with the variable
substituted for the list access expression.
Reading the lexical scope is one thing, but modifying it is another. To prevent
accidental changes to the variables outside a nested function, Python assumes
that variables assigned inside the function body are local to the function. Thus, to make
sure changing our variable (the traverser index) has an effect outside the function
(as it should) we must include the
nonlocal keyword, telling
Python that we're not declaring a new, local variable, but mutating the old one.
Let’s see what the compiler spits out:
from bisect import bisect import random def qselect(xs,k,c): if xs==: return 0 bisector = 0 pivot = random.randrange(len(xs)) pivotE = xs.pop(pivot) def temp1(): nonlocal bisector l =  r =  while bisector<len(xs): if c(xs[bisector])<c(pivotE): l.append(xs[bisector]) else: r.append(xs[bisector]) bisector = bisector+1 return (l, r) (leftList,rightList) = temp1() if k>len(leftList)+1: return qselect(rightList, k-len(leftList)-1, c) elif k==len(leftList)+1: return pivotE else: return qselect(leftList, k, c) def closestUnsorted(xs,k,n): min = qselect(list(xs), k, (lambda x: abs(x-n))) out =  countEqual = k iter = 0 while iter<len(xs): if abs(xs[iter]-n)<abs(min-n): countEqual = countEqual-1 iter = iter+1 0 iter = 0 while iter<len(xs): if abs(xs[iter]-n)==abs(min-n) and countEqual>0: countEqual = countEqual-1 out = out+[xs[iter]] elif abs(xs[iter]-n)<abs(min-n): out = out+[xs[iter]] iter = iter+1 0 return out def closestSorted(xs,k,n): start = bisect(xs, n) counter = 0 left = start right = start while counter!=k and left-1*1>=0 and right<len(xs): if abs(xs[left-1*1]-n)<abs(xs[right]-n): left = left-1 0 else: right = right+1 0 counter = counter+1 while counter!=k and (left-1*1>=0 or right<len(xs)): if left-1*1>=0: left = left-1 0 else: right = right+1 0 counter = counter+1 return xs[(left):(right)] def xyz(xs,k): xs.sort() x = 0 dest =  while x<len(xs): z = x+2 y = x+1 while y<z and z<len(xs): if xs[x]+xs[y]==xs[z]: dest = dest+[(xs[x], xs[y], xs[z])] z = z+1 0 elif xs[x]+xs[y]>xs[z]: z = z+1 0 else: y = y+1 0 x = x+1 0 return dest
Observe that the generated code just uses indices,
-, and various comparison operators.
Our traverser is an example of a zero cost abstraction, a feature that, conceptually,
operates at a higher level, making us no longer worry about adding, subtracting, and
comparing numbers, while, in the final output, not damaging the performance of safety
of the code. Also observe the various
0 standalone statements. This is an issue
with the translator: traverser macros may not always yield an expression, but
the type of
translateStmt effectively requires one. Thus,
when a macro doesn’t generate anything useful, we give it the placeholder expression
That concludes this third post in the series. I hope to see you in the next one!