Due to something of a streak of bad luck when it came to computers, I spent a significant amount of time using a Linux-based Chromebook, and then a Pinebook Pro. It was, in some way, enlightening. The things that I used to take for granted with a ‘powerful’ machine now became a rare luxury: StackOverflow, and other relatively static websites, took upwards of ten seconds to finish loading. On Slack, each of my keypresses could take longer than 500ms to appear on the screen, and sometimes, it would take several seconds. Some websites would present me with a white screen, and remain that way for much longer than I had time to wait. It was awful.
- Make it so that the mathematics are rendered on the back end.
I’ve previously written about math rendering, and made the observation that MathJax’s output for LaTeX is identical on every computer. From the MathJax 2.6 change log:
Improved CommonHTML output. The CommonHTML output now provides the same layout quality and MathML support as the HTML-CSS and SVG output. It is on average 40% faster than the other outputs and the markup it produces are identical on all browsers and thus can also be pre-generated on the server via MathJax-node.
It seems absurd, then, to offload this kind of work onto the users, to be done over and over again. As should be clear from the title of this post, this made me settle for the second option: it was obviously within reach, especially for a statically-generated website like mine, to render math on the backend.
I settled on the following architecture:
- As before, I would generate my pages using Hugo.
- I would use the KaTeX NPM package to render math.
- To build the website no matter what system I was on, I would use Nix.
It so happens that Nix isn’t really required for using my approach in general. I will give my setup here, but feel free to skip ahead.
Setting Up A Nix Build
My `default.nix` file looks like this:
I use `node2nix` to generate the `required-packages.nix` file, which allows me, even from a sandboxed Nix build, to download and install `npm` packages. This is needed so that I have access to the `katex` binary at build time. I fed the following JSON file to `node2nix`:
The Ruby script I wrote for this (more on that soon) required the `nokogiri` gem, which I used for traversing the HTML generated for my site. Hugo was obviously required to generate the HTML.
Converting LaTeX To HTML
After my first post complaining about the state of mathematics on the web, I received the following email (which the author allowed me to share):
Sorry for having a random stranger email you, but in your blog post (link) you seem to focus on MathJax’s difficulty in rendering things server-side, while quietly ignoring that KaTeX’s front page advertises server-side rendering. Their documentation (link) even shows (at least as of the time this email was sent) that it renders both HTML (to be arranged nicely with their CSS) for visuals and MathML for accessibility.
The author of the email then kindly provided a link to a page they generated using KaTeX and some Bash scripts. The math on this page was rendered at the time it was generated.
This is a great point, and KaTeX is indeed usable for server-side rendering. But I’ve seen few people who do actually use it. Unfortunately, as I pointed out in my previous post on the subject, few tools actually take your HTML page and replace LaTeX with rendered math. Here’s what I wrote about this last time:
[In MathJax,] The bigger issue, though, was that the `page2html` program, which rendered all the mathematics in a single HTML page, was gone. I found `text2htmlcss`, which could only render equations without the surrounding HTML. I also found `mjpage`, which replaced mathematical expressions in a page with their SVG forms.
This is still the case, in both MathJax and KaTeX. The ability to render math in one step is the main selling point of front-end LaTeX renderers: all you have to do is drop in a file from a CDN, and voila, you have your math. There are no such easy answers for back-end rendering. In fact, as we will soon see, it’s not possible to just search-and-replace occurrences of mathematics on your page, either. To actually get KaTeX working on the backend, you need access to tools that handle the potential variety of edge cases associated with HTML. Such tools, to my knowledge, do not currently exist.
I decided to write my own Ruby script to get the job done. From this script, I would call the `katex` command-line program, which would perform the heavy lifting of rendering the mathematics.
There are two types of math on my website: inline math and display math.
On the command line (here are the docs), the distinction is made using the `--display-mode` argument. So, the general algorithm is to replace the code inside each `$$...$$` with its display-rendered version, and the code inside each `\(...\)` with its inline-rendered version. I came up with the following Ruby function:
Here, the `cache` argument is used to prevent re-running the `katex` command on an equation that was already rendered before (the output is the same, after all). `command` is the specific shell command that we want to invoke; this would be either `katex` or `katex -d`. The `string` is the math equation to render, and `render_comment` is the string to print to the console instead of the equation (so that long display math equations are not printed to standard out).
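A minimal sketch of a function with these four parameters might look like the following. The name `render_cached` is hypothetical, the cache is assumed to be a plain `Hash`, and the command is assumed to read LaTeX on standard input and write HTML to standard output; this is an illustration under those assumptions rather than the exact function used for the site.

```ruby
require 'open3'

# Hypothetical helper: run `command`, feed it `string` on standard input,
# and memoize the output in `cache` (a Hash keyed by the equation source).
def render_cached(cache, command, string, render_comment = nil)
  cache.fetch(string) do
    # Print a short label instead of the equation when one is given.
    puts "  Rendering #{render_comment || string}"
    output, status = Open3.capture2(command, stdin_data: string)
    raise "#{command} failed on: #{string}" unless status.success?
    # Store the result so that later lookups skip the external process.
    cache[string] = output.strip
  end
end
```

With KaTeX installed, this could be called as `render_cached(inline_cache, 'katex', 'x^2')` or `render_cached(display_cache, 'katex -d', 'x^2', 'display')`. Keeping one cache per mode is an assumption of this sketch: the same equation renders differently inline and in display mode, so a single shared cache keyed only by the equation would mix the two up.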
Then, given a substring of the HTML file, we use regular expressions to find the `\(...\)`s and `$$...$$`s, and use the function above on the LaTeX code inside.
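The substitution step can be sketched roughly as follows. The function name `substitute_math` and the choice to pass the renderer in as a block are assumptions made so that the example runs without KaTeX; the regular expression is one reasonable way to match the two delimiters, not necessarily the one the script uses.

```ruby
# One pattern for both delimiters, so rendered output is never re-scanned.
# The /m flag lets display equations span several lines.
MATH = /\$\$(?<display>.+?)\$\$|\\\((?<inline>.+?)\\\)/m

# Hypothetical signature: the block receives the LaTeX source and a
# display-mode flag, and returns the text that replaces the match.
def substitute_math(content)
  content.gsub(MATH) do
    match = Regexp.last_match
    if match[:display]
      yield(match[:display], true)
    else
      yield(match[:inline], false)
    end
  end
end

html = substitute_math('Let \(x\) be such that $$x^2 = 2$$ holds.') do |tex, display|
  display ? "<div class=\"math\">#{tex}</div>" : "<span class=\"math\">#{tex}</span>"
end
puts html
```

Using the block form of `gsub` matters here: the block’s return value is inserted literally, whereas a replacement string would have its backslash sequences interpreted, which is easy to trip over with LaTeX.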
There’s a bit of a trick to the final layer of this script. We want to be really careful about where we replace LaTeX, and where we don’t. In particular, we don’t want to go into the `code` tags. Otherwise, it wouldn’t be possible to talk about LaTeX code! I also suspect that some captions, alt texts, and similar elements should be left alone. However, I don’t have those on my website (yet), and I won’t worry about them now. Either way, because of the `code` tags, we can’t just search-and-replace over the entire page; we need to be context aware. This is where `nokogiri` comes in. We parse the HTML, and iterate over all of the ‘text’ nodes, calling `perform_katex_sub` on all of those that aren’t inside `code` tags.
Fortunately, this kind of iteration is pretty easy to specify thanks to something called XPath. This was my first time encountering it, but it seems extremely useful: it’s a sort of language for selecting XML nodes. First, you provide an ‘axis’, which is used to specify the positions of the nodes you want to look at relative to the root node. The axis `/` looks at the immediate children (this would be the `html` tag in a properly formatted document, I would imagine). `//` looks at all the transitive children. That is, it will look at the children of the root, then its children, and so on. There’s also the `self` axis, which looks at the node itself.
After you provide an axis, you need to specify the type of node that you want to select. We can write `code`, for instance, to pick only the `code` tags from the axis we’ve chosen. We can also use `*` to select any node, and we can use `text()` to select text nodes, such as the `Hello` inside of a surrounding tag. We can also apply some more conditions to the nodes we pick using a predicate in square brackets (`[...]`). For us, the relevant feature here is `not(...)`, which allows us to select nodes that do not match a particular condition. This is all we need to know.
We build the expression from the following pieces:
- `//`, starting to search for nodes everywhere, not just the root of the document.
- `*`, to match any node. We want to replace math inside of `nav`s, all of the `h`s, and so on.
- `[not(self::code)]`, cutting out all the `code` tags.
- `/`, now selecting the nodes that are immediate descendants of the nodes we’ve selected.
- `text()`, giving us the text contents of all the nodes we’ve selected.
All in all: `//*[not(self::code)]/text()`
Finally, we use this XPath from `nokogiri`.
I named this script `convert.rb`; it’s used from inside of the Nix expression and its builder, which we will cover below.
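To see which text nodes an expression built from those pieces selects, here is a small self-contained sketch. It uses REXML, which ships with Ruby, purely as a stand-in for `nokogiri` so that it runs without extra gems; REXML only accepts well-formed XML, so it would not be a substitute on real HTML pages. The helper name `math_candidates` and the sample markup are made up for illustration.

```ruby
require 'rexml/document'

# The pieces described above, joined together:
# text nodes whose parent element is anything other than <code>.
TEXT_OUTSIDE_CODE = '//*[not(self::code)]/text()'

# Hypothetical helper: return the contents of every selected text node.
def math_candidates(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, TEXT_OUTSIDE_CODE).map(&:value)
end

page = '<div><p>Inline \(x\) math</p><pre><code>\(y\)</code></pre></div>'
puts math_candidates(page)
```

With `nokogiri`, the same expression string can be passed to the `xpath` method of a parsed document. One thing worth keeping in mind is that the predicate only inspects the immediate parent: text inside an element nested within a `code` tag (a syntax-highlighting `span`, say) has a non-`code` parent and would still be selected.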
Tying it All Together
Finally, I wanted an end-to-end script to generate HTML pages and render the LaTeX in them. I used Nix for this, but the below script will largely be compatible with a non-Nix system. I came up with the following, commenting on Nix-specific commands:
This is it! Using the two scripts, I was able to generate my blog with the math rendered on the back-end. Please note, though, that I had to add the KaTeX CSS to my website.
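For readers not using Nix or a shell script, the post-processing half of the pipeline can be approximated in a few lines of Ruby. This is only a sketch under assumptions: `convert_site` is a hypothetical name, the conversion itself is supplied by the caller as a block, and Hugo is assumed to have already written its output to a directory (`public` by default).

```ruby
# Hypothetical driver: rewrite every .html file under `dir` in place,
# passing each file's contents through the block supplied by the caller.
def convert_site(dir)
  Dir.glob(File.join(dir, '**', '*.html')).sort.each do |path|
    original = File.read(path, encoding: 'UTF-8')
    converted = yield(original)
    # Leave files without any math untouched.
    File.write(path, converted) unless converted == original
  end
end
```

After running `hugo`, a call such as `convert_site('public') { |html| ... }` would visit each generated page once; the block is where the LaTeX substitution would happen.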
The main caveat of my approach is performance. For every piece of mathematics that I render, I invoke the `katex` command. This incurs the penalty of Node’s startup time, every time, and makes my approach take a few dozen seconds to run on my relatively small site. The better approach would be to use a NodeJS script, rather than a Ruby one, to perform the conversion. KaTeX also provides an API, so such a NodeJS script can find the files, parse the HTML, and perform the substitutions.
I did quite like using `nokogiri` here, though, and I hope that an equivalently pleasant library exists for NodeJS.
Re-rendering the whole website is also pretty wasteful. I rarely change the mathematics on more than one page at a time, but every time I do so, I have to re-run the script, and therefore re-render every page. This makes sense for me, since I use Nix, and my builds are pretty much always performed from scratch. On the other hand, for others, this may not be the best solution.
The same person who sent me the original email above also pointed out a pandoc filter for KaTeX. I do not use Pandoc, but from what I can see, this filter relies on `Math` AST nodes, and applies KaTeX to each of those. This should work, but wasn’t applicable in my case, since Hugo’s shortcodes don’t mix well with Pandoc. However, it certainly seems like a workable