Ever since I started the compiler series, I’ve been including more and more fragments of code in my blog. I didn’t want to copy and paste code between my project and my Markdown files, so I quickly wrote a Hugo shortcode to pull in other files from the local directory. I’ve since improved on it some more, so I thought I’d share what I created with others.
Including Entire Files and Lines
My needs for snippets were modest at first. For the most part, I had a single code file that I wanted to present, so it was acceptable to plop it in the middle of my post in one piece. The shortcode for that was quite simple:
{{ highlight (readFile (printf "code/%s" (.Get 1))) (.Get 0) "" }}
This leverages Hugo’s built-in highlight function to provide syntax highlighting for the included snippet. Hugo doesn’t guess at the language of the code, so you have to provide it manually. Calling this shortcode looks as follows:
{{< codeblock "C++" "compiler/03/type.hpp" >}}
Note that this implicitly adds the code/ prefix to all the files I include. This is a personal convention: I want all my code to be inside a dedicated directory.
Of course, including entire files only takes you so far. What if you only need to discuss a small part of your code? Alternatively, what if you want to present code piece by piece, in the style of literate programming? I quickly ran into the need to do this, and wrote another shortcode for it:
{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
This shortcode takes a language and a filename as before, but it also takes the numbers of the first and last lines of the part of the code that should be included. After splitting the contents of the file into lines, it throws away all lines before and after the window of code that you want to include. It seems to me (from my commit history) that Hugo’s after function (which should behave similarly to Haskell’s drop) doesn’t like to be given an argument of 0. I had to add a special case for when this would occur, in which I simply do not invoke the after function at all.
The shortcode can be used as follows:
{{< codelines "C++" "compiler/04/ast.cpp" 19 22 >}}
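For illustration, the same windowing logic can be sketched in Ruby. This is only a sketch and not the shortcode itself; select_lines is a hypothetical helper name, and the sample text is made up. Unlike the Hugo version, it needs no special case for the first line, because Ruby's drop accepts 0.

```ruby
# Return lines first_line..last_line (1-indexed, inclusive) of a text.
# select_lines is a hypothetical name used only for this sketch.
def select_lines(text, first_line, last_line)
  lines = text.split("\n")
  # Discard everything before the window; drop(0) is a no-op in Ruby.
  # Then keep only as many lines as the window contains.
  lines.drop(first_line - 1).first(last_line - first_line + 1)
end

sample = "one\ntwo\nthree\nfour\nfive"
puts select_lines(sample, 2, 4).join("\n")
```

The number of lines kept is last - first + 1, which matches the arithmetic passed to first in the shortcode.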
To support a fuller range of Hugo’s functionality, I also added an optional argument that accepts Hugo’s Chroma settings. This way, I can do things like highlight certain lines in my code snippet, which is done as follows:
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 31 39 "hl_lines=7 8 9" >}}
Note that the hl_lines field doesn’t seem to work properly with linenostart, which means that the highlighted lines are counted from 1 no matter what. This is why in the above snippet, although I include lines 31 through 39, I feed lines 7, 8, and 9 to hl_lines. It’s unusual, but hey, it works!
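The offset arithmetic is easy to get wrong by hand, so here is a small Ruby sketch of it. It assumes, as described above, that hl_lines counts from 1 at the start of the snippet; relative_hl_lines is a hypothetical helper and not part of the shortcode.

```ruby
# Convert absolute line numbers in the file into the snippet-relative
# numbers that hl_lines expects (counted from 1 at the snippet's first line).
# relative_hl_lines is a hypothetical name used only for this sketch.
def relative_hl_lines(first_line, absolute_lines)
  absolute_lines.map { |n| n - first_line + 1 }.join(" ")
end

# The snippet starts at line 31; lines 37 to 39 of the file should be highlighted.
puts "hl_lines=#{relative_hl_lines(31, [37, 38, 39])}"
# prints "hl_lines=7 8 9"
```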
Linking to Referenced Code
Some time after implementing my initial system for including lines of code,
I got an email from a reader who pointed out that it was hard for them to find
the exact file I was referencing, and to view the surrounding context of the
presented lines. To address this, I decided that I’d include the link
to the file in question. After all, my website and all the associated
code is on a Git server I host,
so any local file I’m referencing should – assuming it was properly committed –
show up there, too. I hardcoded the URL of the code directory on the web interface, and appended the relative path of each included file to it. The shortcode came out as follows:
{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
<div class="highlight-group">
<div class="highlight-label">From <a href="https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/{{ .Get 1 }}">{{ path.Base (.Get 1) }}</a>,
{{ if eq (.Get 2) (.Get 3) }}line {{ .Get 2 }}{{ else }} lines {{ .Get 2 }} through {{ .Get 3 }}{{ end }}</div>
{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
</div>
This results in code blocks like the one in the image below. The image is the result of the codelines call for the Idris language, presented above.
I got a lot of mileage out of this setup . . . until I wanted to include code from other Git repositories. For instance, I wanted to talk about my Advent of Code submissions, without having to copy-paste the code into my blog repository!
Code from Submodules
My first thought when including code from other repositories was to use submodules.
This has the added advantage of “pinning” the version of the code I’m talking about,
which means that even if I push significant changes to the other repository, the code
in my blog will remain the same. This, in turn, means that all of my codelines shortcodes will work as intended.
The problem is, most Git web interfaces (my own included) don’t display paths corresponding to submodules. Thus, even if all my code is checked out and Hugo correctly pulls the selected lines into its HTML output, the links to the file remain broken!
There’s no easy way to address this, particularly because different submodules
can be located on different hosts! The Git URL used for a submodule is
not known to Hugo (since, to the best of my knowledge, it can’t run
shell commands), and it could reside on dev.danilafe.com, or github.com, or elsewhere. Fortunately, it’s fairly easy to tell when a file is part
of a submodule, and which submodule that is. It’s sufficient to find
the longest submodule path that matches the selected file. If no
submodule path matches, then the file is part of the blog repository,
and no special action is needed.
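The longest-match rule can be sketched in a few lines of Ruby. Everything here is illustrative: the submodule list and its example.com URLs are made up, and find_submodule is a hypothetical helper. One deliberate choice in this sketch is to match on the path followed by a slash, so that a directory such as aoc-2020-extra is not mistaken for part of aoc-2020.

```ruby
# Hypothetical submodule list; paths are relative to the code directory.
SUBMODULES = [
  { path: "aoc-2020", url: "https://example.com/aoc-2020" },
  { path: "aoc-2020/vendor/lib", url: "https://example.com/lib" }
].freeze

# Return the submodule with the longest path containing file_path,
# or nil when the file belongs to the main repository.
def find_submodule(file_path, submodules)
  submodules
    .select { |m| file_path.start_with?("#{m[:path]}/") }
    .max_by { |m| m[:path].length }
end

match = find_submodule("aoc-2020/vendor/lib/util.rb", SUBMODULES)
puts match ? match[:path] : "(blog repository)"
```

Picking the longest path is what makes nested submodules resolve to the innermost repository rather than to its parent.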
Of course, this means that Hugo needs to be made aware of the various submodules in my repository. It also needs to be aware of the submodules inside those submodules, and so on: it needs to be recursive. Git has a command to list all submodules recursively:
git submodule status --recursive
However, this only prints the commit hash, the submodule path, and a git describe label for that commit (such as a branch or tag name). I don’t think there’s a way to list the remotes’ URLs with this command, yet we do need the URLs, since that’s how we create links to the Git web interfaces.
There’s another issue: how do we let Hugo know about the various submodules, even if we can find them? Hugo can read files, but doing any serious text processing in its templates is downright impractical. Hugo is also unable to run commands itself, so it needs to read in the output of another command that can find submodules.
I settled on using Hugo’s params configuration option. This allows users to communicate arbitrary properties to Hugo themes and templates. In my case, I want to communicate a collection of submodules. I didn’t know about TOML’s inline tables, so I decided to represent this collection as a map of (meaningless) submodule names to tables:
[params]
[params.submoduleLinks]
[params.submoduleLinks.aoc2020]
url = "https://dev.danilafe.com/Advent-of-Code/AdventOfCode-2020/src/commit/7a8503c3fe1aa7e624e4d8672aa9b56d24b4ba82"
path = "aoc-2020"
Since it was seemingly impossible to wrangle Git into outputting all of this information using one command, I decided to write a quick Ruby script to generate a list of submodules as follows. I had to use cd in one of my calls to Git because Git’s --git-dir option doesn’t seem to work with submodules, treating them like a “bare” checkout. I also chose to use an allowlist of remote URLs, since the URL format for linking to files in a particular repository differs from service to service. For now, I only use my own Git server, so only dev.danilafe.com is allowed; however, just by adding elsif branches to my code, I can add other services in the future.
puts "[params]"
puts " [params.submoduleLinks]"

# Recursively yield every submodule found under base_path.
def each_submodule(base_path)
  `cd #{base_path} && git submodule status`.each_line do |line|
    # Skip the one-character status prefix, then read the commit hash and path.
    hash, path = line[1..].split " "
    full_path = "#{base_path}/#{path}"
    url = `git config --file #{base_path}/.gitmodules --get 'submodule.#{path}.url'`.chomp.delete_suffix(".git")
    # Derive a name for the TOML table from the path.
    safe_name = full_path.gsub(/\/|-|_\./, "")

    # Allowlist: only hosts whose file URL format is known.
    if url =~ /dev\.danilafe\.com/
      file_url = "#{url}/src/commit/#{hash}"
    else
      raise "Submodule URL #{url.dump} not in a known format!"
    end

    yield({ :path => full_path, :url => file_url, :name => safe_name })
    # Descend into submodules of this submodule.
    each_submodule(full_path) { |m| yield m }
  end
end

each_submodule(".") do |m|
  # Only submodules inside the code directory are relevant.
  next unless m[:path].start_with? "./code/"
  puts " [params.submoduleLinks.#{m[:name].delete_prefix(".code")}]"
  puts " url = #{m[:url].dump}"
  puts " path = #{m[:path].delete_prefix("./code/").dump}"
end
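As an illustration of how the allowlist could grow, the sketch below pulls the host check into a helper and adds one more branch. This is not part of the script above: file_url_base is a hypothetical name, the github.com branch assumes GitHub's blob/<commit> layout for file links, and both branches assume an HTTPS remote URL rather than an SSH one.

```ruby
# Map a submodule's remote URL and pinned commit to the base URL under
# which its files can be linked. file_url_base is a hypothetical helper.
def file_url_base(url, hash)
  if url =~ /dev\.danilafe\.com/
    # Same format as in the script above.
    "#{url}/src/commit/#{hash}"
  elsif url =~ /github\.com/
    # Assumption: GitHub serves files under /blob/<commit>/<path>.
    "#{url}/blob/#{hash}"
  else
    raise "Submodule URL #{url.dump} not in a known format!"
  end
end

# Hypothetical repository and commit hash, for demonstration only.
puts file_url_base("https://github.com/example/repo", "0123abc")
```

Keeping the final else as a hard failure means an unknown host stops the build instead of silently producing broken links.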
I pipe the output of this script into a separate configuration file called config-gen.toml, and then run Hugo as follows:
hugo --config config.toml,config-gen.toml
Finally, I had to modify my shortcode to find and handle the longest submodule prefix. Here’s the relevant portion, and you can view the entire file here.
{{ .Scratch.Set "bestLength" -1 }}
{{ .Scratch.Set "bestUrl" (printf "https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/%s" (.Get 1)) }}
{{ $filePath := (.Get 1) }}
{{ $scratch := .Scratch }}
{{ range $module, $props := .Site.Params.submoduleLinks }}
{{ $path := index $props "path" }}
{{ $bestLength := $scratch.Get "bestLength" }}
{{ if and (le $bestLength (len $path)) (hasPrefix $filePath $path) }}
{{ $scratch.Set "bestLength" (len $path) }}
{{ $scratch.Set "bestUrl" (printf "%s%s" (index $props "url") (strings.TrimPrefix $path $filePath)) }}
{{ end }}
{{ end }}
And that’s what I’m using at the time of writing!
Conclusion
My current system for code includes allows me to do the following things:
- Include entire files or sections of files into the page. This saves me from having to copy and paste code manually, which is error prone and can cause inconsistencies.
- Provide links to the files I reference on my Git interface. This allows users to easily view the entire file that I’m talking about.
- Correctly link to files in repositories other than my blog repository, when they are included using submodules. This means I don’t need to manually copy and update code from other projects.
I hope some of these shortcodes and the script come in handy for someone else. Thank you for reading!