How do you explain how to write a program?
Instructional material is becoming more and more popular on the web, with thousands of programming tutorials for languages, frameworks, and technologies published on YouTube, Medium, and people’s personal sites. And yet, there seems to be little standardization, or progress toward an “effective” way of teaching. Everyone is pasting code examples, showing gists, or even sharing whole projects on GitHub. When I was writing the earliest posts on this site, I did the same. Write some code, copy-paste it, be done. Write some code, link it, be done. If I’m feeling fancy, write some code, gist it, be done. Code presented this way can easily become outdated and stop working.
I discovered a whole new perspective when going through Software Foundations. What’s different about that book is that the line between source code and instructional text is blurred: the HTML is generated from the comments in the Coq file, and code from the Coq file is included as snippets in the book. Rather than having readers piece together the snippets from the HTML, it simply directs them to the Coq file from which the page was generated. It keeps the benefits of a live code example while remaining a textbook written to teach, not simply to explain what the code does.
This is reminiscent of Literate Programming, a style of programming in which a program is presented as an explanation, written in an order that suits a human reader, with the code as supporting material. Tools such as CWEB implement Literate Programming: users write files that are converted into C source, which can then be compiled as usual. I was intrigued by the idea but, in all honesty, found it lacking.
For one, there is the problem of an extra processing step. Compilers understand C, not CWEB. Thus, one program must convert the CWEB source to C, and then a compiler must convert the C code to machine code. This doesn’t feel elegant: you’re effectively stripping the CWEB source files of the text you added to them. In technical terms, it’s not really that big of an issue. Build systems already support multi-step processing, and it would be hard to write a CWEB program large enough for the intermediate step to cause problems.
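To make that extra step concrete, here is a rough sketch of what “tangling” a literate source involves. It uses a simplified noweb-style notation rather than CWEB’s own syntax (an assumption for illustration): a line `<<name>>=` opens a code chunk, a line consisting of `@` (or starting with `@ `) returns to prose, and a line `<<name>>` inside a chunk is replaced by that chunk’s code. The names `collect_chunks`, `expand`, and `tangle` are hypothetical helpers, not any real tool’s API.

```python
import re

# Simplified noweb-style syntax (assumed for illustration; CWEB's syntax differs):
#   <<name>>=   starts a code chunk called "name"
#   @           (alone, or followed by a space) returns to documentation
#   <<name>>    on its own line inside a chunk is replaced by that chunk
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def collect_chunks(source: str) -> dict[str, list[str]]:
    """Gather the code lines of every chunk, discarding all prose."""
    chunks: dict[str, list[str]] = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])  # repeated definitions are appended
        elif line == "@" or line.startswith("@ "):
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name: str, chunks: dict[str, list[str]], indent: str = "") -> list[str]:
    """Recursively replace chunk references, keeping their indentation."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(expand(ref.group(2), chunks, indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


def tangle(source: str, root: str = "*") -> str:
    """Produce compilable source by expanding from the root chunk."""
    return "\n".join(expand(root, collect_chunks(source))) + "\n"


EXAMPLE = """This program prints a greeting.
<<*>>=
#include <stdio.h>
int main(void) {
    <<print greeting>>
    return 0;
}
@ The greeting lives in its own chunk.
<<print greeting>>=
puts("hello");
@
"""

if __name__ == "__main__":
    print(tangle(EXAMPLE), end="")
```

Only the reassembled C program comes out; the explanatory sentences never reach the compiler, which is exactly the stripping described above.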
Another issue is the lack of universality. CWEB is specialized for C. WEB, the original literate programming tool, is specialized for Pascal. There are language-agnostic tools, of course, such as noweb. But the Wikipedia page for noweb drops this bomb:
noweb defines a specific file format and a file is likely to interleave three different formats (noweb, latex and the language used for the software). This is not recognised by other software development tools and consequently using noweb excludes the use of UML or code documentation tools.
This may be the worst trade deal in the history of trade deals, maybe ever! By trying to explain how our code works, we sacrifice all other tooling. Worse, because Literate Programming encourages presenting code in fragments and out of order, it is particularly difficult to reason about such programs in an automated setting.
When I present code to a reader, I want to write it with the help of existing tooling. I want my syntax highlighting. I want my linting. I want my build system. In the same way, a user who is reading my code wants to be able to view it, change it, and experiment with it. At the same time, I want to be able to guide the reader’s attention. Text-in-comments works great for Coq, but languages like C++, in which the order of declarations matters, may not be as well suited to such an approach.
In essence, I want:
- The power of language-specific tooling, without having to extend the tooling itself
- A universal way of describing a program in any language
- A way of maintaining synchrony between the explanation and the source
I have an idea for a piece of software that could do just that.
A Language Server Based Tool
It is a well-known problem that various editors support different languages with mixed success. The idea of the Language Server Protocol is to let a program (the server) take charge of making sense of the code and communicate the results to an editor. The editor, in that case, doesn’t have to do as much heavy lifting, and instead just queries the language server when it needs information.
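To give a sense of what “querying the language server” looks like on the wire: LSP messages are JSON-RPC 2.0 objects, each preceded by a `Content-Length` header and a blank line. The sketch below frames a `textDocument/documentSymbol` request, which asks the server for the symbols declared in one file, and parses a framed message back. It shows only the message format; the file URI is made up, and a real client would also have to start the server process and complete the `initialize` handshake before sending such a request.

```python
import json

# Sketch of LSP's base protocol framing: a "Content-Length" header,
# a blank line (\r\n\r\n), then a JSON-RPC 2.0 body encoded as UTF-8.


def frame(message: dict) -> bytes:
    """Serialize a JSON-RPC message into the bytes written to the server."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def unframe(data: bytes) -> dict:
    """Parse one framed message (assumes `data` holds a complete message)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Ask for the symbols (functions, classes, variables...) declared in one file.
# The URI is hypothetical; a real client sends "initialize" first.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/documentSymbol",
    "params": {"textDocument": {"uri": "file:///home/me/project/src/math.c"}},
}

if __name__ == "__main__":
    print(frame(request))
```

The server answers with a response carrying the same `id`, whose `result` lists the symbols along with their positions in the file.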
While this technology is used for text editors, I think it can be adapted to educational texts that reference a particular codebase. I envision the following workflow:
- An author writes their tutorial/book/blog post in their markup language of choice (say, Markdown).
- They reference a fragment of code (a function, a variable) through a specialized syntax.
- When the HTML/LaTeX output is created, a language server is started. The build tool queries it about each reference from the previous step and uses the answers to insert the matching code fragments into the generated output.
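The three steps above could be sketched roughly as follows. The `@snippet path#symbol` reference syntax is hypothetical, invented here for illustration, and so are the `read_file` and `document_symbols` callbacks; in a real tool, `document_symbols` would send a `textDocument/documentSymbol` request to a running server. The symbol data mimics the shape of LSP’s `DocumentSymbol` results (nested `children`, and a `range` made of zero-based `line`/`character` positions). The sketch assumes ASCII source, since LSP counts characters in UTF-16 code units by default.

```python
import re
from typing import Callable, Optional

# Hypothetical reference syntax: a line of the form "@snippet <path>#<symbol>".
REFERENCE = re.compile(r"^@snippet[ \t]+(\S+)#(\S+)[ \t]*$", re.MULTILINE)
FENCE = "`" * 3  # Markdown code fence for the generated output


def find_symbol(symbols: list, name: str) -> Optional[dict]:
    """Depth-first search through DocumentSymbol-like dicts and their children."""
    for symbol in symbols:
        if symbol["name"] == name:
            return symbol
        found = find_symbol(symbol.get("children", []), name)
        if found is not None:
            return found
    return None


def slice_range(text: str, rng: dict) -> str:
    """Extract the text covered by an LSP-style range (end is exclusive)."""
    lines = text.splitlines(keepends=True)
    start, end = rng["start"], rng["end"]
    if start["line"] == end["line"]:
        return lines[start["line"]][start["character"]:end["character"]]
    parts = [lines[start["line"]][start["character"]:]]
    parts.extend(lines[start["line"] + 1 : end["line"]])
    if end["line"] < len(lines):
        parts.append(lines[end["line"]][: end["character"]])
    return "".join(parts)


def render(markdown: str,
           read_file: Callable[[str], str],
           document_symbols: Callable[[str], list]) -> str:
    """Replace each @snippet line with the current code of the named symbol."""
    def substitute(match: re.Match) -> str:
        path, name = match.group(1), match.group(2)
        symbol = find_symbol(document_symbols(path), name)
        if symbol is None:
            raise KeyError(f"{name!r} not found in {path}")
        code = slice_range(read_file(path), symbol["range"])
        return f"{FENCE}\n{code}\n{FENCE}"
    return REFERENCE.sub(substitute, markdown)


SOURCE = "int add(int a, int b) {\n    return a + b;\n}\n\nint main(void) { return add(1, 2); }\n"
# What a server might report for SOURCE (kind 12 is SymbolKind.Function).
SYMBOLS = [{
    "name": "add",
    "kind": 12,
    "range": {"start": {"line": 0, "character": 0},
              "end": {"line": 2, "character": 1}},
    "selectionRange": {"start": {"line": 0, "character": 4},
                       "end": {"line": 0, "character": 7}},
}]
POST = "Addition is simple:\n\n@snippet src/math.c#add\n\nThat's all.\n"

if __name__ == "__main__":
    print(render(POST, lambda path: SOURCE, lambda path: SYMBOLS))
```

Because the snippet is looked up by name at build time rather than copied by hand, an edit to `add` in the source file shows up in the rendered post on the next build, which is the synchrony goal from the list above.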
After each “conversion” of source text to HTML/LaTeX, the code in the generated snippets will be in sync with the codebase. At the same time, changing the source text will not require changing the source files. Finally, since language servers exist for most established languages, this system can work nearly out of the box, and even be added to established projects with no changes to the projects themselves.
Of course, this is just a rough idea. I’m not sure how feasible it is to include snippets using the Language Server Protocol. But I certainly would like to try!