Personal Software with the Help of LLMs


In the previous post in this series, I wrote about a little utility I created for detecting underlined words in a book and creating vocabulary study material for them. As I mentioned there, this was one of my earliest experiences with LLM-driven development, and I think it shaped my outlook on the technology quite a bit. For me, the bottom line is this: with LLMs, I was able to rapidly solve a problem that was holding me back in another area of my life. My goal was never to “produce software”, but to “acquire vocabulary”, and, viewed from this perspective, I think the experience has been a colossal success.

As someone who works on software, I am constantly reminded that end users rarely care about the technology as much as we technologists do; they care about having their problems solved. I find taking that perspective challenging (though valuable) because software is my craft, and because thinking about a solution means thinking about the elements that bring it to life.

With LLMs, I was able (allowed?) to view things more from the end-user perspective. I didn’t know, and didn’t need to know, the API for PyMuPDF, argostranslate, or spaCy. I didn’t need to understand the PDF format. I could step back from the nitty-gritty and focus on the ‘why’ and the ‘what’: the challenge of what I wanted to accomplish. I wrestled with the inherent complexity and avoided altogether the incidental difficulties that merely happened to be there (downloading language modules, learning translation APIs, and so on).

This freedom let me make rapid progress and produce solutions to problems I would previously have deemed “too hard” or “too tedious”. It did, however, markedly reduce the care with which I examined the output: I don’t think I’ve ever read the code that produces the pretty colored boxes in my program’s debug output. This shift has been a divisive element of AI discourse in technical communities, and I suspect that is, at least in part, because people hold different views on code as a medium.

The Builders and the Craftsmen

There are two perspectives through which one may view software: as a craft in and of itself, and as a means to some end. My flashcard extractor looks vastly different depending on which of these perspectives one takes. In terms of craft, I think that it is at best mediocre; most of the code is generated, slightly verbose, and somewhat tedious. The codebase is far from inspiring, and if I had written it by hand, I would not be particularly proud of it. In terms of product, though, I think it tells an exciting story: here I am, reading Camus again, because I was able to improve the workflow around said reading. In a day, I was able to achieve what I couldn’t muster in a year or two on my own.

The truth is, the “builder vs. craftsman” distinction is a simplifying one, another in the long line of “us vs. them” classifications. Any one person can belong to either camp, or to both, at any given time. Indeed, different sorts of software demand to be viewed through different lenses. I will still treat work on my long-term projects as a craft, because I will come back to them again and again, and because our craft has evolved to engender stability and maintainability.

However, I am more than happy to settle for ‘underwhelming’ when it means an individual need of mine can be addressed in record time. I think this gives rise to a new sort of software: highly individual, explicitly non-robust, and treated differently from software crafted with deliberate thought and foresight.

Personal Software

As time goes on, I am becoming more and more convinced by the idea of “personal software”. One might argue that much of the complexity in many pieces of software is driven by the need to accommodate the diverse needs of many users. Even so, such software remains somewhat inflexible and unable to accommodate individual needs. Features or uses that demand changes at the software level move at a slower pace: finite developer time must be spent analyzing what users need, determining the cost of new functionality, and choosing which of the many possible requests to fulfill. On the other hand, software that lets users build their own customizations, by exposing numerous configuration options and abstractions, becomes, over time, very complicated to grasp.

Now, suppose that the complexity of such software scales superlinearly with the number of features it provides. Suppose also that individual users leverage only a small subset of the software’s functionality. From these assumptions it would follow that individual programs, made to serve a single user’s needs, would be significantly less complicated than the “whole”. By definition, these programs would also be better tailored to their users’ needs. With LLMs, we may be approaching a future where this is possible.
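The argument above can be made concrete with a worked equation. The text only assumes that complexity grows superlinearly; the quadratic form and the numbers below are illustrative assumptions, not measurements. Write C(n) for the complexity of a program with n features, and let a single user need k ≤ n of them.

```latex
% Illustrative assumption: complexity grows quadratically, C(n) = c n^2.
\frac{C(k)}{C(n)} = \frac{c\,k^2}{c\,n^2} = \left(\frac{k}{n}\right)^2,
\qquad n = 100,\; k = 10 \;\Rightarrow\; \frac{C(k)}{C(n)} = (0.1)^2 = 0.01

% More generally, if C(n)/n is nondecreasing (one reading of "superlinear"),
% then for any k \le n:
\frac{C(k)}{C(n)} = \frac{k}{n} \cdot \frac{C(k)/k}{C(n)/n} \le \frac{k}{n}
```

Under the quadratic assumption, a program that needs 10% of the features carries about 1% of the complexity; under the weaker assumption, it never carries more than its proportional share.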

I think that my flashcard generator is an early instance of such software. It doesn’t worry about various book formats, or various languages, or various page layouts. The heuristic was tweaked to fit my use case, and it now works 100% of the time. I understand the software in its entirety. I thought about sharing it (and, in a way, I did, since it’s open source), but realized that outside the constraints of my own problem, it likely won’t be of much use. I could experiment with more varied constraints, but that would turn it back into the sort of software I discussed above: general, robust, and complex.

Today, I think that there is a whole class of software that is amenable to being “personal”. My flashcard generator is one such piece of software; I imagine file organization (as served by the many “bulk rename and move” utilities out there), video wrangling (possible today with ffmpeg’s myriad flags and switches), and data visualization to be other instances of problems in that class. I am merely intuiting here, but if I had to give a rough heuristic, it would be problems that:

- fulfill an infrequent need, because frequent use brings concerns like availability and deployment, which significantly raise the bar for quality.
  - e.g., I collect flashcards once every two weeks; I organize my filesystem once a month; I don’t spend nearly enough money to want to regenerate cash flow charts very often.
- have an “answer” that’s relatively easy to assess, because LLMs are not perfect, so iteration must be possible and easy.
  - e.g., I can see that all the underlined words are listed in my web app; I can tell by inspection that my files are in the right folders and named appropriately; my charts seem to track with reality.
- have a relatively complex technical implementation, because why would you bother invoking an LLM if you can “just” click a button somewhere?
  - e.g., extracting data from PDFs requires some wrangling; bulk-renaming files requires some tedious and possibly case-specific pattern matching; cash flow between N accounts requires some graph analysis.
- have relatively low stakes, again because LLMs are not perfect, and neither, necessarily, is one’s understanding of the problem.
  - e.g., it’s OK if I miss some words I underlined; my cash flow charts only give me an impression of my spending.
  - (I recognize that moving files is a potentially destructive operation.)
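To illustrate the bulk-renaming example, here is a minimal sketch of the kind of script an LLM might produce for that task. The naming convention (camera files like IMG_20240315_101500.jpg) and the helper names plan_renames and apply_renames are hypothetical. The point is the shape: compute the whole plan first, check it for collisions, print it for inspection, and only touch the disk when dry_run is switched off. That structure addresses two of the criteria above: the result is easy to assess, and the destructive step is fenced off.

```python
import re
from pathlib import Path

# Hypothetical convention: camera files like "IMG_20240315_101500.jpg"
# become "2024-03-15_101500.jpg". Adjust the pattern to your own files.
PATTERN = re.compile(r"^IMG_(\d{4})(\d{2})(\d{2})_(\d{6})\.(jpe?g)$", re.IGNORECASE)


def plan_renames(names):
    """Map old file names to new ones without touching the disk."""
    plan = {}
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            continue  # leave anything unrecognized alone
        year, month, day, time, ext = match.groups()
        plan[name] = f"{year}-{month}-{day}_{time}.{ext.lower()}"
    # Refuse plans in which two files would end up with the same name.
    targets = list(plan.values())
    if len(targets) != len(set(targets)):
        raise ValueError("rename plan contains collisions")
    return plan


def apply_renames(directory, plan, dry_run=True):
    """Print every planned rename; only perform them when dry_run is False."""
    root = Path(directory)
    for old, new in sorted(plan.items()):
        print(f"{old} -> {new}")
        if not dry_run:
            target = root / new
            if target.exists():
                # Never overwrite a file that is already there.
                raise FileExistsError(target)
            (root / old).rename(target)
```

For example, plan_renames(["IMG_20240315_101500.jpg", "notes.txt"]) yields a single entry mapping the photo to "2024-03-15_101500.jpg" and ignores the text file; reading that printed plan is the "inspection" step before running apply_renames with dry_run=False.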

I dream of a world in which, to make my hardware do what I need, I just ask, and don’t worry much about languages, frameworks, or sharing my solution with others; the last because others can just ask as well.

The Unfair Advantage of Being Technical

I recognize that the success described here did not come for free. There were numerous parts of the process where my software background helped me get the most out of Codex.

For one thing, writing software trains us to think precisely about problems. We learn to state exactly what we want, to decompose tasks into steps, and to judge the right size of those steps; to know what’s hard and what’s easy for the machine. When working with an LLM, these skills make it possible to hit the ground running, to know what to ask, and to help pluck out a particular solution from the space of possible approaches. I think this makes LLMs considerably more effective in the hands of programmers than in the hands of experts without a technical background.

For another, the boundary between ‘manual’ and ‘automatic’ work is not always clear-cut. Though I didn’t touch any of the PyMuPDF code, I did need to look fairly closely at the logic that classified my squiggles as “underlines” and found the words associated with them. Treating the LLM-generated code as a black box was not enough.
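To give a sense of what that classification logic involves, here is a minimal, hypothetical version of such a heuristic; it is not the tool’s actual code. It assumes words arrive as (x0, y0, x1, y1, text) tuples, which matches the first five fields of what PyMuPDF’s page.get_text("words") returns, with y growing downward as in PyMuPDF’s page coordinates. It also assumes the pen strokes have already been reduced to straight segments given by two endpoints; extracting them from the PDF is left out. The thresholds max_slope, max_gap, and min_overlap are illustrative guesses that would need tuning against real pages.

```python
from typing import List, Sequence, Tuple

# Hypothetical input shapes; y grows downward, as in PyMuPDF's page coordinates.
Word = Tuple[float, float, float, float, str]             # x0, y0, x1, y1, text
Stroke = Tuple[Tuple[float, float], Tuple[float, float]]  # two endpoints


def is_roughly_horizontal(stroke: Stroke, max_slope: float = 0.15) -> bool:
    """An underline should be nearly flat: small vertical change per unit of width."""
    (ax, ay), (bx, by) = stroke
    dx, dy = abs(bx - ax), abs(by - ay)
    return dx > 0 and dy / dx <= max_slope


def horizontal_overlap(word: Word, stroke: Stroke) -> float:
    """Fraction of the word's width covered by the stroke's x-extent."""
    x0, _, x1, _, _ = word
    (ax, _), (bx, _) = stroke
    left = max(x0, min(ax, bx))
    right = min(x1, max(ax, bx))
    width = x1 - x0
    return max(0.0, right - left) / width if width > 0 else 0.0


def underlined_words(
    words: Sequence[Word],
    strokes: Sequence[Stroke],
    max_gap: float = 4.0,
    min_overlap: float = 0.5,
) -> List[str]:
    """Return the text of each word that has a flat stroke close to its bottom edge."""
    found: List[str] = []
    for word in words:
        bottom, text = word[3], word[4]
        for stroke in strokes:
            if not is_roughly_horizontal(stroke):
                continue
            stroke_y = (stroke[0][1] + stroke[1][1]) / 2  # average height of the stroke
            near_bottom = abs(stroke_y - bottom) <= max_gap
            if near_bottom and horizontal_overlap(word, stroke) >= min_overlap:
                found.append(text)
                break  # one matching stroke is enough for this word
    return found
```

With these defaults, a stroke from (9, 112) to (41, 113) marks a word whose box spans x = 10 to 40 with its bottom edge at y = 110, while a vertical stroke, or a flat stroke under a neighboring word, is ignored. Every threshold here is a judgment call that has to be checked against real pages, which is why this part of the code could not be treated as a black box.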

Another advantage software folks have when leveraging LLMs is the established rigor of software development. LLMs can and do make mistakes, but so do people. Our field has been built around reducing these mistakes’ impact and frequency. Knowing to use version control helps turn the pathological downward spiral of accumulating incorrect tweaks into monotonic, step-wise improvements. Knowing how to construct a test suite and think about edge cases can give an LLM agent the grounding it needs to iterate rapidly and safely.

For these reasons, I think the dream of personal software is far from being realized for the general public. Without the foundation of experience and rigor, LLM-driven development can easily devolve into a frustrating and endless back-and-forth, or worse, successfully build software that is subtly and convincingly wrong.

The Shoulders of Giants

The only reason all of this was possible is that the authors of PyMuPDF, genanki, spaCy, and argostranslate made them available for me to use from my code. These libraries provided the bulk of the functionality that Codex and I were able to glue into a final product. It would be a mistake to forget this, and to mistake the sustained, thoughtful efforts of the people behind these projects for the one-off, hyper-specific software I’ve been talking about.

We need these packages, and others like them, to provide a foundation for the things we build. They bring stability, reuse, and the sort of cohesion that is not possible through an amalgamation of home-grown personal scripts. In my view, something like spaCy is to my flashcard script as a brick is to grout. There is a fundamental difference.

I don’t know how LLMs will integrate into the future of large-scale software development. The discipline becomes something else entirely when the constraints of “personal software” I floated above cease to apply. Though LLMs can still enable doing what was previously too difficult, tedious, or time-consuming (like my little ‘underline visualizer’), it remains to be seen how to integrate this new ease into the software lifecycle without threatening its future.

Daniel's Blog