Thinking about things is good, actually

I’m going to try to convince you that LLM-based developer tooling (in its current state) is probably bad for your code base.

Because this is a rant, I’m gonna start in a really weird way by talking about my favorite budgeting software, YNAB. I promise there’s a semblance of a train of thought here.

Caveats are coming, but let’s start with budgeting.

YNAB is software that helps you practice “zero-based budgeting”. This practice is often called the “envelope method”. Basically, you make a list of all the stuff you expect or intend to spend money on. Next, you take all the cash in all your different bank accounts, and you assign each dollar to one of those specific jobs.

When you want to spend some money, you look at your list to see if you already have some set aside. If you do, great. If you don’t, but you still want to spend the money, you have to move money from another category. It forces you to look at your other options for how you could spend that money and then make a specific trade-off.
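If it helps to see that mechanic spelled out, here’s a toy sketch in Python. The categories, amounts, and function names are all made up for illustration; this is obviously not how YNAB actually works under the hood.

```python
# A toy model of the envelope mechanic, purely for illustration.
# Category names and amounts are invented; YNAB does not work like this internally.
envelopes = {"rent": 1200, "groceries": 400, "eating_out": 120, "new_bike": 300}

def spend(category, amount):
    """Spend from a category only if the money was already assigned to that job."""
    if envelopes[category] >= amount:
        envelopes[category] -= amount
        return True
    return False  # not enough set aside; you have to move money first

def move(from_category, to_category, amount):
    """The forced trade-off: funding one job means explicitly defunding another."""
    if envelopes[from_category] < amount:
        raise ValueError("that envelope can't cover it either")
    envelopes[from_category] -= amount
    envelopes[to_category] += amount

# Want a $200 dinner out? The software makes you answer: at the expense of what?
if not spend("eating_out", 200):
    move("new_bike", "eating_out", 80)  # consciously defund the bike fund
    spend("eating_out", 200)
```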

That’s the basic superpower of the software. After a while, it leads to a confidence that you’re spending your money in a way you actually want and not accidentally de-funding something more important. Their learning/marketing material (which is fabulously informative, BTW) calls the feeling “spendfulness”, which I find shockingly accurate as a description, if a little corny.

The rest of the functionality in the software is either in service of making this process less laborious, so you can just make a choice and go about your day, or in service of providing ways to reflect on the outcome of these choices in the long run. Crucially, though, they never automate the part where you make the choice.

The main criticism typically leveled against YNAB is that it’s too laborious. It encourages the user to fixate on their spending when they could probably benefit as much from automating a set of financial rules of thumb. Auto-deposit paychecks into different accounts following the 50/30/20 rule, set all your bills to autopay, and as long as you’re paying off your credit cards on time, everything will be okay. I’m not setting this up as a straw man, BTW; here’s the first pros-and-cons-of-YNAB article I could find. If you scroll to the bottom, the alternatives it suggests are the 50/30/20 approach and two budgeting apps that lean much more heavily into the automated, “check in every once in a while” approach to personal finance.

YNAB’s response to this criticism has been pretty consistent. You do an awful lot to earn money. You give your time (like 1/3 of your time on Earth), your focus, your health. It’s worth taking the time to make thoughtful, personal decisions about how you spend it.

Having tried both approaches, I’m pretty sold on YNAB’s argument. The interesting thing to me though is that, as far as I can tell, most people aren’t. I think this probably has to do with a practical, even admirable, bias for simplicity. If something is “set and forget”, it frees up vital headspace for other things in life. In other words, most people think of managing money as a chore deserving of automation. As long as the end result is “bank account go up”, they don’t want to fuss with it any further. And honestly? Fair enough. The reason I disagree is that I think “Am I spending in a way that jeopardizes my larger financial health?” and “Am I spending in a way that reflects what I actually value?” are different questions, and the set and forget approach to personal finance only really answers the first one.

In other words, in my opinion, if you can manage to create time for it, thinking about things is good, actually.

Right about now, the LLM boosters reading this are positively seething because it sounds like I’m accusing them of neglecting to apply the appropriate amount of thought to their work, and of encouraging the same thoughtlessness throughout their organizations. Just to be completely and totally clear: that is precisely what I’m saying.

It doesn’t mean I don’t like you, dear LLM booster friends, relatives, and colleagues. It just means I think you’re being willfully short-sighted.

So let’s talk about LLM-based developer tooling. For anybody wondering what the hell I’m talking about, LLMs are “Large Language Models”, like ChatGPT, Claude, and Mecha Hitler. Over the past few years, these tools have been increasingly used to generate code, and have been packaged in formats designed to make that more practical, like Claude Code.

Note: I’ll try to be specific about the technology I’m talking about here. Colloquial use of the term “AI” generally means “the thing I’m selling”. Often, those things are really dope. Machine learning? Computer vision? Incredibly useful. What I’m specifically not so keen on is large language models and the way they’re being applied to everything.

I also won’t be addressing the serious ethical issues with these tools, like how they were built on the mass theft of intellectual property, or that we’re burning rainforests and increasing everybody’s power bills to power them, or how they’re going to destroy our retirement accounts by being so damn overhyped. I’m mostly talking about their general usefulness (or lack thereof) here.

I’ve been using these tools a lot recently, and not just because dog-fooding (a kinda gross engineering term for “be your own customer”) LLM-based tooling has been a stated requirement of my job for a while now. The way these tools get used ranges from the training wheels version, basically using the LLM as a roundabout way to type whatever you planned to type in the first place, to becoming a full-on puppet master of multiple “autonomous” agents (spoiler alert: they’re not actually autonomous).

The branding of LLMs is designed to make them sound intelligent. Reasoning models, autonomous agents, etc. Even the term for their tendency to produce outright nonsense, hallucinating, is meant to make it sound like the way they’re processing information is in some way equivalent to how a person does. It is absolutely nothing like what a person does.

This is a very non-technical description, but we have to strike a balance between accessibility and specificity here: LLMs are digital kaleidoscopes. They have billions of little mirrors for refracting their input, carefully tuned to produce whatever results the entity training the model values. But the output is fundamentally a mash-up of the original inputs provided during training and whatever extra input is provided in the series of prompts it references. The output can be really impressive, even fooling us into thinking it’s art. But there’s no reasoning happening, no thought, no intention. Just weights and scales.

Maybe a better metaphor is that we combined a magic eight ball with an Oura ring (those rings that read various vital signs to help you monitor different aspects of your health). The die floating inside the magic eight ball has been specifically shaved and weighted based on the non-consensual beta testing of billions of users. Your vital signs can even be read from the skin on your hands when you shake it. That data is used to send a current through the liquid the die is floating in, further influencing the probabilistic outcome of a single shake.

For the love of God don’t actually build this; it’s a dystopian metaphor, not a pitch for a startup.

And we take this admittedly impressive trinket and promptly insert it into every aspect of our decision making and communication. We rent access to ten or twenty magic eight balls, and automate the process of shaking them whenever we need to read and respond to an email or understand and attempt to fix a bug. The underlying assertion here is that “It’s not worth my time to read that email or understand that bug, and this broken clock is right an awful lot more often than twice a day, so maybe let’s just use that”.

My counter to this patently insane state of affairs is that thinking about things is good, actually.

Let’s narrow in on programming. Human generated code exists on a spectrum as well, from “if it fits it ships” to “a thoughtful and considered approach”. I’ve generally tended toward the former, but I have (or had, I should say) this coworker who tends strongly toward the latter. Let’s call him Joseph, but that’s not his name.

Joseph does something with remarkable consistency that sounds like the absolute basics but is rare in practice: he creates decision records. Whenever he makes a decision (in his code) that isn’t glaringly obvious, he writes it down, usually as a comment in the code or on the pull request (request to make a change to the code base, sometimes called a merge request depending on your world view). He even writes down why he made that decision. He often writes down a second option he considered, and why he didn’t choose that one.
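To make that concrete, a decision record can be as small as a comment. Something like the following, which is a completely made-up example (not Joseph’s actual code; every name and detail in it is invented):

```python
# Decision: retry the export with exponential backoff instead of failing fast.
# Why: the upstream API rate-limits in bursts, and most failures clear within a
# few seconds, so retrying avoids paging someone at 3am for a transient blip.
# Alternative considered: queue failed exports and replay them on a schedule.
# Rejected because it adds a new moving part (the queue and its monitoring) for
# a failure mode we've only seen a handful of times.
def export_report(report, max_attempts=5):
    ...
```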

To a non-programmer, I imagine this sounds really trivial, but as an industry we’ve spent decades being told that writing down our intentions in plain English is bad practice. We’re also told that long-term planning or any thoughtful design is for fools and leads to failed projects. We need to be agile, to move fast and break things. We work in sprints, we find quick wins and 80/20 solutions, we get feedback and iterate in tight loops.

But Joseph trudges on with this outdated view of engineering, that thinking about things is good, actually.

And this decision record structure is a feature of nearly every one of his PRs (pull requests). The first time I reviewed his code, I immediately thought to myself “holy shit I wish I were more like that”.

The reason this is so incredible is because of something called Chesterton’s Fence. Instead of explaining the thought experiment, I’ll just explain how engineers slowly learn to practice it, even if they don’t have this particular name for it in their head.

When you first start out as a software engineer, you see a chunk of code and think to yourself, “That obviously doesn’t need to be there. I can clean that up no problem.” So you start refactoring (rewriting code without changing its main functionality, like editing a piece of writing without changing the thesis) until you hit a weird edge case. You make a small concession, and continue rewriting stuff. After roughly three hours, you arrive back at the exact code that was there in the first place, with a hard-won understanding of why it was written that way. You do this with different bits of code in different projects maybe a few thousand times over the course of several years, and hardly ever think to make a note about it for the next poor sap to come along.
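The note hardly anyone writes looks something like this. It’s an invented example (the function and the backstory are hypothetical), but it’s the shape of the thing: a check that looks pointless until you learn the why behind it.

```python
def average_order_value(orders):
    # Looks redundant: sum() over an empty list is fine, so why special-case it?
    # Because dividing by len(orders) isn't. Finance's monthly import sends us
    # empty batches, and returning 0 here (instead of raising ZeroDivisionError)
    # is what keeps their dashboard from silently dropping the whole report.
    if not orders:
        return 0
    return sum(o.total for o in orders) / len(orders)
```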

Joseph’s decision records break the cycle. They prevent optimistic but ultimately foolish engineers like me from wasting valuable time. (FYI, Joseph was restructured too, if any recruiters out there would like to talk to him).

Folks who love using LLMs in their workflow are shouting “LLMs make this easier! You can just tell it to always explain its thought process!!”. There is no thought process, sport. I’m genuinely sorry you had to find out this way. Also, I didn’t really steal your nose. It was just my thumb. I really didn’t think it would upset you so much.

When you have an LLM generate code for you, you can’t write a decision record. The reason is (and I can’t emphasize this enough) by using an LLM, you have decided not to make a decision at all, but rather to do whatever the magic eight ball said. Even if the magic eight ball said to do the same thing you would have decided on! You skipped the thinking portion, and now you have no idea if it’s actually a good solution or not, or if there’s a much better solution the LLM didn’t generate for some reason. So, if you’re a good engineer like Joseph, you do what you were always going to have to do in the first place, and think about it. Or, if you don’t feel up to that, because maybe this particular piece of code isn’t all that critical and the tests are green, or maybe you’re kinda tired today and it’s almost lunch, you ship it and move on.

When the code base starts to fill up with LLM generated code, you’re in a real pickle. Let’s pretend the code it generated, through the magic of statistics and mass theft, is really properly good. It appears well thought out, and is organized coherently at the small, medium, and large scale. (This is, for the record, almost certainly not the case). You still have a massive problem. You come across a piece of code, and think, “That obviously doesn’t need to be there. I can clean that up no problem.” But you’ve been around the block. You don’t dive in and start ripping out code. You try to figure out why it’s there. But there is no why. Not really. There’s the way it happens to behave. But there’s no true intention. Nobody sat down and thought about why those characters should be written in that order in that place. It’s just what the magic eight ball said to do.

To my mind, this is maybe the most insidious aspect of these tools. They’re clarity killers. At scale, they make it functionally impossible to understand why the hell your code base is shaped that way, because there is no why.

The LLM boosters are really frothing at the mouth now. They’re screaming about how great engineers are focused on writing clear architectural docs and feeding those to agents that can do the grunt work of writing the code, but that’s not how expertise works. An architect who’s never held a brick is probably a shitty architect, and laying that brick properly is just as important to the structural integrity of the building as the larger design.

I realize I’m coming off as pretty “old man yells at cloud” coded right now, and that’s a fairly accurate description of my general vibe these days. I have a flip phone, do not participate in any social media, and say shit like “we don’t wonder things anymore, we google them.”

But thoughtlessness is attacking me on all fronts, not just by taking over my industry. The other day in a pottery class I was taking with my girlfriend, a young woman chose to ask ChatGPT what colors of glaze to use on her bowls, despite having access to all the glazes and pictures of how different combinations of those glazes looked once fired. She couldn’t be bothered to think about which combination she found pretty. (I don’t think ChatGPT ended up giving her a satisfying result, so she was sadly forced to have a lovely time expressing herself.) My aforementioned partner has a day job as a receptionist, and when her coworker came in one morning to discover the computer was off, that coworker chose to ask ChatGPT how to turn it on.

What the actual fuck is going on? When did “thinking about things is good, actually” become a combative hot take? I know I probably shouldn’t have made the steal-your-nose joke, but it was sitting right there, and this is a rant, not a piece of professional writing.

The character Treebeard has a quote in The Lord of the Rings:

… (Old Entish, the tree language) is a lovely language, but it takes a very long time to say anything in it, because we do not say anything in it, unless it is worth taking a long time to say, and to listen to.

Ron Swanson says basically the same thing in Parks and Rec:

Never half-ass two things. Whole-ass one thing.

Why is the “future of work” about machines that make it easier to half-ass parts of my job?

If you don’t care enough about what you’re writing, whether it be an email, some copy, or some code, to bother actually writing it yourself, maybe just don’t. If you don’t care enough, why the hell should I care enough to read or review what the LLM spat out? It’s disrespectful of your peers’ time.

If you keep hearing how LLMs are transforming the workplace, and you can’t get them to work well, it is not your fault. You’re not “holding them wrong”. These people are not actually 10x’ing their productivity, just their carbon footprint and the bullshit their colleagues are putting up with.

If you’re being told that the “top performers” or the “best engineers” are actively experimenting with LLMs to keep from being left behind, don’t listen. When someone is selling you something, it’s their job to provide clear and valuable use cases, not your job to find some for them. I don’t have to “keep tinkering and experimenting” to understand why a microwave is useful. It fucking heats stuff. Which is mostly what LLMs are good for as far as I can tell.

In summary, to steal a joke from Rich Hickey, this is a rant, there is no summary.