The adoption gap

I was skeptical during GPT-3.5 and GPT-4. Chat in a sidebar was autocomplete with extra steps — useful for a paragraph here or a quick lookup, not useful for the actual work of building and running software. I kept waiting for the thing everyone else kept claiming was already true, and it did not arrive. What arrived instead, about a year later, was a different primitive: agentic coding loops, where the model drives a small set of tools — read a file, edit a file, run a test, read the error — instead of just producing text. That is the version I have changed my tune on, and it is the version the rest of this post is about.

Agentic coding is a compounding advantage for developers who already know the shape of the work, and an information overload for the ones who do not — and the gap that creates is going to widen before it narrows. I do not think my own work would be nearly as efficient if I were starting to code right now. The tool is a force multiplier on an existing mental model; the mental model is what tells you when the agent is confidently wrong, and the mental model is the expensive ingredient, not the model.

The overall shape

Almost everything currently on this blog was built with an agent sitting next to me. The WorkWear4All Magento 2 cluster is the day-job stage — a multi-store backend old enough to have grown a long tail of operational edges, where agentic coding tools are how I move through a codebase neither I nor any of the juniors wrote most of. Hourly is a menu-bar macOS time tracker in SwiftUI, and SwiftUI is a stack I came to fresh for that project. mijndiamondpainting.nl was a Dutch webshop I launched and shut down, and the entire frontend of that shop — Cropper.js upload flow, admin dashboards, the state-machine order funnel — is work I would never have produced at that pace by myself. The AI-agent experiments I am reworking sit adjacent; the niche-hunt pipeline I wrote up separately is another one. The site you are reading is itself an Express/EJS/Tailwind app built the same way.

In none of those cases is the agent producing a finished system on its own. It is producing the scaffold, the first honest draft, the parts I would otherwise have spent the most time on and enjoyed the least. What it is not producing is the judgement about whether the scaffold is the right shape in the first place, or whether the first draft is sitting on top of a quiet bug. That judgement is mine, and it is what has to improve as the agent gets faster.

Why the skeptic flipped

The GPT-3.5 and GPT-4 chat era failed for me on a specific axis: it could produce plausible-looking code but it could not do the work of running it, reading the error, and trying again. Every iteration required me to be the hands — paste into the editor, run, paste the error back into the chat, read, paste. The model was fast but the loop was slow, and the loop is the thing that matters on real work. What felt different about agentic loops was that the tool layer moved inside the loop. The model edits a file, runs the test, reads the stack trace, edits again. I am still the reviewer, but I am not also the courier.
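For concreteness, the loop I mean has roughly the following shape. This is a minimal TypeScript sketch, not any vendor's API: callModel stands in for however the harness reaches the model, and the tool set is deliberately tiny.

```typescript
// Minimal sketch of an agentic coding loop. Nothing here is a real
// vendor API: callModel is a stand-in for however you reach the model,
// and the tools are deliberately small — read, edit, test, done.
import { execSync } from "node:child_process";
import { readFileSync, writeFileSync } from "node:fs";

type Action =
  | { kind: "read"; path: string }
  | { kind: "edit"; path: string; content: string }
  | { kind: "run_tests" }
  | { kind: "done" };

type Model = (transcript: string[]) => Action;

function runTests(): string {
  try {
    // The test command is the feedback channel; stdout and stderr both matter.
    return execSync("npm test", { encoding: "utf8", stdio: "pipe" });
  } catch (err: any) {
    return `${err.stdout ?? ""}${err.stderr ?? ""}`; // the stack trace the model reads
  }
}

function agentLoop(task: string, callModel: Model, maxSteps = 25): void {
  const transcript: string[] = [task];
  for (let step = 0; step < maxSteps; step++) {
    const action = callModel(transcript);
    switch (action.kind) {
      case "read":
        transcript.push(readFileSync(action.path, "utf8"));
        break;
      case "edit":
        writeFileSync(action.path, action.content);
        transcript.push(`edited ${action.path}`);
        break;
      case "run_tests":
        transcript.push(runTests()); // edit, run, read the error, edit again
        break;
      case "done":
        return; // the human reviews from here
    }
  }
}
```

The courier work — paste, run, paste the error back — is the part that moved inside the function. That is the whole difference.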

The caveat I keep coming back to is that current agent subscriptions are almost certainly subsidised against the true inference cost of the frontier models behind them. The bill is real somewhere — the question is whether the price the individual developer sees is the price the economics of the thing eventually settles at. I do not think it is. The only version of this story that stays structurally healthy is one where small, locally running models close enough of the capability gap to run the hot path cheaply, and recent small-model progress has been enough that I am taking that bet seriously. I wrote up the shape of that bet on a different project already. The short version is: treat the model as one caller in a system, build good tools for it to use, and the difference between a frontier-scale model and a smaller one narrows faster than you would expect.

Where the agent earns its keep

Frontend design is the first place. I am not good at it, I am never happy with the result, and the time I used to sink into getting something presentable was enormous. On mijndiamondpainting.nl the entire public shop — the order funnel across six URLs, the photo upload with Cropper.js, the admin with Chart.js dashboards, a Tailwind layer I did not have to hand-tune, abandoned-cart email templates — is work I would have either skipped or delivered at a visibly worse quality bar on my own. Not because any one piece is hard, but because I would have given up on iteration four and shipped something ugly. With an agent, a clear sense of what I was trying to achieve, and a few examples to anchor the aesthetic, frontend work alone has saved me hundreds of hours. That number is an estimate, not a measurement, but it is the right order of magnitude for a year of building the way I have been building.

Scaffolding a new project is the second. I never created boilerplate templates of my own, and I deliberately pick a different framework every so often because the constraint is interesting and I do not want to calcify into one stack. That combination used to mean a long, boring first week of every project where nothing was happening except copy-pasting from the last thing. Now the scaffold arrives in an afternoon, correct enough to start having opinions about. The opinions are the part I wanted to spend my time on in the first place.

Learning new things is the third, and Hourly is the instance where this matters most. SwiftUI, AppKit menu-bar glue, the Mac App Store build pipeline, code signing — the whole cliff that sits in front of anyone new to Apple-platform development. I have shipped enough software in enough languages that the concepts translate, but the specific incantations do not, and the Apple ecosystem is the kind of place where the specific incantation is the thing that blocks you for a day and a half. Being able to generate working code from the get-go — often one-shot, sometimes after a round of correction — is what made launching Hourly actually happen on a timeline I would entertain on a side project. Without it, I would probably still be stuck on some signing-configuration dialog box.

Each of those has a cost next to the saving. Generated frontend is rarely rigorous on the accessibility or performance axes unless I go back and audit it, and I have to remember to go back. Scaffolding in a framework I have not used much is fast but it can quietly commit me to choices I did not realise were choices — the one I caught most recently was a packaging convention I would have done differently if I had seen it earlier, and would not have realised at all if the agent had not run into it a second time and surfaced it. Learning-while-shipping means I am accumulating code I understand less well than code I wrote myself, which is fine until it isn’t.

Why background matters more now, not less

The inequality claim is the one I want to actually defend. The agent is a force multiplier, and force multipliers multiply whatever they are pointed at. If you know what “good” looks like in the stack — the conventions, the framework’s idioms, the shape of a healthy codebase, the kind of error that usually means the thing it says it means and the kind that usually means something else — the agent accelerates you because it generates, and you catch and correct. The feedback loop is fast because you are fast.

If you do not yet know what “good” looks like — because you are starting out, or because the stack is genuinely new to you and you have not built up the taste yet — the same tool is an information overload. It will happily produce twenty lines of code that reference four conventions you do not recognise, call three utilities you have never heard of, and sit on top of an architecture decision you are not ready to argue with. You cannot catch the hallucinations because you cannot smell them yet, and you cannot argue with the structural choices because you have not had the experience that would tell you to argue. What you can do is ship the thing, which is the problem. Confidently wrong code shipped by somebody who does not yet know it is confidently wrong is the failure mode, and it is the failure mode the tool makes easier, not harder.

I am not writing this to be dismissive of people starting out. I am writing it because I remember what it was like to not know what “good” looked like, and I know what it would have felt like to be handed a tool that produced plausible-looking code faster than I could form an opinion on it. The honest reading is that this is going to be harder to learn on now, not easier, and the gap between people who learned the slow way and people who will try to learn the fast way is going to take a while to close — if it closes at all. The force multiplier on background applies to the top of the curve more than it applies to the bottom.

The honest caveats

I have experienced some mental fatigue I did not expect. The agent moves fast enough that it forces me to make decisions about the next step more quickly than I am used to, and the cumulative effect across a day of dense work is real. It is a different kind of tired from writing code by hand — less typing, more judgement per minute, and judgement is the expensive resource. It is especially bad at the start of a project with a lot of unknowns, or on an existing codebase with a lot of dependencies built into the shape of it.

I have had the agent break perfectly working code in exactly the kind of place the context around WorkWear4All warned me it would. Magento 2 is not a codebase where a module edits cleanly in isolation; the issues you hit are frequently the kind that are hard to troubleshoot without reaching into adjacent components of the stack to rule them out. An agent that confidently edits a module it has half-understood, without running the indexers or watching what the cache layer does, is very good at introducing a silent downstream break that does not show up until an admin user notices a product price is wrong two days later. That has happened to me. The diagnosis was a nightmare because the obvious place to look was not the place the problem had come from, and I had stopped paying close attention to the original change by the time the symptom surfaced.

Hallucinations are still real. I think part of that is a skill issue, in the sense that if you give the agent a vague task with poor context, you will get vague output with poor accuracy, and the same would be true of a junior developer handed the same ticket. The fix there is prompting and context, and that is the part of the work I am still actively getting better at. The part that is not a skill issue is the failure mode where the agent is confident and wrong about something specific — a library API shape, a framework’s behaviour in a corner case, a syntax detail — because the thing it is confident about is what it saw a lot of during training, not what the library actually does in the version I am on. Those are the ones you have to catch with your own knowledge of the stack, which brings the argument back to the main thesis.

The failure mode I am least proud of is blind trust at the end of a tired day. When I have already produced a lot of features, refined them, and just want to close out the last thing before stopping, I am more likely to accept what the agent produced without reading it closely enough. That is the opposite of what I should do — that is exactly when I should be most careful — but it is what happens in practice. I have been thinking about setting up a workflow where the agent builds verification into the code it produces by default, rather than as a thing I remember to ask for. It already does this most of the time for the obvious surfaces, but in a complex application one change can cause drift in a distant part of the system, and the model does not always test for that. The workflow does not exist yet. It is a real gap.

Where the bet goes next

The direction I think this goes, and the direction my own work is already turning towards, is small local models running on the hot path with good tools around them — the operator using the agent as one caller among many, rather than as an autonomous reasoner trying to carry the whole task. That reframes the problem from “how do I pick a smarter model” to “what does the model need to see, from where, and in what shape” — which is a software-engineering problem, and a much more tractable one than the scaling one. I have written about that reframing in the context of an AI project I built on exactly that bet; the same bet applies to the operator’s own workflow, not just to agents they build for other people.
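A sketch of what that reframing looks like on the workflow side, with hypothetical names throughout since the point is the shape, not an implementation: the caller does the gathering, and the model gets a narrow, honest slice instead of the repository and a prayer.

```typescript
// Hypothetical context assembly for a small local model. The engineering
// question is what the model needs to see, from where, and in what shape,
// so the caller assembles the slice rather than handing over everything.
import { readFileSync } from "node:fs";

interface TaskContext {
  task: string;                   // the job, stated tightly
  files: Record<string, string>;  // only the files the task actually touches
  lastFailure?: string;           // the most recent error output, verbatim
}

function buildContext(task: string, paths: string[], lastFailure?: string): TaskContext {
  const files: Record<string, string> = {};
  for (const path of paths) {
    files[path] = readFileSync(path, "utf8");
  }
  return { task, files, lastFailure };
}

// A smaller model given this slice competes with a larger model given
// everything, because most of what "everything" contains is distraction.
```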

The concrete next step on my side is the verification workflow. Something lightweight enough to run automatically on every meaningful change the agent proposes, mechanical enough that I trust it at the end of a long day even when I would not trust myself to read the diff carefully. That is the thing I want to put between me and the “accept everything at 23:00” failure mode.
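What I have in mind is mechanical in the most literal sense. Something like the sketch below, where the check commands are placeholders for whatever a given project actually runs:

```typescript
// Hypothetical verification gate: fixed, mechanical checks that run on
// every change the agent proposes. The commands are placeholders for
// whatever the project really uses; the point is that acceptance at
// 23:00 depends on the gate, not on how carefully I read the diff.
import { execSync } from "node:child_process";

interface CheckResult {
  command: string;
  ok: boolean;
  output: string;
}

const checks = ["npx tsc --noEmit", "npm test"];

function runGate(): CheckResult[] {
  return checks.map((command) => {
    try {
      const output = execSync(command, { encoding: "utf8", stdio: "pipe" });
      return { command, ok: true, output };
    } catch (err: any) {
      return { command, ok: false, output: `${err.stdout ?? ""}${err.stderr ?? ""}` };
    }
  });
}

// Refuse the change unless every check passes.
function acceptChange(): boolean {
  const results = runGate();
  for (const failed of results.filter((r) => !r.ok)) {
    console.error(`gate failed: ${failed.command}\n${failed.output}`);
  }
  return results.every((r) => r.ok);
}
```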

The part that generalises

The principle I keep coming back to on this, across the agents I build and the agents I use to build things, is that the tools are the design surface, not the model. Once the model is fixed — you picked your provider, you chose your size, you are not going to fine-tune it on a weekend — everything left to design is the set of affordances the model sees when it sits down to do the task. The same is true when the “agent” is the one editing your own codebase. The design question is what context you give it, what verification it has access to, what it is allowed to touch on its own authority, and what it has to hand back to you. The model is the constant in that equation. Everything else is the work.
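Written down as a data structure, again with hypothetical names, the design surface is small enough to fit on one screen:

```typescript
// The design surface as a data structure, hypothetical names throughout:
// what the agent sees, what it may touch on its own authority, how its
// work is verified, and when it has to hand back to a human.
interface AgentAffordances {
  contextSources: string[];   // globs the agent may read
  writeAllowlist: string[];   // globs the agent may edit unprompted
  verification: string[];     // commands that must pass before handback
  handBackWhen: (changedFiles: string[]) => boolean; // when a human reviews first
}

// Example: read source and tests, edit only source, pass the test suite,
// and hand anything touching migrations or payment code back to me.
const affordances: AgentAffordances = {
  contextSources: ["src/**/*.ts", "test/**/*.ts"],
  writeAllowlist: ["src/**/*.ts"],
  verification: ["npm test"],
  handBackWhen: (changedFiles) =>
    changedFiles.some((f) => f.includes("migrations/") || f.includes("payment")),
};
```

None of that requires a smarter model. All of it requires decisions, which is the point.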

The gap I described earlier — between people who already know the shape of the work and people who do not yet — is uncomfortable to write about honestly, because neither direction of it is flattering to say out loud. But the work I have actually done this year is the evidence for it, and the evidence is strong enough that I would rather be honest about it in writing than pretend the tool lifts everyone equally. It does not. It lifts the people who were already close to the ceiling, and it leaves the people who were still building the ladder holding a faster-moving tool than the ladder itself.
