<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.petervanonselen.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.petervanonselen.com/" rel="alternate" type="text/html" /><updated>2026-04-16T14:04:23+00:00</updated><id>https://www.petervanonselen.com/feed.xml</id><title type="html">Peter van Onselen — Staff Engineering &amp;amp; AI</title><subtitle>A staff engineer figuring out AI-assisted development in public.</subtitle><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><entry><title type="html">Conscious Coverage</title><link href="https://www.petervanonselen.com/2026/04/16/we-dont-talk-about-coverage/" rel="alternate" type="text/html" title="Conscious Coverage" /><published>2026-04-16T08:00:00+00:00</published><updated>2026-04-16T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/04/16/we-dont-talk-about-coverage</id><content type="html" xml:base="https://www.petervanonselen.com/2026/04/16/we-dont-talk-about-coverage/"><![CDATA[<p><em>We don’t talk about code coverage, no no no, we don’t talk about coverage…</em></p>

<hr />

<p><img src="/assets/coverage.png" alt="code coverage matters" /></p>

<p>When I joined Cazoo, it was the first place I’d ever worked that explicitly, actively, aggressively embraced software craftsmanship. Pair programming. Test-driven development. Domain-driven design. Extreme programming. The whole kitchen sink. They sent us on agile training courses that a startup founder would weep at the cost of. We had an agile coach in the room every day. We did code katas regularly.</p>

<p>And even there, in the most craft-soaked environment I’d ever been in, the idea of 100% code coverage was treated as obvious lunacy. A poor metric. The kind of thing only someone who hadn’t really understood testing would chase.</p>

<p>Then I joined the Economist, and the team I landed on had 100% coverage as a hard rule.</p>

<p>They didn’t do TDD. They didn’t pair. They hadn’t been on the agile bootcamps. They hadn’t done code retreats or code katas. By every measure either the London or the Chicago school of the craftsmanship tradition would care about, they were doing less of the work. But they had the 100% rule, and they enforced it, and at first I assumed they’d inherited a metric without fully understanding it.</p>

<p>They hadn’t. Turns out I hadn’t understood it. And by the time I left that team, I’d come around entirely. Not reluctantly, not with caveats, but genuinely: 100% coverage, properly understood, is mandatory. I held that position for years before agentic coding was a thing anyone was thinking about. The agents haven’t changed my mind. They’ve just taken a position I already held and made the case for it screamingly, urgently obvious in a way it previously wasn’t.</p>

<h2 id="what-is-this-metric-thing-about-anyway">What is this metric thing about anyway?</h2>

<p>Here’s what I’d absorbed from the craft world about 100% coverage. It’s a vanity number. Chasing it produces garbage tests. You end up writing assertions against getters and setters. You exercise code without testing behaviour. The pragmatic position, and pragmatism was always the emphasis, is that you write the tests that matter and you let the rest go.</p>

<p>All of that is true if “100% coverage” means “every line has a test exercising it.” That version of the metric is genuinely silly and the people warning against it were right.</p>

<p>But it took me until very recently to notice that nobody, in all those arguments, had ever actually explained what the metric was <em>for</em>. What it was pointing at. Everyone, including me, was arguing about the number. Nobody was asking what the number was a proxy for.</p>

<p>It’s a proxy for <strong>Conscious Coverage</strong>. That’s the thing. Every line in the codebase is a decision. The question the metric is actually asking, underneath, is: <em>have you made a conscious decision about each one?</em> Not have you tested each one. Have you <em>decided</em> about each one. Tested, or consciously chosen not to test, with a reason, written down.</p>

<p>Concretely, it looks like this. You write a function with a branch that handles a malformed input. You run the coverage tool. It tells you the error branch isn’t covered. You now have three choices, and only three.</p>

<ol>
  <li>You can write a test that exercises the malformed input and asserts the behaviour.</li>
  <li>You can mark the branch ignored with a comment that says, say, “unreachable because upstream validation guarantees this shape” — and now your justification is a reviewable artefact that someone can argue with in a pull request.</li>
  <li>Or you can decide the branch shouldn’t exist at all and delete it.</li>
</ol>

<p>What you cannot do is shrug and move on. The forgotten case is no longer a thing. Every line has had a decision made about it, and the decisions are legible.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="p">...</span>
  <span class="k">static</span> <span class="nx">countryToRegion</span><span class="p">(</span><span class="nx">countryCode</span><span class="p">:</span> <span class="kr">string</span><span class="p">):</span> <span class="nx">Region</span> <span class="p">{</span>
    <span class="cm">/* v8 ignore start */</span> <span class="c1">// Ignoring the switch to avoid repeating every single country code</span>
    <span class="k">switch</span> <span class="p">(</span><span class="nx">countryCode</span><span class="p">)</span> <span class="p">{</span>
  <span class="p">...</span>
</code></pre></div></div>

<p>Once you see that, the version of the rule that the craft world rejected and the version the Economist team was running are obviously different things. The first one optimises for a number. The second one optimises for <em>the absence of accidents</em>. You can no longer fail to test something because you forgot. You can fail to test it because you decided not to, and you wrote down why, and someone can argue with you about it later in the review. The shape of the work is different.</p>

<p>And this is the bit I have to be honest about, because the post doesn’t work without it. Once the metric is framed as conscious coverage, the pragmatic position I’d absorbed at Cazoo stops being pragmatic. It’s just laziness with a vocabulary. “Write the tests that matter and let the rest go” sounds wise until you ask which lines, specifically, didn’t matter, and why, and the answer turns out to be that I didn’t want to write those tests and the tradition had given me a way to sound rigorous about not writing them. The metric wasn’t too expensive. The work it pointed to wasn’t too expensive. I just didn’t want to do it, and nobody was making me, and the craft vocabulary let me call that a considered trade-off.</p>

<p>I had to be in a place that just <em>did</em> it before I could see any of this. Sitting at Cazoo arguing about it from first principles, I would have lost the argument every time, because the version of the rule I was arguing against was the version everyone agrees is bad, and the version underneath it, the one about conscious decisions, nobody had ever put into words for me. Nobody tells you the better version exists until you’re standing inside a codebase that runs on it.</p>

<h2 id="what-changes-when-an-agent-is-doing-the-writing">What changes when an agent is doing the writing</h2>

<p>Fast forward. I’m now writing a lot of code with agents. Claude Code, Codex, OpenCode, the usual suspects. The thing I keep telling people who ask me about it is that agentic engineering requires <em>more</em> discipline than normal engineering, not less. The tools are faster, the output is bigger, and the gaps between what you asked for and what you got are easier to miss. So everything that used to depend on careful human attention now depends on something else holding the line. Which brings me back to the question: how do I know it’s done? And more importantly, how does an agent know?</p>

<p>Not “done” in the user-acceptance sense. Done in the much more boring sense of: has this thing actually exercised the code it claims to have written? Has it tested the behaviour I care about? Did it quietly skip a branch because the test was annoying to set up? Did it write something that’s technically passing but structurally untestable?</p>

<p>These are the questions the craftsmanship tradition spent twenty years building intuitions about, and the answer the tradition arrived at, pragmatically, contextually, with appropriate caveats, was mostly “you’ll know it when you see it, and pairing helps, and code review helps, and time helps.” Which is fine when humans are doing the work at human pace. It is not fine when an agent has just produced four hundred lines in ninety seconds and is asking what to do next.</p>

<p>The agent needs a guard rail. Something machine-checkable. Something it can run, get a number from, and decide for itself whether to keep going. Something another agent can validate.</p>

<p>100% coverage, in the conscious sense, turns out to be exactly that. The agent finishes its loop, runs the coverage tool, sees 98%, and knows, without me telling it, that there are two percent of decisions it hasn’t made yet. Either write the test, or mark the lines as ignored with a justification. Both are fine. What’s not fine is leaving the gap.</p>
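<p>Most coverage tools can enforce that gate mechanically. Here’s a minimal sketch of what it might look like in a Vitest config with the v8 provider — an assumption on my part, so adjust for whatever tool and provider you actually run:</p>

```typescript
// vitest.config.ts — hedged sketch, not a drop-in file.
// With thresholds at 100, any line that is neither exercised nor
// explicitly marked ignored (with a written reason) fails the run.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 100,
        branches: 100,
        functions: 100,
        statements: 100,
      },
    },
  },
})
```

<p>The tool doesn’t matter. What matters is that the gate is machine-checkable, so the agent can run it and decide for itself whether it’s done.</p>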

<p>And here is where the impact of the reframe gets outsized, because the agent doesn’t have my laziness. The agent doesn’t want to go home. The agent isn’t quietly negotiating with itself about which lines it can get away with skipping. The thing that was always standing between me and conscious coverage, which was me, just isn’t there. The metric stops being a rod I have to hold myself to and becomes a rod the agent holds itself to, cheerfully, at four in the morning, forever. The practice the craft tradition argued about most fiercely for human reasons becomes, for agents, the most natural thing in the world.</p>

<p>I’ve started using this as one of my standard acceptance criteria. “You are done when coverage reports 100%.” I can kick off a thirty-minute task and come back to something that, whatever else is true of it, will at least be testable, and will at least have had every line consciously decided about.</p>

<p>Coverage as the gate at the end works better when there’s a process upstream that’s likely to produce decent tests in the first place. If you set up the harness with CLAUDE.md files that push the agent toward red-green-refactor TDD, and you give it the kind of structured prompting (like obra/superpowers) that shapes how it actually approaches a task, you tilt the odds. There’s no guarantee it’ll write tests first. There’s a much better chance it will, and a much better chance the tests it writes are pulling the design rather than chasing it. That upstream tilt plus the downstream gate is a much sturdier system than either piece on its own.</p>

<p>There’s a sharpening of all this that matters, though, because coverage on its own can still produce tests that exercise code without actually testing anything. The companion practice, and I’d say it’s a necessary one rather than a complementary one, is writing tests outside-in, from behaviour rather than from structure. Test the unit of behaviour, not the unit of code. Don’t mock the internals; let the real thing run and assert against what the user of the code actually cares about. This was already the right answer when humans were writing the tests, because it produces tests that survive refactors and read like documentation. With agents it becomes critical, because a behaviour-shaped test is one the agent can write legibly from a user story, and one that you, as the reviewer, can read and check against intent without having to trace the implementation. Coverage tells you the agent made a decision about every line. Behavioural framing tells you the decisions were about the right things. You need both. Coverage without behavioural framing is theatre; behavioural framing without coverage leaves gaps you’ll find in production.</p>

<p>Now for the obvious objection. Agents are world-class metric gamers. They will absolutely write meaningless tests that exercise code without asserting anything useful. They will absolutely mark lines as ignored with justifications like “this branch is unreachable” when the branch is, in fact, reachable. If you treat 100% coverage as a number to satisfy, the agent will satisfy the number and you’ll be worse off than before, because now you have a green build hiding a problem instead of a red one announcing it.</p>

<p>The reason I think it works anyway is that it’s asking the right question of the metric. Coverage, in the conscious sense, is a completeness check. It tells you every line has had a decision made about it. It was never going to tell you the decisions were good ones. That’s a different question, and it wants a different answer. Behavioural tests, written outside-in from what the user of the code actually cares about, are the correctness check. Mutation testing, which flips operators and boundaries and asks whether any test notices, is the check on whether the assertions are doing real work. The gaming the agent does lives in the gap between those checks, and the mitigation isn’t to make coverage smarter. It’s to stop asking coverage to do correctness’s job. Use it for what it is: a completeness gate that makes the decisions visible. Use behavioural framing and mutation testing for the quality of the decisions. The ignored lines and their justifications are, at least, a reviewable artefact, sitting in one place where you can read them. The cheats are confined to a place you’re looking. None of that is automatic. It’s a discipline, and like every guard rail it collapses the moment you stop maintaining it. The question is whether the rail makes problems easier or harder to spot, and I think this one makes them easier.</p>

<h2 id="the-truisms-didnt-go-away">The truisms didn’t go away</h2>

<p>The craft tradition produced a lot of practices, and a lot of arguments about practices, and a lot of nuance about when practices apply. Most of that nuance was about humans. About the cost of the practice to the person doing it, about whether the discipline was worth the friction, about whether the metric would be gamed. A lot of it, and I say this now having lived on both sides of the argument, was about whether the person doing the work would actually do it if you asked them to.</p>

<p>Agents don’t have that problem. The friction of writing the extra test isn’t a friction the agent feels. The discipline of marking ignored lines with reasons isn’t a discipline the agent has to be talked into. The kind of metric-gaming that comes from a tired human at five-to-six is replaced by a different kind of gaming, which is its own problem. So practices that were borderline-worth-it for humans become straightforwardly worth it for agents, and practices that were rejected as lunacy for humans turn out, on inspection, to have been rejected for reasons that said more about the humans than about the practice.</p>

<p>The craft was always about building software in a sustainable, predictable, maintainable way. That hasn’t changed. The agents don’t replace the craft. They inherit it. And some of the practices the tradition argued about most fiercely turn out, in this new context, to be exactly the load-bearing ones. Not because the old arguments were wrong about the metric, but because the old arguments were quietly also about us, and the us part has changed.</p>

<p>100% coverage wasn’t wrong. It was a proxy for something nobody I knew named. That allowed me to point at work I didn’t want to do, and dressed up in a vocabulary that let me agree with myself about not doing it. The agents don’t have the vocabulary and don’t need it. Which makes me wonder which other practices were rejected for reasons that were really about us, and what the calculation looks like now that we have a collaborator who just, straightforwardly, does the work. I’ve run that calculation for coverage. I’m increasingly sure it isn’t the only practice the answer flips for. I’d quite like to know which others.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[We don’t talk about Code coverage, no no no, we don’t talk about coverage…]]></summary></entry><entry><title type="html">The Canary in the Harness</title><link href="https://www.petervanonselen.com/2026/04/12/the-inevitable-lobotomisation-of-claude/" rel="alternate" type="text/html" title="The Canary in the Harness" /><published>2026-04-12T08:00:00+00:00</published><updated>2026-04-12T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/04/12/the-inevitable-lobotomisation-of-claude</id><content type="html" xml:base="https://www.petervanonselen.com/2026/04/12/the-inevitable-lobotomisation-of-claude/"><![CDATA[<p><em>On discovering that your favourite tool got measurably worse, that you’d been blaming yourself for it, and that the only reason you noticed at all was because another harness was sitting right next to it behaving normally.</em></p>

<hr />

<p><img src="/assets/canary-hero.png" alt="The Canary in the Harness" /></p>

<h2 id="a-tale-of-two-ralph-loops">A tale of two Ralph loops</h2>

<p>A couple of weeks ago I was playing with Newshound, a personal project of mine that pulls together a digest of interesting things from a list of about thirty sources on the internet. I wanted to add a feature that was a little more involved than the usual yak shave. Spec conversation. PRD skill. JSON. Ralph loop. The full ceremony.</p>

<p>I ran the loop in Claude Code. It went for two hours. A good chunk of that two hours was Claude Code recursively chewing on the same problem, half-finishing things in slightly different ways each time around. Eventually it limped over the finish line. At which point my Pro subscription tapped out.</p>

<p>I went off and set up the wrapper script from <a href="https://www.petervanonselen.com/2026/04/11/the-grand-plugin-trap/">the last post</a> to allow me to run a Ralph loop on OpenCode. I then ran the <em>exact same prompt</em> through OpenCode with GPT-5.4. Same Ralph loop. Same PRD. Same instantiation of the problem.</p>

<p>Fifteen minutes.</p>

<p>I noticed this. Of course I noticed this. And the conclusion I reached, the one anyone would reach, was: huh, GPT-5.4 must just be better at this particular kind of task. I filed it under “interesting data point about model personalities” and moved on. I’d written about how each harness has its own character in <a href="https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now/">the council post</a>, and this felt like more of the same. Different tool, different shape, sometimes one fits the keyhole better than the other. Cool.</p>

<p>That was the wrong conclusion. I just didn’t know it yet.</p>

<h2 id="what-newshound-put-on-my-desk">What Newshound put on my desk</h2>

<p>Two days ago Newshound surfaced <a href="https://github.com/anthropics/claude-code/issues/42796">a GitHub issue</a> on the Claude Code repo. There is a particular pleasure in your own tool catching the thing that’s about to reframe how you think about your other tools, and I want to note it before I move on, because the whole point of personal projects is moments like this.</p>

<p>The issue was filed by Stella Laurenzo, an engineer working deep in the AMD GPU compiler stack on IREE. Not a casual user. Not someone shouting into the void about vibes. Someone whose day job is to run dozens of concurrent Claude Code agents against a non-trivial systems codebase, who logs everything, and who knows how to apply statistics to data.</p>

<p>The headline finding is brutal. From late January through early March, she analysed 17,871 thinking blocks and 234,760 tool calls across 6,852 Claude Code session files. What she found is that somewhere between mid-February and early March, Claude Code’s behaviour changed in measurable, reproducible, machine-readable ways.</p>

<p>The number that broke me is the Read:Edit ratio. In the good period, Claude Code was reading 6.6 files for every file it edited. By mid-March, that ratio had collapsed to 2.0. The model stopped reading code before changing it. One in three edits in the degraded period was made to a file the model hadn’t read in its recent tool history.</p>

<p>There’s more. A “stop hook” she built to programmatically catch Claude trying to dodge work, ask unnecessary permission, or declare premature completion fired 173 times in seventeen days. It had fired zero times before March 8th. Zero. Every phrase in that hook was added in response to a specific incident where Claude tried to stop working and had to be forced to continue. The word “simplest” in Claude’s outputs went up by 642 percent. The word “please” in <em>her</em> prompts dropped 49 percent. The word “thanks” dropped 55 percent. She stopped being polite to it because there was nothing left to be polite about.</p>

<p>The methodology is more rigorous than anything I would ever bother to do, the dataset is enormous, and the appendix where Claude Opus analyses its own session logs and writes “I cannot tell from the inside whether I am thinking deeply or not” is one of the more haunting things I’ve read in a technical bug report.</p>

<p>Go and read it. I’m not going to recap the whole thing. The point that matters for this post is much smaller and much more personal.</p>

<h2 id="the-thing-id-been-blaming-on-myself">The thing I’d been blaming on myself</h2>

<p>I have been using Claude Code since June last year. In that time it has been, without much competition, the most enjoyable engineering tool I’ve ever used. The blog you’re reading exists in part because of how much I have wanted to write about working with it.</p>

<p>But over the last few weeks something had been off. Sessions felt slower. The chatter I was used to, the running commentary where Claude Code would talk through its plan as it worked, had gone quieter. The two-hour Ralph loop on Newshound was the loudest version of it but it wasn’t the only one. I’d had a couple of sessions where it felt like Claude was rushing to a conclusion, where the reflection phase produced shallower answers than I was used to, where I was correcting more and praising less.</p>

<p>I had put all of this down to me. I’d been burnt out and needing a holiday. I was probably tired. I was probably prompting badly. The problem was probably harder than I’d estimated. The Ralph loop was probably a poor fit for the task. GPT-5.4 was probably just better at this particular slice of work.</p>

<p>None of those things are unreasonable explanations. They’re the kinds of explanations a senior engineer reaches for first, because the alternative, “the tool I rely on every day got measurably worse without telling me,” feels paranoid and slightly embarrassing. So you eat it. You assume the variable that changed is you.</p>

<p>And then someone with 6,852 session logs and a Pearson correlation coefficient publishes the receipts, and you sit there reading them on a Sunday afternoon thinking: oh. Oh, that’s what that was.</p>

<h2 id="the-argument-the-council-post-wasnt-making-yet">The argument the council post wasn’t making yet</h2>

<p>When I wrote about <a href="https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now/">convening multiple AI harnesses as an architectural review council</a>, the pitch was about getting better answers. Different harnesses have different personalities, the harness matters more than the model, three opinions plus a synthesis beats one opinion. All of that I still believe. But there was a second argument hiding in there that I didn’t see at the time, and Stella’s report is what dragged it into the light.</p>

<p>Multi-harness working is regression detection.</p>

<p>It is, for most of us, the <em>only</em> regression detection we are ever going to have. I am not going to instrument my Claude Code sessions, capture 234,760 tool calls, and run a signature-length correlation against thinking depth. I have a day job and a stealth tactics game to build. Stella did that work and the rest of us are in her debt for it, but it is not a repeatable practice for anyone whose job title isn’t “compiler engineer with infinite patience and a logging fetish.”</p>

<p>What <em>is</em> repeatable is keeping three harnesses in active rotation and noticing when one of them starts feeling off relative to the others. The fifteen-minutes-versus-two-hours moment with Newshound was a regression signal. I just didn’t read it as one because I had no framework for the idea that the harness itself might be the variable. I assumed harnesses were stable. They are not stable. They are moving targets, reconfigured continuously by people who do not write to you about what they changed, and the only way you find out is by holding two of them up to the same problem and watching one of them flinch.</p>

<p>This is what the plugin trap was protecting against without me fully understanding why. <a href="https://www.petervanonselen.com/2026/04/11/the-grand-plugin-trap/">Yesterday’s post</a> was about keeping the exits visible so you don’t get locked into a single ecosystem. The thing I didn’t say, because I didn’t know it yet, is that the room you’re standing in is being remodelled while you sleep. Exits aren’t just for when you want to leave. Exits are how you find out the room has changed shape.</p>

<p>If your entire workflow lives inside one harness, harness drift is invisible to you. It just feels like you’re having a bad week. You blame yourself. You prompt harder. You write longer CLAUDE.md files. You assume the problem is on your side of the screen, because from inside one harness there is no other side of the screen to compare against.</p>

<h2 id="naming-names-because-this-is-supposed-to-be-honest">Naming names, because this is supposed to be honest</h2>

<p>I am going to name Claude Code directly here, because this blog only works if I’m being truthful about what I’m actually using.</p>

<p>The tool that got measurably worse over the last month is Claude Code. The tool I have loved more than any other engineering tool in the last decade is Claude Code. Those two sentences belong in the same paragraph. I am writing this <em>because</em> of how much I like the thing, not in spite of it.</p>

<p>If you have been feeling like Claude Code is harder to work with than it was in February, you are probably not imagining it, and you are probably not getting worse at your job. There is data. The data is good. Go and read it.</p>

<h2 id="what-im-taking-away">What I’m taking away</h2>

<p>Three things, and then a rabbit hole.</p>

<p>First, I want crude metrics on my own harness usage. Not 234,760-tool-call-Pearson-correlation crude. Just crude. How many tool calls per session. How many file reads versus file edits. How many times I had to interrupt and correct. Even a daily tally of “did Claude Code feel like it was trying today” would be more signal than I currently collect, which is zero. If the regression signal is detectable in aggregate, I want to be looking at the aggregate.</p>

<p>Second, I want a smoke-test prompt suite. A handful of canonical prompts that exercise the kinds of work I actually do, that I can run across harnesses on a rough cadence and use as a tripwire for drift. Nothing fancy. A small fixed battery, run weekly, results scribbled in a notebook. The point is not the rigour, the point is the comparison over time. I have been operating without a baseline and it has cost me.</p>

<p>Third, the portability argument from the plugin trap post upgrades from “useful insurance against rate limits and lock-in” to “the only way you will ever notice that your tools have silently changed underneath you.” Multi-harness working is the canary. If your canary is the same species as the thing you’re trying to detect, you don’t have a canary. You have another bird in the same mine.</p>

<p>And then the rabbit hole.</p>

<h2 id="the-next-room-over">The next room over</h2>

<p>There is a project called <a href="https://pi.dev">pi</a> by a developer named Mario Zechner. The tagline on the front page is “There are many coding agents, but this one is mine,” which is doing a lot of work in eight words. Pi is a minimal, aggressively extensible terminal coding harness. The pitch is that you adapt pi to your workflow rather than the other way around. No sub-agents, no plan mode, no built-in todos, no MCP, no permission popups, no background bash. All of those things are extensions you add, or build, or install from someone else’s package. The core stays small and the shape comes from you.</p>

<p>There is <a href="https://www.youtube.com/watch?v=Dli5slNaJu0">a YouTube video</a> by Mario walking through how he came to build it that I have not yet found the time to fully watch, and this post is partly me giving myself permission to find that time.</p>

<p>The reason pi feels like the natural next thing is that it is the logical endpoint of an argument I’ve been making in pieces across the last few posts. The plugin trap post said your workflow shouldn’t live inside one harness. The council post said different harnesses give you different answers. This post is saying different harnesses give you the only honest baseline you have for spotting drift in any one of them. The next move, the move I cannot stop thinking about, is: what if the harness itself is something you own? What if instead of being a tenant in three different rooms, all of them being remodelled by other people on different schedules, you build a small room of your own, with the doors where you want them, and treat the rented rooms as the comparison set?</p>

<p>I do not know yet whether pi is the right answer to that question. I have not run it. I have not watched the video. I have a game and a new digest agent I am supposed to be working on, and the smell of yak around me is already pretty thick.</p>

<p>But I can feel the next dive coming. And after the week I’ve just had, I am done pretending that holding still inside a single harness is the safe choice. The safe choice is having somewhere else to look from.</p>

<p>Off I go.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[On discovering that your favourite tool got measurably worse, that you’d been blaming yourself for it, and that the only reason you noticed at all was because another harness was sitting right next to it behaving normally.]]></summary></entry><entry><title type="html">The Grand Plugin Trap</title><link href="https://www.petervanonselen.com/2026/04/11/the-grand-plugin-trap/" rel="alternate" type="text/html" title="The Grand Plugin Trap" /><published>2026-04-11T08:00:00+00:00</published><updated>2026-04-11T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/04/11/the-grand-plugin-trap</id><content type="html" xml:base="https://www.petervanonselen.com/2026/04/11/the-grand-plugin-trap/"><![CDATA[<p><em>A modest meditation on plugins, portability, and the peculiar sorrow of a workflow that cannot leave the building.</em></p>

<hr />

<p><img src="/assets/grand-plugin-trap/hero.png" alt="hero hotel" /></p>

<p>It’s day two of my holiday and I’m staring at a Claude Code session that won’t do anything. Pro limit hit. Three days until it resets. There’s a personal project sitting open in another window that I’d been quite enjoying poking at, and now I can’t poke at it, and the bit of my brain that had been having a perfectly nice time is suddenly very loud about the £20 of extra credit I’d burned through in a single afternoon earlier in the week.</p>

<p>This is the story of how that lockout forced me to do a small piece of unglamorous setup work I’d been avoiding for months, and what I found on the other side of it.</p>

<h2 id="the-workflow">The workflow</h2>

<p>Quick context. Over the last nine or ten months I’ve fallen into a working rhythm with my personal projects that goes something like this. I open an AI chat, and I have a long conversation with it. Not a “write me some code” conversation. A “let’s interview each other about what I’m actually trying to build and why” conversation. These run for three or four hours sometimes. Lots of back and forth, lots of poking at scope, lots of trying to find the smallest version of the thing that would actually tell me whether the idea is any good. At the end of all that I have what I’ve been calling a spec: a high-level document about what we’re doing and why.</p>

<p>Then I take the spec and run it through a PRD skill I shamelessly stole from the Ralph loop. Quick aside: PRD is a term I had genuinely never encountered in fifteen years of working in agile teams. I first heard it watching YouTube videos about people working with AI, sometime in the last year, and I had to go and look up what the bloody hell it stood for. As best I can tell, a PRD is an epic with a collection of user stories, some acceptance criteria, some functional and non-functional requirements, and a bit of product context bolted on top. Cool. I can work with that. The reason I like this particular PRD skill is that after I’ve already spent four hours on the spec conversation, it asks me five more questions to validate what I’m building. Which is exactly the kind of thing you want at that stage!</p>

<p>PRD becomes JSON. JSON gets fed to a Ralph loop. Off we go.</p>
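<p>For concreteness, the JSON that falls out of that step has roughly this shape. Every field name below is illustrative, pieced together from the description above (epic, user stories, acceptance criteria, non-functional requirements, product context); it is not a schema the Ralph loop mandates:</p>

```json
{
  "epic": "Speed up the purchase journey",
  "context": "One paragraph of product context: what we are doing and why.",
  "stories": [
    {
      "id": "US-1",
      "as_a": "returning customer",
      "i_want": "my saved details prefilled",
      "acceptance_criteria": [
        "Given a logged-in user, saved details appear on step one"
      ]
    }
  ],
  "non_functional": ["No regression in page load time"]
}
```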

<h2 id="the-bit-where-i-was-cheating">The bit where I was cheating</h2>

<p>Here’s the dirty secret. I’d never actually set up the Ralph loop the way you’re supposed to set it up. I’d been running it via a plugin inside Claude Code. Plugins are wonderful. You install them, they work, you’re productive in ninety seconds. Why would you write a bash script when you can install a plugin?</p>

<p>The honest answer is: you wouldn’t. <em>And that’s the trap. The problem isn’t plugins. The problem is when your workflow only exists inside one of them.</em></p>

<p>Plugins feel like the harness rewarding you for committing to it. Every plugin install is a small vote for staying inside that one ecosystem, and those votes compound quietly until one day you look up and notice you’ve stopped being portable. You’re not running a workflow anymore. You’re running a workflow <em>that only exists inside Claude Code</em>. Which is fine, until it isn’t.</p>

<h2 id="how-i-burned-through-the-credits-in-the-first-place">How I burned through the credits in the first place</h2>

<p>I should be clear about something. I hadn’t hit the Pro limit doing serious work on my personal project. I’d hit it because it was my holiday, and I’d spent the previous week happily down an oh-my-codex rabbit hole for no reason other than that it was interesting.</p>

<p>Oh-my-codex is a sprawling wrapper that someone has built around Codex to give it brainstorming flows and Ralph loops and a pile of other usability niceties. I’d become curious about it for a very specific reason: when the Claude Code source leaked, a developer in South Korea used Codex with oh-my-codex to reimplement the entirety of Claude Code in Python. In six hours. <em>Six hours</em>, for a non-trivial codebase. I wanted to understand how that was even possible, which meant I wanted to make oh-my-codex work with OpenCode and Claude Code rather than just Codex, because of course I did. More harnesses. Always more harnesses.</p>

<p><img src="/assets/grand-plugin-trap/the-way.png" alt="the way" /></p>

<p>So that’s what the credits went on. A week of trying to bend an already-baroque wrapper around two more harnesses it wasn’t designed for, purely because I wanted to know how the thing worked. No deliverable. No project at the end of it. Just the kind of dive-in-and-poke-at-it exploration that holidays are for. I was having a great time, the plugin inside Claude Code was still humming along for the actual personal project I dipped into between rabbit hole sessions, and the cost of any of this hadn’t shown up yet.</p>

<p>Then it showed up.</p>

<h2 id="the-thing-id-been-ignoring-at-work">The thing I’d been ignoring at work</h2>

<p>I should have seen this coming, because at The Economist I have access to three different coding agents with three different usage pools, each gated on different constraints. In practice that means I bounce between them all day. Hit a five-hour window in one, switch to another, work until that one taps out, switch to the third. It’s a genuinely lovely setup if you’re the kind of person who likes being spoiled for choice on tokens.</p>

<p>But it also means I’ve been quietly reinstalling the same plugins and the same markdown scripts in three different places, every time something changes. And whenever one of those environments goes down or gets reconfigured, I lose half a morning rebuilding the workflow in another one. I’d been feeling that friction for ages without ever quite naming it. It was just background noise. The cost of doing business.</p>

<p>Then the personal Pro lockout happened, and suddenly the background noise was the only thing in the room.</p>

<p><img src="/assets/grand-plugin-trap/darkness.png" alt="darkness" /></p>

<h2 id="doing-the-unglamorous-thing">Doing the unglamorous thing</h2>

<p>So I went and found <a href="https://github.com/Th0rgal/open-ralph-wiggum">open-ralph-wiggum</a>, worked out how to wire it up properly, and wrote <a href="https://github.com/vanonselenp/zsh-functions/blob/main/functions/ralph-loop.zsh">a small zsh function</a> that wraps it so I can just type <code class="language-plaintext highlighter-rouge">ralph-loop</code> from any project directory and have the thing kick off without me having to remember any flags. None of this was hard. None of this was interesting. It was the kind of work I had been actively avoiding because I’d already spent a week earlier that month fiddling around with Codex and OpenCode and trying to make various things play nicely together, and the last thing I wanted was <em>more</em> yak shaving.</p>

<p>But here’s the thing about doing it during a forced lockout, with nothing else to distract me. There was nothing else to do. So I sat with it. And once it was done, I had a Ralph loop that ran on top of OpenCode, with GPT-5.4, completely independent of whether Claude Code was up, down, or rate-limited into oblivion. The wrapper meant I could move between harnesses without rebuilding anything. The script lived in my dotfiles. It was just <em>there</em>.</p>
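<p>The wrapper itself is nothing clever. The real one lives in the dotfiles repo linked above; this is just the shape of the idea, with <code>RALPH_CMD</code> standing in as a placeholder for the actual open-ralph-wiggum invocation, whose flags I won't reproduce from memory here:</p>

```shell
# Sketch of the wrapper idea: supply the defaults once, so the call
# site from any project directory is just `ralph-loop`.
ralph-loop() {
  local prd=${1:-prd.json}   # default to the PRD JSON in the current project
  if [[ ! -f $prd ]]; then
    echo "ralph-loop: no $prd in $(pwd)" >&2
    return 1
  fi
  # RALPH_CMD is a placeholder for the real loop command and its flags.
  ${RALPH_CMD:?set RALPH_CMD to the real loop invocation} "$prd"
}
```

<p>The point is not the function body. The point is that it lives in dotfiles, not in any one harness's plugin directory, so it travels with you.</p>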

<h2 id="the-real-prize">The real prize</h2>

<p>I’ve <a href="https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now/">written before</a> about how each AI harness has its own personality. Claude Code thinks differently from Codex thinks differently from OpenCode, and a lot of that personality lives in the harness rather than the model. I still believe that. But what I hadn’t fully clocked, until this week, is that everything I’d written about harness personalities had come from painful manual copy-paste exercises, because the plugin had me boxed into one of them.</p>

<p>Knowing the council exists is one thing. Being able to actually convene it on a Tuesday afternoon while you’re trying to ship something is another. The wrapper script is the thing that closes that gap. It makes meaningful agentic workflows easy to run in any harness.</p>

<p>That’s the prize. Not the lockout workaround. Not the bash script. The portability that lets the multi-harness thing actually be a way of working rather than an essay.</p>

<h2 id="what-im-sitting-with">What I’m sitting with</h2>

<p>I’m going to keep using plugins. They’re genuinely useful and I’m not about to LARP as someone too principled to install convenient things. But I’m going to be more suspicious of how easy they make the first ninety seconds feel, because I now have a much clearer sense of what they cost on the back end. Every plugin ecosystem is a small gravity well. The more you commit, the harder it is to leave, and, this is the part that bothers me most, the less you can even see what you’re missing on the outside.</p>

<p>The unglamorous wrapper script turns out to be a small act of resistance against that. Not a heroic one. Just a vote for keeping the exits visible.</p>

<p>I’d rather have the exits visible.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[A modest meditation on plugins, portability, and the peculiar sorrow of a workflow that cannot leave the building.]]></summary></entry><entry><title type="html">The Council Will See You Now…</title><link href="https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now/" rel="alternate" type="text/html" title="The Council Will See You Now…" /><published>2026-04-03T08:00:00+00:00</published><updated>2026-04-03T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now</id><content type="html" xml:base="https://www.petervanonselen.com/2026/04/03/the-council-will-see-you-now/"><![CDATA[<p><em>You were the chosen one! You were supposed to destroy the hallucinations, not join them!</em></p>

<hr />

<p><img src="/assets/council/council.png" alt="The council" /></p>

<p>I use multiple AI agents as an architectural review council. When I said that out loud recently, I got the look. You know the one. The polite nod that says “I have no idea what you just said but I’m going to smile and move on.”</p>

<p>So here’s the footnote.</p>

<h2 id="the-setup">The setup</h2>

<p>I’m currently juggling two things at work that have no business being juggled at the same time.</p>

<p>The first is a set of deeply tangled bugs that have been lurking since July 2024. They’re tied to a third-party integration, they’re interconnected, and we’re only now seeing the full scope of how bad they are. This is slow, careful, multi-day investigation work. Long-form conversations with AI. Reading code. Writing tests to verify behaviour. Building the case for “this is exactly where the problem is and this is exactly why.”</p>

<p>The second is supporting our engineering manager to enable an external contracting team to deliver a new feature in our codebase. The contractors have never touched our code before. They don’t have a clear picture of the requirements, the systems, or how everything interacts. The depth they need to make meaningful architectural decisions is broad, deep, and nuanced. I’ve worked in the systems, but I don’t have enough depth to answer all their questions off the top of my head.</p>

<p>But what I do have is access to Claude, Claude Code, GitHub Copilot, and Codex.</p>

<h2 id="the-council-assembled">The council, assembled</h2>

<p>I should back up. For months now, in my personal life, I’ve been asking the same questions to ChatGPT, Gemini, and Claude interchangeably. I call them my council. Each one has a different personality, notices different things, and observes different angles. I find that when I’m getting multiple opinions I make better decisions. This is just me applying the same instinct to my workspace.</p>

<p>It started with a simple question: we have a third-party payment provider that offers a payment method, and we have a number of integrations between us and them. The contracting team needed to understand how we use it. How do we integrate with backend services? Where are the bits that are backend-for-frontend versus actual backend platform services? How do all of these systems interact? Where are all the endpoints?</p>

<p>I spent a day and a half in long-form conversations with multiple AI systems interrogating the problem. I started at the web-facing entry point and worked backwards. I trawled Confluence, Slack, Google Drive, and every other form of long-term documentation to build a picture of what the contracting team was going to need. Then I took all of that context, the goals of the team, and the documentation, and used it to structure a comprehensive prompt.</p>

<p>I ran that prompt through three different AI harnesses: Codex, Claude (via the web), and Claude Code running Opus. Each one went away, investigated the same repositories, and came back with structured answers. Then I took those structured answers, went and explored the code myself, used the hints they’d given me to validate everything, and wrote up a comprehensive document explaining the lot.</p>

<h2 id="naive-me-thought-job-done">Naive me thought job done</h2>

<p>Obviously I was not done. You’d think by now I’d know better.</p>

<p>A week later the contracting team came back with “cool, we have a plan.” They’d taken everything I’d given them, created architectural diagrams, Confluence documents, and actual thinking. They wanted me to review whether their approach would work.</p>

<p>So I did the same thing again. Took their documents, the context I’d already built, and pointed all three AI harnesses at the relevant repositories, not just mine but across four different teams’ repos, backend services and frontend services and everything in between. I had each one validate whether what the contractors were proposing would actually work. Then I took the outputs from all three, wrote them to file, and had a fourth agent (OpenCode running Opus 4.6) synthesise a combined result. I used that synthesis to structure my response back to the team.</p>

<p>I’ve now done this process three times. Here’s how it works:</p>

<blockquote>
  <p><strong>The Council Process</strong></p>

  <ol>
    <li>Gather context and documentation</li>
    <li>Structure a comprehensive prompt</li>
    <li>Run the same prompt through multiple AI agents and let each investigate the repositories independently</li>
    <li>Save their structured outputs</li>
    <li>Run a synthesis agent across all results</li>
    <li>Validate manually: use the AI outputs as a map for where to look, read the code yourself, and run quick targeted questions past other engineers</li>
  </ol>
</blockquote>
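<p>The fan-out in steps 3 and 4 is simple enough to sketch as a shell function. Everything below is illustrative: the harness commands are passed in as arguments precisely because each CLI takes different flags, so you would wrap each real CLI (<code>claude -p</code>, <code>codex exec</code>, and so on; check your installed versions) in a tiny function of its own.</p>

```shell
# Hypothetical sketch of steps 3-4: run one prompt file through several
# harnesses and save each structured answer for the synthesis step.
council() {
  local prompt_file=$1; shift
  mkdir -p council-out
  local harness
  for harness in "$@"; do
    # Each argument is a command (or wrapper function) that takes the
    # prompt text as its argument and prints its answer on stdout.
    "$harness" "$(cat "$prompt_file")" > "council-out/${harness}.md"
  done
}
```

<p>Step 5 is then just one more agent pointed at <code>council-out/</code>, asked to highlight where the answers agree and where they diverge.</p>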

<h2 id="where-the-real-value-lives">Where the real value lives</h2>

<p>The synthesis step is where the magic happens. It’s not just about getting three answers. It’s about what happens when you put them next to each other.</p>

<p>The synthesising agent highlighted where all three harnesses were in agreement, which gave me confidence. But more importantly, it highlighted where they’d noticed different pieces of the problem. Even though they were looking at the same repositories and most likely using the same underlying tools, they ended up pulling out different things. Codex might flag an endpoint I hadn’t considered. Claude Code might trace a data flow the others glossed over. The breadth of coverage from running three agents was meaningfully wider than any single one.</p>

<p>This also feeds into something that should be obvious but bears repeating: AI hallucinates. You cannot 100% commit to trusting just one version. When you need accurate architectural understanding, having multiple agents give you a synthesis that you then validate yourself is genuinely useful. It’s not a replacement for reading the code. It’s a way to read the code faster and know where to look.</p>

<h2 id="the-tools-have-personalities">The tools have personalities</h2>

<p>Here’s something I find fascinating, and I’m not the only one. Another principal engineer I know has noticed the same thing.</p>

<p>Codex is the grumpy pragmatist. It goes away, gets some stuff done, comes back, and tells you the bare minimum you need to know. Not a single detail more. Bullet points, to the letter, done. That’s fine. That’s exactly what you’d expect from a tool optimised for task completion.</p>

<p>Claude, given the exact same prompt via the web with Opus, comes back reading like a chatty engineer. A bit scattered, a bit flowery, but thorough. You’ll get everything you need, it’ll just need a bit of back-and-forth to extract it cleanly.</p>

<p>But here’s the interesting bit: OpenCode, regardless of which model it’s running underneath, whether that’s Opus or GPT-5.4, tends to give better structured results than either of the first-party platforms running the same models. The investigations are better organised. The outputs are clearer. The intent comes through more directly. I’m finding this is true when comparing Claude Code to OpenCode running Opus, and it’s also true when comparing Codex to OpenCode running GPT-5.4.</p>

<h2 id="the-harness-matters-more-than-the-model">The harness matters more than the model</h2>

<p>The scaffolding around the model, how it structures tool calls, how it formats its output, how it organises an investigation, is doing more heavy lifting than people assume. That’s a genuinely counterintuitive finding. The same model, in a different harness, produces meaningfully different quality of output. If you’re only evaluating models, you’re missing half the picture.</p>

<h2 id="ai-expands-capacity-not-energy">AI expands capacity, not energy</h2>

<p>I’ll be honest. By the end of every week right now, I am flattened. Exhausted. Mentally, emotionally, everything. Gone.</p>

<p>These tools have enabled me to explore and understand systems at a superficial level far faster than I ever could otherwise. To get the depth I needed for these handover documents would have taken weeks of investigation. I did it in hours. That’s real. That’s meaningful. That capacity expansion let me keep working on high-priority deep-dive bugs while simultaneously supporting an external team and ensuring they had enough context to be unblocked and start working independently.</p>

<p>But working with AI at full tilt is cognitively expensive in ways that people underestimate. You’re doing more, faster, and that uses more of your mental energy than you think. AI gave me the capacity to do work that would’ve been impossible to fit in otherwise. It did not give me more energy to do it with.</p>

<h2 id="still-experimenting">Still experimenting</h2>

<p>I’ve run this playbook three times now without changing it. Same process each time. I haven’t tried to refine it or automate it or build a harness around it yet, though it’s in the back of my mind. I actually started building a mobile app a while back to formalise the council concept for personal use, but got distracted because, well, reasons.</p>

<p>So what’s the lesson? Don’t trust one AI. Use a council. Get them to validate each other. Get them to look for reasons they’re wrong. Use the synthesis of multiple perspectives to build confidence in your understanding, then go validate it yourself.</p>

<p>I’m still not entirely convinced this is the best strategy. But it is letting me do things I could not have done otherwise, and right now that’s enough. The future of AI-assisted engineering might not be a better model. It might be a better council.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[You were the chosen one! You were supposed to destroy the hallucinations, not join them!]]></summary></entry><entry><title type="html">The Smell of Panic When You Context Thrash</title><link href="https://www.petervanonselen.com/2026/03/24/the-smell-of-panic-while-you-thrash/" rel="alternate" type="text/html" title="The Smell of Panic When You Context Thrash" /><published>2026-03-24T08:00:00+00:00</published><updated>2026-03-24T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/03/24/the-smell-of-panic-while-you-thrash</id><content type="html" xml:base="https://www.petervanonselen.com/2026/03/24/the-smell-of-panic-while-you-thrash/"><![CDATA[<p><em>High high hope for the code, shooting for a PR when I couldn’t even make a commit…</em></p>

<hr />

<p><img src="/assets/panic.png" alt="Panic at the keyboard" /></p>

<p>Over the past few weeks I have been increasingly panicked. That’s probably the only honest way to frame it.</p>

<p>I’ve been juggling a lot. Supporting a team building out a new payment method. Handing over deep knowledge of an existing payment method to another team working outside our scope but integrating with systems we own. Helping yet another team figure out how to break a backend platform service from a monolith into microservices. And somewhere in between all of that, trying to actually deliver a feature myself: a seemingly straightforward change to one of our frontend purchase journeys to make it faster.</p>

<p>Each of those things individually requires serious context. Together they’ve been chewing up my headspace and pulling me in every direction. Which means that whenever I finally sat down to write actual code, I arrived in a state of absolute panic. The “oh my gosh I have so little time, move quickly, move quickly, move quickly” kind of survival mode.</p>

<p>And that’s when I made one of the most fundamental mistakes you can make with AI-assisted development.</p>

<h2 id="the-50-file-disaster">The 50-File Disaster</h2>

<p>Here’s the thing about AI-generated code: it’s easy. So easy that it’s almost impossible to remember just how easy it is, because you’ve spent a lifetime handcrafting code yourself. You carry this default assumption that building things is complicated and slow.</p>

<p>So I did what I thought was the right thing. I planned. I looked through the existing code. I thought about it. I wrote out detailed acceptance criteria. I thought about the problem from the AI’s perspective. And then I said “cool, I think I have enough” and started implementing.</p>

<p>By the time I got someone else to look at it, they pointed out it was missing a key behaviour. I’d misunderstood part of the acceptance criteria. OK, fine. I started trying to fix it.</p>

<p>And then trying to fix it. And trying to fix it.</p>

<p>What was a small hole became a deep hole became a nightmare became “why does it feel like I will never, ever get anywhere with this?” By the end of it I was touching something like 50 files across two repos. Small changes scattered everywhere, most of them not even really needed. All driven by the panic override of needing to get something done while constantly being pulled out of context and back in and out and back in until I was just thrashing. Burning cycles. Making zero progress.</p>

<h2 id="panic-is-a-smell">Panic Is a Smell</h2>

<p>If you’ve been in software long enough you know what a code smell is. Something that isn’t technically broken but tells you something deeper is wrong.</p>

<p>The panic to get things done is a smell. The pushing and pushing and pushing is a smell. The feeling that you don’t have enough time, that you have to ship something right now, that you can’t afford to slow down? That’s a smell. And I ignored it for way too long.</p>

<p>Because here’s the lesson I apparently need to keep re-learning: with AI-assisted development, <em>the writing of code is not the bottleneck</em>. It never was. The understanding is the bottleneck. And when you’re panicking, you skip the understanding to get to the doing, which is exactly backwards.</p>

<h2 id="the-reset">The Reset</h2>

<p>I eventually stopped. Stepped away from the mess. Started with a brand new repository. Took all the things I’d learned, the plan document, the acceptance criteria, everything. And then I spent an entire day in conversation with an AI. Not writing code. Just investigating.</p>

<p>Testing existing behaviour. Running multiple examples and execution paths. Making sure I had a precise, clear understanding of what the current system actually did, what the new behaviour needed to be, and all the various paths between them. I literally spent hours asking the AI to explain each step of its plan and justify why it chose that approach.</p>

<p>I say all the time that planning matters more than coding. But experiencing the contrast firsthand is different. Hours of slow, methodical, back-and-forth investigation. Deep thinking about context. Deep thinking about what you’re trying to do and why. So that when you, <em>the human in the loop</em>, actually ask the AI to build something, the full context of what you’re trying to achieve is sitting clearly in your head. You understand the user behaviour. The system interactions. The high-level architecture. You could draw all the diagrams because you actually understand what needs to be done.</p>

<p>The feature that had consumed a week and a half of thrashing? After that day of planning, it took a couple of hours to get something working correctly.</p>

<h2 id="the-council-of-ais-or-going-wide">The Council of AIs (or: Going Wide)</h2>

<p>Meanwhile, on the other side of my work life, I’ve been doing something completely different with AI tooling.</p>

<p>To support the contracting team building out a new feature, I’ve been running what is essentially a council of AIs to review their design documents. OpenCode, Codex CLI, and Claude Code running simultaneously so I can verify, validate, and cross-compare. Deep-dive analysis with Claude and ChatGPT for architectural decisions and historical context. Complex investigation into bugs that were first logged two years ago and never properly resolved.</p>

<p>I have been holding the context of a massive amount of different workstreams. Work that would have taken me days or weeks to get even a baseline understanding of. The AI tooling genuinely lets you go wide in a way that wasn’t possible before.</p>

<p>And that’s where the tension lives.</p>

<h2 id="shield-and-sword">Shield and Sword</h2>

<p>The honest truth is that I’ve been doing two very different jobs at the same time.</p>

<p>One job is the shield: absorbing context, running investigations, unblocking other teams, reviewing designs, holding the big picture so nobody else has to. The AI tooling makes this possible. It lets you hold 10x the context. You can pre-empt meetings by using Claude to pull together context and solve problems before the meeting even happens, cancelling two or three in a morning and buying yourself hours of uninterrupted time. You can run parallel investigations across multiple tools and hold the full picture of what’s going on across an entire programme of work.</p>

<p>The other job is the sword: actually sitting down and delivering a piece of working software. And that requires the opposite of going wide. It requires going deep. Slow. Methodical. Boring, even.</p>

<p>The AI enables both.</p>

<p>But your brain can’t do both at the same time.</p>

<p>When you try, you thrash. You burn cycles switching between deep and wide, and just like a thrashing computer, you end up doing a lot of work and making no progress.</p>

<h2 id="what-im-taking-away">What I’m Taking Away</h2>

<p>Two things, and they’re in tension with each other, and I’m OK with that.</p>

<p><strong>Go deep before you go fast.</strong> Planning with AI isn’t just “write a spec and hand it over.” It’s hours of investigation. It’s asking the AI to explain its own plan in painful detail. It’s making sure you understand the problem so well you could solve it by hand. The code is the easy part. The understanding is the work.</p>

<p><strong>AI lets you hold a lot of context, but your brain still has limits.</strong> Context switching costs the same as it always did. Maybe more, because the AI makes it tempting to take on everything. You can hold the shield and the sword, but not at the same time. Deliberately buying yourself blocks of deep time is not optional. It’s the whole game.</p>

<p>This is the new trap for senior engineers. AI lets you take on more surface area than ever before. But the work that actually ships still requires deep focus. Nobody is immune to thrashing, no matter how good the tooling gets.</p>

<p>And if you’re sitting there right now, pushing and pushing and panicking and feeling like you’ll never get there? That’s a smell. Stop. Step away. Start again with understanding, not urgency.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[High high hope for the code, shooting for a PR when I couldn’t even make a commit…]]></summary></entry><entry><title type="html">The Feedback That Doesn’t Care About Your Title</title><link href="https://www.petervanonselen.com/2026/03/17/the-feedback-that-doesnt-kill-you/" rel="alternate" type="text/html" title="The Feedback That Doesn’t Care About Your Title" /><published>2026-03-17T08:00:00+00:00</published><updated>2026-03-17T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/03/17/the-feedback-that-doesnt-kill-you</id><content type="html" xml:base="https://www.petervanonselen.com/2026/03/17/the-feedback-that-doesnt-kill-you/"><![CDATA[<p><em>What doesn’t kill you makes you stronger … right?</em></p>

<hr />

<p>I’ve been writing this blog for a while now. I’ve documented scope creep spirals, the joy of deleting code I spent weeks writing, and the slow painful education of learning to work with AI agents without letting them run off a cliff. If you’ve been following along, you know the theme by now: I learn things the hard way and then write about it so you don’t have to. Or at least so you can watch.</p>

<p>Yesterday I was explaining to a colleague how I use Gemini to give me personalised feedback on my performance after meetings. He’s a staff engineer, someone who’s genuinely deep into AI-assisted coding, not a tourist. And even he stopped and went: “Wait, you’re doing <em>what</em>?”</p>

<p>That reaction made me step back. Because I hadn’t really sat down and thought about what I’d actually built over the past year. I’d been solving problems one at a time, better code generation here, better context gathering there, a way to get honest feedback on myself over here, and somewhere along the way it had become something bigger. A system. A layer underneath how I work that I now can’t imagine working without.</p>

<p>So this is me trying to describe what that looks like, now that I’ve finally noticed it.</p>

<p><img src="/assets/feedback-that-kills-you/image.png" alt="the layers" /></p>

<h2 id="the-code-layer-get-off-the-autocomplete">The code layer: get off the autocomplete</h2>

<p>If you’re writing code with an AI assistant inside your IDE, Copilot in VS Code for instance, I’d gently suggest you’re missing the better experience. CLI agents like Claude Code, Codex CLI, and OpenCode have fundamentally changed how I interact with code. OpenCode has become my go-to because it works well straight out of the box and, crucially, it connects to Copilot’s backend models. If your company already pays for Copilot, OpenCode might be the unlock you didn’t know you were waiting for.</p>

<p>The shift matters because it changes what the AI is doing. Inside an IDE, it’s autocomplete with delusions of grandeur, guessing what you want line by line. On the command line, I’m describing problems, analysing architecture, pulling systems apart, generating diagrams. It stops being a typing assistant and starts being a thinking partner.</p>

<p>I use this for everything that touches code directly: writing it, reviewing it, debugging, breaking things apart to understand them, building architecture diagrams. The lot.</p>

<h2 id="the-context-layer-taming-the-organisational-scatter">The context layer: taming the organisational scatter</h2>

<p>Every engineer in a large organisation knows this pain. The information you need to do your work lives in seven different places: Slack threads, Google Meet recordings, Confluence pages, Jira tickets, GitHub PRs, and at least two places nobody told you about. You spend half your time assembling a coherent picture before you can even start thinking.</p>

<p>ChatGPT and Claude’s enterprise integrations have changed this for me. Both allow you to connect to corporate tools, your chat platform, docs, issue tracker, source control, and pull context into a single conversation. Instead of trawling through three Slack channels and two Confluence pages for forty minutes, I pull it all together and ask: what does this ticket actually need? What are the acceptance criteria? What am I missing?</p>

<p>Here’s where it compounds. Good acceptance criteria from this layer mean better prompts for the coding agents. The layers feed each other. I didn’t design it that way, it just happened once the pieces were in place.</p>

<h2 id="the-mirror-feedback-that-doesnt-care-about-your-feelings">The mirror: feedback that doesn’t care about your feelings</h2>

<p>This is the hard one to talk about. And the one that made my colleague stop in his tracks.</p>

<p>We’re a remote organisation. Google Meet is where everything happens, and Gemini sits inside every meeting. Most people don’t turn on transcriptions, which I think is a mistake. Any meeting producing collective knowledge should generate a transcript. Those transcripts feed the context layer above.</p>

<p>But there’s another use that took me months to work up to.</p>

<p>After meetings where I’m an active participant, I ask Gemini: as a staff engineer, what went well, what didn’t go well, and what can I improve on?</p>

<p>The first few times, it was rough.</p>

<p>Here’s the thing about feedback from humans: you almost never get anything useful. You either get “yeah, that was fine” or something so carefully hedged that whatever kernel of truth was in there has been sanded down to nothing. I’ve rarely received feedback that was specific, actionable, and tied directly to something I actually did in a real moment.</p>

<p>Gemini doesn’t do hedging. It references specific things that happened in the meeting. “You reframed the argument here and it shifted the conversation constructively.” Or: “You weren’t listening here and this is where it cost you.” It once told me that while I’d handled a frustrated colleague well, I could have spotted the frustration earlier and intervened before it escalated, and that when another colleague was dismissive, I’d recovered well but could have prepared for that reaction. Specific. Contextual. Minutes after it happened.</p>

<p>When I explained this to my colleague yesterday, he asked: “Isn’t this just seeking perfection?” And I realised, no. It’s just a way to learn and grow and become more deliberate about how I communicate, how I lead, and how I interact with the people around me. You can’t improve what you don’t measure. This is measuring.</p>

<p>But here’s what really made me think this is bigger than my own little experiment. I told a principal engineer friend at another company about this approach. He had a difficult conversation coming up, recorded it with Gemini, and afterwards used the transcript to get actionable feedback on how he’d handled it. His reaction was genuine shock. He’d never had that clear a picture of how his conduct was landing. An engineering manager I know has started doing the same thing and describes it as brutal but the most meaningful feedback he’s received in years.</p>

<p>And I think there’s a reason for that. I remember chatting with a startup CEO at a meetup who made the observation that the higher you go in leadership, the less honest feedback you receive. The position of power makes it hard for people to cross that barrier. Gemini doesn’t have any concept of your title or your seniority. It just tells you what it saw.</p>

<p>In the beginning, every session felt like a wake-up call. After months of doing this consistently, keeping a log, reading it back, it softened. Not because the feedback got less honest, but because the gap between what I thought I was doing and what I was actually doing got narrower. Fewer surprises. More gentle nudges, fewer gut punches.</p>

<h2 id="so-what-is-this-actually">So what is this, actually?</h2>

<p>None of these tools alone would be worth a blog post. A CLI coding agent is nice. Enterprise AI integrations save time. AI self-reflection is powerful but weird. What caught me off guard, what I only noticed yesterday when I saw my colleague’s reaction, is that they work as a system.</p>

<p>Meeting transcripts feed the context layer. The context layer produces better acceptance criteria. Better acceptance criteria drive better output from the coding agents. The self-improvement loop makes me more effective in the meetings that generate the transcripts. Each layer feeds the others. I didn’t plan it. I just kept solving problems and the connections emerged.</p>

<p>There’s a Sam Altman interview from about a year ago where he describes people using AI as “an operating system for how they think.” At the time I had absolutely no idea what he meant. Now I think I do, and the uncomfortable truth is that I’m probably barely scratching the surface of where this goes.</p>

<p>So here’s my takeaway action for you: in the next meeting you’re in that has a transcript, ask an LLM for some honest feedback. Let me know if you learn anything interesting!</p>

<p>I’m still figuring it out. As usual, you’ll hear about it when I do.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="aios" /><category term="claudecode" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[What doesn’t kill you makes you stronger … right?]]></summary></entry><entry><title type="html">How I Learnt to Stop Worrying and Love Agentic Katas</title><link href="https://www.petervanonselen.com/2026/03/05/learn-to-love-agentic-coding/" rel="alternate" type="text/html" title="How I Learnt to Stop Worrying and Love Agentic Katas" /><published>2026-03-05T08:00:00+00:00</published><updated>2026-03-05T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/03/05/learn-to-love-agentic-coding</id><content type="html" xml:base="https://www.petervanonselen.com/2026/03/05/learn-to-love-agentic-coding/"><![CDATA[<p><em>I don’t know how to teach this. But I think I’ve figured out how to practice it…</em></p>

<hr />

<p>Have you ever struggled to get started with something new, not because the thing itself is hard, but because the <em>shape</em> of how to learn it isn’t clear? That’s where I’ve been stuck with agentic coding. Not the doing of it. I’ve been doing it for months. The teaching of it. The “how do I help someone else get started” of it.</p>

<p>And then I remembered code retreats. And katas. And the way I actually learned TDD all those years ago, not from a book, but from structured practice with low stakes and room to play.</p>

<p>So I built a set of agentic katas: structured coding exercises designed specifically for practising AI-assisted development. Not traditional katas, those are too small and the AI already knows all the answers. These are bigger, meatier problems in unfamiliar domains that force you to engage with the full process of working alongside an agent.</p>

<p>Let me explain how I got here.</p>

<h2 id="a-brief-history-of-practising-on-purpose">A Brief History of Practising on Purpose</h2>

<p>Early in my career, code retreats were the thing that taught me test-driven development. Not a book. Not a course. A structured, all-day event where you solve Conway’s Game of Life over and over again, each time with different constraints. Maybe your pair is actively trying to <em>not</em> solve the problem. Maybe you’re strictly ping-ponging. Maybe you delete your code every 45 minutes.</p>

<p>The point was never to solve Conway’s Game of Life. The point was to internalise the patterns and practices of TDD by giving yourself a safe space to experiment. No production pressure. No deadlines. Just play.</p>

<p>Code katas grew out of the same ethos: small, self-contained problems that shouldn’t take more than an hour or two. The algorithm doesn’t matter. How you choose to solve it does. They’re bite-sized by design. They’re not supposed to be hard. They’re supposed to be <em>practice</em>.</p>

<h2 id="the-problem-with-katas-and-ai">The Problem with Katas and AI</h2>

<p>Here’s the thing I’ve been struggling with: traditional code katas don’t work for learning agentic development. They’re too small. The LLM has already seen every solution to FizzBuzz and Roman Numerals in its training data. You’re not practising a workflow, you’re watching an AI regurgitate a known answer. There’s nothing to explore, nothing to plan, no decisions to make about approach or tooling.</p>

<p>And that matters, because the skill you need to develop with agentic coding isn’t “how to prompt an AI to write code.” It’s how to <em>think alongside one</em>. How to explore a problem space together. How to write a plan that gives an agent enough context to be useful. How to verify that what came back is actually what you wanted. How to set up your workspace so the AI has the right guardrails.</p>

<p>None of that shows up in a 30-minute kata where the AI already knows the answer.</p>

<h2 id="the-accidental-discovery">The Accidental Discovery</h2>

<p>What’s funny is that I’ve kind of been doing agentic katas already, almost by accident. I just didn’t realise it at the time.</p>

<p><img src="/assets/agentic-kata/kata1.png" alt="The first kata" /></p>

<p>A few weeks ago I wrote about <a href="https://www.petervanonselen.com/2026/02/10/agentic-play/">deleting code on purpose as a way to recover from burnout</a>. I’d been experimenting with the Ralph Wiggum loop, throwing PRDs at an agentic coding workflow, seeing what came out, then deliberately throwing the code away. The output I was chasing wasn’t a codebase. It was understanding. How big can a PRD get before the loop breaks? How much do agent files matter? What’s the minimum setup to get something useful?</p>

<p>Each run was a contained experiment. Fresh repo, clear problem, focused practice, delete the code, do it again. I was varying one thing at a time, adding a CLAUDE.md file, scaling up the PRD size, trying a different domain, and learning from each iteration.</p>

<p>I was doing agentic katas. I just hadn’t named them yet.</p>

<p>Looking back, the whole arc has been building toward this. My game project was the first rough version, months of cycling through spec-driven development and making mistakes. The “delete the code” experiments compressed that into focused sessions. And now, formalising the structure into something other people can pick up feels like the obvious next step.</p>

<h2 id="building-the-thing">Building the Thing</h2>

<p>So I’ve put together a set of agentic katas. The idea is that each one should require somewhere in the region of four to eight hours of focused, hand-crafted work to do <em>well</em>. And by “well” I mean the full golden plate: test-driven, 100% coverage, clean README, proper git history, the works. Not because the output matters, but because doing that level of work with an AI agent forces you to actually engage with the process.</p>

<p>The loop for every kata is the same:</p>

<p><strong>Explore</strong> → <strong>Plan</strong> → <strong>Set Up Context</strong> → <strong>Build</strong> → <strong>Verify</strong></p>

<p>And there’s one key rule that makes the whole thing work: <em>you are not allowed to choose a programming language or framework until you’ve had a conversation with your AI tool about what the best approach is.</em></p>

<p>This is the rule that forces the shift. Instead of jumping straight to “build me X in TypeScript,” you have to start with “I need to solve this problem, what are my options?” You explore the problem space. You figure out what tools exist. You have the AI challenge your assumptions. <em>Then</em> you decide on an approach.</p>

<p>From there, you write a detailed plan, acceptance criteria, example data, use cases, a breakdown into small chunks of work. You set up your workspace with an agent file and think about what context to include. You build incrementally. And you verify everything: read the plan, read the code, run it, test it, confirm it does what you intended.</p>

<p><img src="/assets/agentic-kata/agentic-kata-loop.svg" alt="the loop" /></p>

<h2 id="the-katas-themselves">The Katas Themselves</h2>

<p>I’ve started with four problems, each chosen because they sit in a domain most developers haven’t worked in before:</p>

<p>An <strong>audio transcriber</strong> that handles speech-to-text with timestamps and speaker diarisation. A <strong>background remover</strong> for image segmentation. A <strong>meme generator</strong> that deals with text rendering and positioning on arbitrary images. And a <strong>thumbnail ranker</strong> that scores images for visual appeal.</p>

<p>Each kata has deliberate ambiguity baked in, because real problems are ambiguous, and part of the skill is figuring out what questions to ask. They also have a privacy constraint (everything runs locally, no cloud APIs for processing) and an extra credit extension for when you want to push further.</p>

<h2 id="why-this-matters-right-now">Why This Matters Right Now</h2>

<p>I’ll be honest: the reason I’m putting this together is partly selfish. I want to run a workshop with my colleagues, and I need structured material to do it. But there’s a bigger motivation too.</p>

<p>Right now, everything about AI and software development feels incredibly intense. Fear of being made obsolete. AI layoff discourse everywhere. The pressure to have strong opinions about tools you’ve barely had time to evaluate. It’s all very stressful, and stress is the enemy of learning.</p>

<p>What people actually need, what <em>I</em> needed, and what I accidentally created for myself, is a safe space to play. A contained environment where you can try things, make mistakes, and build intuition without the stakes of production code or career anxiety hanging over you.</p>

<p>Code retreats gave us that for TDD. I’m hoping agentic katas can do the same for working with AI.</p>

<h2 id="the-repo">The Repo</h2>

<p>I’ve put everything together in a repo: <a href="https://github.com/vanonselenp/agentic-katas">github.com/vanonselenp/agentic-katas</a></p>

<p>It includes the kata briefs, a participant guide covering the rules and process, and a facilitator guide for anyone who wants to run this as a structured session with their team. A session takes about 90 minutes.</p>

<p>I haven’t run this with anyone else yet. I only put it together today, and I’m planning to trial it with my team in the coming weeks. It might be brilliant. It might be terrible. Either way, I’ll write about how it goes.</p>

<p>But the core idea, that you need bigger, unfamiliar problems to practise AI-assisted development, and that the process matters more than the output, that I’m confident about. Because I’ve been living it, accidentally, for months.</p>

<p>If you try it, I’d love to hear how it goes. And if you’re doing something different to build these skills, I’d love to hear about that too.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="claudecode" /><category term="specdrivendevelopment" /><category term="katas" /><category term="softwarecraftsmanship" /><summary type="html"><![CDATA[I don’t know how to teach this. But I think I’ve figured out how to practice it…]]></summary></entry><entry><title type="html">14 PRs, 6 Repos, 1 Button: A Tale of Tumbling Down the Rabbit Hole</title><link href="https://www.petervanonselen.com/2026/02/12/rabbit-holes/" rel="alternate" type="text/html" title="14 PRs, 6 Repos, 1 Button: A Tale of Tumbling Down the Rabbit Hole" /><published>2026-02-12T08:00:00+00:00</published><updated>2026-02-12T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/02/12/rabbit-holes</id><content type="html" xml:base="https://www.petervanonselen.com/2026/02/12/rabbit-holes/"><![CDATA[<p><em>True stories from the front lines of the internet…</em></p>

<hr />

<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/83/Down_the_Rabbit_Hole_%28311526846%29.jpg/960px-Down_the_Rabbit_Hole_%28311526846%29.jpg" alt="Alice falling down the rabbit hole" />
<em>Alice in Wonderland by <a href="https://commons.wikimedia.org/wiki/File:Down_the_Rabbit_Hole_(311526846).jpg">Valerie Hinojosa</a> / <a href="https://creativecommons.org/licenses/by-sa/2.0/">Creative Commons Attribution-Share Alike 2.0
</a></em></p>

<p>Now this is a story all about how one button link got my codebase flipped turned upside down. And I’d like to take a minute, just sit right there, and I’ll tell you how I shipped 14 PRs without pulling out my hair.</p>

<p>It started with a Monday morning meeting. I’d been off for three weeks. The meeting was dense with context about decisions made months ago, documented across scattered specs and design docs. Systems I don’t own. Plans originally specced out almost a year prior. SEO requirements. Legacy middleware behaviour. And somewhere in all of this, a single task: change where a subscribe button points.</p>

<p>The old flow routed users through a legacy auth endpoint which was a piece of middleware handling user state and return-to-site functionality. The new flow should skip that layer and go direct. Simple, right?</p>

<p>Three repos. Three small PRs. That was the original scope.</p>

<p>It became six repos and one or two more PRs…</p>

<h2 id="the-context-problem">The Context Problem</h2>

<p>Here’s what made this tricky: I didn’t have the context. Not the institutional knowledge of why things were built this way. Not the codebase familiarity to know where all the tendrils reached. Not the cross-system visibility to see how changes would ripple.</p>

<p>Normally, this is where you’d involve other teams. Schedule alignment meetings. Negotiate architecture choices. Coordinate timed releases. The org chart becomes the constraint.</p>

<p>Instead, I threw five AI tools at the problem.</p>

<p>I used internal knowledge search to surface half a dozen docs from a year ago about what a potential migration might look like. Copilot and Codex scanned repos I’d never opened, outputting high-level analysis of what would need to change. NotebookLM synthesised a dozen-plus sources into actionable Jira tickets with acceptance criteria and testing plans. And Claude handled the actual implementation across all six repositories.</p>

<p>Each tool for what it does best. None of them sufficient alone.</p>

<h2 id="the-shape-of-the-change">The Shape of the Change</h2>

<p>What was supposed to be three repos became six because the AI tooling kept finding rabbit holes worth going down.</p>

<p>The approach was backwards compatibility first. I updated the auth service to forward requests to the new endpoint, so existing systems would keep working. Only after that was stable did I remove the old code paths and switch the calls to point directly to the new flow.</p>
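<p>In spirit, the forwarding step looks like this. A minimal sketch only: the real services are internal (and written in Go), so the paths below are hypothetical and I’m using Python purely for illustration. The idea is that the legacy endpoint stops doing the work itself and simply redirects to the new flow, so anything still pointing at the old URL keeps working.</p>

```python
# Hypothetical paths -- the real endpoints are internal.
OLD_AUTH_PATH = "/auth/subscribe"
NEW_FLOW_PATH = "/checkout/subscribe"

def forward(path: str, query: str = "") -> tuple[int, str]:
    """Step one of the migration: the legacy auth endpoint answers with a
    redirect to the new flow, so existing callers are never broken."""
    if path == OLD_AUTH_PATH:
        target = NEW_FLOW_PATH + (f"?{query}" if query else "")
        return 307, target  # 307 preserves the request method and body
    return 404, ""
```

<p>Only once every caller has been repointed at the new URL does the old code path get deleted, which is what makes the whole sequence zero-downtime.</p>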

<p>Along the way, I hit a referrer bug that only revealed itself mid-implementation. One of the components lived in a shared library, not a full application, which meant handling referral data differently than expected: I had to change it to read from the window’s referrer data rather than relying on direct redirect URLs.</p>

<p>And then there was a shared header component in another team’s repo. Hardcoded to the old endpoint. In code I couldn’t easily modify. Rabbit holes kept cropping up every time I thought I’d dived down them all.</p>

<p>Fourteen PRs. Six repositories. Backwards compatible throughout. Zero downtime.</p>

<p>The old flow had an extra hop through legacy middleware that handled state management. The new flow removes that layer entirely. Which makes for a faster time to checkout, same user experience, one less thing to maintain.</p>

<h2 id="the-point">The Point</h2>

<p>This would have been a multi-team effort. Alignment meetings across three teams, at minimum. Negotiated timelines. Architectural discussions. Coordinated releases.</p>

<p>Instead, it was one developer holding context that used to require an org chart.</p>

<p>I’m not saying AI tooling makes you a better engineer. I’m saying it lets you hold more context. And sometimes that’s the difference between “we’ll need to schedule a meeting with the other teams” and “I’ll have a PR up by Thursday.”</p>

<p>The context ceiling just got a lot higher.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><category term="codex" /><summary type="html"><![CDATA[True stories from the front lines of the internet…]]></summary></entry><entry><title type="html">This Is the Way: Delete the Code</title><link href="https://www.petervanonselen.com/2026/02/10/agentic-play/" rel="alternate" type="text/html" title="This Is the Way: Delete the Code" /><published>2026-02-10T08:00:00+00:00</published><updated>2026-02-10T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2026/02/10/agentic-play</id><content type="html" xml:base="https://www.petervanonselen.com/2026/02/10/agentic-play/"><![CDATA[<p><em>How I learned to do AI Katas and make disposable code helped me recover from burnout</em></p>

<hr />

<p>Burnout is real, and it takes time to work its way out.</p>

<p>I spent the last couple of months trying to work up the will to tackle game projects again. Every attempt fizzled. After a 3+ month slog on a project that <a href="https://www.petervanonselen.com/2025/11/20/scope-creep/">refused to reach an end state</a>, I had nothing left.</p>

<p>So instead of committing to another massive project, I started playing.</p>

<p>I came across the Ralph Wiggum loop (<a href="https://www.youtube.com/watch?v=RpvQH0r0ecM">30-min video if you’re curious</a>) and decided to experiment. The premise is simple: use two Claude skills and a bash script to let an AI go full agent mode. The first skill, <code class="language-plaintext highlighter-rouge">/prd</code>, takes a spec and generates user stories with verifiable acceptance criteria. The second, <code class="language-plaintext highlighter-rouge">/ralph</code>, converts that PRD into JSON. Then you loop over the JSON until done. This is basic agentic coding, but AI-agnostic and surprisingly effective.</p>

<p><img src="/assets/agentic-play/image.png" alt="the Ralph Wiggum loop" /></p>
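<p>The shape of the driver is easy to sketch. Here’s a Python stand-in for the bash script, with a stubbed agent call and an assumed JSON layout (story title, acceptance criteria, done flag), not the actual skill output:</p>

```python
import json
from pathlib import Path

PRD_FILE = Path("prd.json")  # hypothetical output of the /ralph skill

def run_agent(prompt: str) -> None:
    """Stub for the real agent call (the bash loop shells out to Claude here)."""
    print(f"agent working on: {prompt[:60]}")

def ralph_loop() -> int:
    """Loop over the PRD's stories until every one is marked done."""
    stories = json.loads(PRD_FILE.read_text())
    iterations = 0
    while any(not s["done"] for s in stories):
        story = next(s for s in stories if not s["done"])
        prompt = (
            f"Implement this story: {story['title']}\n"
            f"Acceptance criteria: {'; '.join(story['criteria'])}\n"
            "Run the tests; only mark the story done when they pass."
        )
        run_agent(prompt)
        story["done"] = True  # in the real loop the agent updates the JSON itself
        iterations += 1
    PRD_FILE.write_text(json.dumps(stories, indent=2))
    return iterations
```

<p>The point is that the driver is deliberately dumb. All of the intent lives in the PRD and the agent file, which is exactly why the quality of those two artefacts dominates the outcome.</p>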

<p>I needed a project to test this on, so I picked <a href="https://boardgamegeek.com/boardgame/163474/v-sabotage">V-Sabotage</a>. It’s a board game I enjoy but rarely get to play (toddler life), and more importantly, it’s simple enough to define a clear MVP: rooms, a player, guards, sneaking mechanics, a win condition. I’d learned my lesson about scope.</p>

<p>The real experiment wasn’t building the game. That was just the head fake. It was actually figuring out how to break down the work. How big should each PRD be? Do you treat each milestone as its own PRD? Do you throw the whole spec at it and see what happens?</p>

<p>I had to find out.</p>

<p><strong>First run:</strong> I threw a PRD at the skills, ralphed it, and looped on a fresh repo. What came out was very familiar from the last time I was building a game in Godot with AI: basically something that worked, but buggy and clunky, with no tests, poor signal architecture, and tightly coupled code.</p>

<p><strong>Second run:</strong> Same PRD, same loop, but this time I initialised the repo with a CLAUDE.md file first. Just basics: test-drive the code, use Godot 4.x best practices, that sort of thing.</p>
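<p>For the curious, the file really was only a handful of plain-language rules. Something along these lines (an illustrative reconstruction, not the exact file):</p>

```markdown
# CLAUDE.md

- Test-drive all code: write the failing test first, then make it pass.
- Target Godot 4.x and follow current GDScript best practices.
- Prefer signals for communication between nodes; keep scenes decoupled.
- Keep scripts small and single-purpose.
```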

<p>The difference was dramatic. The AI wrote its own test runner. It test-drove everything, achieved high coverage, produced cleaner interfaces, used signals properly, and kept things decoupled. Twenty minutes of compute, and the output was genuinely good. Honestly, the thing that blew my mind was that <strong>it wrote its own TEST RUNNER!?!?</strong> Are you kidding me?</p>

<p>So … key lesson: agent files matter. A lot.</p>

<p><strong>Third run:</strong> I embedded full milestones into the PRDs—a dozen user stories each, multiple acceptance criteria. The loop churned through it and produced a testable MVP in surprisingly little time.</p>

<p><img src="/assets/agentic-play/tactics.png" alt="stealth game" /></p>

<p><strong>Fourth run:</strong> I got distracted by an app idea. Spent a couple of hours refining it with AI, generated a chunky PRD, threw the whole thing at the <code class="language-plaintext highlighter-rouge">/prd</code> skill. It produced 40 stories. I looped it. An hour later, with 5% of my usage allowance remaining, I had a working prototype.</p>

<p><img src="/assets/agentic-play/mobile-app.png" alt="random app" /></p>

<p>It didn’t do exactly what I wanted. But it did most of what I’d asked. Surprisingly, that was more than enough to immediately and meaningfully change my thinking about what I actually needed.</p>

<p>And here’s the thing that made all of this feel like play instead of work: <strong>I deleted the code.</strong> I went full <a href="https://www.coderetreat.org/">code retreat, Conway’s Game of Life</a>, delete the code.</p>

<p>Multiple times. Deliberately. The output I was chasing wasn’t a codebase. It was understanding. How big can a PRD get before the loop breaks? (Bigger than I expected.) How much do agent files matter? (More than I expected.) What’s the minimum setup to get something useful? (Less than I expected.)</p>

<p>Disposable code meant low stakes. Low stakes meant I could experiment freely. And experimenting freely, it turns out, is how I recover from burnout.</p>

<p>This play has fundamentally changed how I work at The Economist. But that’s a story for next time.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><category term="codex" /><summary type="html"><![CDATA[How I learned to do AI Katas and make disposable code helped me recover from burnout]]></summary></entry><entry><title type="html">Why You Shouldn’t Speedrun a Production Refactor</title><link href="https://www.petervanonselen.com/2025/12/12/speedrunning-prod-refactors/" rel="alternate" type="text/html" title="Why You Shouldn’t Speedrun a Production Refactor" /><published>2025-12-12T08:00:00+00:00</published><updated>2025-12-12T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/12/12/speedrunning-prod-refactors</id><content type="html" xml:base="https://www.petervanonselen.com/2025/12/12/speedrunning-prod-refactors/"><![CDATA[<p><em>Learning the hard way that AI makes discipline more important, not less…</em></p>

<hr />

<p>This week has been a mess. I’ve been ill since last Thursday with a cough that’s been both productive and dizzying, which in a rare moment of clarity made me realise that maybe doing personal dev in the evenings is… not ideal. So no game dev tales this week.</p>

<p>Instead, I want to talk about how I nearly torpedoed a production refactor at The Economist a month ago by forgetting the most important lesson I’ve been learning over the past few months: <strong>AI makes discipline more important, not less.</strong></p>

<h2 id="the-spectacular-failure">The spectacular failure</h2>

<p>I’m currently on the e-comm-funnel team working on the checkout pipeline. One of my first projects has been tackling Commerce Services. This is a Go monolith (a language I’d never used until recently) that started as a POC and got productionized. Naturally, it’s a beautiful mess with conflicting APIs doing all sorts of non-cohesive domain things.</p>

<p>My goal: break it into microservices.</p>

<p>So naturally I did what I’ve been practising with Horizons Edge and started with a spec. I had a very long conversation with Codex, analysed the repo structure, identified the domains, mapped dependencies. From this chat we produced a solid 10-page high-level plan.</p>

<p>And the first step of that plan was <em>Phase 1: extract common code into a shared library</em>.</p>

<p>And somewhat predictably, here’s where I got clever.</p>

<p>I thought: “I’ve got a detailed spec. Codex knows Go. Let’s just… do the whole thing! What’s the worst that could happen?”</p>

<p>So I did. One massive refactor. Codex happily obliged.</p>

<p>Then I looked at the pull request: <strong>200 files changed in the monolith. 80 files in the new library.</strong></p>

<p><img src="/assets/speedrun-refactor/this-is-fine.png" alt="this is fine, right?" /></p>

<p>And I just stared at it, completely overwhelmed by the obvious question: <em>How the hell am I going to verify this actually works?</em></p>

<p>There was no way I could meaningfully review 280 files of changes. No way I could ask another engineer to do it. No way to be confident this wouldn’t break something subtle in production. I’d just created an unshippable monster.</p>

<h2 id="starting-over-properly-this-time">Starting over, properly this time</h2>

<p>I scrapped the entire thing and started again with an “I need this to be incremental” mindset.</p>

<p>Not just because I wanted to be able to review it, though that’s critical, but because I genuinely believe small releases into production are the right way to work. It should have been my default starting point. Instead, I’m still learning just how disciplined I need to be when working with AI tooling.</p>

<p>The new approach:</p>

<p><strong>First</strong>, I wrote a much more detailed spec for Phase 1 that lived in the new repo. Not just “extract shared library” but an 8-step plan where each step could go to production independently. Start with the absolute minimum: just one joint service with no dependencies. This would validate the CI/CD pipeline, the integration points, everything, with the smallest possible change.</p>

<p><strong>Then</strong>, one step at a time:</p>
<ul>
  <li>Extracted and deployed leaf utilities (logging, validation, middleware)</li>
  <li>Migrated HTTP routing abstractions</li>
  <li>Moved observability and AWS helpers</li>
  <li>Extracted infrastructure components like health checks</li>
  <li>Finally, the component registry</li>
</ul>

<p>At each step: tested, improved coverage, deployed to production, monitored. The existing systems kept running exactly as before.</p>

<p>I followed the same hyper-methodical approach I’ve been using with the game project. Focusing on small scoped MVP slices and incremental delivery. For the actual development, I loaded Codex into a workspace with both repos and had it follow the spec file for each migration. Then validated with Claude in GitHub Copilot, extensive personal review, and eventually team review before each production deployment.</p>

<p>The result: A refactor of a core system touching ~200 files, in a programming language I’m just learning, in a domain I’d just joined, completed over a couple of weeks with zero downtime. No one on the team was blocked or impacted. It just happened quietly in the background.</p>

<h2 id="what-im-taking-away">What I’m taking away</h2>

<p>Two things keep reinforcing themselves across contexts:</p>

<p><strong>First</strong>: AI amplifies your need for discipline. The easier it becomes to generate large amounts of code, the more critical it is to think carefully about scope, verification, and deployment strategy. One-shotting 280 files feels productive in the moment. It’s not. It’s just creating an unshippable mess you’ll have to undo.</p>

<p><strong>Second</strong>: The “what’s the smallest increment that adds value?” mindset pays off everywhere. It saved Horizons Edge when I was drowning in scope creep. It made this refactor safe and reviewable. It’s not just a nice-to-have for side projects … it’s how you de-risk production changes in unfamiliar territory.</p>

<p>Next up is breaking out actual domains into microservices, starting with Identity &amp; User. But that’s a plan for next year, when I’m hopefully no longer coughing my lungs out.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><category term="codex" /><summary type="html"><![CDATA[Learning the hard way that AI makes discipline more important, not less…]]></summary></entry><entry><title type="html">Finally… A Wild MVP Appears</title><link href="https://www.petervanonselen.com/2025/12/04/finally-a-wild-mvp-appears/" rel="alternate" type="text/html" title="Finally… A Wild MVP Appears" /><published>2025-12-04T08:00:00+00:00</published><updated>2025-12-04T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/12/04/finally-a-wild-mvp-appears</id><content type="html" xml:base="https://www.petervanonselen.com/2025/12/04/finally-a-wild-mvp-appears/"><![CDATA[<p><em>Three Months to MVP: What I Learned Building a Tactical Card Game with AI…</em></p>

<hr />

<p><img src="/assets/mvp/banner.png" alt="banner" /></p>

<p>It’s been three months since I started trying to make Horizon’s Edge, a tactical turn-based wargame in the sky. And honestly, I’m completely stunned that I even have something that actually … kinda … works.</p>

<h2 id="from-vibe-to-spec">From Vibe to Spec</h2>

<p>When I started this project, I was pure vibe coding. But somewhere in the middle, when I was trying to refactor the UI from a classic RTS/turn-based strategy with a mass of buttons to something entirely driven by card play, vibe coding hit a wall. I couldn’t get AI tooling to cooperate with my loose intuitions. That’s when I started working with specs and it changed my life…. metaphorical life …. but life!</p>

<p><img src="/assets/mvp/cards.png" alt="card driven" /></p>

<p>Writing a clear specification before diving head first into a vibe changed everything. Suddenly the AI had guardrails. It stopped looping in circles. If you want to go deeper on this, I wrote about starting to figure out specs in <a href="https://claude.ai/2025/10/03/chaos-cards-and-claude-copy/">Chaos, Cards, and Claude</a>.</p>

<h2 id="months-of-education">Months of Education</h2>

<p>What I’ve learned so far is this: it’s entirely possible to make working software with AI tools. You can keep momentum even when you’re completely out of energy or time. But the most important thing … the thing that actually matters … is that you have to always verify what the AI outputs. Automated unit tests are your best friend here. A quality-first mindset and thinking about edge cases are fundamental.</p>

<p><img src="/assets/mvp/waveform.gif" alt="waveform generation" /></p>

<p>I’ve done major refactors. I’ve rethought how the game works multiple times. I added procedural generation because it was fun. I figured out how to make the game work with just card play mechanics that… mostly work (much to my surprise). Some refactors were vibe-coded disasters; others were spec-driven and clean. But every single one taught me something about how to work with AI as a tool rather than a replacement.</p>

<h2 id="the-mvp-what-it-took">The MVP: What It Took</h2>

<p>Two days ago I finished the final major system needed to validate the MVP: the victory system. Islands needed to change ownership. Every existing system needed to integrate with that. Core island nodes needed to be targetable from creatures, spells, and abilities. Getting the targeting code to work meant hitting more touch points than I anticipated. A lot of different abilities needed specific ways to target islands, not just creatures. It was more entertaining than expected, but I had a spec, and that spec kept me honest.</p>

<p>I now have two thematic decks that are actually unique. 10 creatures. Infrastructure cards that terraform and change the world. A spell that destroys the world. Card play mechanics that mostly work. A whole horde of nuanced and detailed rules about damage and combat that work.</p>

<p><strong>Prototype done. Well, mostly.</strong></p>

<p>The mechanics are all there. What’s left is UI feedback. Cards need to show their play cost clearly, what they do when discarded, what abilities they have before you play them. I know all this because I’ve spent three months building it. Others won’t. So I’m going to tackle those UI gaps this week, then put this in front of a few people to get real feedback.</p>

<h2 id="now-its-your-turn">Now It’s Your Turn</h2>

<p>I’ve spent the past five months working on this. It started with a Magic the Gathering Pauper Jumpstart cube that just wouldn’t get out of my head, went running headlong into a board game prototype that was way too complicated and barrelled straight into this game. I’ve learned a ridiculous amount. <strong>But here’s what actually matters:</strong></p>

<p>It is possible to make working software with AI. You can make some amazing things with these tools. You can learn as you go. You can ship those crazy ideas you never thought you could.</p>

<p><strong>So here’s what I want:</strong></p>

<p>I want <em>you</em> to go out and make your own mistakes. Build something weird. Use AI as a tool, verify everything it does, and then make something great. Make art. Because making art, well, that’s the most human thing we can do.</p>

<p>Tell me what you build. I want to hear about it.</p>

<p>Till then, keep learning.</p>

<div style="position: relative; width: 100%; padding-bottom: 56.25%; height: 0; overflow: hidden;">
  <iframe style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;" src="https://www.youtube.com/embed/NdQgOLlt8QQ?si=F2rdwci0GcDvNm9R" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
</div>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="godot" /><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><summary type="html"><![CDATA[Three Months to MVP: What I Learned Building a Tactical Card Game with AI…]]></summary></entry><entry><title type="html">The Long Road to MVP</title><link href="https://www.petervanonselen.com/2025/11/27/long-road-to-mvp/" rel="alternate" type="text/html" title="The Long Road to MVP" /><published>2025-11-27T08:00:00+00:00</published><updated>2025-11-27T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/11/27/long-road-to-mvp</id><content type="html" xml:base="https://www.petervanonselen.com/2025/11/27/long-road-to-mvp/"><![CDATA[<p><em>How I learned to stop overthinking and ship the damn thing…</em></p>

<hr />

<p><img src="/assets/long-road/banner.png" alt="banner" /></p>

<p><a href="https://www.petervanonselen.com/2025/11/20/scope-creep/">Last week I had a sudden but inevitable realisation</a>. Basically, my MVP wasn’t an MVP at all but a bloated, over-built system I should have been moving toward release sooner. Which in hindsight I should have realised… 4 weeks ago. Lesson learned.</p>

<p>So, this week I started with yet another plan where I detailed out a new spec that broke down the exact things I needed to do. And then… I did just that! Everything else is deferred.</p>

<p><strong>Broad plan for MVP:</strong></p>
<ul>
  <li>1 Spell: Meteor - DONE</li>
  <li>1 Victory condition: own all the islands - IN PROGRESS</li>
  <li>2 Unique Decks - TODO</li>
</ul>

<p>The spell was easy. Just another ability: make it bigger, make it cost more, and give it an animation that then removes a massive amount of real estate. Big, flashy, simple.</p>

<p><img src="/assets/long-road/meteor.gif" alt="meteor" /></p>

<p>The victory condition, however, has turned into a far more crunchy problem. Turns out wiring up ownership, health tracking, and win conditions across multiple systems was messier than I thought. I’ve built:</p>
<ul>
  <li>A <strong>game over</strong> screen to start a new game</li>
  <li>Health on the core island node</li>
  <li>A UI display to see that health</li>
  <li>A manager to track who owns the islands and a way to change ownership</li>
  <li>A victory condition that checks each round if someone’s actually won</li>
</ul>
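<p>The ownership-plus-victory wiring described above can be sketched roughly like this in plain Python (the game itself is GDScript, and every class and player name here is hypothetical):</p>

```python
# Plain-Python sketch of the island-ownership + victory wiring described above.
# The real game is GDScript; class and player names here are made up.

class IslandOwnershipManager:
    """Tracks who owns each island and checks the 'own all islands' win condition."""

    def __init__(self, island_ids):
        # None means unowned (e.g. neutral at game start).
        self.owner = {island: None for island in island_ids}

    def set_owner(self, island, player):
        self.owner[island] = player

    def check_victory(self):
        """Run once per round: return the winner if one player owns everything."""
        owners = set(self.owner.values())
        if len(owners) == 1 and None not in owners:
            return owners.pop()
        return None

manager = IslandOwnershipManager(["north", "south", "core"])
manager.set_owner("north", "player_1")
manager.set_owner("south", "player_1")
print(manager.check_victory())  # None: the core island is still unowned
manager.set_owner("core", "player_1")
print(manager.check_victory())  # player_1
```

<p>The useful property is that the check only reads ownership state, so the systems that change ownership (combat, spells, abilities) never need to know the victory rules exist.</p>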

<p>Almost there! The missing piece is creatures targeting the core island and dealing damage to it. Once I wire up that damage pathway, the whole system clicks. That’s what I’m grinding through at the moment.</p>

<p>All that will be left is two decks. Which should be as simple as updating two config files, and I’ll have a playable MVP. Finally.</p>

<p><img src="https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExOG05YndpM2NxbWIxaDE5N2t0ODg2NDRia29tdWd0OXhoaXViMTFhaSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/dyw4fuAhPaIh0japgg/giphy.gif" alt="light" /></p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="godot" /><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><summary type="html"><![CDATA[How I learned to stop overthinking and ship the damn thing…]]></summary></entry><entry><title type="html">How Many Times Do You Have to Build Too Much to Learn Scope Creep?</title><link href="https://www.petervanonselen.com/2025/11/20/scope-creep/" rel="alternate" type="text/html" title="How Many Times Do You Have to Build Too Much to Learn Scope Creep?" /><published>2025-11-20T08:00:00+00:00</published><updated>2025-11-20T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/11/20/scope-creep</id><content type="html" xml:base="https://www.petervanonselen.com/2025/11/20/scope-creep/"><![CDATA[<p><em>Scope Creep Keeps Teaching Me…</em></p>

<hr />

<p><img src="/assets/scope-creep-november/banner.png" alt="banner" /></p>

<h1 id="scope-creep-keeps-teaching-me">Scope Creep Keeps Teaching Me</h1>

<p>Have you ever had one of those weeks where you suddenly have to be traveling a lot and don’t really have any time for personal projects? This has been my week. I still managed to get two creatures and six abilities created.</p>

<p>The Flux Chaos controls the board by moving creatures into and out of position, swapping them from behind enemy lines or randomly teleporting them somewhere (maybe even off into the void). The Flux Storm does massive area damage while locking down the area, preventing anyone from teleporting in or out. Watching these creatures interact with the existing systems is delightful. The abilities are beginning to feel like they synergize nicely.</p>

<p><img src="/assets/scope-creep-november/chaos-storm.png" alt="chaos and storm" /></p>

<p>I’ve been on a mission to prototype this strange game idea for a while now. I started the repo on September 4th. Three months later, I’m watching a prototype that’s actually starting to feel like a game with interesting systems.</p>

<h2 id="but-heres-what-i-should-have-learnt-sooner-i-didnt-cut-far-enough">But here’s what I should have learnt sooner: I didn’t cut far enough.</h2>

<p>When I started, I aggressively removed things. All 8 envisioned factions. The single-player campaign, stories, and lore. AI players. Multiplayer. Different form factors like mobile. Complex models and animations. I thought I’d solved the scoping problem. Oh how innocent I was.</p>

<p>I then built ten creatures with twenty-nine abilities. I added procedural island generation and dynamic road construction. I went down tangents and rabbit holes as I iterated from RTS-style UI to pure card-driven gameplay.</p>

<p>Looking at that list now, that should have been a warning sign. And it gets worse: my remaining feature list for MVP includes five more buildings, five more spells, multiple win conditions, fog of war, proper turn behavior, UI polish.</p>

<p>It’s only now that I am looking at my remaining feature list (I wrote up the list for this very blog post), that I realize the entirely obvious thing. I am <em>still</em> overscoping. Scope creep doesn’t stop after one round of cuts. It’s persistent. Even when you’re actively trying to think in MVP terms, there’s a pull to add one more building, one more spell, one more system. Each one feels necessary. But they compound.</p>

<p>I should have picked this up sooner: that grinding feeling I’ve been having with the creatures and abilities. It felt like motivation leaching away in never-ending drudgery. It was a clear warning sign I was off the path.</p>

<p>So here’s my actual MVP: one spell (meteor), Fog of War, one win condition, UI, and two decks. That’s it. Just the core systems missing, playable, shippable. This is what I should have built from the start, the smallest thing that proves the concept works.</p>

<p><img src="/assets/scope-creep-november/simple.webp" alt="simple" /></p>

<h2 id="the-lesson-keeps-teaching-itself">The lesson keeps teaching itself</h2>

<p>I keep learning the same lesson over and over. I thought I’d solved scope creep when I cut factions and campaigns. Three months later, with ten creatures and twenty-nine abilities built, I realized I’d just learned the lesson at a different scale. Every time I ship something, I understand a little better what actually mattered. And every time, I realize I could have cut further.</p>

<p>The game is starting to have shape because I’ve been building and learning. But I’m also learning, again, that shipping something small and real beats shipping something comprehensive and theoretical. Time to cut deeper. This time, I’m shipping the prototype before I convince myself I need more.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="godot" /><category term="vibecoding" /><category term="claudecode" /><category term="specdrivendevelopment" /><summary type="html"><![CDATA[Scope Creep Keeps Teaching Me…]]></summary></entry><entry><title type="html">How to Get Things Done When You Have Nothing but Process</title><link href="https://www.petervanonselen.com/2025/11/13/the-no-good-low-energy-week/" rel="alternate" type="text/html" title="How to Get Things Done When You Have Nothing but Process" /><published>2025-11-13T08:00:00+00:00</published><updated>2025-11-13T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/11/13/the-no-good-low-energy-week</id><content type="html" xml:base="https://www.petervanonselen.com/2025/11/13/the-no-good-low-energy-week/"><![CDATA[<p><em>The tale of how a good System can carry you through the dark times…</em></p>

<hr />

<p><img src="/assets/no-energy-week/banner.png" alt="banner" /></p>

<p>This week I had zero motivation. Motivation tank completely empty.</p>

<p>My goal was 4 creatures and all of the remaining abilities. I only managed to build 2 creatures and 7 abilities instead.</p>

<p>That anything got built at all was astonishing to me.</p>

<p>But it did. And the reason isn’t hard work or willpower; it’s process. The structure I follow for working with AI is robust enough to produce good work even when I have nothing left. That’s the real story.</p>

<h2 id="how-the-structure-holds-up">How the structure holds up</h2>

<p>My process is straightforward: compact the conversation, use the spec to guide the next iteration, verify the implementation, have the AI explain what it did, then manually test each ability. Rinse and repeat.</p>

<p>The spec-driven approach kept context alive, for me <em>and</em> the AI. While my motivation was completely drained, the AI kept building. I basically did the bare minimum each day before switching to Marvel Spider-Man, but the structure meant I never lost the thread. I was running on <a href="https://www.youtube.com/shorts/mVQ1bzd816I">Seinfeld method</a> momentum: just keep the chain unbroken.</p>

<p>But here’s the critical bit: <strong>verify everything the AI makes</strong>. The AI will confidently insist “Everything works perfectly!” and update your docs without being asked. Then you test it. Nothing works. You point it out and it turns out half the feature wasn’t there to start with. Point it out again, and more unfinished work is found. The AI is capable and also will miss the obvious. Trust but verify.</p>

<p><strong>Pro tip</strong>: never ever let the AI update the docs without being directly asked to.</p>

<p>This verification habit isn’t just bug-catching. It is the key habit that keeps the process from collapsing. Without it, you’re just building on sand. With it, even a low-energy week produces solid work.</p>

<h2 id="where-designs-actually-get-good">Where designs actually get good</h2>

<p>The most valuable phase in my workflow is when the AI explains what it just built. That’s where I stop and think: does this ability actually do what I want? Should it shift? Should it do something different?</p>

<p>That’s where the magic happens.</p>

<p><img src="/assets/no-energy-week/tank.png" alt="tank" /></p>

<p>The Biomass Tank’s parasitic bond is a perfect example. It started as a triggered ability, shifted into an activated passive that heals the Tank whenever <em>any</em> nearby enemy creature takes damage. That single design decision cascaded: upkeep suddenly mattered. Rounds had weight. Generator buildings became far more valuable, and vulnerable. The creature went from decent to a potential powerhouse with two different healing methods and the ability to stay safe at range.</p>

<p>The design got better not because I worked harder on it, but because I paused to think about what it <em>was</em> and what it <em>could be</em>.</p>

<p><img src="/assets/no-energy-week/defender.png" alt="defender" /></p>

<p>Same with the Voltage Defender. Its pull mechanic started simple—nudging creatures around the battlefield. But when I stopped to think about it, it shifted: what if it could move creatures <em>off islands entirely</em>? Suddenly it’s a broad control tool, shutting down buildings and repositioning the entire battlefield. That cascaded too—every ability needed rework. The EMP got bigger, affected every building including the player’s, made positioning critical.</p>

<p>This reflection loop, building something, stopping to think about it and then nudging the design. It keeps creating moments where the game gets better. It’s where iteration actually matters.</p>

<h2 id="the-real-lesson">The real lesson</h2>

<p>Turns out the structure matters more than the fuel. Low-energy weeks aren’t failures. They’re tests of process.</p>

<p>When you have nothing left, you find out what actually works. This week proved that good structure, spec-driven development, verification habits, and the discipline to reflect can carry you through. The creatures got better not because I pushed harder, but because the process was solid enough to ship good work anyway.</p>

<p>I might not have had the energy. But the system did.</p>

<h2 id="next-week">Next week</h2>

<p>Fingers crossed, I will hopefully finish the creature cards and start working on spells like Meteor and Lightning, and get some more buildings working correctly.</p>

<hr />

<p><img src="/assets/chill-refactor/banner.png" alt="banner" /></p>

<p>This week, my carefully-laid plans for rapid progress on <strong>Horizons Edge</strong>, a tactical wargame with card-driven combat on floating islands, collided headfirst with 2,255 lines of code and 94 functions living in a single file. Whoops. Just what I always wanted. I’d let a god class slowly brew and percolate while I focused on shipping features. Now it was time to pay the inevitable technical debt.</p>

<p>The wonderful file in question: <code class="language-plaintext highlighter-rouge">game_manager.gd</code>. It was managing twelve distinct functional areas: turn management, combat, creatures, cards, abilities, territory, players, and more. The result was tight coupling and maintenance friction of epic proportions.</p>

<h2 id="the-question-that-drove-me">The Question that drove me</h2>

<p>I’ve spent the last month learning how to work productively with AI on complex coding tasks. And I had a nagging question: <strong>Can an AI do a massive refactor productively? Can you have a boring refactor—a by-the-numbers, tick-the-boxes, super easy, chill refactor?</strong></p>

<p>This wasn’t just idle curiosity. When I last <a href="https://www.petervanonselen.com/2025/09/29/the-grand-refactor/">attempted a refactor of similar scope, it was a spectacular disaster</a>. I spent two evenings fighting with Claude about types, going in circles while the AI and I kept insisting the other was wrong. The game broke for days. It was a hair-pulling nightmare that never ended. It was painful, demoralizing, and left me deeply skeptical about whether AI could handle large-scale refactoring productively. That experience shaped everything about how I approached this week.</p>

<p>But this time felt different. <a href="https://www.petervanonselen.com/2025/10/30/exert-of-what-i-learnt/">I had a plan</a>.</p>

<p><img src="/assets/chill-refactor/plan.png" alt="a plan" /></p>

<h2 id="setting-the-constraints">Setting the Constraints</h2>

<p>Before diving in, I wanted to be intentional. I asked Claude to analyze the file and create a refactoring plan, but with strict requirements:</p>

<p><strong>Functional Requirements:</strong></p>
<ul>
  <li>Maintain existing behavior. No new code, no feature creep—same behavior, different structure.</li>
  <li>All key systems had to keep working: card play, creature combat, energy systems, turn management, terrain creation and destruction. Everything.</li>
</ul>

<p><strong>Technical Requirements:</strong></p>
<ul>
  <li>Files should be less than 400 lines each</li>
  <li>Absolutely no circular dependencies (this still haunts my nightmares)</li>
  <li>Follow good Godot practices using signal-based architecture and node-based composition</li>
</ul>
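<p>A constraint like the 400-line budget is easy to keep honest with a small script. A sketch in Python (the <code>*.gd</code> glob and the demo paths are illustrative, not from the actual repo):</p>

```python
# Sketch: keep the "under 400 lines per file" constraint honest with a script.
# The *.gd glob and demo paths are illustrative, not from the actual repo.
import tempfile
from pathlib import Path

def files_over_budget(root, pattern="*.gd", budget=400):
    """Return (filename, line_count) for every matching file over the budget."""
    over = []
    for path in sorted(Path(root).rglob(pattern)):
        line_count = sum(1 for _ in path.open())
        if line_count > budget:
            over.append((path.name, line_count))
    return over

# Throwaway demo files: one god class over budget, one manager within it.
tmp = tempfile.mkdtemp()
Path(tmp, "god_class.gd").write_text("pass\n" * 401)
Path(tmp, "small_manager.gd").write_text("pass\n" * 50)
print(files_over_budget(tmp))  # [('god_class.gd', 401)]
```

<p>Run as a pre-commit step or CI check, this turns the size rule from an intention into something that fails loudly.</p>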

<p><strong>Methodology:</strong></p>
<ul>
  <li>Everything had to be incremental. No big bang refactors. Incremental changes mean incremental testing, which means catching bugs early.</li>
</ul>

<p>Claude produced a 1,091-line planning document comprising 11 phases, complete with a testing strategy, regression testing plan, and a high-level architecture with 8 new manager classes. It was exactly what I needed: a detailed roadmap to follow rather than a free-form creative challenge.</p>
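<p>To show what the signal-based constraint buys: instead of the god class calling into every system directly, each manager emits events and the others subscribe. Here’s a tiny sketch of the idea in plain Python rather than Godot’s actual API, with invented manager and creature names:</p>

```python
# Plain-Python sketch of the signal-based idea (not Godot's actual API):
# managers communicate through emitted events rather than direct calls,
# which is what breaks the god-class coupling apart. All names are invented.

class Signal:
    """Tiny stand-in for a Godot signal: callbacks connect, emit notifies them all."""

    def __init__(self):
        self._subscribers = []

    def connect(self, callback):
        self._subscribers.append(callback)

    def emit(self, *args):
        for callback in self._subscribers:
            callback(*args)

class CombatManager:
    def __init__(self):
        # Emitted on death instead of calling other managers directly.
        self.creature_died = Signal()

    def apply_damage(self, creature, amount):
        creature["hp"] -= amount
        if creature["hp"] <= 0:
            self.creature_died.emit(creature)

class TerritoryManager:
    def __init__(self, combat):
        self.log = []
        # Reacts to combat events; CombatManager never needs to know it exists.
        combat.creature_died.connect(self.on_creature_died)

    def on_creature_died(self, creature):
        self.log.append(f"{creature['name']} removed from {creature['island']}")

combat = CombatManager()
territory = TerritoryManager(combat)
combat.apply_damage({"name": "Voltage Defender", "island": "isle-3", "hp": 2}, 5)
print(territory.log[0])  # Voltage Defender removed from isle-3
```

<p>Because <code>CombatManager</code> never references <code>TerritoryManager</code>, the dependency arrow only points one way, which is exactly what rules out the circular dependencies the plan forbade.</p>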

<h2 id="the-method-that-worked">The Method That Worked</h2>

<p>Here’s the systematic approach I followed for every phase:</p>

<ol>
  <li><strong>Clean context</strong>: Open a new terminal window with a fresh Claude context (no conversation history creep)</li>
  <li><strong>Implement one phase</strong>: Ask Claude to implement just that single phase from the planning document</li>
  <li><strong>Verify no regressions</strong>: Get Claude to check to verify its work</li>
  <li><strong>Create a manual test plan</strong>: Have Claude outline a manual testing plan</li>
  <li><strong>Hand test and fix</strong>: Switch back to vibe coding; find bugs, fix them one at a time</li>
  <li><strong>Commit with clarity</strong>: Once working, commit to the branch with a descriptive message</li>
</ol>

<p>This rhythm was key. It prevented the cognitive overload of trying to refactor everything at once while still making steady progress.</p>

<h2 id="the-results">The Results</h2>

<p><img src="/assets/chill-refactor/gitcommit.png" alt="commit history" /></p>

<p>I started Friday evening and finished Tuesday evening. Seven hours total—an hour Friday night, four hours scattered across the weekend, another hour or two on Monday and Tuesday combined.</p>

<p>And here’s what shocked me: <strong>the game never broke</strong>. Not once. This was the easiest refactor of such a complicated system I’ve done in my career.</p>

<p>The difference was night and day compared to last month. That first refactor had been a hair-pulling nightmare. The game was down for days, I was fighting with the AI in circles, nothing felt under control. This time? The game was working the entire time. Every phase landed in manageable doses. I could test incrementally. I could fix bugs before they cascaded into system-wide failures.</p>

<p>It’s a quintessential example of risk mitigation in action. Small, verifiable steps beat big, catastrophic swings every time.</p>

<p>Unlike that previous disaster, this one was systematic. Methodical. Boring, even. It felt like just a matter of following good habits and executing. No dramatic debugging sessions. No circular reasoning about types. No late-night frustration.</p>

<p>There was something almost boring about how well it worked. And that was the point.</p>

<h2 id="what-this-taught-me">What This Taught Me</h2>

<p>The breakthrough wasn’t a better AI. It was a better process. By being intentional about constraints, breaking work into small phases, maintaining a testing regimen, and keeping context clean, I transformed a refactoring task from a high-risk, high-stress nightmare into a predictable, manageable project.</p>

<p>That first refactor failed because I was vibe-coding with the AI. It was reactive, unfocused, trying to solve the whole problem at once. This one succeeded because I treated it like a spec-driven project with a plan, clear objectives, and systematic execution.</p>

<p>The lesson: AI isn’t magic. It’s a tool. And like any tool, it works best when you know exactly what you’re trying to build and you approach it methodically.</p>

<h2 id="whats-next">What’s Next</h2>

<p>Now I can finally get back to what I actually want to be doing: making <strong>Horizons Edge</strong> a better game.</p>

<p><img src="/assets/chill-refactor/voltage-defender.png" alt="voltage defender" /></p>

<p>This coming week: three new creatures with three abilities each. <strong>Voltage Defender</strong> (exactly what it sounds like), <strong>Biomass Tank</strong> (a tanky presence), and <strong>Flux Chaos</strong> (honestly, even I don’t know what this one is yet. That’s kinda the fun of discovery). More abilities. More discipline. More checkboxes to tick.</p>

<p>Until next time, may your refactors be as boring, and your code as stable, as this one turned out to be.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="godot" /><category term="video-game" /><category term="claudecode" /><category term="vibecoding" /><summary type="html"><![CDATA[Can an AI Do a Boring Refactor? A Case Study in Systematic Code Cleanup]]></summary></entry><entry><title type="html">The Boring Path to Actually Shipping with AI</title><link href="https://www.petervanonselen.com/2025/10/31/boring-path-to-shipping/" rel="alternate" type="text/html" title="The Boring Path to Actually Shipping with AI" /><published>2025-10-31T08:00:00+00:00</published><updated>2025-10-31T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/10/31/boring-path-to-shipping</id><content type="html" xml:base="https://www.petervanonselen.com/2025/10/31/boring-path-to-shipping/"><![CDATA[<p><em>Or: How I Learned to Stop Vibing and Love the Spec”</em></p>

<hr />

<p>OMG. This <a href="https://www.petervanonselen.com/2025/10/30/exert-of-what-i-learnt/">spec driven development</a> process is BORING!</p>

<p>Okay okay, for reals though, following this process of using a spec and a clear breakdown of tasks is tangibly yielding results and driving remarkable progress in the game.</p>

<p><img src="/assets/boring/3-heroes.png" alt="3 heroes" /></p>

<p><strong>In the past week I have:</strong></p>
<ul>
  <li>Created 3 new creatures: one that moves fast, hits hard and stuns enemies; another that spawns minions and multiplies when it dies; and one that shoots a bolt that blows things up and destroys land all over the place</li>
  <li>Around 9 new abilities created and working</li>
  <li>Got the AI to hack some terrible models together so they would be unique enough to be playable</li>
  <li>Made the islands generate more interestingly</li>
  <li>Completed a horde of UI cleanups</li>
  <li>Handled some general refactorings and got a bunch of systems working</li>
  <li><strong>Total changes:</strong> 52 files modified, +3,718 lines, -474 lines across 34 commits all for about 10 hours effort.</li>
</ul>

<p>By basically all metrics… productive?</p>

<h2 id="so-where-did-the-boring-comment-come-from">So Where Did the “Boring” Comment Come From?</h2>

<p>It comes down to what following the spec driven development process has actually become. Now that I’m being militant about making AI follow a todo list, what I’ve functionally done is put on multiple hats:</p>

<p><strong>Product Manager hat:</strong> Created a complete game design doc. High-level, aspirational, covering combat systems, creatures, abilities, victory conditions — the whole vision thing.</p>

<p><strong>Delivery/Feature Lead hat:</strong> Took one section (the combat system) and broke it down into actual features. Not just “build combat” but “what does combat <em>need</em>? Movement? Attacks? Status effects? Death?” The unglamorous work of turning vibes into verbs.</p>

<p><strong>3 Amigos hat:</strong> Turned those features into a massive todo task list. Every checkbox a micro-commitment. “Add blink ability.” “Implement stun on hit.” “Make multiplying enemy spawn minions.” The kind of granular breakdown that makes you feel like you’re doing corporate sprint planning for your hobby project.</p>

<p><strong>Engineer hat:</strong> Actioning the tasks one at a time. No wandering off to make prettier models. No “oh but what if the islands had weather systems?” Just: checkbox, code, commit, next checkbox.</p>

<p><strong>QA hat:</strong> Testing behavior. Does the stun actually stun? Does the explosion destroy terrain properly? Do the spawned minions inherit the right stats? The tedious-but-essential validation loop.</p>

<p><strong>The realization:</strong> I’ve become… an entire agile team.</p>

<p>And what that practically means is that I’ve made gamedev into <em>work</em>. My day job. God damn it.</p>

<p>You spend a lifetime developing habits for how to do engineering, then you get a newfangled tool and you just… follow the process. Good job me. Yay! Right? …Right?!</p>

<h2 id="the-tangents-i-didnt-follow-and-why-that-hurts-a-little">The Tangents I Didn’t Follow (And Why That Hurts a Little)</h2>

<p>I’ll be honest: getting lost in tangents and running away with the vibes is a whole hell of a lot of fun.</p>

<p>In past weeks, I would have absolutely gone off on any of these:</p>

<p><strong>Visual polish:</strong> Using all the gorgeous tiles from Kenney’s asset packs to make everything look beautiful instead of just functional. Making each creature feel distinct and characterful instead of “placeholder cube with stats.”</p>

<p><strong>Procedural generation rabbit hole:</strong> Diving deeper into Wave Function Collapse algorithms to generate more dynamic, interesting terrain. Making islands that feel hand-crafted even though they’re algorithmic.</p>

<p><strong>Creature personality:</strong> Actually modeling unique designs for each bot. Giving them visual identity, animations, character beyond their mechanical function.</p>

<p><strong>Worldbuilding:</strong> Fleshing out the lore of the different factions. Their motivations, their aesthetics, their place in this weird sky-island world I’m building.</p>

<p>These are the <em>fun</em> parts. The parts where you lose track of time because you’re following curiosity instead of a checklist. The parts that make gamedev feel like <em>play</em> instead of <em>work</em>.</p>

<p>But here’s the thing: <strong>none of them get me closer to a playable game.</strong></p>

<p>They’re all polish on a foundation that doesn’t exist yet. They’re the dessert when I haven’t finished the vegetables. So the spec says: not now. Stay focused. Ship the MVP first.</p>

<p>It’s the right call. I know it’s the right call. Oh heavens, please let this be the right call…</p>

<p>And I hate how boring the right call is.</p>

<h2 id="the-discipline-vs-fun-paradox">The Discipline vs. Fun Paradox</h2>

<p>Being strictly disciplined with myself about how to dev with this tool is <em>super productive</em>. Lots of forward momentum in an actual direction is really fantastic.</p>

<p>And also… boring.</p>

<p>There’s something deeply satisfying about seeing the commit graph fill up. About checking off todo items. About watching the line count grow in a structured, intentional way. It feels <em>professional</em>. It feels like I’m actually building something instead of just playing around.</p>

<p>But it’s missing that chaotic energy that made the early weeks of this project so intoxicating. The “what if I just try this wild thing?” moments. The tangents that turned into features I didn’t know I needed.</p>

<p>The spec process works. It’s just not romantic.</p>

<h2 id="where-im-headed">Where I’m Headed</h2>

<p>My plan is to stay disciplined until I hit what I’m calling an “exit point” — a milestone where the game is functioning <em>just enough</em> to validate the gameplay and experience. Right now, that means:</p>

<ul>
  <li><strong>2 unique decks</strong> that feel different to play</li>
  <li><strong>Basic strategy cards</strong> that offer meaningful choices</li>
  <li><strong>Fog of war</strong> (because exploration matters in a tactics game)</li>
  <li><strong>A simple victory condition</strong></li>
</ul>

<p>It won’t be <em>done</em>. But it will be <strong>playable</strong>. And <strong>testable</strong>. A real artifact I can put in front of someone and ask: “Is this fun?”</p>

<p>Following this process is giving me something I’ve never had before in side projects: <strong>predictable, consistent progress</strong>.</p>

<p>Not explosive bursts of inspiration followed by month-long abandonments. Not chasing vibes until I hit a wall and lose interest. Not 10,000-line notebooks that collapse under their own weight.</p>

<p>Actual, measurable forward motion toward a concrete goal.</p>

<p><img src="/assets/boring/chaos.png" alt="current" /></p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Is it boring? Yes.</p>

<p>Is it working? Also yes.</p>

<p>And maybe that’s the trade I need to make right now. There’s a time for tangents and vibes — I spent weeks in that mode and learned a ton. But there’s also a time to put your head down, follow the checklist, and <em>actually finish something</em>.</p>

<p>The irony isn’t lost on me: I spent four months learning how to use AI as a collaborator, only to discover that the real unlock was bringing back all the boring engineering discipline I use at my day job.</p>

<p>Turns out “vibe coding” still requires structure. Who knew?</p>

<p>Next week: More abilities. More discipline. More checkboxes. And hopefully, one step closer to knowing if this game is worth making at all.</p>

<hr />

<p><strong>P.S.</strong> If you’re following along with this devlog and thinking “wow, this sounds like he’s sucked all the joy out of his hobby” — yeah, a little bit. But also: I’m actually <em>building</em> something now instead of just dreaming about it. So maybe boring is the price of shipping.</p>

<p>We’ll see how I feel when I hit that exit point.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="personal" /><category term="godot" /><category term="video-game" /><category term="claudecode" /><category term="vibecoding" /><summary type="html"><![CDATA[Or: How I Learned to Stop Vibing and Love the Spec”]]></summary></entry><entry><title type="html">AI Spec Driven Development</title><link href="https://www.petervanonselen.com/2025/10/30/exert-of-what-i-learnt/" rel="alternate" type="text/html" title="AI Spec Driven Development" /><published>2025-10-30T08:00:00+00:00</published><updated>2025-10-30T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/10/30/exert-of-what-i-learnt</id><content type="html" xml:base="https://www.petervanonselen.com/2025/10/30/exert-of-what-i-learnt/"><![CDATA[<p><em>A brief summary of what I have learnt</em></p>

<hr />

<p>This is an excerpt of <a href="https://www.petervanonselen.com/2025/10/20/what-i-have-learnt/">From AI Skeptic to Constant Collaborator: What I Learned Vibe Coding</a>.</p>

<h2 id="practical-workflows">Practical Workflows</h2>

<p>Through trial and error, I developed specific patterns to manage AI’s weaknesses:</p>

<p><strong>1. The Planning Folder Pattern</strong>
Keep numbered specs (1-initial-feature.md, 2-pay-by-discard.md, etc.) that document feature discussions. These become persistent context across sessions.</p>

<p><strong>2. The Todo Accountability System</strong>
Break specs into granular checkbox lists. Use them to hold the AI accountable during implementation.</p>

<p><strong>3. The Git Save-Scumming Strategy</strong>
Commit frequently. AI will overwrite working solutions without memory of what worked before.</p>

<p><strong>4. The Role-Based AI Selection</strong></p>
<ul>
  <li>ChatGPT: Brainstorming, exploration, asking “what’s wrong with this design?”</li>
  <li>Claude: Implementation, code review, pair programming</li>
  <li>Copilot/Codex: Ticket-style work where you hand off and come back later</li>
</ul>

<p><strong>5. The Discipline Override</strong>
Set hard rules to counter AI’s momentum:</p>
<ul>
  <li>Force refactor cycles</li>
  <li>Write tests even when AI makes it feel unnecessary</li>
  <li>Question every tangent: “Is this the MVP?”</li>
</ul>

<h2 id="minimum-viable-prompt-literacy">Minimum Viable Prompt Literacy</h2>

<p>I have no idea what a “perfect prompt” looks like. But I know one rule that consistently works:</p>

<p><strong>No matter what you ask the AI to make, the last sentence should be: “Ask me questions.”</strong></p>

<p>Get the AI to ask <em>you</em> questions. Ask it “what am I missing?” type questions. This back-and-forth is where the real value emerges, not in the first response, but in the dialogue.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="personal" /><category term="claudecode" /><category term="vibecoding" /><summary type="html"><![CDATA[A brief summary of what I have learnt]]></summary></entry><entry><title type="html">I Actually Stayed On Task (For Once): A Dev Miracle</title><link href="https://www.petervanonselen.com/2025/10/24/i-stayed-on-task/" rel="alternate" type="text/html" title="I Actually Stayed On Task (For Once): A Dev Miracle" /><published>2025-10-24T08:00:00+00:00</published><updated>2025-10-24T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/10/24/i-stayed-on-task</id><content type="html" xml:base="https://www.petervanonselen.com/2025/10/24/i-stayed-on-task/"><![CDATA[<p><em>Breaking news: Developer completes planned features … who would have thought?</em></p>

<hr />

<p>Remember all those high-minded ideas I had about <a href="https://www.petervanonselen.com/2025/10/20/what-i-have-learnt/">staying on task with AI-assisted development</a>? All that big game I’ve been talking: “this is how you do dev with an AI and keep it on task” and “this is the way you’ve got to do it”?</p>

<p>Yeah, about that.</p>

<p>If you’ve been reading the wonderful adventures of the tangent king over here, I know what you’re thinking. He can’t do that.</p>

<p><strong>For context:</strong> I’m building Horizon’s Edge: a turn-based tactical wargame inspired by NetStorm where floating islands battle for control of the skies. I’ve been developing it with AI assistance in Godot, and staying focused has been… a journey. Previous highlights include: diving into Wave Function Collapse procedural generation instead of finishing game rules, and a grand refactor that ate five entire evenings because I accidentally committed the <code class="language-plaintext highlighter-rouge">.godot</code> cache folder.</p>

<p>Well, I will have you know… that this week I set out to get a bunch of combat systems up and running. Working tip top. No funny business with wave function algorithms or hexes or anything! And I managed, for the first time in almost 4 months, to stay on task and not meander (too much) and actually get some things <em>completed</em>.</p>

<p><img src="/assets/stay-on-target/mage.png" alt="mage!" /></p>

<p>I now have 3 creatures, each with at least 1 or more special abilities and those abilities actually work. I know! I am surprised too. I’m meandering my way through my todo list and this week I have a test mage with 5 abilities actually working:</p>

<ul>
  <li><strong>Blink</strong> - Teleportation ability</li>
  <li><strong>Ethereal Phase</strong> - Phasing ability</li>
  <li><strong>Arcane Missile</strong> - Ranged magic projectile</li>
  <li><strong>Divination</strong> - Detection/reveal ability</li>
  <li><strong>Mana Drain</strong> - Resource manipulation ability</li>
</ul>

<p>I might have wandered off and made one of the spells do terrain destruction too. Arcane Missile is the first ability in the game that actually destroys terrain rather than just building it. I just love the idea of the battlefield being dynamic and changing under the players’ feet. Right now it wipes out the hex the target was standing on, plus a 50% chance on each of the 6 surrounding hexes. The idea is that if a creature is no longer standing on anything, even if it has a lot of health, it will plummet to its death. It should create tension around expansion and positioning. Do you risk getting close for that attack if one wrong spell could drop you into the void?</p>

<p><img src="/assets/stay-on-target/destruction.png" alt="Island destruction" /></p>
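<p>In case the 50/50 rule reads abstractly, here’s a minimal sketch of that destruction rule. It’s in Python rather than the game’s actual GDScript, and the axial coordinates, names, and signature are illustrative assumptions, not the real implementation:</p>

```python
import random

# Axial-coordinate offsets for the six neighbours of a hex.
HEX_NEIGHBOURS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]

def arcane_missile_destruction(target_hex, island_hexes, rng=random.random):
    """Return the set of hexes destroyed by the missile.

    The target's own hex is always destroyed; each of the six
    neighbouring hexes (if it exists) is destroyed with 50% probability.
    """
    q, r = target_hex
    destroyed = {target_hex} if target_hex in island_hexes else set()
    for dq, dr in HEX_NEIGHBOURS:
        neighbour = (q + dq, r + dr)
        if neighbour in island_hexes and rng() < 0.5:
            destroyed.add(neighbour)
    return destroyed
```

Anything left standing on a destroyed hex would then be checked for support and dropped into the void.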

<p>I also figured out how to get rounds and turns working together (this was needed for Ethereal Phase). And I might have spent some silly time hacking in a few animations to make the effects a bit more obvious when they happen.</p>
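<p>The rounds-and-turns split is the classic pattern where every creature takes one turn inside a round, and round-scoped effects like Ethereal Phase only tick down once the whole round ends. A minimal Python sketch under that assumption (the real game is GDScript, and every name here is hypothetical):</p>

```python
class Effect:
    """A status effect whose duration is measured in full rounds."""
    def __init__(self, name, rounds_left):
        self.name = name
        self.rounds_left = rounds_left

class Creature:
    def __init__(self, name):
        self.name = name
        self.effects = []

def play_round(creatures, take_turn):
    # Every creature takes exactly one turn per round...
    for creature in creatures:
        take_turn(creature)
    # ...and only when the whole round ends do round-scoped
    # effects tick down and expire.
    for creature in creatures:
        for effect in creature.effects:
            effect.rounds_left -= 1
        creature.effects = [e for e in creature.effects if e.rounds_left > 0]
```

The point of the split is that an effect lasting “2 rounds” survives every creature’s turn in between, rather than expiring on the caster’s next turn.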

<p>Admittedly none of this is “let other people play with it” yet. And it doesn’t yet have a win condition. But it is meandering in a direction.</p>

<p>See? Like I told you. Todo list! Let’s see if I can do it again?</p>

<p><strong>Next up on the todo list:</strong> Three more creature archetypes, each with their own special abilities.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">test_voltage_bot</code> with Overcharge and Turbo Boost abilities.</li>
  <li><code class="language-plaintext highlighter-rouge">test_biomass_spawn</code> with Regenerate and Spawn Swarmlings.</li>
  <li><code class="language-plaintext highlighter-rouge">test_flux_walker</code> with Chaos Bolt and Lucky Strike.</li>
</ul>

<p>…assuming I don’t get distracted tweaking that wave function collapse algorithm again.</p>

<p><img src="/assets/stay-on-target/Distracted-Boyfriend.jpg" alt="tangent king" /></p>

<p><strong>Incidentally, the stats for the week:</strong></p>
<ul>
  <li>32 files changed</li>
  <li>3,029 additions</li>
  <li>508 deletions</li>
  <li>Net addition of ~2,500 lines of code</li>
</ul>

<p>For an hour and a bit each night, the progress here is just downright fierce!</p>

<hr />

<h2 id="the-question-that-started-everything">The Question That Started Everything</h2>

<p>How does one actually vibe code? And the follow-on questions that kept me up at night: Is it any good? Can I actually generate real code with this? Is this going to take my job? Am I running out of career runway?</p>

<p>Around the start of June this year, I was an AI optimist while not really engaging with it. I used ChatGPT, DeepSeek, and Anthropic’s Claude, ran some thinking through them, maybe did basic searches. Honestly, I wasn’t using it in any meaningful way. Functionally, I was just playing with it. Nothing more.</p>

<p>Then I got stuck on a problem. By simply following my curiosity, I went from not having a clear idea of what I wanted to accomplish with AI to actively using it as a constant collaborator across multiple domains of my life. My approach to AI has fundamentally changed over the past four months, and it continues to evolve.</p>

<h2 id="the-catalyst-magic-the-gathering-naturally">The Catalyst: Magic the Gathering (Naturally)</h2>

<p><strong>TLDR</strong>: I tried to make a <a href="https://www.petervanonselen.com/2025/08/04/jumpstart-cube-catastrophication/">Jumpstart cube</a>. ChatGPT couldn’t solve it. Copilot couldn’t solve it. Copilot vibe coded a solution that kinda worked. I vibe coded a new solution that actually worked.</p>

<p>I couldn’t get the damn thing out of my mind, so I vibe coded this <a href="https://github.com/vanonselenp/vanonselenp.github.io">portfolio site</a> to document what happened. Then, while working on the cube, a <a href="https://www.petervanonselen.com/2025/08/21/sky-islands/">board game idea struck</a>, and I couldn’t get it out any other way except by building it with AI. That board game has somehow <a href="https://www.petervanonselen.com/2025/09/07/boardgame-to-digital/">morphed into a video game</a> that’s far too complicated for the “get an MVP into prod fast” approach I keep trying to follow.</p>

<p><strong>The time investment</strong>: 1-2 hours a day, either in the morning before work or while watching TV with my wife in the evening. This became an all-consuming obsession for four months. I sacrificed learning urban sketching, which I’d spent the first half of the year actively pursuing.</p>

<p><strong>The cost</strong>: I’m paying £16/month for Claude Pro and £20/month for ChatGPT Pro. I use Claude as my primary coding assistant and switch between ChatGPT and Claude for thinking through problems. It’s worth it: without AI, none of these projects would exist.</p>

<p>Pre-AI, my side projects were timeboxed to a couple of days and small, achievable problems. Anything more would rapidly collapse under its own weight: too much code, too little time. Basically, I didn’t do tech side projects. AI changed that equation entirely.</p>

<p><img src="/assets/github.png" alt="pre ai assisted tooling" /></p>

<h2 id="what-ive-learned-the-core-insights">What I’ve Learned: The Core Insights</h2>

<h3 id="ai-is-a-tool-and-a-multiplier">AI is a Tool and a Multiplier</h3>

<p>You have to treat AI not as a magic box that will automatically solve whatever you hope it does, but rather as another person you’re working with over Slack. If you tell a coworker “make me a feature!” you can’t be upset when they return junk.</p>

<p>The best way to use this tool is to assume it doesn’t actually know what you want. I’ve found the most effective approach is to start conversations with lots of negative validation questions: What am I missing? What could be improved? Be critical. Be objective. Get the AI to shoot holes through your ideas.</p>

<p>Once you’ve had this conversation, write that plan to a file. Congrats, you now have a high-level plan. This becomes useful context for future chats. However, this alone won’t give you consistent, steady progress, because AIs like to write code, and they write an awful lot of it.</p>

<p>So get it to make a todo list with a painful amount of tick boxes.</p>

<p><img src="/assets/checklist.png" alt="checklists!" /></p>

<p>When building, use that todo document to hold the AI accountable. It makes testing and building more predictable and manageable.</p>

<h3 id="the-junior-engineer-mental-model-goes-deeper-than-you-think">The “Junior Engineer” Mental Model Goes Deeper Than You Think</h3>

<p>Treating AI like a junior engineer isn’t just about tone, it’s about workflow. Through my projects, I discovered I needed to:</p>

<ul>
  <li><strong>Use different AIs for different roles</strong>: ChatGPT for exploration and brainstorming, Claude for implementation and code review</li>
  <li><strong>Create specs as “shared memory”</strong>: Documentation that gets committed to the repo so the AI can reference it across sessions</li>
  <li><strong>Break work into granular todos</strong>: Not just for you, for holding the AI accountable to what actually matters</li>
  <li><strong>Pair with it through code review</strong>: Not just generation</li>
</ul>

<p>This evolved from my Magic cube project where I had specs numbered 1 through 15, each documenting a feature discussion. These weren’t outputs, they were context that survived beyond individual chat sessions.</p>

<h2 id="the-dangerous-patterns-what-they-dont-tell-you">The Dangerous Patterns: What They Don’t Tell You</h2>

<h3 id="the-refactor-paradox">The Refactor Paradox</h3>

<p>Here’s something crucial I learned the hard way: <strong>AI accelerates the “green” phase so much that you skip “refactor,” leading to massive technical debt.</strong></p>

<p>During my Magic cube project, I went from manually patching decks to having a 10,000-line IPython notebook that was completely impossible to understand. I had hit cognitive overload.</p>

<p>As an ardent TDD advocate in my day job, I realized I was missing two critical pieces of the red-green-refactor cycle: I was just writing code. No tests. No cleanups. Rookie mistake.</p>

<p>I had to start from scratch, consciously embracing a build-and-refactor loop, following the code smell patterns that years of clean code practices had drilled into me. AI doesn’t just multiply your output; <strong>it multiplies your technical debt if you’re not careful.</strong></p>

<p>The game dev project repeated this pattern. I’d run <code class="language-plaintext highlighter-rouge">git ls-files | grep '\.gd$' | xargs wc -l</code> and see files well over 2k lines. I’d missed refactor cycles again.</p>

<h3 id="the-tangent-amplification-problem">The Tangent Amplification Problem</h3>

<p>AI doesn’t just enable scope creep; <strong>it actively encourages it by making every side quest feel achievable.</strong></p>

<p>My board game project is the perfect example. I set out one week to work on creature combat. By the end of the week, I had:</p>
<ul>
  <li>Created test creatures</li>
  <li>Built movement and attack systems</li>
  <li>Added height-based defense</li>
  <li>Implemented dice roll combat</li>
  <li>Created a radial menu for unit actions</li>
  <li>Downloaded 3D models from Kenney.nl</li>
  <li>Rebuilt roads with proper models and rotations</li>
  <li>Added a blink ability</li>
</ul>

<p>One feature became an ecosystem. And here’s the thing: <strong>it is beyond exceedingly simple to wander off on completely unrelated tangents</strong> when the AI makes everything feel possible.</p>

<p>I kept telling myself “get an MVP to prod fast” while simultaneously building procedural island generation with Wave Function Collapse algorithms.</p>

<p>But here’s why I didn’t stop: <strong>AI provides so much momentum that even when it’s frustrating, you can think about the problem slightly differently and feel like you’re making progress.</strong> When direct AI generation hits a wall, I switch to having it build the broad structure while I manually tweak settings. This makes working on side projects genuinely fun in a way they haven’t been before.</p>

<p>The momentum AI provides is a double-edged sword. It keeps you engaged through the frustration, but it also keeps you building when you should be stepping back and asking “is this the right thing?”</p>

<h3 id="the-direction-problem-ais-spatial-blindness">The Direction Problem: AI’s Spatial Blindness</h3>

<p>Some tasks reveal AI’s sharp limitations. During my road system implementation, I discovered that <strong>AI spatial reasoning is terrible.</strong></p>

<p>When work involves orientation, rotation, or physical space, the AI’s sense of direction doesn’t match how the world is rendered. Trying to explain rotations in a way that makes sense to both of us is like teaching a goldfish to drive.</p>

<p>And if the AI ever accidentally gets something right, it will immediately overwrite it in the next change.</p>

<p>My eventual workflow:</p>
<ol>
  <li>Get the AI to build the big stuff, toggles, switches, base structure</li>
  <li>Manually go through and tweak everything myself</li>
</ol>

<p>This led me to develop what I call “Git <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/SaveScumming">save-scumming</a>”: treating Git like a video game save system, because AI will thoughtlessly overwrite correct solutions without remembering what worked.</p>

<h2 id="the-momentum-trap">The Momentum Trap</h2>

<p>AI gives me the same benefit I get from using Audible for reading books: <strong>momentum</strong>. It’s a lot easier to keep working on a side project with AI than without, especially when you have no time at all to do the work.</p>

<p>But here’s the tension my blog posts reveal: momentum without direction leads nowhere useful.</p>

<p>I built 24,000 lines of Python code for a board game that probably should have been paper prototyped first. I kept reminding myself to “get an MVP to prod fast and learn lessons,” but the AI made it so easy to keep building that I kept following the fun instead of following the plan.</p>

<p>The momentum is addictive even when it’s pulling you away from your goal. You have to be disciplined about direction, or you’ll end up with beautiful code for the wrong thing.</p>

<h2 id="the-transformation-four-months-everything-changed">The Transformation: Four Months, Everything Changed</h2>

<p>The most remarkable thing about this journey isn’t what I built, it’s the speed of transformation.</p>

<p><strong>June 2025</strong>: AI optimist, barely engaging.<br />
<strong>October 2025</strong>: 24k lines of game code, active collaborator across multiple projects, writing blog posts documenting the journey in real-time.</p>

<p>Getting to the “treat AI like a junior engineer” mindset took about 2-3 weeks. I had to unlearn the “AI is magic” assumption and figure out how to actually use it.</p>

<p>This wasn’t gradual learning, it was catalytic. Each success made the next leap feel possible:</p>
<ul>
  <li>Magic cube problem → vibe coding solution</li>
  <li>Couldn’t stop thinking about it → portfolio site</li>
  <li>One blog post → entire blog series</li>
  <li>Board game idea → 24k lines of video game code</li>
  <li>Video game reimplementation → another 24k lines of video game code</li>
</ul>

<p>The cascading confidence is real. Once you see AI help you solve one “impossible” problem, you start seeing possibilities everywhere.</p>

<h2 id="bringing-it-back-to-the-day-job">Bringing It Back to the Day Job</h2>

<p>The spec-driven workflow I developed through these side projects has now become how I work professionally. I take tickets and reframe them into specs with task breakdowns. I use AI to analyze complex codebases I’m barely familiar with.</p>

<p>Right now I’m refactoring a monolith written in Go into a commons library with five microservices, using the AI spec-driven workflow with AI-assisted code development, working in small increments. Everything I’m doing, I learned from these side projects.</p>

<p>The irony: The Economist (where I work) has embraced AI tooling internally, while engineers in general remain reluctant. I get it; I was there four months ago.</p>

<p>But here’s what changed for me: <strong>I still do traditional hand-crafted coding in my day job. I regularly work through code katas, which are fun and enjoyable in and of themselves.</strong> AI hasn’t replaced my coding skills, it’s multiplied what I can accomplish when I need to move fast or explore unfamiliar territory.</p>

<h2 id="practical-workflows-that-emerged">Practical Workflows That Emerged</h2>

<p>Through trial and error, I developed specific patterns to manage AI’s weaknesses:</p>

<p><strong>1. The Planning Folder Pattern</strong>
Keep numbered specs (1-initial-feature.md, 2-pay-by-discard.md, etc.) that document feature discussions. These become persistent context across sessions.</p>

<p><strong>2. The Todo Accountability System</strong>
Break specs into granular checkbox lists. Use them to hold the AI accountable during implementation.</p>

<p><strong>3. The Git Save-Scumming Strategy</strong>
Commit frequently. AI will overwrite working solutions without memory of what worked before.</p>

<p><strong>4. The Role-Based AI Selection</strong></p>
<ul>
  <li>ChatGPT: Brainstorming, exploration, asking “what’s wrong with this design?”</li>
  <li>Claude: Implementation, code review, pair programming</li>
  <li>Copilot/Codex: Ticket-style work where you hand off and come back later</li>
</ul>

<p><strong>5. The Discipline Override</strong>
Set hard rules to counter AI’s momentum:</p>
<ul>
  <li>Force refactor cycles</li>
  <li>Write tests even when AI makes it feel unnecessary</li>
  <li>Question every tangent: “Is this the MVP?”</li>
</ul>

<h2 id="what-about-the-code-quality">What About the Code Quality?</h2>

<p>Let’s be honest: the code AI generates can be good, but it tends to be overly verbose and leans toward duplication. Still, it can be nudged in the right direction quite easily.</p>

<p>The game currently works. It’s not feature complete, not even a pared-down, super-trimmed version. But it’s playable, testable, and iterating forward.</p>

<p>That’s the trade-off: you get speed and momentum in exchange for code that needs shepherding. You’re not writing every line, but you’re still responsible for the architecture, the patterns, and the quality.</p>

<h2 id="minimum-viable-prompt-literacy">Minimum Viable Prompt Literacy</h2>

<p>I have no idea what a “perfect prompt” looks like. But I know one rule that consistently works:</p>

<p><strong>No matter what you ask the AI to make, the last sentence should be: “Ask me questions.”</strong></p>

<p>Get the AI to ask <em>you</em> questions. Ask it “what am I missing?” type questions. This back-and-forth is where the real value emerges, not in the first response, but in the dialogue.</p>

<p>It took me 2-3 weeks to figure this out, but once I did, everything clicked.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>AI hasn’t replaced my thinking, it’s changed how I work. The best analogy I’ve found: it’s like pairing with a junior engineer who:</p>
<ul>
  <li>Never gets tired</li>
  <li>Has read everything</li>
  <li>Has no memory between sessions</li>
  <li>Will confidently suggest terrible ideas alongside brilliant ones</li>
  <li>Makes everything feel achievable (which is both blessing and curse)</li>
</ul>

<p>You have to bring the discipline, direction, and judgment. The AI brings speed, exploration, and momentum.</p>

<p>After four months of solo exploration (watching YouTube videos, AI Engineer conference talks, and lots of trial and error), I’m not worried about my career ending. I’m worried about not learning these tools fast enough.</p>

<h2 id="why-this-matters-and-why-im-writing-this">Why This Matters (and Why I’m Writing This)</h2>

<p>I’m writing this for two audiences:</p>

<p><strong>Future me</strong>: So I can succinctly explain “this is what I learned” when the details fade.</p>

<p><strong>You</strong>: To give you an idea of how to approach AI development that’s more than the nebulous “what the hell do I do here” feeling I had in June.</p>

<p>This is an invitation. Not a tutorial, not a manifesto: an invitation to experiment, to treat side projects with these tools as “learn how to AI” projects, and to discover your own patterns through building.</p>

<p>Because here’s what I know now: the developers who learn to work effectively with AI aren’t going to replace the ones who don’t. They’re going to outpace them by an order of magnitude.</p>

<p>The question isn’t “will AI take my job?”</p>

<p>The question is: “Am I learning to multiply my effectiveness, or am I just playing with shiny tools?”</p>

<p>For me, the answer finally became clear somewhere between a Magic the Gathering cube and a procedurally generated sky island wargame.</p>

<p>I’m building the plane while flying it. And documenting the journey as I go.</p>

<p>Because maybe, just maybe, someone else is standing where I was in June, wondering “how does one actually vibe code?”</p>

<p>And maybe this helps them take the first step.</p>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="personal" /><category term="board-game" /><category term="godot" /><category term="video-game" /><category term="claudecode" /><category term="codex" /><category term="vibecoding" /><summary type="html"><![CDATA[The Question That Started Everything… am I going to lose my job?]]></summary></entry><entry><title type="html">The Road to Combat Is Paved with Tangents: A Devlog</title><link href="https://www.petervanonselen.com/2025/10/17/road-to-combat-paved-with-blink/" rel="alternate" type="text/html" title="The Road to Combat Is Paved with Tangents: A Devlog" /><published>2025-10-17T08:00:00+00:00</published><updated>2025-10-17T08:00:00+00:00</updated><id>https://www.petervanonselen.com/2025/10/17/road-to-combat-paved-with-blink</id><content type="html" xml:base="https://www.petervanonselen.com/2025/10/17/road-to-combat-paved-with-blink/"><![CDATA[<p><em>I set out to make a combat system. I returned with roads, models, and a blink ability…</em></p>

<hr />

<h1 id="tangents-roads-and-blinking-archers">Tangents, Roads, and Blinking Archers</h1>

<p>okay okay okay, so I <em>know</em> last time I was all:</p>

<blockquote>
  <p>“I am going to work on the combat system.”</p>
</blockquote>

<p>and I meant it. I really did.</p>

<p>I started the week’s development figuring out how to get a second creature working — this time one that could <em>shoot</em>! and it shot! with animation! and a lovely red number floating overhead when it hit.</p>

<p>Then I spent some time trying to convince the AI to make an archer… which came out as a cylinder and a torus mashed together like some long-lost relic from <em>Lord of the Rings</em>.</p>

<p><img src="/assets/road-to-combat/archer.png" alt="archer" /></p>

<p>Somewhere between the floating numbers and cursed geometry, I ran into a movement bug where the creature could only move <em>upwards</em> but never <em>downwards</em> — so, you know, just another normal day working on a game with a super-powered AI as a pair.</p>

<p>And that’s the problem with having a super-powered AI as a pair: it is <em>beyond</em> exceedingly simple to wander off on completely unrelated tangents. Which is, of course, what I did.</p>

<hr />

<h2 id="the-tangent-shiny-hexes-">The Tangent: Shiny Hexes ✨</h2>

<p>When I was working through a game dev course on <a href="https://www.udemy.com/course/jumpstart-to-2d-game-development-godot-4-for-beginners/">Jumpstart to 2D Game Development: Godot 4.4+ for Beginners</a> last year, I came across a really cool site: <a href="https://kenney.nl/">kenney.nl</a>.</p>

<p>They’ve got models, textures, and all sorts of game assets. And wouldn’t you know it, I found a bunch of hex models that perfectly matched the vibe in my head.</p>

<p>And just like that, all thoughts of combat systems and specs and rational planning <em>evaporated</em>.</p>

<p>No, now was <em>obviously</em> the perfect time to get models and textures in. So I immediately downloaded the GLB files and started plugging them into my wave function collapse algorithms.</p>

<p><img src="/assets/road-to-combat/current.png" alt="current" /></p>

<p>Instant vibe shift. The world suddenly <em>looked</em> like something. I’m not using all the models yet… but give me time 😄.</p>

<hr />

<h2 id="the-roads-to-madness-️">The Roads to Madness 🛣️</h2>

<p>With proper models in hand, I finally turned to something that’s been quietly tormenting me for weeks: roads.</p>

<p>Now that I had actual path models to work with, I thought this was going to be a cinch.</p>

<p>Nope. No. Oh hell no.</p>

<p>This is the kind of task that is <em>inherently infuriating</em> to get an AI to handle. Its sense of direction doesn’t match the way the world is rendered, and trying to explain rotations in a way that makes sense to both of us is like trying to teach a goldfish to drive.</p>

<p><img src="/assets/road-to-combat/roads.png" alt="roads roads roads" /></p>

<p>And if the AI ever <em>accidentally</em> gets something right, it will immediately overwrite it in the next change. Because of course it will.</p>
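<p>For what it’s worth, the arithmetic itself isn’t the hard part. On a hex grid, rotating a piece by one 60° step just shifts its open-edge indices mod 6, and connectivity is a set lookup. Here’s a minimal Python sketch of that bookkeeping — every name here is hypothetical (my actual project is GDScript), this is just the shape of the idea:</p>

```python
# Hypothetical sketch of rotation bookkeeping for hex road pieces.
# A piece is described by which of its six edges (indexed 0-5) have a
# road opening; rotating the piece shifts those indices mod 6.

def rotate_edges(open_edges, steps):
    """Return the open edge indices after `steps` 60-degree rotations."""
    return {(e + steps) % 6 for e in open_edges}

def connects(open_edges, edge_toward_neighbour):
    """True if the piece has an opening on the edge facing a neighbour."""
    return edge_toward_neighbour in open_edges

# A straight road opens on two opposite edges, e.g. 0 and 3.
straight = {0, 3}

# One clockwise step later it opens on edges 1 and 4 instead.
print(rotate_edges(straight, 1))  # {1, 4}
```

<p>The part that actually hurt was getting the AI and me to agree on which edge is index 0 and which direction counts as positive rotation — the mod-6 shift never changed, but the conventions around it got rewritten constantly.</p>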

<p>My eventual workflow:</p>

<ul>
  <li>Get the AI to build the big stuff — toggles, switches, base structure.</li>
  <li>Then manually go through and tweak everything myself.</li>
</ul>

<p>Thank god for Git and my obsession with <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/SaveScumming">save scumming</a> my way back to a sensible state.</p>

<hr />

<h2 id="meanwhile-combat-system-">Meanwhile… Combat System? 🫣</h2>

<p>Hang on though.
Wasn’t I working on a <em>combat system</em>?
Damn it.</p>

<p>Okay, where was I?</p>

<p>Ah yes — my ever-growing planning folder to the rescue.</p>

<p>I’ve developed a habit of getting the AI to output the result of any long feature discussion into a spec or planning file that gets committed into the repo.</p>

<p>Which is how I ended up with… about a dozen specs.</p>

<p>Some were implemented, some were “I realized this was a tangent and saved myself,” and some were very much still alive. After some reorganizing, I was down to a high-level planning doc and a solid, active todo list.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>➜  horizons-edge git:(main) ✗ ls planning/6-
6-implementation-todo.md
6-test-deck-archetypes-energy-abilities.md
</code></pre></div></div>

<p>Then came the fun part: getting the AI to validate how accurate my current TODO list was against the plan — and then, obviously, not trusting it at all and just starting to build anyway 😄.</p>

<hr />

<h2 id="back-on-track-blink-">Back on Track: Blink ✨</h2>

<p>At last, <em>finally</em>, I was back on combat system track.
And the first order of business: give my test creature a <strong>blink ability</strong>.</p>

<p><img src="/assets/road-to-combat/radial.png" alt="blink ability!" /></p>

<p>Let’s see how far I get next week.</p>

<p>Until then, I hope you’re enjoying whatever tangent is currently distracting you from your side project. I salute you, fellow tangent adventurers. 🫡</p>

<hr />

<h2 id="-progress-summary-since-last-time">📝 Progress Summary Since Last Time</h2>

<ul>
  <li>🚧 <strong>Road System:</strong> Added straight roads, corners, intersections, and 3–5-way connections. Complex rotation logic, road-to-hex connectivity.</li>
  <li>🧱 <strong>3D Asset Library:</strong> Integrated Kenney Hexagon Kit (180+ models, textures, documentation).</li>
  <li>🏹 <strong>Combat &amp; Abilities:</strong> Projectile system, Archer &amp; Scout cards, ability system improvements, radial menu, energy planning.</li>
  <li>🖱 <strong>Input &amp; UI:</strong> New input controller with hex highlighting, better radial menu, multiplayer safeguards, grid visualization.</li>
  <li>🧭 <strong>3D Model Integration:</strong> <code class="language-plaintext highlighter-rouge">hex_tile_model_config.gd</code>, updated rendering, road path visualization.</li>
  <li>🗂 <strong>Documentation:</strong> New spec for energy payment (spec 7), reorganized planning docs.</li>
</ul>]]></content><author><name>Peter van Onselen</name><email>augury_upsurge.17@icloud.com</email><uri>https://www.petervanonselen.com</uri></author><category term="personal" /><category term="board-game" /><category term="godot" /><category term="video-game" /><category term="claudecode" /><category term="codex" /><category term="vibecoding" /><summary type="html"><![CDATA[I set out to make a combat system. I returned with roads, models, and a blink ability…]]></summary></entry></feed>