Agentic Engineering Weekly for May 29 – June 6, 2026


Russ Miles argued that our governing metaphors recreate Waterfall at the decision level, DORA published evidence that scaling generation without scaling verification creates a J-curve most teams never escape, and rsync proved the point by shipping 36 AI-assisted commits that broke backups worldwide.


My top 3 picks this week


More on cognitive debt

I really enjoyed Russ Miles' JAX 2026 keynote. The two metaphors that govern how we talk about software, Building and Factory, both position the human downstream of AI output. The building metaphor flatters the work (architect status, legitimacy borrowed from civil engineering). The factory metaphor flatters the management of it (throughput, velocity, dashboards). Both agree on one thing: AI output is something you accept or reject, not something you participate in. That, Miles argues, is Waterfall smuggled back in at the level of decisions.

Writing software means making hundreds of little decisions under uncertainty. Writing software is learning. If you outsource all the building, you outsource the decision-making and the learning too. Jack Reeves already told us in 1992 that compilers do the actual building while engineers make trade-offs.



The verification tax

DORA's "Finding balance in the era of tokenmaxxing" makes the following claim: AI might help you write 10x more code, but if you haven't scaled your ability to verify it, you've just made the problem worse. Nathen Harvey calls this the verification tax, and it's what drives the productivity J-curve most teams hit when adopting AI. Output goes up immediately; the ability to know whether that output is correct doesn't scale at the same rate, and the gap between the two is where the damage accumulates.

The rsync incident is the verification tax in production. Thirty-six commits by "tridge and claude" shipped, and they broke everyone's backups. Not in a toy project, not in a startup moving fast: in one of the most relied-upon backup tools on earth. A detailed analysis asks whether Claude actually increased bugs, and the answer is less interesting than the structural fact that nobody caught them before they shipped. Meanwhile, Martin Fowler flagged Greg Wilson's observation that all common AI productivity metrics are flawed: lines of code, tickets closed, developer surveys, each broken in a different way. Fowler's conclusion: since we cannot measure productivity, any metrics are suspect.

Jellyfish's data across 250,000 developers puts numbers on the other side: engineering teams are shipping twice as many PRs with AI, but merge rates on AI-generated PRs dropped from 80% to 60%. More output, less of it landing. The ACM Queue piece on "Eight Myths on Software Engineering and GenAI" challenges what the latest research actually says versus what leaders believe they know. And LeadDev's "PR reviews were already broken. AI made it worse" makes the same point: AI didn't break code review; it made a broken process impossible to ignore, and fixing it requires layers, not a single tool.
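The Jellyfish figures above can be read as rough arithmetic. This is a back-of-the-envelope sketch only: it assumes a hypothetical team that opened 100 PRs before AI and 200 after, and it assumes the 80% and 60% merge rates apply to that whole volume. The source reports those rates for AI-generated PRs, so the baseline numbers and the function name `pr_outcomes` are illustrative, not Jellyfish's method.

```python
def pr_outcomes(opened: int, merge_rate_pct: int) -> tuple[int, int]:
    """Split a PR count into (merged, not merged) for a merge rate in percent."""
    merged = opened * merge_rate_pct // 100
    return merged, opened - merged


# Hypothetical baseline: 100 PRs opened at an 80% merge rate.
before_merged, before_unmerged = pr_outcomes(100, 80)
# "Twice as many PRs", now at a 60% merge rate.
after_merged, after_unmerged = pr_outcomes(200, 60)

merged_growth = after_merged / before_merged        # 120 / 80
unmerged_growth = after_unmerged / before_unmerged  # 80 / 20

print(f"merged PRs: {before_merged} -> {after_merged} ({merged_growth:.1f}x)")
print(f"unmerged PRs: {before_unmerged} -> {after_unmerged} ({unmerged_growth:.1f}x)")
```

Under those assumptions, merged output grows 1.5x while the pile of PRs that were written, reviewed, and then not merged grows 4x. That is one way to read "more output, less of it landing".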



Boris Cherny: the title "software engineer" disappears this year

Claude Code's creator told three separate audiences the same thing this week: he hasn't written a single line of code in six months, hasn't opened an IDE in a month, and believes the title "software engineer" could start to disappear as soon as this year. Some practitioners argue that renaming developers "builders" does a disservice to the craft and flattens the distinction between understanding a system and producing artifacts that happen to compile.

James Conroy-Finn's Doorman Fallacy argues that the companies selling AI as a replacement for developers are making the same mistake as supermarkets replacing till operators with self-checkout: they claim the cost saving and walk away before the value destruction becomes visible. Rory Sutherland, in a long conversation on Simplifying Complexity, provides the intellectual framework: short-term rationality leads to long-term irrationality, and efficiency is a bad proxy for effectiveness. Companies simplify a problem, solve for the simplified version, and end up with a sub-optimal outcome because they optimized for the wrong metric.

Aaron Brethorst rounds it out with the structural observation: when implementation cost approaches zero, domain expertise is the only remaining moat. Agents can write code better than most humans can. The differentiating factor is whether the people steering them understand the domain well enough to know what code to write, and whether the organizations employing them will figure that out before or after the value destruction that the Doorman Fallacy predicts.



The go-slow-to-go-fast countermovement

Nolan Lawson's "Using AI to write better code more slowly" makes the case that the point of AI coding is quality, not speed: use the model as a thinking partner, not a typing accelerator. Mozaic Works reacted with "Go Slow to Go Fast," grounding the argument in why speed metrics and line counts aren't quality metrics, and why AI makes that confusion worse rather than better. If you're measuring how fast you're going but not where you're going, going faster is just arriving at the wrong place sooner.



Prompts accumulate technical debt too

Sean Goedecke named what a lot of teams are discovering: agent instructions, markdown files, and project prompts decay silently, just like stale tests do. They worked when you wrote them. They haven't been updated since. The model changed, your codebase changed, and the instructions now describe a system that no longer exists. Theo amplifies the point: more prompts equals worse code when the prompts grow stale over time. It's the same rot that kills code, just in a format nobody has a linter for yet.
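There may be no standard linter for this, but one narrow kind of staleness is cheap to detect: instruction files that mention paths which no longer exist. The sketch below is a heuristic under stated assumptions, not an established tool. The function names, the convention that paths are wrapped in backticks, and the example file names are all hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: instruction files wrap file paths in backticks, e.g. `src/app.py`.
BACKTICKED = re.compile(r"`([^`\s]+)`")
PATH_CHARS = re.compile(r"[\w./-]+")


def looks_like_path(token: str) -> bool:
    """Heuristic: path characters only, plus a slash or a file extension."""
    if not PATH_CHARS.fullmatch(token):
        return False
    return "/" in token or Path(token).suffix != ""


def stale_references(instructions: str, repo_root: Path) -> list[str]:
    """Return backticked path-like tokens that do not exist under repo_root."""
    stale = []
    for token in BACKTICKED.findall(instructions):
        if looks_like_path(token) and not (repo_root / token).exists():
            stale.append(token)
    return stale


def demo() -> list[str]:
    """Build a throwaway repo with one real file and check a sample prompt."""
    prompt = (
        "Run `pytest` before committing. "
        "Entry point is `src/app.py`; "
        "request handling lives in `src/legacy/handler.py`."
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "app.py").write_text("print('hello')\n")
        return stale_references(prompt, root)


print(demo())
```

A check like this only catches dangling file references. Instructions that describe behavior the codebase no longer has still need a human review.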

Emmz Rendle on .NET Rocks! adds the strategic dimension. They warn about the upcoming rug pull in AI dev tools: prices are rising, services are being restricted, and having enough control to choose when to run locally is becoming key to staying productive at a reasonable price. If your agent workflow is locked to a vendor that's about to raise prices, owning your harness isn't just a technical preference. It's a business decision.



AI makes weak engineers less harmful; layoffs are targeting the middle

Sean Goedecke flipped the usual "AI helps the best engineers most" narrative with a simple, testable claim: AI does a surprisingly good job of steering weaker engineers in the right direction. Rather than amplifying the gap between strong and weak, AI working in a cultivated habitat acts as guardrails, nudging less experienced developers toward patterns and practices they wouldn't have reached on their own. Theo picked it up and largely agreed, adding that this tracks with what he sees in teams using Cursor and Claude Code daily.

Thirty percent of all 2026 layoffs are reportedly targeting middle management, because AI is automating the core function managers were hired for: routing information. Gene Kim's IT Revolution closing remarks grounded the shift: the cost of code production is approaching zero and code can be conjured through conversation alone, which means the bottleneck has moved upstream (figuring out what to build) and downstream (getting it adopted).

The Kelsey Hightower interview on The Pragmatic Engineer is a career story worth listening to in full: from self-taught DSL modem installer to Google Distinguished Engineer, demonstrating what a lifetime of authenticity, domain expertise, and continuous learning looks like in practice.



Sandboxing and containment became an engineering discipline

Anthropic published a thorough document on how they contain Claude across products: process sandboxes, VMs, filesystem boundaries, and egress controls. Simon Willison's praise was for the documentation itself: in the absence of detailed documentation, it's impossible to know how much to trust a sandbox, and most vendors don't bother. Anthropic did. Meanwhile, Willison's own micropython-wasm project takes a different approach: running Python code inside MicroPython compiled to WebAssembly, a sandbox so constrained that GPT-5.5 has so far failed to break out.

Remy Guercio from Tailscale makes the argument that most sandboxing conflates two separate problems: execution isolation and credential management. Standard sandboxing puts the API key inside the sandbox, which means the agent has the key. If it can exfiltrate it, the sandbox is protecting the wrong thing. Meanwhile, Meta provided the cautionary example: their AI support bot simply handed over Instagram accounts when a hacker politely asked it to link a new email address. And the OWASP Top 10 added memory safety and vibe-coding as awareness items, acknowledging that the attack surface has shifted alongside the tooling.
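One way to keep execution isolation and credential management apart, sketched under assumptions: the agent inside the sandbox never holds the key, and a broker outside the sandbox attaches it only to requests bound for an allowlisted host. The class names, the host `api.example.com`, and the key value are hypothetical, and this is not presented as Guercio's or Tailscale's design. A real deployment would likely put the broker behind a network proxy rather than a function call.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundRequest:
    host: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)


class CredentialBroker:
    """Lives outside the sandbox; the only component that ever sees the real key."""

    def __init__(self, keys_by_host: dict[str, str]) -> None:
        self._keys_by_host = keys_by_host

    def authorize(self, request: OutboundRequest) -> OutboundRequest:
        """Return a copy of the request with credentials attached, or refuse it."""
        key = self._keys_by_host.get(request.host)
        if key is None:
            raise PermissionError(f"egress to {request.host} is not allowed")
        headers = {**request.headers, "Authorization": f"Bearer {key}"}
        return OutboundRequest(request.host, request.path, headers)


def sandboxed_agent_request(host: str) -> OutboundRequest:
    """What the agent can build inside the sandbox: a request with no secret in it."""
    return OutboundRequest(host=host, path="/v1/tasks")


# Hypothetical host and key, for illustration only.
broker = CredentialBroker({"api.example.com": "hypothetical-key"})

allowed = broker.authorize(sandboxed_agent_request("api.example.com"))
print(allowed.headers["Authorization"])  # the key is added outside the sandbox

try:
    broker.authorize(sandboxed_agent_request("attacker.example.net"))
except PermissionError as err:
    print(err)
```

Because the key never enters the sandbox, an agent that is tricked into exfiltrating its environment has nothing to leak, and the allowlist limits where the broker will spend the credential on its behalf.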





Curated from 130+ sources across articles, podcasts, and videos. Week of May 29 – June 6, 2026.