Agentic Engineering Weekly for May 29 – June 6, 2026

Russ Miles argued that our governing metaphors recreate Waterfall at the decision level, DORA published evidence that scaling generation without scaling verification creates a J-curve most teams never escape, and rsync proved the point by shipping 36 AI-assisted commits that broke backups worldwide.

My top 3 picks this week

8 myths on software engineering and AI: Research-backed mythbusting. This week's must-read. (paper)
AI makes weak engineers less harmful: In a cultivated agentic habitat, both AI and weak engineers become more effective. (article)
Protecting against token theft: This article made me chuckle. Chipotle-AI, aka stealing tokens from public chatbots. (article)

More on cognitive debt

I really enjoyed Russ Miles' JAX 2026 keynote. The two metaphors that govern how we talk about software, Building and Factory, both position the human downstream of AI output. The building metaphor flatters the work (architect status, legitimacy borrowed from civil engineering). The factory metaphor flatters the management of it (throughput, velocity, dashboards). Both agree on one thing: AI output is something you accept or reject, not something you participate in. That, Miles argues, is Waterfall smuggled back in at the level of decisions.

Building software means making hundreds of little decisions under uncertainty. Building software is learning. If you outsource all the building, you outsource the decision making and the learning. Jack Reeves told us back in 1992 that compilers do the actual building while engineers make the trade-offs.

Worth reading:

The Intent Debt: Osmani names the debt your agents can't pay down: the goals, constraints and rationale you never wrote down. (article)
Building Software Is Learning: Thorsten Ball's internal Amp message on why outsourcing the building outsources the learning. (article)
Code Isn't Product: Rich Mironov on the growing confusion between creating code and creating products. (article)
BDD, ADR, PRD, WTF: Capturing Decisions for Humans and AI Alike: Cucumber is suddenly useful again for closing the gap between spec and behavior. (video)

The verification tax

DORA's "Finding balance in the era of tokenmaxxing" makes the following claim: AI might help you write 10x more code, but if you haven't scaled your ability to verify it, you've just made the problem worse. Nathen Harvey calls this the verification tax, and it's what drives the productivity J-curve most teams hit when adopting AI. Output goes up immediately; the ability to know whether that output is correct doesn't scale at the same rate, and the gap between the two is where the damage accumulates.

The rsync incident is the verification tax in production. Thirty-six commits by "tridge and claude" shipped, and they broke everyone's backups. Not in a toy project, not in a startup moving fast: in one of the most relied-upon backup tools on earth. A detailed analysis asks whether Claude actually increased bugs, and the answer is less interesting than the structural fact that nobody caught them before they shipped. Meanwhile, Martin Fowler flagged Greg Wilson's observation that all common AI productivity metrics are flawed: lines of code, tickets closed, developer surveys, each broken in a different way. Fowler's conclusion: since we cannot measure productivity, any metrics are suspect.

Jellyfish's data across 250,000 developers puts numbers on the gap: engineering teams are shipping twice as many PRs with AI, but merge rates on AI-generated PRs dropped from 80% to 60%. More output, less of it landing. The ACM Queue piece on "Eight Myths on Software Engineering and GenAI" contrasts what the latest research actually says with what leaders believe they know. And LeadDev's "PR reviews were already broken. AI made it worse" makes the same point: AI didn't break code review; it made a broken process impossible to ignore, and fixing it requires layers, not a single tool.
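To see what "more output, less of it landing" means in numbers, here is a back-of-the-envelope sketch. It assumes a hypothetical baseline of 100 PRs and, as a simplification, applies the 80% and 60% merge rates to all PRs rather than only the AI-generated ones. Only the 2x volume and the two merge rates come from the Jellyfish figures quoted above.

```python
def merged_and_unmerged(prs: int, merge_rate_pct: int) -> tuple[float, float]:
    """Split a PR volume into merged and unmerged counts."""
    merged = prs * merge_rate_pct / 100
    return merged, prs - merged


# Hypothetical baseline: 100 PRs per period before AI adoption (assumption).
before_merged, before_unmerged = merged_and_unmerged(100, 80)

# With AI: twice the PRs, merge rate down to 60% (figures from the text,
# applied to all PRs as a simplifying assumption).
after_merged, after_unmerged = merged_and_unmerged(200, 60)

merged_growth = after_merged / before_merged        # growth in PRs that land
unmerged_growth = after_unmerged / before_unmerged  # growth in PRs that do not

print(f"merged PRs:   {before_merged:.0f} -> {after_merged:.0f} ({merged_growth:.1f}x)")
print(f"unmerged PRs: {before_unmerged:.0f} -> {after_unmerged:.0f} ({unmerged_growth:.1f}x)")
```

Under those assumptions, merged PRs grow 1.5x while PRs that never land grow 4x, and each of those presumably still consumed someone's review time.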

Worth reading:

DORA | Finding balance in the era of tokenmaxxing: The verification tax explained, with DORA's research on the J-curve behind it. (article)
AI Verification Challenges with Nathen Harvey: Harvey explains why scaling output without scaling verification is a structural trap. (video)
Did Claude Increase Bugs in rsync?: A detailed post-mortem of the 36 AI-assisted commits that broke the internet's backup tool. (article)
8 myths on software engineering and AI: What the research actually says, versus what your leadership team thinks it says. (article)
Why Developers Hit a Wall at 4 AI Agents: Jellyfish data on 250K developers: 2x PRs, merge rates down from 80% to 60%. (video)

Boris Cherny: the title "software engineer" disappears this year

Claude Code's creator told three separate audiences the same thing this week: he hasn't written a single line of code in six months, hasn't opened an IDE in a month, and believes the title "software engineer" could start to disappear as soon as this year. Some practitioners argue that renaming developers "builders" does a disservice to the craft and flattens the distinction between understanding a system and producing artifacts that happen to compile.

James Conroy-Finn's Doorman Fallacy argues that the companies selling AI as a replacement for developers are making the same mistake as supermarkets replacing till operators with self-checkout: they claim the cost saving and walk away before the value destruction becomes visible. Rory Sutherland, in a long conversation on Simplifying Complexity, provides the intellectual framework: short-term rationality leads to long-term irrationality, and efficiency is a bad proxy for effectiveness. Companies simplify a problem, solve for the simplified version, and end up with a sub-optimal outcome because they optimized for the wrong metric.

Aaron Brethorst rounds it out with the structural observation: when implementation cost approaches zero, domain expertise is the only remaining moat. Agents can write code better than most humans can. The differentiating factor is whether the people steering them understand the domain well enough to know what code to write, and whether the organizations employing them will figure that out before or after the value destruction that the Doorman Fallacy predicts.

Worth reading:

Boris Cherny: Claude Code & the Future of Engineering: The Acquired Unplugged appearance where the "no code in six months" claim landed. (video)
Claude Code creator says developers will become builders. Do they agree?: The pushback from practitioners who think "builder" flattens the role. (article)
The Doorman Fallacy: The self-checkout analogy for AI replacing developers. Sharp and uncomfortable. (article)
Domain Expertise Has Always Been the Real Moat: When implementation cost hits zero, what's left? (article)

The go-slow-to-go-fast countermovement

Nolan Lawson's "Using AI to write better code more slowly" makes the case that the point of AI coding is quality, not speed: use the model as a thinking partner, not a typing accelerator. Mozaic Works reacted with "Go Slow to Go Fast," grounding the argument in why speed metrics and line counts aren't quality metrics, and why AI makes that confusion worse rather than better. If you're measuring how fast you're going but not where you're going, going faster is just arriving at the wrong place sooner.

Worth reading:

Using AI to write better code more slowly: The case for AI as thinking partner, not typing accelerator. (article)
Vibecoding Cost Me $20,000: The Google Cloud bill that proves engineering practices aren't optional, even with AI. (video)
The solution might be cancelling my AI subscription: 16 projects nobody asked for. The distraction tax of AI-enabled scope creep. (article)

Prompts accumulate technical debt too

Sean Goedecke named what a lot of teams are discovering: agent instructions, markdown files, and project prompts decay silently, just like stale tests do. They worked when you wrote them. They haven't been updated since. The model changed, your codebase changed, and the instructions now describe a system that no longer exists. Theo amplifies the point: more prompts equals worse code when the prompts grow stale over time. It's the same rot that kills code, just in a format nobody has a linter for yet.
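There may be no linter for this yet, but a crude staleness check is easy to sketch. The example below is an illustration only, not something from Goedecke's article: it flags instruction files that have not been touched for a long time while the code around them kept changing. The file names (AGENTS.md, CLAUDE.md), the .py suffix, the 30-day threshold and the function name are all assumptions, and file modification time is a rough proxy (in a git checkout, commit dates would be a better signal).

```python
from pathlib import Path

SECONDS_PER_DAY = 86_400


def stale_instruction_files(
    root,
    instruction_names=("AGENTS.md", "CLAUDE.md"),  # assumed file names
    code_suffixes=(".py",),                        # assumed code file types
    max_lag_days=30,                               # assumed threshold
):
    """Return (path, lag_in_days) for instruction files that lag the newest code change."""
    root = Path(root)
    code_mtimes = [
        p.stat().st_mtime
        for p in root.rglob("*")
        if p.is_file() and p.suffix in code_suffixes
    ]
    if not code_mtimes:
        return []  # no code found, nothing to compare against

    newest_code = max(code_mtimes)
    stale = []
    for name in instruction_names:
        for path in root.rglob(name):
            lag_days = (newest_code - path.stat().st_mtime) / SECONDS_PER_DAY
            if lag_days > max_lag_days:
                stale.append((path, round(lag_days)))
    return sorted(stale)


if __name__ == "__main__":
    for path, lag in stale_instruction_files("."):
        print(f"{path}: {lag} days behind the newest code change")
```

A check like this only tells you that an instruction file is old relative to the code, not that it is wrong. It is a prompt to reread the file, which is the step that tends to get skipped.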

Emmz Rendle on .NET Rocks! adds the strategic dimension. They warn about the upcoming rug pull in AI dev tools: prices are rising, services are being restricted, and having enough control to choose when to run locally is becoming key to staying productive at a reasonable price. If your agent workflow is locked to a vendor that's about to raise prices, owning your harness isn't just a technical preference. It's a business decision.

Worth reading:

Prompts are technical debt too: The argument that prompt rot is test rot in a format nobody lints. (article)
Extreme Harness Engineering: 1M LOC, 1B toks/day, 0% human code or review: OpenAI's Ryan Lopopolo on the extreme automation end of the spectrum. (video)
The Config That Changed Everything: One YAML line, 0/100 to 100/100. The harness is the product. (article)
Stop letting your AI agent grep its way around your codebase: Give your agent proper code navigation and stop burning tokens. (article)

AI makes weak engineers less harmful; layoffs are targeting the middle

Sean Goedecke flipped the usual "AI helps the best engineers most" narrative with a simple, testable claim: AI does a surprisingly good job of steering weaker engineers in the right direction. Rather than amplifying the gap between strong and weak, AI working in a cultivated habitat acts as guardrails, nudging less experienced developers toward patterns and practices they wouldn't have reached on their own. Theo picked it up and largely agreed, adding that this tracks with what he sees in teams using Cursor and Claude Code daily.

Thirty percent of all 2026 layoffs are reportedly targeting middle management, because AI is automating the core function managers were hired for: routing information. Gene Kim's IT Revolution closing remarks grounded the shift: the cost of code production is approaching zero and code can be conjured through conversation alone, which means the bottleneck has moved upstream (figuring out what to build) and downstream (getting it adopted).

The Kelsey Hightower interview on The Pragmatic Engineer is a career story worth listening to in full: from self-taught DSL modem installer to Google Distinguished Engineer, demonstrating what a lifetime of authenticity, domain expertise and continuous learning looks like in practice.

Worth reading:

AI makes weak engineers less harmful: The contrarian take with a simple, testable premise. (article)
The "Bright" Future of Middle Management: 30% of 2026 layoffs targeting the information-routing layer. (video)
Kubernetes and retiring at the top with Kelsey Hightower: The career arc that shows what domain expertise looks like as the moat. (podcast)
Closing Remarks (Day 1) - Gene Kim: Gene Kim on the cost of code approaching zero, and what that means for where the bottleneck moves. (video)

Sandboxing and containment became an engineering discipline

Anthropic published a thorough document on how they contain Claude across products: process sandboxes, VMs, filesystem boundaries, and egress controls. Simon Willison's praise was for the documentation itself: in the absence of detailed documentation, it's impossible to know how much to trust a sandbox, and most vendors don't bother. Anthropic did. Meanwhile, Willison's own micropython-wasm project takes a different approach: running Python code inside MicroPython compiled to WebAssembly, a sandbox so constrained that GPT-5.5 has so far failed to break out.

Remy Guercio from Tailscale makes the argument that most sandboxing conflates two separate problems: execution isolation and credential management. Standard sandboxing puts the API key inside the sandbox, which means the agent has the key. If it can exfiltrate it, the sandbox is protecting the wrong thing. Meanwhile, Meta provided the cautionary example: their AI support bot simply handed over Instagram accounts when a hacker politely asked it to link a new email address. And the OWASP Top 10 added memory safety and vibe-coding as awareness items, acknowledging that the attack surface has shifted alongside the tooling.
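The separation Guercio describes, between execution isolation and credential management, can be illustrated with a small in-process sketch. This is not Tailscale's implementation. The class names, the host api.example.com and the bearer-token header are hypothetical, and a real broker would sit on the network path outside the sandbox and actually send the request. The point is only that the agent's code never has the key in scope.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundRequest:
    host: str
    path: str
    headers: dict = field(default_factory=dict)


class CredentialBroker:
    """Runs outside the sandbox; the only component that ever holds the key."""

    def __init__(self, api_key: str, allowed_hosts: set[str]):
        self._api_key = api_key
        self._allowed_hosts = allowed_hosts

    def forward(self, request: OutboundRequest) -> OutboundRequest:
        # Egress control: refuse destinations that are not explicitly allowed.
        if request.host not in self._allowed_hosts:
            raise PermissionError(f"egress to {request.host} is not allowed")
        # Attach the credential to a copy; the agent's request is left untouched.
        headers = {**request.headers, "Authorization": f"Bearer {self._api_key}"}
        # A real broker would send this request; here we return it for inspection.
        return OutboundRequest(request.host, request.path, headers)


def sandboxed_agent_step() -> OutboundRequest:
    """Agent code inside the sandbox: builds a request with no credential in scope."""
    return OutboundRequest(host="api.example.com", path="/v1/complete")


if __name__ == "__main__":
    broker = CredentialBroker("secret-key", {"api.example.com"})
    request = sandboxed_agent_step()
    sent = broker.forward(request)
    print("agent saw a key:", "Authorization" in request.headers)
    print("broker attached a key:", "Authorization" in sent.headers)
```

With this split, a compromised agent can still misuse the access it is allowed, but it has no key to exfiltrate, and the allowlist limits where its requests can go.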

Worth reading:

How we contain Claude across products: Rare, thorough documentation of sandbox architecture across Claude.ai, Claude Code, and Cowork. (article)
Running Python code in a sandbox with MicroPython and WASM: Willison's sandbox experiment: MicroPython in WebAssembly, GPT-5.5 can't escape. (article)
What if the network was the sandbox?: Tailscale's argument for separating execution isolation from credential management. (video)
Hackers Simply Asked Meta AI to Give Them Access to Instagram Accounts: What happens when you skip containment entirely. (article)

Quick Hits

When AI builds itself: Anthropic's progress toward recursive self-improvement, and its implications. Can also be read as a load of pre-IPO, fear-driven PR from one of the frontrunners, aiming for regulation in order to slow down the competitors at its heels. (article)
Uber Caps Usage of AI Tools Like Claude Code to Manage Costs: $1,500/month per employee after blowing the annual budget in four months. (article)
AI Doesn't Have ROI: Ed Zitron continues the bubble analysis with Hater's Guide 3.0. (article)
Be thou not pilled: A timely reminder via Mackay's 1841 classic: people get captured by ideas, and most of the ideas aren't even stupid. (article)
Alex Imas and Phil Trammell: What remains scarce after AGI?: Dwarkesh on optimal taxation, redistribution, and what can't be automated. (podcast)

Curated from 130+ sources across articles, podcasts, and videos. Week of May 29 – June 6, 2026.