Agentic Engineering Weekly for May 22-30, 2026

Code is the result of hundreds of decisions an hour; coding agents that make them in your stead are not saving you any time. Big tech is waking up to the fact that tokenmaxxing only results in higher inference bills.

My top 3 picks this week

Demo: The Human + AI-Agent Dev Team: Henrik Kniberg walks you through Abundly's agentic setup. A very rare view inside of a team that is 10x'ing with the help of AI (video)
What is spec-driven development?: Birgitta Boeckeler and Laura Tacho on how SDD is already suffering semantic diffusion. (podcast)
Building OpenCode with Dax Raad: Very down-to-earth perspective on today's agentic engineering landscape by someone knee-deep in it. (podcast)

Software engineering is decisions and comprehension, neither one parallelizes

Priscila Andre de Oliveira did something most AI-productivity takes never bother with: she counted. Analyzing 116 of her own Claude sessions at Sentry, in a codebase with 15 years of history and roughly 100 PRs merged per day, she found 67% were comprehension and 2% were code generation. The scarce resource is building enough of a mental model of a large, living system to know what change is even safe to make. Coding agents don't remove all of that work.

If generation is only a fraction of the effort, optimizing generation speed optimizes for the wrong part of the job. Stack Overflow's "artisans and builders" piece circles the same point from the craft angle, and Adam Tornhill's "Illusion of Known Scope" gives it a historical spine: implementation was always discovery and learning, never just transcription. Context is the lever, not model choice. The teams getting value feed the agent a sharp understanding of the system and keep their own model of it current. What we used to call "writing code" was mostly comprehension and making trade-off decisions in disguise.
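A quick back-of-the-envelope check on that claim, using Amdahl's law. This is my framing, not something the cited pieces use, and it assumes the 67%/2% session shares can stand in for shares of total effort, which the original data does not claim. The function name and the speedup factors are illustrative.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: speedup of the whole job when `fraction` of it
    becomes `factor` times faster and the rest stays unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: session shares from the Sentry analysis treated as effort shares.
GENERATION_SHARE = 0.02
COMPREHENSION_SHARE = 0.67

# Hypothetical speedup factors, chosen only to show the asymmetry.
faster_generation = overall_speedup(GENERATION_SHARE, 10.0)
faster_comprehension = overall_speedup(COMPREHENSION_SHARE, 1.5)

print(f"10x faster generation:      {faster_generation:.3f}x overall")
print(f"1.5x faster comprehension:  {faster_comprehension:.3f}x overall")
```

Under these assumptions, even infinitely fast generation caps out at roughly a 2% overall gain, while a modest improvement in comprehension moves the total far more.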

Russ Miles, at JAX 2026, gives this failure mode its sharpest name. The two metaphors that govern how we talk about software, Building and Factory, both position the human downstream of AI output: an architect reviewing what the builder made, or a manager watching throughput dashboards light up green. Both metaphors agree that AI output is something you accept or reject, not something you participate in. That, Miles argues, is Waterfall smuggled back in at the level of decisions. A developer makes hundreds of nano-decisions an hour, and code is just the public record of all of them. Calling AI output "just implementation" surrenders the actual substance of a lot of the work. Each accepted-but-unexamined decision is cognitive debt, a choice made on your behalf that erodes your future capacity to decide.

What makes this point worth some serious consideration: your comprehension doesn't parallelize. Addy Osmani named it the Orchestration Tax. Spawning more agents is trivial now, but more agents running doesn't mean more of you available. The judgement needed to steer them, understand their output, and merge their code all routes through exactly one serial processor: your neurons and your teammates'. Brandon Waselnuk's demo makes the goal concrete: "AI-generated code should feel like it was written by someone who's been on your team for years". Consistent patterns and shared mental models lower the comprehension cost of AI-generated code.

Osmani says architect your attention like a concurrent system; Miles says regulate to cadence, not throughput, where the right speed is the speed at which you can stay in the decision stream. Surface the silent decisions so they become objects of attention, trim them to the handful you can engage with well, and run at a pace you can hold. Stop reviewing what the AI built; stay in the comprehension and the decisions, because that was always the 98%.

Or as I like to frame it: you still draw the picture, let the agent fill in the colors.

Worth reading:

The AI Skill I Rely On Daily — Priscila Andre de Oliveira, Sentry: The 67%/2% number from 116 real sessions in a 15-year codebase. Hard data, not vibes. (video)
The Orchestration Tax: Addy Osmani's coined concept, and the "architect your attention like a concurrent system" prescription. (article)
Does AI deliver waterfall? — Russ Miles (JAX 2026): Why Building and Factory metaphors recreate Waterfall at the decision level, and the cognitive-debt model that follows. (article)
Stop babysitting your agents — Brandon Waselnuk, Unblocked: The 2.5-hours-vs-25-minutes numbers that make the tax concrete. (video)
A Blast from the Past: SDD and the Illusion of Known Scope: Tornhill's reminder that implementation was always learning, not typing. (article)

Tokenmaxxing finally blows up

Tokenmaxxing is the emerging corporate habit of treating token spend as a proxy for developer value. The problem is that token consumption measures activity, not output. It's the lines-of-code metric reincarnated with a worse unit and a real-time billing meter attached. Burn more tokens and you look busy; whether anything shippable came out the other end is a separate question nobody has been asking lately.

Uber reportedly burned through a full year of AI budget in four months, and the diagnosis wasn't that the technology failed. It was that the company was both measuring the wrong thing and budgeting for older, cheaper models. A UX designer captured the whole arc on Reddit: they were told to go all-in on AI one year, put on an "extreme token diet" the next. The subsidy era has officially ended.

The message to take home isn't "AI is too expensive". It's that spend is not a productivity metric, and any org optimizing for token throughput is measuring effort and calling it value. If you can't connect token spend to outcomes, you're funding theater.
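One way to make "connect token spend to outcomes" concrete is to divide spend by something that actually shipped. The sketch below is only an illustration: the team names, dollar figures, and the choice of "shipped changes" as the outcome are all made-up placeholders, not data from any of the stories above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamUsage:
    name: str
    token_spend_usd: float  # hypothetical monthly inference bill
    shipped_changes: int    # hypothetical count of changes that reached production


def cost_per_shipped_change(team: TeamUsage) -> float:
    """Spend divided by outcomes; infinite when nothing shipped."""
    if team.shipped_changes == 0:
        return float("inf")
    return team.token_spend_usd / team.shipped_changes


# Placeholder numbers, invented purely to show the two rankings diverge.
teams = [
    TeamUsage("team_a", 40_000.0, 20),
    TeamUsage("team_b", 8_000.0, 16),
    TeamUsage("team_c", 15_000.0, 0),
]

# A tokenmaxxing leaderboard ranks by raw spend...
by_spend = sorted(teams, key=lambda t: t.token_spend_usd, reverse=True)
# ...while an outcome view ranks by what each shipped change cost.
by_cost = sorted(teams, key=cost_per_shipped_change)

print("By spend:         ", [t.name for t in by_spend])
print("By cost per change:", [t.name for t in by_cost])
```

With these placeholder values the biggest spender tops the spend ranking but not the outcome ranking, and a team that shipped nothing still looks busy by spend alone. Which outcome you divide by is the hard part, and it is a judgement call for each org.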

Worth reading:

The Dumbest New Trend in Coding Productivity Setting Money on Fire: Internet of Bugs traces tokenmaxxing from hype to balance sheet. (video)
AI Is More Expensive Than Humans: The Uber "year of budget in four months" story, and what it was actually measuring. (video)

Anthropic found product-market fit, the bubble debate continues

Anthropic's $65B Series H announcement slipped in that run-rate revenue crossed $47B earlier in the month, and Simon Willison read the wave of "our LLM bill is shockingly large" tokenmaxxing enterprise stories as the confirmation of product-market fit: companies are now paying real API prices because the tools are worth it to them. That's a different claim than "the hype is justified." It's that the demand is real and metered, whatever happens to the valuations stacked on top.

Ed Zitron answered with the third installment of his AI-bubble series, and the two positions are less contradictory than they look. Genuine product-market fit and an inflating bubble can coexist: real revenue doesn't make the capacity spend sustainable, and Zitron's scenarios (token-based billing, the capacity crunch, hyperscaler exposure) are about whether the cost structure holds, not whether anyone wants the product. A structural story sits underneath both: the memory shortage, driven by just three remaining manufacturers splitting fixed wafer capacity, is repricing consumer electronics, because the chips that train models compete for the same fab time as the ones in your PS5. I don't think I've ever seen my gaming hardware appreciate in value before, but the compute crunch is making it a reality.

The takeaway for practitioners is to hold both threads at once. The tools have found their market and the economics are still unresolved. Build on them, but build on the assumption that today's prices are a floor, not a ceiling, and that the cost of the compute under your agents is going to keep showing up in places you didn't expect, including the hardware on your desk.

Worth reading:

I think Anthropic and OpenAI have found product-market fit: Willison's read that the surprise bills are the tell, not the warning. (article)
Anthropic's run-rate revenue hits $47 billion: The number from the Series H announcement, and what run-rate actually means. (article)
The memory shortage is causing a repricing of consumer electronics: Why the data-center buildout makes your next laptop more expensive. (article)
What If...We're In An AI Bubble? (Part 3): Zitron's case that real revenue doesn't make the cost structure hold. (article)

Opus 4.8 and the refreshing honesty of "modest but tangible"

Anthropic shipped Claude Opus 4.8 and, in its own release notes, called it "a modest but tangible improvement on its predecessor," adding that it's still working on delivering Opus-level capability at lower cost. Simon Willison's favorite part of the launch was exactly that honesty: an AI lab describing a release as a minor incremental bump instead of a paradigm shift. After two years of every checkpoint being heralded as a revolution, a vendor calling its own update incremental is genuinely novel, and a small signal that the marketing temperature might finally be dropping toward something useful.

Meanwhile the review economy spun up overnight, with channels racing to post leaderboard comparisons within 48 hours, which is its own lesson about how the incremental-release ritual now runs on a schedule independent of how much the model actually changed.

Worth reading:

Claude Opus 4.8: "a modest but tangible improvement": The case that honest, boring release notes are a feature. (article)
No hype Claude Opus 4.8 review — my real experience: Early-access hands-on across Claude Code and Cowork, minus the leaderboard theatre. (video)

Quick Hits

Notes on Pope Leo XIV's encyclical on AI: The Vatican's Magnifica Humanitas is some of the clearest writing on AI ethics yet, named after Leo XIII's 1891 encyclical on capital and labor. (article)
AI didn't kill your junior pipeline. You did: If we stop training juniors now, we'll be buying seniors at a premium from an empty market in five years. (article)
The just-say-no engineer was a ZIRP phenomenon: The engineer whose value was refusing things was a zero-interest-rate luxury. (article)
Building OpenCode with Dax Raad: OpenCode's surge from ~650K to nearly 8M monthly active users in months. (podcast)
Software engineering at the tipping point: Systems thinking for how developer ecosystems steer software evolution under AI. (video)

Curated from articles, podcasts, and videos. Week of May 22-30, 2026.