Agentic Engineering Weekly for May 22-30, 2026

Share
Agentic Engineering Weekly for May 22-30, 2026

Code is the result of hundreds of decisions an hour, coding agents that makes them in your stead are not saving you any time. Big tech is waking up to the fact that tokenmaxxing only results in higher inference bills.


My top 3 picks this week


Software engineering is decisions and comprehension, neither one parallelizes

Priscila Andre de Oliveira did something most AI-productivity takes never bother with: she counted. Analyzing 116 of her own Claude sessions at Sentry, in a codebase with 15 years of history and roughly 100 PRs merged per day, she found 67% were comprehension and 2% were code generation. The scarce resource is building enough of a mental model of a large, living system to know what change is even safe to make. Coding agents don't remove all of that work.

If generation is only a fraction of the effort, optimizing generation speed optimizes for the wrong part of the job. Stack Overflow's "artisans and builders" piece circles the same point from the craft angle, and Adam Tornhill's "Illusion of Known Scope" gives it a historical spine: implementation was always discovery and learning, never just transcription. Context is the lever, not model choice. The teams getting value feed the agent a sharp understanding of the system and keep their own model of it current. What we used to call "writing code" was mostly comprehension and making trade-off decisions in disguise.

Russ Miles, at JAX 2026, gives this failure mode its sharpest name. The two metaphors that govern how we talk about software, Building and Factory, both position the human downstream of AI output: an architect reviewing what the builder made, or a manager watching throughput dashboards light up green. Both metaphors agree that AI output is something you accept or reject, not something you participate in. That, Miles argues, is Waterfall smuggled back in at the level of decisions. A developer makes hundreds of nano-decisions an hour and code is just the public record of all of them, calling AI output "just implementation" surrenders the actual substance of a lot of the work. Each accepted-but-unexamined decision is cognitive debt, a choice made on your behalf that erodes your future capacity to decide.

What makes this point worth some serious consideration: your comprehension doesn't parallellize. Addy Osmani named it the Orchestration Tax. Spawning more agents is trivial now, but more agents running doesn't mean more of you available. The judgement to steer them, understand their output, and merge their code all routes through exactly one serial processor, which is you and your teammates' neurons. Brandon Waselnuk's demo makes the goal concrete: "AI-generated code should feel like it was written by someone who's been on the your team for years". Consistent patterns and shared mental models lower the comprehension cost of AI-generated code.

Osmani says architect your attention like a concurrent system; Miles says regulate to cadence, not throughput, where the right speed is the speed at which you can stay in the decision stream. Surface the silent decisions so they become objects of attention, trim them to the handful you can engage with well, and run at a pace you can hold. Stop reviewing what the AI built; stay in the comprehension and the decisions, because that was always the 98%.

Or as I like to frame it: you still draw the picture, let the agent fill in the colors.

Worth reading:


Tokenmaxxing finally blows up

Tokenmaxxing is the emerging corporate habit of treating token spend as a proxy for developer value. The problem is that token consumption measures activity, not output. It's the line-of-code metric reincarnated with a worse unit and a real-time billing meter attached. Burn more tokens and you look busy; whether anything shippable came out the other end is a separate question nobody's has been asking lately.

Uber reportedly burned through a full year of AI budget in four months, and the diagnosis wasn't that the technology failed. It was that the company was both measuring the wrong thing and budgeted for older, cheaper models. A UX designer captured the whole arc on Reddit: they were told to go all-in on AI one year, put on an "extreme token diet" the next. The subsidy era has officialy ended.

The message to take home isn't "AI is too expensive". It's that spend is not a productivity metric, and any org optimizing for token throughput is measuring effort and calling it value. If you can't connect token spend to outcomes, you're funding theater.

Worth reading:


Anthropic found product-market fit, the bubble debate continues

Anthropic's $65B Series H announcement slipped in that run-rate revenue crossed $47B earlier in the month, and Simon Willison read the wave of "our LLM bill is shockingly large" tokenmaxxing enterprise stories as the confirmation of product-market fit: companies are now paying real API prices because the tools are worth it to them. That's a different claim than "the hype is justified." It's that the demand is real and metered, whatever happens to the valuations stacked on top.

Ed Zitron answered with the third installment of his AI-bubble series, and the two positions are less contradictory than they look. Genuine product-market fit and an inflating bubble can coexist: real revenue doesn't make the capacity spend sustainable, and Zitron's scenarios (token-based billing, the capacity crunch, hyperscaler exposure) are about whether the cost structure holds, not whether anyone wants the product. A structural story sits underneath both: the memory shortage, driven by just three remaining manufacturers splitting fixed wafer capacity, is repricing consumer electronics, because the chips that train models compete for the same fab time as the ones in your PS5. I don't think I've ever seen my gaming hardware appreciate in value before, but the compute crunch is making it a reality.

The takeaway for practitioners is to hold both threads at once. The tools have found their market and the economics are still unresolved. Build on them, but build on the assumption that today's prices are a floor, not a ceiling, and that the cost of the compute under your agents is going to keep showing up in places you didn't expect, including the hardware on your desk.

Worth reading:


Opus 4.8 and the refreshing honesty of "modest but tangible"

Anthropic shipped Claude Opus 4.8 and, in its own release notes, called it "a modest but tangible improvement on its predecessor," adding that it's still working on delivering Opus-level capability at lower cost. Simon Willison's favorite part of the launch was exactly that honesty: an AI lab describing a release as a minor incremental bump instead of a paradigm shift. After two years of every checkpoint being heralded as a revolution, a vendor calling its own update incremental is genuinely novel, and a small signal that the marketing temperature might finally be dropping toward something useful.

Meanwhile the review economy spun up overnight, with channels racing to post leaderboard comparisons within 48 hours, which is its own lesson about how the incremental-release ritual now runs on a schedule independent of how much the model actually changed.

Worth reading:


Quick Hits


Curated from articles, podcasts, and videos. Week of May 22-30, 2026.