Agentic Engineering Weekly for May 1-9, 2026
This week DORA weighs in on the ROI of AI-assisted software development in the enterprise, the bottleneck conversation finally starts moving upstream and downstream from code to organizational concerns, we gave the cognitive cost a sharper name, and it turns out that in dire times, the enemy of my OpenAI enemy is my xAI friend.
My top 3 picks this week
- DORA | ROI of AI-assisted Software Development report: The latest numbers on AI-assisted software development ROI in enterprise (article)
- The Sovereign Engineer: Russ Miles' e-book that closely matches my own philosophy on how to get the best out of the clankers today (paid e-book)
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender: Academic paper that coins "tri-system theory of cognition" and cognitive surrender. All y'all Kahneman nerds will love this one (paper)
DORA weighs in on the ROI of AI-assisted software development
DORA published its "ROI of AI-Assisted Software Development" report yesterday, joining CircleCI, last week's DX 10-15% velocity number, BCG's CIO survey, and the Faros Acceleration Whiplash report. Four credible sources now triangulate on the same conclusion: the productivity dividend from AI is real but modest, and the variance between teams dwarfs the average. The vendor case study era is effectively over. If your enterprise CFO walks into the next budget review waving a hand-picked Cursor anecdote, you now have published, methodologically defensible reports to cite back.
What makes the DORA report load-bearing is the credibility behind the numbers. DORA has been measuring software delivery performance since the original State of DevOps research, with all the survey rigor and longitudinal cohorts that implies. The framing matters more than the headline figure. AI does not raise the floor or the ceiling uniformly. Teams with strong delivery practices compound the gains. Teams without them get faster at producing code their organizations cannot review, deploy, or maintain. The dividend follows the operating model, not the tool.
Agent Driven Development picked up the same thread from a different angle this week with Find the Ceiling and Token Economics Is the Wrong Spreadsheet. The CFO is asking the wrong question because the biggest cost lives upstream of the tokens. Per-engineer token spend is the symptom of an organization that still treats AI as a per-engineer tool acquisition rather than a systemic software delivery concern. The Pulse caught GitHub buckling under 3.5x service load this week for the same structural reason: the demand (and the massive increase in output) is real, the absorption and desired outcomes are not.
Worth reading:
- DORA | ROI of AI-assisted Software Development report: The empirical anchor your CFO will not argue with (article)
- Find the Ceiling: A boardroom story that names the theater of measurement better than any framework (article)
- The Pulse: AI load breaks GitHub - why not other vendors?: Concrete data on what 3.5x demand does to infrastructure that was sized for 2024 (article)
- Programming in 2026: excitement, dread, and the coming wave: The mood-of-the-industry essay where both feelings are correct at the same time (article)
The bottleneck was never the code
The dominant claim of the week is that coding was never the bottleneck. The organization was. Last week the seed was a few isolated pieces. This week it is a chorus from Rob Bowley, an O'Reilly editorial, TestDouble's organizational observability frame, Eugene Yan's compound-with-AI playbook, Abi Noda on AI-native org design, and Anthropic's own director of engineering walking through what broke at Claude Code when agentic coding became the default. Different vocabularies, same diagnosis.
The mechanism is straightforward. Software delivery has always been bottlenecked somewhere between intent and outcome. AI compresses one segment of that pipeline (the code itself) and immediately exposes everything upstream and downstream. Hello Goldratt, hello Theory of Constraints. Hiring needs rethinking. Review processes were sized for slower throughput. Domain knowledge that lived in tribal memory now needs to be legible to agents. The organizations that capture the AI dividend are not the ones running inference on the best models, they are the ones that already had clear intent and well-instrumented feedback loops.
The new vocabulary worth tracking is organizational observability, TestDouble's term for the degree to which an organization's intent is sufficiently visible and coherent for people, including the agents we are now deploying, to make good decisions inside it. Most agent misalignment, the argument goes, does not start in the model. It starts upstream, in organizations that have not made their own intent visible enough to navigate. Eugene Yan's framing in How to Work and Compound with AI is the personal-practice companion: context as infrastructure, taste as configuration, verification for autonomy, scale via delegation, closing the loop.
Worth reading:
- Coding has never been the bottleneck: The throwaway claim with the receipts (article)
- Organizational observability: The AI alignment problem starts upstream: A vocabulary upgrade that pays back every time you use it in a meeting (article)
- How to Work and Compound with AI: Five-part personal playbook that turns the abstract advice into something you can practice on Monday (article)
- Running an AI-native engineering org: Anthropic's own director of engineering on what broke when agentic coding became the org-wide default (video)
Cognitive surrender enters the lexicon
Cognitive offloading is delegating to the AI and still owning the answer. Cognitive surrender is when the AI's output quietly becomes your output and there is nothing left to check. The cognitive-debt vocabulary got another entry, and three sources arriving at it in the same week is the strongest signal that the concept is sticking.
The line moves under your feet most days. You start the week asking the agent for boilerplate. By Friday you are accepting larger and larger PRs without reading the implementation, because the tests pass and the diff is too long to scan. Lars Faye's Agentic Coding is a Trap maps the personal-practice failure mode. Siddhant Khare's AI fatigue is real names the embodied symptom: more productive, more exhausted, paradox unresolved. JetBrains' What Is AI Doing to Your Developer Brain is the IDE-vendor admission that the long-term trajectory worries them too.
What makes cognitive surrender different from previous warnings is the framing. It is not asking you to use AI less. It is asking you to notice when you have stopped doing the verification work the productivity claim depends on. The DORA report and the cognitive surrender literature are saying the same thing from opposite ends. Teams that maintain judgment compound their gains. Teams that surrender judgment compound their fragility. The bench is judgment, not throughput.
Worth reading:
- Cognitive Surrender: The popular writing piece that names what most of us are doing without noticing (article)
- Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender: The academic paper that coins "tri-system theory of cognition" and cognitive surrender. All y'all Kahneman nerds will love this one (paper)
- Agentic Coding is a Trap: The personal-practice mirror image of Addy's argument (article)
- AI fatigue is real and nobody talks about it: The embodied symptom that the productivity numbers do not capture (article)
- Did We Do This to Ourselves?: Kent Beck and Angie Jones on whether the bargain we offered workers had to be this one (article)
- What Is AI Doing to Your Developer Brain?: The IDE-vendor edition of the cognitive-cost worry (video)
The line between vibe coding and agentic engineering is thinner than we'd like to believe
Karpathy drew a clean line: vibe coding raises the floor, agentic engineering raises the ceiling. This week Simon Willison admitted the line has erased itself in his own daily work. Boris Cherny, the creator of Claude Code, repeats his "coding is solved" stance. Louis Knight-Webb at AI Engineer London argues software engineering is becoming plan-and-review. The taxonomy from Q1 is gone. The activity that replaces it does not have a clean name yet, and that conceptual gap matters more than it sounds.
The old taxonomy was useful because it told juniors what to aim for and seniors what to keep. Without it, the question of who does what becomes harder to answer. Plan-and-review as a job description sounds clean until you realize the plan, the review, and the execution all interleave, with the agent stitching them together. A good harness moves you fluidly between intent and verification. A bad one leaves you confused about whether you are still driving the work or just signing off on it.
Boris Cherny's claim that he has not written a line of code in 2026 and ships dozens of PRs a day from his phone is a position statement, not a description of the median developer's job. It is also a credible upper bound on where the trajectory points. Bloomberg's mainstream coverage of vibe coding this week (a warehouse owner building shipping software, a designer shipping her first app with zero technical experience) is the same trajectory at the bottom of the curve. The middle is what is conceptually homeless.
The cleanest name on offer for what fills that middle is found in Russ Miles' new book: the sovereign engineer. The professional engineering job is no longer to type the code: it is to build, grow, and live inside the habitat the code gets produced in. Harness engineering, context curation, specification-first development, and the platform discipline that lets a team share all of the above. The vibe coder rents the habitat someone else built. The agentic engineer is a passenger in a single agent loop. The sovereign engineer designs the habitat, owns the verification surface, and decides what the agent is allowed to compound. That is the job worth defending, and it is the one most engineers are not yet practicing.
Worth reading:
- Vibe coding and agentic engineering are getting closer than I'd like: Simon Willison's confession that the line he was holding has erased itself (article)
- Software Engineering Is Becoming Plan and Review: The plan-and-review thesis stated cleanly enough to argue with (video)
- Anthropic's Boris Cherny: Why Coding Is Solved, and What Comes Next: Position statement from the person who built the harness most of us are using (video)
- The Vibe Coding Era: Why AI Won't Replace Software Engineers: The mainstream-Bloomberg companion that shows the same shift at the bottom of the curve (video)
- The Sovereign Engineer: Russ Miles' e-book that names habitat engineering as the job worth keeping (ebook)
The optimal programming language for coding agents

I published an experiment that the existing benchmarks do not seem to cover: same harness, same model, same non-trivial coding task, vary only the language. My hypothesis was that strongly typed languages would win because a fast compiler should give the agent a tight feedback loop, types should reduce the search space for fixes, and the agent should need fewer iterations. The data refused to confirm it. TypeScript averaged 27k tokens per task, JavaScript 28k, every other language 33-37k. Python sat at the top of the cost ladder at 37.2k. F#, my personal favorite language, was the slowest end-to-end because the compiler is slow.
Two things stood out. First, build counts needed to solve the problem were remarkably uniform across languages, clustered around 2-3 builds per task. The conceptual difficulty of the task is the same regardless of syntax. The friction the agent hits is in feedback-loop speed and token density, not in language expressiveness. Second, the JavaScript and TypeScript ecosystem is, right now, the cheapest place to point an agent. Token density and ecosystem maturity probably explain more of the result than type systems or availability in the LLM training data. Pass@1 leaderboards measure the wrong thing because they ignore the harness entirely.
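The shape of the experiment above can be sketched as a small harness loop: hold the model, harness, and task constant, vary only the language, and aggregate tokens and build counts per language. Everything below is illustrative, not the actual bench code: `TaskResult`, `summarize`, and the sample inputs are hypothetical stand-ins that only echo the per-language averages quoted in the writeup.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """One agent run: same harness, same model, same task, one language."""
    language: str
    tokens_used: int
    builds: int

def summarize(results):
    """Average tokens and build counts per language across repeated runs."""
    by_lang = {}
    for r in results:
        by_lang.setdefault(r.language, []).append(r)
    return {
        lang: {
            "avg_tokens": sum(r.tokens_used for r in rs) / len(rs),
            "avg_builds": sum(r.builds for r in rs) / len(rs),
        }
        for lang, rs in by_lang.items()
    }

# Illustrative inputs echoing the article's averages (27k TypeScript,
# 37.2k Python, 2-3 builds per task) -- not measured data.
runs = [
    TaskResult("typescript", 27_000, 2),
    TaskResult("python", 37_200, 3),
]
print(summarize(runs))
```

The point of structuring it this way is that the per-language deltas fall out of the aggregation, while the near-constant build counts make the "conceptual difficulty is language-independent" observation visible in the same table.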
Worth reading:
- Does Programming Language Choice Even Matter Anymore?: My full bench writeup with the per-language numbers and the harness side-quest (article)
- The Hidden Cost of Readability: How Code Formatting Silently...: Hard data that the formatting we love makes the agent worse (article)
- Code for Machines, Not Just Humans: Quantifying AI-Friendliness...: The broader case that the optimization function for code has changed (article)
The bubble debate gets a railway-history analogy
The Panic of '26 lands the railway-history frame at the same moment Anthropic announces higher Claude usage limits, backed by a fresh compute deal with SpaceX and additional compute capacity from xAI to keep up. The strange-bedfellows era of AI infrastructure is here: the lab competing hardest with OpenAI is now leasing GPUs from Elon Musk's two compute-rich properties at once. The GitHub load story makes the demand-versus-financing asymmetry visible from the other end. Ed Zitron is still hammering the bubble drum, and his thesis that the demand story is a coordinated lie among hyperscalers is sharper than ever this week.
The railway analogy is worth taking seriously, not because the railway companies all survived but because the infrastructure they built outlasted them. If the AI investment cycle ends in a panic, the chips, the data centers, and the trained engineers do not vanish. They reprice. The labor-pricing dimension is already showing up: Geoffrey Huntley's video puts a number ($10.42 an hour) on what cheap software production is doing to engineering wages. The bubble may pop without the technology going anywhere, and that is the scenario the railway frame actually predicts.
Worth reading:
- The Panic of '26: The railway-history analogy that lands the bubble debate in one phrase (article)
- Higher usage limits for Claude and a compute deal with SpaceX: Anthropic's official announcement of the SpaceX compute partnership (article)
- Anthropic just...wait what: Theo unpacks Anthropic also buying compute from xAI, and what the SpaceX/xAI funding diagram now looks like (video)
- The AI Compute Demand Story Is A Lie: Ed Zitron's structural case against the capacity-constraint narrative (article)
- Software Development Now Costs Less Than Minimum Wage: A real number ($10.42/hr) on what cheap software is doing to wages (video)
Quick Hits
- 11 Tips For AI Coding With Ralph Wiggum: Matt Pocock's field guide to autonomous loops and AFK coding (article)
- How To De-Slop A Codebase Ruined By AI: Refactoring fundamentals for the codebase your agent already wrecked (video)
- The Future of MCP: David Soria Parra at AI Engineer with the 2026 roadmap (video)
- Context Is the New Code: Patrick Debois on treating prompts and memory as first-class artifacts, not ad hoc hacks (video)
- Stop Sending IDE-Catchable AI Code Errors to Review: JetBrains on moving catchable errors out of the human review loop (article)
- We Gave Agents IDE-Native Search Tools. They Got Faster and Cheaper.: Paired-task numbers on what happens when the harness lives in the IDE (article)
- Live blog: Code w/ Claude 2026: Simon Willison's live notes on everything Anthropic announced this week (article)
- Why AI Agents are either the best or worst thing we've ever built: Hannah Fry's claw agent opened a mug shop, emailed a journalist, and leaked passwords (video)
- Single Player to Multiplayer: AI, Context, and Collaboration: John Cutler on AI tools quietly pushing teams back into single-player mode, sharpening the multiplayer-alignment argument GitHub's Maggie Appleton made last week (video)
Curated from articles, podcasts, and videos across the week. Week of May 1-9, 2026.