Agentic Engineering Weekly for April 4-11, 2026

Two independent datasets landed this week that make it hard to keep presenting AI productivity as a global rising tide. In the same seven days, DHH, Charity Majors, Fowler, and Beck all arrived at the same uncomfortable conclusion: the window for rational skepticism has closed, and it's time for every engineer to get their hands dirty. And finally: Mythos or Meh-thos?


My top 3 picks this week


Last week's video


The 5% club: the story of AI productivity lies in the outliers

CircleCI's 2026 State of Software Delivery report and METR's updated productivity experiment dropped within days of each other, and they tell the same story from opposite directions. CircleCI's 28-million-workflow dataset shows plenty of code being generated, but much of it dies on feature branches. That seems reasonable: prototyping is virtually free today. The most interesting signal for me is throughput on the main branch, the changes that actually land in the final product. There the story gets stark: median teams are impacted negatively (-6.8% on main-branch throughput), and the top 10% are flat-lining with no measurable impact. The only winners are the top 5% of teams, who are racking up a "modest" 30%. Would you count your team in that top 5%? I suspect that, as with drivers rating their own skill behind the wheel, most of us are overconfident about where we land in this distribution, and our "AI augmented" benefits fall into those much less impressive buckets.

METR's angle is also interesting: their randomized trial from 2025 famously found experienced open-source developers were 20% slower with AI. The recent replication didn't just soften that result, it flipped it: roughly an 18% speed-up for returning participants and no measurable effect for new ones. But the real story is now selection bias: developers are refusing to participate without AI, and 30-50% skip tasks they feel require it. Measuring individual productivity has become harder than anyone expected.

Reading these two sources together, the conclusion is unambiguous. AI doesn't have a generation problem; it has a validation problem, and the teams winning are the ones who invested in continuous integration, fast feedback loops, and specification hygiene before they ever installed an agent. Time to start catching up, folks. I for one am not waiting on Mythos-class models to clean up today's generated slop.

Worth reading:


Run toward the waves: the senior-amplification narrative hardens

Charity Majors spent most of 2025 telling engineers they needed to learn to code to stay relevant. In about three months, that advice became obsolete. Her conversation with Kent Beck on the Still Burning podcast this week is the clearest articulation I've heard of why skepticism stopped being the rational default. Her core claim: sometime in late 2025, code generation crossed a practical threshold, the so-called "Opus moment", and the engineers doing best right now are outcome-oriented. The engineers struggling most are the ones whose identity is tied to beautiful hand-written code as the main proxy for quality. That mattered when humans had to read everything. It matters less when the real job is understanding behavior, validating outcomes, and keeping situational awareness.

DHH's appearance on The Pragmatic Engineer lands on exactly the same point. His change of position on AI tools was also empirical: the early autocomplete UX was intrusive and low-signal, but agent harnesses plus frontier models crossed a threshold from "annoying assistant" to "useful collaborator". His strongest practical claim is worth sitting with: AI expands ambition more than it reduces effort. Yegge and Kim's FAAFO pops up again: the biggest impact isn't just speeding up planned work, it's making previously unjustifiable work suddenly viable. Teams can now chase performance edge cases, tooling glue, and speculative improvements that would never have cleared the cost threshold. Peak programmer may have passed, he argues, but in quantity, not in quality. Product-minded, outcome-driven engineers will flourish in this post-Opus world.

Fowler and Beck delivered the historical version at the Pragmatic Summit: every generation has cycles of disruption, the specific discomfort seniors feel right now has happened before, and the answer has always been active experimentation rather than waiting for clarity.

Three speakers, three wildly different personalities, one message: stop spectating. The uncomfortable corollary is the squeezed middle. Juniors get the onboarding acceleration, seniors get the leverage from accumulated taste and judgment, and mid-career engineers get neither the maximum learning benefit nor the maximum leverage. If you're in that middle tier, the worst response is to wait and see. The second-worst is to double down on craft signals that used to differentiate you. The best is to get your hands dirty this week.

Worth reading:


Clean code strikes back: craft is now an agent prerequisite

Five separate posts this week converge on a counterintuitive claim: the rise of coding agents makes XP and software craft more valuable, not less. The argument is mechanical rather than sentimental. Agents are limited by context, and poorly organized code means they consume more tokens and perform worse. Every cost you pushed onto future maintainers because "the compiler doesn't care" is now a cost you push onto every agent session you run. Yanist's piece on clean code in the age of coding agents makes the token-economics version of the argument. Jessitron's post on correctness conditions makes the distributed-consistency version: a pull request that adds a script without updating the README is incomplete because the docs are out of sync, and agents have no intuition for that kind of implicit contract unless you scaffold it for them.
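Jessitron's point about implicit contracts is easy to make mechanical. Here's a minimal sketch of one correctness condition as a CI-style check; the paths and the specific rule (scripts must move with the README) are my illustrative assumptions, not from the post itself:

```python
# Hypothetical correctness condition: a change touching scripts/ must also
# touch README.md, so docs can't silently drift from the code.

def docs_in_sync(changed_files):
    """Return True if the change set satisfies the docs-move-with-scripts rule."""
    touches_scripts = any(f.startswith("scripts/") for f in changed_files)
    touches_readme = "README.md" in changed_files
    return (not touches_scripts) or touches_readme

# A CI step would fail the build when the condition is violated:
assert docs_in_sync(["src/app.py"])                      # unrelated change: fine
assert docs_in_sync(["scripts/deploy.sh", "README.md"])  # script + docs: fine
assert not docs_in_sync(["scripts/deploy.sh"])           # script without docs: drift
```

Once the condition is executable, an agent doesn't need intuition for the contract; the check tells it when a session's output is incomplete.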

Bram Cohen's "The Cult of Vibe Coding Is Insane" goes directly for the jugular of the cultural problem. Bad software is a choice you make, and with generation cheap, it's now a choice you make at massive volume. HollandTech's "Claude Is Not Your Architect" makes the complementary argument about decision-making: AI agents are brilliant implementers and confidently wrong about every decision that matters. Somewhere between "ask Claude for a quick opinion" and "Claude is writing our Jira tickets," we lost the plot. Zernie frames the same ground as an engineering problem. Modern AI coding is limited by feedback latency, not model intelligence. When agents run 24/7, your job shifts from writing code to building tight, deterministic feedback systems.
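Zernie's "tight, deterministic feedback systems" can be as simple as a fixed sequence of fast checks that returns the first failure verbatim. A minimal sketch, with the command names (ruff, pytest) as placeholder assumptions you'd swap for your own tooling:

```python
import subprocess

# Ordered, deterministic checks, fastest first. Commands are placeholders.
CHECKS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint",   ["ruff", "check", "."]),
    ("tests",  ["pytest", "-x", "-q"]),
]

def run_checks(checks=CHECKS):
    """Run checks in order; return (ok, failed_check, output) at first failure."""
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Feed the agent exactly the failing output, nothing more.
            return False, name, result.stdout + result.stderr
    return True, None, ""
```

The point is the shape, not the tools: each check is deterministic, ordered by speed, and the agent gets a precise failure signal in seconds instead of a flaky, delayed one.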

The through-line connects back to the validation problem from the first theme. If you're in the top 5% of teams, it's because you were already investing in the craft disciplines that agents amplify: clean modules, fast tests, correctness conditions explicit enough to catch silent drift, and a feedback loop measured in seconds rather than days. The craft-versus-AI dichotomy was always a false one. The real split is between teams who built the scaffolding first and teams who are now watching their main-branch success rate collapse in public. There aren't many nines left in GitHub's availability metrics lately. That could be the massive increase in generated code, or it could mean GitHub itself could use some TLC in the XP department.

Worth reading:


Claude Mythos divides the room

Anthropic dropped a 240-page marketing campaign for their new Claude Mythos tier model this week and something unusual happened: within 48 hours, credible voices in the practitioner community landed on completely opposite conclusions about what they were looking at. Bruce Schneier, on The Tech Report, called it mostly marketing hype: "They're convincing a lot of people that Mythos is this amazing step change in capability when the evidence right now is that it might not be." Theo's video is titled "Claude Mythos and the end of software" and opens with "I....yeah. It's all over now." Matthew Berman called it terrifying. Mo Bitar called it delusional. Same model, same week, same community, four genuinely incompatible reactions from people who usually track together.

The polarization itself is the most interesting signal. Every previous Anthropic model launch produced a roughly consensual "this is a noticeable jump" or "this is incremental" reaction within a day or two. This one didn't, and I don't think that's because Mythos is uniquely confusing. I think the community's reaction function has broken down because we no longer share a mental model for what capability means. A security researcher evaluating trust and verification sees one thing. A frontend developer watching agents one-shot an app sees another. A pricing-conscious skeptic notices that Mythos clocks in at roughly 5x Opus and 15x Sonnet per request and asks where the ROI actually lands. All four can be right at the same time. We are all looking at this elephant from our own perspective and boy does the head look nothing like the tail.
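The pricing skeptic's question invites a back-of-the-envelope check. A hedged sketch with entirely hypothetical numbers (the $0.10 baseline request cost and $120/hour rate are my assumptions, not published pricing):

```python
# How many minutes of engineer time must a request save to pay for itself?
# All figures below are illustrative placeholders, not real prices.

def breakeven_minutes_saved(cost_per_request, hourly_rate):
    """Minutes of saved engineer time at which one request breaks even."""
    return 60 * cost_per_request / hourly_rate

# A $0.10 baseline request vs. a 15x Mythos-class request, at $120/hour:
base = breakeven_minutes_saved(0.10, 120)    # 0.05 minutes, i.e. 3 seconds
mythos = breakeven_minutes_saved(1.50, 120)  # 0.75 minutes, i.e. 45 seconds
```

Even at 15x, the break-even bar is low per request; the real ROI question is how many requests an always-on agent burns per unit of landed main-branch change, which loops back to CircleCI's throughput numbers.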

The one angle nobody is covering hard enough is the uneven distribution of these new capabilities. Caleb Writes Code flagged it: Mythos access is tiered and effectively privatized to certain companies at launch. If the gap between the best available model and the one the regular Joe can afford widens sharply, the bimodal productivity distribution from the first theme gets worse. The 5% club just got a new reason to pull ahead. That's a more interesting story than "the new model is either the end of software or nothing at all". I'm rooting for open models right now. Gemma 4 et al. are my ray of hope in this dystopia.

Worth reading:


Garg's Reduce Friction series names the patterns of agent-friendly codebases

Rahul Garg dropped four structured pattern articles on Fowler's bliki in a single week, and the significance lies less in any individual piece than in the fact that they were published as a series. Context Anchoring, Knowledge Priming, Feedback Flywheel, and Encoding Team Standards are names for things a lot of us were doing ad hoc six months ago. Naming them matters. Once a team can say "we have a context anchoring problem" or "our feedback flywheel is broken," the problem becomes discussable and fixable. This is the same progression we saw with refactoring patterns in the late nineties: first you flail around, then the flailing becomes routine, then someone names the moves, and finally the practice becomes teachable and widespread.

Henrique Bastos's harness engineering write-up on LinkedIn gives the discipline its best metaphor so far. A harness is what channels a horse's strength so the force serves a purpose, repeatedly; you don't create the horse's strength. That single sentence reframes the whole conversation away from "which model is best" and toward "what does your harness do with the model you have". OpenAI's take on harness engineering with Codex arrived the same week from the other side of the fence, and the patterns converge because the underlying problem is the same: models are getting better faster than the systems around them can keep up.
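Stripped to its control flow, a harness is just a propose-validate-feedback loop with a budget. A toy sketch, where `model` and `check` are stand-ins I've invented for whatever components your setup actually uses:

```python
# Toy harness loop: channel model output through checks until it passes
# or the turn budget runs out. All names here are hypothetical stand-ins.

def harness(model, check, prompt, max_turns=5):
    """Propose -> validate -> feed failures back. Returns (code, turns_used)."""
    feedback = ""
    for turn in range(1, max_turns + 1):
        code = model(prompt, feedback)
        ok, feedback = check(code)
        if ok:
            return code, turn
    return None, max_turns  # budget exhausted; escalate to a human

# Stub components to show the control flow:
def flaky_model(prompt, feedback):
    return "fixed" if feedback else "buggy"

def simple_check(code):
    return (code == "fixed", "" if code == "fixed" else "test failed: expected fixed")

result, turns = harness(flaky_model, simple_check, "do the thing")  # ("fixed", 2)
```

Everything interesting about harness engineering lives inside `check`: the deterministic feedback systems from the craft theme are exactly what makes this loop converge instead of thrash.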

Worth reading:


Quick Hits


Curated from articles, podcasts, and videos across my feed. Week of April 4-11, 2026.