Agentic Engineering Weekly for April 4-11, 2026
Two independent datasets landed this week that make it hard to keep presenting AI productivity as a global rising tide. In the same seven days, DHH, Charity Majors, Fowler, and Beck all arrived at the same uncomfortable conclusion: the window for rational skepticism has closed, and it's time for every engineer to get their hands dirty. And finally: Mythos or Meh-thos?
My top 3 picks this week
- Run Out to Meet It: Charity Majors and Kent Beck on what changed and who is adapting (podcast)
- The 2026 State of Software Delivery - CircleCI: The 5%/50%/95% split is the most-cited chart in my writing this week for good reason (article)
- Gemma 4: Byte for byte, the most capable open models: While Anthropic's Mythos publicity stunt is gobbling up all attention, open models are not giving up the fight. And this model series is worth taking a closer look at (article)
Last week's video
The 5% club: the story of AI productivity lies in the outliers
CircleCI's 2026 State of Software Delivery report and METR's updated productivity experiment dropped within days of each other, and they tell the same story from opposite directions. CircleCI's 28-million-workflow dataset shows some striking numbers: lots of code gets generated, but much of it dies on feature branches. That seems reasonable; prototyping is virtually free today. The most interesting signal for me is throughput on the main branch, the changes that actually land in the final product. There the story sharpens: median teams are impacted negatively (-6.8% on main-branch throughput), teams at the 90th percentile are flat-lining with no measurable impact, and the only winners are the top 5% of teams, which rack up a "modest" 30% gain. Would you count your team in that top 5%? I suspect that, much as most drivers rate themselves above average, most of us are overconfident about where we land in this distribution, and our "AI augmented" benefits fall into the much less impressive buckets.
METR's angle is also interesting: their randomized trial from 2025 famously found experienced open-source developers were 20% slower with AI. The recent replication swung that result significantly, to roughly an 18% speed-up for returning participants and no measurable effect for new ones. But the real story now is selection bias: developers are refusing to participate without AI, and 30-50% skip tasks they feel require it. Measuring individual productivity has become harder than anyone expected.
Reading these two sources together, the conclusion is unambiguous: AI doesn't have a generation problem, it has a validation problem, and the winning teams are the ones who invested in continuous integration, fast feedback loops, and specification hygiene before they ever installed an agent. Time to start catching up, folks. I for one am not waiting on Mythos-class models to clean up today's generated slop.
Worth reading:
- The 2026 State of Software Delivery - CircleCI: The 5%/50%/95% split is the most-cited chart in my feed this week for good reason (article)
- We are Changing our Developer Productivity Experiment Design - METR: Rare intellectual honesty on why individual productivity measurement is now structurally broken (article)
- What 28 million workflows reveal about AI coding's biggest risk: The main-branch-declining detail that most takes miss (article)
- Addy Osmani on Why 2026 Seniors are just highly-paid Code Editors: The senior-engineer-as-orchestrator reframing grounded in real workflow (video)
- How METR measures Long Tasks and Experienced Open Source Dev Productivity: Joel Becker on why the counterfactual keeps moving (video)
Run toward the waves: the senior-amplification narrative hardens
Charity Majors spent most of 2025 telling engineers they needed to learn to code to stay relevant. In about three months, that advice became obsolete. Her conversation with Kent Beck on the Still Burning podcast this week is the clearest articulation I've heard of why skepticism stopped being the rational default. Her core claim: sometime in late 2025, code generation crossed a practical threshold - the so-called "Opus moment" - and the engineers doing best right now are outcome-oriented. The engineers struggling most are the ones whose identity is tied to beautiful hand-written code as the main proxy for quality. That proxy mattered when humans had to read everything. It matters less when the real job is understanding behavior, validating outcomes, and keeping situational awareness.
DHH's appearance on The Pragmatic Engineer lands on exactly the same point. His position change on AI tools was also empirical: the early autocomplete UX was intrusive and low-signal, but agent harnesses plus frontier models crossed the threshold from "annoying assistant" to "useful collaborator". His strongest practical claim is worth sitting with: AI expands ambition more than it reduces effort. Yegge and Kim's FAAFO framing pops up again: the biggest impact isn't just speeding up planned work; it's making previously unjustifiable work suddenly viable. Teams can now chase performance edge cases, tooling glue, and speculative improvements that would never have cleared the cost threshold. Peak programmer may have passed, he argues, but in quantity, not in quality. Product-minded, outcome-driven engineers will flourish in this post-Opus world.
Fowler and Beck delivered the historical version at the Pragmatic Summit: every generation has cycles of disruption, the specific discomfort seniors feel right now has happened before, and the answer has always been active experimentation rather than waiting for clarity.
Three conversations, three wildly different sets of personalities, one message: stop spectating. The uncomfortable corollary is the squeezed middle. Juniors get the onboarding acceleration, seniors get the leverage from accumulated taste and judgment, and mid-career engineers get neither the maximum learning benefit nor the maximum leverage. If you're in that middle tier, the worst response is to wait and see. The second-worst is to double down on craft signals that used to differentiate you. The best is to get your hands dirty this week.
Worth reading:
- Run Out to Meet It: Charity Majors and Kent Beck on what changed and who is adapting (podcast)
- DHH's new way of writing code: The agent-first workflow from someone who was publicly skeptical six months ago (podcast)
- Martin Fowler & Kent Beck: Frameworks for reinventing software, again and again: Two pioneers with enough pattern-matching to put the current moment in historical context (video)
- Cycles of disruption in the tech industry: with Kent Beck & Martin Fowler: Companion write-up with TDD-is-back-in-style as a side thread (article)
Clean code strikes back: craft is now an agent prerequisite
Five separate posts this week converge on a counterintuitive claim: the rise of coding agents makes XP and software craft more valuable, not less. The argument is mechanical rather than sentimental. Agents are limited by context, and poorly organized code means they consume more tokens and perform worse. Every cost you pushed onto future maintainers because "the compiler doesn't care" is now a cost you push onto every agent session you run. Yanist's piece on clean code in the age of coding agents makes the token-economics version of the argument. Jessitron's post on correctness conditions makes the distributed-consistency version: a pull request that adds a script without updating the README is incomplete because the docs are out of sync, and agents have no intuition for that kind of implicit contract unless you scaffold it for them.
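Jessitron's implicit-contract point can be made concrete by turning the correctness condition into a check that runs in CI. A minimal sketch, assuming a repo layout with a scripts/ directory and a README.md (the layout and names are my own illustration, not from her post): every file under scripts/ must at least be mentioned in the README, or the change is incomplete.

```python
from pathlib import Path

def undocumented_scripts(repo_root: str = ".") -> list[str]:
    """Return names of files under scripts/ that README.md never mentions.

    A hypothetical correctness condition made executable: "a change that
    adds a script must also touch the docs." Agents can't infer this
    implicit contract, but they can be failed by it deterministically.
    """
    root = Path(repo_root)
    readme = (root / "README.md").read_text(encoding="utf-8")
    return [
        path.name
        for path in sorted((root / "scripts").iterdir())
        if path.is_file() and path.name not in readme
    ]
```

Wired into CI as a failing check, this turns "the docs are out of sync" from a reviewer's intuition into a signal an agent session hits on every run.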
Bram Cohen's "The Cult of Vibe Coding Is Insane" goes directly for the jugular of the cultural problem. Bad software is a choice you make, and with generation cheap, it's now a choice you make at massive volume. HollandTech's "Claude Is Not Your Architect" makes the complementary argument about decision-making: AI agents are brilliant implementers and confidently wrong about every decision that matters. Somewhere between "ask Claude for a quick opinion" and "Claude is writing our Jira tickets," we lost the plot. Zernie frames the same ground as an engineering problem. Modern AI coding is limited by feedback latency, not model intelligence. When agents run 24/7, your job shifts from writing code to building tight, deterministic feedback systems.
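Zernie's "build tight, deterministic feedback systems" advice is easier to act on with a shape in mind. A hedged sketch of one such gate (the check names and commands are placeholders, not anyone's published harness): run every check, emit a single unambiguous pass/fail, and report the latency number you're trying to drive down to seconds.

```python
import subprocess
import sys
import time

def run_gate(checks) -> tuple[bool, float]:
    """Run each (name, argv) check; return (all_passed, elapsed_seconds).

    One deterministic verdict an agent can branch on, plus the
    end-to-end feedback latency you are trying to minimize.
    """
    start = time.monotonic()
    all_passed = True
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        passed = result.returncode == 0
        print(f"{name}: {'pass' if passed else 'FAIL'}")
        all_passed = all_passed and passed
    return all_passed, time.monotonic() - start

# Placeholder checks; a real harness would invoke your formatter,
# type checker, and test runner here.
CHECKS = [
    ("unit-tests", [sys.executable, "-c", "print('tests ok')"]),
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
]
```

The point is less the code than the contract: if the gate is deterministic and fast, an agent running 24/7 gets the same signal a human pairing partner would, without waiting on a human.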
The through-line connects back to the validation problem from the first theme. If you're in the top 5% of teams, it's because you were already investing in the craft disciplines that agents amplify: clean modules, fast tests, correctness conditions explicit enough to catch silent drift, and a feedback loop measured in seconds rather than days. The craft-versus-AI dichotomy was always a false one. The real split is between teams who built the scaffolding first and teams who are now watching their main-branch success rate collapse in public. There aren't many nines left in GitHub's availability metrics these days. That may be down to the massive increase in generated code, or a sign they could use some TLC in the XP department.
Worth reading:
- Clean code in the age of coding agents: The token-economics argument for craft that your CFO will actually understand (article)
- The Cult Of Vibe Coding Is Insane: Bram Cohen on why "bad software is a choice you make" is the whole argument (article)
- Adding Correctness Conditions to Code Changes: Concrete small example of the invisible contracts agents can't infer (article)
- The Feedback Loop Is All You Need: Reframes "AI productivity" as a feedback-latency problem, which it mostly is (article)
- Claude Is Not Your Architect. Stop Letting It Pretend.: The cleanest statement of what to delegate and what to keep (article)
Claude Mythos divides the room
Anthropic dropped a 240-page marketing campaign for their new Claude Mythos tier model this week and something unusual happened: within 48 hours, credible voices in the practitioner community landed on completely opposite conclusions about what they were looking at. Bruce Schneier, on The Tech Report, called it mostly marketing hype: "They're convincing a lot of people that Mythos is this amazing step change in capability when the evidence right now is that it might not be." Theo's video is titled "Claude Mythos and the end of software" and opens with "I....yeah. It's all over now." Matthew Berman called it terrifying. Mo Bitar called it delusional. Same model, same week, same community, four genuinely incompatible reactions from people who usually track together.
The polarization itself is the most interesting signal. Every previous Anthropic model launch produced a roughly consensual "this is a noticeable jump" or "this is incremental" reaction within a day or two. This one didn't, and I don't think that's because Mythos is uniquely confusing. I think the community's reaction function has broken down because we no longer share a mental model for what capability means. A security researcher evaluating trust and verification sees one thing. A frontend developer watching agents one-shot an app sees another. A pricing-conscious skeptic notices that Mythos clocks in at roughly 5x Opus and 15x Sonnet per request and asks where the ROI actually lands. All four can be right at the same time. We are all looking at this elephant from our own perspective and boy does the head look nothing like the tail.
The one angle nobody is covering hard enough is the uneven distribution of these new capabilities. Caleb Writes Code flagged it: Mythos access is tiered and effectively privatized to certain companies at launch. If the gap between the best available model and the one a regular Joe can afford widens sharply, the bimodal productivity distribution from the first theme gets worse. The 5% club just got a new reason to pull ahead. That's a more interesting story than "the new model is either the end of software or nothing at all". I'm rooting for open models right now. Gemma 4 et al. are my ray of hope in this dystopia.
Worth reading:
- Claude Mythos Preview: Anthropic's official preview, if you want to start from source (article)
- Claude Mythos is mostly 'marketing hype' | Bruce Schneier: The skeptical take from someone with zero incentive to soften it (video)
- Claude Mythos and the end of software: The maximalist framing, worth watching alongside the skeptic for contrast (video)
- Claude Mythos explained: The only reaction this week that takes the token-disparity angle seriously (video)
- Is Mythos too Dangerous?: Prime's stream with the leaked benchmarks and community reactions collated (video)
- Gemma 4: Byte for byte, the most capable open models: Open models are not giving up the fight, and this one is worth taking a look at (article)
Garg's Reduce Friction series names the patterns of agent-friendly codebases
Rahul Garg dropped four structured pattern articles on Fowler's bliki in a single week, and the significance lies less in any individual piece than in the fact that they were published as a series. Context Anchoring, Knowledge Priming, Feedback Flywheel, and Encoding Team Standards are names for things a lot of us were doing ad hoc six months ago. Naming them matters. Once a team can say "we have a context anchoring problem" or "our feedback flywheel is broken," the problem becomes discussable and fixable. This is the same progression we saw with refactoring patterns in the late nineties: first you flail around, then the flailing settles into private routine, then someone names the moves, and finally the practice becomes teachable and widespread.
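To make one of the four patterns tangible: Encoding Team Standards is about turning tribal knowledge into something an agent harness can run and fail on. This is my own illustrative sketch, not an example from Garg's article; it encodes a hypothetical team rule ("public functions carry docstrings") as a deterministic check using Python's ast module.

```python
import ast

def missing_docstrings(source: str) -> list[str]:
    """Names of public functions in `source` that lack a docstring.

    An executable stand-in for a tribal rule: instead of a reviewer
    remembering to ask for docs, the harness checks every generation.
    """
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")  # underscore-prefixed = private
        and ast.get_docstring(node) is None
    ]
```

Once a standard is encoded this way, "our agent keeps skipping docs" stops being a complaint and becomes a failing check the agent has to satisfy before the loop moves on.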
Henrique Bastos's harness engineering write-up on LinkedIn gives the discipline its best metaphor so far. A harness is what channels a horse's strength so the force serves a purpose, repeatedly. You don't create the horse's strength. That single sentence reframes the whole conversation away from "which model is best" and toward "what does your harness do with the model you have". OpenAI's take on harness engineering with Codex arrived the same week from the other side of the fence, and the patterns converge because the underlying problem is the same: models are improving faster than the systems that harness them.
Worth reading:
- Context Anchoring: The first of four new patterns, and the one most worth stealing immediately (article)
- Encoding Team Standards: Tribal knowledge is the first thing agents break; this is how to externalize it (article)
- Making sense of Harness Engineering: The horse-and-harness metaphor is worth the whole post (article)
Quick Hits
- Your parallel Agent limit: Addy Osmani on why 3-5 is the cognitive ceiling for parallel agents (article)
- AI Developer Maturity Framework: 5 Levels to Orchestrator: Another five-level ladder, worth comparing to Shapiro's and Yegge's (article)
- I Still Prefer MCP Over Skills: A careful architectural take on MCP versus the Skills pattern (article)
- Test-Driven Agentic Behaviours (TDAB): Developing SKILL.md files test-first with Claude Code, a concrete loop for harness engineers (article)
- How it feels to be a software engineer when AI is changing our relationship with code: Thoughtworks on the emotional reality of the shift (video)
- Spec-Driven Development: Agentic Coding at FAANG Scale - Al Harris, Amazon Kiro: How Amazon's Kiro team uses spec-driven development for reproducible agentic delivery (video)
- How to survive and thrive as a dev (team) in the exponential age of AI: Sander Hoogendoorn at Devoxx on team-level adaptation (video)
- Claude Code Leak: What we actually learned: Retrospective on the leaked Claude Code internals and what they reveal about harness design (video)
- "Historically is like six months ago": Jessitron and Avdi on code review in codebases with a lot of history (podcast)
Curated from articles, podcasts, and videos across my feed. Week of April 4-11, 2026.