Agentic Engineering Weekly for April 11-18, 2026

Generation never was the bottleneck. This week the industry finally started noticing, and we're landing on the uncomfortable truth: attention might be all transformer architectures need, but our own attention does not scale. Opus 4.7 shipped with a 35% token tax (but gets more done with fewer tokens), Uber's CTO already blew his whole 2026 AI budget while still hiring engineers, and inverting your testing pyramid "because AI" is a short-term hack that will shoot you in the foot sooner rather than later.


My top 4 picks this week

I couldn't make up my mind as there was so much good stuff this week. You get an extra pick for free!


Last week's video


Generation is solved, attention is the bottleneck

AI can produce a research brief in 45 seconds, a design doc in two minutes, a full test suite in ten. None of that matters if you never read any of it. Generation was never the bottleneck; attention is, and attention does not scale. JA Westenberg's "delete your AI notes" video landed hard because it names a pattern most of us experience daily: folders of AI-generated summaries we will never revisit, feature branches of slop code that never get run, let alone reviewed, all treated like inventory that will someday pay off. Inventory is waste. The lean people figured this out decades ago; our industry is rediscovering it with an AI sprinkle on top.

I find the Japanese word for this worth borrowing: tsundoku, the habit of acquiring books and letting them pile up unread. Our vaults are becoming tsundoku for AI artifacts, and the Zeigarnik effect makes every unprocessed note a small tax on our remaining - and very limited - attention. The right move is just-in-time. Don't generate a summary until you're about to read it. Don't save the summary unless you've processed it. The value of a note is not in its existence, it's in the synthesis you performed to produce and absorb it. Everything else is a second inbox pretending to be a second brain. If you allow AI into your Obsidian vault, give it its own sandbox, delete ruthlessly, and be very critical about what you let into your own, highly curated second brain.
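"Delete ruthlessly" is easy to automate once the sandbox exists. A minimal sketch of a sweep script, assuming a dedicated sandbox folder inside the vault (the folder name and the 14-day cutoff are my own illustrative choices, not a recommendation from the video):

```python
from pathlib import Path
import time

def sweep(sandbox: Path, max_age_days: int = 14, dry_run: bool = True) -> list[Path]:
    """Return (and, when dry_run is False, delete) AI-generated notes in the
    sandbox that nobody has touched in max_age_days. If a note sat unread
    that long, it was inventory, not knowledge."""
    cutoff = time.time() - max_age_days * 86400
    stale = [p for p in sandbox.rglob("*.md") if p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in stale:
            p.unlink()
    return stale
```

Run it with `dry_run=True` first to see what would go, then let it actually delete; anything worth keeping should have been promoted out of the sandbox into the curated vault long before the cutoff.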

The deeper point is that this is not an AI problem, it's a productivity physics problem that AI has finally made impossible to ignore. Data is cheap, information is less cheap, knowledge is expensive, and judgment is fundamentally human. Any workflow that pretends cheap data automatically becomes knowledge will collapse under the weight of its own output. The fix isn't a better AI stack. The fix is a capture habit accompanied by ruthless downstream discipline about what you read, what you act on, and especially what you delete.

Worth reading:


Opus 4.7 and the 35% tokenizer tax

Anthropic shipped Opus 4.7 this week with a new tokenizer that makes the same prompt cost 30-35% more tokens. The headline numbers don't mention it. Theo called it "kind of a disaster" from a pure cost perspective. Better Stack took the opposite framing: genuinely better model, but you're paying for it whether you wanted to or not. Both are right, which is the point.

This matters because it lands in a market that was already cost-sensitive. The Pragmatic Engineer's 2026 AI survey, published the same week, found mounting AI costs and engineers hitting usage limits as the top two concerns from working developers. A 35% tokenizer shift is not a footnote in that context, it's a cost structure change. The $200 Claude Code subscription and the enterprise seats both reprice overnight. Teams that were on the edge of "we can justify this" are now having the same conversation again with worse numbers.
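The repricing is pure arithmetic: at a fixed per-token price, 35% more tokens for the same work means 35% more spend. A throwaway sketch with made-up numbers (the $15/MTok price and the 2M-token daily volume are illustrative assumptions, not Anthropic's actual pricing):

```python
def monthly_cost(tokens_per_day: int, usd_per_million_tokens: float,
                 workdays: int = 22) -> float:
    """Rough monthly API spend for a fixed daily token volume."""
    return round(tokens_per_day * workdays * usd_per_million_tokens / 1_000_000, 2)

# Same workload before and after a tokenizer that emits 35% more tokens.
baseline = monthly_cost(2_000_000, 15.0)                 # hypothetical numbers
retokenized = monthly_cost(int(2_000_000 * 1.35), 15.0)  # same work, +35% tokens
```

Whatever numbers you plug in, the ratio stays 1.35, which is why teams that were marginal at the old cost structure are renegotiating at the new one.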

The practitioner response was refreshingly fast. Caveman prompts (strip filler, hedging, pleasantries), token-killer wrappers like rtk for build and test output, diff-based linting, conscious context management. My own benchmark on a standard backend task landed at 38% fewer tokens using Caveman plus rtk versus vanilla Sonnet 4.6, and that was before the tokenizer change made the discipline even more valuable. What looked last month like clever optimization is becoming table-stakes hygiene. If you are not actively managing your context and output size, Opus 4.7 just made your existing workflow more expensive without even asking nicely first.
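The token-killer idea is simple enough to sketch. Assuming the goal is just "feed the agent failures, not noise" (this is my own toy filter, not how rtk actually works; its heuristics are more involved):

```python
import re

# Lines worth an agent's attention in build/test output, case-insensitive.
RELEVANT = re.compile(r"fail|error|assert|traceback|pass", re.I)

def compress_test_output(raw: str, max_lines: int = 40) -> str:
    """Strip a noisy test run down to failure-relevant lines before it
    enters the agent's context, capping the total at max_lines."""
    lines = [line for line in raw.splitlines() if RELEVANT.search(line)]
    if len(lines) > max_lines:
        lines = lines[:max_lines] + [f"... {len(lines) - max_lines} more lines elided"]
    return "\n".join(lines)
```

Piping a 2,000-line CI log through something like this before it reaches the model is the kind of boring plumbing that the new tokenizer just made worth real money.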

Worth reading:


The unused middle: AI adoption is narrower than the timeline suggests

Steve Yegge dropped the line of the week in a single tweet: Google engineering's AI adoption footprint looks roughly like John Deere's. Most engineers are not using this stuff seriously. Read that again. One of the most publicly AI-forward companies on earth has an internal adoption pattern closer to a tractor manufacturer than to the practitioner X discourse. If that is true at Google, the baseline elsewhere is almost certainly lower. The loud 5% is not the median, it's the outlier.

The Uber data point reinforces the shape. Praveen Neppalli Naga, Uber's CTO, maxed his entire 2026 Claude Code budget months into the year. 11% of Uber's live code changes are now written by AI agents. And Uber is still hiring engineers at pace. Pick your framing. The adoption maximalists said AI would collapse team sizes; here's a company adopting hard and hiring hard. Another illustration of Jevons paradox. The adoption skeptics said AI was theater; here's an 11% footprint in production code. Both narratives break against the same data, and what you get when they break is reality: adoption is real, benefits are real (although a bit over-hyped), organizational demand is real, and the story is in the outliers, who this time are not the usual suspects.

The consequence for anyone planning team composition, hiring, or skill development is that the "bimodal productivity distribution" story from recent quarters has a flip side people aren't naming. If the top 5% are doubling throughput, the other 95% are somewhere between net-neutral and totally foot-gunning themselves. Most engineering organizations are still in the untouched camp. That is a product problem, a training problem, a culture problem, and a leadership problem, and none of those are solved by a better model. They are solved by someone in the room choosing to invest in the boring organizational layer that turns capable tools into adopted practice.

Worth reading:


Habitat engineering: naming the environment the agent lives in

A fresh framework dropped this week that reframes everything I've been writing about over the last year. My critique of last week's "harness engineering" article was that it was missing the time dimension; habitat engineering explicitly calls this out.

Instead of arguing about better agent SDKs or better harnesses, it reframes the whole problem as habitat. The core claim: every AI (and human) coding failure is an environment problem. Your agent is not broken. The codebase has entropy it has to wade through. Your constraints are suggestions rather than automated enforcement. Your context is oral tradition rather than engineered artifact. The agent is doing its best in a hostile environment, and the environment is what you actually own.

The vocabulary is young and deliberately biological. Agents are colleagues, not tools (I personally don't like this take and am starting to call them my "clankers" for this reason alone, AI is a tool with sharp edges and should not be anthropomorphized or treated like a fellow human being). Habitats have entropy, and fighting entropy is a first-class engineering concern. Constraints must bite, meaning they have to be mechanical rather than aspirational, because agents (like juniors) ignore the rules that only live in reviewers' heads. There's a looping habitat layer where the environment gets smarter every session, and a context-engineering layer where you teach agents what your team already knows implicitly. As a pattern language it's denser than most first drafts, and the companion docs site ties it into existing concepts (Dan North's CUPID, literate programming, harness engineering) rather than starting from scratch.
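"Constraints must bite" is the most immediately actionable of these ideas. A minimal sketch of mechanical enforcement, assuming a hexagonal-ish layout where domain code must not import the ORM directly (the `sqlalchemy` rule and the layer names are my own illustrative assumptions, not from the framework):

```python
import pathlib
import re

# A constraint that bites: only the adapters layer may touch the ORM.
# Agents (and juniors) ignore rules that live in reviewers' heads, so
# this one lives in CI instead.
FORBIDDEN = re.compile(r"^\s*(?:import|from)\s+sqlalchemy\b", re.M)

def violations(domain_root: str) -> list[str]:
    """List domain-layer files that import the ORM, sorted for stable output."""
    root = pathlib.Path(domain_root)
    return sorted(str(p) for p in root.rglob("*.py")
                  if FORBIDDEN.search(p.read_text()))
```

Wire it into CI so a non-empty result fails the build. The point is not this particular rule; it's that every rule you actually care about gets the same treatment: mechanical, automated, impossible to politely ignore.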

Whether habitat engineering ends up a sibling vocabulary to harness engineering or a replacement for it is unclear. Single-source frameworks often struggle to cross the vocabulary-adoption threshold, and this one is still entirely one author's Substack plus a companion docs site. But the underlying argument is strong: the part of the system you control is the environment, not the agent, and the environment has entropy, constraints, context, and memory as its core properties. That's a useful set of handles regardless of which vocabulary wins.

Worth reading:

  • The Habitat Hypothesis: The sharpest single-sentence framing of the week, every coding failure is an environment problem (article)
  • Habitat Engineering: The full framework in one page, if you want the vocabulary without the narrative (article)
  • Constraints That Bite: Why automated enforcement beats good intentions, with agents as the forcing function (article)
  • Agents as Colleagues: Orchestration, review, and trust boundaries, with agents framed as juniors with defined scopes (article)

Don't let AI invert your testing pyramid

Thierry De Pauw named something this week that had been hiding in plain sight: both AI coding agents and vibe coders have a natural bias toward the top of the testing pyramid. E2E tests and integration tests have visible structure, obvious scaffolding, and a satisfying "runs the whole thing" feel that plays well with how agents generate code. Unit tests require deeper understanding of the design of the product and the craft of software engineering, which is exactly the place where agents still struggle most. The result is that teams who let agents drive their test strategy end up with slow, brittle, coverage-heavy suites that catch bugs late and block refactoring.

This lands in the middle of a broader AI-TDD conversation that is just catching fire. Owain Lewis's AI-TDD walkthrough this week made the point from the other direction: without a tight unit-level test loop, you have no way to know when the agent broke something, and no fast feedback to steer the next iteration. E2E tests are not enough, because a two-day AI debugging session on a subtle logic flaw (see Hak's recent video) is exactly the failure mode E2E tests cannot surface. The shape of the loop matters as much as the speed.

The practical pattern is older than AI: keep the pyramid pyramid-shaped, and make the unit layer the first line of feedback. I tend to go for higher-level unit tests that heavily rely on hexagonal design instead of the oft-touted microtests hitting only a single method or class, but YMMV. The new wrinkle is that you now need to push back actively against your tooling, because the defaults lean the wrong way. Agents will happily generate you a thick layer of E2E tests. You have to be the one who says no, start with the unit tests, prove the design, then add coverage where it's cheaper to maintain.
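What a "higher-level unit test relying on hexagonal design" looks like in miniature (the fare domain, the port, and the in-memory adapter are all hypothetical examples of mine, chosen only to show the shape):

```python
from dataclasses import dataclass
from typing import Protocol

class FareRepo(Protocol):
    """The port: domain logic depends on this interface, not on a database."""
    def base_fare(self, zone: str) -> float: ...

@dataclass
class InMemoryFareRepo:
    """Test-side adapter; the real one would hit storage."""
    fares: dict[str, float]
    def base_fare(self, zone: str) -> float:
        return self.fares[zone]

def quote(repo: FareRepo, zone: str, distance_km: float) -> float:
    """Real domain logic under test: zone base fare plus a per-km rate."""
    return round(repo.base_fare(zone) + 0.5 * distance_km, 2)

def test_quote_uses_zone_base_fare():
    repo = InMemoryFareRepo({"city": 2.0})
    assert quote(repo, "city", 10) == 7.0
```

The test exercises real behavior across the domain, runs in microseconds, and breaks the moment an agent mangles the logic, which is exactly the fast feedback the E2E-heavy default cannot give you.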

Worth reading:


The harness is no longer a hobby: it's shipping as open source

Mere weeks ago, harness engineering was an early metaphor. This week it's an open-source category with at least four distinct entrants. Archon bills itself explicitly as a harness builder, with a thesis that AI coding should be deterministic and repeatable. Wix published an AI Coding Agent Manifesto, setting out principles for production-grade agentic coding on the grounds that "vibe coding was never going to cut it". And Anthropic's own "Seeing like an agent" post reads as the closest thing yet to a vendor-supplied harness design manual.

What makes this notable is that the vocabulary is still catching up to the craft. People are shipping harness infrastructure faster than anyone is writing it down. The discipline now has a metaphor (Henrique Bastos: a harness channels the horse's strength so the force serves a purpose, repeatedly), concrete building blocks (skills, memory, tools, hooks, deterministic wrappers), and early shared libraries. What it still lacks is an agreed-on pattern language. Fowler's team started one with the Reduce Friction series last week. Boeckeler's 2x2 (feedforward versus feedback, computational versus inferential) is the most complete taxonomy so far. Expect a lot of parallel reinventions before consolidation starts.

The implication: your skills.md files have an expiration date. The harness you wrote three months ago is already partially obsolete because the model changed, the tokenizer changed, or your own understanding of what the model does well shifted. Treat the harness as disposable infrastructure with high replacement velocity, not as a long-lived asset. Today's harness is tomorrow's legacy glue, and that's fine as long as you know which one you're writing.

Worth reading:


Quick Hits

Read more