Agentic Engineering Weekly for June 19–27, 2026

The spotlight slid off the model and onto everything we wrap around it. We stopped prompting and started writing loops, finance finally found where the tokens are going (on PowerPoint, it turns out), and agents are starting to show up in the team chat with their own logins. The engineering that matters now lives in the harness, not the model. How much of what we ship do we still genuinely understand? Let's dive in.

My top 3 picks this week

Verum Factum and the Creator's Confidence: You only truly understand and therefore trust what you build yourself, and that's a big problem when AI does the building (article)
The next paradigm shift (according to Karpathy): the clearest explanation of why Claude Tag is more than a Slack bot, channel context and permissions as the point (video)
What happened after 2,000 people tried to hack my AI assistant: 6,000 attempts, $500 in tokens, and nobody leaked the secret, a rare win for the defense (article)

More on Loops

"I don't prompt Claude anymore," Boris Cherny said in a line that is still being passed around. "I have loops running that prompt Claude and figuring out what to do. My job is to write loops." Work goes into a queue, a cron job picks it up, works on it until completion, and picks up the next item. The thing you author is no longer just the spec.

"Loop" tooling is becoming more widespread. Most harnesses support cron-like scheduling. OpenAI's Codex threads can spawn new threads themselves, allowing for recursive workflows, and Claude Code generates code on the fly that itself spawns dynamic workflows.

Public loop libraries are starting to show up: reusable, named loops you can drop into your own setup. When a pattern goes from "interesting idea" to "here is a library of them" that fast, it has stopped being speculation. If you are still hand-driving a chat window, the move this quarter is to wrap one repetitive task in a loop and watch where it takes you.
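If you haven't built one yet, the queue-plus-cron shape Cherny describes fits in a few lines. Here is a minimal Python sketch under loud assumptions: `Task`, `run_agent_step`, and `drain_queue` are hypothetical names, and `run_agent_step` is a stub standing in for a real model call (it pretends every task finishes on its second step). This is not how Claude Code, Codex, or any particular harness implements scheduling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    max_steps: int = 5  # hypothetical cap so a single item cannot loop forever


def run_agent_step(task: Task, step: int) -> bool:
    """Hypothetical stand-in for one model/tool call on `task`.

    Returns True when the agent reports the task complete. This stub
    pretends every task finishes on its second step (step index 1).
    """
    return step >= 1


def drain_queue(queue: deque) -> list[str]:
    """One scheduled tick: work each item until done, then pick up the next."""
    log = []
    while queue:
        task = queue.popleft()
        for step in range(task.max_steps):
            if run_agent_step(task, step):
                log.append(f"done: {task.description} in {step + 1} steps")
                break
        else:
            # Step cap hit without completion: record it instead of spinning.
            log.append(f"gave up: {task.description}")
    return log


if __name__ == "__main__":
    work = deque([Task("triage new issues"), Task("update the changelog")])
    for line in drain_queue(work):
        print(line)
```

A cron entry (or your harness's own scheduler) would call `drain_queue` on each tick. The `max_steps` cap is the part worth copying: without it, "works on it until completion" can quietly mean forever.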

Worth reading:

The Coming Loop: Armin Ronacher on why even loop skeptics end up building them, anchored by the Boris Cherny quote (article)
What an AI Agent Actually Is: Model, Tools, Loop, Harness: the cleanest current definition of an agent, and why its short shelf life is the exciting part (article)
The Clockmaker's Apprentice: Russ Miles' take on loop engineering (article)

Lots of tokens are being spent on tasks an LLM should not be doing

This week we learned that at Accenture, the big consulting firm, a major source of AI token "chewing" turns out to be office workers converting PDFs into presentation slides. Sit with that. The cost of frontier intelligence, spent at scale, on generating slide decks.

Cheap tokens spent on work that should not exist are still waste. The lever to pull isn't just the price per token; it is which kinds of work you point AI at. Next time you want a summary of a large document, just ask your AI to write a darn PDF-to-Markdown Python script already.

Sean Goedecke makes the case that serving inference is obviously profitable, and that the "it must all be subsidized" story does not survive contact with the actual unit economics.

Inference is a real business, and most organizations are spending on it with no idea whether the output is worth the cost. Stop metering tokens; start measuring ROI on token spend. The absurd edge case of the week, two competing review agents locked in a $41K disagreement loop over a single dependency, is funny right up until it is your API key burning those tokens.
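A hard spending cap won't tell you whether the tokens were worth it, but it is the cheapest guard against a runaway agent-vs-agent loop. A minimal sketch, assuming a flat per-token price and two hypothetical reviewers that never agree (the worst case); `TokenBudget`, `BudgetExceeded`, and `review_until_agreement` are illustrative names, not any vendor's billing API.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the hard limit."""


@dataclass
class TokenBudget:
    limit_usd: float
    price_per_million_tokens: float  # assumed flat rate, illustration only
    spent_usd: float = 0.0

    def charge(self, tokens: int) -> None:
        """Record spend for `tokens`, refusing any charge that would exceed the limit."""
        cost = tokens / 1_000_000 * self.price_per_million_tokens
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"next call would bring spend to ${self.spent_usd + cost:.2f}, "
                f"limit is ${self.limit_usd:.2f}"
            )
        self.spent_usd += cost


def review_until_agreement(
    budget: TokenBudget, max_rounds: int = 50, tokens_per_round: int = 200_000
) -> tuple[int, str]:
    """Run review rounds until the budget or the round cap stops them.

    Returns (completed_rounds, reason). The sketch assumes the two
    hypothetical reviewers never converge, which is the runaway case.
    """
    completed = 0
    while completed < max_rounds:
        try:
            budget.charge(tokens_per_round)
        except BudgetExceeded:
            return completed, "budget"
        # A real harness would call both reviewers here and return
        # early once they agree.
        completed += 1
    return completed, "round cap"
```

With a $5 limit at an assumed $10 per million tokens and 200K tokens per round, the loop stops after two rounds instead of running into five figures. Measuring ROI still means tying the spend to an outcome someone values; the cap only bounds the damage while you work that out.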

Worth reading:

The Tokenpocalypse Is Here: the PDF-to-slides detail alone is worth the read, a perfect example of spend with no value attached (article)
Put Tokens in the P&L, Not in a Developer Expense Report: the governance reframe that actually changes behavior, not another dashboard (article)
AI inference is obviously profitable: argued from unit economics rather than vibes (article)

Verum factum: you only really know what you made

An old idea I've been going on and on about resurfaced this week. Verum factum, the maxim that we only really know what we made, is the philosophical twin of a feeling every engineer using AI has had and few have named. You shipped the code, but you understand it less than anything you ever built by hand. Jessitron writes about missing that gut-level certainty about her own software now that the model does the coding, and then the strange moments when she still gets it back. The problem, defined: the loss of the confidence that comes from having built the thing yourself. It's not that the generated code is bad. You're no longer holding the theory of why it works, and the tests only catch what you already knew to specify upfront. You cannot unit test for taste and understanding.

So the goal is not to type more lines out of pride, and it is not about being pro- or anti-AI. It is to keep the creator's confidence even when you did not write the code, which means deliberately rebuilding the theory that generation skipped: reading the diff the way you would review a colleague's, reconstructing the why, and keeping a mental model that survives the next change.

Worth reading:

Verum Factum and the Creator's Confidence: names a feeling most engineers are carrying quietly, and does it without doom (article)
Relocating Rigor: why the new freedom is not free (article)
I Used AI. It Worked. I Hated It.: worth it for the discomfort it articulates (article)
You can't unit test for taste: a crisp argument for why judgement stays human even as output gets automated (article)

Memory was step one, dreaming is the next paradigm

Memory was one of the biggest unlocks of the last year: it gave agents context that grows over time. This week the frontier conversation moved past it. The proposed next step has a deliberately evocative name, dreaming: a second-derivative process that periodically prunes and curates what an agent has learned, the way sleep consolidates a day rather than just recording it. If you've ever messed around with the Hermes agent, this might all sound familiar; that harness has been generating its own custom skills and memory banks for a good while now.
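To make "dreaming" concrete, here is a toy consolidation pass in Python. It is a heuristic stand-in, not the method from the talk or from Hermes: in the proposals above a model does the curating, whereas this sketch uses fixed rules (merge duplicates after normalizing case and whitespace, drop memories unused for 30 days, keep the most-used few). `Memory` and `dream` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    uses: int            # how often the agent has drawn on this memory
    last_used_day: int   # day index of the most recent use


def dream(
    memories: list[Memory], today: int, max_age_days: int = 30, keep: int = 3
) -> list[Memory]:
    """Offline consolidation pass (heuristic stand-in for model-driven curation).

    1. Merge duplicates: same text after lowercasing and collapsing whitespace.
    2. Prune anything not used within `max_age_days`.
    3. Keep only the `keep` most-used survivors.
    """
    merged: dict[str, Memory] = {}
    for m in memories:
        key = " ".join(m.text.lower().split())
        if key in merged:
            prev = merged[key]
            merged[key] = Memory(
                prev.text, prev.uses + m.uses, max(prev.last_used_day, m.last_used_day)
            )
        else:
            merged[key] = m
    fresh = [m for m in merged.values() if today - m.last_used_day <= max_age_days]
    fresh.sort(key=lambda m: m.uses, reverse=True)
    return fresh[:keep]
```

The shape is the point: recording happens during the day, and curation happens in a separate periodic pass that is allowed to forget.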

It connects to a bet several people are now making out loud. If pretraining is bumping into a data wall, the next leap is not a bigger run but models that keep learning after deployment, closing the gap between a frozen checkpoint and an employee who actually improves on the job. The memory layer is one way to simulate that. That is a different shape of system, and it changes the questions that matter: not "how big is the context window" but "what context does the model carry forward, and who decides."

Worth reading:

Learning while you sleep: Beyond memory to dreaming: introduces the dreaming concept concretely, the clearest articulation of what comes after memory (video)
The next big breakthrough will be AIs learning on the job: Dwarkesh's case for on-the-job learning as the next leap (podcast)
What does the next training paradigm look like?: the data-wall framing that makes continual learning the obvious next bet (video)

Single-player to multiplayer: agents are getting their own identity on the team

Anthropic shipped Claude Tag this week and at first glance it looks like a Slack bot. Karpathy disagreed, calling it "a new paradigm for interacting with Claude," and the reason is worth understanding even if you never touch the product. The novelty is the agent joining your team's collaboration surface: picking up a channel's existing context, working alongside people for days, and acting under its own scoped, auditable identity instead of borrowing your credentials. Agentic work, so far a solo activity, is going multiplayer.

The moment an agent operates autonomously across shared spaces, "it runs with your credentials" stops being acceptable. You need it scoped to each channel, governed by admins, and fully auditable, because otherwise every action it takes is laundered through a human's credentials with no way to tell who actually did what. This is the same boundary problem that has dogged service accounts forever, surfacing again now that the service account can reason and act on its own.
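The gap between "runs with your credentials" and a scoped agent identity is easy to show in code. A minimal sketch, assuming a hypothetical in-memory model where an admin grants per-channel actions and every attempt is logged under the agent's own name alongside the human who asked. This is not Claude Tag's actual access model; `AgentIdentity`, `AuditEntry`, and the channel and action names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    actor: str          # the agent's own identity, never the human's
    on_behalf_of: str   # the human who asked, recorded separately
    channel: str
    action: str
    allowed: bool


@dataclass
class AgentIdentity:
    name: str
    # channel -> actions an admin has granted in that channel
    scopes: dict[str, set[str]] = field(default_factory=dict)
    audit_log: list[AuditEntry] = field(default_factory=list)

    def grant(self, channel: str, action: str) -> None:
        """Admin-side: allow one action in one channel."""
        self.scopes.setdefault(channel, set()).add(action)

    def act(self, channel: str, action: str, requested_by: str) -> bool:
        """Check the scope for this channel and log the attempt either way."""
        allowed = action in self.scopes.get(channel, set())
        self.audit_log.append(
            AuditEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                actor=self.name,
                on_behalf_of=requested_by,
                channel=channel,
                action=action,
                allowed=allowed,
            )
        )
        return allowed
```

Denied attempts are logged too. An audit trail that only records successes cannot answer the question that matters after an incident: what did the agent try to do, and on whose behalf?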

Maggie Appleton points out that agentic engineering has been a single-player story: one developer, a dozen agents, moving fast in a personal CLI with no shared context. Scale that across a team and you get duplication, drift, and a lot of wasted tokens, because speed without alignment is just expensive divergence. The moat is made of people, and of the habitat they build with (and for) agents.

Worth reading:

The next paradigm shift (according to Karpathy): the clearest explanation of why Claude Tag is more than a Slack bot, channel context and permissions as the point (video)
Agent identity: A new access model for Claude Tag: the access-model details that matter even if you never use the product (video)
Collaborative AI Engineering: One Dev, Two Dozen Agents, Zero Alignment: Maggie Appleton on why solo agent speed turns into team-level chaos (video)
The moat is made of people: the argument that shared human context, not the stack, is what differentiates (article)

Senior engineers are publishing their harnesses like dotfiles

More principal and staff engineers are showing their complete agentic setups, end to end. One L8 principal walks through his whole stack. The value is not any single tool but seeing how a senior engineer wires them into one coherent system.

Google released its own playbook this week: vibe coding does not scale, agentic engineering does. The model accounts for 10%. The other 90% is the harness: the context, tooling, and feedback loops you build around the model. If that ratio is even roughly right, then the setup itself becomes the valuable asset, the copyable artifact, which is exactly why these workflow showcases are suddenly everywhere. The dotfiles repo of the AI age has arrived.

The thing to take from this is not any individual's vibecoded tool list, which will be stale in a quarter. It is the move itself. Treat your harness as a first-class artifact worth versioning, documenting, and sharing, the same way you would your editor config. Watching how a strong engineer composes their loop teaches more than any framework tutorial, because the composition, the judgement about what to automate and what to keep in your hands, is the actual skill.

Worth reading:

Google Just Dropped a Masterclass on Agentic Engineering: the source of the model-is-10%, harness-is-90% framing, with the 50-page playbook behind it (video)
L8 Principal's Agentic Engineering Workflow: a full senior setup shown end to end, useful for how the pieces fit, not the tool list (video)
An Ex-Meta L8's Agentic Engineering Setup: the written version, a head start that shortcuts your own exploration (article)

Quick Hits

U.S. government will decide who gets to use latest upgrade to ChatGPT: GPT-5.6's release is being gated behind government vetting too; the same export-control playbook that pulled Anthropic's Fable now applies to OpenAI releases (article)
What happened after 2,000 people tried to hack my AI assistant: 6,000 attempts, $500 in tokens, and nobody leaked the secret, a rare win for the defense (article)
An opinionated (and mainly correct) guide to naming: naming as reconstruction-cost minimization, and it matters more now that agents read the names too (article)
The Fixed Point Theorem - Numberphile: stir your coffee and one particle of the bean juice ends up exactly where it started, the non-tech wonder for the week (video)

Curated from 30+ sources across articles, podcasts, and videos. Week of June 19–27, 2026.