Agentic Engineering Weekly for June 12–20, 2026


Loops grew up into agents that spawn their own dynamic workflows. Code got cheap enough to throw away by the plateful, which makes the theory in your team's head the part actually worth preserving. Security took its first real structural swings instead of hoping the model behaves if you just prompt harder. Your domain knowledge, not your job title, matters.


My top 3 picks this week


Loop engineering: more than Ralph

We touched on loop engineering last week, but did the concept a disservice by equating it to the Ralph loop: while(!done()){ do cat PROMPT.md | claude }. One agent, one prompt, re-fed until it stumbles its way into done. What's emerging now is categorically different. The agent stops being the worker inside the loop and becomes the orchestrator that writes the loop at runtime, spawning open-ended trees of subagents that branch, monitor each other, and report back. A single while-not-done condition can't express that shape. We're watching agents program their own control flow.
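To make the contrast concrete, here is a minimal Python sketch of the two shapes. Everything in it is hypothetical and for illustration only: `plan` is a canned stand-in for a model deciding at runtime how to split up work, and `ralph_loop` and `orchestrate` are names I made up, not anyone's published API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    goal: str
    subtasks: list["Task"] = field(default_factory=list)


def plan(goal: str) -> list[str]:
    """Hypothetical stand-in for a model call that splits a goal into sub-goals.

    Returns an empty list when the goal can be handled directly.
    """
    canned = {
        "ship feature": ["write code", "write tests"],
        "write tests": ["unit tests", "integration tests"],
    }
    return canned.get(goal, [])


def ralph_loop(step: Callable[[], bool], max_iters: int = 10) -> int:
    """Ralph-style loop: re-run the same step until it reports done.

    Returns the number of iterations used.
    """
    for i in range(1, max_iters + 1):
        if step():
            return i
    raise RuntimeError("iteration budget exhausted before the step reported done")


def orchestrate(goal: str, depth: int = 0, max_depth: int = 3) -> Task:
    """Orchestrator-style: the shape of the work tree is decided at runtime."""
    task = Task(goal)
    if depth < max_depth:  # guard against unbounded spawning
        for sub_goal in plan(goal):
            task.subtasks.append(orchestrate(sub_goal, depth + 1, max_depth))
    return task


def leaf_goals(task: Task) -> list[str]:
    """Collect the goals that would actually be handed to worker subagents."""
    if not task.subtasks:
        return [task.goal]
    goals: list[str] = []
    for sub in task.subtasks:
        goals.extend(leaf_goals(sub))
    return goals
```

The loop has one fixed shape and only a stop condition. The orchestrator's tree depends on what `plan` returns for each goal, which is the part a while-not-done condition has no way to express.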

Pete Steinberger's published orchestration skills are the clearest concrete example I've seen, and well worth a look.

Worth reading:


More takes on the economics of code

Code stopped being an asset you preserve carefully and is starting to resemble paper plates. When generating a line of code costs effectively nothing, the careful curation we built our craft around looks like sentimentality. Charity Majors put the economics plainly: lines of code went from being treasured, reused, and carefully maintained to being disposable and regenerable, practically overnight. One essay clocked the shelf life of 2026 software at around 3.8 months. If that's true, optimizing for code you keep is optimizing for the wrong thing.

So where does the value go? When code can be regenerated faster than it can be understood, preserving it under a glass bell jar for all to admire no longer makes sense. Maintaining the system's behavior, boundaries, and intent absolutely does. I'm increasingly convinced this is the way forward: the theory, the shared mental model that lives in your team's head, not the residue in the repo, is the system worth investing in. We try to capture as much of it as we can in specs, ADRs, what have you, but without the team, nobody remains to sift through all those tokens in search of value. Generative AI lowers the cost of modification so much that it creates a dangerous illusion: everything can change quickly, therefore everything should. Not every half-assed idea deserves to make it into your product.

Worth reading:


AI demands more engineering discipline, not less

If you lived through the shift from handcrafted server pets to immutable infrastructure, this statement should feel familiar. The teams getting real leverage are the ones who tightened their feedback loops fast enough to keep up with their agents. Charity Majors finds the upside in this: testing, review, observability, and operability used to be a hard sell, and suddenly everyone needs them as a prerequisite to unlock AI coding productivity. This is our once-in-a-lifetime chance to bring engineering values to the mainstream, because the people vibe-coding their way into production are about to discover exactly why those values exist.

Worth reading:


Another week, another ream of lethal trifecta examples

No malware, no zero-day exploits, just an agent doing exactly what it was told by the wrong person. Attackers took twenty thousand Instagram accounts by asking Meta's AI politely. Researchers turned Microsoft 365 Copilot into a one-click exfiltration tool with a single crafted search. The data access was always there; agents just make it reachable through a channel that's nearly impossible to secure.

This is the confused deputy problem wearing a new coat. An agent with your credentials, exposed to untrusted content, and able to act becomes a deputy that can't tell your instructions from an attacker's. Useful terms to name the failure modes explicitly when you audit agent setups: prompt-injection susceptibility, data-exfiltration risk, approval-bypass attempts, authority confusion, and runtime trust-boundary violations. Each one is a place where external content can redirect behavior way beyond its intended scope.
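As a rough illustration of what such an audit could look like, here is a small Python sketch that flags setups combining all three conditions from the paragraph above. The `AgentSetup` fields and function names are my own hypothetical shorthand, not an established checklist or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSetup:
    """Hypothetical description of one agent deployment, for audit purposes."""
    name: str
    has_user_credentials: bool      # can reach private data or act as the user
    reads_untrusted_content: bool   # web pages, inbound email, third-party skills
    can_take_actions: bool          # sends messages, calls APIs, writes files


def risk_legs(setup: AgentSetup) -> list[str]:
    """List which of the three risk conditions this setup meets."""
    legs = []
    if setup.has_user_credentials:
        legs.append("credentials")
    if setup.reads_untrusted_content:
        legs.append("untrusted content")
    if setup.can_take_actions:
        legs.append("ability to act")
    return legs


def is_confused_deputy_risk(setup: AgentSetup) -> bool:
    """True only when all three conditions hold at once."""
    return len(risk_legs(setup)) == 3
```

Removing any one leg breaks the combination, so the useful audit question is which leg is cheapest to remove for a given agent.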

The (skill/software) supply chain is still one of the most obvious vectors. One study found that 37 percent of nearly four thousand agent skills could exfiltrate AWS credentials, abused exactly the way early npm was. That SKILL.md you installed to supercharge your agent has the same trust profile as an unaudited dependency, except it runs with your agent's full reach. Treat skills and tool definitions as untrusted code, because that's exactly what they are.

The structural fix is starting to take shape, and it isn't "prompt the agent to be more careful." Microsoft's push toward information flow control makes the problem explicit: anything an agent can do in response to your prompt, an attacker can trigger through a prompt injection, so enforcement has to be independent of the model's judgment. The mechanism is old and boring in the best way: label every piece of data as trusted or untrusted, propagate those labels through the agent's work, and block any tool call where untrusted data would drive a consequential action or confidential data would egress somewhere incompatible. Labels an attacker can't forge give you deterministic guarantees instead of probabilistic hope, and they shrink human-in-the-loop approval down to the genuinely ambiguous cases.
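The label-propagate-block mechanism fits in a few lines. The Python sketch below is a toy under stated assumptions, not Microsoft's implementation: `Labeled`, `Tool`, `combine`, and `check_call` are hypothetical names, and the labels are assumed to be attached by the harness, outside the model's control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A piece of data plus labels assigned by the harness, never by the model."""
    value: str
    trusted: bool        # integrity: did this come from the user or the system?
    confidential: bool   # confidentiality: may this leave the trust boundary?


@dataclass(frozen=True)
class Tool:
    name: str
    consequential: bool  # has side effects (send, delete, pay, ...)
    external_sink: bool  # moves data outside the trust boundary


class PolicyViolation(Exception):
    """Raised when a tool call would break the flow policy."""


def combine(*parts: Labeled) -> Labeled:
    """Propagate labels: derived data is trusted only if every input is,
    and confidential if any input is."""
    return Labeled(
        value=" ".join(p.value for p in parts),
        trusted=all(p.trusted for p in parts),
        confidential=any(p.confidential for p in parts),
    )


def check_call(tool: Tool, arg: Labeled) -> None:
    """Deterministic gate that runs before every tool call."""
    if tool.consequential and not arg.trusted:
        raise PolicyViolation(f"untrusted data may not drive {tool.name}")
    if tool.external_sink and arg.confidential:
        raise PolicyViolation(f"confidential data may not egress via {tool.name}")
```

The gate never asks the model whether the content looks malicious. A call either satisfies the label policy or it is blocked, and a blocked call is the natural place to put a human approval step.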

Lethal-trifecta-prevention-as-a-service: I very much like the direction of this security research.

Worth reading:


Your folder structure is the agent's architecture

The most effective agent setups are getting less clever, not more. Instead of bolting on a multi-agent framework to manage context, memory, and step coordination, people are letting the filesystem do the orchestrating. The Model Workspace Protocol paper makes the case directly: numbered folders are stages, plain markdown files carry the prompts and context, and local scripts handle the mechanical work that never needed a model. One agent reading the right files at the right moment does what a framework was supposed to.
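A minimal Python sketch of the folders-as-stages idea follows. The naming convention (`01-research`, `02-draft`) and the helper names are assumptions of mine for illustration; the paper's actual layout may differ.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention: a stage folder is "<number>-<name>" or "<number>_<name>".
STAGE_PATTERN = re.compile(r"^(\d+)[-_](.+)$")


def discover_stages(workspace: Path) -> list[tuple[int, str, list[Path]]]:
    """Return (order, name, markdown files) for each numbered folder, in order."""
    stages = []
    for entry in workspace.iterdir():
        match = STAGE_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            docs = sorted(entry.glob("*.md"))
            stages.append((int(match.group(1)), match.group(2), docs))
    return sorted(stages, key=lambda stage: stage[0])


def build_context(docs: list[Path]) -> str:
    """Concatenate a stage's markdown files into the text handed to the agent."""
    return "\n\n".join(doc.read_text(encoding="utf-8") for doc in docs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "01-research").mkdir()
        (root / "02-draft").mkdir()
        (root / "01-research" / "prompt.md").write_text("Find sources.", encoding="utf-8")
        for order, name, docs in discover_stages(root):
            print(order, name, [doc.name for doc in docs])
```

No state lives anywhere except the directory tree, so reordering or removing a stage is a rename or a delete rather than a framework change.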

I'm finding the same thing in my own harness work. The engineering that matters lives in the repo, not the model: ground rules, reference docs, and lazily loaded skills, with many small documents beating one monolith. The structure is the architecture. When Nick Nisi deleted 95 percent of his agent skills and got better results, he wasn't removing capability, he was removing the noise that was crowding the context window and confusing the agent.

This also demotes the importance of MCP. As Sean Lynch argued, the real value MCP offers over skills and CLIs is isolating the auth flow outside the agent's context window. Strip away the rest and MCP as a pure auth gateway is still a win.

The pattern underneath all of it: give the agent a clean, legible habitat to work in, and most of the orchestration problem dissolves.

Worth reading:


Domain knowledge, not seniority, sets your AI collaboration mode

What determines how you work with AI should not be your job title or years of experience. Instead, your knowledge of the specific domain in front of you should drive the collaboration mode. A staff engineer touching Kubernetes for the first time should prompt like a beginner and ask questions aimed at learning. A bootcamp grad on their fifth familiar React component can delegate specced-out implementation work with confidence. Used deliberately, this skill is a massive unlock for everyone: in Learning mode the agent is the most patient teacher you've ever had, which is how a tool that could deskill you instead speeds up how fast you pick up a new domain.

I've been hammering at a small framework for this that I call LEAP: Learning, Exploring, Applying, Producing. In Learning, AI is a teacher and you're building understanding, not shipping code. In Exploring, it's a thinking partner for weighing options. In Applying, it's an implementation assistant you hand clear specs. In Producing, it's a coordination assistant across systems you already understand. The trick is that you reset your mode every time you enter a new domain, regardless of how senior you are elsewhere. Your X years of experience are not evenly distributed over your skillset.
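The reset rule is easy to state as code. This Python sketch is only an illustration of the idea: the `Mode` enum mirrors the four LEAP modes, while `mode_for` and the example domains are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    LEARNING = "AI as teacher"
    EXPLORING = "AI as thinking partner"
    APPLYING = "AI as implementation assistant"
    PRODUCING = "AI as coordination assistant"


def mode_for(domain: str, known_domains: dict[str, Mode]) -> Mode:
    """Pick the mode per domain; anything unfamiliar resets to LEARNING,
    regardless of how senior you are elsewhere."""
    return known_domains.get(domain, Mode.LEARNING)


# Hypothetical self-assessment of a staff engineer.
staff_engineer = {"backend services": Mode.PRODUCING, "react": Mode.APPLYING}
```

With this self-assessment, `mode_for("kubernetes", staff_engineer)` returns `Mode.LEARNING`, because the lookup is keyed on the domain rather than on the person.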

Anthropic just put hard numbers on this. In their own Claude Code data, novice sessions reach verified success about 15 percent of the time, while intermediate and expert sessions hit 28 to 33 percent, and most of the jump happens moving from novice to intermediate. Experts don't just succeed more, they extract more: their prompts trigger action chains roughly twice as long, carrying roughly five times the output (12 actions and 3,200 words against a novice's 5 actions and 600 words). The detail that should end the "coding is dead" debate: every one of the ten largest occupations in the dataset lands within seven points of software engineers. A deep understanding of the domain matters more than having a programming background.

This is also what all the "is the CS degree dead" panic misses. The fundamentals that let you evaluate AI output, spot the scalability anti-pattern or the subtle security hole, are exactly what lets you delegate safely and at scale. You can only review what you understand. The strategic move isn't more AI fluency in the abstract; it's knowing, honestly, which mode you're in for the thing you're working on right now.

Worth reading:


Frontier intelligence as a closely guarded resource

Frontier labs spent years describing their models as dangerous and in need of strict control. Last week the US government took that framing literally and issued an export-control directive forcing Anthropic to disable Fable 5 and Mythos 5 for all customers, including its own foreign-national staff. Armin Ronacher caught the schadenfreude precisely: market your technology as a weapon long enough and someone in power will eventually treat it like one.

Steve Yegge's read is that we've crossed into treacherous waters, where model intelligence itself has become the thing governments reach to control. Next year's frontier models won't be accessible to you or me.

For practitioners, the lesson isn't about geopolitics. It's about dependency. A frontier capability that a directive can revoke after a single jailbreak, four days after going generally available, is not something to wire into your critical path. Cite it as one more reason to own your harness and keep your options open.

Worth reading:


Quick Hits


Curated from 380+ items across articles, podcasts, and videos. Week of June 12–20, 2026.