Loop Engineering 101

Stop hand-prompting your coding agent. Start designing the closed feedback loop that prompts it for you — and knows when it's done. This is a primer on loop engineering: what it is, why the leverage point moved up a level, and how to build a loop you can trust to run unattended. Examples lean toward graphics and GPU work, where the success signals happen to be unusually clean.

A closed agentic loop drawn as a control diagram A success condition feeds a comparison; an agent writes code into a codebase; a verifier checks the output; on failure the signal returns to the agent, on success the loop exits. Goal success condition Σ + Agent writes code Codebase your repo Verifier checks output pass → exit fail → revise setpoint compare controller plant sensor
The loop, as a control system. Red is the forward path; magenta is the feedback that closes it. When the error goes to zero — the success condition holds — the loop exits and hands off to you.

01The shift: from prompting to loops

For about two years, getting work out of a coding agent meant a manual cycle: write a careful prompt, read what comes back, write the next one. The agent was a tool you held the whole time — and you were the feedback path.

That posture is what people now call an open loop. There is no automatic check on the output; a human inspects each turn and decides what happens next. It works, but it doesn't scale past the one conversation you're actively babysitting.

The new idea is to remove yourself from the per-turn path. You build a small system that finds the work, hands it to the agent, checks the result against a goal, records what's done, and decides the next move — then lets that system poke the agent instead of you. Two people at the center of agentic coding arrived at the same conclusion independently in June 2026:

You should be "designing loops that prompt your agents."

Peter Steinberger — creator of OpenClaw

"I don't prompt Claude anymore." His job, he says, is to write the loops.

Boris Cherny — Claude Code, Anthropic
Definition

Loop engineering is designing the autonomous system that decides what to prompt an agent, when to prompt it, and whether the result is acceptable — then letting that system drive the agent, instead of you typing each turn.

The leverage point moved up a level. This is harder than prompt engineering, not easier: the difficulty shifts from wording a single instruction to designing a control system you can trust to run unattended.

02Anatomy of a closed loop

The name is borrowed straight from control theory, which is why the diagram above reads the way it does. Every part maps onto a familiar block.

The success condition is your setpoint. The maker agent is the controller. The codebase is the plant. The verifier is the sensor and comparator. The agent acts, the verifier measures the result against the setpoint, and the resulting error — "not done yet" — drives another iteration. When the error reaches zero, the loop has converged: the condition holds, and it exits.

That return path is the entire concept. It's the one thing separating a loop that self-corrects from a one-shot run or a blind re-run. Two properties make the loop actually close, and both are easy to get wrong:

1 · A machine-checkable success condition

"Make it good" gives the loop no error signal, so it can never converge — the same reason you can't close a feedback loop around an output you can't measure. The condition has to be a hard predicate the system can evaluate by itself.

setpointgoal: all tests in test/auth pass && lint is clean && build succeeds

2 · The maker and the checker are different

The model that wrote the code is far too generous grading its own work. A second agent — different instructions, often a stronger model on higher effort — catches what the first one talked itself into. This maker ≠ checker split is the only reason the loop's "done" means anything, and therefore the only reason you can walk away from it.

03The building blocks

A loop needs five primitives plus one place to remember things. A year ago you wrote and maintained a pile of bash for this. Now the pieces ship inside the products — and the shape is nearly identical across them.

01 · heartbeat

Automations

Scheduled runs that do discovery and triage on their own. This is what makes it a loop and not a single run you did once.

02 · isolation

Worktrees

Separate checkouts on their own branches, so two agents working in parallel can't overwrite each other's files.

03 · knowledge

Skills

Project conventions written down once (in a SKILL.md) so the agent reads them every run instead of re-guessing them.

04 · reach

Plugins & connectors

MCP wiring into your real tools — issue tracker, database, staging API, chat — so the loop can act, not just suggest.

05 · review

Sub-agents

One agent has the idea, a different one checks it. This is where the maker/checker split physically lives.

06 · memory

State, on disk

A markdown file or board that holds what's done and what's next. The model forgets between runs; the repo doesn't.

The capability is the same across tools even when the names differ — once you notice that, you stop arguing about which tool and just design a loop that survives whichever one you're sitting in.

PrimitiveJob in the loopClaude CodeCodex
AutomationsDiscovery + triage on a schedule/loop, cron tasks, hooks, GitHub ActionsAutomations tab + Triage inbox
WorktreesIsolate parallel featuresgit worktree, --worktree, isolation: worktreeBuilt-in worktree per thread
SkillsCodify project knowledgeAgent Skills (SKILL.md)Agent Skills (SKILL.md)
ConnectorsTouch your real toolsMCP servers + pluginsConnectors (MCP) + plugins
Sub-agentsIdeate, then verify.claude/agents/, agent teams.codex/agents/ (TOML)
StateTrack what's doneMarkdown (AGENTS.md) or Linear via MCPMarkdown or Linear via connector

04How to build one

You don't hand-roll a while loop anymore. The closing mechanism is a shipped primitive — your job is to give it a real target and a trustworthy judge.

  1. Write a checkable goal

    State the exit predicate precisely and make every clause something the system can evaluate without you. This is your setpoint.

  2. Use /goal to run until it's true

    Both Claude Code and Codex have a /goal primitive that keeps working across turns until a condition you wrote actually holds — and after every turn, a separate small model decides whether it's done, so the writer isn't the grader. (/loop just re-runs on a cadence: that's the open-loop heartbeat. /goal is what closes it.)

    claude code · codex/goal "all tests in test/auth pass and lint is clean"
  3. Split the maker from the checker

    Define your agents as files — .claude/agents/ in Claude Code, .codex/agents/*.toml in Codex. Make the verifier a stronger model on higher reasoning effort; let the explorer be something fast. Spend the extra tokens where a second opinion is worth paying for.

  4. Give it memory and a schedule

    A state file plus an automation turns one run into a standing loop: the schedule re-fires it, and the state file lets tomorrow's run pick up where today's stopped.

05The engine: the /goal command

Everything above leans on one Claude Code primitive. /goal is what turns a normal session into a run-until-done loop — the built-in version of the keep-going loops people used to hand-roll in bash. In the control diagram at the top, it's the comparator.

Normally Claude takes a turn and hands control back to you. You give /goal a finish line instead — /goal all tests in test/auth pass and the lint step is clean — and it starts a turn immediately with that condition as the directive. After every turn a separate, smaller model (Haiku by default) reads the conversation and answers one question: is the condition met? A "no" comes back with a short reason, and that reason becomes the instruction for the next turn. A "yes" clears the goal and hands control back. Under the hood it's a thin wrapper over a session-scoped Stop hook — which is also why it runs only in a trusted workspace with hooks enabled.

The evaluator reads — it does not run

The checker is a separate model, but not an independent check: it only reads the transcript, runs nothing itself, and sees only what the worker wrote down. If Claude writes "fixed" without actually running the check and printing the output, the evaluator returns "not yet verified" and the turn continues. The condition has to be something the worker's own printed output can prove.

Write a condition, not a wish

A strong condition combines three things: a check that does the verifying, a measurable end state, and a constraint that must not change. Each clause flips cleanly from false to true.

a condition it can prove/goal npm test exits 0, git status shows no uncommitted changes, and no file in src/auth exceeds 200 lines

A vague ask has no point where it objectively becomes true, so the loop either churns on something it can't finish or quietly decides it's done. This is the single most common way /goal goes wrong:

a condition it can't prove/goal clean up the code and make it nicer # no measurable flip from "messy" to "clean"

The command surface

CommandWhat it does
/goal <condition>Set or replace the goal — starts a turn immediately. One per session; condition up to ~4,000 characters.
/goalShow status: runtime, turns and tokens spent, and the evaluator's last reason. A ◎ /goal active indicator shows while it runs.
/goal clearStop the goal. Aliases: stop, off, reset, none, cancel. Ctrl+C also kills it; /clear clears it too.
claude -p "/goal …"Run the whole loop to completion headless, in a single invocation.

It needs Claude Code v2.1.139 or later, plus a workspace whose trust dialog you've accepted and hooks left enabled. A goal still active when a session ends is restored when you resume with --resume or --continue. Always cap a condition with a clause like "or stop after N turns" — but that bound is judged by the model from the transcript, not enforced as a hard mechanical stop, so Ctrl+C stays your real kill switch.

When it's /goal and when it's /loop

/goal is progress-driven: the next turn starts the moment the last one finishes, and it stops when the evaluator confirms you're done. /loop is clock-driven: it re-runs a prompt on a time interval and stops when you say so. Reach for /goal when you're pushing work to a finish line, and /loop when you're polling for something external to change — a deploy finishing, a build going green, new reviews landing on a PR. The loop-research example below is a /goal: it pushes toward a verified report, then stops.

06A loop in practice

Stack the pieces and one thread becomes a small control panel that runs while you sleep.

An automation fires every morning on the repo. Its prompt calls a triage skill that reads yesterday's CI failures, the open issues, and recent commits, and writes the findings to a state file. For each finding worth doing, the thread opens an isolated worktree and sends a maker sub-agent to draft a fix; a second sub-agent reviews that draft against the project's skills and existing tests. Connectors open the PR and update the ticket; anything the loop can't handle lands in a triage inbox for you. You designed that once — you prompted none of those steps.

For graphics & GPU work

Your domain is unusually loop-friendly

The thing a closed loop needs most — an objective, machine-checkable signal — is exactly what graphics and GPU work produces in abundance. These make far better setpoints than "looks right," and they're the difference between a loop that converges and one that drifts:

  • Compilation is clean — the shader or kernel builds with no errors or new warnings.
  • Golden-image diff under threshold — SSIM / PSNR against a reference frame stays above a set bar.
  • Validation layers are silent — zero Vulkan / D3D validation or debug-layer messages on the run.
  • Frame time within budget — a fixed scene or captured replay renders under N ms on target hardware.
  • Perf counters in band — occupancy, memory bandwidth, or stall deltas stay within an allowed window.
  • Conformance / unit suite green — the existing test corpus passes end to end.

07What the loop doesn't do for you

The loop changes the work; it does not delete you from it. Three problems get sharper as the loop gets better — not easier.

Verification is still yours

A loop running unattended is also a loop making mistakes unattended. Splitting the verifier is what makes "done" mean something — but even then, "done" is a claim, not a proof. Ship code you confirmed works.

Comprehension debt compounds

The faster the loop ships code you didn't write, the wider the gap between what exists and what you actually understand. A smooth loop just grows that gap faster — unless you read the diffs.

The comfortable posture is the dangerous one

When the loop runs itself, it's tempting to stop having an opinion and accept whatever it returns. Designing the loop is the cure when you do it with judgment, and the accelerant when you do it to avoid thinking. Same action, opposite result — and the loop can't tell which one you're doing. You can.

08Worked example: loop-research

A concrete closed loop for Claude Code: hand it a topic, and it researches, writes a report and a slide deck, fact-checks every claim, critiques its own work, and keeps refining until the data is fully verified and only one minor suggestion remains — then does a final read-through and stops.

It maps cleanly onto the anatomy from section 02. The only Claude-Code-specific piece you need is /goal: it runs turn after turn until a separate, smaller evaluator model (Haiku) confirms your condition holds.

Control roleIn loop-research
Setpoint0 unverified data points · at most 1 minor suggestion · final review passed
ControllerThe main /goal session — researches, writes report.md + slides.md, renders the deck, applies fixes
PlantThe deliverables: report.md, slides.md, slides.pptx
SensorTwo read-only sub-agents — a verifier (data, links, tables, figures) and a critic (improvement suggestions)
ComparatorThe /goal evaluator, reading a STATUS block each turn
FeedbackFailed ledger rows + open suggestions become the next turn's work
Memoryverification.md + suggestions.md + the sources on disk

The one gotcha that dictates the whole design

The /goal evaluator only reads the transcript — it never runs commands or opens files itself. So verification has to be done by the worker and printed into the conversation. That's why loop-research uses a real verifier sub-agent and ends every turn with a machine-readable STATUS block: the evaluator decides "done" by reading that block, not by trusting a "looks correct."

Set it up

  1. Check prerequisites

    /goal needs Claude Code v2.1.139 or later, a workspace you've accepted the trust dialog for, and hooks enabled (it's a wrapper over a Stop hook). Work from your project root.

    shellclaude --version # need ≥ 2.1.139 cd ~/projects/research-loop
  2. Create the two checker sub-agents

    Both are read-only — they inspect and report, never edit. Keeping the maker (your main session) apart from these checkers is the maker ≠ checker split made real. Bump the verifier to a stronger model if accuracy matters more than cost.

    sub-agent · read-only# .claude/agents/loop-research-verifier.md --- name: loop-research-verifier description: Fact-check a research report and slide outline. Verifies every data point, link, table, and figure against primary sources and returns a ledger. Read-only. tools: Read, Grep, Glob, WebSearch, WebFetch model: sonnet --- You are a meticulous fact-checker. You are given report.md and slides.md. For every checkable claim — statistics, dates, named facts, quotes, table cells, chart/figure values — and every URL: - Open each link with WebFetch; confirm it loads and actually supports the claim. - Cross-check each number or fact against at least one primary source. - Flag anything fabricated, unsupported, stale, or internally inconsistent (totals that don't add up, a caption that disagrees with its data). Return ONLY a markdown ledger, one row per data point: | id | claim | source | status | note | status is VERIFIED or FAILED; for FAILED, the note says what is wrong and the correct value if known. End with: LEDGER: <verified>/<total> verified Never edit files. Never soften a FAILED to VERIFIED.
    sub-agent · read-only# .claude/agents/loop-research-critic.md --- name: loop-research-critic description: Critique a research report and slide deck. Returns a ranked list of concrete improvement suggestions with severities. Read-only. tools: Read, Grep, Glob model: sonnet --- You are a demanding editor and presentation reviewer, given report.md and slides.md. Review argument quality, clarity, evidence, narrative flow, and slide design (one idea per slide, readable density, logical order). Be specific and point to the exact section or slide. Return ONLY a markdown list: - [severity] suggestion — where severity is blocker, major, or minor; list blockers first. Raise blocker/major only when correctness or usefulness is materially affected — do not inflate severities or invent trivia to pad the list. End with: OPEN: <b> blocker, <m> major, <k> minor Never edit files.
  3. Create the loop-research skill

    The skill is the recipe: how to handle the topic and language, what to do each turn, where files live, and when to stop. Those files on disk are the loop's memory — the sub-agents start fresh every turn, but the report and ledgers persist.

    skill# .claude/skills/loop-research/SKILL.md --- name: loop-research description: Iteratively research a topic and produce a verified report and slide deck, refining until the data is fully verified and at most one minor suggestion remains. Topic may be English or Chinese; output is English unless Chinese is requested. --- # loop-research ## Inputs - TOPIC — the subject (may be English or Chinese). - OUTPUT_LANG — `en` by default. If the request says Chinese / 中文 / --lang zh, set `zh` and write ALL deliverables in Chinese. The topic's own language never changes OUTPUT_LANG. ## Canonical files, in ./loop-research/<slug>/ - report.md the report (source of truth) - slides.md the deck outline, one H2 per slide (source of truth) - slides.pptx rendered from slides.md with python-pptx - verification.md latest ledger from the verifier - suggestions.md open/closed suggestions with severities ## Each turn 1. If report.md / slides.md don't exist, research TOPIC (WebSearch/WebFetch) and draft both in OUTPUT_LANG. 2. Apply every FAILED row in verification.md and every accepted open suggestion in suggestions.md; re-render slides.pptx. 3. Delegate to loop-research-verifier on report.md + slides.md; save its ledger to verification.md. 4. Delegate to loop-research-critic on report.md + slides.md; merge into suggestions.md (keep severities, close resolved ones). 5. End the turn by printing the verifier summary, the open-suggestion counts, and the STATUS block below. ## Stop rule When 0 data points are FAILED AND open suggestions are 0 blocker + 0 major + at most 1 minor: do ONE final review — re-read report.md and slides.md end to end, open slides.pptx to confirm it renders, fix anything trivial — then set final_review. ## Print at the end of EVERY turn, verbatim === LOOP-RESEARCH STATUS === verified: <n>/<total> open: <b> blocker, <m> major, <k> minor final_review: <PASS|PENDING|FAILED> ============================
  4. Let it run hands-off (optional)

    /goal drops the approval prompt between turns, but tool calls within a turn still ask unless you enable Auto mode. Turn on Auto mode — your session's autonomy toggle — so each turn runs unattended.

  5. Kick off the goal

    The condition reads the STATUS block, not the work itself. The stop after N turns clause is your safety cap — a critic can always find one more thing, so the loop needs a hard ceiling.

    in session/goal Using the loop-research skill, produce a verified report and slide deck on "<your topic>" (output in English). Met only when the latest LOOP-RESEARCH STATUS shows verified == total with 0 FAILED, open is 0 blocker and 0 major and at most 1 minor, and final_review: PASS. Stop after 25 turns regardless.

    For a Chinese deliverable, swap the parenthetical to (output in Chinese / 中文). To run it headless instead of interactively:

    shell · non-interactiveclaude -p "/goal Using the loop-research skill … final_review: PASS. Stop after 25 turns."
  6. Watch and stop

    A ◎ /goal active indicator shows runtime. Type /goal for status and the evaluator's last reason; /goal clear (or Ctrl+C in headless) stops it early. Otherwise it stops itself the moment final_review reads PASS — or at the turn cap.

Notice the shape: you set this up once. After that, no turn is hand-prompted — the verifier and critic generate the feedback, the worker acts on it, and the evaluator decides when the data is clean and the deck is done. That is the closed loop from the top of this post, pointed at research instead of code.

Where this comes from

Synthesizes the June 2026 loop-engineering discourse. Tool-specific commands (/goal, /loop, agent directories) evolve quickly — verify against current Claude Code and Codex documentation before relying on exact syntax.