Why "do B2" fails: handing your agent the full context with the build-plan format

How to Write Instructions for an AI Agent — build-plan Format from B1 to B17

Why “Do B2” fails: the agent needs the full context handed to it at once

While building myWiki, I picked up one habit. Instead of telling the agent “Do B2,” I started writing @docs/guide/build-plan.md B2 진행하자 (“let’s proceed with B2”). At first it looked like nothing more than tacking on one extra file path, but it made a world of difference.

Copy just the line and all the metadata disappears

build-plan.md has a mode baked into each item. plan mode means the agent shows you the plan first and waits for your approval before moving on. auto mode means the agent uses the design documents as its guardrails, makes decisions autonomously, and executes immediately. Items marked with a gate (✓) sit at the contract layer, where a mistake propagates to everything built on top, so they always require human review.
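To make the per-item metadata concrete, here is a minimal sketch of how one item could be parsed into its mode, gate, and dependencies. The line syntax (`B2 Persistence Base [plan] ✓ after:B1`), the `PlanItem` class, and the B2-after-B1 dependency are all assumptions made up for illustration; the actual layout of build-plan.md may differ.

```python
import re
from dataclasses import dataclass, field

# Hypothetical item syntax, invented for this sketch:
#   "B2 Persistence Base [plan] ✓ after:B1"
ITEM_RE = re.compile(
    r"^(?P<id>B\d+)\s+(?P<name>.+?)\s+\[(?P<mode>plan|auto)\]"
    r"(?P<gate>\s+✓)?(?:\s+after:(?P<deps>[B\d,]+))?$"
)


@dataclass
class PlanItem:
    id: str
    name: str
    mode: str    # "plan" or "auto"
    gate: bool   # True when the item carries the ✓ human-review gate
    depends_on: list = field(default_factory=list)


def parse_item(line: str) -> PlanItem:
    """Parse one plan line into its id, name, mode, gate, and dependencies."""
    m = ITEM_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a plan item: {line!r}")
    deps = m.group("deps").split(",") if m.group("deps") else []
    return PlanItem(
        id=m.group("id"),
        name=m.group("name"),
        mode=m.group("mode"),
        gate=m.group("gate") is not None,
        depends_on=deps,
    )
```

Everything after the item name is exactly the metadata that is lost when only the words “Do B2” are passed along.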

But if you copy just the line “Do B2” and hand it over, the mode information, gate markers, and dependency relationships all vanish, and the agent falls back on its own judgment. When I referenced @build-plan.md in full and B2 (Persistence Base) went into plan mode, I was surprised at first. It turned out the agent had read the plan tag and the ✓ gate sitting next to B2. It wasn’t a malfunction. The instruction document was working exactly as intended.

The structure of build-plan: from trunk to branches

Items B1 through B17 behave differently depending on the layer they belong to. The trunk (B1–B5) is the contract layer. If something goes wrong here, all six branches built on top inherit that error. So every trunk item runs in plan mode with a gate attached, and I reviewed and approved each plan before proceeding.

The branches (B6–B11) are vertical slices. Each slice touches only its own files, is implemented in isolation inside a worktree-b{N}-{name} worktree, must pass type checks and linting, and is then merged with --no-ff. This stretch benefits the most from auto mode. Through B6–B10, the agent handled dependency-isolation decisions on its own. For example, the design doc said that while B6 Space was still incomplete, B7 Document should keep spaceId as a scalar uuid and formalize it as a ManyToOne relation at merge time. The agent read that and handled it by itself. I didn’t need to explain a thing.
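The per-slice workflow can be sketched as a helper that only builds the git commands for one slice, without running them. The worktree name follows the worktree-b{N}-{name} pattern from the text. The branch name, the `../` location of the worktree, and the `main` target branch are assumptions, and the type-check and lint commands are left out because they depend on the project.

```python
def slice_commands(n: int, name: str, target: str = "main") -> list[list[str]]:
    """Build (but do not run) the git commands for one vertical slice."""
    worktree = f"worktree-b{n}-{name}"  # naming pattern taken from the text
    branch = f"b{n}-{name}"             # hypothetical branch name
    return [
        # Create an isolated worktree on a new branch for this slice.
        ["git", "worktree", "add", "-b", branch, f"../{worktree}"],
        # Type checks and linting would run inside the worktree at this point.
        # Back in the main checkout, merge with an explicit merge commit.
        ["git", "switch", target],
        ["git", "merge", "--no-ff", branch],
    ]


if __name__ == "__main__":
    for cmd in slice_commands(7, "document"):
        print(" ".join(cmd))
```

Merging with --no-ff keeps each slice visible as its own merge commit in the history, even when a fast-forward would have been possible.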

plan vs auto: the difference is when the agent shows you its decisions

There’s one easy misunderstanding here. Both plan mode and auto mode involve the agent making decisions. The difference is whether it shows you that decision before executing or after.

Even running in auto mode, the agent will sometimes stop to ask about something important. Mid-B2, it flagged that the design targeted MikroORM v6 while the actual setup was v7, which would change the entire scaffold, and asked for confirmation. This happens because two axes move independently: the mode axis (does it show you the plan first?) and the escalation axis (is this a decision for a human?). Even in auto mode, if a decision is irreversible or has a large blast radius, as contract drift does, the agent punches through the mode and comes to you. That’s the living safety net.
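The two axes described above can be sketched as a small decision function. The `Decision` class and its field names are hypothetical; the point is only that escalation is evaluated independently of the mode.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    description: str
    irreversible: bool = False
    large_blast_radius: bool = False


def needs_human_first(mode: str, decision: Decision) -> bool:
    """Return True when a human should see the decision before it is executed."""
    if mode not in ("plan", "auto"):
        raise ValueError(f"unknown mode: {mode!r}")
    shows_plan_first = mode == "plan"  # mode axis
    escalates = decision.irreversible or decision.large_blast_radius  # escalation axis
    return shows_plan_first or escalates


routine = Decision("rename a private helper")
drift = Decision("design targets ORM v6, setup is v7", large_blast_radius=True)

print(needs_human_first("auto", routine))  # False
print(needs_human_first("auto", drift))    # True
print(needs_human_first("plan", routine))  # True
```

In auto mode the routine change goes straight through, while the version mismatch still reaches a human, because the escalation check does not consult the mode at all.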

Integration and deployment: why the human gates remain

The integration stretch (B12–B15) is where layers start to interlock. The B13 Organize pipeline in particular (triage → apply handlers, lock guards, Revision accumulation) is the core intelligence of the system, so even when the agent drafts it, a human validates it. That makes this stretch semi-automatic.

Deployment (B17) has hong-server secrets and the Caddyfile in play, so it’s a full human gate. Server secrets and routing are not decisions you can hand to an agent.
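Putting the layers side by side, the review policy per item number can be sketched as a lookup. The function name and the policy strings are illustrative. Treating all of B12–B15 as semi-automatic is an assumption drawn from the B13 example, and B16 is marked unspecified because it is not described here.

```python
def review_policy(n: int) -> str:
    """Map a build-plan item number (B1..B17) to how much human review it gets."""
    if 1 <= n <= 5:
        return "trunk: plan mode with a human gate"
    if 6 <= n <= 11:
        return "branch: auto mode in an isolated worktree"
    if 12 <= n <= 15:
        # Assumption: the semi-automatic treatment of B13 applies to the stretch.
        return "integration: agent drafts, human validates"
    if n == 16:
        return "unspecified"
    if n == 17:
        return "deployment: full human gate"
    raise ValueError(f"B{n} is outside B1..B17")


if __name__ == "__main__":
    for n in (2, 7, 13, 17):
        print(f"B{n}: {review_policy(n)}")
```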

What it means to write an instruction document

In the end, build-plan.md is not a checklist. It’s an instruction document you hand to the agent. The mode, gate, and dependency metadata on each item are part of the instruction. You need the agent to reference the whole document so it moves with the context intact. If you hand it just one line, the agent fills in the gaps with its own judgment — and that judgment might be right, or it might not.

When you’re handing work to an AI, try having it reference the whole document the instruction lives in, not just the instruction itself. When the context is alive, the agent goes off the rails less. The context management lever in the 4-lever method is grounded in exactly this principle.