It Wasn’t the AI’s Fault, I Just Didn’t Know the Tools — 5 Automation Mistakes from My Stock App
Splitting build from verify, doc sync, merge hell, the finish — five burns later, I changed the order I work in
I was pretty serious about development automation while building my stock app. I split the implementing agent from the verifying agent, let them hand work back and forth, passed context through documents, and spun up several worktrees to run things in parallel. The assumption underneath was simple: the less a human is in the loop, the faster it goes.
To skip to the result: I got burned in five places. Mostly that wasn’t because the AI couldn’t do the work. I didn’t know how to handle the tools, and I brought it on myself. Those five burns ended up completely changing the order I work in on the next project.
1. I split build from verify, and that was the mistake
Verification felt slow, so I split building from verifying. The main session handled divergence (discussion, priorities), and a background session handled convergence (code changes, tests). My reasoning was that mixing both modes in one session pollutes the context. I went a step further and set it up so that the moment a Worker finished and published, a hook would spawn a Tester session in the background. The next cycle would roll on without me calling “hey tester” every time.
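A minimal sketch of that kind of wiring, assuming a plain in-process event registry. The event name `worker.published`, the `on`/`emit` helpers, and `tester_queue` are hypothetical stand-ins; the actual hook mechanism of the agent tool is not shown in this post.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

# Hypothetical registry: event name -> handlers to run when it fires.
_hooks: dict[str, list[Handler]] = defaultdict(list)


def on(event: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for an event."""
    def register(handler: Handler) -> Handler:
        _hooks[event].append(handler)
        return handler
    return register


def emit(event: str, payload: dict) -> None:
    """Run every handler registered for the event, in registration order."""
    for handler in _hooks.get(event, []):
        handler(payload)


# Stand-in for "launch a Tester session in the background":
# here the task id is only queued, nothing is actually spawned.
tester_queue: list[str] = []


@on("worker.published")
def spawn_tester(payload: dict) -> None:
    tester_queue.append(payload["task"])
```

Even in this toy form, the registry, the event name, and the payload shape are all extra surface that exists only to reconnect the two halves.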
Looking back, I had this wrong. Building and verifying are one cycle to begin with. Writing it, running it, fixing it — that’s a single breath, and I forced it into two. The moment you split them, a new job appears: stitching the two back together. And the harder I leaned on automating that connection, the more cumbersome the connection itself got.
On top of that, I didn’t run e2e automatically, because tokens are expensive. So in the end I still had to check with my own eyes whether the screen actually worked. The more verification I peeled off, the longer I sat waiting for that “look at it myself” turn to come around. Splitting wasn’t the answer.
2. Documents were the context, and the documents kept drifting
A Claude session carries no memory over to the next one. A decision or discovery from one session simply vanishes in the next unless you move it into a file. So files played the role of state carrier between sessions.
To hold that line I set three rituals: write a decision into a policy or task doc the moment it’s made, accumulate implementation results into the task when a phase ends, and copy plan-mode output back into the task. Drop any one of the three and the handoff between sessions breaks. The problem was that, in the middle of work, exactly those steps got skipped. There was no concept of “skills” back then, so I had to ask for doc updates by hand every time, and doing it by hand meant things fell through.
What frustrated me more was something else. I made an index and set up a structure, but the agent still couldn’t reference the docs well. Unless you pointed Gemini at a folder directly with @, it didn’t even know the path existed. The deeper the structure got (docs/tasks/<domain>/, docs/process/feeds/<role>.feed.md, even an auto-generated inbox), the less it found, and if the index wasn’t current, it didn’t trust what it did find. There were two causes: a stale index, and a structure that was too deep.
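Both causes can be checked mechanically. The sketch below assumes an index file that lists one relative `.md` path per line (optionally prefixed with `- `), which is an assumption about the format, not the layout actually used in the project. The function name `audit_docs` and the `max_depth` threshold are likewise hypothetical.

```python
from pathlib import Path


def audit_docs(root: Path, index_file: Path, max_depth: int = 2) -> dict[str, list[str]]:
    """Compare the docs on disk with the index and flag deeply nested files."""
    # Every markdown file under root, as a POSIX-style relative path.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*.md")
        if p != index_file
    }

    # Assumed index format: one relative path per line, optional "- " prefix.
    indexed = set()
    for line in index_file.read_text(encoding="utf-8").splitlines():
        entry = line.strip().removeprefix("- ").strip()
        if entry.endswith(".md"):
            indexed.add(entry)

    return {
        # Files the agent cannot discover through the index (stale index).
        "unindexed": sorted(on_disk - indexed),
        # Index entries pointing at files that no longer exist.
        "dangling": sorted(indexed - on_disk),
        # Files nested under more than max_depth directories.
        "too_deep": sorted(
            p for p in on_disk if len(Path(p).parts) - 1 > max_depth
        ),
    }
```

Running something like this at the end of a session would surface a stale index before the next session starts from it.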
3. Running 4-5 at once turned merging into hell
A background session was supposed to use its own worktree. One worktree, one session, one task. Run two sessions in one directory and the git state hits a race.
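As a sketch of the "one worktree, one session, one task" rule: the helper below builds a `git worktree add` command for a task and refuses to hand the same path to two sessions. The branch prefix `task/`, the `../wt-` path pattern, and the function names are hypothetical; `dev` as the base branch is taken from the merge target mentioned in this post.

```python
def worktree_command(task_id: str, base_branch: str = "dev") -> list[str]:
    """Build the git command that creates a dedicated worktree for one task."""
    branch = f"task/{task_id}"   # hypothetical branch naming
    path = f"../wt-{task_id}"    # hypothetical sibling directory
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, path, base_branch]


def claim_worktree(claims: dict[str, str], path: str, session: str) -> None:
    """Record that a session owns a worktree; reject a second owner."""
    owner = claims.get(path)
    if owner is not None and owner != session:
        raise RuntimeError(f"{path} is already used by session {owner}")
    claims[path] = session
```

The command list can be passed to `subprocess.run(cmd, check=True)` from inside the repository. It is left unexecuted here so the sketch has no side effects.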
But I had 4 or 5 going at once in the agent view, running whatever was in front of me. And as I said in #1, I’d cut implementation and testing apart too, so even more sessions were churning on one screen. Because I handed out work case by case without a plan, several agents touched the same spot. And since merges into dev are only safe one at a time, the parallel branches piled up and ate my time at merge. In a word, it was chaos.
The rule I eventually set was single-writer. Only the driver (the main session) updates docs/todo and the log; the verifying agent doesn’t touch the files directly, it just reports results. Only then did conflicts hit zero. Parallel isn’t free. You have to cut the dependencies first for parallel to pay off.
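A minimal sketch of the single-writer idea, under the assumption that the verifier can hand back a plain value instead of editing files. `Report`, `verify`, and `Driver` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Report:
    """What the verifier hands back; it carries data, not file edits."""
    task: str
    passed: bool
    notes: str


def verify(task: str, check: Callable[[], bool]) -> Report:
    """Verifier side: run a check and return a report. Never writes files."""
    ok = bool(check())
    return Report(task=task, passed=ok, notes="ok" if ok else "check failed")


class Driver:
    """Driver side: the only component allowed to append to the log."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path

    def record(self, report: Report) -> None:
        status = "PASS" if report.passed else "FAIL"
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(f"{status} {report.task}: {report.notes}\n")
```

Because every write goes through `Driver.record`, two sessions can never edit the log at the same time, which is what removes the conflicts.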
4. The AI couldn’t do the finish — more so because it’s stocks
With stocks, the finish is heavy. You have to reconcile profit and return per portfolio, sync the account, and pull in even the orders placed outside the app to match them up. There’s no end to what you have to watch.
The calculations especially were the problem. Not a single calculation came out right on the first try. Profit, return, valuation — the first numbers were always off somewhere. I’d set the agent to stop whenever a policy, contract, or schema judgment was needed, so with the calculation rules left vague, it couldn’t push any further.
In the end the numbers only started lining up after I nailed down policy docs under docs/product/policy/. Cost basis, order method, accounts, instruments — I wrote them in a “decision: what. reason: one line” format. The AI handled the finish once the standard was set. Making the standard itself was my job.
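Cost basis shows why the standard has to come first. The sketch below compares two common conventions, average cost and FIFO, on the same made-up trades, ignoring fees and taxes. The numbers and function names are illustrative only, and the post does not say which convention the app settled on.

```python
from collections import deque

Lot = tuple[int, int]  # (quantity, price per share)


def realized_profit_average(buys: list[Lot], sell_qty: int, sell_price: int) -> float:
    """Realized profit when every share carries the average purchase price."""
    total_qty = sum(qty for qty, _ in buys)
    if sell_qty > total_qty:
        raise ValueError("cannot sell more shares than were bought")
    average_cost = sum(qty * price for qty, price in buys) / total_qty
    return (sell_price - average_cost) * sell_qty


def realized_profit_fifo(buys: list[Lot], sell_qty: int, sell_price: int) -> int:
    """Realized profit when the oldest lots are sold first."""
    if sell_qty > sum(qty for qty, _ in buys):
        raise ValueError("cannot sell more shares than were bought")
    lots = deque(buys)
    remaining = sell_qty
    profit = 0
    while remaining > 0:
        qty, price = lots.popleft()
        used = min(qty, remaining)
        profit += (sell_price - price) * used
        if qty > used:
            # Put the unsold part of the lot back at the front.
            lots.appendleft((qty - used, price))
        remaining -= used
    return profit


# Made-up trades: buy 10 @ 100, buy 10 @ 120, then sell 10 @ 130.
buys = [(10, 100), (10, 120)]
average_result = realized_profit_average(buys, 10, 130)  # 200.0
fifo_result = realized_profit_fifo(buys, 10, 130)        # 300
```

Same trades, two defensible answers. Until a policy doc says which rule applies, an agent has no way to know which number counts as "right".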
So next time, I laid the order down first
I took these five and, on the next project (the wiki), forced an order.
First, you need the publishing screens for the main features and a requirements file. Then docs-init lays the document structure down early, without going overboard (log and index each as a single file). Then docs-spec builds the standard docs: requirements, screen features, backend API policy, tech stack. You decide only the what and the why, not the how. Last, build-plan lays out the todos: sized to what one session can handle, with dependencies marked per unit, the parallelizable parts run in parallel, and auto vs plan modes separated.
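The dependency marking in that last step can be sketched with Python's standard `graphlib`: todos whose dependencies are all done form a batch that can run in parallel, and batches run one after another. The todo names are hypothetical, and this is only an illustration of the idea, not how build-plan itself works.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group todos into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)  # maps todo -> todos it depends on
    sorter.prepare()                  # raises CycleError on circular deps
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)                 # unblock whatever depended on them
    return batches


# Hypothetical todos for illustration.
plan = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "e2e": {"api", "ui"},
}
batches = parallel_batches(plan)
# [['schema'], ['api', 'ui'], ['e2e']]
```

Here only `api` and `ui` are safe to run side by side. Everything else is serial, and that is known before any session starts.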
Oddly enough, the places I got burned became the order itself. #1 and #3 (forcing apart what was one breath) went into build-plan: deciding ahead of time where to cut and where not to, the dependencies and the parallelism. #2 (the drifting docs) went into docs-init, keeping the structure shallow. #4 (the finish that wouldn’t close) went into docs-spec, standards first.
The effect was clear. The chaos went away. These days I develop in another session while shooting the breeze. Some of that is because I’m not at the finishing stage yet, sure — but it’s still far better than the start.
Automation itself wasn’t the problem. Taking something that was one breath, slicing it up fine, and then clinging to automation to stitch it back together — that was the problem.