A Token-Burning Experiment Became the Foundation of Atlas

AI generates, humans evaluate — what the template experiment left behind and what comes next

It started as an experiment to burn through leftover tokens.

Everything that came out of this series, summarized in one line: I had it freely generate topics, auto-stamp multiple versions, and tried to make it verify itself — then hit a wall. I learned that subjective evaluation criteria break the autonomous loop. So I handed judgment to humans. Via voting. Once preferences accumulate as data, the loop starts feeding back into the skill.

When public preferences accumulate from voting, what comes next becomes visible. Average preferences alone aren’t enough. People have different tastes.

So I’m adding a custom order feature to Templ. After picking a concept from the gallery you like, you request “make me something in this aesthetic.” You can request from a concept in competition, or request modifications on a completed gallery piece. Orders are paid; the output is delivered exclusively to the person who ordered.

A custom commission service is something I’m planning to add later. Right now there’s too much else on the plate.

But something only became visible after putting all this together.

This entire flow was the same as what Atlas app-creator was trying to do.

The flow Atlas app-creator envisions: user says “I want to build an app like this” → an example concept gallery appears → they pick what they like → expand from there into full detail. Topic → concept → selection → expansion.

What the template experiment did was exactly the same structure. Auto-generate requirements (auto-requirement), generate multiple concepts with different aesthetics from the same requirements (home-concepts), have people vote on them, and expand the selected one into full detail. Templ’s competition/voting structure is an expansion of Atlas’s “user selects a concept” stage — extended to the crowd.

The skills evolved this direction too. The pipeline that started as template-spec/gen/lint carried forward into the home-concepts and expand-concept skills. The lessons learned from the autonomous loop — only things that reduce to a checklist can be automated; design selection needs to be done by humans — became Atlas’s design principles as-is.

Even a lightweight experiment started with spare resources, if you run it all the way through, becomes the foundation of your next product. From a design experiment I started just to burn tokens, I learned two things.

One is how to split generation and evaluation. What machines can do and what humans have to do are different. I learned that boundary by bumping into it directly.

The other is that failure becomes an asset. The experience of self-verification not working left behind the design knowledge that “this kind of thing can’t be automated.” Without it, I would have made the same mistake again when building Atlas.

Every failure was part of the asset.