Method 2026.06.16 · 4 min read

I Couldn't Pick the Design, So I Let Users Vote

Outsourcing evaluation via home concept voting — and feeding that signal back into the generation loop


The previous post ended at this wall. Neither I nor AI can judge “is this a good design?” I don’t know design, and AI loses direction when scoring things it made itself.

So who does it? The people who’ll use it.

So I decided to build Templ. A site where you put up AI-generated concepts and visitors vote on them.

The structure is simple. For each brief (e.g. “sleep app”), you put up multiple concepts that share the same requirements but differ in visual style. Visitors can multi-vote, picking as many as they like. Within a brief, the first concept to reach 10 votes gets the detailed implementation treatment. First place is always built; second and third place are built too if they also cross 10 votes.
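As a rough illustration, the selection rule could be sketched like this. It is not Templ's actual code: the function name, the ballot format, and the alphabetical tie-break are all assumptions made for the example, and it reads the rule as "rank concepts by votes, then build up to the top three that have reached 10."

```python
from collections import Counter

VOTE_THRESHOLD = 10       # from the post: 10 votes triggers a detailed build
MAX_BUILDS_PER_BRIEF = 3  # first place, plus second and third if they qualify


def concepts_to_build(ballots):
    """Return the concept ids to build for one brief, best-ranked first.

    `ballots` is a list with one entry per visitor; each entry is the
    collection of concept ids that visitor picked (multi-vote).
    This shape is hypothetical, chosen only for the sketch.
    """
    tally = Counter()
    for ballot in ballots:
        # set() ensures a visitor counts at most once per concept
        tally.update(set(ballot))

    # Rank by votes (descending); break ties by id so the result is stable.
    ranked = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))

    # Only the top three are eligible, and each must reach the threshold.
    top = ranked[:MAX_BUILDS_PER_BRIEF]
    return [cid for cid, votes in top if votes >= VOTE_THRESHOLD]


if __name__ == "__main__":
    ballots = [{"calm", "neon"}] * 12 + [{"calm"}] * 2 + [{"paper"}] * 3
    print(concepts_to_build(ballots))
```

With these sample ballots, "calm" has 14 votes, "neon" has 12, and "paper" has 3, so only the first two qualify for a build.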

Concepts that receive detailed implementation are available for download in the gallery.

This structure solves several problems at once. I outsource the aesthetic judgment I can’t make myself to actual people. Only concepts that earned votes get the expensive detailed implementation, so nothing is wasted. Visitors get finished work for free, which gives them a reason to vote. And as data accumulates, patterns such as “this kind of aesthetic gets chosen” should emerge and feed back into the generation skill.

It’s a structure where voting signals flow back into the generation loop.
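One way that feedback could work, purely as a sketch: tag each concept with style labels, average the votes per label, and pass the strongest labels to the next generation round. The tags, function names, and averaging rule below are all hypothetical; the post does not specify how the signal is aggregated.

```python
from collections import defaultdict


def style_preferences(concept_tags, votes):
    """Average votes per concept for each style tag.

    concept_tags: dict of concept id -> list of style tags (hypothetical labels)
    votes:        dict of concept id -> vote count
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for cid, tags in concept_tags.items():
        for tag in tags:
            totals[tag] += votes.get(cid, 0)  # unvoted concepts count as 0
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


def top_styles(prefs, n=2):
    """Return the n tags with the highest average votes (ties broken by name)."""
    ranked = sorted(prefs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


if __name__ == "__main__":
    concept_tags = {"a": ["minimal", "dark"], "b": ["minimal"], "c": ["playful"]}
    votes = {"a": 12, "b": 4, "c": 2}
    prefs = style_preferences(concept_tags, votes)
    # These tags could then be appended to the next generation prompt.
    print(top_styles(prefs))
```

Averaging rather than summing keeps a style from winning simply because more concepts happened to use it. Whether that is the right normalization is something only real voting data can answer.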

Templ is in development now. There’s no voting data yet — at the time of writing, only the structure is confirmed. When real responses come in, I’ll be able to cover that in follow-up posts.


Using LLM scores to identify “good concepts” eventually hits a ceiling. No matter how sophisticated automatic scoring gets, it can’t substitute for “what people actually choose.” I ran experiments trying to stabilize the evaluator over multiple cycles, and this is where the conclusion landed: top-tier selection has to come from human and demand signals.

Don’t force automation in areas AI can’t evaluate. Hand it off to users, receive their judgments as data, and feed it back into the tool. When you outsource evaluation, the loop closes.

From the project
atlas-templates LAB
Users vote, and the winning template gets built.