The problem
Most prompt engineering today happens in a Notion doc and a hope. People paste prompts into ChatGPT, eyeball the output, tweak something, and call it done. There's no version control, no test set, no measurement, no way to know whether a change made things better or just different.
I wanted to build the workbench that prompt engineers actually need. Not another playground, not another LLM wrapper, but a real tool with the kind of feedback loops you'd expect from a software engineering environment.
The C.R.E.A.T.E. framework
The optimiser is the heart of the platform. It runs a four-phase flow: input, clarification, optimisation, output. You type a starting prompt or dictate it via Whisper. The system asks clarifying questions if your intent is ambiguous. Then the Auto-Optimiser runs the prompt against a CSV test set you provide and iterates until the metrics improve, or until it concludes that no further improvement is possible.
The Auto-Optimiser only works if it has something to measure against. So the workflow forces you to define success before you start iterating. That's the same principle as test-driven development, applied to prompts.
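To make the measure-then-iterate idea concrete, here is a minimal sketch of such a loop. It is not the platform's actual implementation: the names `TestCase`, `RunPrompt`, `Mutate`, `score`, and `optimise` are hypothetical, the metric is assumed to be a simple exact-match pass rate, and the model call is stubbed as a synchronous function to keep the example self-contained.

```typescript
// Hypothetical types for illustration; none of these names come from the platform.
type TestCase = { input: string; expected: string };
type RunPrompt = (prompt: string, input: string) => string; // stands in for a model call
type Mutate = (prompt: string, round: number) => string;    // proposes a revised prompt

// Assumed metric: fraction of test cases whose output exactly matches the expected value.
function score(prompt: string, cases: TestCase[], run: RunPrompt): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => run(prompt, c.input) === c.expected).length;
  return passed / cases.length;
}

// Greedy loop: keep a candidate only if it beats the best score so far, and
// stop after `patience` consecutive rounds without improvement.
function optimise(
  start: string,
  cases: TestCase[],
  run: RunPrompt,
  mutate: Mutate,
  maxRounds = 10,
  patience = 3,
): { prompt: string; score: number; rounds: number } {
  let best = { prompt: start, score: score(start, cases, run) };
  let stale = 0;
  let rounds = 0;
  while (rounds < maxRounds && stale < patience && best.score < 1) {
    rounds++;
    const candidate = mutate(best.prompt, rounds);
    const candidateScore = score(candidate, cases, run);
    if (candidateScore > best.score) {
      best = { prompt: candidate, score: candidateScore };
      stale = 0;
    } else {
      stale++;
    }
  }
  return { ...best, rounds };
}
```

The `patience` counter is one way to express "until it concludes that it can't": the loop gives up after a fixed number of rounds with no measurable gain, rather than running indefinitely.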
Certification, with anti-cheat
Four progressive levels, L1 to L4. Each unlocks a deeper part of the platform: L1 gets you basic access, L4 unlocks the sandbox where you can run agents against real models.
Exam questions are generated server-side for each session, and each session is hashed so that the same user is never shown the same questions twice. Certificates are publicly verifiable, which means employers and clients can actually trust them. The whole thing exists because prompt engineering is becoming a real skill, and the industry needs a way to vouch for that.
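One way to read the session-hashing idea is sketched below. This is an interpretation, not the platform's actual scheme: it assumes the server keeps a set of question IDs each user has already seen, and it seeds a deterministic shuffle from a hash of the user and session IDs. The helper names (`fnv1a`, `seededRandom`, `pickQuestions`) are hypothetical, and FNV-1a stands in for whatever hash is really used.

```typescript
// FNV-1a 32-bit hash; a stand-in for the real hash, chosen to avoid dependencies.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Small seeded generator (mulberry32-style), so one session always yields the same draw.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick `count` questions for a session, skipping anything this user has already seen.
function pickQuestions(
  userId: string,
  sessionId: string,
  pool: string[],
  alreadySeen: Set<string>,
  count: number,
): string[] {
  const fresh = pool.filter((id) => !alreadySeen.has(id));
  const rand = seededRandom(fnv1a(`${userId}:${sessionId}`));
  // Fisher-Yates shuffle driven by the seeded generator.
  for (let i = fresh.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [fresh[i], fresh[j]] = [fresh[j], fresh[i]];
  }
  return fresh.slice(0, count);
}
```

Because the draw depends only on the user and session IDs, the server can regenerate a session's questions on demand instead of storing them, while the seen-set filter is what prevents repeats across sessions.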
Agent Foundry
A visual no-code builder for autonomous ReAct-loop agents. Drag-and-drop nodes, typed pipelines, full observability with P50, P95, and P99 latency traces baked in from the start.
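For readers unfamiliar with the percentile figures, the sketch below shows how P50, P95, and P99 can be computed from raw latency samples with the nearest-rank method. The function names and the choice of nearest-rank are assumptions for illustration; the platform's tracing may use a different estimator.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so whole-number percentiles avoid float rounding.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summary of the three percentiles mentioned in the text.
function summarise(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

P50 describes the typical run, while P95 and P99 expose the slow tail, which is usually where a multi-hop agent loop becomes painful to use.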
Agent systems break in subtle ways. A tool returns a slightly different schema, an LLM hallucinates a key name, and three hops downstream the whole loop fails silently. End-to-end type safety means the compiler catches contract drift before runtime does.
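A minimal sketch of what a typed pipeline can look like, assuming each node declares its input and output types. `PipelineNode`, `connect`, `SearchResult`, and the two example nodes are hypothetical names, not the Agent Foundry API.

```typescript
// A pipeline node maps one typed payload to another.
type PipelineNode<In, Out> = { name: string; run: (input: In) => Out };

// Composition only type-checks when the first node's output type
// matches the second node's input type.
function connect<A, B, C>(
  first: PipelineNode<A, B>,
  second: PipelineNode<B, C>,
): PipelineNode<A, C> {
  return {
    name: `${first.name} -> ${second.name}`,
    run: (input) => second.run(first.run(input)),
  };
}

type SearchResult = { url: string; title: string };

// Stub tool node; a real one would call an external service.
const searchNode: PipelineNode<string, SearchResult[]> = {
  name: "search",
  run: (query) => [{ url: "https://example.com", title: `Result for ${query}` }],
};

const titlesNode: PipelineNode<SearchResult[], string[]> = {
  name: "titles",
  run: (results) => results.map((r) => r.title),
};

// If `searchNode` were changed to return `{ link: string }[]`, this call
// would fail to compile instead of failing several hops later at runtime.
const pipeline = connect(searchNode, titlesNode);
```

Compile-time checks of this kind cover contracts defined in code. Data that only exists at runtime, such as raw model output, would still need to be validated at the boundary before it enters a typed node.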
AI Council and Benchmarks
Run a prompt against multiple models in parallel and compare the outputs side by side. Add consensus scoring on top, where the models grade each other's responses. Add LLM-judge benchmarking across 12 standardised tasks, and you've got a full evaluation suite that doesn't depend on any one model being right.
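Consensus scoring can be sketched as a grade matrix in which each model scores every other model's response. The version below is an assumption about how such scores might be aggregated (a plain mean that ignores self-grades); the `Grades` type and `consensus` function are hypothetical.

```typescript
// grades[judge][author] is the score that `judge` gave to `author`'s response.
type Grades = Record<string, Record<string, number>>;

// Average the scores each response received from the other models,
// ignoring any score a model gave to its own response.
function consensus(grades: Grades): Record<string, number> {
  const totals: Record<string, { sum: number; n: number }> = {};
  for (const judge of Object.keys(grades)) {
    for (const author of Object.keys(grades[judge])) {
      if (judge === author) continue;
      if (!totals[author]) totals[author] = { sum: 0, n: 0 };
      totals[author].sum += grades[judge][author];
      totals[author].n += 1;
    }
  }
  const result: Record<string, number> = {};
  for (const author of Object.keys(totals)) {
    result[author] = totals[author].sum / totals[author].n;
  }
  return result;
}
```

Dropping self-grades is the design choice that matters here: it keeps a model from inflating its own ranking, so a response has to convince the other models to score well.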
Marketplace and Seasons
The social layer. An AI-moderated prompt marketplace where creators can list and sell their work. Competitive seasons with live leaderboards. Weekly challenges. Peer-review teams for quality control. Creator monetisation tiers that pay out based on usage, not just downloads.
This is the part that turns Prompt Architect from a tool into a community.
The type-safety thesis
Most of the platform's reliability comes from one architectural commitment: end-to-end type safety via tRPC 11 and Drizzle ORM. The API contract is checked by the TypeScript compiler at every boundary, from database to client. There's no place in the stack where the schema can drift without something flagging it.
For a system with 35+ features that all need to interoperate, this isn't a nice-to-have. It's the only way I could ship something this large solo and still trust it.
Where it's going
Prompt Architect is in active development. The core platform is stable. The marketplace and seasons are the current focus. Public launch will follow once the social layer is complete enough that early creators have something worth participating in.