MuggsOfCode

MUGGSOFCODE

// BUILD_TOOL //

A spec-driven code generation sandbox.

— or — DEMO THE TOOL · no account needed →

MuggsOfCode is a spec-driven code-generation sandbox built for a high-school CS course — but the thing it actually teaches is older and bigger than the course.

The skill is precision of intent. There are three ways people write code with AI: vibe coding (“just make me a thing”), AI-assisted coding (“finish the line I’m typing”), and spec-driven development (“here is exactly what I want — build to it”). The third is the oldest by decades and the most transferable, because what it trains isn’t “use this particular AI.” It’s express what you want clearly enough that anyone — a person, a model, or future-you — can produce the right thing from it. That skill outlives any single model, company, or year.

Vibe coding isn’t the enemy of that — it’s the on-ramp. The tool opens by asking you to choose: start in vibecode (describe it, watch it appear) or in SDD (write the blueprint first). Vibecode is where the instinct lives; SDD is the discipline it grows into. The prompt-craft you build talking loosely to a model is the raw material of a real specification. Neither path is framed as the lesser one.

The AI is a critic, never a co-author. It asks Socratic questions about your spec, flags vague language, and checks whether your own comments match the code — but it will not pick a design choice, write a comment, or fix your code. An AI that did the work for you would quietly delete the lesson. The guidance is advisory, never coercive, and the interface stays honest about what it does and doesn’t keep.

Smallness is a feature. The model runs on a single GPU in my living room, not a frontier API. A small local model has less slack, so a vague spec produces visibly weaker output — and that gap, between what you meant and what you got, is the whole lesson. It’s also nearly free to run, which is the other half of the idea: do the cheap, verbose scaffolding here, then carry a compact package to a bigger model’s free tier for the hard part. A kid with no budget can start something genuinely ambitious by spending borrowed tokens only where they actually matter.

Nothing is saved between sessions. Close the window and the chat is gone. What you carry out — the spec, the artifact — is the only thing that lasts. That isn’t a limitation; it’s the thesis, made literal.

Chats are disposable. Specs are durable.

A spec-driven code-generation sandbox for a high-school CS course. You write a specification in markdown; a small language model running on a local GPU compiles a single-file HTML page from it; you read the generated code, comment what you understand, and iterate. You enter through a choice of two paths — Vibecode (describe it, watch it build) or SDD (write the blueprint first) — and the SDD path opens into a four-tab workspace.
There are three primary ways AI assists in the writing of code: vibe coding (“just make me a thing”), AI-assisted coding (“complete what I’m typing”), and spec-driven development (“here’s a precise specification — build to it”). This tool trains the third.

Spec-driven development is the oldest of the three by decades and the most transferable to real engineering. Working software has been designed from specifications since the 1960s; what changed in 2022 is that the spec→code translation got cheap enough that the spec can be a living document instead of an archived one.

The skill you’re building isn’t “use AI to write code.” It’s express your intent precisely enough that anyone — a person, a model, or future-you — can produce the right thing from it. That skill transfers to any AI tool, any year, any model size.
The start screen

You pick a path. Vibecode is the fast, freeform first pass — describe it, watch it appear, refine by talking. SDD is the discipline that instinct grows into: write the blueprint first. Neither is framed as the lesser one.

Markdown Builder

Four panels — Project, Structure, Style, Behavior — that force you to decompose intent into the categories a webpage actually has. A START WITH dropdown loads starter templates with {{slots}} to fill in.

Look at the Code

The compiled HTML/CSS/JS, in the same VS Code Dark+ palette you’ll meet in a real editor. It’s framed as a compiled artifact: you read it and comment it, but you change the design by amending the spec, not by hand-patching the code.

Browser Preview

The rendered page — plus a Troubleshoot box. Describe a symptom (“the fonts don’t look right”) and the AI diagnoses the cause and helps you strengthen the spec, then you re-compile.

Export Project

Nothing is saved between sessions, so Export is the only durable memory: a self-contained .html that carries its own spec and QC inside it, the loose source files, and a compact handoff package for continuing on a bigger model.

Two optional QC checklists

Markdown QC and Preview QC slide in from the right — lenses you open when you want to inspect your own work, never gates that block you.
Students who do their best work on this tool do something before they ever open it: they draw the page on graph paper. Top to bottom. Header here, hero there, three cards in a row, a form, a footer. The drawing doesn’t have to be neat. The point is that the hardest decisions — what goes on the page, in what order, with what hierarchy — get made physically, with a pencil, before any spec gets typed.

When you then sit at the tool, the Structure panel is mostly transcription: read what you drew, top to bottom, into a numbered list. Project is what the drawing is FOR. Style annotates the visual choices you made on paper. Behavior notes the things you implicitly drew as clickable or moving.

Page mapping turns spec-writing from intimidating abstraction into reading-aloud-from-paper.
1. Pick a path — vibecode to explore, SDD to build deliberately.
2. Write a spec into the four panels (or load a template and fill the slots).
3. Click COMPILE →. The spec goes to the local model, which returns a single-file page.
4. Look at the Code — read it, and comment what each section does.
5. Browser Preview — compare what you got to what you intended. Spot gaps.
6. Something off? Troubleshoot it: the AI diagnoses the cause and proposes a spec strengthening you insert, then re-compile. You recompile from the source — you don’t patch the page.
7. Optionally walk the QC checklists.
8. Export — a self-contained .html, or a handoff package to continue on a bigger model’s free tier.
9. Your commented artifact is the submission.
Per-section // ASK AI //

Each spec section has its own ASK AI button. Returns 3–4 Socratic questions about that section only — aware of which template you picked and of any unreplaced {{slots}}.

Whole-spec // REVIEW //

Critiques the full four-section spec: vague language, missing decisions, contradictions between sections, scope mismatches.

// CHECK COMMENTS //

After you add your own comments to the generated code, this checks whether each comment is valid syntax for its language context and whether it actually describes the surrounding code. It will NOT write or rewrite your comments.

// TROUBLESHOOT //

On Browser Preview, describe a symptom and the AI diagnoses the cause and asks you a guiding question first — then, only if you ask, proposes a spec change you insert into your own blueprint. It helps you strengthen the spec; it never patches the code.

What the AI is forbidden from doing
- It will not write spec content for you.
- It will not pick design choices for you.
- It will not write or rewrite your comments.
- It will not fix or edit your code.
These are curriculum guardrails enforced in the system prompts. The AI is an interviewer, critic, and verifier — never a co-author.
This tool talks to a local LLM running on a GPU in a closet, not to a cloud API. Two reasons that matter: cost (a class hitting an API would burn through budget fast) and pedagogy (a small local model has less slack, so vague specifications produce visibly weaker output — the gap teaches precision).

One card does the work for this tool:
- NVIDIA GeForce RTX 4090 — Ada Lovelace architecture, 24 GB GDDR6X, 16,384 CUDA cores, ~1 TB/s memory bandwidth. The big-VRAM workhorse, dedicated to this tool so a full class can build at once.
It runs llama-server from llama.cpp, serving Qwen3.5-9B with Multi-Token Prediction (MTP) — a model that drafts several tokens ahead of itself in a single pass to generate faster. (The rig has a second card, an RTX 5070 Ti, but that one runs a different tool — this build tool is the 4090’s alone.)
Modern LLMs generate text one token at a time, each token a full forward pass — which makes large models slow.

Speculative decoding cheats that step: something proposes the next several tokens cheaply, and the full model verifies them in a single pass, keeping the ones that match its own prediction. Guess right and you collect several tokens for the cost of one.

There are two ways to do the guessing. The classic way pairs a small “draft” model with the big “target” model. The newer way — Multi-Token Prediction (MTP) — builds the draft into the model itself, extra prediction heads that share the model’s own work, so there’s no second model to run.

Newer isn’t automatically better, so we measured both on this tool’s actual job — generating a full page from a spec, with several groups building at once. MTP won clearly: it roughly doubled throughput under class load, and every group kept near solo-speed generation. The classic external-draft setup was actually slower than no speculation at all here — the second model cost more than it saved. So the tool runs MTP: not because it’s newer, but because on our hardware and our workload, the numbers said so.
v0.1 was the first usable version — spec writing, synthesis, code reading, comment-checking, auth and infrastructure. v0.2 (teacher-only pilot) added a five-stage loop with two mandatory QC audit stages.

v0.3 streamlined the whole tool. The horizontal loop became a lighter, tabbed workspace entered through a start screen (Vibecode or SDD). The two QC checklists became optional right-side slideouts — lenses, not gates. The code is framed as a compiled artifact: you read and comment it, but you change the design by amending the spec. The Repair / Troubleshoot loop turns a plain-language preview note into a proposed spec change you accept into your blueprint, then re-compile. And nothing persists between sessions — Export is the only durable memory.

Export also includes a handoff: a compact, token-lean package — your spec, the current artifact, and a continue-from-here instruction — meant to paste into a free tier of ChatGPT, Gemini, or Claude. You scaffold the project here on local hardware, then spend a bigger model’s limited free budget only on the hard part.

Still held for later: publish-to-GitHub for student-owned URLs, inline syntax-error markers, vision-LLM ingest of hand-drawn paper specs, and model upgrades as new open-weights releases arrive.
Built by Sean Muggivan as a teaching tool for high-school computer science. If you want access to try it, email sean@muggivanlcsw.me.

UNDER THE HOOD · how the AI actually writes your page →

The start screen

Markdown Builder

Look at the Code

Browser Preview

Export Project

Two optional QC checklists

Per-section `// ASK AI //`

Whole-spec `// REVIEW //`

`// CHECK COMMENTS //`

`// TROUBLESHOOT //`

What the AI is forbidden from doing