The way I write code with AI is that I start with a project.md file, where I des...

jumploops · 2026-03-02T08:03:27 1772438607

I do something similar, but across three doc types: design, plan, and debug

Design works similarly to your project.md file, but per feature request. I also explicitly ask it to outline open questions/unknowns.

Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).

The plan docs are structured like `plan/[feature]/phase-N-[description].md`
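For illustration, a minimal Python sketch of that naming scheme. The `slugify` and `plan_path` helpers, and the choice to lowercase and hyphenate names, are assumptions made for the example and not part of any tool mentioned here.

```python
import re
from pathlib import PurePosixPath


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def plan_path(feature: str, phase: int, description: str) -> PurePosixPath:
    """Build plan/[feature]/phase-N-[description].md (hypothetical helper)."""
    filename = f"phase-{phase}-{slugify(description)}.md"
    return PurePosixPath("plan") / slugify(feature) / filename


print(plan_path("User Auth", 1, "Add login form"))
# prints: plan/user-auth/phase-1-add-login-form.md
```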

From here, the agent iterates until the plan is "done", only stopping if it encounters some build/install/run limitation.

At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

We review these hypotheses, sometimes iterate, and then tackle them one by one.

An important note for debug flows: as with manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis before moving directly to a fix.
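A minimal sketch of what "instrument first, fix later" can look like, using Python's standard `logging` module. The `apply_discount` function and the hypothesis about fractional percentages are invented for the example; the point is only that the code records evidence for a hypothesis without changing behavior yet.

```python
import logging

logger = logging.getLogger("debug.hypothesis_1")


def apply_discount(price: float, percent: float) -> float:
    # Hypothesis 1 (hypothetical): some callers pass a fraction (0.2)
    # instead of a percentage (20). Log the inputs before attempting a fix.
    logger.debug("apply_discount price=%r percent=%r", price, percent)
    if 0 < percent < 1:
        logger.warning("H1 evidence: percent=%r looks like a fraction", percent)
    return price * (1 - percent / 100)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    apply_discount(100.0, 20)   # expected usage, only a debug line is logged
    apply_discount(100.0, 0.2)  # emits the H1 warning if the hypothesis holds
```

Once the logs confirm or rule out the hypothesis, the instrumentation can be removed or kept behind the debug level.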

Using this method has led to a 100% vibe-coded success rate both on greenfield and legacy projects.

Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

miki123211 · 2026-03-02T09:58:35 1772445515

My "heavy" workflow for large changes is basically as follows:

0. create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file etc.

1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.

2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.

3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.

4. Given the functional spec, figure out the technical architecture, same workflow as above.

5. High-level plan.

6. Detailed plan for the first incomplete high-level step.

7. Implement, manually review code until satisfied.

8. Go to 6.
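Steps 6 to 8 form a loop over the high-level plan. A rough sketch of the "first incomplete high-level step" lookup, assuming the plan is kept as a markdown checklist with `- [ ]` and `- [x]` items (the checklist format and the `first_incomplete_step` helper are assumptions for the example):

```python
import re
from typing import Optional

# Matches "- [ ] text" or "- [x] text" (also "*" bullets).
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def first_incomplete_step(plan_markdown: str) -> Optional[str]:
    """Return the text of the first unchecked item, or None if all are done."""
    for line in plan_markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


plan = """\
- [x] Add schema migration
- [ ] Wire up API endpoint
- [ ] Update CLI help text
"""
print(first_incomplete_step(plan))
# prints: Wire up API endpoint
```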

jedberg · 2026-03-02T08:28:22 1772440102

> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.

You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.

We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows

zknill · 2026-03-02T09:20:34 1772443234

Why an MCP? dbos already ships a cli that appears to have the same features. Why an MCP over a skill that gives context on using the cli?

https://docs.dbos.dev/python/reference/cli

jumploops · 2026-03-02T09:51:34 1772445094

> we launched both a skill and MCP server.

My guess is that the MCP was easy enough to add, and some tools only support MCP.

Personal opinion: MCP is just codified context pollution.

jumploops · 2026-03-02T08:41:29 1772440889

This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.

With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.

As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo"-like effect. Ended up going through 8 debug docs[0] until I was satisfied.

[0]https://github.com/jumploops/slop.haus/tree/main/debug

nubinetwork · 2026-03-02T10:43:34 1772448214

> giving agents access to logs (dev or prod) tightens the debug flow substantially.

Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...

frumiousirc · 2026-03-02T11:51:44 1772452304

> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.

Your current file-system "UI" vs Beads command line UI is obviously a big difference.

Beads provides a kind of conceptual bottleneck which I think helps when using it with LLMs. Beads is more self-documenting, while a file system can be "anything".

danenania · 2026-03-02T16:16:17 1772468177

I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.

Better imo is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than a bunch of meandering planning files.

wek · 2026-03-02T14:21:52 1772461312

Similar, but we have the agent write the test cases after writing the plan and then iterate until it passes the test cases.
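A rough sketch of that loop with a cap on the number of rounds. The `run_tests` and `revise` callables stand in for "run the test suite" and "ask the agent to revise the code"; both are placeholders for the example, not a real agent API.

```python
from typing import Callable


def iterate_until_green(
    run_tests: Callable[[], bool],
    revise: Callable[[int], None],
    max_rounds: int = 5,
) -> int:
    """Return how many revision rounds were needed; raise if tests never pass."""
    for round_no in range(max_rounds + 1):
        if run_tests():
            return round_no
        if round_no < max_rounds:
            revise(round_no)  # placeholder: hand the failures back to the agent
    raise RuntimeError(f"tests still failing after {max_rounds} revision rounds")


# Toy stand-ins: the "code" starts with two bugs and each revision fixes one.
state = {"bugs": 2}


def fix_one_bug(round_no: int) -> None:
    state["bugs"] -= 1


rounds_used = iterate_until_green(lambda: state["bugs"] == 0, fix_one_bug)
print(rounds_used)
# prints: 2
```

The cap matters in practice: without it, an agent that keeps making the same mistake will loop indefinitely.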

frank00001 · 2026-03-02T07:24:32 1772436272

Sounds like the spec driven approach. You should take a look at this https://github.com/github/spec-kit

kriro · 2026-03-02T12:05:03 1772453103

I basically use a spec-driven approach, except I only let GitHub Spec Kit create the initial md file templates and then fill them in myself instead of letting the agent do it. It saves a ton of tokens, is reasonably quick, and I actually know I wrote the specs myself and that they contain what I want. After I'm happy with the md file "harness" I let the agents loose.

The most frustrating issues that pop up are usually library/API conflicts. I work with Gymnasium or PettingZoo and RLlib or Stable-Baselines3. The APIs are constantly out of sync, so it helps to have a working environment where libraries and APIs are in sync beforehand.

jedberg · 2026-03-02T08:25:37 1772439937

Sort of, depending on if your spec includes technology specifics.

For example it might generate a plan that says "I will use library xyz", and I'll add a comment like "use library abc instead" and then tell it to update the plan, which now includes specific technology choices.

It's more like a plan I'd review with a junior engineer.

I'll check out that repo, it might at least give me some good ideas on some other default files I should be generating.

shinycode · 2026-03-02T08:00:49 1772438449

Thanks for the link! I’m very curious about their choices and methods, I’ll try it.

wolletd · 2026-03-02T07:36:59 1772437019

> 110 releases in 6 months

sethammons · 2026-03-02T10:52:25 1772448745

Almost a release per work day, esp. if you count standard holidays.
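The back-of-the-envelope arithmetic, assuming six months is about 26 weeks and about five public holidays fall in that span (both figures are rough assumptions):

```python
WEEKS = 26            # roughly six months
WEEKDAYS = WEEKS * 5  # 130 weekdays
HOLIDAYS = 5          # assumption: about five public holidays in that span
RELEASES = 110        # figure quoted above

per_weekday = RELEASES / WEEKDAYS
per_workday = RELEASES / (WEEKDAYS - HOLIDAYS)
print(f"{per_weekday:.2f} per weekday, {per_workday:.2f} per working day")
# prints: 0.85 per weekday, 0.88 per working day
```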

WXLCKNO · 2026-03-02T16:04:46 1772467486

or OpenSpec https://github.com/Fission-AI/OpenSpec/

I think it's much better

malloryerik · 2026-03-02T09:52:14 1772445134

Have you tried this? Review?

dmd · 2026-03-02T12:36:10 1772454970

https://github.com/obra/superpowers "brainstorming" is pretty much exactly this workflow, and it's great.

shinycode · 2026-03-02T07:58:04 1772438284

I also do that and it works quite well to iterate on spec md files first. When every step is detailed and clear, and all md files are linked to a master plan that Claude Code reads and updates at every step, it helps a lot to keep it on the guard rails. Claude Code only works well on small increments because context switching makes it mix and invent stuff. So working by increments makes it really easy to commit a clean session, and I ask it to give me the next prompt from the specs before I clear context. It always goes sideways at some point, but having a nice structure helps even me to do clean reviews and avoid 2h sessions that I have to throw away. It's much easier to adjust only what’s wrong at each step. It works surprisingly well.

nesarkvechnep · 2026-03-02T17:20:26 1772472026

By that time you would’ve written the code yourself, only better.

cortesoft · 2026-03-02T17:49:08 1772473748

I am sure this is partly tongue in cheek, but no, you can’t have written the code yourself in that amount of time. Would the code be better if you wrote it? Probably, depending on your coding skills.

But it would not be faster.

OP is talking about creating an entire project, from scratch, and having it feature complete at the end.

anbende · 2026-03-02T14:20:19 1772461219

Here’s how I do the same thing, just with a slightly different wrapper: I’m running my own stepwise runtime where agents are plugged into defined slots.

I’ll usually work out the big decisions in a chat pane (sometimes a couple panes) until I’ve got a solid foundation: general guidelines, contracts, schemas, and a deterministic spec that’s clear enough to execute without interpretation.

From there, the runtime runs a job. My current code-gen flow looks like this:

1. Sync the current build map + policies into CLAUDE|COPILOT.md

2. Create a fresh feature branch

3. Run an agent in “dangerous mode,” but restricted to that branch (and explicitly no git commands)

4. Run the same agent again (or a different one) another 1–2 times to catch drift, mistakes, or missed edge cases

5. Finish with a run report (a simple model pass over the spec + the patch) and keep all intermediate outputs inspectable

And at the end, I include a final step that says: “Inspect the whole run and suggest improvements to COPILOT.md or the spec runner package.” That recommendation shows up in the report, so the system gets a little better each iteration instead of just producing code.

I keep tweaking the spec format, agent.md instructions and job steps so my velocity improves over time.

To answer the original article's question: I keep all the run records, including the LLM reasoning and output, in a separate store, but they could live in the repo too. I just have too many repos and want it all in one place.

CompoundLoop · 2026-03-02T17:46:03 1772473563

What store do you use for your run records? A separate git repo? Or do you have some SQLite DB holding the records?

anbende · 2026-03-02T18:27:39 1772476059

Hi there. Right now they are going to a separate git repo, yes. Like this:

local-governor/epics/e-epics/e014-clinical-domain-model/runs/run-e014-01-ops-catalog-20260302-173907-244c82

- Attempts

+ Steps

  - Step 1

  - Step 2

  - ...

  - Step 13

job_def.yaml

job_instance.json

changes_final.patch

run_report.md

improvement_suggestions.md

local-governor is my store for epics, specs, run records, schemas, contracts, etc. No logic, just files. I want all this stuff in a DB, but it's easier to just drop a file path into my spec runner or into a chat window (vscode chat or cli tool), but I'm tinkering with an alt version on a cloud DB that just projects to local files... shrug. I spend about as much time on tooling as actual features :)
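For readers who want a similar layout, here is a small sketch that builds a run id shaped like the one above and lists the artifact paths under it. The id pattern (epic, sequence number, slug, timestamp, six hex characters) is inferred from that single example, and the `make_run_id` and `run_artifact_paths` helpers are hypothetical; the actual spec runner may work quite differently.

```python
import secrets
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

# File names taken from the listing above.
ARTIFACTS = [
    "job_def.yaml",
    "job_instance.json",
    "changes_final.patch",
    "run_report.md",
    "improvement_suggestions.md",
]


def make_run_id(epic: str, seq: int, slug: str, now: datetime,
                suffix: Optional[str] = None) -> str:
    """Build run-<epic>-<seq>-<slug>-<YYYYMMDD>-<HHMMSS>-<6 hex> (assumed pattern)."""
    suffix = suffix or secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return f"run-{epic}-{seq:02d}-{slug}-{now:%Y%m%d-%H%M%S}-{suffix}"


def run_artifact_paths(runs_dir: str, run_id: str) -> list[PurePosixPath]:
    """List where each artifact of a run would live under runs_dir."""
    return [PurePosixPath(runs_dir) / run_id / name for name in ARTIFACTS]


rid = make_run_id("e014", 1, "ops-catalog", datetime(2026, 3, 2, 17, 39, 7), suffix="244c82")
print(rid)
# prints: run-e014-01-ops-catalog-20260302-173907-244c82
```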

RHSeeger · 2026-03-02T15:20:29 1772464829

I do something similar:

- A full work description in markdown (including pointers to tickets, etc); but not in a file

- A "context" markdown file that I have it create once the plan is complete... that contains "everything important that it would need to regenerate the plan"

- A "plan" markdown file that I have it create once the plan is complete

The "context" file is because, sometimes, it turns out the plan was totally wrong and I want to purge the changes locally and start over; discussing what was done wrong with it; it gives a good starting point. That being said, since I came up with the idea for this (from an experience it would have been useful and I did not have it) I haven't had an experience where I needed it. So I don't know how useful it really is.

None of that ^ goes into the repo though; mostly because I don't have a good place to put it. I like the idea though, so I may discuss it with my team. I don't like the idea of hundreds of such files winding up in the main branch, so I'm not sure what the right approach is. Thank you for the idea to look into it, though.

Edit: If you don't mind going into it, where do you put the task-specific md files in your repo, presumably in a way that doesn't stack up over time and cause ... noise?

giancarlostoro · 2026-03-02T19:44:42 1772480682

This is how I used to use Beads before I made GuardRails[0]. I basically iterate with the model, ask it to do market research, review everything it suggests, and you wind up with a "prompt" that tells it what to do and how to work that was designed by the model using its own known verbiage. Having learned about how XML could be used to influence Claude I'm rethinking my flow and how GuardRails behaves.

[0]: https://giancarlostoro.com/introducing-guardrails-a-new-codi...

adam_patarino · 2026-03-02T13:44:55 1772459095

You check the plan files into git? Don’t you end up with dozens of md files?

I’ve been copying and pasting the plan into the linear issue or PR to save it, but keep my codebase clean.

thearn4 · 2026-03-02T14:59:06 1772463546

Yeah I had the same question. I suppose you could put the project+plan text into the commit message?

8note · 2026-03-02T19:42:35 1772480555

the real question is when peer feedback and review happens.

is making the project file collaborative between multiple engineers? the plan file?

i've tried some variants of sharing different parts, but it feels like it's almost wasted effort if the LLM then still goes through multiple iterations to get what's right; the original plan and project get lost a bit against the details of what happened in the resulting chat

the-grump · 2026-03-02T07:19:03 1772435943

Stealing this brilliant idea. Thank you for sharing!

jedberg · 2026-03-02T08:23:48 1772439828

I wish I could say I came up with it, but it's just a small variation on something I saw here on HN!

peyton · 2026-03-02T07:32:36 1772436756

For big tasks you can run the plan.md’s TODOs through 5.2 pro and tell it to write out a prompt for xyz model. It’ll usually greatly expand the input. Presumably it knows all the tricks that’ve been written for prompting various models.

winwang · 2026-03-02T16:26:07 1772468767

Interesting! I actually split up larger goals into two plan files: one detailed plan for design, and one "exec plan" which is effectively a build graph but the nodes are individual agents and what they should do. I throw the two-plan-file thing into a protocol md file along with a code/review loop.
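A build graph like that can be scheduled with a topological sort. A minimal sketch using Python's standard `graphlib` module; the task names and dependencies are invented for the example, and each batch holds tasks that could be handed to separate agents in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical exec plan: each key is an agent task, each value the tasks it depends on.
exec_plan = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"api"},
    "docs": {"api"},
    "review": {"ui", "docs"},
}

sorter = TopologicalSorter(exec_plan)
sorter.prepare()  # also raises CycleError if the plan has circular dependencies

batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    batches.append(ready)               # these could run as parallel agents
    sorter.done(*ready)

print(batches)
# prints: [['schema'], ['api'], ['docs', 'ui'], ['review']]
```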

odiroot · 2026-03-02T16:59:33 1772470773

How do you use your agent effectively for executing such projects in bigger brownfield codebases? It's always a balance between the agent going way too far into NIH vs burning loads and loads of tokens for the initial introspection.

matkoniecz · 2026-03-02T12:08:20 1772453300

I do the same, but put it as a comment on top of generated file.

(So far I have not used LLMs to generate code larger than fitting in one file.)

Overall idea is that I modify and tweak prompt, and keep starting new LLM sessions and dispose of old ones.

stackghost · 2026-03-02T06:49:53 1772434193

>I then iterate on that plan.md with the AI until it's what I want.

Which tools/interface are you using for this? Opencode/claude code? Gas town?

StrangeSound · 2026-03-02T06:51:04 1772434264

I find that Antigravity is really good for this. You can comment on the plan documents in-line.

d1sxeyes · 2026-03-02T11:43:36 1772451816

Best feature of Antigravity

anshumankmr · 2026-03-02T07:29:17 1772436557

While I have not committed my personal mind map, I just had Claude Code write it down for me. Plus I have a small Claude.MD and copilots-instructions.md that mention the various intricacies of what I am working on, so the agent knows to refer to those files.

jedberg · 2026-03-02T08:23:21 1772439801

I'm using the Claude desktop app and vi at the moment. But honestly I would probably do better with a more modern editor with native markdown support, since that's mostly what I'm writing now.

tlb · 2026-03-02T12:24:03 1772454243

Do you clear the file and use the same name for the next commit? Or create a new directory with a plan.md for each set of changes?

fhub · 2026-03-02T08:29:15 1772440155

I do something similar, but I get Claude to review Codex every step of the way and feed it back (or vice versa, depending on the day).

jedberg · 2026-03-02T08:34:52 1772440492

My next step was to add in having another LLM review Claude's plans. With a few markdown artifacts it should be easy for the other LLM to figure it out and make suggestions.

vorticalbox · 2026-03-02T11:46:37 1772451997