Most of the writing on spec-driven development that I see starts with a product requirements document that is assumed to be valid. But I doubt that's the case. If it were, you would've written about it, and probably would've involved agents in the research that goes into it. My gut feeling tells me there's much more emphasis on implementing the feature than on questioning whether it's relevant, feasible, and based on valid assumptions.
A long time ago I dealt with the on-site radio systems at a major oil refinery. It was interesting over the ten years or so that I worked for the company that provided them to see how their safety policies changed, and how other companies with a similar risk profile (like distilleries and whisky bonds) just plain didn't.
For example, they drastically changed how they view Permits to Work. Now a rigidly-enforced PtW would have prevented the Piper Alpha explosion - the permit was returned to the Permit Office, not actually looked at, and then someone assumed that since it had been handed back and the work was *supposed* to be complete, then the work *was* complete. Had they looked at the permit they'd have seen there was more to do and the isolations should remain in place.
Anyway, when I started doing stuff at that site, every permit required a rigorous Method Statement and Risk Assessment. Your RA would get rejected if you failed to mention every single item of PPE that you were required to wear on site, and your Method Statement would be rejected if it didn't describe the function and use of every last tool you planned to bring on site.
This was, frankly, fucking *stupid*. It took longer to write the RAMS and apply for the permit than it did to carry out most jobs, and if there was any deviation no matter how small (often because of other work you'd no way of knowing about) the whole thing would have to be stopped and relogged, with a new RAMS taking into account whatever was in your way. Someone's put scaffolding up near the aerial you want to replace? Well tough, you're not getting on site today with that permit!
They changed this about halfway through my time with my previous employer, to a "Risk-Assessed Permit", where you'd describe the risks around the specific tasks you needed to carry out and how you'd mitigate them.
Now your RAP would get rejected if you *did* put on lists of PPE. You're expected to use the correct PPE, you're expected to use your tools correctly and the correct tool for the job. Don't tell me that, just do that.
Hour-long meeting to go over the RAMS? No, five-minute "Toolbox Talk" - look up "Take Five" for some helpful guidelines - and if there are no screaming blockers, crack on, get the job done, get off site, get home. Safely.
Oh now there's some scaffolding right near the aerial you want to work on? Okay, ask the operator of that job if you're going to affect their work. Oh, they're letting you use the scaff to access the roof instead of bringing a boom lift on? Excellent, cross that off the RAP, bring it up at the Toolbox Talk, far safer that way isn't it?
I still do Take Five at work even though I'm predominantly working from home managing network equipment. If there are big complex changes to make or a major piece of work, we'll discuss it together. Anyone got any questions? Anyone see something that's going to blow it all up and cause a major outage? Okay, well, you know where I am if you need help. Crack on.
It's a great way to eliminate the kind of mistakes that lead to "I wish I was still bored" days.
Biased cause I work there, but that’s where software like Tactiq shines. We just added an MCP, and now the agent has access to the meetings when writing the plan.
Last week I had three meetings with three stakeholders, and the agent was able to gather everyone’s ideas and make sure they are all working together in the feature.
Personally, my process for that half is well-defined and agentic - just not circulated.
/understand - agents interrogate the problem
/huddle - Thinking panel turns it into a PRD - attacks the premise, PRDs regularly die here
/tm - claude-task-master breaks the survivor into a dependency graph
Nobody writes this half up because "agent talked me out of building it" demos worse than "agent built it".
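To make the /tm step concrete, here's a minimal sketch of what "breaking a PRD into a dependency graph" can look like. The task names are made up and this is not claude-task-master's actual format or API, just the general idea using Python's standard graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
tasks = {
    "design-schema": set(),
    "write-migration": {"design-schema"},
    "build-api": {"design-schema"},
    "build-ui": {"build-api"},
    "e2e-tests": {"build-api", "build-ui", "write-migration"},
}


def execution_order(graph):
    """Return tasks in an order where every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group tasks into batches whose members could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(ready_batches(tasks), start=1):
        print(number, batch)
```

The batches are the useful part: anything in the same batch has no dependency on its neighbours, so it could go to separate agents at the same time.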
Sorry I have to ask. How senior are you? The notion that I’d allow an agent to talk me out of something seems weird. In 99% of cases, it’s the other way around. Architecture is just not where they shine.
What’s your process? My experience matches yours, but then again I usually just give a few lines to codex. I imagine if I tried harder to give detailed specs as input, the agent would have a lot more room to spot flaws and kill the plan.
Usually when they push back, it’s for obvious reasons, things I already know and actively decided to ignore. They are trained on mediocre software, and it shows.
I like using voice input a lot, I get way more info out of my brain and into the context that way.
Process-wise, for bug fixes I usually just throw the ticket in and optionally some thoughts on how to fix it. But if I don’t know the cause, I let it write instrumentation tests until the bug is reproed, and then the fix is easy.
For new features in brownfield projects, I usually need to align with team members because our platforms are kept closely in sync. We iterate on what you could call a spec, which is just a mix of requirements, magic numbers we want, algorithms we’ve picked (often by vibing prototypes), and sometimes going very specific on parts that must be done right. E.g. for interfaces with other teams where there’s not yet a document describing them, we put that in the spec as well. We do use agents to shoot holes in those specs, and often they find inconsistencies. But architecturally, they seem to get caught up too much in what’s already in those specs, and personally I haven’t seen any worthwhile feedback that I’d have taken up.
Sometimes we use this spec to vibe a first draft. Often the draft is so good that it can be bent to our liking. Sometimes, it just serves as a reservoir of ideas, and the feature must be implemented (with assistance) by re-assembling the pieces differently.
LLMs can synthesize the domain knowledge so long as it's within their training data. At some point, blindly trusting the decisions they make becomes gambling.
There is this over-indexing in training data that I find quite problematic.
I've had really good results getting LLMs to read documentation and work off it. This is in domains probably sparsely represented in the training data.
In my experience, even the best frontier LLMs are very likely to make critical but subtle mistakes and false assumptions the more they're trying to one-shot the solution. One-shotting is a broad term, and what counts as it varies with the use case. You have great results with LLMs because you did the job of finding the right documentation, and more importantly, those who wrote the documentation both had a deep understanding of the domain and effectively compiled it into a coherent document. In other words, the more vetting, supervision, and research you do, the better the results. Of course, this doesn't mean doing the heavy lifting yourself. But the signal is key.
I doubt if this actually solves a real problem for humans or agents, especially in complex projects. It might help if the examples show scenarios where this tool and its commands could make a difference.
Lemme give you an example. Say you're working in a 100K-file TypeScript monorepo and you change a utility function that parses API responses. git diff tells you that you changed n lines in that function. What it doesn't tell you is which services, components, and tests actually depend on that function across the repo. You're left grepping for the function name, hoping nobody aliased the import or re-exported it through a barrel file. sem impact gives you that full downstream dependency list in seconds, so you know exactly what to review and test before you ship.
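For anyone wondering what "downstream dependency list" means mechanically, here's a rough sketch of the idea: invert the import graph and walk it outward from the changed module. The module names and the graph are invented for illustration, and this is not how sem is implemented, just the general technique.

```python
from collections import defaultdict, deque

# Hypothetical import graph: module -> modules it imports.
imports = {
    "utils/parseResponse": [],
    "utils/index": ["utils/parseResponse"],  # barrel file re-export
    "services/orders": ["utils/index"],      # imports via the barrel
    "services/users": ["utils/parseResponse"],
    "components/OrderList": ["services/orders"],
    "tests/orders.test": ["services/orders"],
    "components/Footer": [],                 # unrelated, should not show up
}


def reverse_graph(graph):
    """Invert the graph: module -> modules that import it."""
    dependents = defaultdict(set)
    for module, deps in graph.items():
        for dep in deps:
            dependents[dep].add(module)
    return dependents


def impacted_by(changed, graph):
    """Return every module that depends on `changed`, directly or not."""
    dependents = reverse_graph(graph)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(changed)  # only matters if the import graph has a cycle
    return sorted(seen)


if __name__ == "__main__":
    for module in impacted_by("utils/parseResponse", imports):
        print(module)
```

Note that services/orders never mentions parseResponse by file path, it only imports the barrel, which is exactly the case a plain grep tends to miss.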
I think some structure in commit messages is helpful, but not to the point where it gets in the way of effectively reflecting what the commit contains, why it was done, and any comments for future reference, e.g. potential regressions.
Generating tonnes of documentation is easy, but it easily goes out of date, and there can be so much of it that no one reads it. Ideally, code should be the single source of truth. Documentation should be generated dynamically and on request so that it doesn't go stale. The amount of detail and how far to dig in should be up to the end user.
I recall, back in the AmigaOS days, we kept the documentation inline with the code. E.g. you had your exported API function in a shared library's .c file, with the documentation right above it in a specially formatted comment. Easily kept in sync (code wasn't that convoluted back then either). Afterwards, during the build phase, the documentation (for other developers) was extracted on the fly. There was a fixed format/style and the documentation was compact and actually useful too.
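A small sketch of that extract-at-build-time idea, for the curious. The comment markers here ("/*** name" to open, "***/" to close) are a simplified convention I'm assuming for illustration, not the exact format the original AmigaOS tooling used.

```python
import re

# Assumed convention: a doc block opens with "/*** name" and closes with "***/".
DOC_BLOCK = re.compile(
    r"/\*\*\*\s*(?P<name>\S+)\s*\n(?P<body>.*?)\*\*\*/",
    re.DOTALL,
)


def extract_docs(source):
    """Map each documented name to the text of its doc block."""
    docs = {}
    for match in DOC_BLOCK.finditer(source):
        lines = []
        for line in match.group("body").splitlines():
            # Strip the leading " * " decoration from each comment line.
            lines.append(re.sub(r"^\s*\*\s?", "", line).rstrip())
        docs[match.group("name")] = "\n".join(lines).strip()
    return docs


# Hypothetical C source with one documented function.
SAMPLE = """
/*** mylib/OpenThing
 * NAME
 *   OpenThing -- open a thing by name
 * RESULT
 *   handle, or NULL on failure
 ***/
void *OpenThing(const char *name) { return 0; }
"""

if __name__ == "__main__":
    for name, text in extract_docs(SAMPLE).items():
        print(name)
        print(text)
```

Because the doc lives directly above the function, the extraction step has nothing to reconcile: whatever is in the source file at build time is what ends up in the generated docs.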
That could work, but there's still the chance that things could diverge if multiple people are working on a project and not everyone is as diligent as you. With LLM context windows continuously growing, agents should be able to scan the whole repository and even relevant repositories on the fly, provided they contain the truth and only what's necessary (i.e. minimal commentary).
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.