Day 11. That's when somebody finally noticed. A developer I know in Montreal had fed a beautifully structured spec into a coding agent the week before, the kind of spec that makes you nod as you scroll, every acceptance criterion in place, every edge case labeled, every behavior phrased as a clean "when this, then that." The agent returned fast, coherent, passing code three runs in a row. Then a compliance reviewer asked one question: where does the audit logging live? It didn't live anywhere. The spec never mentioned it. Nobody had.
So the spec wasn't sloppy. By every structural measure it was excellent, testable and traceable and unambiguous, the kind of artifact a spec-driven workflow is built to produce. The regulatory logging requirement was simply never discovered. And that gap, the distance between a well-written spec and a well-discovered one, is the thing the entire spec-driven development movement quietly assumes away.
Spec-driven development is having a moment, and it has earned it. GitHub's Spec Kit passed 90,000 stars and more than 8,000 forks by early May 2026, less than a year after launch (Visual Studio Magazine, May 2026). AWS shipped Kiro. BMAD and Tessl built whole methodologies on a single idea: the spec, not the code, is the durable source of truth. I am a fan of all of it. Genuinely. The movement gets something right that the industry has fumbled for two decades.
Then AWS published a number that should stop you cold. When the Kiro team ran their new Requirements Analysis feature across 35 internal projects covering more than 1,400 acceptance criteria, roughly 60% of first-draft requirements needed refinement before they could be safely implemented (The New Stack, May 2026). Sixty percent. Not the code. The requirements themselves, written by professionals, were flawed more often than not at first draft.
What did AWS Kiro actually find in 60% of requirements?
Inconsistency. Mostly the quiet kind. Kiro's Requirements Analysis is a genuinely clever piece of engineering, and it is worth understanding what it does and, more important, what it cannot do. The feature uses a large language model to translate plain-English requirements into formal logic, then hands that logic to a Satisfiability Modulo Theories solver, a deterministic automated-reasoning engine that has been maturing in academic computer science for roughly fifty years, to mathematically prove whether the requirements contradict each other or leave provable gaps.
The natural-language side runs on something called EARS, the Easy Approach to Requirements Syntax, a notation Alistair Mavin introduced to force requirements into shapes like "WHEN [trigger] THE SYSTEM SHALL [response]." Tidy. When the solver reports that a set of requirements is consistent, it isn't guessing the way a chatbot guesses; it has run an actual proof. That is a real advance over "looks fine to me," and I don't want to undersell it, because most teams have never had any mathematical check on their requirements at all.
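To make the shape concrete, here is a minimal sketch of recognizing the event-driven EARS pattern quoted above. It is my own illustration, not Kiro's parser: the names `EarsRequirement` and `parse_ears` are hypothetical, and it handles only the single "WHEN ... THE SYSTEM SHALL ..." form, not the rest of the notation.

```python
import re
from typing import NamedTuple, Optional


class EarsRequirement(NamedTuple):
    """One event-driven requirement split into its two parts."""
    trigger: str
    response: str


# Only the shape quoted in the article: WHEN <trigger> THE SYSTEM SHALL <response>.
# Hypothetical pattern for illustration; real EARS has more templates than this.
EARS_EVENT = re.compile(
    r"^\s*WHEN\s+(?P<trigger>.+?)\s+THE\s+SYSTEM\s+SHALL\s+(?P<response>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def parse_ears(line: str) -> Optional[EarsRequirement]:
    """Return the trigger and response, or None if the line is not in EARS shape."""
    match = EARS_EVENT.match(line)
    if match is None:
        return None
    return EarsRequirement(match.group("trigger"), match.group("response"))


structured = parse_ears("WHEN the user exports data THE SYSTEM SHALL write an audit record.")
freeform = parse_ears("The system should be fast.")
```

Here `structured` carries the trigger and response as separate fields, while `freeform` comes back as `None`. The point of the notation is exactly that split: a sentence that cannot be broken into a trigger and a response is not yet something a solver can reason about.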
Here's the boundary, though. The solver can only reason about requirements that exist in the document. It checks the requirements you wrote against each other. It is blindingly good at catching "requirement 14 contradicts requirement 31." It has nothing to say about the requirement nobody wrote, because nobody knew to ask. My Montreal friend's audit-logging gap would have sailed straight through Kiro's analysis, clean, because there was no contradictory clause to flag. The 60% Kiro caught is the visible 60%. The dangerous requirements are the ones that never entered the room.
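That boundary is easy to see in a toy version. The sketch below is not an SMT solver and not how Kiro works internally; it is a brute-force stand-in I am assuming for illustration, which tries every true/false combination of two made-up variables. The requirement labels and variable names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, bool]
Requirement = Tuple[str, Callable[[Assignment], bool]]


def is_consistent(variables: List[str], requirements: List[Requirement]) -> bool:
    """True if at least one truth assignment satisfies every written requirement."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(check(world) for _, check in requirements):
            return True
    return False


VARIABLES = ["user_exports_data", "approval_required"]

# The scenario under test plus two requirements that cannot both hold in it.
scenario = ("a user exports data", lambda w: w["user_exports_data"])
r14 = ("R14: WHEN a user exports data THE SYSTEM SHALL require approval",
       lambda w: (not w["user_exports_data"]) or w["approval_required"])
r31 = ("R31: WHEN a user exports data THE SYSTEM SHALL NOT require approval",
       lambda w: (not w["user_exports_data"]) or (not w["approval_required"]))

conflicting_draft = is_consistent(VARIABLES, [scenario, r14, r31])  # False
cleaned_draft = is_consistent(VARIABLES, [scenario, r14])           # True
```

The cleaned draft passes, and nothing in it mentions audit logging. The checker has no variable for it, so it has no way to object. A clean result means "what you wrote does not contradict itself," nothing more.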
Why does spec-driven development still fail at the discovery layer?
Because SDD solves a different problem than the one most teams actually have. Worth being precise here. Spec-driven development, at its core, fixes two very real and very painful failure modes of AI coding agents: intent drift, where the agent slowly wanders from what you asked over a long run, and context decay, where it forgets earlier decisions as the context window fills. Anchoring the agent to a persistent spec fixes both. That is the whole value proposition, and it delivers.
None of that touches discovery. Think about the sequence. Discovery is where a human figures out what the system actually needs to do: which regulations apply, which stakeholders disagree, which "obvious" assumption is about to blow up, which legacy system nobody mentioned. Then the spec captures that understanding. Then the agent executes the spec. SDD makes steps two and three reliable. It says nothing about step one, the step where the audit-logging requirement either gets surfaced or doesn't.
I'll push my own argument a little, because it would be too neat otherwise. Don't a good spec template and a few sharp prompts nudge people toward better discovery? A bit, sure. Forcing an author to write "WHEN the user exports data THE SYSTEM SHALL..." does make them think about the export case. But a template can only prompt for the categories you already imagined. It cannot prompt you to consider a regulation you have never heard of, or a downstream team you didn't know consumed your data. Structure helps you organize what you know. It does almost nothing to surface what you don't.
What are the three gaps SDD tools cannot fix?
Three of them, in my experience, and they share a root. None of them is a flaw in the tools. They are flaws in what the tools are pointed at. Here they are, the three places a spec-driven pipeline goes quietly wrong, drawn from twenty-five years of watching specs land on developers' desks.
The missing requirement. This is the audit-logging story, and it is the deadliest of the three. A requirement that was never discovered cannot be written, cannot be specified, and cannot be checked by any solver on earth. It surfaces in production, or in an audit, or in a lawsuit. Kiro proves the requirements you have are consistent. It is silent on the one you don't have.
The false-consensus requirement. Three stakeholders read "the system shall support real-time inventory updates" and nod. They agree. Except one means sub-second, one means within five minutes, and one means "before the customer notices," and the spec, structurally flawless, encodes all three readings at once into a single clean line. The solver sees one consistent requirement. The humans saw three. That ambiguity ships.
The unowned trade-off. Every real system forces choices nobody wants to make explicit: speed versus cost, flexibility versus security, this customer segment versus that one. A spec written without an owner for those trade-offs defaults them silently, usually to whatever was easiest to phrase. Then an agent implements the default at machine speed, and you discover the decision you never made when the bill arrives. I have watched this one play out more times than I can count.
Give AWS real credit here, because the Kiro story is genuinely encouraging. When Amazon faced scrutiny over the reliability of AI-generated code, the Automated Reasoning Group's answer wasn't "add more AI." It was math. They reached for an SMT solver, the same family of provably correct logic engines that has been maturing since the 1970s, and wired it into the spec workflow so that "these requirements are consistent" becomes a proof rather than an opinion.
AWS's own applied scientists framed the stakes plainly in the Kiro write-up: a vague prompt produces a vague spec, and the agent implementing that spec produces code full of undisclosed decisions made on your behalf, without your awareness or agreement. That is exactly the right diagnosis. The honesty of it is what makes the feature trustworthy, and it is rare for a vendor to say the quiet part out loud.
Here is the part worth sitting with. Kiro proves consistency among the requirements that exist. Across 35 projects, 60% of those first drafts still needed work, and that is the share the solver could see. The undisclosed decisions that scare me most are the ones upstream of the solver entirely: the requirement nobody wrote, because nobody in the room knew it belonged there. AWS built a brilliant checker for the spec. The discovery that fills the spec is still on us.
How does requirements intelligence fill the gap before the spec?
By working one layer upstream of where every SDD tool starts. This is the entire reason Specira exists, so take the bias as disclosed. Requirements intelligence is the discipline of running structured, multi-expert discovery before a single line of spec gets written: surfacing the hidden assumptions, the unasked regulatory questions, the stakeholder definitions that secretly conflict, and recording the trade-off decisions with an owner attached to each one.
Picture the three gaps again, but with discovery done first. The missing audit-logging requirement gets surfaced because a structured discovery pass asks, every time, "what compliance or logging obligations touch this data?" The false-consensus on "real-time" gets caught because discovery forces each stakeholder to define the term in numbers before it ever reaches a spec line. The unowned trade-off gets an owner because discovery refuses to let a choice that big default silently. Same Kiro. Same Spec Kit. Suddenly aimed at the right system.
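The "define the term in numbers" step is simple enough to sketch. This is an illustrative check under my own assumptions, not a Specira feature: `review_term`, the stakeholder labels, and the 2x tolerance are all hypothetical. Each stakeholder gives their meaning of a term in seconds, or `None` if they have not put a number on it yet.

```python
from typing import Dict, List, Optional


def review_term(term: str,
                answers_s: Dict[str, Optional[float]],
                max_ratio: float = 2.0) -> List[str]:
    """List the reasons a term is not yet safe to write into a spec line."""
    issues: List[str] = []

    # Anyone who answered in words instead of numbers has not defined the term.
    for who, value in answers_s.items():
        if value is None:
            issues.append(f"{who} has not defined '{term}' as a number")

    # If the loosest numeric reading is far from the tightest, agreement is an illusion.
    numbers = [value for value in answers_s.values() if value is not None]
    if len(numbers) >= 2 and max(numbers) > max_ratio * min(numbers):
        issues.append(
            f"'{term}' ranges from {min(numbers)}s to {max(numbers)}s across stakeholders"
        )
    return issues


# The three readings from the inventory example: sub-second, five minutes,
# and "before the customer notices" (no number yet).
real_time_issues = review_term(
    "real-time",
    {"stakeholder A": 1.0, "stakeholder B": 300.0, "stakeholder C": None},
)
```

With those inputs the review returns two issues: one stakeholder without a number, and a 300x spread between the other two. Neither would ever appear as a contradiction inside the spec, because the spec holds only the one agreed-upon word.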
I want to be fair to the counterargument, because smart people raise it. Isn't this just "do good business analysis," dressed up? Partly, yes, and I won't pretend otherwise. The discipline is old. What's new is that AI finally makes structured discovery fast enough to keep pace with agentic execution. Twelve weeks of discovery feeding a pipeline that ships in an afternoon was always going to break; the discovery has to compress too, and that compression, done without losing rigor, is the actual frontier. We wrote more about that in what requirements intelligence is and how to compress weeks of discovery into hours.
What does a discovery-to-spec workflow actually look like?
Discovery, then spec, then agent. In that order, with no skipping. The sequence is simple enough to put on a sticky note, and almost every team in 2026 is running it backwards, starting at the spec because that is where the shiny new tools start.
Here is the order that works. First, structured discovery surfaces and validates the requirement set, with conflicts resolved, assumptions logged, and trade-offs assigned to named owners. Second, that validated set flows into a spec-driven tool like Kiro or Spec Kit, which does what it is genuinely excellent at: formalizing the requirements, proving internal consistency, and generating clean, traceable acceptance criteria. Third, the coding agent executes a spec that is finally describing the right system, fast and without intent drift. Each layer does the job it is actually good at, and none of them is asked to cover for the one before it.
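One way to enforce that order is a gate between the first and second steps. The sketch below is an assumption of mine about what such a gate could check, not an API from Kiro, Spec Kit, or any other tool; `DiscoveryRecord`, `TradeOff`, `blockers`, and `ready_for_spec` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TradeOff:
    description: str
    decision: Optional[str] = None  # what was chosen, if anything
    owner: Optional[str] = None     # the named person accountable for the choice


@dataclass
class DiscoveryRecord:
    requirements: List[str]
    open_conflicts: List[str] = field(default_factory=list)
    trade_offs: List[TradeOff] = field(default_factory=list)


def blockers(record: DiscoveryRecord) -> List[str]:
    """Reasons this record should not be handed to a spec tool yet."""
    found: List[str] = []
    if not record.requirements:
        found.append("no requirements were discovered")
    for conflict in record.open_conflicts:
        found.append(f"unresolved conflict: {conflict}")
    for trade_off in record.trade_offs:
        if trade_off.decision is None:
            found.append(f"undecided trade-off: {trade_off.description}")
        elif trade_off.owner is None:
            found.append(f"trade-off has no owner: {trade_off.description}")
    return found


def ready_for_spec(record: DiscoveryRecord) -> bool:
    """The spec layer runs only when discovery has nothing left open."""
    return not blockers(record)


draft = DiscoveryRecord(
    requirements=["WHEN the user exports data THE SYSTEM SHALL write an audit record"],
    trade_offs=[TradeOff("speed versus cost", decision="favor cost")],
)
```

Here `ready_for_spec(draft)` is false because the speed-versus-cost decision has no owner. Assigning one clears the gate. The check itself is trivial; the value is in refusing to start the spec tool while anything on the list is still open.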
The order is the whole point
Spec-driven development is real, and you should adopt it. Kiro, Spec Kit, BMAD, and Tessl are not the problem. The problem is pointing a flawless execution layer at requirements that were never discovered, and then being surprised when the code is wrong on schedule.
Fix discovery first. Put the spec layer downstream of it. Let the agent run last. A team that gets the order right turns the same tooling everyone else has into a durable advantage, and a team that gets it backwards just ships the wrong thing faster than ever. This is the same thread running through our work on the 29x rule and why AI agents still can't ask the right questions.