The Agentic Software Development Process
Most teams are using AI agents for one phase of development. Here's what happens when you use them for all eight.
The conversation about AI coding tools is stuck on one question: "which agent writes the best code?" That's the wrong question. Code generation is one phase of software development. There are seven others, and most of them matter more.
I've spent the last year building with AI agents across the full development lifecycle. Not just implementation. Requirements gathering, architecture design, specifications, testing, QA verification, deployment, and maintenance. Each phase has its own tools, its own failure modes, and its own hard-won lessons.
Here's the process, phase by phase.
1. Why Have Requirements Become the New Bottleneck in AI-Assisted Development?
A year ago the hard part was writing code. Now the hard part is knowing what to build. AI agents are shockingly good at implementation but terrible at reading your mind. Atlassian said it bluntly: "Shipping the wrong feature quickly is worse than shipping the right feature slowly." Every gap in your requirements is an invitation for the agent to guess, and the probability of all those guesses being correct approaches zero in any real system.
The fix: requirements as conversations, not documents. Tools like ChatPRD (used by 100K+ product managers), Kiro (from AWS), and GitHub's spec-kit turn requirements gathering into an interview where the AI asks the questions you'd miss. MCP servers connect agents directly to Jira, Linear, and structured story systems so requirements are live data, not stale documents.
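Here's a minimal sketch of the interview pattern, assuming the official `anthropic` Python SDK and an API key in the environment. The model name and prompts are placeholders, and tools like ChatPRD implement far richer versions of the same loop:

```python
# Hypothetical sketch: requirements gathering as an interview, not a document.
# Assumes the `anthropic` SDK and ANTHROPIC_API_KEY; the model name is a placeholder.
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "You are a requirements interviewer. Ask one clarifying question at a "
    "time about users, edge cases, and constraints. Only after the user "
    "types DONE, emit the final PRD."
)

history = []
while True:
    user_input = input("> ")
    history.append({"role": "user", "content": user_input})
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; use your model
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    print(text)
    if user_input.strip() == "DONE":  # the final reply is the PRD
        break
```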
Full article: Bad Requirements Are Why Your AI Agent Writes Bad Code
2. How Big Is the Quality Gap Between AI-Only Code and Human-Guided Architecture?
BSWEN measured a 26-point quality gap between AI-only code and human-guided architecture (59/105 vs 85/105). The entire gap comes from design dimensions: async design, abstraction quality, maintainability, code organization. AI handles implementation fine when given clear architectural direction. The problem is nobody provides it.
The emerging winner is the modular monolith with bounded contexts. Shopify proved it at 2.8M lines. OpenAI proved it with their layered architecture (3 engineers, ~1M lines, 5 months). The pattern: keep each unit of work small, bounded, and clearly contracted. The agent stays in the quality zone when it doesn't have to hold the whole codebase in context.
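What a bounded context means in practice: each module exposes a small, typed contract and hides everything else. A minimal Python sketch, with names that are illustrative rather than taken from Shopify's or OpenAI's codebases:

```python
# A bounded context as a small typed contract the agent can hold in its
# context window, instead of the whole codebase. Names are illustrative.
from dataclasses import dataclass, replace
from typing import Protocol

@dataclass(frozen=True)
class Invoice:
    id: str
    customer_id: str
    amount_cents: int
    paid: bool = False

class BillingContext(Protocol):
    """The only surface other modules (or an agent) may program against."""
    def create_invoice(self, customer_id: str, amount_cents: int) -> Invoice: ...
    def mark_paid(self, invoice_id: str) -> Invoice: ...

class InMemoryBilling:
    """One concrete implementation; the internals are free to change."""
    def __init__(self) -> None:
        self._invoices: dict[str, Invoice] = {}

    def create_invoice(self, customer_id: str, amount_cents: int) -> Invoice:
        inv = Invoice(f"inv_{len(self._invoices)}", customer_id, amount_cents)
        self._invoices[inv.id] = inv
        return inv

    def mark_paid(self, invoice_id: str) -> Invoice:
        paid = replace(self._invoices[invoice_id], paid=True)
        self._invoices[invoice_id] = paid
        return paid
```

An agent asked to change billing needs only the protocol and one implementation in context, which is the whole point of the pattern.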
Full article: Architecture for AI Agents: Which Patterns Actually Work?
3. Why Are Specifications Now More Valuable Than the Code Itself?
Code is no longer the asset. The specification is. When agents generate code at near-zero cost, the thing that constrains, directs, and validates that code becomes the primary deliverable.
A GitClear study of 211M lines found that after AI adoption, refactoring dropped 60% and duplication rose 48%. The VibeContract paper calls it the "illusion of correctness" - code that compiles and passes superficial tests but doesn't do the right thing.
Spec-driven development tools (Kiro, spec-kit, Tessl, cc-sdd, BMAD-METHOD) all converge on the same workflow: define requirements, design architecture, decompose into tasks, generate code, validate against the spec. The key insight from Augment Code: debug specifications, not code.
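To make "debug specifications, not code" concrete, here's a toy sketch of a spec as the primary artifact, with acceptance criteria the generated code is validated against. The structure is illustrative, not Kiro's or spec-kit's actual format:

```python
# Sketch: the spec is the deliverable; generated code is validated against it.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Spec:
    name: str
    requirement: str
    acceptance: list[tuple[str, Callable[[], bool]]] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return the criteria the implementation fails, if any."""
        return [desc for desc, check in self.acceptance if not check()]

# The generated implementation under test:
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

spec = Spec(
    name="slugify",
    requirement="Turn a post title into a URL-safe slug.",
    acceptance=[
        ("lowercases input", lambda: slugify("Hello World") == "hello-world"),
        ("collapses whitespace", lambda: slugify("a  b") == "a-b"),
    ],
)

# When this reports failures, fix the requirement or the criterion and
# regenerate; don't hand-patch the generated function.
print(spec.validate() or "spec satisfied")
```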
Full article: Your AI Agent Is Only as Good as Your Spec
4. Why Are Developers Slower With AI Tools Despite Feeling Faster?
Everyone's talking about AI code generation like it's solved. Copilot has 20M users. Cursor hit $2B ARR. 41% of code is now AI-generated.
The paradox: METR ran a rigorous study and found experienced developers are 19% slower with AI tools - but they perceive themselves as 20% faster. Vibe coding produces 2.74x more security vulnerabilities. Gartner predicts a 2,500% defect increase by 2028 from prompt-to-app approaches. One CTO's numbers: 93% of developers use AI, but the real productivity gain is about 10%.
The skill that matters isn't prompting. It's harness engineering: convention files, test commands, hooks, verification pipelines. The agent loop is commodity. What wraps around it is the differentiator.
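A harness can start as a gate script the agent must pass before any change lands. A sketch, assuming a Python project with ruff, mypy, and pytest; swap in your own stack:

```python
# Sketch of a verification harness run after every agent change.
# Tool names are examples; substitute whatever your project uses.
import subprocess
import sys

CHECKS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "src"]),
    ("tests", ["pytest", "-q"]),
]

def main() -> int:
    for name, cmd in CHECKS:
        print(f"[harness] {name}: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"[harness] FAILED at {name}; reject the agent's change")
            return 1
    print("[harness] all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Wire this into a pre-commit hook or the agent's stop condition and the loop can no longer declare victory on code that doesn't pass.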
Full article: The Implementation Phase: AI Writes the Code, But Who's Actually Driving?
5. What Is the Self-Confirming Loop in AI-Generated Tests?
When the same AI writes your code and your tests, you don't have tests — you have a mirror. The agent misunderstands a requirement, writes wrong code, then writes tests that confirm the wrong behavior. Tests pass, coverage looks great, and the app is broken. CodeRabbit analyzed 470 PRs and found that AI code produces 1.7x more issues, 8x more performance problems, and 1.5-2x more security vulnerabilities. The tests didn't catch any of it.
The fix: TDD where the agent is told which specific tests to check (not just "do TDD" - that makes things worse). BDD specs from acceptance criteria so tests come from requirements, not implementation. The cassette pattern for non-deterministic LLM output. And Meta's JiTTests concept: ephemeral tests per PR, discarded after they run.
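The cassette pattern deserves a sketch, since it's the least familiar of the four. The idea: record the model's live output once, then replay it in tests so non-deterministic generation doesn't flake the suite. `call_model` below stands in for whatever client you actually use:

```python
# Sketch of the cassette pattern: record once, replay deterministically.
import hashlib
import json
from pathlib import Path

CASSETTES = Path("tests/cassettes")

def cassette(call_model):
    def wrapped(prompt: str) -> str:
        CASSETTES.mkdir(parents=True, exist_ok=True)
        key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
        tape = CASSETTES / f"{key}.json"
        if tape.exists():                      # replay: deterministic, offline
            return json.loads(tape.read_text())["response"]
        response = call_model(prompt)          # record: one live call
        tape.write_text(json.dumps({"prompt": prompt, "response": response}))
        return response
    return wrapped

# usage: ask = cassette(my_live_model_call); reply = ask("summarize ...")
```

Delete the cassette when the prompt changes and it re-records on the next run; the same disposable-artifact spirit as Meta's JiTTests.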
Full article: Testing AI-Generated Code: The Self-Confirming Loop and How to Break It
6. Why Do 96% of Developers Distrust AI Code but Ship It Anyway?
96% of developers don't trust AI code but commit it anyway. 50% don't verify before committing. DORA found that for every 25% increase in AI adoption, delivery stability decreases ~7.2%.
Anthropic discovered agents mark features complete without end-to-end verification. The fix: browser automation (Playwright MCP, Puppeteer) where the agent tests as a user would. Cursor's cloud agents record video of themselves using the app and attach it to the PR. Model-as-judge patterns where a separate model evaluates the output.
The testing pyramid is inverting. E2E tests verify behavior. Unit tests verify implementation details the agent can regenerate at will. When code is disposable, behavior is what matters.
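Here's what browser-level verification looks like in miniature, using Playwright's Python API. The URL and selectors are hypothetical; point them at your own running app:

```python
# Sketch of end-to-end verification against the running application.
# Selectors and URLs are hypothetical.
from playwright.sync_api import sync_playwright

def verify_signup_flow(base_url: str = "http://localhost:3000") -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f"{base_url}/signup")
        page.fill("#email", "qa@example.com")
        page.fill("#password", "hunter2-but-longer")
        page.click("button[type=submit]")
        # Verify behavior the user sees, not implementation details.
        page.wait_for_url(f"{base_url}/welcome")
        assert "Welcome" in page.content()
        browser.close()

if __name__ == "__main__":
    verify_signup_flow()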
Full article: The Verification Gap: Why Agents Ship Broken Code
7. How Do You Teach AI Agents to Deploy Without Giving Them Cloud Credentials?
78% of enterprises have AI agent pilots but under 15% reach production. The deployment gap is real. Agents build apps but can't ship them.
Two approaches: give agents direct infrastructure access through MCP servers (AWS alone publishes 66 of them), or teach agents about your infrastructure through knowledge files. I use the second: four markdown files documenting my Hetzner/Docker/Cloudflare/S3 stack. Claude reads them and generates deployment commands I review before running. No credentials exposed. The agent follows my documented decisions instead of making its own.
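A sketch of that review gate, under stated assumptions: the knowledge-file names are mine, and `ask_model` stands in for whatever client you use. The important part is that credentials never reach the model and nothing runs without a human's yes:

```python
# Sketch of the knowledge-file approach: the agent reads documented
# infrastructure decisions and proposes commands; a human approves each one.
import subprocess
from pathlib import Path

KNOWLEDGE = ["docs/hetzner.md", "docs/docker.md", "docs/cloudflare.md", "docs/s3.md"]

def build_prompt(task: str) -> str:
    context = "\n\n".join(Path(f).read_text() for f in KNOWLEDGE)
    return (
        f"{context}\n\nUsing ONLY the stack documented above, list the shell "
        f"commands to: {task}. One command per line, no commentary."
    )

def run_with_review(commands: list[str]) -> None:
    for cmd in commands:
        print(f"proposed: {cmd}")
        if input("run? [y/N] ").strip().lower() == "y":   # the human gate
            subprocess.run(cmd, shell=True, check=True)

# prompt = build_prompt("deploy the blog container behind Cloudflare")
# commands = ask_model(prompt).splitlines()   # ask_model: your client of choice
# run_with_review(commands)
```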
Full article: Teaching AI Agents to Deploy
8. Why Is Maintenance the Phase Where AI Agents Deliver the Most Proven Value?
60-80% of total software cost is maintenance. Always has been. And it's where agents deliver the most proven value because maintenance tasks have clear success criteria.
The baseline: assign a GitHub issue to Copilot, get a PR back. OpenAI's garbage collection pattern runs background agents that scan for drift and open cleanup PRs on a cadence. Anthropic's multi-agent code review dispatches specialized agents per issue class with <1% false positive rate.
The paradox: AI agents are generating technical debt faster than they're cleaning it up. Cumulative AI-introduced issues exceeded 110,000 by February 2026. The teams getting it right run cleanup agents at the same cadence as generation agents. Maintenance equilibrium doesn't happen by accident.
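A scheduled cleanup pass can start as simply as this sketch: scan for drift markers, then open a PR via the GitHub CLI. The heuristic is deliberately naive; real versions like OpenAI's dispatch an agent per issue class:

```python
# Sketch of a scheduled "garbage collection" pass over the codebase.
# Drift detection here is a naive marker scan; the cadence is what matters.
import subprocess
from pathlib import Path

MARKERS = ("TODO", "FIXME", "DEPRECATED")

def find_drift(root: str = "src") -> list[str]:
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if any(m in line for m in MARKERS):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits

def open_cleanup_pr(findings: list[str]) -> None:
    body = "Automated cleanup pass found:\n\n" + "\n".join(findings[:50])
    # Assumes a branch with the fixes has already been pushed.
    subprocess.run(
        ["gh", "pr", "create", "--title", "chore: scheduled cleanup",
         "--body", body],
        check=True,
    )

if __name__ == "__main__":
    drift = find_drift()
    if drift:
        open_cleanup_pr(drift)
```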
Full article: Maintenance: Where Agents Actually Earn Their Keep
Why Does the Development Process Matter More Than the AI Tools You Choose?
The tools change every month. The process doesn't. Define what to build, design how it fits together, specify the contracts, generate the code, test it from multiple angles, verify the running application, deploy with documented knowledge, and maintain with the same rigor you built with. Most teams are doing step 4 (implementation) and skipping the other seven — that's why AI-generated codebases fall apart at month three. The code was generated fine; everything around the code was missing. The agentic software development process isn't about finding the best AI coding tool. It's about building the full lifecycle so the tool has something to work with.
Frequently Asked Questions
What is the agentic software development process? The agentic software development process uses AI agents across all eight phases of software development: requirements, architecture, specifications, implementation, testing, QA verification, deployment, and maintenance. Most teams only use AI for implementation (phase 4) and skip the other seven, which is why AI-generated codebases tend to fall apart after three months.
Why do AI-generated codebases degrade after a few months? The code itself was generated fine, but everything around it was missing. Without structured requirements, architectural guidance, specifications, proper testing, and verification, teams accumulate hidden defects and lose understanding of their own systems. Augment Code documents a predictable decay pattern where integration challenges emerge in months 4-9 and debugging overwhelms teams beyond month 9.
Which phase of software development benefits most from AI agents? Maintenance delivers the most proven, measurable value because maintenance tasks have clear success criteria -- a bug is fixed or it is not, a test passes or it does not. Implementation gets the most attention but produces mixed results, with one rigorous study showing experienced developers are actually 19% slower with AI tools despite perceiving themselves as faster.
How do you prevent AI agents from shipping broken code? Build verification into every phase. Write specifications before generating code so you have something to validate against. Use TDD with explicit test targets rather than generic instructions. Run QA against the actual running application, not just unit tests. Use separate agents for code generation and code review to avoid the self-confirming loop where the same agent writes code and validates its own work.
What is harness engineering and why is it replacing prompt engineering? Harness engineering is the practice of building constraint and verification layers around AI coding agents -- convention files, test commands, linting, type checking, and pre-commit hooks. The agent loop itself is commodity technology shared across all tools. The harness is what separates productive sessions from failures, and it represents the primary skill shift for developers working with AI in 2026.