Naive BDD: The Tests Ran, the Tests Passed, the App Didn't Work
13 days, 51 stories, 50+ BDD specs. Most passed. The integrations didn't work. A Potemkin village of green tests over broken functionality.
Insights on structured AI development, avoiding LLM-generated technical debt, and building production-ready Phoenix applications.
AI coding agents contradict themselves on long tasks for a mechanical reason. Every new instruction deprioritizes every prior one. Here's the math.
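For intuition on that claim: under standard softmax attention (a simplified model; the article's own derivation may differ), appending k tokens with scores comparable to the existing n shrinks every prior token's weight without deleting any of them:

```latex
% Softmax weight on context token i, before and after appending k
% instruction tokens with comparable scores s_j:
\[
  \alpha_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}
  \qquad\longrightarrow\qquad
  \alpha_i' = \frac{e^{s_i}}{\sum_{j=1}^{n+k} e^{s_j}}
  \;\approx\; \frac{n}{n+k}\,\alpha_i
\]
```

No instruction is dropped; each one just loses share of a normalized budget, which is why late instructions quietly win.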
Module spec files gave me and the model the same definition of done. The unit tests passed. The features were still broken. Why.
Every long AI coding session collapses for the same mechanical reason. Here's the five-step workflow that holds against it.
BDD's Three Amigos applied to AI coding agents: confirm scenario titles before any code is generated, plus a sealed-boundary spec module that can't cheat.
Get the rules out of the chat and into files the model re-reads at position zero of every session. The first workflow that held against attention drift.
MemPalace, mem0, Letta, vendor memory features. Why contamination, opacity, and lock-in outweigh the convenience for coding work. What actually breaks.
Zep, Graphiti, mem0 graph mode, and the self-evolving knowledge base trap that exploded after Karpathy's LLM-Wiki gist. When graph memory pays off.
The most-starred memory system in AI coding has 87k+ stars and isn't actually memory. Why the CLAUDE.md conflation costs you wrong tool choices.
Markdown files in your repo beat MemPalace, mem0, and every dedicated memory system. Cline Memory Bank, Doug's journal, Claude Code auto-memory.
RAG is over-applied to coding. Why retrieval breaks for most coding work, and the failure modes nobody mentions: wrong retrieval, context rot.
Process Claude Code transcripts into durable memory. session-kit, claude-mem, claude-memory-compiler, autoDream. The under-discussed third leg.
My harness was too module-spec heavy and too light on product management. Now prototyping a Three Amigos process for better BDD specs.
I got specs wrong twice before getting them right. The journey from module specs to BDD specs to executable boundary testing for AI-generated code.
Spec means 13 different things in software. If you're doing spec-driven development with AI, most definitions are wrong. BDD specs are the one that verifies.
Claude Opus 4.7 migration guide: three breaking API changes, a stealth 35% cost increase via tokenizer, and what's actually better.
Five levels from prompt engineering to platform engineering. Where most developers are, where the leverage is, and how to level up.
Pull docs from compiled BEAM files, embed locally with Ortex, search with sqlite_vec, serve through MCP. No API calls. No network.
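The extraction stage is standard Elixir; only the embedding (Ortex) and search (sqlite_vec) stages need extra libraries. A minimal sketch of that first stage, where DocsExtractor is a hypothetical module name:

```elixir
# A minimal sketch of the extraction stage, assuming the target modules are
# compiled with docs. Only Code.fetch_docs/1 here is standard Elixir.
defmodule DocsExtractor do
  # Pull docstrings straight out of a compiled BEAM file - no network calls.
  def extract(module) do
    {:docs_v1, _anno, :elixir, _format, _moduledoc, _meta, entries} =
      Code.fetch_docs(module)

    # Keep only entries that carry an English docstring; :hidden and :none
    # entries fail the pattern match and are skipped.
    for {{kind, name, arity}, _anno, _sig, %{"en" => doc}, _meta} <- entries do
      %{kind: kind, name: "#{name}/#{arity}", doc: doc}
    end
  end
end

# DocsExtractor.extract(Enum) |> Enum.take(2)
```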
One-click screenshot capture in a LiveView feedback widget using html-to-image, colocated hooks, and presigned S3 uploads.
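On the server side, the presigned upload can be a single controller action. A hedged sketch assuming ExAws is installed and configured; MyAppWeb.FeedbackController, the bucket name, and the route are hypothetical:

```elixir
defmodule MyAppWeb.FeedbackController do
  use MyAppWeb, :controller

  # Hand the client a short-lived PUT URL so the html-to-image PNG goes
  # straight to S3 and never touches our server.
  def presign(conn, %{"filename" => filename}) do
    config = ExAws.Config.new(:s3)
    key = "feedback/#{System.unique_integer([:positive])}-#{filename}"

    {:ok, url} =
      ExAws.S3.presigned_url(config, :put, "feedback-screenshots", key,
        expires_in: 300
      )

    json(conn, %{upload_url: url, key: key})
  end
end
```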
The client isn't just a browser. Browser apps, PWAs, mobile apps, desktop apps - what they are, what they can do, and why it matters.
Does the toilet belong in the bathroom or the living room? A guide to putting the right parts of your app in the right place.
A plain-English guide to what code is, what JavaScript, React, Supabase, and Vercel actually are, and why your AI picked them.
A server is just a computer. Every service your AI signs you up for is someone else's computer. Here's why you're paying $50/month and how to pay $4.
Your app runs in two places. If you don't know which is which, your data will leak. A plain-English guide for vibe coders.
AI agents drift when they don't know what decisions you've already made. ADRs fix that. Here's how to write them for agent consumption.
Anthropic launched Managed Agents in public beta. They're literally selling the harness now. Here's what it is, what it costs, and why it matters.
Anthropic's next model leaked before they were ready. 93.9% SWE-bench claimed. Here's what's confirmed, what's speculation, and why the harness still matters.
Cursor 3 demoted the IDE in favor of an agent switchboard with 8 parallel workers, mobile control, and Pro at $20/mo. What works and what's marketing.
Progressive disclosure is a 30-year-old UX principle that solves AI agent context bloat. Practical patterns for skills, CLAUDE.md, and MCP tools.
A 26-point quality gap between AI-only code and human-guided architecture. Here's which patterns produce the best agent output, ranked.
Devs are 19% slower with AI but perceive themselves as 20% faster. Vibe coding has 2.74x more security vulns. Here's what the implementation phase actually looks like.
Bug fixes, dependency updates, security patches, tech debt. Maintenance is 60-80% of software cost and it's where agents deliver the most proven value.
96% of developers don't trust AI code but commit it anyway. The verification gap is the central problem. Here's how to close it.
The bottleneck moved from writing code to knowing what to build. Here's how AI is changing requirements gathering and why bad specs kill agent output.
Most teams use AI agents for one phase of development. Here's what the full lifecycle looks like across all eight phases, with data.
Refactoring dropped 60% and duplication rose 48% after AI adoption. The fix is spec-driven development. Here's how it works.
When the same AI writes your code and your tests, you don't have tests. You have a mirror. Here's how to break the loop.
AI agents can write code but can't deploy it. I close the gap with 4 markdown files instead of giving my agent cloud credentials.
Marketing isn't a content problem. It's a system problem. The Claude Code loop I built with Reddit MCP, GA4, Search Console, and 30 minutes a day.
Most developers treat their AI coding tool as one thing. It's five layers. Here's the framework that changes how you evaluate and build with them.
The agent loop is a while loop that changed software. Here's how tool use, context management, and ReAct turn a token predictor into a coding tool.
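Stripped of vendor detail, the loop fits in a dozen lines. A conceptual Elixir sketch, not any vendor's API; call_model/1 and run_tool/2 are hypothetical stubs standing in for a real model call and tool dispatch:

```elixir
defmodule AgentLoop do
  # Send context to the model; if it requests a tool, run it, append the
  # result to the context, and go around again until it answers.
  def run(messages) do
    case call_model(messages) do
      {:tool_call, name, args} ->
        result = run_tool(name, args)
        run(messages ++ [{:tool_result, name, result}])

      {:final, answer} ->
        answer
    end
  end

  # Stubs so the sketch runs: "call a tool" a few times, then finish.
  defp call_model(messages) when length(messages) > 3, do: {:final, "done"}
  defp call_model(_messages), do: {:tool_call, "read_file", %{path: "mix.exs"}}
  defp run_tool(_name, _args), do: "tool output"
end
```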
CLI, IDE, or cloud? Sandboxed or wide open? The environment determines what your AI coding agent can do. Here's why it matters more than you think.
OpenAI shipped 1M lines with zero manually written source. The secret wasn't the model. It was the harness - constraints, verification, lifecycle.
The model didn't write your code. It predicted tokens. Everything else is the harness. Here's why that matters more than benchmarks.
One agent hitting its ceiling? Multi-agent coordination is the next frontier. Here's what works, what doesn't, and why the demo-to-production gap is wide.
GitHub Copilot deep dive: $10/mo Pro tier, Coding Agent, 60M+ code reviews, Copilot Memory, and what Reddit developers actually think.
Aider deep dive: 50+ model support, 4.2x token efficiency vs Claude Code, best-in-class git integration, and what Reddit developers actually think.
Gemini CLI deep dive: 1,000 free requests/day, improving quality with 3.1 Pro, Jules async agents, and what Reddit developers actually think.
Codex CLI deep dive: open source Rust CLI, 2-3x token efficiency, 9,000+ plugins, and what Reddit devs actually think. Pricing, strengths, weaknesses.
Cursor deep dive: $2B ARR, Background Agents, MCP Apps, credit-based billing, and what Reddit devs actually think. Features, pricing, and assessment.
Your CLAUDE.md is settings. Your skills are libraries. Your hooks are middleware. Two activities, one progression.
Claude Code deep dive: highest-rated for code quality, Agent Teams, MCP ecosystem, and what Reddit developers actually think. Pricing and weaknesses.
The most-loved tool (Claude Code) is fully closed. The most-starred (OpenCode, 117K) is fully open. Analysis of 21 tools shows when to choose which.
Supermaven was acquired. Aide is sunsetting. Void went silent. Why AI coding tools die, what patterns predict failure, and which tools are at risk today.
A web server returning navigable markdown replaces CLAUDE.md stuffing, MCP proliferation, and filesystem sync problems.
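The server itself can be tiny. A minimal sketch using Plug.Router, with hypothetical routes and content; the point is that the agent follows links on demand instead of loading everything up front:

```elixir
defmodule DocsServer do
  use Plug.Router

  plug :match
  plug :dispatch

  # Serve plain markdown with links the agent can navigate.
  get "/" do
    body = """
    # Project docs
    - [Architecture](/architecture.md)
    - [Decisions](/decisions.md)
    """

    conn
    |> put_resp_content_type("text/markdown")
    |> send_resp(200, body)
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end
```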
Amazon's Kiro uses EARS notation. CodeMySpec uses BDD. Both bet that specs before code = better AI output. We tested both approaches.
Model Context Protocol is USB for AI agents. 1,000+ servers, adopted by Anthropic, OpenAI, Google. What MCP is, who supports it, and what it enables.
9 free and open-source AI coding tools compared. Gemini CLI is truly free. Aider and Cline match paid tools. When is BYOK cheaper than subscriptions?
Cursor 3 Glass, Windsurf's price hike, Zed's 1M context BYOK, Kiro's AWS Transform. Updated April 2026 comparison of pricing, philosophy, and fit.
How to write Claude Code skills that actually work. Real examples, common mistakes, and how skills differ from prompts, MCP servers, and hooks.
Claude Code accounts for 4% of GitHub commits. Gemini CLI hit 90K stars. The terminal won the AI coding war nobody expected. Here's why.
6 CLI coding agents compared: independent testing, pricing, and community sentiment. Refreshed April 2026 with Opus 4.7, Codex $100 tier, Gemini restrictions.
From autocomplete to fully autonomous development. A framework for understanding where you are with AI coding tools and where the real leverage is.
Unit tests and BDD specs verify pieces. QA verifies the running application - story QA, journey QA, and automated issue filing by AI agents.
Unit tests verify your code works. BDD specs verify your app does what users actually want. One scenario per acceptance criterion, traced to user stories.
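In ExUnit terms, the shape is one test per acceptance criterion under a describe block per story. A hedged sketch; MyApp.Checkout and its API are hypothetical:

```elixir
defmodule MyApp.CheckoutSpecTest do
  use ExUnit.Case, async: true

  # Story: "As a shopper, I can apply a discount code at checkout."
  describe "apply discount code" do
    test "AC1: a valid code reduces the order total" do
      assert {:ok, order} = MyApp.Checkout.apply_code(order_fixture(), "SAVE10")
      assert order.total < order_fixture().total
    end

    test "AC2: an expired code is rejected with a reason" do
      assert {:error, :expired} =
               MyApp.Checkout.apply_code(order_fixture(), "OLD10")
    end
  end

  defp order_fixture, do: %{total: 100}
end
```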
55 commits, 100K+ lines, 100+ QA issues caught in 5 active development days. How BDD specs and agentic QA verified a fuel card management platform.
How CodeMySpec verifies AI-generated code with a 7-stage validation pipeline, dirty tracking, BDD specs, and end-to-end QA journeys.
How we built a full-stack permission approval system for Claude Code that lets you approve tool calls from your phone with Web Push and Phoenix Channels.
Learn the architecture, planning, and process iteration that keep LLMs on track.
Write one design doc per code file to prevent architectural drift and keep LLMs on track.
Learn to design Phoenix contexts and vertical slice architecture to keep AI-generated code consistent.
Phoenix contexts provide self-contained modules, consistent patterns, and built-in testability that make them ideal for AI-assisted development.
A practical approach to using user stories for AI code generation. Keep LLMs focused on requirements, maintain living documentation, and avoid technical debt.
The best way to get reliable code from an LLM is tighter control and enforcement: predefined workflows, validation, and test-driven development.
Design-driven development adds explicit, reviewable designs that define component architecture before implementation begins.