Blog

Latest Articles

Insights on structured AI development, avoiding LLM-generated technical debt, and building production-ready Phoenix applications.

Naive BDD: The Tests Ran, the Tests Passed, the App Didn't Work

13 days, 51 stories, 50+ BDD specs. Most passed. The integrations didn't work. A Potemkin village of green tests over broken functionality.

Prompt and Pray: How I Started With AI Agents, and Why It Broke

AI coding agents contradict themselves on long tasks for a mechanical reason. Every new instruction deprioritizes every prior one. Here's the math.

Spec-Driven Development: Finally Useful, Still Not Executable

Module spec files gave me and the model the same definition of done. The unit tests passed. The features were still broken. Why.

BDD Attention Thesis: A Five-Step Workflow for AI Coding That Doesn't Drift

Every long AI coding session collapses for the same mechanical reason. Here's the five-step workflow that holds against it.

Three Amigos: The Gate That Was Missing

BDD's Three Amigos applied to AI coding agents: confirm scenario titles before any code is generated, plus a sealed-boundary spec module that can't cheat.

Write It Down: The First AI Coding Workflow That Actually Worked

Get the rules out of the chat and into files the model re-reads at position zero of every session. The first workflow that held against attention drift.

Dedicated Memory Stores: Most Marketing, Messiest Tradeoffs

MemPalace, mem0, Letta, vendor memory features. Why contamination, opacity, and lock-in outweigh the convenience for coding work. What actually breaks.

Graph and Structured Memory: Most Ambitious, Most Vaporware

Zep, Graphiti, mem0 graph mode, and the self-evolving knowledge base trap that exploded after Karpathy's LLM-Wiki gist. When graph memory pays off.

It's Not Memory If You Wrote It: The CLAUDE.md Confusion

The most-starred memory system in AI coding has 87k+ stars and isn't actually memory. Why the CLAUDE.md conflation costs you wrong tool choices.

Repo-Native Memory: The Boring Answer That Wins

Markdown files in your repo beat MemPalace, mem0, and every dedicated memory system. Cline Memory Bank, Doug's journal, Claude Code auto-memory.

Retrieval and RAG: The Category Everyone Reaches for First

RAG is over-applied to coding. Why retrieval breaks for most coding work, and the failure modes nobody mentions: wrong retrieval, context rot.

Transcript-Derived Memory: The Category Nobody's Writing About

Process Claude Code transcripts into durable memory. session-kit, claude-mem, claude-memory-compiler, autoDream. The under-discussed third leg.

Build In Public: The Three Amigos Problem

My harness was too module-spec heavy and too light on product management. Now prototyping a three amigos process for better BDD specs.

Why My Harness Produced Incomplete Apps (and What I Changed)

I got specs wrong twice before getting them right. The journey from module specs to BDD specs to executable boundary testing for AI-generated code.

What Is a Spec? The Most Overloaded Word in Software

Spec means 13 different things in software. If you're doing spec-driven development with AI, most definitions are wrong. BDD specs are the one that verifies.

Opus 4.7 Migration Guide: What Breaks, What's Better, What to Watch

Claude Opus 4.7 migration guide: three breaking API changes, a stealth 35% cost increase via tokenizer, and what's actually better.

The Skill Trajectory for Working with AI Agents

Five levels from prompt engineering to platform engineering. Where most developers are, where the leverage is, and how to level up.

How I Built a Local Embedding Pipeline in Elixir That Searches My Own Docs

Pull docs from compiled BEAM files, embed locally with Ortex, search with sqlite_vec, serve through MCP. No API calls. No network.

How the CodeMySpec Feedback Widget Captures Screenshots in LiveView

One-click screenshot capture in a LiveView feedback widget using html-to-image, colocated hooks, and presigned S3 uploads.

What Is the Client? (It's More Powerful Than You Think)

The client isn't just a browser. Browser apps, PWAs, mobile apps, desktop apps - what they are, what they can do, and why it matters.

What Is It and Where Does It Belong?

Does the toilet belong in the bathroom or the living room? A guide to putting the right parts of your app in the right place.

What Is Code, Actually?

A plain-English guide to what code is, what JavaScript, React, Supabase, and Vercel actually are, and why your AI picked them.

What Is a Server and Why Are You Paying For One?

A server is just a computer. Every service your AI signs you up for is someone else's computer. Here's why you're paying $50/month and how to pay $4.

Your App Has Two Halves and You Need to Know Which Is Which

Your app runs in two places. If you don't know which is which, your data will leak. A plain-English guide for vibe coders.

ADRs Are the Best Thing You Can Give Your AI Agent

AI agents drift when they don't know what decisions you've already made. ADRs fix that. Here's how to write them for agent consumption.

Anthropic Just Made the Agent Harness a Product

Anthropic launched Managed Agents in public beta. They're literally selling the harness now. Here's what it is, what it costs, and why it matters.

Claude Mythos: What We Know, What We Don't, and Why the Harness Still Matters

Anthropic's next model leaked before they were ready. 93.9% SWE-bench claimed. Here's what's confirmed, what's speculation, and why the harness still matters.

Cursor 3 Isn't an IDE Anymore. It's an Agent Switchboard.

Cursor 3 demoted the IDE for an agent switchboard with 8 parallel workers, mobile control, and Pro at $20/mo. What works and what's marketing.

What Is Progressive Disclosure and Why It Matters for AI Agents

Progressive disclosure is a 30-year-old UX principle that solves AI agent context bloat. Practical patterns for skills, CLAUDE.md, and MCP tools.

Architecture for AI Agents: Which Patterns Actually Work?

A 26-point quality gap between AI-only code and human-guided architecture. Here's which patterns produce the best agent output, ranked.

The Implementation Phase: AI Writes the Code, But Who's Actually Driving?

Devs are 19% slower with AI but believe they are 20% faster. Vibe coding has 2.74x more security vulns. Here's what the implementation phase actually looks like.

Maintenance: Where Agents Actually Earn Their Keep

Bug fixes, dependency updates, security patches, tech debt. Maintenance is 60-80% of software cost and it's where agents deliver the most proven value.

The Verification Gap: Why Agents Ship Broken Code and What to Do About It

96% of developers don't trust AI code but commit it anyway. The verification gap is the central problem. Here's how to close it.

Bad Requirements Are Why Your AI Agent Writes Bad Code

The bottleneck moved from writing code to knowing what to build. Here's how AI is changing requirements gathering and why bad specs kill agent output.

The Agentic Software Development Process

Most teams use AI agents for one phase of development. Here's what the full lifecycle looks like across all eight phases, with data.

Your AI Agent Is Only as Good as Your Spec

Refactoring dropped 60% and duplication rose 48% after AI adoption. The fix is spec-driven development. Here's how it works.

Testing AI-Generated Code: The Self-Confirming Loop and How to Break It

When the same AI writes your code and your tests, you don't have tests. You have a mirror. Here's how to break the loop.

Teaching AI Agents to Deploy: Knowledge Files vs. Direct Access

AI agents can write code but can't deploy it. I close the gap with 4 markdown files instead of giving my agent cloud credentials.

How I Do Marketing with Claude Code and MCP Tools

Marketing isn't a content problem. It's a system problem. The Claude Code loop I built with Reddit MCP, GA4, Search Console, and 30 minutes a day.

The Five Layers of an Agentic Coding System

Most developers treat their AI coding tool as one thing. It's five layers. Here's the framework that changes how you evaluate and build with them.

The Agent Layer: How AI Coding Tools Actually Work

The agent loop is a while loop that changed software. Here's how tool use, context management, and ReAct turn a token predictor into a coding tool.

The Environment Layer: Where AI Code Actually Runs

CLI, IDE, or cloud? Sandboxed or wide open? The environment determines what your AI coding agent can do. Here's why it matters more than you think.

The Harness Layer: Why the Wrapper Matters More Than the Model

OpenAI shipped 1M lines with zero manually written source. The secret wasn't the model. It was the harness - constraints, verification, lifecycle.

The Model Layer: What Your AI Coding Tool Actually Is (and Isn't)

The model didn't write your code. It predicted tokens. Everything else is the harness. Here's why that matters more than benchmarks.

The Orchestration Layer: Coordinating Multiple Agents

One agent hitting its ceiling? Multi-agent coordination is the next frontier. Here's what works, what doesn't, and why the demo-to-production gap is wide.

GitHub Copilot in 2026: Features, Pricing, Benchmarks, and Community Sentiment

GitHub Copilot deep dive: $10/mo Pro tier, Coding Agent, 60M+ code reviews, Copilot Memory, and what Reddit developers actually think.

Aider in 2026: Features, Pricing, Benchmarks, and Community Sentiment

Aider deep dive: 50+ model support, 4.2x token efficiency vs Claude Code, best-in-class git integration, and what Reddit developers actually think.

Gemini CLI in 2026: Features, Pricing, Benchmarks, and Community Sentiment

Gemini CLI deep dive: 1,000 free requests/day, improving quality with 3.1 Pro, Jules async agents, and what Reddit developers actually think.

Codex CLI in 2026: Features, Pricing, Benchmarks, and Community Sentiment

Codex CLI deep dive: open source Rust CLI, 2-3x token efficiency, 9,000+ plugins, and what Reddit devs actually think. Pricing, strengths, weaknesses.

Cursor in 2026: Features, Pricing, Benchmarks, and Community Sentiment

Cursor deep dive: $2B ARR, Background Agents, MCP Apps, credit-based billing, and what Reddit devs actually think. Features, pricing, and assessment.

Writing Applications for LLMs

Your CLAUDE.md is settings. Your skills are libraries. Your hooks are middleware. Two activities, one progression.

Claude Code in 2026: Features, Pricing, Benchmarks, and Community Sentiment

Claude Code deep dive: highest-rated for code quality, Agent Teams, MCP ecosystem, and what Reddit developers actually think. Pricing and weaknesses.

Open Source vs Vendor-Locked AI Coding Tools: The Tradeoffs That Matter

The most-loved tool (Claude Code) is fully closed. The most-starred (OpenCode, 117K) is fully open. Analysis of 21 tools shows when to choose which.

What Happened to Supermaven, Aide, and Void: AI Coding Tools That Didn't Make It

Supermaven was acquired. Aide is sunsetting. Void went silent. Why AI coding tools die, what patterns predict failure, and which tools are at risk today.

Building a Markdown API for LLM Collaboration

A web server returning navigable markdown replaces CLAUDE.md stuffing, MCP proliferation, and filesystem sync problems.

CodeMySpec Specs vs Kiro EARS: Two Approaches to Spec-Driven AI Development

Amazon's Kiro uses EARS notation. CodeMySpec uses BDD. Both bet that specs before code = better AI output. We tested both approaches.

MCP: The Protocol Connecting AI Coding Tools

Model Context Protocol is USB for AI agents. 1,000+ servers, adopted by Anthropic, OpenAI, Google. What MCP is, who supports it, and what it enables.

Best Free and Open-Source AI Coding Tools in 2026

9 free and open-source AI coding tools compared. Gemini CLI is truly free. Aider and Cline match paid tools. When is BYOK cheaper than subscriptions?

AI IDEs Compared in 2026: Cursor vs Windsurf vs Zed vs Kiro

Cursor 3 Glass, Windsurf's price hike, Zed's 1M context BYOK, Kiro's AWS Transform. Updated April 2026 comparison of pricing, philosophy, and fit.

Claude Code Skills: Writing Apps for Agents

How to write Claude Code skills that actually work. Real examples, common mistakes, and how skills differ from prompts, MCP servers, and hooks.

The Rise of CLI Coding Agents: Why Terminal-Native AI is Having a Moment

Claude Code accounts for 4% of GitHub commits. Gemini CLI hit 90K stars. The terminal won the AI coding war nobody expected. Here's why.

The Best CLI Coding Agents in 2026: Claude Code vs Codex vs Gemini CLI vs Aider vs OpenCode vs Goose

6 CLI coding agents compared: independent testing, pricing, and community sentiment. Refreshed April 2026 with Opus 4.7, Codex $100 tier, Gemini restrictions.

The Five Levels of AI-Assisted Development

From autocomplete to fully autonomous development. A framework for understanding where you are with AI coding tools and where the real leverage is.

Agentic QA

Unit tests and BDD specs verify pieces. QA verifies the running application — story QA, journey QA, and automated issue filing by AI agents.

BDD Specs for AI-Generated Code

Unit tests verify your code works. BDD specs verify your app does what users actually want. One scenario per acceptance criterion, traced to user stories.

How CodeMySpec Built and Verified a Fuel Card App in 5 Days

55 commits, 100K+ lines, 100+ QA issues caught in 5 active development days. How BDD specs and agentic QA verified a fuel card management platform.

The Part Nobody Talks About: Verifying AI-Generated Code

How CodeMySpec verifies AI-generated code with a 7-stage validation pipeline, dirty tracking, BDD specs, and end-to-end QA journeys.

Remote Permission Approval: Building Trust Boundaries for Autonomous AI Agents

How we built a full-stack permission approval system for Claude Code that lets you approve tool calls from your phone with Web Push and Phoenix Channels.

My first serious coding workflow with AI

Learn the architecture, planning, and process iteration that keeps LLMs on track.

How to write design documents that keep AI from going off the rails

Write one design doc per code file to prevent architectural drift and keep LLMs on track.

How to design architecture that keeps AI on track

Learn to design Phoenix contexts and vertical slice architecture to keep AI-generated code consistent.

Why Phoenix Contexts Are Perfect for LLM-Based Code Generation

Phoenix contexts provide self-contained modules, consistent patterns, and built-in testability that make them ideal for AI-assisted development.

How to manage user stories to get the most out of LLMs

Practical approach to using user stories for AI code generation. Keep LLMs focused on requirements, maintain living documentation, and avoid technical debt.

Code Generation is About Control, Not Prompts

The best way to get reliable code from an LLM is better control and enforcement through predefined workflows, validation, and test-driven development.

Design-Driven Code Generation: The Missing Layer Between Specs and Code

Design-driven development adds explicit, reviewable designs that define component architecture before implementation begins.
