The AI Agent Quietly Decayed 25% After 20 Tries
The demo always works. The problem starts when you ask the same model to keep iterating on the same task. A Japanese solution architect just measured what most engineering leads have suspected for months.
Across 20 consecutive sessions of editing the same document with Claude Code, Codex, and MCP-connected tooling, the output degraded by an average of 25%. Information dropped out, and facts that were never there crept in. How you delegate to AI — the instructions, the checkpoints — is becoming a real corporate asset.
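For intuition, a 25% average drop over 20 sessions is what you get from roughly 1.5% of information lost per pass, compounding (0.985^20 ≈ 0.74). The toy simulation below makes that concrete; the per-pass loss rate and the agent stand-in are illustrative assumptions, not the architect's actual harness.

```python
import random

# Toy stand-in for one agent session: each pass randomly drops a little
# of the document's information, mimicking the decay described above.
# Replace this with a real call to your agent to measure for real.
def run_agent_session(facts: set[str]) -> set[str]:
    return {f for f in facts if random.random() > 0.015}  # ~1.5% loss per pass

reference = {f"fact-{i}" for i in range(500)}  # facts the document must keep
facts = set(reference)
for session in range(1, 21):  # 20 consecutive sessions, as in the study
    facts = run_agent_session(facts)
    retained = len(facts & reference) / len(reference)
    if session % 5 == 0:
        print(f"after session {session}: {retained:.0%} of facts retained")
```

Note that this toy only models omissions; the study also saw fabricated facts creep in, which no retention score alone captures.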
"Vibe coding" finally has a number attached to it
Vibe coding — vaguely waving at an AI agent and trusting the vibe — has been treated as productivity gospel for the last year. But delegation without verification compounds; each unreviewed iteration adds noise to the next. The 25% figure points to a practical answer: human review every five to ten iterations, as a default, not a luxury.
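Operationally, that checkpoint can be a hard gate in whatever loop drives the agent. Here is a minimal sketch, assuming a hypothetical run_iteration() wrapper around your agent call; none of these names come from any real tool's API.

```python
REVIEW_EVERY = 5  # human checkpoint cadence: every five to ten iterations

def run_iteration(task: str, draft: str) -> str:
    """Stand-in for one agent pass (Claude Code, Codex, ...); swap in your real call."""
    return draft + f"\n[agent edit for: {task}]"

def iterate_with_checkpoints(task: str, draft: str, max_iters: int = 20) -> str:
    for i in range(1, max_iters + 1):
        draft = run_iteration(task, draft)
        if i % REVIEW_EVERY == 0:
            # Pause for human review before the agent stacks on its own output.
            print(f"--- iteration {i}: inspect the draft before continuing ---")
            if input("continue? [y/N] ").strip().lower() != "y":
                break  # stop before unverified noise compounds
    return draft
```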
AGENTS.md, SKILL.md, DESIGN.md — three instruction files in twelve months
While people debated whether AI agents should be trusted, the standards quietly shipped. AGENTS.md was co-authored by OpenAI, Google, Sourcegraph, Cursor, and Factory, then donated to the Linux Foundation in December 2025. Anthropic introduced SKILL.md alongside Claude Skills.
In April 2026, Google Labs released DESIGN.md, a specification format that hands a design system to an AI agent — complete with a CLI validator (`npx @google/design.md lint`). Three instruction-file standards in roughly twelve months. Spec-driven development is becoming the default for AI-era teams.
A non-developer shipped a working tool in nine days
NAVER's D2 engineering blog published a striking case study. A technical PM started with one question — "How do we measure organizational productivity?" — and shipped a working in-house measurement tool in nine days using internal documentation and an AI assistant. The internal DevOps board already exposed DORA metrics, but the team needed something more granular.
The detail that matters: the PM was not a developer. When organizational knowledge is structured for an AI to read, people who have never written production code can ship working tools in under two weeks.
A music video for ¥2,450
The same pattern shows up in creative work. Using Claude Desktop and Higgsfield over MCP, a single person produced a two-minute original music video — ten cuts at five seconds each — in a single day. Total cost: ¥2,450, one month of the Higgsfield Starter Plan.
What B2B teams should take from this week's news
Railway raising a $100M Series B as an AI-native cloud fits the same story. Tools standardize fast, prices drop fast, and the differentiator is no longer the tool itself. It is how well your company's knowledge is packaged for an AI to consume.
See how we apply this approach in our portfolio of recent engagements.
Action items for this week
- Create your instruction files. Distill code conventions and domain knowledge into an AGENTS.md, SKILL.md, or CLAUDE.md at your repo root. Write it once; every AI tool you adopt will share it.
- Cap autonomous loops at five to ten iterations. Insert a human review step before AI keeps stacking on its own output. The 25% decay number is the cost of skipping this.
- Hand AI to your non-developers. If your internal docs are clean, non-engineers can ship working tooling in days, not quarters. Start with a small proof of concept.
Free consultation — 5years+ helps teams design instruction files and AI workflows that compound over time.
Frequently Asked Questions
Is vibe coding always a bad idea?
No. It still works well for short, one-off tasks and early prototyping. The decay shows up specifically in long-running work where the same context is reused. The fix is to make human review explicit — a fixed checkpoint, not a vibe.
How do I start with AGENTS.md?
Create an empty AGENTS.md at the root of your repo and write what you would tell a brand-new engineer in their first thirty seconds. Build commands, how to run tests, domain acronyms, patterns to avoid. That is enough to start.
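A first draft might look like the sketch below. Every specific in it (the Node commands, the directory names, the glossary entries) is an illustrative placeholder; substitute your own stack and vocabulary.

```markdown
# AGENTS.md

## Build & test
- Install deps: npm install  (assumes a Node project; adjust to your stack)
- Run tests:    npm test     (agents should run this before finishing a task)

## Conventions
- TypeScript strict mode; never use `any` without a comment explaining why.
- New API endpoints go in src/api/, one file per resource.

## Domain glossary
- "DORA": the four delivery metrics our DevOps board already tracks.
- "engagement": a client project, not a marketing metric.

## Avoid
- Do not edit generated files under dist/.
```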
Can non-developers really build internal tools?
Yes — provided your domain documentation is in order. With well-structured internal guides, an AI agent, and a clear goal, the nine-day timeline NAVER reported is not an exaggeration.