
The AI Agent Standards Landscape Just Crystallized

Three governance consolidations in twelve months — Google's A2A donation to the Linux Foundation, Anthropic's MCP donation to the Agentic AI Foundation, and OpenAI's AGENTS.md adopted by 60,000+ projects — have crystallized the AI agent standards landscape. The protocol layer is largely settled. The contract and identity layer is the remaining open opportunity.

Thomas Scola · 13 min read
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>The AI Agent Standards Landscape Just Crystallized</title> <link href="https://fonts.googleapis.com/css2?family=Source+Serif+4:ital,wght@0,400;0,600;0,700;0,900;1,400;1,600&family=JetBrains+Mono:wght@400;500;600;700&family=Outfit:wght@400;500;600;700;800;900&display=swap" rel="stylesheet"> <style> :root { --ink: #191919; --paper: #FFFFFF; --paper-warm: #FAFAF7; --accent: #E63946; --accent-glow: rgba(230,57,70,0.08); --teal: #00B4A6; --teal-glow: rgba(0,180,166,0.08); --blue: #2B3E96; --blue-glow: rgba(43,62,150,0.06); --purple: #7B2FBE; --amber: #E8920D; --gray-50: #FAFAFA; --gray-100: #F4F4F2; --gray-200: #E8E8E4; --gray-300: #D0D0CC; --gray-500: #8A8A86; --gray-700: #555; --gray-900: #2A2A2A; --serif: 'Source Serif 4', Georgia, 'Times New Roman', serif; --mono: 'JetBrains Mono', 'SF Mono', Consolas, monospace; --sans: 'Outfit', system-ui, -apple-system, sans-serif; --measure: 680px; }

*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }

body { font-family: var(--serif); background: var(--paper); color: var(--ink); line-height: 1.8; font-size: 20px; -webkit-font-smoothing: antialiased; overflow-x: hidden; }

::selection { background: rgba(0,180,166,0.2); }

/* ════════ HERO ════════ */ .hero { position: relative; background: linear-gradient(160deg, #0D1117 0%, #161B22 40%, #1A2332 100%); padding: 6rem 2rem 5rem; overflow: hidden; } .hero::before { content: ''; position: absolute; inset: 0; background: radial-gradient(ellipse 600px 400px at 20% 50%, rgba(0,180,166,0.08), transparent), radial-gradient(ellipse 500px 500px at 80% 30%, rgba(43,62,150,0.1), transparent); } .hero::after { content: ''; position: absolute; inset: 0; background-image: url("data:image/svg+xml,%3Csvg width='40' height='40' viewBox='0 0 40 40' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath d='M0 20h40M20 0v40' stroke='%23ffffff' stroke-opacity='.03' stroke-width='.5'/%3E%3C/svg%3E"); } .hero-inner { max-width: var(--measure); margin: 0 auto; position: relative; z-index: 1; } .hero-kicker { font-family: var(--mono); font-size: 0.65rem; letter-spacing: 0.2em; text-transform: uppercase; color: var(--teal); margin-bottom: 2rem; display: flex; align-items: center; gap: 0.75rem; } .hero-kicker::before { content: ''; width: 32px; height: 1px; background: var(--teal); } .hero h1 { font-family: var(--sans); font-size: clamp(2.2rem, 5.5vw, 3.2rem); font-weight: 900; line-height: 1.1; color: #F0F6FC; margin-bottom: 1.8rem; letter-spacing: -0.02em; } .hero h1 .hl { color: var(--teal); } .hero h1 .hl2 { color: var(--accent); } .hero-sub { font-family: var(--serif); font-size: 1.15rem; color: rgba(240,246,252,0.6); line-height: 1.7; max-width: 580px; margin-bottom: 2.5rem; } .hero-byline { display: flex; align-items: center; gap: 1rem; font-family: var(--sans); font-size: 0.8rem; color: rgba(240,246,252,0.4); } .hero-avatar { width: 44px; height: 44px; border-radius: 50%; background: linear-gradient(135deg, var(--teal), var(--blue)); display: grid; place-items: center; color: #fff; font-weight: 700; font-size: 0.75rem; flex-shrink: 0; } .hero-byline strong { color: rgba(240,246,252,0.75); font-weight: 600; }

/* ════════ ARTICLE BODY ════════ */ article { max-width: var(--measure); margin: 0 auto; padding: 3.5rem 2rem 5rem; } article > p { margin-bottom: 1.6rem; font-size: 1.05rem; } article > p:first-of-type::first-letter { font-family: var(--sans); font-size: 3.4rem; font-weight: 900; float: left; line-height: 0.85; margin: 0.05em 0.12em 0 0; color: var(--blue); }

/* Headings */ article h2 { font-family: var(--sans); font-size: 1.7rem; font-weight: 800; margin: 3.5rem 0 1rem; line-height: 1.2; letter-spacing: -0.01em; } article h3 { font-family: var(--sans); font-size: 1.15rem; font-weight: 700; margin: 2.5rem 0 0.6rem; color: var(--gray-700); } article strong { font-weight: 700; } article em { font-style: italic; } article code { font-family: var(--mono); font-size: 0.82em; background: var(--gray-100); padding: 0.15em 0.4em; border-radius: 4px; border: 1px solid var(--gray-200); } article a { color: var(--blue); text-decoration: underline; text-underline-offset: 3px; text-decoration-thickness: 1.5px; text-decoration-color: rgba(43,62,150,0.25); transition: text-decoration-color 0.2s; } article a:hover { text-decoration-color: var(--blue); }

/* ════════ HERO STATS ROW ════════ */ .stats-row { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1px; background: var(--gray-200); border-radius: 14px; overflow: hidden; margin: 2.5rem 0 3rem; box-shadow: 0 1px 3px rgba(0,0,0,0.04); } .stat-cell { background: var(--paper); padding: 1.6rem 1.2rem; text-align: center; } .stat-num { font-family: var(--mono); font-size: 2.4rem; font-weight: 700; line-height: 1; margin-bottom: 0.35rem; } .stat-label { font-family: var(--sans); font-size: 0.7rem; font-weight: 500; color: var(--gray-500); text-transform: uppercase; letter-spacing: 0.06em; line-height: 1.4; } .stat-cell:nth-child(1) .stat-num { color: var(--accent); } .stat-cell:nth-child(2) .stat-num { color: var(--blue); } .stat-cell:nth-child(3) .stat-num { color: var(--teal); }

/* ════════ CALLOUT BOXES ════════ */ .callout { position: relative; padding: 1.4rem 1.5rem 1.4rem 1.8rem; margin: 2rem 0; border-radius: 0 10px 10px 0; border-left: 4px solid; font-size: 0.92rem; line-height: 1.65; } .callout p { margin-bottom: 0.5rem; } .callout p:last-child { margin-bottom: 0; } .callout-tag { font-family: var(--mono); font-size: 0.6rem; font-weight: 600; text-transform: uppercase; letter-spacing: 0.12em; margin-bottom: 0.6rem; display: flex; align-items: center; gap: 0.4rem; } .callout.danger { border-color: var(--accent); background: var(--accent-glow); } .callout.danger .callout-tag { color: var(--accent); } .callout.insight { border-color: var(--teal); background: var(--teal-glow); } .callout.insight .callout-tag { color: var(--teal); } .callout.spec { border-color: var(--blue); background: var(--blue-glow); } .callout.spec .callout-tag { color: var(--blue); } .callout.warn { border-color: var(--amber); background: rgba(232,146,13,0.06); } .callout.warn .callout-tag { color: var(--amber); }

/* ════════ PULL QUOTE ════════ */ .pull-quote { font-family: var(--sans); font-size: 1.55rem; font-weight: 700; line-height: 1.3; color: var(--blue); text-align: center; padding: 1.8rem 1rem; margin: 2.5rem 0; border-top: 3px solid var(--blue); border-bottom: 3px solid var(--blue); letter-spacing: -0.01em; }

/* ════════ FOUR WASTES GRID ════════ */ .waste-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1px; background: var(--gray-200); border-radius: 14px; overflow: hidden; margin: 2rem 0; } .waste-cell { background: var(--paper); padding: 1.5rem; } .waste-pct { font-family: var(--mono); font-size: 2.4rem; font-weight: 700; line-height: 1; margin-bottom: 0.3rem; } .waste-title { font-family: var(--sans); font-size: 0.9rem; font-weight: 700; margin-bottom: 0.3rem; } .waste-desc { font-family: var(--sans); font-size: 0.75rem; color: var(--gray-500); line-height: 1.5; } .waste-cell:nth-child(1) .waste-pct { color: var(--accent); } .waste-cell:nth-child(2) .waste-pct { color: var(--amber); } .waste-cell:nth-child(3) .waste-pct { color: var(--purple); } .waste-cell:nth-child(4) .waste-pct { color: var(--blue); }

/* ════════ BAR CHART ════════ */ .chart-box { background: var(--gray-50); border: 1px solid var(--gray-200); border-radius: 14px; padding: 2rem; margin: 2rem 0; } .chart-title { font-family: var(--sans); font-weight: 700; font-size: 0.95rem; margin-bottom: 0.2rem; } .chart-sub { font-family: var(--sans); font-size: 0.72rem; color: var(--gray-500); margin-bottom: 1.5rem; } .bar-row { display: grid; grid-template-columns: 90px 1fr 75px; align-items: center; gap: 0.75rem; margin-bottom: 0.55rem; } .bar-label { font-family: var(--mono); font-size: 0.72rem; font-weight: 600; text-align: right; white-space: nowrap; } .bar-track { height: 30px; background: var(--gray-200); border-radius: 5px; overflow: hidden; } .bar-fill { height: 100%; border-radius: 5px; display: flex; align-items: center; padding-left: 10px; font-family: var(--mono); font-size: 0.6rem; font-weight: 600; color: #fff; transition: width 1s ease; } .bar-meta { font-family: var(--mono); font-size: 0.72rem; font-weight: 600; text-align: right; } .fill-toon { background: linear-gradient(90deg, #00B4A6, #00D4C4); } .fill-jsonc { background: linear-gradient(90deg, #5B8DEF, #3D6FD1); } .fill-yaml { background: linear-gradient(90deg, #9B59B6, #7D3C98); } .fill-json { background: linear-gradient(90deg, #E8920D, #D68910); } .fill-xml { background: linear-gradient(90deg, #E63946, #C0392B); }

/* ════════ FLOW DIAGRAM ════════ */ .flow-box { background: #0D1117; border-radius: 14px; padding: 2rem; margin: 2.5rem 0; overflow-x: auto; } .flow-title { font-family: var(--sans); font-weight: 700; font-size: 0.95rem; color: #F0F6FC; margin-bottom: 0.3rem; } .flow-sub { font-family: var(--sans); font-size: 0.7rem; color: rgba(240,246,252,0.35); margin-bottom: 1.5rem; } .flow-pipeline { display: flex; align-items: center; justify-content: center; gap: 0.3rem; flex-wrap: wrap; margin-bottom: 1.5rem; } .flow-node { padding: 0.55rem 0.8rem; border-radius: 8px; font-family: var(--mono); font-size: 0.65rem; font-weight: 500; text-align: center; min-width: 88px; line-height: 1.3; } .flow-node.agent { background: rgba(0,180,166,0.12); border: 1.5px solid rgba(0,180,166,0.5); color: var(--teal); } .flow-arrow { color: rgba(240,246,252,0.2); font-family: var(--mono); font-size: 1.1rem; } .flow-sides { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; } .flow-side-title { font-family: var(--mono); font-size: 0.6rem; font-weight: 600; text-transform: uppercase; letter-spacing: 0.1em; margin-bottom: 0.8rem; } .flow-side.naive .flow-side-title { color: var(--accent); } .flow-side.ossa .flow-side-title { color: var(--teal); } .flow-item { padding: 0.45rem 0.7rem; border-radius: 6px; font-family: var(--mono); font-size: 0.62rem; margin-bottom: 4px; line-height: 1.3; } .flow-item.waste { background: rgba(230,57,70,0.08); border: 1px dashed rgba(230,57,70,0.3); color: rgba(230,57,70,0.75); } .flow-item.opt { background: rgba(43,62,150,0.1); border: 1px solid rgba(43,62,150,0.25); color: rgba(120,140,230,0.9); } .flow-item .tag { color: var(--teal); font-weight: 700; } .flow-total { font-family: var(--mono); font-size: 1.4rem; font-weight: 700; text-align: center; margin-top: 0.7rem; line-height: 1; } .flow-total-sub { font-family: var(--mono); font-size: 0.65rem; text-align: center; margin-top: 0.2rem; } .flow-side.naive .flow-total { color: var(--accent); } 
.flow-side.naive .flow-total-sub { color: rgba(230,57,70,0.5); } .flow-side.ossa .flow-total { color: var(--teal); } .flow-side.ossa .flow-total-sub { color: rgba(0,180,166,0.5); }

/* ════════ STACK DIAGRAM ════════ */ .stack { border-radius: 14px; overflow: hidden; border: 1px solid var(--gray-200); margin: 2rem 0; } .stack-layer { padding: 1rem 1.5rem; display: flex; align-items: center; justify-content: space-between; border-bottom: 1px solid var(--gray-200); gap: 1rem; } .stack-layer:last-child { border-bottom: none; } .stack-name { font-family: var(--sans); font-weight: 700; font-size: 0.88rem; } .stack-detail { font-family: var(--mono); font-size: 0.68rem; color: var(--gray-500); text-align: right; } .stack-layer.highlight { background: linear-gradient(90deg, var(--blue-glow), var(--teal-glow)); border-left: 4px solid var(--teal); } .stack-layer.highlight .stack-name { color: var(--blue); } .stack-layer.muted { background: var(--gray-50); }

/* ════════ COMPARISON TABLE ════════ */ .data-table { width: 100%; border-collapse: separate; border-spacing: 0; border-radius: 12px; overflow: hidden; border: 1px solid var(--gray-200); margin: 2rem 0; font-size: 0.82rem; font-family: var(--sans); } .data-table thead th { background: #0D1117; color: #F0F6FC; padding: 0.85rem 1rem; text-align: left; font-weight: 600; font-size: 0.7rem; text-transform: uppercase; letter-spacing: 0.04em; } .data-table tbody td { padding: 0.7rem 1rem; border-bottom: 1px solid var(--gray-200); vertical-align: top; line-height: 1.45; } .data-table tbody tr:last-child td { border-bottom: none; } .data-table tbody tr:nth-child(even) { background: var(--gray-50); } .data-table .mono { font-family: var(--mono); font-size: 0.78rem; } .data-table .green { color: var(--teal); font-weight: 700; }

/* ════════ CODE BLOCK ════════ */ .codeblock { background: #1E1E1E; border-radius: 12px; overflow: hidden; margin: 1.5rem 0; } .codeblock-header { padding: 0.55rem 1rem; background: rgba(255,255,255,0.04); display: flex; align-items: center; gap: 6px; border-bottom: 1px solid rgba(255,255,255,0.06); } .codeblock-dot { width: 10px; height: 10px; border-radius: 50%; } .codeblock-dot.r { background: #FF5F57; } .codeblock-dot.y { background: #FFBD2E; } .codeblock-dot.g { background: #28C840; } .codeblock-lang { margin-left: auto; font-family: var(--mono); font-size: 0.6rem; color: rgba(255,255,255,0.25); text-transform: uppercase; letter-spacing: 0.05em; } .codeblock pre { padding: 1.2rem 1.2rem; overflow-x: auto; font-family: var(--mono); font-size: 0.78rem; line-height: 1.65; color: #D4D4D4; } .kw { color: #569CD6; } .str { color: #CE9178; } .cm { color: #6A9955; } .fn { color: #DCDCAA; } .num { color: #B5CEA8; } .op { color: #D4D4D4; }

/* ════════ IMAGE PLACEHOLDER ════════ */ .img-placeholder { background: var(--gray-100); border: 2px dashed var(--gray-300); border-radius: 12px; padding: 2rem; margin: 2rem 0; text-align: center; } .img-placeholder .img-label { font-family: var(--mono); font-size: 0.65rem; text-transform: uppercase; letter-spacing: 0.1em; color: var(--gray-500); margin-bottom: 0.5rem; } .img-placeholder .img-desc { font-family: var(--sans); font-size: 0.85rem; color: var(--gray-700); line-height: 1.5; max-width: 400px; margin: 0 auto; }

/* ════════ SEPARATOR ════════ */ .sep { text-align: center; margin: 3rem 0; color: var(--gray-300); font-size: 1.5rem; letter-spacing: 0.5em; }

/* ════════ CTA FOOTER ════════ */ .cta-box { background: linear-gradient(135deg, var(--blue-glow), var(--teal-glow)); border: 1px solid var(--gray-200); border-radius: 14px; padding: 2rem; margin: 3rem 0; text-align: center; } .cta-box h3 { font-family: var(--sans); font-size: 1.2rem; font-weight: 800; margin-bottom: 0.5rem; color: var(--blue); } .cta-box p { font-size: 0.88rem; color: var(--gray-700); margin-bottom: 1rem; line-height: 1.5; } .cta-links { display: flex; justify-content: center; gap: 1.5rem; flex-wrap: wrap; font-family: var(--mono); font-size: 0.78rem; } .cta-links a { color: var(--blue); font-weight: 600; text-decoration: none; border-bottom: 2px solid var(--teal); padding-bottom: 2px; transition: border-color 0.2s; } .cta-links a:hover { border-color: var(--blue); }

/* ════════ TAG FOOTER ════════ */ .tag-row { max-width: var(--measure); margin: 0 auto; padding: 0 2rem 4rem; border-top: 1px solid var(--gray-200); padding-top: 1.5rem; display: flex; flex-wrap: wrap; gap: 0.5rem; } .tag { font-family: var(--sans); font-size: 0.72rem; font-weight: 500; padding: 0.3rem 0.75rem; background: var(--gray-100); border-radius: 20px; color: var(--gray-700); }

/* ════════ RESPONSIVE ════════ */ @media (max-width: 680px) { .stats-row { grid-template-columns: 1fr; } .waste-grid { grid-template-columns: 1fr; } .flow-sides { grid-template-columns: 1fr; } .bar-row { grid-template-columns: 72px 1fr 60px; } .hero { padding: 4rem 1.5rem 3rem; } article { padding: 2.5rem 1.5rem 4rem; } .hero h1 { font-size: 1.9rem; } .stack-layer { flex-direction: column; align-items: flex-start; } .stack-detail { text-align: left; } article > p:first-of-type::first-letter { font-size: 2.8rem; } } </style>

</head> <body> <!-- ══════════════════════════ HERO ══════════════════════════ --> <header class="hero"> <div class="hero-inner"> <div class="hero-kicker">AI Agent Standards · February 2026</div> <h1>The AI Agent Standards Landscape Just <span class="hl">Crystallized</span> — Here's What <span class="hl2">Changed</span></h1> <p class="hero-sub">Protocol consolidation, the Perfect Agent folder structure, and why the contract layer is the last open territory in agentic AI.</p> <div class="hero-byline"> <div class="hero-avatar">TS</div> <div> <strong>Thomas Scola</strong><br> Founder, Bluefly.io · Creator of OSSA · 18 min read </div> </div> </div> </header> <!-- ══════════════════════════ ARTICLE ══════════════════════════ --> <article> <!-- ── Lede ── --> <p>The agentic AI ecosystem underwent a governance consolidation in late 2025 that fundamentally reshaped how agent projects should be structured. MCP, A2A, and AGENTS.md are now all under Linux Foundation governance, with 97 million monthly MCP SDK downloads and 150+ organizations backing A2A. The protocol layer is largely settled. What remains wide open — and where the "Perfect Agent" folder structure can dominate — is the <strong>contract, identity, and governance layer</strong> that sits above these protocols.</p> <p>This report maps every relevant standard, framework convention, and emerging pattern as of February 2026, identifying the specific gaps an ideal agent project directory must address. I analyzed 23 papers, production telemetry, and benchmarks across five data formats. 
What I found surprised me.</p> <!-- ── Key Stats ── --> <div class="stats-row"> <div class="stat-cell"> <div class="stat-num">97M+</div> <div class="stat-label">Monthly MCP SDK downloads</div> </div> <div class="stat-cell"> <div class="stat-num">150+</div> <div class="stat-label">Organizations backing A2A</div> </div> <div class="stat-cell"> <div class="stat-num">60K+</div> <div class="stat-label">Projects using AGENTS.md</div> </div> </div> <!-- ════════ SECTION 1: THE STACK ════════ --> <h2>The Protocol Stack Locked In During Q4 2025</h2> <p>Three events between June and December 2025 consolidated what had been a fragmented landscape into a clear four-layer architecture.</p> <p><strong>Google donated A2A to the Linux Foundation</strong> on June 23, 2025, with seven founding members (AWS, Cisco, Google, Microsoft, Salesforce, SAP, ServiceNow). IBM's Agent Communication Protocol merged into A2A in August, eliminating a competing standard. The spec advanced to Release Candidate v1.0 with a three-layer architecture: a canonical protobuf data model, abstract operations, and concrete bindings for JSON-RPC, gRPC, and HTTP/REST. Agent Cards gained JWS digital signatures (RFC 7515) and <code>preferredTransport</code> as a required field.</p> <p><strong>Anthropic donated MCP to the newly formed Agentic AI Foundation (AAIF)</strong> on December 9, 2025, alongside Block's goose framework and OpenAI's AGENTS.md convention. The AAIF operates as a directed fund with eight Platinum members at $350K/year — AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. 
The latest MCP spec added an experimental Tasks primitive for async operations, a formal Extensions framework, OAuth Client Credentials for machine-to-machine auth, and Sampling with Tools enabling server-side agent loops.</p> <p><strong>AGENTS.md</strong>, introduced by OpenAI in August 2025, has been adopted by 60,000+ open-source projects and is read by coding agents including Codex, Cursor, Devin, GitHub Copilot, and Gemini CLI. It functions as a "README for AI agents" — project-specific markdown instructions that coding agents read before operating in a repository.</p> <div class="callout insight"> <div class="callout-tag">💡 Key insight</div> <p>The resulting stack is unambiguous. Three of the four layers are now governed by major foundations. <strong>The contract and identity layer — what an agent <em>is</em>, what it promises, and what it requires — is the only layer still up for grabs.</strong></p> </div> <!-- Stack Diagram --> <div class="stack"> <div class="stack-layer muted"> <div class="stack-name">Applications & Frameworks</div> <div class="stack-detail">LangChain · CrewAI · ADK · Cursor · K8s</div> </div> <div class="stack-layer highlight"> <div class="stack-name">⬤ &nbsp;Contract & Identity — <em>the gap</em></div> <div class="stack-detail">What an agent is, promises, requires</div> </div> <div class="stack-layer muted"> <div class="stack-name">Agent-to-Agent (A2A RC v1.0)</div> <div class="stack-detail">Peer discovery · Task delegation · Streaming</div> </div> <div class="stack-layer"> <div class="stack-name">Agent-to-Tool (MCP 2025-11-25)</div> <div class="stack-detail">Tool connectivity · Resources · Prompts</div> </div> <div class="stack-layer muted"> <div class="stack-name">Project Context (AGENTS.md)</div> <div class="stack-detail">Repository-specific agent instructions</div> </div> </div> <p>A2A and AAIF remain separate Linux Foundation projects — they have not been merged, though significant membership overlap makes eventual coordination likely.</p> <div class="sep">· · 
·</div> <!-- ════════ SECTION 2: FRAMEWORK CONVENTIONS ════════ --> <h2>How Leading Frameworks Organize Agent Projects</h2> <p>No two frameworks agree on project structure, but clear patterns are emerging. The five major frameworks split into two camps: <strong>opinionated scaffolders</strong> (LangGraph, CrewAI, Google ADK) and <strong>bring-your-own</strong> (AutoGen, OpenAI Agents SDK).</p> <p><strong>CrewAI</strong> is the most declarative. Its <code>crewai create</code> command generates a structure separating agent and task definitions into YAML configs while orchestration logic lives in a Python <code>crew.py</code> decorated with <code>@CrewBase</code>. This hybrid model — declarative YAML for the <em>what</em>, imperative Python for the <em>how</em> — most closely mirrors an OSSA-style philosophy.</p> <p><strong>Google ADK</strong> enforces a strict convention: agent folder name must match the Agent object's <code>name</code> parameter, <code>__init__.py</code> must exist, and <code>agent.py</code> must define a <code>root_agent</code> variable. For A2A integration, placing an <code>agent.json</code> Agent Card alongside the agent code enables automatic discovery. This pattern — <strong>a manifest file next to the code it describes</strong> — is the emerging convention.</p> <div class="callout warn"> <div class="callout-tag">⚠️ Cross-framework gap</div> <p>Every framework uses <code>.env</code> for secrets and <code>pyproject.toml</code> for deps. Every framework is Python-first. Every framework supports <code>@tool</code> decorated functions. But <strong>no framework has a universal agent manifest</strong>. This is the gap a contract layer fills.</p> </div> <div class="sep">· · ·</div> <!-- ════════ SECTION 3: THE TOKEN PROBLEM ════════ --> <h2>Meanwhile, Your Agents Are Burning 70% of Their Token Budget on Formatting</h2> <p>While the governance landscape was consolidating, a parallel crisis was building. 
Per-token inference costs have decreased 280-fold since 2023. Yet enterprise AI expenditure is <em>accelerating</em> — cloud bills rose 19% in 2025, and Gartner projects 40% of enterprise agent pilots will be cancelled by 2027 due to unsustainable costs.</p> <p>The paradox resolves when you look at consumption patterns. When agents shift from single-turn generation to multi-step reasoning with tool-calling loops, token consumption scales <strong>quadratically or worse</strong> with task complexity.</p> <div class="callout danger"> <div class="callout-tag">⚡ The 99/1 split</div> <p><strong>On OpenRouter, daily usage of Claude Sonnet 4 reaches 100 billion tokens — 99% are input tokens accumulated in agent trajectories.</strong> Only 1% are newly generated output. Agents aren't expensive because they think. They're expensive because they <em>re-read</em>.</p> </div> <p>I spent three months analyzing where those tokens go. The waste falls into four systemic categories — each addressable at the specification layer.</p> <!-- Four Wastes Grid --> <div class="waste-grid"> <div class="waste-cell"> <div class="waste-pct">40–70%</div> <div class="waste-title">Serialization Overhead</div> <div class="waste-desc">Field names, braces, brackets, quotes, colons, commas — structural formatting repeated across records that conveys zero reasoning-relevant information.</div> </div> <div class="waste-cell"> <div class="waste-pct">39–60%</div> <div class="waste-title">Trajectory Accumulation</div> <div class="waste-desc">At step N, the context window contains the full history of all prior steps. Stale observations, redundant file contents, and expired state compound quadratically.</div> </div> <div class="waste-cell"> <div class="waste-pct">29–50%</div> <div class="waste-title">Coordination Tax</div> <div class="waste-desc">Multi-agent systems fragment token budgets. Splitting a fixed budget evenly across a 4-agent pipeline leaves each agent at most 25%, and inter-agent messages and shared context come out of that share.</div> </div>
<div class="waste-cell"> <div class="waste-pct">6.5–14%</div> <div class="waste-title">Protocol Envelope Bloat</div> <div class="waste-desc">100 MCP tool definitions consume 13K–27K tokens before a single call is made. That is up to 13.5% of a 200K context window, spent on the menu alone.</div> </div> </div> <p>None of these are bugs. They're consequences of <em>specification-level decisions</em> — the choice of JSON as the universal format, the absence of output projection in composition, the lack of consolidation hooks in iterative execution, and the all-at-once tool loading pattern in MCP.</p> <div class="pull-quote">Token efficiency in agent systems is not an optimization problem — it is an architectural problem.</div> <div class="sep">· · ·</div> <!-- ════════ SECTION 4: THE FORMAT TAX ════════ --> <h2>The Format Tax: 40–70% of Your Context Is Structural Noise</h2> <p>The TOON project published the first rigorous cross-format benchmark in November 2025 — 209 data-retrieval questions across four LLMs with deterministic validation. The results challenge a core assumption:</p> <div class="pull-quote">Token reduction correlates with accuracy <em>improvement</em>, not degradation.</div> <p>TOON, the most compact format, achieved both the lowest token count and the highest accuracy across all four models tested. The trend is not strictly monotonic: YAML uses fewer tokens than pretty-printed JSON but scores slightly lower (69.0% vs. 69.7%).</p> <!-- Bar Chart --> <div class="chart-box"> <div class="chart-title">Token Count by Serialization Format</div> <div class="chart-sub">209 questions · 4 LLMs · GPT-5 o200k_base tokenizer · Lower is better</div>
<div class="bar-row">
  <div class="bar-label">TOON</div>
  <div class="bar-track"><div class="bar-fill fill-toon" style="width:53%">73.9% acc</div></div>
  <div class="bar-meta" style="color:var(--teal)">2,744</div>
</div>
<div class="bar-row">
  <div class="bar-label">JSON compact</div>
  <div class="bar-track"><div class="bar-fill fill-jsonc" style="width:60%">70.7% acc</div></div>
  <div class="bar-meta" style="color:#5B8DEF">3,081</div>
</div>
<div class="bar-row">
  <div class="bar-label">YAML</div>
  <div class="bar-track"><div class="bar-fill fill-yaml" style="width:72%">69.0% acc</div></div>
  <div class="bar-meta" style="color:var(--purple)">3,719</div>
</div>
<div class="bar-row">
  <div class="bar-label">JSON pretty</div>
  <div class="bar-track"><div class="bar-fill fill-json" style="width:88%">69.7% acc</div></div>
  <div class="bar-meta" style="color:var(--amber)">4,545</div>
</div>
<div class="bar-row">
  <div class="bar-label">XML</div>
  <div class="bar-track"><div class="bar-fill fill-xml" style="width:100%">67.1% acc</div></div>
  <div class="bar-meta" style="color:var(--accent)">5,167</div>
</div>
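<p class="chart-sub" style="margin-top:1.5rem">A minimal Python sketch, not part of the TOON benchmark itself, that turns the token counts in this chart into relative savings against a chosen baseline. The figures are copied from the bars above; the names <code>FORMATS</code> and <code>savings_vs</code> are illustrative assumptions.</p>
<div class="codeblock"> <pre>
```python
# Token counts and accuracies (%) copied from the chart above.
FORMATS = {
    "TOON": (2744, 73.9),
    "JSON compact": (3081, 70.7),
    "YAML": (3719, 69.0),
    "JSON pretty": (4545, 69.7),
    "XML": (5167, 67.1),
}


def savings_vs(baseline: str) -> dict[str, float]:
    """Percent token reduction of every format relative to `baseline`.

    A negative value means the format uses more tokens than the baseline.
    """
    base_tokens = FORMATS[baseline][0]
    return {
        name: round(100 * (1 - tokens / base_tokens), 1)
        for name, (tokens, _accuracy) in FORMATS.items()
    }


if __name__ == "__main__":
    for name, pct in savings_vs("JSON pretty").items():
        print(f"{name:12s} {pct:6.1f}% fewer tokens than JSON pretty")
```
</pre> </div>
<p class="chart-sub" style="margin-bottom:0">Measured this way, TOON uses about 39.6% fewer tokens than pretty-printed JSON and about 46.9% fewer than XML.</p>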
</div> <div class="callout insight"> <div class="callout-tag">📊 What this means for agents</div> <p>A typical MCP <code>tools/list</code> response with 50 tools consumes ~9,000–10,000 tokens in standard JSON. In a compact format with header-once field declaration: <strong>~4,500–5,500 tokens — a 40–50% reduction with zero information loss.</strong></p> </div> <div class="sep">· · ·</div> <!-- ════════ SECTION 5: TRAJECTORY ════════ --> <h2>The Compounding Context Curve: Why Step 10 Costs 10× Step 1</h2> <p>In multi-turn agent systems, the context window at step <em>n</em> contains the full history of all prior steps. Total cost grows quadratically. Production data from Trae Agent on SWE-bench Verified: mean accumulated input per issue reaches 1.0M tokens, with 99% being prior context.</p> <p>AgentDiet (Xiao et al., September 2025) conducted the first systematic analysis and identified three categories of trajectory waste through manual inspection of 50 agent trajectories:</p> <table class="data-table"> <thead> <tr> <th>Category</th> <th>What it is</th> <th>Example</th> </tr> </thead> <tbody> <tr> <td><strong>Useless</strong></td> <td>Irrelevant data that entered context through tool output</td> <td>Cache files in <code>find</code> output, ANSI escape codes, build noise</td> </tr> <tr> <td><strong>Redundant</strong></td> <td>Same information appearing multiple times</td> <td>Full file shown on open, then shown <em>again</em> after editing 3 lines</td> </tr> <tr> <td><strong>Expired</strong></td> <td>Information superseded by later actions</td> <td>Pre-edit file state after a subsequent edit; old test results</td> </tr> </tbody> </table> <div class="callout danger"> <div class="callout-tag">🔑 Counterintuitive finding</div> <p><strong>Reducing context <em>improved</em> agent performance.</strong> Multiple studies found that removing stale context prevented "lost in the middle" effects and eliminated expired information that caused action repetition. 
AgentDiet on Gemini 2.5 Pro actually <em>reduced</em> the average number of steps required to solve tasks.</p> </div> <p>The research converges: you can cut 40–60% of trajectory tokens with zero to slightly positive accuracy impact.</p> <div class="sep">· · ·</div> <!-- ════════ SECTION 6: MULTI-AGENT ════════ --> <h2>The Coordination Tax: When More Agents = Less Intelligence</h2> <p>Kim et al. (December 2025) conducted the first controlled study of agent system scaling — 180 configurations across five architectures, three LLM families, and four benchmarks. The key finding: <strong>when base model performance is already high, coordination overhead becomes a net cost.</strong></p> <p>The most striking result came from a different approach. GLM (Huan et al., November 2025) achieved a <strong>95.7% token reduction with a simultaneous 38% accuracy improvement</strong> by decomposing monolithic prompts into specialized agents with selective context sharing.</p> <div class="pull-quote">The highest-leverage optimization in multi-agent systems isn't compression — it's architectural decomposition with context projection.</div> <p>This means the contract layer — where you define what each agent sees and what it passes downstream — is where the real savings live.</p> <div class="sep">· · ·</div> <!-- ════════ SECTION 7: PIPELINE COMPARISON ════════ --> <h2>Putting It Together: A Real Pipeline</h2> <p>Consider a four-agent security pipeline: scanner → vulnerability analyzer → (critical response ‖ compliance checker) → remediation planner.</p> <!-- Flow Diagram --> <div class="flow-box"> <div class="flow-title">Four-Agent Security Pipeline: Naive vs. Contract-Optimized</div> <div class="flow-sub">Same agents, same task — different specification-level decisions</div>
<div class="flow-pipeline">
  <div class="flow-node agent">Scanner<br><span style="font-size:0.55rem;opacity:0.6">12 tools</span></div>
  <div class="flow-arrow">→</div>
  <div class="flow-node agent">Analyzer<br><span style="font-size:0.55rem;opacity:0.6">15 tools</span></div>
  <div class="flow-arrow">→</div>
  <div style="display:flex;flex-direction:column;gap:3px">
    <div class="flow-node agent" style="min-width:75px;font-size:0.6rem">Critical<br>Response</div>
    <div class="flow-node agent" style="min-width:75px;font-size:0.6rem">Compliance<br>Check</div>
  </div>
  <div class="flow-arrow">→</div>
  <div class="flow-node agent">Remediation<br><span style="font-size:0.55rem;opacity:0.6">↻ until 95%</span></div>
</div>

<div class="flow-sides">
  <div class="flow-side naive">
    <div class="flow-side-title">❌ Naive (JSON, full manifests, no projection)</div>
    <div class="flow-item waste">Manifests: 1,200 tokens</div>
    <div class="flow-item waste">Tool defs: 8,500 tokens</div>
    <div class="flow-item waste">Context transfer: 9,600 tokens</div>
    <div class="flow-item waste">Execution: 120,000 tokens</div>
    <div class="flow-item waste">Iterative refinement: 75,000 tokens</div>
    <div class="flow-total">214,300 tokens</div>
    <div class="flow-total-sub">$0.643 / run</div>
  </div>
  <div class="flow-side ossa">
    <div class="flow-side-title">✅ Contract-optimized (projection, consolidation, lazy tools)</div>
    <div class="flow-item opt">Manifests: 120 tok <span class="tag">↓ 90%</span></div>
    <div class="flow-item opt">Tool defs: 850 tok <span class="tag">↓ 90%</span></div>
    <div class="flow-item opt">Context transfer: 1,440 tok <span class="tag">↓ 85%</span></div>
    <div class="flow-item opt">Execution: 60,000 tok <span class="tag">↓ 50%</span></div>
    <div class="flow-item opt">Iterative refinement: 30,000 tok <span class="tag">↓ 60%</span></div>
    <div class="flow-total">92,410 tokens</div>
    <div class="flow-total-sub">$0.277 / run — 56.9% savings</div>
  </div>
</div>
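<p>The totals in the comparison above can be re-derived from the per-stage figures. The following is a minimal sketch, not part of any OSSA tooling: the stage names and helper functions are hypothetical, and the flat price of $3 per million tokens is an assumption inferred from $0.643 for 214,300 tokens.</p>

```python
# Per-stage token counts copied from the comparison above.
NAIVE = {
    "manifests": 1_200,
    "tool_defs": 8_500,
    "context_transfer": 9_600,
    "execution": 120_000,
    "iterative_refinement": 75_000,
}
OPTIMIZED = {
    "manifests": 120,
    "tool_defs": 850,
    "context_transfer": 1_440,
    "execution": 60_000,
    "iterative_refinement": 30_000,
}

# Assumption: one flat rate for all tokens, inferred from the figures above.
PRICE_PER_MILLION_USD = 3.0


def total_tokens(stages: dict[str, int]) -> int:
    """Sum the token counts of every stage in one pipeline run."""
    return sum(stages.values())


def cost_per_run(stages: dict[str, int], price: float = PRICE_PER_MILLION_USD) -> float:
    """Dollar cost of one run at a flat per-million-token price."""
    return total_tokens(stages) * price / 1_000_000


def savings_fraction(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of total tokens removed by the optimized pipeline."""
    return 1 - total_tokens(after) / total_tokens(before)


def stage_reductions(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Per-stage reduction, matching the percentage tags in the figure."""
    return {name: 1 - after[name] / before[name] for name in before}


if __name__ == "__main__":
    print(total_tokens(NAIVE), round(cost_per_run(NAIVE), 3))
    print(total_tokens(OPTIMIZED), round(cost_per_run(OPTIMIZED), 3))
    print(f"{savings_fraction(NAIVE, OPTIMIZED):.1%}")
```

<p>Keeping the stages as plain dictionaries makes it easy to swap in measured counts from a real pipeline and see which stage dominates before deciding where projection or consolidation is worth the effort.</p>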
</div>
<div class="callout spec"> <div class="callout-tag">💰 At enterprise scale</div> <p>At 1,000 executions/day in total across 10 tenants, the $0.366 saved per run adds up over a 30-day month to <strong>$10,980/month saved.</strong> And this is a single pipeline; enterprise deployments run hundreds of agent pipelines concurrently. The math gets very real, very fast.</p> </div>
<div class="sep">· · ·</div>
<!-- ════════ SECTION 8: IDENTITY ════════ -->
<h2>Agent Identity Remains the Largest Unsolved Problem</h2>
<p>No vendor-neutral, cross-platform agent identity standard exists. The landscape fragments across five competing approaches, none of them composable with agent manifests:</p>
<table class="data-table"> <thead> <tr> <th>Approach</th> <th>Backed by</th> <th>Status</th> <th>Limitation</th> </tr> </thead> <tbody> <tr> <td><strong>W3C DIDs</strong></td> <td>W3C, DIF</td> <td>v1.0 stable</td> <td>No agent-specific methods; <code>did:wba</code> still emerging</td> </tr> <tr> <td><strong>OWASP ANS</strong></td> <td>GoDaddy (production)</td> <td>v1.0</td> <td>DNS-centric; no capability binding</td> </tr> <tr> <td><strong>Microsoft Entra Agent ID</strong></td> <td>Microsoft</td> <td>Public preview</td> <td>Azure-locked; most complete but proprietary</td> </tr> <tr> <td><strong>RFC 9421 Signatures</strong></td> <td>HUMAN Security</td> <td>Production</td> <td>Transport-level only; no manifest binding</td> </tr> <tr> <td><strong>OIDC-A 1.0</strong></td> <td>Academic proposal</td> <td>Draft</td> <td>Not adopted by any identity provider</td> </tr> </tbody> </table>
<div class="callout insight"> <div class="callout-tag">🔐 The contract-layer opportunity</div> <p>A proper agent manifest should include <strong>identity declarations</strong> (DID configuration, certificate references, attestation metadata) that bind identity to capabilities. No other manifest format does this.
The five-section manifest (Identity, Capabilities, Autonomy, Resources, Governance) addresses this more completely than any competing format.</p> </div>
<div class="sep">· · ·</div>
<!-- ════════ SECTION 9: EIGHT GAPS ════════ -->
<h2>Eight Gaps the "Perfect Agent" Folder Structure Must Address</h2>
<p>Synthesizing the research (protocol consolidation, framework conventions, identity standards, observability patterns, testing approaches, and Claude Code's extensibility model) reveals eight structural gaps that no current framework fully addresses:</p>
<h3>1. Contract Manifest as Single Source of Truth</h3>
<p>A single <code>.ossa.yaml</code> at the project root declares identity, capabilities, autonomy boundaries, resource constraints, and governance; A2A Agent Cards, MCP configs, AGENTS.md, and CLAUDE.md are then <em>generated</em> from it.</p>
<h3>2. Identity and Trust Declarations</h3>
<p>No framework includes DID configuration, certificate references, or attestation metadata. The manifest needs <code>identity.did</code>, <code>identity.certificates</code>, and <code>provenance.signatures</code> sections.</p>
<h3>3. Multi-Protocol Support Files</h3>
<p>A compliant agent needs <code>/.well-known/agent-card.json</code> (A2A), <code>.mcp.json</code> (MCP), and <code>AGENTS.md</code> (project context), all generated from one source.</p>
<h3>4. Evaluation as First-Class Structure</h3>
<p>Only Google ADK ships evaluation tooling. The "Perfect Agent" needs <code>evals/</code> with golden sets, configs, and cost-efficiency tracking aligned with CLEAR framework dimensions.</p>
<h3>5. Observability Configuration</h3>
<p>The structure needs OTel trace schemas that reference <code>gen_ai.*</code> semantic conventions, sampling configs, and token-efficiency dashboards.</p>
<h3>6. Governance and Compliance Artifacts</h3>
<p>No framework includes compliance documentation in project structure. Policy files, approval chains, and audit trails become critical with EU AI Act enforcement in August 2026.</p>
<h3>7. 
Skills and Capabilities Packaging</h3>
<p>Claude Code's Skills pattern (YAML frontmatter + markdown + scripts) is the most mature model. Agent projects need a similar <code>skills/</code> directory.</p>
<h3>8. Extension Namespacing</h3>
<p>Framework-specific configs (LangChain memory, K8s pod specs, CrewAI configs) need a home that doesn't pollute the core manifest.</p>
<div class="sep">· · ·</div>
<!-- ════════ SECTION 10: THE STRUCTURE ════════ -->
<h2>The Recommended Structure</h2>
<div class="codeblock"> <div class="codeblock-header"> <span class="codeblock-dot r"></span> <span class="codeblock-dot y"></span> <span class="codeblock-dot g"></span> <span class="codeblock-lang">Perfect Agent Folder Structure</span> </div>
<pre><span class="fn">my-agent/</span>
├── <span class="kw">.ossa.yaml</span>              <span class="cm"># Single source of truth</span>
├── agent.py                <span class="cm"># Implementation</span>
├── __init__.py             <span class="cm"># ADK convention</span>
├── .env                    <span class="cm"># Secrets (universal)</span>
├── pyproject.toml          <span class="cm"># Dependencies</span>
├── <span class="str">AGENTS.md</span>               <span class="cm"># ← Generated from .ossa.yaml</span>
├── README.md
│
├── <span class="str">.well-known/</span>
│   └── agent-card.json     <span class="cm"># ← Generated A2A Agent Card</span>
│
├── <span class="fn">tools/</span>
│   ├── __init__.py
│   └── custom_tools.py     <span class="cm"># @tool decorated functions</span>
│
├── <span class="fn">skills/</span>                 <span class="cm"># Claude Code pattern</span>
│   └── skill-name/
│       ├── SKILL.md
│       └── scripts/
│
├── <span class="fn">evals/</span>                  <span class="cm"># First-class testing</span>
│   ├── golden-sets/
│   ├── eval-config.yaml
│   └── results/
│
├── <span class="fn">identity/</span>               <span class="cm"># Trust layer</span>
│   ├── did.json
│   └── certs/
│
├── <span class="fn">governance/</span>             <span class="cm"># Compliance</span>
│   ├── compliance.yaml
│   └── policies/           <span class="cm"># OPA/Cedar</span>
│
├── <span class="fn">observability/</span>          <span class="cm"># OTel traces</span>
│   ├── traces.yaml
│   └── dashboards/
│
├── <span class="fn">extensions/</span>             <span class="cm"># Framework-specific</span>
│   ├── langchain/
│   ├── crewai/
│   ├── kagent/             <span class="cm"># K8s deployment</span>
│   └── mcp/
│       └── .mcp.json
│
├── <span class="str">.claude/</span>
│   ├── CLAUDE.md           <span class="cm"># ← Generated from .ossa.yaml</span>
│   └── skills/
│
└── <span class="fn">tests/</span>
    ├── unit/
    └── integration/</pre>
</div>
<div class="callout insight"> <div class="callout-tag">🎯 The generate-down pattern</div> <p>The critical insight: <strong>one manifest generates all protocol files.</strong> <code>.ossa.yaml</code> → A2A Agent Card, MCP config, AGENTS.md, CLAUDE.md, K8s manifests. Write once, deploy everywhere. This is what "OpenAPI for agents" actually means.</p> </div>
<div class="sep">· · ·</div>
<!-- ════════ SECTION 11: SEVEN PRIMITIVES ════════ -->
<h2>Seven Specification Primitives for Token-Aware Agents</h2>
<p>Based on the empirical evidence, these are the specification-level mechanisms that address each waste category:</p>
<h3>1. Multi-Profile Manifest Serialization</h3>
<p>Three profiles: <code>full</code> (200–400 tokens, docs), <code>compact</code> (60–120 tokens, runtime), <code>fingerprint</code> (15–30 tokens, routing). An orchestrator evaluating 100 agents: ~2,000 tokens instead of ~30,000.</p>
<h3>2. 
Output Projection in Composition</h3>
<div class="codeblock"> <div class="codeblock-header"> <span class="codeblock-dot r"></span><span class="codeblock-dot y"></span><span class="codeblock-dot g"></span> <span class="codeblock-lang">ossa composition</span> </div>
<pre><span class="cm"># Only severity + affected_files flow downstream</span>
<span class="cm"># Not the full 6,400-token scan output</span>
AgentA <span class="op">&gt;&gt;</span> <span class="fn">project</span>([<span class="str">"severity"</span>, <span class="str">"affected_files"</span>]) <span class="op">&gt;&gt;</span> AgentB</pre>
</div>
<p>Based on GLM's finding that selective context sharing achieves <strong>95.7% token reduction with +38% accuracy</strong>. This is the highest-leverage primitive.</p>
<h3>3. Consolidation Strategy in Iterative Loops</h3>
<p>The iterative operator (↻) specifies per-field retention: <code>retain</code>, <code>summarize</code>, <code>drop</code>, or <code>accumulate</code>. Based on Focus Agent's 57% savings.</p>
<h3>4. Observation Schema Typing</h3>
<p>Tool outputs declare typed schemas. File-editing tools return diffs instead of full content. Addresses the dominant source of trajectory waste identified by AgentDiet.</p>
<h3>5. Capability Fingerprinting</h3>
<div class="codeblock"> <div class="codeblock-header"> <span class="codeblock-dot r"></span><span class="codeblock-dot y"></span><span class="codeblock-dot g"></span> <span class="codeblock-lang">fingerprint</span> </div>
<pre><span class="fn">fingerprint</span>(capability) = <span class="fn">hash</span>(
  input_type + <span class="str">"→"</span> + output_type + <span class="str">":"</span> + composition_flags
)
<span class="cm">// Route on 15 tokens instead of 250</span></pre>
</div>
<h3>6. Token Budget Propagation</h3>
<p>Composition operators accept a <code>token_budget</code> that's divided across stages.
Based on CoRL's finding that budget-controlled systems <strong>surpass unconstrained ones in accuracy</strong>.</p> <h3>7. Parallel Deduplication</h3> <p>When agents A and B receive the same upstream output in parallel, the spec defines a shared-context mechanism rather than duplicating the payload.</p> <div class="sep">· · ·</div> <!-- ════════ SECTION 12: BOTTOM LINE ════════ --> <h2>The Bottom Line</h2> <div class="pull-quote">No existing agent standard — MCP, A2A, ECMA NLIP, Oracle Agent Spec — addresses token efficiency at the specification level.</div> <p>This isn't a niche concern. Gartner says 40% of enterprise agent pilots will fail on cost. The research shows 57–70% of those costs are reducible. The gap between "agents that technically work" and "agents that are economically viable at scale" is a specification gap.</p> <p>The protocol layer is settled. The contract layer is open. And the folder structure is how developers will first encounter it.</p> <p>This structure bridges every protocol in the stack, supports every major framework through extensions, includes the identity and governance layers that no other format provides, and maintains compatibility with the conventions that 97M+ MCP users and 150+ A2A organizations already expect.</p> <p><strong>The contract layer is the only piece the ecosystem is still missing.</strong></p> <!-- CTA --> <div class="cta-box"> <h3>Read the Full Spec. Try the CLI. 
Join the Conversation.</h3> <p>OSSA is open-source (Apache 2.0), research-backed, and actively seeking early adopters and contributors.</p> <div class="cta-links"> <a href="https://openstandardagents.org">openstandardagents.org</a> <a href="https://gitlab.com/blueflyio/ossa">GitLab</a> <a href="https://openstandardagents.org/spec">Read the Spec</a> </div> </div> <p style="font-size:0.82rem;color:var(--gray-500);font-style:italic;margin-top:2rem">This article is adapted from OSSA Technical Report TR-2026–002 and the "Perfect Agent Folder Structure" research document. Full references, methodology details, and benchmark data are available at openstandardagents.org/research.</p> </article> <!-- Tags --> <div class="tag-row"> <span class="tag">AI Agents</span> <span class="tag">MCP</span> <span class="tag">A2A Protocol</span> <span class="tag">Token Efficiency</span> <span class="tag">Open Standards</span> <span class="tag">OSSA</span> <span class="tag">Multi-Agent Systems</span> <span class="tag">LLM Optimization</span> <span class="tag">Developer Tools</span> <span class="tag">Agentic AI</span> </div> </body> </html>