Whitepaper

Federated Agent Registries at Scale: OCI Distribution, Mesh Topology, and Global Agent Discovery


BlueFly.io / OSSA Research Team


Whitepaper v1.0 | February 2026 | Authors: BlueFly.io Agent Platform Team | Status: Draft for Community Review


Abstract

The agent ecosystem's most consequential gap is not another protocol. It is a universal, decentralized registry. The Model Context Protocol (MCP) solved the agent-to-tool integration problem with remarkable velocity: over 10,000 registered servers and 97 million SDK downloads in under eighteen months. Google's Agent2Agent (A2A) protocol addressed peer coordination across more than 100 contributing organizations. Neither protocol provides federated discovery, distribution, trust verification, or deployment orchestration across organizational boundaries. The result is an ecosystem where agents are easy to build, increasingly easy to connect, but nearly impossible to find, verify, and deploy at enterprise scale without vendor lock-in.

This paper proposes a federated registry architecture built on three pillars: OCI-compatible artifact distribution for agent packaging, a mesh topology of registry pods for decentralized discovery, and a layered trust model anchored in DNS and SPIFFE identity. We draw design principles from the systems that already federate billions of artifacts globally (Docker Hub, npm, Maven Central, Artifact Hub) and extend them with agent-specific semantics: capability advertisement, context budget negotiation, and runtime constraint declaration. The architecture is specified through the Open Standard for Sustainable Agents (OSSA) manifest format and implemented as a set of composable Kubernetes-native components.

We present formal consistency bounds for gossip-based mesh synchronization, resource planning formulas for production deployment, and a reference implementation timeline spanning twenty-four weeks. The goal is not to replace existing registries but to federate them: a Universal Agent Registry (UAR) that serves as an index-of-indexes, allowing any organization to publish, discover, and deploy agents across trust boundaries without surrendering control to a single vendor.


1. The Registry Problem Space

1.1 The Fragmented Landscape

The current agent registry ecosystem is characterized by isolated silos, proprietary formats, and incomplete solutions. Every major platform vendor has built or is building an agent registry, but none interoperate. The following table captures the state as of early 2026:

Table 1: Agent Registry Landscape (February 2026)
+------------------+----------+----------+----------------+---------------+-------------+
| Platform         | Open     | Federat- | Discovery      | Trust         | OCI         |
|                  | Standard | able     | Model          | Model         | Compatible  |
+------------------+----------+----------+----------------+---------------+-------------+
| MS Entra Agents  | No       | No       | Entra ID only  | Azure AD      | No          |
| MCP Registry     | Yes      | Partial  | Sub-registries | Per-server    | No          |
| A2A Agent Cards  | Yes      | Yes      | Decentralized  | Agent Cards   | No          |
| AGNTCY ACP       | Yes      | Yes      | DHT + OCI      | Certificate   | Yes         |
| NANDA Directory  | Yes      | Yes      | Quilt/CRDT     | Capability    | No          |
| OpenAI Plugins   | No       | No       | Centralized    | OpenAI review | No          |
| Google Vertex AI | No       | No       | GCP project    | IAM           | Partial     |
| OSSA Registry    | Yes      | Planned  | Mesh pods      | SPIFFE/DNS    | Yes         |
| AWS Bedrock      | No       | No       | AWS account    | IAM           | No          |
| Hugging Face     | Partial  | No       | Search/tags    | Community     | No          |
+------------------+----------+----------+----------------+---------------+-------------+

Three patterns emerge from this landscape. First, proprietary platforms (Microsoft, OpenAI, Google, AWS) offer no federation and deliberately create switching costs. Second, open protocols (MCP, A2A) define communication but leave discovery to implementors. Third, emerging open standards (AGNTCY, NANDA, OSSA) recognize federation as a first-class concern but have not yet converged on a shared artifact format or discovery mesh.

1.2 Why Containers Won Before Agents

The container ecosystem offers the most instructive precedent. In 2013, Docker introduced a container image format and a centralized registry (Docker Hub). In 2015, the Open Container Initiative (OCI) was formed to standardize the image format and distribution specification. By 2017, every major cloud provider operated an OCI-compatible registry. By 2020, OCI artifacts had expanded beyond container images to include Helm charts, WASM modules, and policy bundles.

Figure 1: Container Registry Evolution Timeline

2013        2015        2017        2019        2021        2023        2025
  |           |           |           |           |           |           |
  Docker      OCI         Cloud       OCI         ORAS        Artifact    Agent
  Hub         Image       Registries  Artifacts   v1.0        Hub         Registries
  Launch      Spec        (ECR,GCR,   Expand      GA          Federated   ?
              Formed      ACR)                                Discovery

  Container images: ~18 months from proprietary to open standard
  Agent artifacts:  ~12 months in, still fragmented (2024-2026)

The key lesson is that standardization happened because Docker Hub succeeded first. A single dominant registry created enough gravity to force standardization. The agent ecosystem does not have a Docker Hub equivalent. Instead, it has ten incompatible registries, each with fewer than 50,000 entries. This is both a problem (fragmentation) and an opportunity (no incumbent to displace).

1.3 The Window Before Vendor Lock-In

The window for establishing an open, federated agent registry standard is closing. Microsoft's Entra agent identity system, announced in late 2025, bundles agent discovery with Microsoft Entra ID (formerly Azure Active Directory). Organizations that adopt Entra for agent identity will find it increasingly difficult to discover or trust agents outside the Microsoft ecosystem. Google's Vertex AI Agent Builder follows the same pattern: agents are GCP resources, discoverable only within GCP projects.

The risk is not hypothetical. It mirrors the 2015-2018 period when Kubernetes won the container orchestration battle precisely because it was the open alternative to proprietary systems (Docker Swarm, Amazon ECS, Azure Service Fabric). The agent ecosystem needs its Kubernetes moment: an open, vendor-neutral federation layer that preserves organizational autonomy while enabling global discovery.

We estimate that by mid-2027, more than 60% of enterprise agents will be registered in exactly one vendor's proprietary registry. Once that threshold is crossed, the network effects of proprietary registries become self-reinforcing, and the open federation window closes. The time to act is now.


2. Design Principles from Global Infrastructure

2.1 Content-Addressable Storage

Every successful artifact registry is built on content-addressable storage. The principle is simple: the identity of an artifact is derived from its content, not from its location or name.

Formula 1: Content-Addressable Identity

H(artifact) = sha256(content)

Where:
  - H is the digest function
  - artifact is the complete, serialized agent package
  - content is the byte-stream of the artifact
  - The resulting digest is immutable and globally unique

This property provides three guarantees that are essential for federated registries. First, integrity: any tampering with the artifact changes the digest, making corruption detectable without contacting the original publisher. Second, deduplication: identical artifacts across different registries share the same digest, enabling efficient caching and replication. Third, verifiability: a consumer can verify that the artifact they received matches the digest advertised in the registry, regardless of which mirror served it.

For agent artifacts, content-addressable storage means that an OSSA manifest, its associated code bundles, and its capability declarations are hashed together into a single digest. This digest becomes the canonical identifier for that agent version across all registries in the federation.
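Formula 1 can be sketched directly. This is an illustrative minimal implementation, not a normative OSSA algorithm: the function names and the fixed concatenation order are assumptions, and a production registry would hash a canonical serialization (e.g. an OCI manifest listing each component digest) rather than raw concatenated bytes.

```python
# Sketch of Formula 1: artifact identity derived from content alone.
# Component names mirror Figure 2; concatenation order is an assumption.
import hashlib

def component_digest(content: bytes) -> str:
    """Digest of a single component (manifest, code bundle, etc.)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def artifact_digest(manifest: bytes, code: bytes,
                    capabilities: bytes, context: bytes) -> str:
    # Hash components in a fixed order: tampering with any one of them
    # changes the final digest, making corruption detectable offline.
    combined = b"".join([manifest, code, capabilities, context])
    return "sha256:" + hashlib.sha256(combined).hexdigest()
```

Because the digest is content-derived, any two pods that independently ingest the same artifact version compute the same identifier, which is what makes mirror-agnostic verification and deduplication possible.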

Figure 2: Content-Addressable Agent Artifact

+----------------------------------+
| Agent Artifact                   |
|                                  |
| +------------------------------+|
| | Manifest (OSSA v1)           ||
| | sha256: a1b2c3...            ||
| +------------------------------+|
| | Code Bundle (WASM/Container) ||
| | sha256: d4e5f6...            ||
| +------------------------------+|
| | Capability Schema            ||
| | sha256: 789abc...            ||
| +------------------------------+|
| | Context Spec                 ||
| | sha256: def012...            ||
| +------------------------------+|
|                                  |
| Artifact Digest:                 |
| sha256(manifest + code +         |
|        capabilities + context)   |
| = sha256:fedcba987654...         |
+----------------------------------+

2.2 Proxy/Cache Federation

The npm ecosystem, with over 3 million packages and 200 billion monthly downloads, operates through a federation of proxies and caches. The canonical registry (registry.npmjs.org) is the source of truth, but most downloads are served by CDN caches, corporate proxies (Verdaccio, Nexus, Artifactory), and regional mirrors.

This pattern maps directly to agent registries. An organization publishes an agent to its own registry pod. The pod replicates metadata to the federation mesh. When another organization discovers the agent, their local pod caches the artifact on first pull. Subsequent requests are served from the local cache without contacting the publisher's pod.

Figure 3: Proxy/Cache Federation Pattern

Publisher Org          Federation Mesh          Consumer Org
+-------------+       +---------------+       +-------------+
|             |       |               |       |             |
| Registry    |------>| Metadata      |<------| Registry    |
| Pod A       | push  | Gossip Layer  | pull  | Pod B       |
|             |       |               |       |             |
| [Agent v1]  |       | Index Entry:  |       | [Cache]     |
| [Agent v2]  |       | agent@v2      |       | [Agent v2]  |
|             |       | digest:abc... |       | (on demand) |
+-------------+       | pod-a.org     |       +-------------+
                      +---------------+

The critical design decision is what to replicate eagerly versus lazily. Metadata (agent name, version, capabilities, digest, publisher identity) should be replicated eagerly through the gossip mesh. Artifacts (code bundles, large context files) should be pulled lazily on first access. This mirrors the Docker registry model where manifests are small and replicated widely, but image layers are pulled only when needed.
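The eager-metadata / lazy-artifact split above can be sketched in a few lines. All class and method names here are illustrative assumptions, not a published OSSA API:

```python
# Sketch: metadata replicates eagerly via gossip; blobs pull lazily.
class RegistryPod:
    def __init__(self, name, fetch_blob):
        self.name = name
        self.index = {}        # metadata: replicated eagerly via gossip
        self.blob_cache = {}   # artifacts: pulled lazily on first access
        self.fetch_blob = fetch_blob  # callable: digest -> bytes (remote pull)

    def gossip_receive(self, entry: dict):
        # Small metadata entries (name, version, digest, source pod)
        # spread to every pod in the mesh.
        self.index[entry["digest"]] = entry

    def pull(self, digest: str) -> bytes:
        # Large artifact blobs are fetched once from the publisher's pod,
        # then served from the local cache.
        if digest not in self.blob_cache:
            self.blob_cache[digest] = self.fetch_blob(digest)
        return self.blob_cache[digest]
```

A consumer pod thus learns of a new agent version within one gossip round, but the publisher's pod is contacted only on the first pull, mirroring the Docker manifest/layer split.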

2.3 Discovery Aggregation: The Artifact Hub Model

Artifact Hub, the CNCF project for discovering Kubernetes packages, demonstrates a successful federation model for heterogeneous artifacts. It does not host artifacts directly. Instead, it aggregates metadata from hundreds of independent repositories (Helm chart repos, OPA policy repos, Falco rule repos) into a searchable index.

For agent registries, we propose a similar pattern: the Universal Agent Registry (UAR) is an index-of-indexes. It does not store agent artifacts. It stores pointers to registry pods that do. Each pointer includes the agent's digest, capabilities, publisher identity, and the pod URL where the artifact can be retrieved.
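The shape of a UAR pointer follows from the paragraph above. This is a hypothetical data structure for illustration; the field names are assumptions drawn from the four pieces of metadata just listed:

```python
# Sketch of a UAR index entry: a pointer to a registry pod, never the
# artifact itself. Field names are illustrative, not a published schema.
from dataclasses import dataclass, field

@dataclass
class UARIndexEntry:
    name: str                       # e.g. "acme-corp/financial-analyst"
    digest: str                     # content-addressable identity
    capabilities: list = field(default_factory=list)  # advertised skills
    publisher: str = ""             # DNS-anchored organizational identity
    pod_url: str = ""               # where the artifact can be pulled
```

Because the entry carries only metadata and a digest, a consumer can resolve the pod, pull the artifact from any mirror, and still verify it against the digest recorded in the index.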

2.4 DNS as Trust Anchor

DNS is the internet's oldest and most widely deployed trust infrastructure. The SPF/DKIM/DMARC email authentication system demonstrates how DNS records can establish organizational identity and prevent spoofing without a central authority.

We propose an analogous system for agent registries:

Table 2: DNS Trust Anchors for Agent Registries

+-------------------+---------------------------+-------------------------------+
| DNS Record        | Email Analogy             | Agent Registry Purpose        |
+-------------------+---------------------------+-------------------------------+
| _ossa-registry    | SPF (authorized senders)  | Declares authorized registry  |
|   TXT record      |                           | pods for this domain          |
+-------------------+---------------------------+-------------------------------+
| _ossa-key         | DKIM (signing key)        | Public key for verifying      |
|   TXT record      |                           | agent signatures              |
+-------------------+---------------------------+-------------------------------+
| _ossa-policy      | DMARC (policy)            | Federation policy (open,      |
|   TXT record      |                           | restricted, closed)           |
+-------------------+---------------------------+-------------------------------+

Example DNS records:

_ossa-registry.acme.com   TXT "v=ossa1; pod=registry.acme.com:5000;
                               fallback=hub.ossa.ai"
_ossa-key.acme.com        TXT "v=ossa1; k=ed25519;
                               p=MCowBQYDK2VwAyEA..."
_ossa-policy.acme.com     TXT "v=ossa1; federation=open;
                               require-sig=true; min-trust=standard"

This approach has three advantages over certificate-based trust. First, it is zero-cost: DNS records are included in every domain registration. Second, it is self-service: organizations can update their registry configuration without involving a certificate authority. Third, it is composable: existing DNS infrastructure (DNSSEC, DNS-over-HTTPS) provides additional security layers without agent-specific tooling.
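Parsing these records follows the same tag=value convention SPF and DKIM use. The parser below is a minimal sketch against the example records above; the tag names come from those examples, while the function itself is an illustrative assumption:

```python
# Sketch: parse an SPF-style OSSA TXT record into tag=value pairs.
def parse_ossa_txt(record: str) -> dict:
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip()] = value.strip()
    return tags

policy = parse_ossa_txt(
    "v=ossa1; federation=open; require-sig=true; min-trust=standard"
)
```

A verifier would fetch the TXT record over DNSSEC or DNS-over-HTTPS, check `v=ossa1`, and then apply the declared federation policy before ingesting agents from that domain.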

2.5 Layered Security Model

Not all agents require the same level of trust verification. A team's internal utility agent does not need the same vetting as a publicly deployed financial services agent. We propose a three-tier trust model:

Table 3: Layered Trust Tiers

+-----------+-------------------+------------------+-------------------------+
| Tier      | Verification      | Use Case         | Requirements            |
+-----------+-------------------+------------------+-------------------------+
| Basic     | DNS TXT record    | Internal/team    | - Valid DNS record      |
|           | + self-signed     | agents           | - Self-signed manifest  |
|           |                   |                  | - Digest verification   |
+-----------+-------------------+------------------+-------------------------+
| Standard  | DNS + org-signed  | Cross-org        | - DNS + DKIM analogue   |
|           | certificate       | collaboration    | - Org CA signature      |
|           |                   |                  | - Capability schema     |
|           |                   |                  | - SBOM attached         |
+-----------+-------------------+------------------+-------------------------+
| Verified  | DNS + third-party | Public/regulated | - Third-party audit     |
|           | audit + notary    | deployment       | - Sigstore/Notary v2    |
|           |                   |                  | - SLSA provenance       |
|           |                   |                  | - Compliance attestation|
+-----------+-------------------+------------------+-------------------------+

Each tier is additive: Standard includes all Basic requirements plus additional verification. Verified includes all Standard requirements plus third-party attestation. Registry pods can set a minimum trust tier for ingesting external agents, creating a natural boundary between internal experimentation and production deployment.
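Because the tiers are strictly ordered, the ingestion check reduces to a comparison. A minimal sketch, with tier names taken from Table 3 and the function name an assumption:

```python
# Sketch: enforce a pod's minimum trust tier using the additive
# ordering from Table 3 (Basic < Standard < Verified).
TIERS = ["basic", "standard", "verified"]

def meets_min_tier(agent_tier: str, pod_min_tier: str) -> bool:
    """True if the agent's tier satisfies the pod's ingestion policy."""
    return TIERS.index(agent_tier) >= TIERS.index(pod_min_tier)
```

A pod configured with a minimum tier of "standard" would ingest Standard and Verified agents from the mesh while keeping Basic-tier agents confined to their publishing organization.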

2.6 Data Flow: Publish, Discover, Deploy

The complete data flow for a federated agent lifecycle involves six phases:

Figure 4: End-to-End Data Flow

PUBLISH                    FEDERATE                   DISCOVER
+--------+    +-------+    +--------+    +-------+    +--------+
|Developer| -> |Local  | -> |Gossip  | -> |UAR    | -> |Consumer|
|builds   |    |Pod    |    |Mesh    |    |Index  |    |searches|
|agent    |    |stores |    |spreads |    |aggre- |    |for     |
|         |    |artifact|   |metadata|    |gates  |    |agent   |
+--------+    +-------+    +--------+    +-------+    +--------+
                                                          |
DEPLOY                     VERIFY                      RESOLVE
+--------+    +-------+    +--------+    +-------+    +---v----+
|Runtime  | <- |Local  | <- |Trust   | <- |Pull   | <- |Pod     |
|starts   |    |Cache  |    |Chain   |    |Artifact|   |located |
|agent    |    |stores |    |verified|    |from   |    |via UAR |
|         |    |copy   |    |        |    |source |    |        |
+--------+    +-------+    +--------+    +-------+    +--------+

3. The Perfect Agent Artifact

3.1 OSSA Manifest Schema

The Open Standard for Sustainable Agents (OSSA) defines a declarative manifest format that answers four questions every registry consumer needs answered before deploying an agent:

  1. Who published this agent? (Identity and provenance)
  2. What can it do? (Capabilities and interfaces)
  3. What does it need? (Dependencies and resources)
  4. What is it allowed to access? (Permissions and constraints)

The manifest schema follows Kubernetes resource conventions: a versioned apiVersion, a kind declaration, and metadata / spec sections that separate identity from behavior.

# OSSA Agent Manifest v1
apiVersion: ossa.ai/v1
kind: Agent
metadata:
  name: financial-analyst
  namespace: acme-corp
  version: 2.4.1
  digest: sha256:a1b2c3d4e5f6789...
  labels:
    domain: finance
    tier: verified
    compliance: sox-compliant
  annotations:
    ossa.ai/publisher: acme-corp
    ossa.ai/signed-by: _ossa-key.acme.com
    ossa.ai/slsa-provenance: https://rekor.sigstore.dev/entry/...
    ossa.ai/license: Apache-2.0
spec:
  # Question 1: Who published this?
  publisher:
    organization: Acme Corporation
    domain: acme.com
    contact: agents@acme.com
    verified: true
    dns-record: _ossa-registry.acme.com

  # Question 2: What can it do?
  capabilities:
    protocols:
      - type: a2a
        version: "1.0"
        endpoint: /a2a
      - type: mcp
        version: "2025-03-26"
        endpoint: /mcp
    skills:
      - name: financial-analysis
        description: "Analyzes financial statements and generates reports"
        input-schema:
          type: object
          properties:
            ticker: { type: string, pattern: "^[A-Z]{1,5}$" }
            period: { type: string, enum: [quarterly, annual] }
        output-schema:
          type: object
          properties:
            report: { type: string, format: markdown }
            confidence: { type: number, minimum: 0, maximum: 1 }
      - name: risk-assessment
        description: "Evaluates portfolio risk metrics"
        input-schema:
          type: object
          properties:
            portfolio-id: { type: string, format: uuid }
    agent-card:
      name: "Financial Analyst Agent"
      description: "Enterprise financial analysis with SOX compliance"
      url: "https://agents.acme.com/financial-analyst"
      provider:
        organization: "Acme Corporation"
        url: "https://acme.com"

  # Question 3: What does it need?
  requirements:
    runtime:
      type: container
      image: registry.acme.com/agents/financial-analyst:2.4.1
      platform:
        os: linux
        arch: [amd64, arm64]
    resources:
      cpu: "500m"
      memory: "512Mi"
      gpu: false
      storage: "1Gi"
    dependencies:
      - name: market-data-provider
        type: mcp-server
        version: ">=1.2.0"
        required: true
      - name: compliance-checker
        type: ossa-agent
        version: ">=3.0.0"
        required: true
      - name: charting-engine
        type: ossa-agent
        version: ">=1.0.0"
        required: false
    context:
      max-input-tokens: 128000
      max-output-tokens: 16384
      required-context-bytes: 4096
      sliding-window-bytes: 32768
      retrieval-augmented: true

  # Question 4: What is it allowed to access?
  permissions:
    network:
      outbound:
        - host: "*.acme.com"
          ports: [443]
        - host: "api.marketdata.com"
          ports: [443]
      inbound:
        - port: 8080
          protocol: https
    filesystem:
      read: ["/data/models/", "/config/"]
      write: ["/tmp/", "/output/"]
      deny: ["/etc/", "/var/"]
    secrets:
      required:
        - name: MARKET_DATA_API_KEY
          description: "API key for market data provider"
        - name: ACME_INTERNAL_TOKEN
          description: "Internal service authentication"
    data-classification:
      handles: [public, internal, confidential]
      never-handles: [restricted, top-secret]
    compliance:
      frameworks: [SOX, SOC2]
      audit-log: required
      data-retention: 7-years

3.2 Canonical Folder Structure

An OSSA agent artifact follows a predictable directory structure that tools can rely on:

Figure 5: OSSA Agent Artifact Structure

agent-financial-analyst/
+-- manifest.yaml              # OSSA manifest (required)
+-- README.md                  # Human-readable description
+-- LICENSE                    # License file
+-- SBOM.json                  # Software Bill of Materials (SPDX)
+-- provenance.json            # SLSA provenance attestation
+-- signatures/
|   +-- manifest.sig           # Detached signature for manifest
|   +-- bundle.sig             # Detached signature for code bundle
+-- capabilities/
|   +-- openapi.yaml           # OpenAPI 3.1 spec for HTTP endpoints
|   +-- a2a-card.json          # A2A Agent Card
|   +-- mcp-schema.json        # MCP tool declarations
|   +-- skills.yaml            # Detailed skill descriptions
+-- context/
|   +-- system-prompt.md       # System prompt template
|   +-- few-shot-examples.yaml # Few-shot examples for capabilities
|   +-- retrieval-index.json   # RAG index metadata
+-- runtime/
|   +-- Dockerfile             # Container build (if containerized)
|   +-- helm/                  # Helm chart (if K8s-deployed)
|   |   +-- Chart.yaml
|   |   +-- values.yaml
|   |   +-- templates/
|   +-- compose.yaml           # Docker Compose (if compose-deployed)
+-- tests/
|   +-- capability-tests.yaml  # Capability conformance tests
|   +-- integration-tests.yaml # Integration test definitions
+-- .ossa/
    +-- config.yaml            # Build and publish configuration
    +-- hooks/                 # Lifecycle hooks (pre-publish, etc.)

3.3 Context Schema and Token Budget Mathematics

One of the most under-specified aspects of agent deployment is context management. An agent designed for a 128K-token context window behaves differently when deployed with only 8K tokens available. The OSSA manifest includes a context specification that enables registries to match agents with compatible runtimes.

Formula 2: Optimal Context Budget

OptimalBudget = min(MaxInput, Required + Sliding + Retrieval + OutputReserve)

Where:
  MaxInput      = Maximum input tokens supported by the target LLM
  Required      = Fixed context (system prompt + few-shot examples)
  Sliding       = Conversation history window
  Retrieval     = RAG-retrieved context per turn
  OutputReserve = Tokens reserved for output generation

Constraints:
  Required    <= 0.25 * MaxInput   (system prompt should not dominate)
  Sliding     <= 0.50 * MaxInput   (history should not starve retrieval)
  Retrieval   <= 0.25 * MaxInput   (RAG should supplement, not replace)
  OutputReserve = MaxInput * 0.10  (always reserve for output generation)

Example for financial-analyst agent:
  MaxInput    = 128,000 tokens
  Required    = 4,096 tokens  (system prompt + examples)
  Sliding     = 32,768 tokens (conversation history)
  Retrieval   = 16,384 tokens (financial data context)
  OutputReserve = 12,800 tokens

  OptimalBudget = min(128000, 4096 + 32768 + 16384 + 12800)
               = min(128000, 66048)
               = 66,048 tokens

  Utilization = 66048 / 128000 = 51.6% (healthy range: 40-70%)

Context budget information in the manifest enables three registry functions. First, compatibility filtering: a registry can exclude agents whose context requirements exceed the available runtime. Second, resource planning: operators can estimate LLM costs before deploying an agent. Third, degradation strategy: the manifest can specify which context components to shrink first when budget is constrained (typically sliding window first, then retrieval, never system prompt).
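Formula 2 and its worked example translate directly into code. A minimal sketch, with the function name an assumption; the numbers reproduce the financial-analyst example above:

```python
# Sketch of Formula 2: optimal context budget with constraint checks.
def optimal_budget(max_input: int, required: int,
                   sliding: int, retrieval: int) -> int:
    output_reserve = int(max_input * 0.10)  # always reserve for output
    # Constraint checks from Formula 2
    assert required <= 0.25 * max_input, "system prompt should not dominate"
    assert sliding <= 0.50 * max_input, "history should not starve retrieval"
    assert retrieval <= 0.25 * max_input, "RAG should supplement, not replace"
    return min(max_input, required + sliding + retrieval + output_reserve)

budget = optimal_budget(128_000, 4_096, 32_768, 16_384)
# budget == 66_048 tokens; utilization = 66048 / 128000 = 51.6%
```

A registry could run this check at discovery time against the manifest's context spec, rejecting runtime pairings whose budget exceeds the target model's window.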


4. Registry Pods as Mesh Nodes

4.1 The Pod Model

A registry pod is a self-contained unit that performs four functions: store artifacts, serve discovery queries, participate in mesh gossip, and replicate data to peers. Each pod is independently deployable and operates autonomously even when disconnected from the mesh.

Figure 6: Registry Pod Architecture

+------------------------------------------------------------------+
|                        Registry Pod                               |
|                                                                   |
|  +-------------------+  +-------------------+  +---------------+ |
|  | Artifact Store    |  | Discovery Engine  |  | Mesh Agent    | |
|  |                   |  |                   |  |               | |
|  | - OCI blobs       |  | - Full-text search|  | - Gossip      | |
|  | - Manifests       |  | - Capability match|  | - Heartbeat   | |
|  | - Signatures      |  | - Semantic search |  | - Sync        | |
|  | - Content-addr    |  | - Tag/label index |  | - Peer mgmt   | |
|  | - Garbage collect |  | - Version resolve |  | - Conflict    | |
|  |                   |  |                   |  |   resolution  | |
|  +-------------------+  +-------------------+  +---------------+ |
|                                                                   |
|  +-------------------+  +-------------------+  +---------------+ |
|  | Trust Engine      |  | API Gateway       |  | Replication   | |
|  |                   |  |                   |  | Manager       | |
|  | - Signature verify|  | - REST API        |  |               | |
|  | - DNS validation  |  | - GraphQL API     |  | - Pull-based  | |
|  | - SPIFFE identity |  | - OCI Distribution|  | - Push-based  | |
|  | - Trust scoring   |  |   API v2          |  | - Selective   | |
|  | - Policy enforce  |  | - Rate limiting   |  | - Conflict    | |
|  |                   |  |                   |  |   merge       | |
|  +-------------------+  +-------------------+  +---------------+ |
|                                                                   |
|  +-------------------------------------------------------------+ |
|  | Storage Layer                                                | |
|  | PostgreSQL (metadata) + S3-compatible (blobs) + Redis (cache)| |
|  +-------------------------------------------------------------+ |
+------------------------------------------------------------------+

4.2 Pod Topologies

Different organizational contexts require different pod deployment patterns. We define four canonical topologies:

Table 4: Registry Pod Topologies

+------------+------------------+------------------+-----------------+-------------+
| Topology   | Scale            | Replication      | Use Case        | Consistency |
+------------+------------------+------------------+-----------------+-------------+
| Edge Pod   | 1-10 agents      | Pull from parent | Developer       | Eventual    |
|            | Single node      | No outbound push | workstation,    | (minutes)   |
|            |                  |                  | CI/CD pipeline  |             |
+------------+------------------+------------------+-----------------+-------------+
| Team Pod   | 10-500 agents    | Bi-directional   | Department or   | Eventual    |
|            | 3-node cluster   | with regional    | team registry   | (seconds)   |
|            |                  |                  |                 |             |
+------------+------------------+------------------+-----------------+-------------+
| Regional   | 500-50,000       | Full mesh with   | Enterprise or   | Strong      |
| Pod        | agents           | regional peers   | cloud region    | eventual    |
|            | 5+ node cluster  | Selective global | registry        | (sub-second)|
+------------+------------------+------------------+-----------------+-------------+
| Ephemeral  | 1-100 agents     | Snapshot-based   | Testing, demos, | None        |
| Pod        | Single container | No persistence   | development     | (stateless) |
|            |                  | beyond session   | sandboxes       |             |
+------------+------------------+------------------+-----------------+-------------+

4.3 Consistency Model

Federated registries operate under the CAP theorem constraints. We choose availability and partition tolerance (AP) with bounded eventual consistency, formalized as follows:

Formula 3: Gossip Mesh Convergence Bound

T_converge <= t_gossip + log2(n) * d_network

Where:
  T_converge  = Time for all pods to learn about a new/updated agent
  t_gossip    = Gossip interval (default: 5 seconds)
  n           = Number of pods in the mesh
  d_network   = Average network latency between pods

Example:
  t_gossip    = 5 seconds
  n           = 64 pods (global enterprise)
  d_network   = 100ms (cross-region)

  T_converge  <= 5 + log2(64) * 0.1
              <= 5 + 6 * 0.1
              <= 5.6 seconds

For 1024 pods (large federation):
  T_converge  <= 5 + log2(1024) * 0.1
              <= 5 + 10 * 0.1
              <= 6.0 seconds

The logarithmic scaling is the key insight: under this bound, doubling the number of pods in the federation adds only one propagation hop (one additional factor of d_network) to convergence time. A federation of 1024 pods converges in approximately 6 seconds, which is acceptable for agent discovery (where queries can tolerate seconds of staleness) but insufficient for real-time coordination (which should use direct A2A communication instead).
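Formula 3 can be checked directly; this sketch reproduces both worked examples (modulo floating-point rounding):

```python
# Sketch of Formula 3: gossip mesh convergence bound.
import math

def t_converge(t_gossip: float, n_pods: int, d_network: float) -> float:
    """Upper bound on time for all pods to learn of an update."""
    return t_gossip + math.log2(n_pods) * d_network

t_converge(5, 64, 0.1)     # ~5.6 seconds (global enterprise)
t_converge(5, 1024, 0.1)   # ~6.0 seconds (large federation)
```

Plotting the bound against n makes the operational point vivid: pod count is effectively free, while the gossip interval t_gossip dominates, so operators tune t_gossip rather than limiting federation size.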

Conflict resolution follows a last-writer-wins (LWW) strategy for metadata updates, with digest-based deduplication preventing artifact conflicts. Since digests are content-derived, two pods that independently receive the same artifact version will compute the same digest, making the "conflict" a no-op.
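The LWW-plus-dedup rule is small enough to state as code. A sketch with illustrative field names (the `updated_at` timestamp field is an assumption):

```python
# Sketch: LWW metadata merge with digest-based deduplication.
def merge(local: dict, remote: dict) -> dict:
    if local["digest"] == remote["digest"]:
        # Content-derived digests match: same artifact, conflict is a no-op.
        return local
    # Different content: keep whichever metadata entry was written last.
    return remote if remote["updated_at"] > local["updated_at"] else local
```

The first branch is what makes the mesh forgiving of duplicate publishes: two pods ingesting the same artifact version independently converge without any coordination.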

4.4 Universal Agent Registry as Index-of-Indexes

The Universal Agent Registry (UAR) is not a registry in the traditional sense. It is an aggregation layer that indexes metadata from participating pods without storing artifacts. This design has three advantages:

  1. No single point of failure: The UAR can be replicated across multiple operators. If one UAR instance fails, others continue serving.
  2. No vendor lock-in: Any organization can operate a UAR instance. Multiple competing UAR instances can coexist.
  3. No data custody: The UAR never holds agent artifacts, only metadata pointers. This eliminates data sovereignty concerns.

Figure 7: UAR as Index-of-Indexes

+-------+     +-------+     +-------+     +-------+
| Pod A |     | Pod B |     | Pod C |     | Pod D |
| (Acme)|     | (Beta)|     | (CNCF)|     | (OSS) |
+---+---+     +---+---+     +---+---+     +---+---+
    |             |             |             |
    +------+------+------+------+------+------+
           |             |             |
     +-----v-----+ +-----v-----+ +-----v-----+
     | UAR West  | | UAR East  | | UAR EU    |
     | (replica) | | (replica) | | (replica) |
     +-----------+ +-----------+ +-----------+
           |             |             |
           +------+------+------+------+
                  |             |
            +-----v-----+ +-----v-----+
            | Consumer  | | Consumer  |
            | queries   | | queries   |
            | UAR West  | | UAR EU    |
            +-----------+ +-----------+

4.5 Kubernetes Deployment for Registry Pods

A production-grade registry pod deploys as a Kubernetes StatefulSet with persistent storage and mesh networking:

# Registry Pod StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ossa-registry-pod
  namespace: agent-registry
  labels:
    app.kubernetes.io/name: ossa-registry
    app.kubernetes.io/component: pod
spec:
  serviceName: ossa-registry-pod
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: ossa-registry
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ossa-registry
    spec:
      containers:
        - name: registry
          image: ghcr.io/ossa-registry/pod:1.0.0
          ports:
            - containerPort: 5000
              name: oci-api
            - containerPort: 8080
              name: discovery-api
            - containerPort: 7946
              name: gossip
          env:
            - name: OSSA_POD_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OSSA_MESH_SEEDS
              value: "pod-0.ossa-registry-pod:7946,pod-1.ossa-registry-pod:7946"
            - name: OSSA_TRUST_MIN_TIER
              value: "standard"
            - name: OSSA_DNS_DOMAIN
              value: "registry.acme.com"
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: artifact-store
              mountPath: /data/artifacts
            - name: metadata-store
              mountPath: /data/metadata
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_DB
              value: "ossa_registry"
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ossa-db-credentials
                  key: password
          volumeMounts:
            - name: metadata-store
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: artifact-store
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
    - metadata:
        name: metadata-store
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 20Gi

5. Five-Layer Architecture

The federated registry architecture is organized into five layers, each addressing a distinct concern. The layers are designed to be independently replaceable: an organization can adopt layers 1-3 without committing to layers 4-5, or substitute alternative implementations for any layer.

5.1 Layer 1: OCI Artifact Distribution

The foundation layer uses the OCI Distribution Specification for storing and transferring agent artifacts. This is not a new distribution protocol; it is the same protocol used by every container registry in production today (Docker Hub, GitHub Container Registry, Amazon ECR, Google Artifact Registry, Azure Container Registry).

The key extension is the artifactType field, introduced in OCI Image Manifest v1.1, which allows non-container artifacts to be stored in OCI registries. Agent manifests use a custom artifact type:

artifactType: application/vnd.ossa.agent.manifest.v1+json

This single decision provides immediate access to the entire OCI ecosystem: existing registries can store agent artifacts without modification, existing tooling (crane, skopeo, ORAS) can push and pull agent artifacts, and existing infrastructure (registry mirrors, CDN caches, garbage collection) works unchanged.

Figure 8: OCI Artifact Structure for Agents

OCI Image Index
+--------------------------------------------------+
| mediaType: application/vnd.oci.image.index.v1+json|
| manifests:                                        |
|   - platform: linux/amd64                         |
|     digest: sha256:abc...                         |
|   - platform: linux/arm64                         |
|     digest: sha256:def...                         |
+--------------------------------------------------+
         |                          |
         v                          v
OCI Image Manifest (amd64)    OCI Image Manifest (arm64)
+---------------------------+ +---------------------------+
| config:                   | | config:                   |
|   mediaType: vnd.ossa.    | |   mediaType: vnd.ossa.    |
|     agent.config.v1+json  | |     agent.config.v1+json  |
|   digest: sha256:111...   | |   digest: sha256:222...   |
| layers:                   | | layers:                   |
|   - manifest.yaml         | |   - manifest.yaml         |
|     (ossa manifest)       | |     (ossa manifest)       |
|   - code-bundle.tar.gz    | |   - code-bundle.tar.gz    |
|     (agent code)          | |     (agent code)          |
|   - capabilities.json     | |   - capabilities.json     |
|     (skill schemas)       | |     (skill schemas)       |
|   - context.tar.gz        | |   - context.tar.gz        |
|     (prompts, examples)   | |     (prompts, examples)   |
+---------------------------+ +---------------------------+

ORAS (OCI Registry As Storage) is the recommended client library for pushing and pulling agent artifacts. It provides a clean API for working with non-container OCI artifacts:

# Push an agent artifact
oras push registry.acme.com/agents/financial-analyst:2.4.1 \
  --artifact-type application/vnd.ossa.agent.manifest.v1+json \
  manifest.yaml:application/vnd.ossa.manifest.v1+yaml \
  code-bundle.tar.gz:application/gzip \
  capabilities.json:application/vnd.ossa.capabilities.v1+json \
  context.tar.gz:application/gzip

# Pull an agent artifact
oras pull registry.acme.com/agents/financial-analyst:2.4.1

# Copy between registries (federation)
oras copy registry.acme.com/agents/financial-analyst:2.4.1 \
  mirror.partner.com/agents/financial-analyst:2.4.1

5.2 Layer 2: Agent Card Mesh

Layer 2 builds on the A2A Agent Card specification to create a searchable mesh of agent capabilities. Each registry pod maintains an index of Agent Cards for the agents it hosts, and gossips card updates to peers in the mesh.

The OSSA extension to Agent Cards adds fields that A2A does not specify: resource requirements, trust tier, context budget, and dependency declarations. These extensions are placed in a dedicated x-ossa extension object to maintain backward compatibility with standard A2A implementations.

The Agent Card mesh supports three query patterns:

  1. Exact match: Find an agent by name and namespace (e.g., acme-corp/financial-analyst@2.4.1).
  2. Capability match: Find agents that can perform a specific skill (e.g., "agents that can analyze SEC filings with SOX compliance").
  3. Semantic search: Find agents by natural language description, using embedding-based similarity search against agent descriptions and capability schemas.
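The first two query patterns can be evaluated locally against a pod's card index. The sketch below illustrates this in TypeScript; the `AgentCard` shape and matching helpers are illustrative simplifications, not part of the A2A or OSSA specifications.

```typescript
// Sketch: local evaluation of the exact-match and capability-match query
// patterns against an in-memory Agent Card index. Types are illustrative.
interface AgentCard {
  namespace: string;
  name: string;
  version: string;
  description: string;
  capabilities: string[]; // skill identifiers from capabilities.json
}

// 1. Exact match: resolve a "namespace/name@version" reference.
function exactMatch(index: AgentCard[], ref: string): AgentCard | undefined {
  const m = ref.match(/^([^/]+)\/([^@]+)@(.+)$/);
  if (!m) return undefined;
  const [, namespace, name, version] = m;
  return index.find(
    (c) => c.namespace === namespace && c.name === name && c.version === version,
  );
}

// 2. Capability match: every requested skill must be advertised by the card.
function capabilityMatch(index: AgentCard[], skills: string[]): AgentCard[] {
  return index.filter((c) => skills.every((s) => c.capabilities.includes(s)));
}
```

Semantic search (pattern 3) replaces the predicate above with embedding similarity over descriptions and capability schemas, which the reference implementation delegates to pgvector.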

5.3 Layer 3: SPIFFE/SPIRE Federated Identity

Layer 3 provides cryptographic identity for agents and registry pods using the SPIFFE (Secure Production Identity Framework for Everyone) standard and its reference implementation SPIRE (SPIFFE Runtime Environment).

Every agent and every registry pod receives a SPIFFE ID:

Agent SPIFFE ID format:
  spiffe://acme.com/agent/financial-analyst/v2.4.1

Registry Pod SPIFFE ID format:
  spiffe://acme.com/registry-pod/us-west-2/pod-0

Federation trust domain:
  spiffe://ossa-federation.io/

SPIFFE federation allows trust domains to be linked without a central authority. When acme.com and partner.org establish a federation relationship, their SPIRE servers exchange trust bundles. After this exchange, an agent from acme.com can authenticate to a registry pod at partner.org without any shared secrets or central identity provider.

This is a critical advantage over token-based authentication (OAuth, API keys) for cross-organizational federation. Tokens require a shared identity provider or token exchange protocol. SPIFFE federation requires only a one-time trust bundle exchange, after which authentication is fully decentralized.
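A pod enforcing federation trust only needs to check that a caller's SPIFFE ID parses correctly and that its trust domain appears among the exchanged bundles. The sketch below illustrates that check; the parsing rules follow the ID formats above, but the helper names and the `AgentIdentity` shape are assumptions for illustration (real deployments would rely on the SPIRE SDK and mTLS, not string parsing alone).

```typescript
// Sketch: parsing the SPIFFE ID formats from Section 5.3 and checking
// federation membership. Illustrative only; production code authenticates
// via SPIRE-issued SVIDs over mTLS.
interface AgentIdentity {
  trustDomain: string;            // e.g. "acme.com"
  kind: "agent" | "registry-pod"; // path convention from this architecture
  path: string[];                 // remaining path segments
}

function parseSpiffeId(id: string): AgentIdentity | null {
  const m = id.match(/^spiffe:\/\/([^/]+)\/(agent|registry-pod)\/(.+)$/);
  if (!m) return null;
  return {
    trustDomain: m[1],
    kind: m[2] as "agent" | "registry-pod",
    path: m[3].split("/"),
  };
}

// A peer is trusted iff its trust domain is in the set of domains whose
// bundles were exchanged during the one-time federation handshake.
function isFederated(id: string, trustedDomains: Set<string>): boolean {
  const parsed = parseSpiffeId(id);
  return parsed !== null && trustedDomains.has(parsed.trustDomain);
}
```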

5.4 Layer 4: CloudEvents and NATS JetStream Synchronization

Layer 4 defines the event-driven synchronization protocol between registry pods. Every mutation to a registry pod (agent published, agent updated, agent deprecated, trust status changed) emits a CloudEvent that is propagated through the mesh.

NATS JetStream provides the messaging substrate for three reasons. First, it supports both publish-subscribe (for gossip) and request-reply (for directed queries). Second, it provides at-least-once delivery with message deduplication, ensuring that no registry update is lost even during network partitions. Third, it supports subject-based routing, allowing pods to subscribe to events from specific namespaces or trust tiers.
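The subject-based routing point can be made concrete with a subject scheme. The `ossa.events.<tier>.<namespace>.<type>` layout below is an assumption for illustration, not part of the specification; the wildcard matching mirrors NATS semantics, where `*` matches one token and `>` matches the remainder.

```typescript
// Sketch: a possible CloudEvents subject scheme on NATS JetStream.
// The subject layout is an illustrative assumption, not normative.
type TrustTier = "basic" | "standard" | "verified";

function eventSubject(tier: TrustTier, namespace: string, eventType: string): string {
  // NATS tokens are dot-delimited, so dots inside names are escaped.
  const safe = (s: string) => s.replace(/\./g, "_");
  return `ossa.events.${tier}.${safe(namespace)}.${safe(eventType)}`;
}

// Minimal re-implementation of NATS wildcard matching for illustration:
// "*" matches exactly one token, ">" matches all remaining tokens.
function matchesSubscription(subject: string, pattern: string): boolean {
  const subj = subject.split(".");
  const pat = pattern.split(".");
  for (let i = 0; i < pat.length; i++) {
    if (pat[i] === ">") return true;
    if (i >= subj.length) return false;
    if (pat[i] !== "*" && pat[i] !== subj[i]) return false;
  }
  return subj.length === pat.length;
}
```

A pod that only mirrors verified agents from one namespace would subscribe to `ossa.events.verified.acme-corp.>` and ignore everything else.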

CloudEvent schema for agent publication:

{ "specversion": "1.0", "id": "evt-20260207-001", "source": "spiffe://acme.com/registry-pod/us-west-2/pod-0", "type": "io.ossa.agent.published", "subject": "acme-corp/financial-analyst@2.4.1", "time": "2026-02-07T14:30:00Z", "datacontenttype": "application/json", "data": { "agent": { "name": "financial-analyst", "namespace": "acme-corp", "version": "2.4.1", "digest": "sha256:a1b2c3d4e5f6789...", "artifactType": "application/vnd.ossa.agent.manifest.v1+json", "capabilities": ["financial-analysis", "risk-assessment"], "trustTier": "verified", "registryUrl": "registry.acme.com/agents/financial-analyst:2.4.1" }, "signature": { "algorithm": "ed25519", "keyId": "_ossa-key.acme.com", "value": "base64-encoded-signature..." } } }

5.5 Layer 5: REST and GraphQL API Surface

Layer 5 provides the human-and-machine-facing API for interacting with registry pods and the UAR. The API surface includes three interfaces:

OCI Distribution API (v2): Standard OCI endpoints for pushing, pulling, and managing artifacts. This enables existing OCI tooling to work with agent registries without modification.

REST Discovery API: Purpose-built endpoints for agent discovery, capability matching, and trust verification. These endpoints are specific to the agent registry domain and have no OCI equivalent.

GraphQL Federation API: A GraphQL endpoint that supports federated queries across multiple registry pods. A single GraphQL query can search for agents across the entire federation, with results annotated by source pod and trust tier.

Table 5: API Surface Endpoints

+------------------+----------------------------------+-------------------+
| Interface        | Key Endpoints                    | Use Case          |
+------------------+----------------------------------+-------------------+
| OCI Distribution | GET  /v2/{name}/manifests/{ref}  | Pull agent        |
|                  | PUT  /v2/{name}/manifests/{ref}  | Push agent        |
|                  | GET  /v2/{name}/tags/list        | List versions     |
|                  | HEAD /v2/{name}/manifests/{ref}  | Check existence   |
|                  | GET  /v2/_catalog                | List all agents   |
+------------------+----------------------------------+-------------------+
| REST Discovery   | GET  /api/v1/agents              | Search agents     |
|                  | GET  /api/v1/agents/{id}         | Agent details     |
|                  | GET  /api/v1/capabilities        | Capability search |
|                  | GET  /api/v1/trust/{digest}      | Trust verification|
|                  | GET  /api/v1/mesh/peers          | Mesh topology     |
|                  | POST /api/v1/agents/match        | Capability match  |
+------------------+----------------------------------+-------------------+
| GraphQL          | POST /graphql                    | Federated queries |
|                  | Subscriptions: agentPublished,   | Real-time updates |
|                  |   agentUpdated, meshChanged      |                   |
+------------------+----------------------------------+-------------------+

6. Federation Approaches Compared

6.1 Taxonomy of Federation Models

Federation is not a single design pattern. Different architectures offer different trade-offs between query latency, consistency, operational complexity, and organizational autonomy. We analyze four approaches:

Table 6: Federation Approaches Comparison

+------------------+-----------+-----------+-----------+-----------+
| Dimension        | Proxy/    | DHT       | Central   | Mesh Pods |
|                  | Cache     | (Kademlia)| Index     | (Proposed)|
+------------------+-----------+-----------+-----------+-----------+
| Query Latency    | O(1) with | O(log n)  | O(1)      | O(1) local|
|                  | warm cache| hops      |           | O(k) fed  |
+------------------+-----------+-----------+-----------+-----------+
| Consistency      | Eventual  | Eventual  | Strong    | Bounded   |
|                  | (TTL)     | (DHT sync)| (single)  | eventual  |
+------------------+-----------+-----------+-----------+-----------+
| Fault Tolerance  | High      | Very High | Low       | High      |
|                  | (local    | (no SPOF) | (SPOF)    | (mesh     |
|                  | fallback) |           |           | resilient)|
+------------------+-----------+-----------+-----------+-----------+
| Org Autonomy     | High      | Low       | None      | High      |
|                  | (own pod) | (shared   | (vendor   | (own pod, |
|                  |           | DHT)      | controls) | own data) |
+------------------+-----------+-----------+-----------+-----------+
| Operational Cost | Low       | Medium    | Low       | Medium    |
|                  | (cache)   | (DHT ops) | (SaaS)    | (K8s pod) |
+------------------+-----------+-----------+-----------+-----------+
| Data Sovereignty | Full      | Partial   | None      | Full      |
|                  |           | (DHT      | (vendor   | (local    |
|                  |           | spreads)  | holds all)| storage)  |
+------------------+-----------+-----------+-----------+-----------+
| Ecosystem Fit    | npm,      | IPFS,     | Docker    | Artifact  |
|                  | Maven     | BitTorrent| Hub       | Hub, K8s  |
+------------------+-----------+-----------+-----------+-----------+
| Best For         | Known     | Censorship| Simple    | Enterprise|
|                  | sources,  | resistant,| consumer  | multi-org |
|                  | caching   | academic  | use cases | federation|
+------------------+-----------+-----------+-----------+-----------+

6.2 Query Latency Analysis

Query latency is the primary user-facing performance metric for a registry. When a developer or runtime queries the registry for an agent, the response time directly impacts productivity and deployment speed.

Formula 4: Federated Query Latency

For Proxy/Cache (warm):
  L_proxy = L_local_cache                     -- O(1), typically < 5ms

For Proxy/Cache (cold):
  L_proxy = L_local + L_upstream + L_cache_write  -- O(1), typically 50-200ms

For DHT:
  L_dht = O(log n) * L_hop                   -- Logarithmic in network size
  Example: 10,000 nodes, 50ms per hop
  L_dht = log2(10000) * 50ms = 13.3 * 50ms = 665ms

For Centralized Index:
  L_central = L_network_to_index              -- O(1), typically 20-100ms
  (but SPOF risk makes effective availability lower)

For Mesh Pods (proposed):
  L_local = L_local_pod                       -- O(1) for local queries, < 10ms
  L_federated = L_local + k * L_peer          -- O(k) for federated, k = peer count
  Example: 5 peer pods, 100ms per peer
  L_federated = 10ms + 5 * 100ms = 510ms     -- But parallelizable to ~110ms

The mesh pod approach optimizes for the common case: most queries are satisfied by the local pod (sub-10ms) because organizations primarily use agents they have already pulled. Federated queries across peers are parallelizable because the querying pod contacts all relevant peers simultaneously and aggregates results.
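The parallel fan-out can be sketched directly: contacting all k peers concurrently bounds latency by the slowest peer rather than the sum, and merging by content digest deduplicates agents hosted on multiple pods. The `Peer` and `QueryResult` shapes below are illustrative.

```typescript
// Sketch: parallel federated query with digest-based deduplication.
// Types are illustrative, not part of the Discovery API contract.
interface QueryResult { digest: string; name: string; sourcePod: string; }
type Peer = (query: string) => Promise<QueryResult[]>;

async function federatedQuery(
  query: string, local: Peer, peers: Peer[],
): Promise<QueryResult[]> {
  // Query the local pod and all peers simultaneously; an unreachable peer
  // degrades the result set instead of failing the whole query.
  const settled = await Promise.allSettled([local(query), ...peers.map((p) => p(query))]);
  const merged = new Map<string, QueryResult>();
  for (const r of settled) {
    if (r.status !== "fulfilled") continue;
    for (const hit of r.value) {
      // Deduplicate by content digest; the local result (index 0) wins.
      if (!merged.has(hit.digest)) merged.set(hit.digest, hit);
    }
  }
  return [...merged.values()];
}
```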

6.3 Why Mesh Pods Over Pure DHT

DHT-based approaches (used by AGNTCY's ACP) offer theoretical elegance but practical challenges in enterprise environments. Enterprise network policies often block the UDP-based gossip protocols used by DHTs. NAT traversal, common in corporate networks, introduces unpredictable latency and connectivity failures. Most critically, DHT participation requires organizations to contribute storage and bandwidth to a shared network, which conflicts with data sovereignty requirements in regulated industries.

Mesh pods solve the same problem (decentralized discovery without a central authority) while operating within enterprise network constraints. Pods communicate over standard HTTPS and NATS (TCP-based), work behind NAT without traversal, and store data exclusively on infrastructure controlled by the owning organization.

The trade-off is that mesh pods require each participant to operate infrastructure (a Kubernetes cluster or equivalent), whereas DHT participation requires only a lightweight node. For the enterprise target audience of this specification, the infrastructure requirement is not a barrier; these organizations already operate Kubernetes clusters.


7. Kubernetes Deployment

7.1 Production Architecture

A production deployment of the registry pod infrastructure requires careful resource planning. The following formulas guide capacity planning:

Formula 5: Resource Planning

Storage:
  S_total = sum(S_agent_i * R_i) for all agents i

  Where:
    S_agent_i = Size of agent artifact i (manifest + code + context)
    R_i       = Replication factor for agent i

  Example:
    1000 agents, average 50MB each, replication factor 2
    S_total = 1000 * 50MB * 2 = 100GB

CPU:
  CPU_total = CPU_base + (QPS * CPU_per_query) + (SYNC_rate * CPU_per_sync)

  Where:
    CPU_base       = Baseline for pod processes (500m)
    QPS            = Queries per second to discovery API
    CPU_per_query  = CPU cost per query (5m for exact, 50m for semantic)
    SYNC_rate      = Mesh sync events per second
    CPU_per_sync   = CPU cost per sync event (10m)

  Example:
    CPU_base = 500m
    QPS = 100 (50 exact + 50 semantic)
    SYNC_rate = 10 events/sec
    CPU_total = 500m + (50 * 5m + 50 * 50m) + (10 * 10m)
             = 500m + 250m + 2500m + 100m
             = 3350m (approximately 3.5 CPU cores)

Memory:
  MEM_total = MEM_base + (N_agents * MEM_per_index) + MEM_cache

  Where:
    MEM_base      = Baseline (256Mi)
    N_agents      = Number of indexed agents
    MEM_per_index = Memory per agent index entry (50KB avg)
    MEM_cache     = Query result cache (configurable)

  Example:
    MEM_base = 256Mi
    N_agents = 10,000
    MEM_per_index = 50KB
    MEM_cache = 512Mi
    MEM_total = 256Mi + (10000 * 50KB) + 512Mi
             = 256Mi + 488Mi + 512Mi
             = 1256Mi (approximately 1.3Gi)
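Formula 5 translates directly into a capacity-planning helper. The constants below mirror the worked examples; treat them as planning defaults to be replaced with measured values.

```typescript
// Sketch: Formula 5 as code. Units are millicores (m) for CPU and MiB for
// memory; constants come from the worked examples above.
function cpuMillicores(opts: { exactQps: number; semanticQps: number; syncRate: number }): number {
  const CPU_BASE = 500;     // m, baseline pod processes
  const CPU_EXACT = 5;      // m per exact-match query
  const CPU_SEMANTIC = 50;  // m per semantic query
  const CPU_SYNC = 10;      // m per mesh sync event
  return CPU_BASE
       + opts.exactQps * CPU_EXACT
       + opts.semanticQps * CPU_SEMANTIC
       + opts.syncRate * CPU_SYNC;
}

function memoryMiB(opts: { agents: number; cacheMiB: number }): number {
  const MEM_BASE = 256;         // MiB baseline
  const MEM_PER_INDEX_KB = 50;  // KB per indexed agent
  const indexMiB = (opts.agents * MEM_PER_INDEX_KB) / 1024;
  return Math.round(MEM_BASE + indexMiB + opts.cacheMiB);
}
```

Plugging in the example workloads reproduces the figures above: 3350m CPU for 100 QPS with 10 sync events/sec, and roughly 1256Mi for 10,000 indexed agents with a 512Mi cache.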

7.2 Helm Chart Configuration

The registry pod ships as a Helm chart with configurable values for each deployment topology:

# values.yaml - Production configuration
global:
  domain: registry.acme.com
  trustTier: standard

federation:
  enabled: true
  meshSeeds:
    - registry.partner-a.com:7946
    - registry.partner-b.com:7946
  syncInterval: 5s
  maxPeers: 50

registryPod:
  replicas: 3
  image:
    repository: ghcr.io/ossa-registry/pod
    tag: "1.0.0"
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  storage:
    artifacts:
      storageClass: fast-ssd
      size: 200Gi
    metadata:
      storageClass: fast-ssd
      size: 50Gi
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilization: 70
    targetMemoryUtilization: 80

networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: agent-runtime
      ports:
        - port: 5000
          protocol: TCP
        - port: 8080
          protocol: TCP
    - from:
        - ipBlock:
            cidr: 10.0.0.0/8
      ports:
        - port: 7946
          protocol: TCP

postgres:
  enabled: true
  replicas: 3
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
  storage:
    size: 50Gi
    storageClass: fast-ssd

redis:
  enabled: true
  replicas: 3
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"

nats:
  enabled: true
  jetstream:
    enabled: true
    storage:
      size: 10Gi

monitoring:
  prometheus:
    enabled: true
    serviceMonitor: true
  grafana:
    dashboards: true
  alerts:
    enabled: true
    slackWebhook: ""

7.3 Network Policies

Network policies enforce the principle of least privilege for registry pod communication:

Intra-pod communication:
  registry <-> postgres    : TCP/5432 (metadata)
  registry <-> redis       : TCP/6379 (cache)
  registry <-> nats        : TCP/4222 (events), TCP/6222 (cluster)

Consumer access:
  clients  -> registry     : TCP/5000 (OCI API), TCP/8080 (Discovery API)

Mesh communication:
  pod      <-> pod         : TCP/7946 (gossip), TCP/4222 (NATS)

External (optional):
  internet -> registry     : TCP/443 (via Ingress, TLS terminated)

7.4 High Availability Considerations

The StatefulSet deployment ensures pod identity stability across restarts, which is critical for the gossip mesh (pods must rejoin with the same identity to avoid phantom node accumulation). Persistent volume claims ensure that artifact data survives pod rescheduling.

For cross-region high availability, we recommend deploying registry pods in at least two Kubernetes clusters in different regions, with NATS JetStream replication configured across clusters. This provides sub-second failover for discovery queries (served from any available pod) and bounded recovery for artifact pulls (redirected to the surviving cluster).


8. Reference Implementation

8.1 Package Architecture

The reference implementation is organized as a monorepo with six publishable packages:

@ossa/registry-core        Core types, schemas, validation, digest computation
@ossa/registry-pod          Registry pod server (OCI + Discovery + Trust)
@ossa/registry-mesh         Gossip mesh, NATS integration, sync protocol
@ossa/registry-cli          CLI for publishing, pulling, searching agents
@ossa/registry-client       TypeScript/JavaScript client SDK
@ossa/registry-plugins      Plugin system for custom trust, storage, discovery

8.2 Package Responsibilities

@ossa/registry-core provides the foundational types and utilities shared by all other packages. It includes TypeScript type definitions generated from the OSSA OpenAPI 3.1 specification, Zod schemas for runtime validation, digest computation utilities (SHA-256 over artifact content), and OSSA manifest parsing and validation.
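The digest utilities in core follow the OCI convention of SHA-256 over artifact content, rendered as `sha256:<hex>`. A minimal sketch with Node's crypto module (helper names are illustrative, not the package's actual API):

```typescript
// Sketch: content-addressable digests in the OCI "sha256:<hex>" format,
// as @ossa/registry-core might compute them. Helper names are illustrative.
import { createHash } from "node:crypto";

function computeDigest(content: Buffer | string): string {
  const hex = createHash("sha256").update(content).digest("hex");
  return `sha256:${hex}`;
}

// Content addressing makes verification trivial: recompute and compare.
function verifyDigest(content: Buffer | string, expected: string): boolean {
  return computeDigest(content) === expected;
}
```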

@ossa/registry-pod is the main server process. It implements three HTTP servers: an OCI Distribution API v2 server (for artifact push/pull), a REST Discovery API server (for agent search and capability matching), and a health/readiness endpoint server. It also manages the local artifact store (S3-compatible blob storage), the metadata database (PostgreSQL), and the query cache (Redis).

@ossa/registry-mesh handles all inter-pod communication. It implements the gossip protocol using a membership list (SWIM-style failure detection with suspicion), the NATS JetStream integration for durable event streaming, and the sync protocol for reconciling metadata differences between pods. It also implements the UAR aggregation logic for index-of-indexes queries.

@ossa/registry-cli provides a command-line interface for developers and CI/CD pipelines:

ossa-registry login <pod-url>                    # Authenticate to a pod
ossa-registry push <artifact-path>               # Publish an agent
ossa-registry pull <agent-ref>                   # Pull an agent
ossa-registry search <query>                     # Search for agents
ossa-registry search --capability <skill>        # Capability-based search
ossa-registry verify <agent-ref>                 # Verify trust chain
ossa-registry inspect <agent-ref>                # Show manifest details
ossa-registry mesh status                        # Show mesh topology
ossa-registry mesh peers                         # List connected peers
ossa-registry config set federation.enabled true # Configure federation

@ossa/registry-client is a TypeScript SDK for programmatic access to registry pods. It provides typed methods for all Discovery API endpoints, automatic retry and circuit-breaking for resilient access, and streaming support for large artifact transfers.

@ossa/registry-plugins defines extension points for custom implementations: storage backends (S3, GCS, Azure Blob, local filesystem), trust providers (custom CA, hardware security modules, cloud KMS), discovery providers (Elasticsearch, Meilisearch, custom vector stores), and authentication providers (OIDC, LDAP, custom SSO).

8.3 Dependency Graph

Figure 9: Package Dependency Graph

      @ossa/registry-plugins
                |
                v
      @ossa/registry-pod -------> @ossa/registry-mesh
                |                          |
                +-----------+--------------+
                            v
                  @ossa/registry-core
                            ^
                +-----------+--------------+
                |                          |
      @ossa/registry-cli        @ossa/registry-client

All packages depend on @ossa/registry-core for shared types and schemas. The pod depends on the mesh for inter-pod communication. The CLI and client are leaf packages that depend only on core (for types) and communicate with pods over HTTP.

8.4 Technology Stack

The reference implementation uses the following technology choices, each selected for production readiness and community adoption:

  • Runtime: Node.js 20 LTS (TypeScript 5.4, strict mode)
  • HTTP Framework: Hono (lightweight, edge-compatible, OpenAPI integration)
  • Database: PostgreSQL 16 (metadata, search) with pgvector (semantic search)
  • Cache: Redis 7 (query cache, rate limiting)
  • Blob Storage: S3-compatible (MinIO for self-hosted, any S3 provider for cloud)
  • Messaging: NATS 2.10 with JetStream (event streaming, gossip substrate)
  • Schema Validation: Zod (runtime validation), generated from OpenAPI 3.1
  • OCI: ORAS SDK (artifact push/pull, registry interaction)
  • Identity: SPIRE SDK (SPIFFE identity, trust bundle management)
  • Cryptography: Node.js crypto module (SHA-256, Ed25519 signatures)
  • Containerization: Distroless base images (minimal attack surface)
  • Orchestration: Kubernetes 1.28+ with Helm 3 charts

9. Implementation Timeline

9.1 Phase 1: Foundation (Weeks 1-4)

Objective: Establish core types, single-pod functionality, and OCI artifact support.

Deliverables:

  • @ossa/registry-core v0.1.0: Types, schemas, digest utilities, manifest validation
  • @ossa/registry-pod v0.1.0: OCI Distribution API v2 (push/pull), PostgreSQL metadata store, S3 artifact store
  • @ossa/registry-cli v0.1.0: login, push, pull, inspect commands
  • CI/CD pipeline: Build, test, publish to npm registry
  • Integration tests: Full push/pull cycle for OSSA agent artifacts

Key Decisions (Week 1):

  • Finalize OSSA manifest schema v1 (community review)
  • Select OCI artifact type identifiers
  • Define API versioning strategy (URL prefix vs header)

Acceptance Criteria:

  • A developer can publish an agent artifact with ossa-registry push and retrieve it with ossa-registry pull from the same pod
  • OCI-compatible tooling (crane, skopeo) can interact with the registry
  • All artifacts are content-addressable with SHA-256 digests
  • 95% test coverage on core and pod packages

9.2 Phase 2: Discovery (Weeks 5-8)

Objective: Add agent discovery, capability matching, and the REST Discovery API.

Deliverables:

  • @ossa/registry-pod v0.2.0: REST Discovery API (search, capability match, trust verification), full-text search (PostgreSQL FTS), semantic search (pgvector), Agent Card indexing
  • @ossa/registry-client v0.1.0: TypeScript SDK for Discovery API
  • @ossa/registry-cli v0.2.0: search, search --capability, verify commands
  • Documentation: API reference (OpenAPI 3.1 spec published)

Acceptance Criteria:

  • Exact match queries return results in under 10ms
  • Capability-based queries return results in under 100ms
  • Semantic search queries return results in under 500ms
  • Agent Cards are automatically indexed on artifact push

9.3 Phase 3: Mesh Federation (Weeks 9-12)

Objective: Implement inter-pod gossip, NATS JetStream synchronization, and multi-pod federation.

Deliverables:

  • @ossa/registry-mesh v0.1.0: SWIM-style gossip protocol, NATS JetStream integration, CloudEvents sync protocol, conflict resolution (LWW with digest dedup)
  • @ossa/registry-pod v0.3.0: Mesh agent integration, peer management, federated queries
  • @ossa/registry-cli v0.3.0: mesh status, mesh peers commands
  • Integration tests: Multi-pod sync, partition recovery, convergence verification
  • Helm chart v0.1.0: Single-cluster deployment with 3-pod StatefulSet
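The "LWW with digest dedup" conflict resolution named in the deliverables can be sketched as a pure merge function. The record shape and the lexicographic digest tie-break are assumptions for illustration; the key property is that every pod, given the same pair of updates, picks the same winner.

```typescript
// Sketch: last-writer-wins merge with digest deduplication for mesh sync.
// Record shape and tie-break rule are illustrative.
interface MetadataRecord {
  ref: string;        // e.g. "acme-corp/financial-analyst@2.4.1"
  digest: string;     // content digest of the artifact
  updatedAt: number;  // epoch millis from the originating pod
}

function resolve(current: MetadataRecord | undefined, incoming: MetadataRecord): MetadataRecord {
  if (!current) return incoming;
  // Digest dedup: identical content is a no-op regardless of timestamps.
  if (current.digest === incoming.digest) return current;
  // Last-writer-wins on the originating timestamp.
  if (incoming.updatedAt !== current.updatedAt) {
    return incoming.updatedAt > current.updatedAt ? incoming : current;
  }
  // Equal timestamps: break the tie deterministically (here, lexicographically
  // on digest) so all pods converge to the same winner.
  return incoming.digest > current.digest ? incoming : current;
}
```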

Acceptance Criteria:

  • Metadata converges across 3 pods within T_converge bound (Formula 3)
  • Federated queries return results from all reachable pods
  • Network partition recovery completes without data loss
  • Gossip protocol handles pod join/leave/failure correctly

9.4 Phase 4: Trust and Identity (Weeks 13-16)

Objective: Implement the three-tier trust model, SPIFFE identity, and DNS trust anchors.

Deliverables:

  • @ossa/registry-pod v0.4.0: Trust engine (Basic/Standard/Verified tiers), SPIFFE/SPIRE integration, DNS TXT record validation, signature verification (Ed25519), Sigstore/Notary v2 integration (Verified tier)
  • @ossa/registry-plugins v0.1.0: Plugin system architecture, storage backend plugins (S3, GCS, local), trust provider plugins (custom CA, cloud KMS)
  • @ossa/registry-cli v0.4.0: config, trust management commands
  • Security audit: Third-party review of trust model and cryptographic implementation

Acceptance Criteria:

  • Basic tier: DNS TXT record validation passes for test domains
  • Standard tier: Org-signed certificates validate correctly
  • Verified tier: Sigstore transparency log entries verify correctly
  • SPIFFE federation: Two pods in different trust domains authenticate successfully
  • Plugin system: Custom storage and trust providers can be loaded without modifying core code

9.5 Phase 5: Production Hardening (Weeks 17-20)

Objective: Prepare for production deployment with performance optimization, monitoring, and operational tooling.

Deliverables:

  • Performance optimization: Query caching (Redis), connection pooling, batch sync operations
  • Monitoring: Prometheus metrics, Grafana dashboards, alerting rules
  • Helm chart v1.0.0: Production-ready with HPA, PDB, network policies, resource limits
  • Operational runbooks: Deployment, scaling, backup, recovery, troubleshooting
  • Load testing: 10,000 agents, 100 QPS sustained, 10 peer pods
  • GraphQL federation API: Cross-pod federated queries

Acceptance Criteria:

  • P99 query latency under 100ms for local queries
  • P99 query latency under 500ms for federated queries (5 peers)
  • Zero data loss during rolling updates
  • Automated backup and restore verified
  • HPA scales correctly under load

9.6 Phase 6: Ecosystem Integration (Weeks 21-24)

Objective: Integrate with existing agent ecosystems and prepare for community adoption.

Deliverables:

  • MCP Registry bridge: Import/export agents between MCP Registry and OSSA Registry
  • A2A Agent Card sync: Bidirectional sync between A2A directories and OSSA mesh
  • UAR prototype: Index-of-indexes aggregation service
  • GitHub Actions: CI/CD actions for agent publishing
  • GitLab CI/CD: Pipeline templates for agent publishing
  • Community documentation: Getting started guide, architecture overview, contribution guide
  • OSSA specification update: Registry federation extension (submitted to OSSA working group)

Acceptance Criteria:

  • MCP servers can be discovered through OSSA registry search
  • A2A Agent Cards are automatically indexed when agents are published
  • UAR prototype aggregates metadata from 3+ independent registry pods
  • CI/CD templates work in GitHub Actions and GitLab CI/CD
  • Community documentation enables self-service deployment

9.7 Timeline Summary

Weeks        1-4            5-8            9-12           13-16          17-20          21-24
Phase        1 Foundation   2 Discovery    3 Mesh         4 Trust        5 Production   6 Ecosystem
Focus        Core types     Search API     Gossip         SPIFFE         Performance    MCP bridge
             OCI push/pull  Capability     NATS sync      DNS trust      Monitoring     A2A sync
             CLI basics     Semantic       Multi-pod      Signatures     Helm v1.0      UAR proto
                            Client SDK     Helm v0.1      Plugins        Load test      CI/CD

10. Future Work and Open Questions

10.1 Economic Models for Federation

This paper does not address the economic incentives for operating registry pods. In the container ecosystem, commercial registries (Docker Hub Pro, GitHub Container Registry, cloud provider registries) sustain themselves through storage fees, bandwidth charges, and enterprise support contracts. Agent registries may require different economic models because agent artifacts are typically smaller than container images (megabytes vs gigabytes) but require more expensive operations (semantic search, capability matching, trust verification).

Possible models include federation membership fees (organizations pay to participate in the mesh), query-based pricing (pay per federated query), freemium tiers (basic local registry free, federation features paid), and grant-funded operation (for public-good registries like UAR).

10.2 Capability Negotiation Protocol

The current specification treats capabilities as static declarations in the OSSA manifest. In practice, agent capabilities are dynamic: they depend on the runtime environment, available tools, connected data sources, and context budget. A future extension should define a capability negotiation protocol where a consumer and an agent dynamically agree on the capabilities available for a specific interaction.

10.3 Cross-Protocol Agent Cards

The A2A Agent Card format and the OSSA manifest format overlap significantly but are not identical. A formal mapping between the two formats, maintained as an OSSA specification extension, would enable seamless interoperability. This mapping should be bidirectional: an A2A Agent Card should be derivable from an OSSA manifest, and an OSSA manifest should be constructable from an A2A Agent Card plus additional metadata.

10.4 Regulatory Compliance

Agent registries in regulated industries (financial services, healthcare, government) must comply with data sovereignty regulations (GDPR, CCPA, data localization laws), audit requirements (SOX, HIPAA, FedRAMP), and export controls (ITAR and EAR, for agents that process controlled data). The mesh pod topology naturally supports data sovereignty (each organization controls its own pod's storage location), but the gossip protocol may inadvertently propagate metadata across jurisdictional boundaries. A future specification should define metadata classification levels and gossip boundary controls.
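A boundary control could take the form of a per-record filter applied before gossip propagation. The sketch below assumes a hypothetical three-level classification (public < partner < restricted) and per-pod jurisdiction and trust tags; none of these are defined by the current specification.

```python
# Hypothetical gossip boundary filter. Classification levels, jurisdiction
# tags, and trust relationships are illustrative assumptions.

LEVELS = {"public": 0, "partner": 1, "restricted": 2}

def may_gossip(record: dict, peer_jurisdiction: str,
               local_jurisdiction: str, peer_trust: str) -> bool:
    """Decide whether a metadata record may be propagated to a peer pod."""
    # Fail closed: unclassified metadata is treated as restricted.
    level = LEVELS[record.get("classification", "restricted")]
    # Restricted metadata never leaves the jurisdiction it was created in.
    if level >= LEVELS["restricted"] and peer_jurisdiction != local_jurisdiction:
        return False
    # Partner-level metadata requires at least a partner trust relationship.
    if level >= LEVELS["partner"] and peer_trust not in ("partner", "member"):
        return False
    return True
```

The fail-closed default matters: in a mesh, a record gossiped once to the wrong peer cannot be recalled, so ambiguity must resolve to non-propagation.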

10.5 AI-Native Discovery

Current discovery mechanisms (full-text search, capability matching, semantic search) are designed for human consumers. As agent-to-agent interactions increase, agents themselves become the primary registry consumers. AI-native discovery would allow an agent to describe its need in natural language and receive a ranked list of compatible agents, including compatibility scores, trust assessments, and deployment recommendations. This requires embedding-based retrieval augmented with structured capability matching, a combination that the current architecture supports through pgvector but does not yet optimize for.
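The blended ranking described above can be sketched as a weighted combination of embedding similarity (the part pgvector would compute in SQL) and a structured capability-match score. The weights and the toy three-dimensional embeddings below are illustrative assumptions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, as a pgvector `<=>` query would compute it."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec: list[float], required_caps: set[str],
         candidates: list[dict], w_sem: float = 0.6, w_cap: float = 0.4) -> list[str]:
    """Rank candidate agents by blended semantic + capability score."""
    scored = []
    for agent in candidates:
        sem = cosine(query_vec, agent["embedding"])
        # Fraction of the required capabilities the agent declares.
        cap = len(required_caps & agent["capabilities"]) / len(required_caps)
        scored.append((w_sem * sem + w_cap * cap, agent["name"]))
    return [name for _, name in sorted(scored, reverse=True)]
```

In production the semantic term would come from an approximate nearest-neighbor index and the capability term from a structured query, with the blend applied as a re-ranking step over the top-k semantic hits.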


11. Conclusion

The agent ecosystem stands at an inflection point. The protocols for agent communication (MCP, A2A) are maturing rapidly. The formats for agent definition (OSSA, Agent Cards) are converging. The missing piece is the infrastructure for agent distribution: a federated, open, vendor-neutral registry that allows organizations to publish, discover, verify, and deploy agents across trust boundaries.

This paper has presented an architecture for that infrastructure. It builds on proven patterns from the container ecosystem (OCI distribution, content-addressable storage), the package management ecosystem (proxy/cache federation, index-of-indexes), and the internet trust infrastructure (DNS-anchored identity, layered verification). It introduces agent-specific extensions for capability-based discovery, context budget negotiation, and mesh-topology federation.

The architecture is designed for incremental adoption. An organization can start with a single edge pod for internal agent management, graduate to a team pod for departmental sharing, federate with partners through mesh gossip, and participate in the global UAR for public discovery. At each stage, the organization retains full control of its data, its trust policies, and its operational infrastructure.

The reference implementation targets a twenty-four-week delivery timeline, with core functionality (push, pull, search) available in eight weeks and production-ready federation in twenty weeks. The implementation uses standard Kubernetes infrastructure, widely adopted open-source components (PostgreSQL, Redis, NATS, SPIRE), and the OCI distribution protocol that already powers billions of artifact transfers daily.

The window for establishing an open standard is narrow. Within eighteen months, proprietary agent registries from major cloud vendors will accumulate enough network effects to become self-reinforcing. The agent ecosystem needs its OCI moment: a vendor-neutral specification that enables competition on implementation quality rather than lock-in. This paper is a contribution toward that goal.


References

  1. Open Container Initiative. "OCI Distribution Specification v1.1." 2023. https://github.com/opencontainers/distribution-spec

  2. Open Container Initiative. "OCI Image Manifest Specification v1.1." 2023. https://github.com/opencontainers/image-spec

  3. ORAS Project. "OCI Registry as Storage." CNCF Sandbox Project, 2024. https://oras.land

  4. Anthropic. "Model Context Protocol Specification." 2024-2025. https://modelcontextprotocol.io

  5. Google DeepMind. "Agent-to-Agent Protocol Specification v1.0." 2025. https://google.github.io/A2A

  6. AGNTCY. "Agent Communication Protocol and Open Agent Registry." 2025. https://agntcy.org

  7. NANDA Protocol. "Network for Agent Discovery and Adaptation." 2025. https://nanda-protocol.org

  8. BlueFly.io. "Open Standard for Sustainable Agents (OSSA) v0.3.3." 2025. https://gitlab.com/blueflyio/openstandardagents

  9. SPIFFE. "Secure Production Identity Framework for Everyone." CNCF Graduated Project, 2024. https://spiffe.io

  10. SPIRE. "SPIFFE Runtime Environment." CNCF Graduated Project, 2024. https://spiffe.io/spire

  11. CNCF. "Artifact Hub." 2024. https://artifacthub.io

  12. Sigstore. "Software Supply Chain Security." Linux Foundation Project, 2024. https://sigstore.dev

  13. Notary Project. "Notary v2 Specification." CNCF Incubating Project, 2024. https://notaryproject.dev

  14. NATS. "NATS JetStream." 2024. https://nats.io/jetstream

  15. CloudEvents. "CloudEvents Specification v1.0." CNCF Graduated Project, 2024. https://cloudevents.io

  16. Docker Inc. "Docker Hub." 2013-2026. https://hub.docker.com

  17. npm Inc. "npm Registry." 2010-2026. https://www.npmjs.com

  18. Verdaccio. "A Lightweight Private npm Proxy Registry." 2024. https://verdaccio.org

  19. Harbor. "Cloud Native Registry." CNCF Graduated Project, 2024. https://goharbor.io

  20. Maymounkov, P. and Mazières, D. "Kademlia: A Peer-to-peer Information System Based on the XOR Metric." IPTPS 2002, LNCS 2429, pp. 53-65. DOI:10.1007/3-540-45748-8_5

  21. Das, A., Gupta, I., and Motivala, A. "SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol." DSN 2002. DOI:10.1109/DSN.2002.1028914

  22. Microsoft. "Microsoft Entra Agent Identity." 2025. https://learn.microsoft.com/en-us/entra

  23. Google Cloud. "Vertex AI Agent Builder." 2025. https://cloud.google.com/vertex-ai/docs/agents

  24. Amazon Web Services. "Amazon Bedrock Agents." 2025. https://aws.amazon.com/bedrock/agents

  25. SLSA. "Supply-chain Levels for Software Artifacts." 2024. https://slsa.dev

  26. Brewer, E. "CAP Twelve Years Later: How the 'Rules' Have Changed." IEEE Computer, 45(2), 23-29, 2012. DOI:10.1109/MC.2012.37

  27. Shapiro, M., Preguiça, N., Baquero, C., and Zawirski, M. "Conflict-Free Replicated Data Types." SSS 2011, LNCS 6976. DOI:10.1007/978-3-642-24550-3_29

  28. Kubernetes. "StatefulSet Documentation." 2024. https://kubernetes.io/docs/concepts/workloads/controllers/statefulset

  29. Helm. "The Package Manager for Kubernetes." CNCF Graduated Project, 2024. https://helm.sh

  30. PostgreSQL. "Full Text Search and pgvector Extension." 2024. https://www.postgresql.org


Document History

Version | Date       | Author                         | Changes
1.0     | 2026-02-07 | BlueFly.io Agent Platform Team | Initial publication

License: This document is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Feedback: Submit issues and comments to https://gitlab.com/blueflyio/agent-platform/technical-docs/-/issues with the label whitepaper::federated-registries.
