Token Efficiency Observability Pipeline

Status: Production (deployed 2026-02-17)
Repos: platform-agents, agent-tracer, ai_agents_ossa
Spec: OSSA v0.5 Token Efficiency Primitives

Overview

Full-stack token efficiency tracking across 74 OSSA agents, 247 Tool plugins, and 13 AI providers. Token usage and cost are recorded for every AI call, and wasted tokens are attributed to one of four waste categories.

Architecture

Drupal AI call → PostGenerateResponseEvent
    ↓
ai_agents_ossa_token_efficiency (submodule)
    → TokenEfficiencySubscriber extracts TokenUsageDto
    → Async POST to agent-tracer /api/v1/token-efficiency/record
    ↓
agent-tracer OssaTokenEfficiencyCollector (845 lines)
    → CostCalculator (provider pricing, Claude 4.x + GPT-4o)
    → VortexEngine (35-40% data compression)
    → Waste detection (4 categories)
    → ClickHouse persistence (ossa_token_efficiency table)
    → Prometheus export (25+ metrics at /metrics/ossa)
    ↓
Grafana Dashboard (14 panels)
    → Budget burndown, waste breakdown, routing distribution
    → Cost-per-agent, anomaly detection, latency percentiles
    ↓
ECA Governance (AiTokenUsageEvent)
    → Budget alerts, cost thresholds, routing decisions
    ↓
Agent Self-Query (GetTokenEfficiency Tool)
    → Tool → FunctionCall → MCP auto-bridge

Components

1. Platform Agents (platform-agents repo)

Commit: 44598f00 on release/v0.1.x

All 74 agents have token_efficiency config in their manifests:

token_efficiency:
  serialization_profile: compact
  observation_format: projected
  budget:
    max_input_tokens: 50000
    allocation_strategy: adaptive
  routing:
    cascade: [haiku, sonnet, opus]
    complexity_threshold: [0.3, 0.7]
  consolidation:
    strategy: moderate
    retain: [final_answer]
    drop: [raw_observations]
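
One plausible reading of the routing block is that the two thresholds split the complexity score into three bands, one per cascade tier. The TypeScript sketch below assumes that reading. `RoutingConfig` and `pickTier` are illustrative names, not the optimizer's published API, and the boundary handling (a score equal to a threshold goes to the higher tier) is also an assumption.

```typescript
// Hypothetical type mirroring the `routing` block of the manifest.
interface RoutingConfig {
  cascade: string[];              // ordered from cheapest to most capable
  complexity_threshold: number[]; // ascending cut points, one fewer than tiers
}

// Assumption: a score below the first threshold goes to the first tier,
// a score below the second threshold to the second tier, and so on.
function pickTier(score: number, routing: RoutingConfig): string {
  const { cascade, complexity_threshold: cuts } = routing;
  if (cascade.length !== cuts.length + 1) {
    throw new Error("expected exactly one more tier than thresholds");
  }
  const index = cuts.findIndex((cut) => score < cut);
  return index === -1 ? cascade[cascade.length - 1] : cascade[index];
}

const routing: RoutingConfig = {
  cascade: ["haiku", "sonnet", "opus"],
  complexity_threshold: [0.3, 0.7],
};
```

Under these assumptions, `pickTier(0.1, routing)` selects `haiku`, `pickTier(0.5, routing)` selects `sonnet`, and anything at or above 0.7 selects `opus`.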

3 agents have deep optimization:

  • security-scanner: 10x reduction (100K → 10K tokens)
  • code-reviewer: 5.3x reduction (80K → 15K, 60/35/5 Haiku/Sonnet/Opus)
  • pipeline-remediation: 7.5x reduction (60K → 8K, zero-copy knowledge refs)

Production workflow: security-scanner/workflow.yaml — 7.3x total (240K → 33K, 86% cost savings)
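
The reduction factors quoted above follow directly from the before and after token counts. A small TypeScript helper (the name `reduction` is ours) makes the arithmetic explicit:

```typescript
// Reduction factor and percentage saved for a before/after token count.
function reduction(before: number, after: number): { factor: number; savedPct: number } {
  if (before <= 0 || after <= 0) {
    throw new Error("token counts must be positive");
  }
  return {
    factor: before / after,               // e.g. 240K / 33K
    savedPct: (1 - after / before) * 100, // share of tokens no longer sent
  };
}

// The production workflow figures: 240K tokens reduced to 33K.
const workflow = reduction(240_000, 33_000);
// workflow.factor is roughly 7.27 and workflow.savedPct roughly 86.25
```

The percentage computed here is a token saving. It equals the cost saving only if every token is priced the same, so a workflow that also routes calls to cheaper models may report a different cost figure.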

2. Token Optimizer (platform-agents/packages/@ossa/token-optimizer)

TypeScript implementation with 4 modules:

| Module | What |
|---|---|
| agent.ts | Token counting, cost estimation, budget enforcement, optimization pipeline |
| compression.ts | TOON compact serialization, 3-profile manifests (full/compact/fingerprint), template compression |
| routing.ts | Complexity classifier (10 weighted features), Haiku/Sonnet/Opus cascade routing |
| trajectory.ts | AgentDiet-style observation masking, step consolidation, state expiration |
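
To illustrate what observation masking means in practice, here is a minimal TypeScript sketch. It is not the code in trajectory.ts. The `Step` shape, the `maskObservations` name, and the placeholder format are all assumptions chosen for the example.

```typescript
// Hypothetical shape of one step in an agent trajectory.
interface Step {
  action: string;
  observation: string;
}

// Keep the most recent `keepLast` observations verbatim and replace older
// ones with a short placeholder, so stale tool output stops consuming context.
function maskObservations(steps: Step[], keepLast: number): Step[] {
  const cutoff = Math.max(0, steps.length - keepLast);
  return steps.map((step, i) =>
    i < cutoff
      ? { ...step, observation: `[masked: ${step.observation.length} chars]` }
      : step
  );
}
```

Actions are kept in full so the model can still see what it did; only the bulky tool output of older steps is dropped.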

3. Agent-Tracer (agent-tracer repo)

Commit: 7fd8745 on release/v0.1.x

OssaTokenEfficiencyCollector

845-line production collector integrating:

  • CostCalculator: provider-aware pricing (Claude 4.x, GPT-4o, Google, Azure, Ollama)
  • VortexEngine: runtime data compression (35-40%)
  • ClickHouse: ossa_token_efficiency table with MergeTree, monthly partitioning, 1-year TTL, batch buffering (10s flush)
  • Prometheus: 25+ metrics exported at /metrics/ossa
  • Waste Detection: 4 categories per span (serialization, trajectory, coordination, envelope)
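
The batch buffering mentioned for ClickHouse can be pictured as follows. This is a simplified TypeScript sketch, not the collector's code. The `BatchBuffer` class, its injectable clock, and the choice to check the interval on each `add` rather than with a timer are assumptions made to keep the example self-contained.

```typescript
// Buffers rows and hands them to `flush` once `intervalMs` has elapsed
// since the previous flush. The clock is injectable to make testing easy.
class BatchBuffer<T> {
  private rows: T[] = [];
  private lastFlush: number;
  private readonly flush: (rows: T[]) => void;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(
    flush: (rows: T[]) => void,
    intervalMs: number = 10_000, // 10s, matching the flush interval above
    now: () => number = Date.now
  ) {
    this.flush = flush;
    this.intervalMs = intervalMs;
    this.now = now;
    this.lastFlush = now();
  }

  add(row: T): void {
    this.rows.push(row);
    if (this.now() - this.lastFlush >= this.intervalMs) {
      this.drain();
    }
  }

  // Flush whatever is buffered (also useful at shutdown).
  drain(): void {
    if (this.rows.length > 0) {
      this.flush(this.rows);
    }
    this.rows = [];
    this.lastFlush = this.now();
  }
}
```

Batching trades a few seconds of write latency for far fewer inserts, which generally suits MergeTree tables better than many single-row writes.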

Dual-Namespace Attribute Support

Reads both OTel GenAI standard and legacy attributes:

| OTel Standard | Legacy | What |
|---|---|---|
| gen_ai.system | llm.provider | Provider |
| gen_ai.request.model | llm.model | Model |
| gen_ai.usage.input_tokens | llm.token_count.prompt | Input tokens |
| gen_ai.usage.output_tokens | llm.token_count.completion | Output tokens |
| ossa.agent.name | agent.name | Agent |
| ossa.compression.ratio | | Compression |
| ossa.routing.tier | | Routing decision |
| ossa.token_budget.utilization | | Budget usage |
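
Reading both namespaces amounts to a lookup with a fallback. The TypeScript sketch below uses the attribute pairs from the table above. The `readAttribute` helper is illustrative, and the precedence (the standard key wins when both are present) is an assumption, since the source does not state it.

```typescript
type Attributes = Record<string, string | number | undefined>;

// [standard key, legacy key] pairs taken from the attribute table.
const ATTRIBUTE_PAIRS = {
  provider: ["gen_ai.system", "llm.provider"],
  model: ["gen_ai.request.model", "llm.model"],
  inputTokens: ["gen_ai.usage.input_tokens", "llm.token_count.prompt"],
  outputTokens: ["gen_ai.usage.output_tokens", "llm.token_count.completion"],
  agent: ["ossa.agent.name", "agent.name"],
} as const;

type Field = keyof typeof ATTRIBUTE_PAIRS;

// Assumption: prefer the OTel/OSSA key and fall back to the legacy key.
function readAttribute(attrs: Attributes, field: Field): string | number | undefined {
  const [standard, legacy] = ATTRIBUTE_PAIRS[field];
  return attrs[standard] ?? attrs[legacy];
}
```

A span that carries only `llm.model` therefore still yields a model name, and one that carries neither key yields `undefined`.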

REST API Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /metrics/ossa | GET | Prometheus scrape |
| /api/v1/token-efficiency | GET | Full JSON snapshot |
| /api/v1/token-efficiency/agents/:id | GET | Per-agent breakdown |
| /api/v1/token-efficiency/pipelines | GET | Pipeline cost tracking |
| /api/v1/token-efficiency/waste | GET | 4-category waste analysis |
| /api/v1/token-efficiency/routing | GET | Cascade effectiveness |
| /api/v1/token-efficiency/vortex | GET | VORTEX engine status |
| /api/v1/token-efficiency/record | POST | External agent reporting |
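
An external agent reporting to the record endpoint might assemble its request roughly as below. The request body schema is not documented in this page, so the field names are borrowed from the ClickHouse columns as a guess. `TokenUsageRecord` and `buildRecordRequest` are hypothetical names, and authentication headers are omitted.

```typescript
// Assumed payload shape; field names mirror ClickHouse columns, not a
// documented request schema.
interface TokenUsageRecord {
  agent_id: string;
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

// Builds a plain request description that can be handed to any HTTP client.
function buildRecordRequest(baseUrl: string, record: TokenUsageRecord) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/token-efficiency/record`,
    method: "POST" as const,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      ...record,
      total_tokens: record.input_tokens + record.output_tokens,
    }),
  };
}
```

Returning a plain object keeps the sketch independent of any particular HTTP library; the result can be passed to `fetch` or a similar client.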

ClickHouse Schema

CREATE TABLE IF NOT EXISTS ossa_token_efficiency (
    timestamp DateTime64(3),
    agent_id String,
    agent_domain LowCardinality(String),
    agent_tier LowCardinality(String),
    pipeline_id String DEFAULT '',
    model LowCardinality(String),
    provider LowCardinality(String),
    input_tokens UInt32,
    output_tokens UInt32,
    total_tokens UInt32,
    cost_usd Float64,
    savings_usd Float64,
    tokens_saved UInt32,
    compression_ratio Float32 DEFAULT 0,
    routing_tier LowCardinality(String) DEFAULT '',
    complexity_score Float32 DEFAULT 0,
    budget_max UInt32 DEFAULT 0,
    budget_utilization Float32 DEFAULT 0,
    trajectory_original UInt32 DEFAULT 0,
    trajectory_reduced UInt32 DEFAULT 0,
    latency_ms Float32 DEFAULT 0,
    waste_serialization UInt32 DEFAULT 0,
    waste_trajectory UInt32 DEFAULT 0,
    waste_coordination UInt32 DEFAULT 0,
    waste_envelope UInt32 DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (agent_id, model, timestamp)
TTL timestamp + INTERVAL 1 YEAR

4. Grafana Dashboard (infrastructure/grafana/dashboards/token-efficiency.json)

14 panels:

  1. Token Budget Burndown
  2. Waste Breakdown (pie chart, 4 categories)
  3. Model Routing Distribution
  4. Compression Ratios (histogram)
  5. Cost Per Agent (daily bar chart)
  6. Cache Hit Rates
  7. Trajectory Reduction (before/after)
  8. Token Consumption Anomalies
  9. Total Tokens (stat)
  10. Total Tokens Saved (stat)
  11. Total Cost (stat)
  12. Total Savings (stat)
  13. Agent Latency (p50/p95 timeseries)
  14. Savings by Optimization Type

5. Prometheus Alerting (deploy/prometheus/rules/token-efficiency-alerts.yaml)

8 alerts + 3 recording rules:

  • OSSABudgetThresholdBreach — >90% budget utilization
  • OSSABudgetExceeded — 100% budget exhausted
  • OSSACostSpike — 5x above daily average
  • OSSAHighTokenWaste — >10K tokens/hour wasted
  • OSSALowCompression — median ratio above 0.8
  • OSSARoutingImbalance — Opus usage above 20%
  • OSSAAgentLatencyHigh — p95 above 30s
  • OSSAPipelineCostHigh — pipeline >$10/hour
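
The actual rules live in the Prometheus rules file. To make the thresholds concrete, the following TypeScript sketch mirrors five of them as plain comparisons. The `EfficiencySnapshot` shape and the `firingAlerts` function are illustrative only, and the cost-spike, waste, and compression alerts are left out because they depend on rolling aggregates that a single snapshot cannot express.

```typescript
// Hypothetical point-in-time view of the values the alerts compare against.
interface EfficiencySnapshot {
  budgetUtilization: number;      // fraction of budget used (1.0 = exhausted)
  opusShare: number;              // fraction of requests routed to Opus
  latencyP95Seconds: number;
  pipelineCostPerHourUsd: number;
}

// Returns the names of the alerts whose thresholds the snapshot crosses.
function firingAlerts(s: EfficiencySnapshot): string[] {
  const alerts: string[] = [];
  if (s.budgetUtilization > 0.9) alerts.push("OSSABudgetThresholdBreach");
  if (s.budgetUtilization >= 1.0) alerts.push("OSSABudgetExceeded");
  if (s.opusShare > 0.2) alerts.push("OSSARoutingImbalance");
  if (s.latencyP95Seconds > 30) alerts.push("OSSAAgentLatencyHigh");
  if (s.pipelineCostPerHourUsd > 10) alerts.push("OSSAPipelineCostHigh");
  return alerts;
}
```

Note that an exhausted budget trips both budget alerts in this sketch, since 100% utilization is also above the 90% threshold.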

6. Agent SDK Telemetry (packages/@ossa/agent-sdk/src/telemetry/)

  • genai-metrics.ts — OTel GenAI semantic conventions + 18 OSSA-specific attributes
  • prometheus.ts — 12 metric definitions + MetricsCollector with Prometheus text export
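
For readers unfamiliar with the Prometheus text exposition format, this is roughly what a text export produces. The TypeScript below is a minimal sketch, not the SDK's MetricsCollector. `renderCounter` is an illustrative helper, and the metric name used in the usage note is hypothetical.

```typescript
interface Sample {
  labels: Record<string, string>;
  value: number;
}

// Escape backslash, double quote, and newline as the text format requires
// for label values.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one counter family: HELP line, TYPE line, then one line per sample.
function renderCounter(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const labels = Object.entries(sample.labels)
      .map(([key, value]) => `${key}="${escapeLabel(value)}"`)
      .join(",");
    lines.push(labels ? `${name}{${labels}} ${sample.value}` : `${name} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Calling `renderCounter("ossa_tokens_total", "Total tokens", [{ labels: { agent: "code-reviewer" }, value: 42 }])` yields a HELP line, a TYPE line, and the sample line `ossa_tokens_total{agent="code-reviewer"} 42`.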

7. Drupal Integration (ai_agents_ossa submodule)

Commit: 259b49b on release/v0.1.x

Submodule: ai_agents_ossa_token_efficiency

| File | Lines | Purpose |
|---|---|---|
| TokenEfficiencySubscriber.php | ~90 | Listens to PostGenerateResponseEvent + PostStreamingResponseEvent, extracts TokenUsageDto, async POST to agent-tracer |
| AiTokenUsageEvent.php | ~50 | ECA Event plugin: 8 tokens (provider, model, operation_type, input/output/total/cached/reasoning tokens) |
| GetTokenEfficiency.php | ~50 | #[Tool] plugin: agents query efficiency data via auto-bridge chain |

Dependencies: ai, http_client_manager, key (all contrib, all already installed)

Config: ai_agents_ossa_token_efficiency.settings — endpoint URL, API key reference, tracking flags

Waste Categories

| Category | Source | Impact |
|---|---|---|
| Serialization | JSON field names repeated across tool definitions | 40-70% of context |
| Trajectory | Stale context accumulating in multi-turn interactions | 39.9-59.7% reducible |
| Coordination | Inter-agent messages eating token budget | 29-50% reducible |
| Envelope | MCP/protocol tool definitions | 13K-27K tokens per 100 tools |
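
Since waste is recorded per span in these four categories, a breakdown such as the dashboard's pie chart reduces to summing and normalizing the counts. The TypeScript sketch below shows that calculation. `WasteCounts` and `wasteShares` are illustrative names rather than the collector's types.

```typescript
type WasteCategory = "serialization" | "trajectory" | "coordination" | "envelope";
type WasteCounts = Record<WasteCategory, number>;

const CATEGORIES: WasteCategory[] = ["serialization", "trajectory", "coordination", "envelope"];

// Sum wasted tokens per category across spans, then express each category
// as a fraction of all wasted tokens (all zeros when nothing was wasted).
function wasteShares(spans: WasteCounts[]): WasteCounts {
  const totals: WasteCounts = { serialization: 0, trajectory: 0, coordination: 0, envelope: 0 };
  for (const span of spans) {
    for (const category of CATEGORIES) {
      totals[category] += span[category];
    }
  }
  const sum = CATEGORIES.reduce((acc, category) => acc + totals[category], 0);
  const shares: WasteCounts = { serialization: 0, trajectory: 0, coordination: 0, envelope: 0 };
  if (sum === 0) return shares;
  for (const category of CATEGORIES) {
    shares[category] = totals[category] / sum;
  }
  return shares;
}
```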

Optimization Techniques

  1. Knowledge graph references — 20x savings (URN refs vs full context)
  2. Template compression — 5x reduction per template match
  3. Semantic compression — LLMLingua-style pruning
  4. Dynamic budget allocation — ML-based complexity prediction
  5. Hierarchical context loading — 60-80% reduction (essential → conditional → deep)
  6. Checkpoint-based retry — 62% failure cost reduction
  7. Cascade composition — 9x cost reduction (Haiku filters → Sonnet processes → Opus validates)
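
As an illustration of technique 5, hierarchical context loading can be sketched as a budget-constrained pass over context chunks ordered by tier. This TypeScript example is an assumption about how such loading could work, not the platform's implementation. The `ContextChunk` shape and the greedy skip-if-too-large policy are ours.

```typescript
type Tier = "essential" | "conditional" | "deep";

interface ContextChunk {
  tier: Tier;
  tokens: number;
  text: string;
}

const TIER_ORDER: Tier[] = ["essential", "conditional", "deep"];

// Load chunks tier by tier (essential first) and skip any chunk that would
// push the running total over the token budget.
function loadContext(chunks: ContextChunk[], maxTokens: number): ContextChunk[] {
  const ordered = [...chunks].sort(
    (a, b) => TIER_ORDER.indexOf(a.tier) - TIER_ORDER.indexOf(b.tier)
  );
  const loaded: ContextChunk[] = [];
  let used = 0;
  for (const chunk of ordered) {
    if (used + chunk.tokens > maxTokens) continue;
    loaded.push(chunk);
    used += chunk.tokens;
  }
  return loaded;
}
```

With a 700-token budget and chunks of 200 (essential), 400 (conditional), and 500 (deep) tokens, only the first two are loaded, which is where the reduction comes from: deep context is paid for only when the budget allows.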

Target Metrics

  • 57-70% cost reduction per pipeline execution
  • 90% manifest loading reduction via fingerprint profile
  • 40-50% serialization reduction via compact format
  • All 74 agents with token_efficiency config
  • Real-time Grafana dashboard with token tracking

| Service | Port | Role |
|---|---|---|
| agent-tracer | 3006 | Observability + token tracking |
| agent-mesh | 3005 | 54 production agents |
| agent-router | 4000 | Multi-provider LLM gateway |
| agent-brain | | Qdrant vector memory |
| agent-docker | | Vast.ai GPU orchestration |
| dragonfly | 3020 | AI-powered Drupal testing |
| compliance-engine | 3010 | SOC2/GDPR/HIPAA policy enforcement |