Token Efficiency Observability Pipeline

Status: Production, deployed 2026-02-17
Repos: platform-agents, agent-tracer, ai_agents_ossa
Spec: OSSA v0.5 Token Efficiency Primitives

Overview

The pipeline tracks token efficiency end to end across 74 OSSA agents, 247 Tool plugins, and 13 AI providers. It records token counts and cost for every AI call and attributes wasted tokens to one of four waste categories.

Architecture

Drupal AI call → PostGenerateResponseEvent
    ↓
ai_agents_ossa_token_efficiency (submodule)
    → TokenEfficiencySubscriber extracts TokenUsageDto
    → Async POST to agent-tracer /api/v1/token-efficiency/record
    ↓
agent-tracer OssaTokenEfficiencyCollector (845 lines)
    → CostCalculator (provider pricing, Claude 4.x + GPT-4o)
    → VortexEngine (35-40% data compression)
    → Waste detection (4 categories)
    → ClickHouse persistence (ossa_token_efficiency table)
    → Prometheus export (25+ metrics at /metrics/ossa)
    ↓
Grafana Dashboard (14 panels)
    → Budget burndown, waste breakdown, routing distribution
    → Cost-per-agent, anomaly detection, latency percentiles
    ↓
ECA Governance (AiTokenUsageEvent)
    → Budget alerts, cost thresholds, routing decisions
    ↓
Agent Self-Query (GetTokenEfficiency Tool)
    → Tool → FunctionCall → MCP auto-bridge

Components

1. Platform Agents (platform-agents repo)

Commit: 44598f00 on release/v0.1.x

All 74 agents have token_efficiency config in their manifests:

token_efficiency:
  serialization_profile: compact
  observation_format: projected
  budget:
    max_input_tokens: 50000
    allocation_strategy: adaptive
  routing:
    cascade: [haiku, sonnet, opus]
    complexity_threshold: [0.3, 0.7]
  consolidation:
    strategy: moderate
    retain: [final_answer]
    drop: [raw_observations]
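
One plausible reading of the routing block, sketched in TypeScript: each entry in complexity_threshold is the upper bound for the tier at the same position in cascade, and any score at or above the last threshold goes to the final tier. The names selectTier and Routing are hypothetical, and the boundary behaviour (strictly less than) is an assumption rather than something the manifest above specifies.

```typescript
// Hypothetical interpretation of `routing.cascade` and
// `routing.complexity_threshold`; not taken from the OSSA spec.
type Routing = { cascade: string[]; complexityThreshold: number[] };

function selectTier(score: number, routing: Routing): string {
  const { cascade, complexityThreshold } = routing;
  if (cascade.length !== complexityThreshold.length + 1) {
    throw new Error("cascade needs exactly one more entry than complexity_threshold");
  }
  // The first threshold the score falls below picks the tier at that index.
  const index = complexityThreshold.findIndex((limit) => score < limit);
  return index === -1 ? cascade[cascade.length - 1] : cascade[index];
}

const routing: Routing = {
  cascade: ["haiku", "sonnet", "opus"],
  complexityThreshold: [0.3, 0.7],
};

const tiers = [0.1, 0.5, 0.9].map((score) => selectTier(score, routing));
// tiers is ["haiku", "sonnet", "opus"]
```

Under this reading a score of exactly 0.3 routes to sonnet and exactly 0.7 routes to opus.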

3 agents have deep optimization:

security-scanner: 10x reduction (100K → 10K tokens)
code-reviewer: 5.3x reduction (80K → 15K tokens, with a 60/35/5 Haiku/Sonnet/Opus routing split)
pipeline-remediation: 7.5x reduction (60K → 8K, zero-copy knowledge refs)

Production workflow: security-scanner/workflow.yaml — 7.3x total (240K → 33K, 86% cost savings)
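
The reduction factors above follow from simple arithmetic, reproduced here as a check. The reduction helper is a hypothetical name. The percentage is a token reduction; it equals a cost saving only under the assumption that cost scales linearly with tokens at a single price, which the source does not state.

```typescript
interface Reduction {
  factor: number;       // before / after
  savedPercent: number; // share of the original tokens removed
}

function reduction(beforeTokens: number, afterTokens: number): Reduction {
  if (beforeTokens <= 0 || afterTokens <= 0) {
    throw new Error("token counts must be positive");
  }
  return {
    factor: beforeTokens / afterTokens,
    savedPercent: (1 - afterTokens / beforeTokens) * 100,
  };
}

const securityScanner = reduction(100_000, 10_000);    // factor 10
const codeReviewer = reduction(80_000, 15_000);        // factor ~5.33
const pipelineRemediation = reduction(60_000, 8_000);  // factor 7.5

// The workflow figures (240K and 33K) equal the sums of the three agents above.
const workflow = reduction(100_000 + 80_000 + 60_000, 10_000 + 15_000 + 8_000);
// workflow.factor.toFixed(1) is "7.3"; workflow.savedPercent is 86.25
```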

2. Token Optimizer (platform-agents/packages/@ossa/token-optimizer)

TypeScript implementation with 4 modules:

Module	Responsibilities
`agent.ts`	Token counting, cost estimation, budget enforcement, optimization pipeline
`compression.ts`	TOON compact serialization, 3-profile manifests (full/compact/fingerprint), template compression
`routing.ts`	Complexity classifier (10 weighted features), Haiku/Sonnet/Opus cascade routing
`trajectory.ts`	AgentDiet-style observation masking, step consolidation, state expiration
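
To illustrate what observation masking means in practice, the sketch below keeps the most recent raw observations verbatim and replaces older ones with a short placeholder. It is a simplified stand-in, not the AgentDiet algorithm and not the contents of trajectory.ts; the Step type, the step kinds, and maskOldObservations are all hypothetical.

```typescript
type StepKind = "thought" | "tool_call" | "raw_observation" | "final_answer";

interface Step {
  kind: StepKind;
  content: string;
}

const MASK = "[observation omitted]";

// Walk the trajectory from newest to oldest so the newest observations survive.
function maskOldObservations(steps: Step[], keepRecent: number): Step[] {
  let observationsSeen = 0;
  const reversed: Step[] = [];
  for (let i = steps.length - 1; i >= 0; i--) {
    const step = steps[i];
    if (step.kind !== "raw_observation") {
      reversed.push(step);
      continue;
    }
    observationsSeen += 1;
    reversed.push(
      observationsSeen > keepRecent ? { kind: step.kind, content: MASK } : step,
    );
  }
  return reversed.reverse();
}

const trajectory: Step[] = [
  { kind: "tool_call", content: "grep -r TODO src/" },
  { kind: "raw_observation", content: "src/a.ts: TODO refactor (plus 400 more lines)" },
  { kind: "tool_call", content: "cat src/a.ts" },
  { kind: "raw_observation", content: "export function a() { /* ... */ }" },
  { kind: "final_answer", content: "One TODO remains in src/a.ts." },
];

const masked = maskOldObservations(trajectory, 1);
// masked[1].content is "[observation omitted]"; masked[3] and masked[4] are unchanged.
```

The input array is left untouched, so the full trajectory can still be persisted for debugging while only the masked copy is sent to the model.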

3. Agent-Tracer (agent-tracer repo)

Commit: 7fd8745 on release/v0.1.x

OssaTokenEfficiencyCollector

845-line production collector integrating:

CostCalculator — provider-aware pricing (Claude 4.x, GPT-4o, Google, Azure, Ollama)
VortexEngine — runtime data compression (35-40%)
ClickHouse — ossa_token_efficiency table with MergeTree, monthly partitioning, 1-year TTL, batch buffering (10s flush)
Prometheus — 25+ metrics exported at /metrics/ossa
Waste Detection — 4 categories per span: serialization, trajectory, coordination, envelope
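
The batch buffering mentioned for ClickHouse (10s flush) can be pictured roughly as follows. This is a minimal sketch, not the collector's code: BatchBuffer, its sink callback, and the choice to check the interval on each add (rather than with a timer) are assumptions made to keep the example deterministic.

```typescript
class BatchBuffer<Row> {
  private rows: Row[] = [];
  private lastFlushMs: number;
  private readonly flushIntervalMs: number;
  private readonly sink: (batch: Row[]) => void;

  constructor(flushIntervalMs: number, sink: (batch: Row[]) => void, nowMs: number) {
    this.flushIntervalMs = flushIntervalMs;
    this.sink = sink;
    this.lastFlushMs = nowMs;
  }

  // Buffer the row; flush everything once the interval has elapsed.
  add(row: Row, nowMs: number): void {
    this.rows.push(row);
    if (nowMs - this.lastFlushMs >= this.flushIntervalMs) {
      this.flush(nowMs);
    }
  }

  // Hand the buffered rows to the sink as one batch and start a new interval.
  flush(nowMs: number): void {
    this.lastFlushMs = nowMs;
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = [];
    this.sink(batch);
  }
}

const batches: number[][] = [];
const buffer = new BatchBuffer<number>(10_000, (batch) => { batches.push(batch); }, 0);
buffer.add(1, 1_000);
buffer.add(2, 5_000);
buffer.add(3, 10_000); // 10 s since the last flush: emits [1, 2, 3]
buffer.add(4, 12_000); // stays buffered until the next interval or an explicit flush
```

Writing one batch per interval instead of one insert per span suits MergeTree tables, which handle fewer, larger inserts better than many small ones.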

Dual-Namespace Attribute Support

Reads both OTel GenAI standard and legacy attributes:

OTel Standard	Legacy	Meaning
`gen_ai.system`	`llm.provider`	Provider
`gen_ai.request.model`	`llm.model`	Model
`gen_ai.usage.input_tokens`	`llm.token_count.prompt`	Input tokens
`gen_ai.usage.output_tokens`	`llm.token_count.completion`	Output tokens
`ossa.agent.name`	`agent.name`	Agent
`ossa.compression.ratio`	—	Compression
`ossa.routing.tier`	—	Routing decision
`ossa.token_budget.utilization`	—	Budget usage
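
A dual-namespace lookup over the table above could look like the sketch below. The attribute keys come from the table; the field names, the readAttribute helper, and the rule that the OTel key wins when both are present are assumptions, since the source only says both namespaces are read.

```typescript
type Attributes = Record<string, string | number | undefined>;

interface AttributeAlias {
  otel: string;
  legacy?: string; // absent for OSSA-only attributes
}

const ALIASES: Record<string, AttributeAlias> = {
  provider: { otel: "gen_ai.system", legacy: "llm.provider" },
  model: { otel: "gen_ai.request.model", legacy: "llm.model" },
  inputTokens: { otel: "gen_ai.usage.input_tokens", legacy: "llm.token_count.prompt" },
  outputTokens: { otel: "gen_ai.usage.output_tokens", legacy: "llm.token_count.completion" },
  agent: { otel: "ossa.agent.name", legacy: "agent.name" },
  compressionRatio: { otel: "ossa.compression.ratio" },
  routingTier: { otel: "ossa.routing.tier" },
  budgetUtilization: { otel: "ossa.token_budget.utilization" },
};

// Prefer the OTel GenAI key; fall back to the legacy key if one exists.
function readAttribute(attrs: Attributes, field: string): string | number | undefined {
  const alias = ALIASES[field];
  if (alias === undefined) return undefined;
  const standard = attrs[alias.otel];
  if (standard !== undefined) return standard;
  return alias.legacy === undefined ? undefined : attrs[alias.legacy];
}

const legacySpan: Attributes = { "llm.provider": "anthropic", "llm.token_count.prompt": 1200 };
const mixedSpan: Attributes = { "gen_ai.system": "openai", "llm.provider": "anthropic" };

const legacyProvider = readAttribute(legacySpan, "provider"); // "anthropic"
const mixedProvider = readAttribute(mixedSpan, "provider");   // "openai"
```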

REST API Endpoints

Endpoint	Method	Purpose
`/metrics/ossa`	GET	Prometheus scrape
`/api/v1/token-efficiency`	GET	Full JSON snapshot
`/api/v1/token-efficiency/agents/:id`	GET	Per-agent breakdown
`/api/v1/token-efficiency/pipelines`	GET	Pipeline cost tracking
`/api/v1/token-efficiency/waste`	GET	4-category waste analysis
`/api/v1/token-efficiency/routing`	GET	Cascade effectiveness
`/api/v1/token-efficiency/vortex`	GET	VORTEX engine status
`/api/v1/token-efficiency/record`	POST	External agent reporting
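
For external agents reporting through the record endpoint, a request could be assembled along these lines. The source does not document the request body, so the field names here are borrowed from the ClickHouse columns and are hypothetical, as are buildRecordRequest and the localhost base URL (port 3006 is the agent-tracer port listed for the service). Authentication is omitted because its scheme is not described.

```typescript
interface TokenUsageRecord {
  agent_id: string;
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

interface RecordRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const RECORD_PATH = "/api/v1/token-efficiency/record";

function buildRecordRequest(baseUrl: string, record: TokenUsageRecord): RecordRequest {
  for (const count of [record.input_tokens, record.output_tokens]) {
    if (!Number.isInteger(count) || count < 0) {
      throw new Error("token counts must be non-negative integers");
    }
  }
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + RECORD_PATH,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      ...record,
      total_tokens: record.input_tokens + record.output_tokens,
    }),
  };
}

const request = buildRecordRequest("http://localhost:3006/", {
  agent_id: "security-scanner",
  provider: "anthropic",
  model: "example-model", // placeholder model name
  input_tokens: 1200,
  output_tokens: 300,
});
// request.url is "http://localhost:3006/api/v1/token-efficiency/record"
```

The returned object can be handed to any HTTP client; building it separately from sending keeps validation testable without a running agent-tracer.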

ClickHouse Schema

CREATE TABLE IF NOT EXISTS ossa_token_efficiency (
  timestamp DateTime64(3),
  agent_id String,
  agent_domain LowCardinality(String),
  agent_tier LowCardinality(String),
  pipeline_id String DEFAULT '',
  model LowCardinality(String),
  provider LowCardinality(String),
  input_tokens UInt32,
  output_tokens UInt32,
  total_tokens UInt32,
  cost_usd Float64,
  savings_usd Float64,
  tokens_saved UInt32,
  compression_ratio Float32 DEFAULT 0,
  routing_tier LowCardinality(String) DEFAULT '',
  complexity_score Float32 DEFAULT 0,
  budget_max UInt32 DEFAULT 0,
  budget_utilization Float32 DEFAULT 0,
  trajectory_original UInt32 DEFAULT 0,
  trajectory_reduced UInt32 DEFAULT 0,
  latency_ms Float32 DEFAULT 0,
  waste_serialization UInt32 DEFAULT 0,
  waste_trajectory UInt32 DEFAULT 0,
  waste_coordination UInt32 DEFAULT 0,
  waste_envelope UInt32 DEFAULT 0
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (agent_id, model, timestamp)
TTL timestamp + INTERVAL 1 YEAR
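
As an example of reading this table, the sketch below builds a daily cost-per-agent query of the kind a dashboard panel might run. The column and table names come from the schema above; dailyCostPerAgentQuery and the result aliases are hypothetical, and the actual Grafana panel queries are not shown in the source.

```typescript
// Builds a ClickHouse query string. `days` is validated as a positive integer
// before interpolation, so no untrusted text reaches the SQL.
function dailyCostPerAgentQuery(days: number): string {
  if (!Number.isInteger(days) || days < 1) {
    throw new Error("days must be a positive integer");
  }
  return [
    "SELECT",
    "  toDate(timestamp) AS day,",
    "  agent_id,",
    "  sum(cost_usd) AS total_cost_usd,",
    "  sum(tokens_saved) AS total_tokens_saved",
    "FROM ossa_token_efficiency",
    `WHERE timestamp >= now() - INTERVAL ${days} DAY`,
    "GROUP BY day, agent_id",
    "ORDER BY day, agent_id",
  ].join("\n");
}

const lastWeek = dailyCostPerAgentQuery(7);
```

Because the table is ordered by (agent_id, model, timestamp), adding an agent_id filter to the WHERE clause would let ClickHouse use the primary key for per-agent lookups.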

4. Grafana Dashboard (infrastructure/grafana/dashboards/token-efficiency.json)

14 panels:

Token Budget Burndown
Waste Breakdown (pie chart, 4 categories)
Model Routing Distribution
Compression Ratios (histogram)
Cost Per Agent (daily bar chart)
Cache Hit Rates
Trajectory Reduction (before/after)
Token Consumption Anomalies
Total Tokens (stat)
Total Tokens Saved (stat)
Total Cost (stat)
Total Savings (stat)
Agent Latency (p50/p95 timeseries)
Savings by Optimization Type

5. Prometheus Alerting (deploy/prometheus/rules/token-efficiency-alerts.yaml)

8 alerts + 3 recording rules:

OSSABudgetThresholdBreach — >90% budget utilization
OSSABudgetExceeded — 100% budget exhausted
OSSACostSpike — 5x above daily average
OSSAHighTokenWaste — >10K tokens/hour wasted
OSSALowCompression — median ratio above 0.8
OSSARoutingImbalance — Opus usage above 20%
OSSAAgentLatencyHigh — p95 above 30s
OSSAPipelineCostHigh — pipeline >$10/hour
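
The thresholds above can be mirrored in plain code to make their semantics concrete. The real rules are Prometheus expressions that are not reproduced in this document; the AgentWindow shape, the firingAlerts helper, and the decision that OSSABudgetExceeded replaces OSSABudgetThresholdBreach at 100% are assumptions for illustration, and the sample numbers are made up.

```typescript
interface AgentWindow {
  budgetUtilization: number;   // fraction of the token budget used; 1.0 means 100%
  wastedTokensPerHour: number;
  p95LatencySeconds: number;
  costLastDayUsd: number;
  averageDailyCostUsd: number;
}

function firingAlerts(w: AgentWindow): string[] {
  const alerts: string[] = [];
  if (w.budgetUtilization >= 1.0) {
    alerts.push("OSSABudgetExceeded");
  } else if (w.budgetUtilization > 0.9) {
    alerts.push("OSSABudgetThresholdBreach");
  }
  if (w.averageDailyCostUsd > 0 && w.costLastDayUsd > 5 * w.averageDailyCostUsd) {
    alerts.push("OSSACostSpike");
  }
  if (w.wastedTokensPerHour > 10_000) alerts.push("OSSAHighTokenWaste");
  if (w.p95LatencySeconds > 30) alerts.push("OSSAAgentLatencyHigh");
  return alerts;
}

// Illustrative values only.
const alerts = firingAlerts({
  budgetUtilization: 0.95,
  wastedTokensPerHour: 12_000,
  p95LatencySeconds: 12,
  costLastDayUsd: 4,
  averageDailyCostUsd: 1,
});
// alerts is ["OSSABudgetThresholdBreach", "OSSAHighTokenWaste"]
```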

6. Agent SDK Telemetry (packages/@ossa/agent-sdk/src/telemetry/)

genai-metrics.ts — OTel GenAI semantic conventions + 18 OSSA-specific attributes
prometheus.ts — 12 metric definitions + MetricsCollector with Prometheus text export

7. Drupal Integration (ai_agents_ossa submodule)

Commit: 259b49b on release/v0.1.x

Submodule: ai_agents_ossa_token_efficiency

File	Lines	Purpose
`TokenEfficiencySubscriber.php`	~90	Listens to PostGenerateResponseEvent + PostStreamingResponseEvent, extracts TokenUsageDto, async POST to agent-tracer
`AiTokenUsageEvent.php`	~50	ECA Event plugin exposing 8 ECA tokens (provider, model, operation_type, and input, output, total, cached, and reasoning token counts)
`GetTokenEfficiency.php`	~50	#[Tool] plugin — agents query efficiency data via auto-bridge chain

Dependencies: ai, http_client_manager, key (all contrib, all already installed)

Config: ai_agents_ossa_token_efficiency.settings — endpoint URL, API key reference, tracking flags

Waste Categories

Category	Source	Impact
Serialization	JSON field names repeated across tool definitions	40-70% of context
Trajectory	Stale context accumulating in multi-turn interactions	39.9-59.7% reducible
Coordination	Inter-agent messages eating token budget	29-50% reducible
Envelope	MCP/protocol tool definitions	13K-27K tokens per 100 tools
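
Given per-category waste counts such as the four waste_* columns recorded per span, the share of each category and the dominant one can be derived as sketched below. The WasteTokens type and both helpers are hypothetical, and the sample counts are illustrative rather than measured.

```typescript
interface WasteTokens {
  serialization: number;
  trajectory: number;
  coordination: number;
  envelope: number;
}

type WasteCategory = keyof WasteTokens;

const CATEGORIES: WasteCategory[] = ["serialization", "trajectory", "coordination", "envelope"];

// Fraction of total wasted tokens per category (all zeros when nothing was wasted).
function wasteShares(waste: WasteTokens): WasteTokens {
  const total = CATEGORIES.reduce((sum, c) => sum + waste[c], 0);
  const shares: WasteTokens = { serialization: 0, trajectory: 0, coordination: 0, envelope: 0 };
  if (total === 0) return shares;
  for (const c of CATEGORIES) {
    shares[c] = waste[c] / total;
  }
  return shares;
}

// Category with the most wasted tokens; ties resolve to the earlier category.
function dominantCategory(waste: WasteTokens): WasteCategory {
  return CATEGORIES.reduce((best, c) => (waste[c] > waste[best] ? c : best), CATEGORIES[0]);
}

const sample: WasteTokens = { serialization: 400, trajectory: 300, coordination: 200, envelope: 100 };
const shares = wasteShares(sample);
const dominant = dominantCategory(sample); // "serialization"
```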

Optimization Techniques

Knowledge graph references — 20x savings (URN refs vs full context)
Template compression — 5x reduction per template match
Semantic compression — LLMLingua-style pruning
Dynamic budget allocation — ML-based complexity prediction
Hierarchical context loading — 60-80% reduction (essential → conditional → deep)
Checkpoint-based retry — 62% failure cost reduction
Cascade composition — 9x cost reduction (Haiku filters → Sonnet processes → Opus validates)
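
Hierarchical context loading (essential → conditional → deep) can be sketched as a budgeted fill: higher-priority levels are loaded first, and a chunk that does not fit in the remaining budget is skipped. This is one simple policy under stated assumptions, not the platform's implementation; ContextChunk, loadWithinBudget, and the sample chunk sizes are hypothetical.

```typescript
type Level = "essential" | "conditional" | "deep";

interface ContextChunk {
  id: string;
  level: Level;
  tokens: number;
}

const LEVEL_ORDER: Level[] = ["essential", "conditional", "deep"];

// Load chunks level by level, skipping any chunk larger than the remaining budget.
function loadWithinBudget(chunks: ContextChunk[], budgetTokens: number): ContextChunk[] {
  const loaded: ContextChunk[] = [];
  let remaining = budgetTokens;
  for (const level of LEVEL_ORDER) {
    for (const chunk of chunks) {
      if (chunk.level !== level || chunk.tokens > remaining) continue;
      loaded.push(chunk);
      remaining -= chunk.tokens;
    }
  }
  return loaded;
}

const chunks: ContextChunk[] = [
  { id: "history", level: "deep", tokens: 4000 },
  { id: "task", level: "essential", tokens: 500 },
  { id: "notes", level: "deep", tokens: 300 },
  { id: "schema", level: "conditional", tokens: 1500 },
];

const loadedIds = loadWithinBudget(chunks, 2500).map((c) => c.id);
// loadedIds is ["task", "schema", "notes"]; "history" does not fit.
```

A production policy would likely also decide whether a conditional chunk is relevant to the current task before spending budget on it; that check is left out here.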

Target Metrics

57-70% cost reduction per pipeline execution
90% manifest loading reduction via fingerprint profile
40-50% serialization reduction via compact format
All 74 agents with token_efficiency config
Real-time Grafana dashboard with token tracking

Related Services

Service	Port	Role
agent-tracer	3006	Observability + token tracking
agent-mesh	3005	54 production agents
agent-router	4000	Multi-provider LLM gateway
agent-brain	—	Qdrant vector memory
agent-docker	—	Vast.ai GPU orchestration
dragonfly	3020	AI-powered Drupal testing
compliance-engine	3010	SOC2/GDPR/HIPAA policy enforcement