Token Efficiency Observability Pipeline
Status: Production (deployed 2026-02-17)
Repos: platform-agents, agent-tracer, ai_agents_ossa
Spec: OSSA v0.5 Token Efficiency Primitives
Overview
Full-stack token efficiency tracking across 74 OSSA agents, 247 Tool plugins, and 13 AI providers. Every token tracked, every dollar accounted for, every waste category visible.
Architecture
Drupal AI call → PostGenerateResponseEvent
↓
ai_agents_ossa_token_efficiency (submodule)
→ TokenEfficiencySubscriber extracts TokenUsageDto
→ Async POST to agent-tracer /api/v1/token-efficiency/record
↓
agent-tracer OssaTokenEfficiencyCollector (845 lines)
→ CostCalculator (provider pricing, Claude 4.x + GPT-4o)
→ VortexEngine (35-40% data compression)
→ Waste detection (4 categories)
→ ClickHouse persistence (ossa_token_efficiency table)
→ Prometheus export (25+ metrics at /metrics/ossa)
↓
Grafana Dashboard (14 panels)
→ Budget burndown, waste breakdown, routing distribution
→ Cost-per-agent, anomaly detection, latency percentiles
↓
ECA Governance (AiTokenUsageEvent)
→ Budget alerts, cost thresholds, routing decisions
↓
Agent Self-Query (GetTokenEfficiency Tool)
→ Tool → FunctionCall → MCP auto-bridge
Components
1. Platform Agents (platform-agents repo)
Commit: 44598f00 on release/v0.1.x
All 74 agents have token_efficiency config in their manifests:
    token_efficiency:
      serialization_profile: compact
      observation_format: projected
      budget:
        max_input_tokens: 50000
        allocation_strategy: adaptive
      routing:
        cascade: [haiku, sonnet, opus]
        complexity_threshold: [0.3, 0.7]
      consolidation:
        strategy: moderate
        retain: [final_answer]
        drop: [raw_observations]
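One plausible reading of the routing block is that a complexity score between 0 and 1 is compared against the thresholds in order, and the first threshold the score falls below selects the tier. The sketch below illustrates that reading only; the function name pickTier and the score semantics are assumptions, not the documented behavior of the router.

```typescript
// Hypothetical sketch: map a complexity score onto the cascade from the manifest.
// Assumes thresholds are ascending upper bounds for every tier except the last.
interface RoutingConfig {
  cascade: string[];
  complexity_threshold: number[];
}

function pickTier(score: number, routing: RoutingConfig): string {
  const { cascade, complexity_threshold: thresholds } = routing;
  if (cascade.length !== thresholds.length + 1) {
    throw new Error("cascade needs exactly one more entry than complexity_threshold");
  }
  // The first threshold the score falls below decides the tier.
  const index = thresholds.findIndex((limit) => score < limit);
  return index === -1 ? cascade[cascade.length - 1] : cascade[index];
}

const routing: RoutingConfig = {
  cascade: ["haiku", "sonnet", "opus"],
  complexity_threshold: [0.3, 0.7],
};

console.log(pickTier(0.1, routing), pickTier(0.5, routing), pickTier(0.9, routing));
```

Under this reading, a score exactly on a threshold goes to the more capable tier.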
3 agents have deep optimization:
- security-scanner: 10x reduction (100K → 10K tokens)
- code-reviewer: 5.3x reduction (80K → 15K, 60/35/5 Haiku/Sonnet/Opus)
- pipeline-remediation: 7.5x reduction (60K → 8K, zero-copy knowledge refs)
Production workflow: security-scanner/workflow.yaml — 7.3x total (240K → 33K, 86% cost savings)
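The reduction factors above follow directly from the before/after token counts. The sketch below reproduces the arithmetic, assuming cost scales linearly with token count; real savings also depend on the model mix, so treat the percentage as an approximation.

```typescript
// Reduction factor and fractional savings from before/after token counts.
function reductionFactor(before: number, after: number): number {
  return before / after;
}

function savingsFraction(before: number, after: number): number {
  return 1 - after / before;
}

// Workflow totals quoted above: 240K tokens before, 33K after.
const factor = reductionFactor(240_000, 33_000);
const savings = savingsFraction(240_000, 33_000);
console.log(factor.toFixed(1), (savings * 100).toFixed(0));
```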
2. Token Optimizer (platform-agents/packages/@ossa/token-optimizer)
TypeScript implementation with 4 modules:
| Module | Responsibilities |
|---|---|
| agent.ts | Token counting, cost estimation, budget enforcement, optimization pipeline |
| compression.ts | TOON compact serialization, 3-profile manifests (full/compact/fingerprint), template compression |
| routing.ts | Complexity classifier (10 weighted features), Haiku/Sonnet/Opus cascade routing |
| trajectory.ts | AgentDiet-style observation masking, step consolidation, state expiration |
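Observation masking, as listed for trajectory.ts, can be pictured as keeping the most recent tool observations verbatim and replacing older ones with a short placeholder. The sketch below is a minimal illustration under that assumption; the Step type and maskOldObservations are hypothetical names and not the @ossa/token-optimizer API.

```typescript
// Hypothetical sketch of observation masking; not the @ossa/token-optimizer API.
interface Step {
  kind: "thought" | "action" | "observation" | "final_answer";
  content: string;
}

// Keep the last `keepRecent` observations intact and replace older ones
// with a short placeholder so the step order is preserved.
function maskOldObservations(steps: Step[], keepRecent: number): Step[] {
  const observationIndexes = steps
    .map((step, index) => (step.kind === "observation" ? index : -1))
    .filter((index) => index !== -1);
  const cutoff = Math.max(observationIndexes.length - keepRecent, 0);
  const masked = new Set(observationIndexes.slice(0, cutoff));
  return steps.map((step, index) =>
    masked.has(index)
      ? { kind: step.kind, content: `[observation omitted, ${step.content.length} chars]` }
      : step
  );
}
```

Non-observation steps pass through unchanged, so thoughts, actions, and the final answer survive in this sketch.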
3. Agent-Tracer (agent-tracer repo)
Commit: 7fd8745 on release/v0.1.x
OssaTokenEfficiencyCollector
845-line production collector integrating:
- CostCalculator: provider-aware pricing (Claude 4.x, GPT-4o, Google, Azure, Ollama)
- VortexEngine: runtime data compression (35-40%)
- ClickHouse: ossa_token_efficiency table with MergeTree, monthly partitioning, 1-year TTL, batch buffering (10s flush)
- Prometheus: 25+ metrics exported at /metrics/ossa
- Waste Detection: 4 categories per span (serialization, trajectory, coordination, envelope)
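Provider-aware pricing can be thought of as a lookup keyed by provider and model, applied separately to input and output tokens. The sketch below assumes that shape; the table name, key format, and prices are placeholders for illustration and do not reflect real provider rates or the CostCalculator's actual interface.

```typescript
// Hypothetical sketch of provider-aware cost calculation.
// Prices are placeholders in USD per million tokens, not real provider rates.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PLACEHOLDER_PRICES: Record<string, Price> = {
  "example-provider/small": { inputPerMTok: 1, outputPerMTok: 5 },
  "example-provider/large": { inputPerMTok: 10, outputPerMTok: 50 },
  "local/any": { inputPerMTok: 0, outputPerMTok: 0 }, // self-hosted models modeled as free
};

function costUsd(provider: string, model: string, inputTokens: number, outputTokens: number): number {
  const price = PLACEHOLDER_PRICES[`${provider}/${model}`];
  if (!price) throw new Error(`no pricing for ${provider}/${model}`);
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}
```

Failing loudly on an unknown model is a design choice in this sketch; silently charging zero would hide gaps in the pricing table.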
Dual-Namespace Attribute Support
Reads both OTel GenAI standard and legacy attributes:
| OTel Standard | Legacy | Meaning |
|---|---|---|
| gen_ai.system | llm.provider | Provider |
| gen_ai.request.model | llm.model | Model |
| gen_ai.usage.input_tokens | llm.token_count.prompt | Input tokens |
| gen_ai.usage.output_tokens | llm.token_count.completion | Output tokens |
| ossa.agent.name | agent.name | Agent |
| ossa.compression.ratio | (none) | Compression |
| ossa.routing.tier | (none) | Routing decision |
| ossa.token_budget.utilization | (none) | Budget usage |
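Reading both namespaces amounts to a lookup with a fallback. The sketch below assumes the OTel GenAI attribute takes precedence when both are present; that precedence, and the helper name readAttribute, are assumptions rather than documented collector behavior.

```typescript
// Hypothetical sketch: prefer the OTel GenAI attribute, fall back to the legacy one.
type Attributes = Record<string, string | number | undefined>;

// Standard-to-legacy pairs taken from the table above.
const LEGACY_ALIASES: Record<string, string> = {
  "gen_ai.system": "llm.provider",
  "gen_ai.request.model": "llm.model",
  "gen_ai.usage.input_tokens": "llm.token_count.prompt",
  "gen_ai.usage.output_tokens": "llm.token_count.completion",
  "ossa.agent.name": "agent.name",
};

function readAttribute(attrs: Attributes, standardKey: string): string | number | undefined {
  const value = attrs[standardKey];
  if (value !== undefined) return value;
  const legacyKey = LEGACY_ALIASES[standardKey];
  return legacyKey === undefined ? undefined : attrs[legacyKey];
}
```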
REST API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /metrics/ossa | GET | Prometheus scrape |
| /api/v1/token-efficiency | GET | Full JSON snapshot |
| /api/v1/token-efficiency/agents/:id | GET | Per-agent breakdown |
| /api/v1/token-efficiency/pipelines | GET | Pipeline cost tracking |
| /api/v1/token-efficiency/waste | GET | 4-category waste analysis |
| /api/v1/token-efficiency/routing | GET | Cascade effectiveness |
| /api/v1/token-efficiency/vortex | GET | VORTEX engine status |
| /api/v1/token-efficiency/record | POST | External agent reporting |
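For external agents, reporting comes down to a JSON POST to the record endpoint. The request body schema is not documented here, so the sketch below only shows how such a request might be assembled; the field names are assumptions loosely modeled on the ossa_token_efficiency columns, and buildRecordRequest is a hypothetical helper.

```typescript
// Hypothetical sketch: field names are assumptions loosely modeled on the
// ossa_token_efficiency columns; the real request schema may differ.
interface TokenRecord {
  agent_id: string;
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRecordRequest(baseUrl: string, record: TokenRecord): PreparedRequest {
  const payload = {
    ...record,
    total_tokens: record.input_tokens + record.output_tokens,
    timestamp: new Date().toISOString(),
  };
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/token-efficiency/record`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The prepared request can be handed to any HTTP client. Authentication is left out because the expected header is not specified in this document.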
ClickHouse Schema
    CREATE TABLE IF NOT EXISTS ossa_token_efficiency (
        timestamp DateTime64(3),
        agent_id String,
        agent_domain LowCardinality(String),
        agent_tier LowCardinality(String),
        pipeline_id String DEFAULT '',
        model LowCardinality(String),
        provider LowCardinality(String),
        input_tokens UInt32,
        output_tokens UInt32,
        total_tokens UInt32,
        cost_usd Float64,
        savings_usd Float64,
        tokens_saved UInt32,
        compression_ratio Float32 DEFAULT 0,
        routing_tier LowCardinality(String) DEFAULT '',
        complexity_score Float32 DEFAULT 0,
        budget_max UInt32 DEFAULT 0,
        budget_utilization Float32 DEFAULT 0,
        trajectory_original UInt32 DEFAULT 0,
        trajectory_reduced UInt32 DEFAULT 0,
        latency_ms Float32 DEFAULT 0,
        waste_serialization UInt32 DEFAULT 0,
        waste_trajectory UInt32 DEFAULT 0,
        waste_coordination UInt32 DEFAULT 0,
        waste_envelope UInt32 DEFAULT 0
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(timestamp)
    ORDER BY (agent_id, model, timestamp)
    TTL timestamp + INTERVAL 1 YEAR
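The collector description mentions batch buffering with a 10s flush in front of this table. A minimal sketch of that pattern follows, assuming rows are flushed when either a row limit or a maximum age is reached. The BatchBuffer class is hypothetical, and it checks age only when a row is added; a production collector would more likely also use a timer.

```typescript
// Hypothetical sketch of size- and time-based batching; not the collector's code.
class BatchBuffer<Row> {
  private rows: Row[] = [];
  private lastFlushMs: number;
  private readonly flush: (batch: Row[]) => void;
  private readonly maxRows: number;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(
    flush: (batch: Row[]) => void,
    maxRows: number,
    maxAgeMs: number,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.flush = flush;
    this.maxRows = maxRows;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    this.lastFlushMs = now();
  }

  add(row: Row): void {
    this.rows.push(row);
    const age = this.now() - this.lastFlushMs;
    if (this.rows.length >= this.maxRows || age >= this.maxAgeMs) this.flushNow();
  }

  flushNow(): void {
    this.lastFlushMs = this.now();
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = [];
    this.flush(batch); // e.g. one bulk INSERT per batch
  }
}
```

Batching matters here because MergeTree tables generally perform better with fewer, larger inserts than with many single-row inserts.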
4. Grafana Dashboard (infrastructure/grafana/dashboards/token-efficiency.json)
14 panels:
- Token Budget Burndown
- Waste Breakdown (pie chart, 4 categories)
- Model Routing Distribution
- Compression Ratios (histogram)
- Cost Per Agent (daily bar chart)
- Cache Hit Rates
- Trajectory Reduction (before/after)
- Token Consumption Anomalies
- Total Tokens (stat)
- Total Tokens Saved (stat)
- Total Cost (stat)
- Total Savings (stat)
- Agent Latency (p50/p95 timeseries)
- Savings by Optimization Type
5. Prometheus Alerting (deploy/prometheus/rules/token-efficiency-alerts.yaml)
8 alerts + 3 recording rules:
- OSSABudgetThresholdBreach: >90% budget utilization
- OSSABudgetExceeded: 100% budget exhausted
- OSSACostSpike: 5x above daily average
- OSSAHighTokenWaste: >10K tokens/hour wasted
- OSSALowCompression: median ratio above 0.8
- OSSARoutingImbalance: Opus usage above 20%
- OSSAAgentLatencyHigh: p95 above 30s
- OSSAPipelineCostHigh: pipeline >$10/hour
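The real rules are Prometheus expressions, but several of the thresholds can be restated as plain predicates, which is handy for sanity-checking them against sample data. The sketch below does that for six of the eight alerts. The snapshot field names are assumptions, and the cost spike is interpreted as the last hour's cost compared with the average hourly cost over the past day.

```typescript
// Hypothetical sketch: alert thresholds restated as predicates over a snapshot.
// Field names are assumptions; the deployed rules are Prometheus expressions.
interface EfficiencySnapshot {
  budgetUtilization: number;      // fraction of budget used, 0..1
  costUsdLastHour: number;
  averageHourlyCostUsd: number;   // averaged over the past day
  wastedTokensLastHour: number;
  opusShare: number;              // fraction of requests routed to Opus
  p95LatencySeconds: number;
}

function firingAlerts(s: EfficiencySnapshot): string[] {
  const checks: Array<[string, boolean]> = [
    ["OSSABudgetThresholdBreach", s.budgetUtilization > 0.9],
    ["OSSABudgetExceeded", s.budgetUtilization >= 1],
    ["OSSACostSpike", s.costUsdLastHour > 5 * s.averageHourlyCostUsd],
    ["OSSAHighTokenWaste", s.wastedTokensLastHour > 10_000],
    ["OSSARoutingImbalance", s.opusShare > 0.2],
    ["OSSAAgentLatencyHigh", s.p95LatencySeconds > 30],
  ];
  return checks.filter(([, firing]) => firing).map(([name]) => name);
}
```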
6. Agent SDK Telemetry (packages/@ossa/agent-sdk/src/telemetry/)
- genai-metrics.ts: OTel GenAI semantic conventions + 18 OSSA-specific attributes
- prometheus.ts: 12 metric definitions + MetricsCollector with Prometheus text export
7. Drupal Integration (ai_agents_ossa submodule)
Commit: 259b49b on release/v0.1.x
Submodule: ai_agents_ossa_token_efficiency
| File | Lines | Purpose |
|---|---|---|
| TokenEfficiencySubscriber.php | ~90 | Listens to PostGenerateResponseEvent + PostStreamingResponseEvent, extracts TokenUsageDto, async POST to agent-tracer |
| AiTokenUsageEvent.php | ~50 | ECA Event plugin: 8 tokens (provider, model, operation_type, input/output/total/cached/reasoning tokens) |
| GetTokenEfficiency.php | ~50 | #[Tool] plugin: agents query efficiency data via auto-bridge chain |
Dependencies: ai, http_client_manager, key (all contrib, all already installed)
Config: ai_agents_ossa_token_efficiency.settings — endpoint URL, API key reference, tracking flags
Waste Categories
| Category | Source | Impact |
|---|---|---|
| Serialization | JSON field names repeated across tool definitions | 40-70% of context |
| Trajectory | Stale context accumulating in multi-turn interactions | 39.9-59.7% reducible |
| Coordination | Inter-agent messages eating token budget | 29-50% reducible |
| Envelope | MCP/protocol tool definitions | 13K-27K tokens per 100 tools |
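Serialization waste is the easiest of these to picture: in a JSON array of similar objects, every field name is repeated per record. The sketch below shows a simple columnar encoding that states field names once. It illustrates the general idea only and is not the TOON format; toColumnar is a hypothetical helper that assumes all records share the keys of the first one.

```typescript
// Hypothetical sketch of a columnar encoding: field names appear once in a header
// instead of once per record. Assumes every record has the same keys as the first.
type Scalar = string | number | boolean;

interface Columnar {
  fields: string[];
  rows: Scalar[][];
}

function toColumnar(records: Array<Record<string, Scalar>>): Columnar {
  const fields = records.length > 0 ? Object.keys(records[0]) : [];
  const rows = records.map((record) => fields.map((field) => record[field]));
  return { fields, rows };
}

const tools = [
  { name: "read_file", category: "fs", timeout_ms: 1000 },
  { name: "write_file", category: "fs", timeout_ms: 2000 },
  { name: "search", category: "web", timeout_ms: 5000 },
];

// Compare serialized sizes as a rough proxy for token count.
console.log(JSON.stringify(tools).length, JSON.stringify(toColumnar(tools)).length);
```

Character count is only a proxy; actual token savings depend on the tokenizer and grow with the number of records.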
Optimization Techniques
- Knowledge graph references — 20x savings (URN refs vs full context)
- Template compression — 5x reduction per template match
- Semantic compression — LLMLingua-style pruning
- Dynamic budget allocation — ML-based complexity prediction
- Hierarchical context loading — 60-80% reduction (essential → conditional → deep)
- Checkpoint-based retry — 62% failure cost reduction
- Cascade composition — 9x cost reduction (Haiku filters → Sonnet processes → Opus validates)
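Hierarchical context loading can be sketched as filling a token budget tier by tier, so essential context is always considered before conditional or deep context. The example below simplifies this to a budget-only decision; a real loader would presumably also evaluate a relevance condition for the conditional tier. All names are hypothetical.

```typescript
// Hypothetical sketch: load context tiers in priority order until the budget is spent.
interface ContextChunk {
  tier: "essential" | "conditional" | "deep";
  tokens: number;
  text: string;
}

const TIER_ORDER = ["essential", "conditional", "deep"] as const;

function selectContext(chunks: ContextChunk[], maxInputTokens: number): ContextChunk[] {
  const selected: ContextChunk[] = [];
  let used = 0;
  for (const tier of TIER_ORDER) {
    for (const chunk of chunks.filter((c) => c.tier === tier)) {
      if (used + chunk.tokens > maxInputTokens) continue; // skip chunks that do not fit
      selected.push(chunk);
      used += chunk.tokens;
    }
  }
  return selected;
}
```

With a 900-token budget and chunks of 300 (essential), 500 (conditional), and 400 (deep) tokens, this sketch keeps the first two and drops the deep chunk.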
Target Metrics
- 57-70% cost reduction per pipeline execution
- 90% manifest loading reduction via fingerprint profile
- 40-50% serialization reduction via compact format
- All 74 agents with token_efficiency config
- Real-time Grafana dashboard with token tracking
Related Services
| Service | Port | Role |
|---|---|---|
| agent-tracer | 3006 | Observability + token tracking |
| agent-mesh | 3005 | 54 production agents |
| agent-router | 4000 | Multi-provider LLM gateway |
| agent-brain | — | Qdrant vector memory |
| agent-docker | — | Vast.ai GPU orchestration |
| dragonfly | 3020 | AI-powered Drupal testing |
| compliance-engine | 3010 | SOC2/GDPR/HIPAA policy enforcement |