OTEL Observability Integration — Implementation Runbook

Status: BLOCKED on Phase 1 (GitLab Observability 502)
Updated: 2026-02-27
Epic: Observability (#35, #52)


BLOCKER: GitLab OTEL Endpoint Returns 502

All collectors are wired and running. The only blocker is GitLab returning 502.

Fix: Go to https://gitlab.com/groups/blueflyio/-/settings/general → Permissions → Enable Observability

Verify (prints only the HTTP status code; expect 200 once Observability is enabled): curl -s -o /dev/null -w '%{http_code}' -X POST http://87749026.otel.gitlab-o11y.com:4318/v1/traces -H 'Content-Type: application/json' -d '{"resourceSpans":[]}'
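
The verification probe can be wrapped in two small shell functions so the result is easier to read while waiting on the blocker. This is a minimal sketch: `probe_otel`, `classify_otel_status`, and the `OTEL_ENDPOINT` variable are hypothetical names introduced here, and the 502 interpretation simply follows this runbook's diagnosis.

```shell
# Endpoint taken from this runbook; override by exporting OTEL_ENDPOINT.
OTEL_ENDPOINT="${OTEL_ENDPOINT:-http://87749026.otel.gitlab-o11y.com:4318}"

# Send an empty OTLP/HTTP trace payload and print only the HTTP status code.
probe_otel() {
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST "${OTEL_ENDPOINT}/v1/traces" \
    -H 'Content-Type: application/json' \
    -d '{"resourceSpans":[]}'
}

# Map a status code to a short verdict (hypothetical helper).
classify_otel_status() {
  case "$1" in
    200) echo "ok" ;;           # endpoint accepts traces
    502) echo "blocked" ;;      # per this runbook: Observability not enabled yet
    000) echo "unreachable" ;;  # curl got no HTTP response (DNS, network, timeout)
    *)   echo "unexpected" ;;   # anything else needs a closer look
  esac
}
```

Usage: `classify_otel_status "$(probe_otel)"` prints `blocked` while the 502 persists and `ok` once the endpoint answers 200.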


What Exists (Verified 2026-02-27)

Oracle OTEL Collector (RUNNING — v0.145.0)

  • Container: otel-collector on Oracle
  • Config: /opt/agent-platform/config/otel-collector.yaml
  • Exports to: GitLab (87749026.otel.gitlab-o11y.com:4318), Tempo, Loki, Prometheus
  • Receives from: All Oracle services via OTLP + A2A scraping
  • Status: Running, retrying GitLab exports (502)

NAS OTEL Collector (EXISTS — config updated)

  • Location: /Volumes/AgentPlatform/services/otel-collector/
  • Image: otel/opentelemetry-collector-contrib:0.93.0
  • Config: otel-config.yaml — updated to use 87749026.otel.gitlab-o11y.com:4318 (no auth)
  • Exports to: GitLab, Tempo, Loki, Prometheus, ClickHouse, file backup
  • .env: Updated — removed placeholder token, using hardcoded endpoint
  • Needs: docker compose down && docker compose up -d after Phase 1

Agent-Tracer (ALL CODE EXISTS — pushed to release/v0.1.x)

  • Worktree: ~/Sites/blueflyio/worktrees/agent-tracer/release-v0.1.x
  • OTEL Bootstrap: src/tracing.ts — NodeSDK + auto-instrumentation + Sentry
  • OtelCollector class: src/observability/collector/otel-collector.ts
  • DORA Metrics: src/metrics/dora/ — full stack (collector, calculator, exporter, GitLab integration)
  • DORA OTEL Emitter: src/metrics/dora/otel-dora-emitter.ts — OTEL Meters API bridge
  • Agent DORA DevX: src/metrics/agent-dora-devx/ — agent-specific metrics + Prometheus exporter
  • Neo4j Correlation: src/correlation/neo4j/ — GitLab ↔ Neo4j trace correlation
  • Phoenix Bridge: src/observability/phoenix-bridge/phoenix-neo4j-bridge.ts
  • ClickHouse Schemas: src/storage/clickhouse/schemas/
  • Oracle Collector Config: infrastructure/oracle-otel-collector/ — docker-compose + config
  • ClickHouse Init SQL: infrastructure/clickhouse/init-otel-db.sql

Key IDs

  • GitLab Group ID: 87749026 (blueflyio)
  • OTEL Endpoint: http://87749026.otel.gitlab-o11y.com:4318
  • SSH: ubuntu@oracle-platform.tailcf98b3.ts.net

Phase Status

| Phase | Status | Notes |
|---|---|---|
| 1. Enable GitLab Observability | BLOCKED | Manual UI step; 502 persists |
| 2. NAS Collector → GitLab | DONE | Config updated, ClickHouse exporter added |
| 3. Oracle Collector | DONE | Already running v0.145.0 |
| 4. SDK Instrumentation | DONE | src/tracing.ts exists in agent-tracer |
| 5. DORA Metrics | DONE | ClickHouse + OTEL emitter |
| 6. KG Correlation | EXISTS | Code exists, needs runtime verification |
| 7. Sentry | DONE | Conditional init in src/tracing.ts |
| 8. ClickHouse Analytics | DONE | Exporter in NAS config, init SQL created |

After Phase 1 (Observability Enabled)

  1. Restart NAS collector: ssh flux423@blueflynas.tailcf98b3.ts.net "cd /volume1/AgentPlatform/services/otel-collector && docker compose down && docker compose up -d"
  2. Check Oracle collector logs: ssh ubuntu@oracle-platform.tailcf98b3.ts.net "docker logs otel-collector --tail 20"
  3. Verify traces in GitLab: https://gitlab.com/groups/blueflyio/-/observability/traces
  4. Run ClickHouse init: clickhouse-client < infrastructure/clickhouse/init-otel-db.sql
  5. Set SENTRY_DSN on Oracle services (via /opt/.env)
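
For step 2, a quick way to tell whether the Oracle collector is still being rejected is to count log lines that look like 502 export failures. The sketch below is an assumption-heavy filter: `count_502_lines` is a hypothetical helper, and the pattern assumes failure lines mention "502" after the word "status" or "HTTP", which may not match the real collector log format exactly.

```shell
# Count stdin lines that look like a 502 export failure.
# Assumption: such lines contain "status" or "HTTP" followed by 502;
# adjust the pattern after inspecting the actual collector output.
count_502_lines() {
  # grep -c prints the match count; "|| true" keeps a zero count from
  # being reported as a failure exit status.
  grep -Eci '(status|http)[^0-9]*502' || true
}
```

Usage: `ssh ubuntu@oracle-platform.tailcf98b3.ts.net "docker logs otel-collector --tail 200" 2>&1 | count_502_lines`. A count that drops to 0 on fresh log lines after Phase 1 suggests the retries have stopped failing; confirm by checking the traces page in step 3.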

Remaining Work (Phase 4 expansion)

Other Oracle services need OTEL SDK instrumentation. Pattern:

  1. Copy src/tracing.ts pattern to each service
  2. Add OTEL deps to package.json
  3. Import tracing.ts as the FIRST line in the entrypoint, so the SDK starts before any module it needs to instrument is loaded
  4. Set OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 in docker env

Priority: agent-mesh, agent-router, agent-brain, agent-protocol, gkg, workflow-engine
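
Step 4 of the pattern can be sketched as a small generator that prints the two variables for each priority service. `otel_env` is a hypothetical helper, and using the service's name from the priority list as its `OTEL_SERVICE_NAME` is an assumption; substitute whatever naming the services actually use.

```shell
# Print the two OTEL variables for one service (step 4 of the pattern).
# Assumption: the service name doubles as OTEL_SERVICE_NAME.
otel_env() {
  printf 'OTEL_SERVICE_NAME=%s\n' "$1"
  printf 'OTEL_EXPORTER_OTLP_ENDPOINT=%s\n' "http://otel-collector:4318"
}

# Emit an env fragment per priority service, in the order listed above.
for svc in agent-mesh agent-router agent-brain agent-protocol gkg workflow-engine; do
  echo "# $svc"
  otel_env "$svc"
done
```

Each fragment can be pasted into the matching service's docker environment. The endpoint uses the collector's container name, so it only resolves from containers on the same Docker network as otel-collector.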