OTEL Observability Integration — Implementation Runbook
Status: BLOCKED on Phase 1 (GitLab Observability 502)
Updated: 2026-02-27
Epic: Observability (#35, #52)
BLOCKER: GitLab OTEL Endpoint Returns 502
All collectors are wired and running. The only blocker is GitLab returning 502.
Fix: Go to https://gitlab.com/groups/blueflyio/-/settings/general → Permissions → Enable Observability
Verify: curl -s -o /dev/null -w '%{http_code}' -X POST http://87749026.otel.gitlab-o11y.com:4318/v1/traces -H 'Content-Type: application/json' -d '{"resourceSpans":[]}' → should return 200
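The check above can be wrapped in a small helper so the result is easier to read while waiting for Observability to be enabled. This is a minimal sketch, not part of the repository: the function names `classify_otlp_status` and `probe_otlp` are hypothetical, and the verdicts attached to codes other than 200 and 502 are assumptions.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repo) wrapping the curl check above.

OTLP_URL="http://87749026.otel.gitlab-o11y.com:4318/v1/traces"

# Map an HTTP status code to a short verdict.
classify_otlp_status() {
  case "$1" in
    200) echo "ok" ;;
    502) echo "blocked" ;;
    000) echo "unreachable" ;;   # curl prints 000 when no HTTP response was received
    *)   echo "unexpected" ;;
  esac
}

# POST an empty OTLP/JSON payload and print "<code> <verdict>".
probe_otlp() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -X POST "$OTLP_URL" \
    -H 'Content-Type: application/json' \
    -d '{"resourceSpans":[]}')
  echo "$code $(classify_otlp_status "$code")"
}
```

After sourcing the file, run `probe_otlp`. While Phase 1 is still pending it should print `502 blocked`.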
What Exists (Verified 2026-02-27)
Oracle OTEL Collector (RUNNING — v0.145.0)
- Container: otel-collector on Oracle
- Config: /opt/agent-platform/config/otel-collector.yaml
- Exports to: GitLab (87749026.otel.gitlab-o11y.com:4318), Tempo, Loki, Prometheus
- Receives from: All Oracle services via OTLP + A2A scraping
- Status: Running, retrying GitLab exports (502)
NAS OTEL Collector (EXISTS — config updated)
- Location: /Volumes/AgentPlatform/services/otel-collector/
- Image: otel/opentelemetry-collector-contrib:0.93.0
- Config: otel-config.yaml, updated to use 87749026.otel.gitlab-o11y.com:4318 (no auth)
- Exports to: GitLab, Tempo, Loki, Prometheus, ClickHouse, file backup
- .env: Updated (placeholder token removed, endpoint hardcoded)
- Needs: docker compose down && docker compose up -d after Phase 1
Agent-Tracer (ALL CODE EXISTS — pushed to release/v0.1.x)
- Worktree: ~/Sites/blueflyio/worktrees/agent-tracer/release-v0.1.x
- OTEL Bootstrap: src/tracing.ts (NodeSDK + auto-instrumentation + Sentry)
- OtelCollector class: src/observability/collector/otel-collector.ts
- DORA Metrics: src/metrics/dora/ (full stack: collector, calculator, exporter, GitLab integration)
- DORA OTEL Emitter: src/metrics/dora/otel-dora-emitter.ts (OTEL Meters API bridge)
- Agent DORA DevX: src/metrics/agent-dora-devx/ (agent-specific metrics + Prometheus exporter)
- Neo4j Correlation: src/correlation/neo4j/ (GitLab ↔ Neo4j trace correlation)
- Phoenix Bridge: src/observability/phoenix-bridge/phoenix-neo4j-bridge.ts
- ClickHouse Schemas: src/storage/clickhouse/schemas/
- Oracle Collector Config: infrastructure/oracle-otel-collector/ (docker-compose + config)
- ClickHouse Init SQL: infrastructure/clickhouse/init-otel-db.sql
Key IDs
- GitLab Group ID: 87749026 (blueflyio)
- OTEL Endpoint: http://87749026.otel.gitlab-o11y.com:4318
- SSH: ubuntu@oracle-platform.tailcf98b3.ts.net
Phase Status
| Phase | Status | Notes |
|---|---|---|
| 1. Enable GitLab Observability | BLOCKED | Manual UI step — 502 persists |
| 2. NAS Collector → GitLab | DONE | Config updated, ClickHouse exporter added |
| 3. Oracle Collector | DONE | Already running v0.145.0 |
| 4. SDK Instrumentation | DONE | src/tracing.ts exists in agent-tracer |
| 5. DORA Metrics | DONE | ClickHouse + OTEL emitter |
| 6. KG Correlation | EXISTS | Code exists, needs runtime verification |
| 7. Sentry | DONE | Conditional init in src/tracing.ts |
| 8. ClickHouse Analytics | DONE | Exporter in NAS config, init SQL created |
After Phase 1 (Observability Enabled)
- Restart NAS collector: ssh flux423@blueflynas.tailcf98b3.ts.net "cd /volume1/AgentPlatform/services/otel-collector && docker compose down && docker compose up -d"
- Check Oracle collector logs: ssh ubuntu@oracle-platform.tailcf98b3.ts.net "docker logs otel-collector --tail 20"
- Verify traces in GitLab: https://gitlab.com/groups/blueflyio/-/observability/traces
- Run ClickHouse init: clickhouse-client < infrastructure/clickhouse/init-otel-db.sql
- Set SENTRY_DSN on Oracle services (via /opt/.env)
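When checking the Oracle collector logs, it can help to count how many recent lines still mention a 502. The helper below is a rough sketch: `count_502_lines` is a hypothetical name, and it assumes failed GitLab exports are logged with the HTTP status code somewhere in the line. The collector's exact log wording is not documented here, so treat the count as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that contain 502 as a whole word.
# Assumption: failed GitLab exports are logged with the HTTP status in the line.
count_502_lines() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep the exit status clean.
  grep -c -w '502' || true
}

# Example invocation (commented out so sourcing this file has no side effects):
# ssh ubuntu@oracle-platform.tailcf98b3.ts.net \
#   "docker logs otel-collector --tail 20 2>&1" | count_502_lines
```

A count of 0 over the most recent lines suggests that exports are no longer being rejected. Confirming that traces appear in GitLab is still the authoritative check.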
Remaining Work (Phase 4 expansion)
Other Oracle services need OTEL SDK instrumentation. Pattern:
- Copy the src/tracing.ts pattern to each service
- Add OTEL deps to package.json
- Import tracing.ts as the FIRST line in the entrypoint
- Set OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 in docker env
Priority: agent-mesh, agent-router, agent-brain, agent-protocol, gkg, workflow-engine
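The docker env step can be checked for each service once it has been instrumented. The sketch below is meant to run on the Oracle host. It assumes each container is named after its service, which this runbook does not confirm, and `has_otel_env` and `report_otel_env` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read KEY=VALUE lines on stdin and succeed only if both
# OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are set to non-empty values.
has_otel_env() {
  env_lines=$(cat)
  printf '%s\n' "$env_lines" | grep -q '^OTEL_SERVICE_NAME=.' &&
  printf '%s\n' "$env_lines" | grep -q '^OTEL_EXPORTER_OTLP_ENDPOINT=.'
}

# Report which priority services already carry the OTEL settings.
# Assumption: each container is named after its service.
report_otel_env() {
  for svc in agent-mesh agent-router agent-brain agent-protocol gkg workflow-engine; do
    if docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$svc" 2>/dev/null \
        | has_otel_env; then
      echo "$svc: instrumented"
    else
      echo "$svc: missing OTEL env (or container not found)"
    fi
  done
}
```

Running `report_otel_env` prints one line per service. It only confirms that the environment variables are present, not that tracing.ts is imported first in the entrypoint.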