Deployment Handbook

Separation of Duties: See Separation of Duties. This handbook documents deployment procedures. It does NOT own agent manifests, execution, or infrastructure configuration.

Vast.ai Integration: See BULLETPROOF_VASTAI_PLAN.md for the complete Vast.ai implementation plan, including Cloudflare Tunnel and Tailscale integration.

Single-source deployment reference for all BlueFly Agent Platform projects

This handbook consolidates deployment patterns used across 12+ projects into one authoritative reference. Link to specific sections from project READMEs.

Quick Start
Kubernetes Deployment
Docker Compose
GitLab CI/CD Integration
Environment Configuration
Health Checks and Monitoring

Quick Start

Choose your deployment target:

Target	Command	Use Case
Local Docker	`docker compose up -d`	Development
OrbStack K8s	`helm install <release> ./infrastructure/helm-chart`	Local K8s testing
GitLab CI/CD	Push to branch	Automated deployment
Production K8s	`helm upgrade --install <release> ./infrastructure/helm-chart`	Production

Kubernetes Deployment

Helm Charts

Chart Structure (standard for all projects):

infrastructure/helm-chart/
 Chart.yaml               # Chart metadata
 values.yaml             # Default values
 values-dev.yaml         # Development overrides
 values-staging.yaml     # Staging overrides
 values-prod.yaml        # Production overrides
 templates/
     _helpers.tpl        # Template helpers
     NOTES.txt           # Post-install notes
     deployment.yaml     # Main deployment
     service.yaml        # Service definition
     configmap.yaml      # Configuration
     secret.yaml         # Secrets (external-secrets recommended)
     ingress.yaml        # Ingress rules
     hpa.yaml            # Horizontal Pod Autoscaler
     pvc.yaml            # Persistent Volume Claims

Chart.yaml Template:

apiVersion: v2
name: <service-name>
description: <Service description>
type: application
version: 1.0.0
appVersion: "1.0.0"
keywords:
  - ai
  - agent
  - llm
maintainers:
  - name: BlueFly Team
    email: dev@bluefly.io
dependencies:
  - name: postgresql
    version: 12.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: postgres.enabled
  - name: redis
    version: 18.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled

values.yaml Template:

# Global settings
global:
  environment: development
  domain: service.local

# Application configuration
app:
  replicaCount: 1
  image:
    repository: registry.gitlab.com/blueflyio/<project>
    pullPolicy: IfNotPresent
    tag: "latest"

  service:
    type: ClusterIP
    port: 3000

  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

  # Health checks
  livenessProbe:
    httpGet:
      path: /health
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10

  readinessProbe:
    httpGet:
      path: /health/ready
      port: http
    initialDelaySeconds: 10
    periodSeconds: 5

# Ingress
ingress:
  enabled: false
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: service.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

Environment-Specific Values:

# values-prod.yaml
global:
  environment: production
  domain: llm-platform.example.com

app:
  replicaCount: 3

  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

  resources:
    requests:
      cpu: 1000m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 2Gi

ingress:
  enabled: true
  tls:
    - secretName: service-tls
      hosts:
        - llm-platform.example.com
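
To make the autoscaling values above concrete, the following TypeScript sketch applies the scaling rule described in the Kubernetes Horizontal Pod Autoscaler documentation (desired = ceil(current × observed / target)), clamped to minReplicas and maxReplicas. It is a simplification: the real controller also applies a tolerance band and stabilization windows. `AutoscalingValues` and `desiredReplicas` are illustrative names, not part of any chart.

```typescript
// Mirrors the autoscaling keys from values-prod.yaml (illustrative type).
interface AutoscalingValues {
  minReplicas: number;
  maxReplicas: number;
  targetCPUUtilizationPercentage: number;
}

// Simplified HPA rule: scale in proportion to observed vs. target utilization,
// then clamp the result to the configured bounds.
function desiredReplicas(
  currentReplicas: number,
  observedCpuPercent: number,
  cfg: AutoscalingValues,
): number {
  const raw = Math.ceil(
    (currentReplicas * observedCpuPercent) / cfg.targetCPUUtilizationPercentage,
  );
  return Math.min(cfg.maxReplicas, Math.max(cfg.minReplicas, raw));
}

const prodAutoscaling: AutoscalingValues = {
  minReplicas: 3,
  maxReplicas: 10,
  targetCPUUtilizationPercentage: 70,
};
```

Utilization is measured relative to the CPU request (1000m in production), so 3 replicas averaging 140% would scale to 6, while the result never drops below 3 or exceeds 10.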

Helm Commands:

# Install
helm install <release> ./infrastructure/helm-chart \
  --namespace <namespace> \
  --create-namespace \
  --values ./infrastructure/helm-chart/values-dev.yaml

# Upgrade
helm upgrade <release> ./infrastructure/helm-chart \
  --namespace <namespace> \
  --values ./infrastructure/helm-chart/values-prod.yaml

# Rollback
helm rollback <release> -n <namespace>

# Uninstall
helm uninstall <release> -n <namespace>

# Dry run / Debug
helm install <release> ./infrastructure/helm-chart \
  --dry-run --debug \
  --namespace <namespace>

# Template rendering
helm template <release> ./infrastructure/helm-chart \
  --values ./infrastructure/helm-chart/values-prod.yaml
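
Each `--values` file is layered over the chart's values.yaml: maps are merged key by key, while scalars and lists in the override replace the default outright. The sketch below imitates that behavior in TypeScript to show why values-prod.yaml only needs to list what changes. `mergeValues` is a hypothetical helper written for this illustration, not Helm's implementation.

```typescript
type Values = { [key: string]: unknown };

// True for plain maps; arrays and scalars are treated as leaves.
function isMap(value: unknown): value is Values {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge an override file over the defaults.
// Maps merge per key; lists and scalars are replaced as a whole.
function mergeValues(base: Values, override: Values): Values {
  const out: Values = {};
  for (const key of Object.keys(base)) {
    out[key] = base[key];
  }
  for (const key of Object.keys(override)) {
    const baseValue = out[key];
    const overrideValue = override[key];
    out[key] =
      isMap(baseValue) && isMap(overrideValue)
        ? mergeValues(baseValue, overrideValue)
        : overrideValue;
  }
  return out;
}

// Trimmed-down versions of values.yaml and values-prod.yaml from this handbook.
const defaults: Values = {
  app: { replicaCount: 1, image: { tag: "latest" } },
  ingress: { enabled: false, tls: [] },
};
const prodOverrides: Values = {
  app: { replicaCount: 3 },
  ingress: { enabled: true, tls: [{ secretName: "service-tls" }] },
};
const effective = mergeValues(defaults, prodOverrides);
```

In `effective`, `app.image.tag` survives from the defaults even though the production file never mentions it, whereas `ingress.tls` is taken entirely from the override.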

Raw Manifests

For simpler deployments without Helm:

Deployment Template:

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE_NAME}
  labels:
    app: ${SERVICE_NAME}
    app.kubernetes.io/name: ${SERVICE_NAME}
    app.kubernetes.io/version: "${VERSION}"
    app.kubernetes.io/component: api
spec:
  replicas: ${REPLICAS:-1}
  selector:
    matchLabels:
      app: ${SERVICE_NAME}
  template:
    metadata:
      labels:
        app: ${SERVICE_NAME}
    spec:
      containers:
      - name: ${SERVICE_NAME}
        image: ${IMAGE}:${TAG:-latest}
        ports:
        - name: http
          containerPort: ${PORT:-3000}
        env:
        - name: NODE_ENV
          value: "${ENVIRONMENT:-production}"
        - name: PORT
          value: "${PORT:-3000}"
        envFrom:
        - configMapRef:
            name: ${SERVICE_NAME}-config
        - secretRef:
            name: ${SERVICE_NAME}-secrets
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi

Service Template:

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ${SERVICE_NAME}
  labels:
    app: ${SERVICE_NAME}
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  selector:
    app: ${SERVICE_NAME}
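
The raw manifests above use shell-style placeholders such as `${SERVICE_NAME}` and `${PORT:-3000}`, so they must be rendered before `kubectl apply`. Plain `envsubst` does not, to my knowledge, understand the `:-default` form, so a small renderer is one option. The TypeScript sketch below is a minimal, assumed implementation (`renderTemplate` is a hypothetical name); it treats unset and empty variables alike, as the shell does for `:-`.

```typescript
// Substitute ${NAME} and ${NAME:-default} placeholders in a manifest template.
// Throws when a variable is missing and the placeholder has no default.
function renderTemplate(
  template: string,
  vars: { [name: string]: string | undefined },
): string {
  const placeholder = /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g;
  return template.replace(
    placeholder,
    (_match: string, name: string, fallback?: string) => {
      const value = vars[name];
      if (value !== undefined && value !== "") {
        return value;
      }
      if (fallback !== undefined) {
        return fallback;
      }
      throw new Error("Missing required variable: " + name);
    },
  );
}

// Example: two lines taken from the Deployment template.
const snippet = "replicas: ${REPLICAS:-1}\nimage: ${IMAGE}:${TAG:-latest}";
const renderedSnippet = renderTemplate(snippet, {
  IMAGE: "registry.gitlab.com/blueflyio/demo",
});
```

Failing loudly on a missing variable is a deliberate choice here: an empty `name:` or `image:` field would otherwise only surface as an error at apply time.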

OrbStack Local

OrbStack provides local Kubernetes for macOS development:

# Start OrbStack's built-in Kubernetes cluster
orb start k8s

# Set context
kubectl config use-context orbstack

# Deploy
kubectl apply -f k8s/

# Port forward for local access
kubectl port-forward svc/<service> 8080:80

# Access via orb.local domain
# http://<service>.orb.local

OrbStack-Specific Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${SERVICE_NAME}
spec:
  ingressClassName: nginx  # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
  - host: ${SERVICE_NAME}.orb.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ${SERVICE_NAME}
            port:
              number: 80

Multi-Machine K3s

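For production-like local clusters across multiple machines. K3s runs on Linux, so on the Macs below these commands are assumed to run inside a Linux VM on each host: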

Control Plane (Mac M4):

# Install K3s server
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san=100.108.129.7 \
  --disable traefik \
  --flannel-backend=wireguard-native

# Get join token (the file is readable by root only)
sudo cat /var/lib/rancher/k3s/server/node-token

Worker Node (Mac M3):

# Join cluster
curl -sfL https://get.k3s.io | K3S_URL=https://100.108.129.7:6443 \
  K3S_TOKEN=<node-token> sh -

# Verify (run on the control plane, where the kubeconfig lives)
kubectl get nodes

Longhorn Storage:

# Install Longhorn for distributed storage
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

# Set as default StorageClass
kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Docker Compose

Development Stack

docker-compose.yml (development):

version: "3.9"

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: development
    volumes:
      - .:/app
      - node_modules:/app/node_modules
    ports:
      - "${PORT:-3000}:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://user:pass@postgres:5432/db
      - REDIS_URL=redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  node_modules:
  postgres_data:
  redis_data:

Production Stack

docker-compose.prod.yml:

version: "3.9"

services:
  app:
    image: registry.gitlab.com/blueflyio/${PROJECT}:${TAG:-latest}
    restart: unless-stopped
    ports:
      - "${PORT:-3000}:3000"
    environment:
      - NODE_ENV=production
    env_file:
      - .env.production
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Usage:

# Development
docker compose up -d

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# With environment-specific overrides
docker compose --env-file .env.production up -d

# Scale service (drop the fixed host port mapping on app first, or the replicas will collide on it)
docker compose up -d --scale app=3

# View logs
docker compose logs -f app

# Stop and remove containers, networks, and named volumes (-v deletes persisted data)
docker compose down -v

Service Templates

Common service definitions:

# PostgreSQL with backup
postgres:
  image: postgres:16-alpine
  restart: unless-stopped
  environment:
    POSTGRES_USER: ${POSTGRES_USER:-llm}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRES_DB: ${POSTGRES_DB:-llm_platform}
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-llm}"]
    interval: 10s
    timeout: 5s
    retries: 5

# Redis with persistence
redis:
  image: redis:7-alpine
  restart: unless-stopped
  command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
  volumes:
    - redis_data:/data
  healthcheck:
    test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
    interval: 10s
    timeout: 5s
    retries: 5

# Qdrant vector database
qdrant:
  image: qdrant/qdrant:latest
  restart: unless-stopped
  ports:
    - "6333:6333"
    - "6334:6334"
  volumes:
    - qdrant_data:/qdrant/storage
  environment:
    QDRANT__SERVICE__GRPC_PORT: 6334

# MinIO object storage
minio:
  image: minio/minio:latest
  restart: unless-stopped
  command: server /data --console-address ":9001"
  ports:
    - "9000:9000"
    - "9001:9001"
  environment:
    MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minioadmin}
    MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
  volumes:
    - minio_data:/data
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
    interval: 30s
    timeout: 20s
    retries: 3

GitLab CI/CD Integration

Pipeline Configuration

.gitlab-ci.yml (standard template):

include:
  # Golden workflow component
  - component: gitlab.com/blueflyio/agent-platform/gitlab_components/golden-workflow@v1
    inputs:
      project_type: npm  # npm | drupal | python | go
      enable_security_scan: true
      deploy_environments: ["dev", "staging", "prod"]

# Global variables
variables:
  DOCKER_REGISTRY: registry.gitlab.com/blueflyio/${CI_PROJECT_NAME}
  HELM_CHART_PATH: infrastructure/helm-chart
  KUBERNETES_NAMESPACE: ${CI_PROJECT_NAME}

# Stages
stages:
  - validate
  - test
  - build
  - deploy:dev
  - deploy:staging
  - deploy:production

# Build job
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Build once and apply both tags
    - docker build -t $DOCKER_REGISTRY:$CI_COMMIT_SHA -t $DOCKER_REGISTRY:$CI_COMMIT_REF_SLUG .
    - docker push $DOCKER_REGISTRY:$CI_COMMIT_SHA
    - docker push $DOCKER_REGISTRY:$CI_COMMIT_REF_SLUG
  only:
    - main
    - development
    - tags

Environment Deployment

GitLab Environments:

# Development (automatic)
deploy:dev:
  stage: deploy:dev
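  image:
    name: alpine/helm:latest
    entrypoint: [""]  # clear the image's helm entrypoint so GitLab can run the script in a shell
  environment:
    name: development
    url: https://dev.${CI_PROJECT_NAME}.example.com
    on_stop: stop:dev  # stop:dev is not shown in this handbook; define it like stop:review
    auto_stop_in: 7 days
  script:
    # alpine/helm ships without kubectl, so select the cluster with --kube-context
    - helm upgrade --install ${CI_PROJECT_NAME} $HELM_CHART_PATH
        --kube-context $KUBE_CONTEXT_DEV
        --namespace ${KUBERNETES_NAMESPACE}-dev
        --create-namespace
        --values $HELM_CHART_PATH/values-dev.yaml
        --set app.image.tag=$CI_COMMIT_SHA
        --wait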
  only:
    - development

# Staging (automatic on main)
deploy:staging:
  stage: deploy:staging
  image:
    name: alpine/helm:latest
    entrypoint: [""]  # clear the image's helm entrypoint so GitLab can run the script in a shell
  environment:
    name: staging
    url: https://staging.${CI_PROJECT_NAME}.example.com
  script:
    # alpine/helm ships without kubectl, so select the cluster with --kube-context
    - helm upgrade --install ${CI_PROJECT_NAME} $HELM_CHART_PATH
        --kube-context $KUBE_CONTEXT_STAGING
        --namespace ${KUBERNETES_NAMESPACE}-staging
        --create-namespace
        --values $HELM_CHART_PATH/values-staging.yaml
        --set app.image.tag=$CI_COMMIT_SHA
        --wait
  only:
    - main

# Production (manual, tags only)
deploy:production:
  stage: deploy:production
  image:
    name: alpine/helm:latest
    entrypoint: [""]  # clear the image's helm entrypoint so GitLab can run the script in a shell
  environment:
    name: production
    url: https://${CI_PROJECT_NAME}.example.com
  script:
    # alpine/helm ships without kubectl, so select the cluster with --kube-context
    - helm upgrade --install ${CI_PROJECT_NAME} $HELM_CHART_PATH
        --kube-context $KUBE_CONTEXT_PROD
        --namespace ${KUBERNETES_NAMESPACE}
        --create-namespace
        --values $HELM_CHART_PATH/values-prod.yaml
        --set app.image.tag=$CI_COMMIT_TAG
        --wait
  when: manual
  only:
    - tags

Review Apps (for MRs):

deploy:review:
  stage: deploy:dev
  image:
    name: alpine/helm:latest
    entrypoint: [""]  # clear the image's helm entrypoint so GitLab can run the script in a shell
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_COMMIT_REF_SLUG.review.example.com
    on_stop: stop:review
    auto_stop_in: 1 week
  script:
    - helm upgrade --install review-$CI_COMMIT_REF_SLUG $HELM_CHART_PATH
        --namespace review
        --create-namespace
        --values $HELM_CHART_PATH/values-dev.yaml
        --set app.image.tag=$CI_COMMIT_SHA
        --set ingress.hosts[0].host=$CI_COMMIT_REF_SLUG.review.example.com
        --wait
  only:
    - merge_requests
  except:
    - main

stop:review:
  stage: deploy:dev
  image:
    name: alpine/helm:latest
    entrypoint: [""]  # clear the image's helm entrypoint so GitLab can run the script in a shell
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  script:
    - helm uninstall review-$CI_COMMIT_REF_SLUG --namespace review
  when: manual
  only:
    - merge_requests

Runner Configuration

Runner Tags:

Tag	Use Case	Runner
`docker, local`	Generic Docker jobs	docker-runner
`npm-package, docker, local`	Node.js/TypeScript	npm-runner
`drupal-module, docker, local`	Drupal/PHP	drupal-module-runner
`python, docker, local`	Python projects	python-runner

Runner Selector Component:

include:
  - component: gitlab.com/blueflyio/agent-platform/gitlab_components/runner-selector@v0.1.x
    inputs:
      runner_type: auto
      fallback_to_shared: true

build:
  extends: .runner-npm
  script:
    - npm ci
    - npm run build

Environment Configuration

Environment Variables

Standard Variables:

# Application
NODE_ENV=production
PORT=3000
LOG_LEVEL=info

# Database
DATABASE_URL=postgresql://user:pass@host:5432/db
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10

# Redis
REDIS_URL=redis://:password@host:6379/0

# AI Providers
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OLLAMA_BASE_URL=http://ollama:11434

# Observability
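# OTLP/HTTP port (4318) to match the HTTP trace exporter used in tracing.ts; gRPC exporters use 4317
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318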
OTEL_SERVICE_NAME=my-service
OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production

# Security
JWT_SECRET=...
CORS_ORIGINS=https://example.com

Environment Files:

.env                 # Default (development)
.env.local           # Local overrides (gitignored)
.env.development     # Development
.env.staging         # Staging
.env.production      # Production
.env.test            # Testing

Secrets Management

GitLab CI/CD Variables:

# Set in Settings > CI/CD > Variables
ANTHROPIC_API_KEY      # Masked, protected
DATABASE_PASSWORD      # Masked, protected
KUBECONFIG_PROD        # File, protected

Kubernetes Secrets:

apiVersion: v1
kind: Secret
metadata:
  name: api-keys
type: Opaque
stringData:
  ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
  OPENAI_API_KEY: ${OPENAI_API_KEY}

External Secrets Operator (recommended for production):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-keys
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: api-keys
  data:
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: secret/data/api-keys
        property: anthropic

Local Secrets (development):

# Store tokens in ~/.tokens/
~/.tokens/gitlab
~/.tokens/anthropic
~/.tokens/openai

# Load in shell
export ANTHROPIC_API_KEY=$(cat ~/.tokens/anthropic)

Configuration Files

ConfigMap Pattern:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  config.json: |
    {
      "port": 3000,
      "logLevel": "info",
      "features": {
        "enableMetrics": true,
        "enableTracing": true
      }
    }

Mounting Configuration:

containers:
- name: app
  volumeMounts:
  - name: config
    mountPath: /app/config
    readOnly: true
volumes:
- name: config
  configMap:
    name: app-config
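
With the mount above, the ConfigMap key appears in the container as the file /app/config/config.json. The TypeScript sketch below shows one possible way to parse that file's contents while falling back to defaults for absent or mistyped keys. `parseConfig` and `DEFAULT_CONFIG` are hypothetical names, and reading the file from disk is left to the caller so the sketch stays dependency-free.

```typescript
interface FeatureFlags {
  enableMetrics: boolean;
  enableTracing: boolean;
}

interface FileConfig {
  port: number;
  logLevel: string;
  features: FeatureFlags;
}

// Loosely typed shape of whatever JSON.parse returns.
interface RawFeatures {
  enableMetrics?: unknown;
  enableTracing?: unknown;
}
interface RawConfig {
  port?: unknown;
  logLevel?: unknown;
  features?: RawFeatures;
}

// Defaults mirror the sample config.json in the ConfigMap.
const DEFAULT_CONFIG: FileConfig = {
  port: 3000,
  logLevel: "info",
  features: { enableMetrics: true, enableTracing: true },
};

// Parse config.json text, keeping a default wherever a key is absent or mistyped.
function parseConfig(jsonText: string): FileConfig {
  const raw: RawConfig = JSON.parse(jsonText) || {};
  const features: RawFeatures = raw.features || {};
  return {
    port: typeof raw.port === "number" ? raw.port : DEFAULT_CONFIG.port,
    logLevel: typeof raw.logLevel === "string" ? raw.logLevel : DEFAULT_CONFIG.logLevel,
    features: {
      enableMetrics:
        typeof features.enableMetrics === "boolean"
          ? features.enableMetrics
          : DEFAULT_CONFIG.features.enableMetrics,
      enableTracing:
        typeof features.enableTracing === "boolean"
          ? features.enableTracing
          : DEFAULT_CONFIG.features.enableTracing,
    },
  };
}
```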

Health Checks and Monitoring

Health Endpoints

Standard Health Endpoints:

Endpoint	Purpose	Returns
`/health`	Basic liveness	`{"status": "ok"}`
`/health/ready`	Readiness with deps	`{"status": "ok", "checks": {...}}`
`/health/live`	Kubernetes liveness	`{"status": "ok"}`
`/metrics`	Prometheus metrics	Prometheus format

TypeScript Implementation:

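// health.controller.ts
import { Router } from 'express';
import { db } from './db'; // assumed: a client exposing query(), for example a pg Pool

interface HealthCheck {
  healthy: boolean;
  latency?: number; // milliseconds
  error?: string;
}

const router = Router();

// Liveness - is the process alive? Served on both liveness paths from the table above.
router.get(['/health', '/health/live'], (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness - can the service handle requests?
router.get('/health/ready', async (req, res) => {
  // checkRedis and checkExternalDeps are not shown; they follow the shape of checkDatabase
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    external: await checkExternalDeps()
  };

  const allHealthy = Object.values(checks).every(c => c.healthy);

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString()
  });
});

async function checkDatabase(): Promise<HealthCheck> {
  const started = Date.now();
  try {
    await db.query('SELECT 1');
    // Report the measured round-trip time instead of a fixed number
    return { healthy: true, latency: Date.now() - started };
  } catch (error) {
    // A caught value is not guaranteed to be an Error instance
    return { healthy: false, error: error instanceof Error ? error.message : String(error) };
  }
}

export default router;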
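
The readiness decision itself (all checks healthy gives 200 and `ok`, anything else gives 503 and `degraded`) can be separated from Express so it is testable in isolation. The TypeScript sketch below is one assumed way to do that; `summarizeChecks` and `ReadinessReport` are hypothetical names, and the `failing` list is an addition that the endpoint above does not return.

```typescript
interface HealthCheck {
  healthy: boolean;
  latency?: number; // milliseconds
  error?: string;
}

interface ReadinessReport {
  httpStatus: 200 | 503;
  status: "ok" | "degraded";
  failing: string[]; // names of the checks that reported unhealthy
}

// Reduce individual dependency checks to the status the readiness probe sees.
function summarizeChecks(checks: { [name: string]: HealthCheck }): ReadinessReport {
  const failing = Object.keys(checks).filter((name) => !checks[name].healthy);
  const allHealthy = failing.length === 0;
  return {
    httpStatus: allHealthy ? 200 : 503,
    status: allHealthy ? "ok" : "degraded",
    failing,
  };
}

const degradedReport = summarizeChecks({
  database: { healthy: true, latency: 4 },
  redis: { healthy: false, error: "connection refused" },
});
```

Returning 503 on any failed dependency removes the pod from Service endpoints until the dependency recovers, so only checks the service truly cannot work without belong in this set.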

Readiness and Liveness

Kubernetes Probes:

containers:
- name: app
  livenessProbe:
    httpGet:
      path: /health/live
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  readinessProbe:
    httpGet:
      path: /health/ready
      port: http
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3

  startupProbe:
    httpGet:
      path: /health
      port: http
    initialDelaySeconds: 5
    periodSeconds: 5
    failureThreshold: 30  # 5s * 30 = 150s max startup time

Probe Best Practices:

Probe	Purpose	Failure Action
`startupProbe`	App initialization	Container restart (other probes wait until it succeeds)
`livenessProbe`	Deadlock detection	Container restart
`readinessProbe`	Traffic routing	Remove from service
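
The probe settings translate into time budgets with simple arithmetic: a probe tolerates roughly periodSeconds × failureThreshold of consecutive failures before Kubernetes acts, and a startup probe adds its initialDelaySeconds on top. The TypeScript sketch below works this out for the values used in this section. It is a conservative approximation that ignores `timeoutSeconds` and scheduling jitter, and the function names are illustrative.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate time a running container can keep failing before the probe trips.
function failureWindowSeconds(probe: ProbeTiming): number {
  return probe.periodSeconds * probe.failureThreshold;
}

// Approximate upper bound on how long a container may take to start.
function startupBudgetSeconds(probe: ProbeTiming): number {
  return probe.initialDelaySeconds + failureWindowSeconds(probe);
}

// Values copied from the Kubernetes Probes example.
const startupProbe: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 30 };
const livenessProbe: ProbeTiming = { initialDelaySeconds: 30, periodSeconds: 10, failureThreshold: 3 };
const readinessProbe: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 5, failureThreshold: 3 };

const startupBudget = startupBudgetSeconds(startupProbe);    // 5 + 5 * 30 = 155
const restartAfter = failureWindowSeconds(livenessProbe);    // 10 * 3 = 30
const unroutedAfter = failureWindowSeconds(readinessProbe);  // 5 * 3 = 15
```

Under these assumptions the application has about 150 seconds of probing plus the 5 second initial delay to start, a hung container is restarted after about 30 seconds, and an unready pod stops receiving traffic after about 15 seconds.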

Observability Stack

OpenTelemetry Configuration:

// tracing.ts
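import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources'; // Resource class as exported by the 1.x SDK packages
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  resource: new Resource({
    'service.name': process.env.OTEL_SERVICE_NAME,
    'service.version': process.env.npm_package_version,
    'deployment.environment': process.env.NODE_ENV
  }),
  traceExporter: new OTLPTraceExporter({
    // An explicit url is used as-is, so it must point at the OTLP/HTTP endpoint
    // (port 4318 by default) and include the traces path
    url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});

// Start before application modules load so auto-instrumentation can patch them
sdk.start();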

Prometheus Metrics:

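import express from 'express';
import { collectDefaultMetrics, Registry, Counter, Histogram } from 'prom-client';

const app = express(); // or reuse the service's existing Express app
const register = new Registry();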
collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

Observability Stack Deployment:

# docker-compose.observability.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana_data:/var/lib/grafana

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP

  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"    # UI
      - "4319:4317"    # OTLP (host port 4317 is already bound by jaeger)

volumes:
  prometheus_data:
  grafana_data:

Prometheus Scrape Config:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'app'
    static_configs:
      - targets: ['app:3000']
    metrics_path: '/metrics'

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
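
The `kubernetes-pods` job keeps only pods annotated with `prometheus.io/scrape: "true"`. Prometheus exposes each pod annotation as a `__meta_kubernetes_pod_annotation_<name>` label with unsupported characters replaced by underscores, and relabel regexes are fully anchored. The TypeScript sketch below imitates that `keep` rule to show which pods would be scraped; `metaLabelName` and `keepPod` are hypothetical helpers, not Prometheus code.

```typescript
// Derive the discovery label Prometheus generates for a pod annotation.
function metaLabelName(annotation: string): string {
  return "__meta_kubernetes_pod_annotation_" + annotation.replace(/[^a-zA-Z0-9_]/g, "_");
}

// Imitate `action: keep` with `regex: true`: the whole value must equal "true".
// A missing annotation behaves like an empty string and is dropped.
function keepPod(annotations: { [name: string]: string }): boolean {
  const value = annotations["prometheus.io/scrape"] || "";
  return /^(?:true)$/.test(value);
}

const scrapeLabel = metaLabelName("prometheus.io/scrape");
```

A pod therefore opts in by setting the annotation to exactly `true`; values such as `True` or `yes` do not match.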

Version: 1.0.0
Last Updated: 2026-01-01
Maintainer: BlueFly Agent Platform Team
