Production Deployment
Separation of Duties: See Separation of Duties. Getting-started guides document onboarding only; they do not own agent manifests, execution, or infrastructure configuration.
Complete guide to deploying the LLM Platform and OSSA agents to production Kubernetes environments.
Overview
This guide covers deploying:
- LLM Platform (Drupal) - Web application and API
- BuildKit Agents - OSSA-compliant autonomous agents
- Supporting Services - Databases, caching, vector storage, monitoring
Prerequisites
Infrastructure Requirements
- Kubernetes cluster: 1.28+ with 3+ worker nodes
- kubectl: 1.28+ configured with cluster access
- Helm: 3.13+ installed locally
- Persistent storage: StorageClass with dynamic provisioning
- Ingress controller: NGINX, Traefik, or similar
- SSL certificates: Let's Encrypt or commercial CA
- Container registry: Docker Hub, GitLab Container Registry, or private registry
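The version minimums listed above can be checked before any deployment step runs. The following is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; `version_ge` is a hypothetical helper name, not part of kubectl or Helm.

```shell
# version_ge A B: succeeds when version A is at least version B.
# Hypothetical helper; relies on the -V (version sort) option of GNU sort.
version_ge() {
  a=${1#v}   # drop a leading "v", as in "v1.28.4"
  b=${2#v}
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# Example with a literal version; in practice, pass the versions reported by
# `kubectl version` and `helm version`.
if version_ge "v1.28.4" "1.28"; then
  echo "kubectl version OK"
fi
```

The comparison sorts the two versions and checks that the required minimum sorts first, which avoids the trap of comparing "1.9" and "1.28" as plain strings.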
Access Requirements
- Cluster admin access via kubectl
- Container registry push access
- DNS management for domain configuration
- GitLab project with CI/CD enabled
Resource Requirements
Minimum cluster capacity:
- CPU: 24 cores total (across all nodes)
- RAM: 64 GB total
- Storage: 500 GB persistent volumes
- Network: 10 Gbps inter-node bandwidth
Recommended for production:
- CPU: 48+ cores
- RAM: 128+ GB
- Storage: 1 TB+ SSD-backed PVs
- Load balancer: Cloud provider LB or MetalLB
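To compare a cluster against the CPU and RAM figures above, the allocatable resources of all nodes can be totalled. This is a sketch under stated assumptions: `sum_allocatable` is a hypothetical helper that reads one "cpu memory" pair per line, with CPU either in whole cores or millicores (`m`) and memory in `Ki`, `Mi`, or `Gi`. Other unit suffixes are not handled.

```shell
# sum_allocatable: read "<cpu> <memory>" lines on stdin and print the totals.
# Hypothetical helper; handles cores or millicores, and Ki/Mi/Gi memory units.
sum_allocatable() {
  awk '
    {
      cpu = $1; mem = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); cores += cpu / 1000 }
      else            { cores += cpu }
      if (mem ~ /Ki$/)      { sub(/Ki$/, "", mem); gib += mem / 1048576 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); gib += mem / 1024 }
      else if (mem ~ /Gi$/) { sub(/Gi$/, "", mem); gib += mem }
    }
    END { printf "%.1f cores, %.1f GiB\n", cores, gib }
  '
}

# Sample input for two nodes; against a live cluster, pipe in the output of:
#   kubectl get nodes --no-headers \
#     -o custom-columns=CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory
printf '8 33554432Ki\n8000m 32Gi\n' | sum_allocatable
```

Allocatable values are lower than raw node capacity because the kubelet reserves resources for system daemons, so they are the better figure to compare against the minimums.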
Deployment Architecture
Component Overview
Kubernetes Cluster
- Cluster-wide: Ingress Controller, Cert Manager, Monitoring Stack
- LLM Platform Namespace: Drupal Pods, Redis Cache, PostgreSQL DB
- Agents Namespace: OSSA Agents, Vector Database, Kafka Stream
Phase 1: Kubernetes Cluster Preparation
Step 1: Verify Cluster Access
# Check kubectl configuration
kubectl cluster-info
kubectl get nodes

# Expected output (control-plane nodes report the "control-plane" role on 1.28):
# NAME     STATUS   ROLES           AGE   VERSION
# node-1   Ready    control-plane   30d   v1.28.4
# node-2   Ready    worker          30d   v1.28.4
# node-3   Ready    worker          30d   v1.28.4
Step 2: Create Namespaces
# Create namespaces for organization
kubectl create namespace llm-platform
kubectl create namespace agents
kubectl create namespace monitoring

# Label namespaces
kubectl label namespace llm-platform environment=production
kubectl label namespace agents environment=production
kubectl label namespace monitoring environment=production
Step 3: Configure Storage Classes
Check available storage classes:
kubectl get storageclass

# Example output:
# NAME          PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE
# standard      kubernetes.io/gce-pd   Delete          Immediate
# ssd-storage   kubernetes.io/gce-pd   Delete          Immediate
Create custom SSD storage class (if needed):
# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
kubectl apply -f storage-class.yaml
Step 4: Install Cert Manager (SSL/TLS)
# Install cert-manager via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.3 \
  --set installCRDs=true

# Verify installation
kubectl get pods -n cert-manager
Configure Let's Encrypt issuer:
# letsencrypt-prod.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
kubectl apply -f letsencrypt-prod.yaml
Step 5: Install Ingress Controller
# Install NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set controller.service.type=LoadBalancer

# Get external IP
kubectl get service -n ingress-nginx ingress-nginx-controller
Note the EXTERNAL-IP - you'll configure DNS to point to this IP.
Phase 2: Configure Secrets and ConfigMaps
Step 1: Database Credentials
# Create PostgreSQL secret
kubectl create secret generic postgres-credentials \
  --from-literal=POSTGRES_USER=llm_user \
  --from-literal=POSTGRES_PASSWORD=$(openssl rand -base64 32) \
  --from-literal=POSTGRES_DB=llm_platform \
  --namespace llm-platform

# Create Drupal secret (reuses the password generated above)
kubectl create secret generic drupal-credentials \
  --from-literal=DRUPAL_DB_HOST=postgres \
  --from-literal=DRUPAL_DB_PORT=5432 \
  --from-literal=DRUPAL_DB_NAME=llm_platform \
  --from-literal=DRUPAL_DB_USER=llm_user \
  --from-literal=DRUPAL_DB_PASSWORD=$(kubectl get secret postgres-credentials -n llm-platform -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d) \
  --namespace llm-platform
Step 2: GitLab Integration Secrets
# Create GitLab OAuth secret
kubectl create secret generic gitlab-oauth-secret \
  --from-literal=client-id=$GITLAB_OAUTH_CLIENT_ID \
  --from-literal=client-secret=$GITLAB_OAUTH_CLIENT_SECRET \
  --namespace agents

# Create GitLab API token secret
kubectl create secret generic gitlab-api-token \
  --from-literal=token=$(cat ~/.tokens/gitlab) \
  --namespace agents
Step 3: AI Provider API Keys (Optional)
# OpenAI API key
kubectl create secret generic openai-credentials \
  --from-literal=api-key=$(cat ~/.tokens/openai) \
  --namespace llm-platform

# Anthropic API key
kubectl create secret generic anthropic-credentials \
  --from-literal=api-key=$(cat ~/.tokens/anthropic) \
  --namespace llm-platform
Step 4: Application ConfigMap
# llm-platform-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-platform-config
  namespace: llm-platform
data:
  # Replace with a random 64-character string before applying
  DRUPAL_HASH_SALT: "generate-random-64-char-string"
  # The Bitnami Redis chart (release name "redis") exposes the master as "redis-master"
  REDIS_HOST: "redis-master"
  REDIS_PORT: "6379"
  QDRANT_URL: "http://qdrant:6333"
  LLM_GATEWAY_URL: "http://llm-gateway:4000/api/v1"
  ENVIRONMENT: "production"
  TRUSTED_HOST_PATTERNS: "^llm\\.yourcompany\\.com$"
kubectl apply -f llm-platform-config.yaml
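Two values in the ConfigMap deserve a check before it is applied: the hash salt placeholder must be replaced with a real random string, and the trusted host pattern should match only the intended hostname. The sketch below is illustrative. `generate_hash_salt` and `host_matches` are hypothetical helper names, and the pattern check uses `grep -E`, which agrees with Drupal's PHP regex handling for a pattern this simple but is not guaranteed to agree for more complex ones.

```shell
# generate_hash_salt: print 64 random hexadecimal characters (32 random bytes).
generate_hash_salt() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# host_matches PATTERN HOST: succeeds when HOST matches the extended regex PATTERN.
host_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

salt=$(generate_hash_salt)
echo "salt length: ${#salt}"

# The YAML value "^llm\\.yourcompany\\.com$" unescapes to this pattern.
pattern='^llm\.yourcompany\.com$'
host_matches "$pattern" "llm.yourcompany.com" && echo "trusted host accepted"
host_matches "$pattern" "llm.yourcompany.com.evil.example" || echo "other host rejected"
```

Because a hash salt is sensitive, it may be better placed in a Secret than in a ConfigMap; the ConfigMap layout above is kept as the guide presents it.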
Phase 3: Deploy Core Services
Step 1: Deploy PostgreSQL Database
Using Helm (recommended):
# Add Bitnami repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Deploy PostgreSQL
# fullnameOverride names the Service "postgres" so it matches DRUPAL_DB_HOST;
# without it the chart would name the Service "postgres-postgresql".
helm install postgres bitnami/postgresql \
  --namespace llm-platform \
  --set fullnameOverride=postgres \
  --set auth.username=llm_user \
  --set auth.password=$(kubectl get secret postgres-credentials -n llm-platform -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d) \
  --set auth.database=llm_platform \
  --set primary.persistence.size=50Gi \
  --set primary.persistence.storageClass=fast-ssd \
  --set metrics.enabled=true \
  --set metrics.serviceMonitor.enabled=true

# Verify deployment
kubectl get pods -n llm-platform -l app.kubernetes.io/name=postgresql
Step 2: Deploy Redis Cache
# Deploy Redis
helm install redis bitnami/redis \
  --namespace llm-platform \
  --set auth.enabled=false \
  --set master.persistence.size=10Gi \
  --set master.persistence.storageClass=fast-ssd \
  --set replica.replicaCount=2 \
  --set metrics.enabled=true

# Verify deployment
kubectl get pods -n llm-platform -l app.kubernetes.io/name=redis
Step 3: Deploy Qdrant Vector Database
# qdrant-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qdrant
  namespace: llm-platform
spec:
  # One replica: the ReadWriteOnce volume below cannot be mounted from several
  # nodes, and Qdrant instances must not share a single storage directory.
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.7.4
          ports:
            - containerPort: 6333
            - containerPort: 6334
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "4000m"
      volumes:
        - name: qdrant-storage
          persistentVolumeClaim:
            claimName: qdrant-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: llm-platform
spec:
  selector:
    app: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qdrant-pvc
  namespace: llm-platform
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
kubectl apply -f qdrant-deployment.yaml
Phase 4: Deploy LLM Platform (Drupal)
Step 1: Build and Push Docker Image
Option A: Using CI/CD (GitLab)
Create .gitlab-ci.yml:
stages:
  - build
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com
  IMAGE_NAME: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # Pass the password on stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t ${IMAGE_NAME} .
    - docker push ${IMAGE_NAME}
  only:
    - main
    - tags

deploy:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    # Clear the image entrypoint (kubectl) so the runner can start a shell
    entrypoint: [""]
  script:
    - kubectl set image deployment/drupal drupal=${IMAGE_NAME} -n llm-platform
    - kubectl rollout status deployment/drupal -n llm-platform
  only:
    - main
Option B: Manual build
# Build image
docker build -t yourregistry.com/llm-platform:v1.0.0 .

# Push to registry
docker push yourregistry.com/llm-platform:v1.0.0
Step 2: Create Drupal Deployment
# drupal-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drupal
  namespace: llm-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: drupal
  template:
    metadata:
      labels:
        app: drupal
    spec:
      containers:
        - name: drupal
          image: yourregistry.com/llm-platform:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: DRUPAL_DB_HOST
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_HOST
            - name: DRUPAL_DB_USER
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_USER
            - name: DRUPAL_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_PASSWORD
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: llm-platform-config
                  key: REDIS_HOST
          volumeMounts:
            - name: drupal-files
              mountPath: /var/www/html/web/sites/default/files
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
      volumes:
        - name: drupal-files
          persistentVolumeClaim:
            claimName: drupal-files-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: drupal
  namespace: llm-platform
spec:
  selector:
    app: drupal
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-files-pvc
  namespace: llm-platform
spec:
  accessModes:
    - ReadWriteMany
  # The class must support ReadWriteMany (for example NFS or Filestore).
  # Block-disk classes such as the GCE PD based fast-ssd do not.
  storageClassName: <rwx-storage-class>
  resources:
    requests:
      storage: 50Gi
kubectl apply -f drupal-deployment.yaml
Step 3: Configure Ingress with SSL
# drupal-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: drupal-ingress
  namespace: llm-platform
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm.yourcompany.com
      secretName: llm-platform-tls
  rules:
    - host: llm.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: drupal
                port:
                  number: 80
kubectl apply -f drupal-ingress.yaml

# Check certificate issuance
kubectl describe certificate llm-platform-tls -n llm-platform
Phase 5: Deploy BuildKit Agents
Step 1: Deploy Agent Infrastructure with Helm
# Clone agent-buildkit repository
git clone https://gitlab.com/agentstudio/agent-buildkit.git
cd agent-buildkit

# Install Helm chart
helm install agent-buildkit charts/agents \
  --namespace agents \
  --set gitlab.oauthClientId=$GITLAB_OAUTH_CLIENT_ID \
  --set gitlab.oauthClientSecret=$GITLAB_OAUTH_CLIENT_SECRET \
  --set gitlab.apiToken=$(cat ~/.tokens/gitlab) \
  --set ingress.enabled=true \
  --set ingress.host=agents.yourcompany.com \
  --set persistence.enabled=true \
  --set persistence.size=100Gi

# Verify deployment
helm status agent-buildkit -n agents
kubectl get pods -n agents
Step 2: Deploy OSSA-Compliant Agents
Using BuildKit CLI:
# Deploy agents to Kubernetes
buildkit agents deploy --all \
  --namespace agents \
  --registry yourregistry.com

# This deploys:
# - TDD Enforcer
# - API Builder
# - Documentation Synchronizer
# - Security Auditor
# - Performance Monitor
# - And 20+ more agents
Manual deployment example:
# tdd-enforcer-agent.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tdd-enforcer
  namespace: agents
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tdd-enforcer
      ossa-agent: "true"
  template:
    metadata:
      labels:
        app: tdd-enforcer
        ossa-agent: "true"
    spec:
      containers:
        - name: agent
          image: yourregistry.com/agents/tdd-enforcer:latest
          env:
            - name: GITLAB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-api-token
                  key: token
            - name: OSSA_AGENT_ID
              value: "tdd-enforcer-001"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
kubectl apply -f tdd-enforcer-agent.yaml
Phase 6: Deploy Monitoring Stack
Step 1: Deploy Prometheus & Grafana
# Add Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Deploy kube-prometheus-stack
# The hosts[0] value is quoted so the shell does not treat the brackets as a glob.
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set grafana.adminPassword=$(openssl rand -base64 20) \
  --set grafana.ingress.enabled=true \
  --set 'grafana.ingress.hosts[0]=grafana.yourcompany.com'

# Get Grafana password
kubectl get secret -n monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 -d; echo
Step 2: Deploy Jaeger for Distributed Tracing
# Add Jaeger Helm repo
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# Deploy Jaeger
# Note: storage.type=memory keeps traces in memory only; they are lost when the pod restarts.
helm install jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --set provisionDataStore.cassandra=false \
  --set allInOne.enabled=true \
  --set storage.type=memory \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0]=jaeger.yourcompany.com'
Phase 7: Post-Deployment Configuration
Step 1: Initialize Drupal Site
# Get a Drupal pod name
POD=$(kubectl get pod -n llm-platform -l app=drupal -o jsonpath='{.items[0].metadata.name}')

# Import configuration
kubectl exec -n llm-platform $POD -- drush cim -y

# Clear caches
kubectl exec -n llm-platform $POD -- drush cr

# Verify site status
kubectl exec -n llm-platform $POD -- drush status
Step 2: Configure DNS
Point your domains to the ingress controller's external IP:
# Get ingress IP
INGRESS_IP=$(kubectl get service -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo "Configure these DNS records:"
echo "llm.yourcompany.com     A  $INGRESS_IP"
echo "agents.yourcompany.com  A  $INGRESS_IP"
echo "grafana.yourcompany.com A  $INGRESS_IP"
echo "jaeger.yourcompany.com  A  $INGRESS_IP"
In your DNS provider:
- Create A records pointing to the ingress IP
- Wait for DNS propagation (5-60 minutes)
Step 3: Verify SSL Certificates
# Check certificate status
kubectl get certificate -n llm-platform
kubectl get certificate -n agents

# Describe certificate for details
kubectl describe certificate llm-platform-tls -n llm-platform
# Should show: Certificate is up to date and has not expired
Step 4: Create Initial Admin User
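# Create admin user ($POD is set in Step 1).
# Generate the password instead of hard-coding one in the command.
ADMIN_PASSWORD=$(openssl rand -base64 24)
kubectl exec -n llm-platform $POD -- drush user:create admin \
  --mail="admin@yourcompany.com" \
  --password="$ADMIN_PASSWORD"
echo "Admin password: $ADMIN_PASSWORD"  # store it in a password manager

# Grant admin role
kubectl exec -n llm-platform $POD -- drush user:role:add administrator admin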
Phase 8: Verification and Testing
Verify Platform Access
# Test HTTP endpoints
curl -I https://llm.yourcompany.com
# Should return: HTTP/2 200

curl -I https://agents.yourcompany.com
# Should return: HTTP/2 200
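Right after deployment these checks can fail for a while, because DNS propagation and certificate issuance both take time. A small retry wrapper avoids re-running the commands by hand. This is a minimal sketch; `retry` is a hypothetical helper defined here, not a standard command.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds or
# ATTEMPTS runs have failed. Returns 0 on success and 1 otherwise.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Example usage (needs network access, so it is left commented out):
#   retry 30 10 curl -fsS -o /dev/null https://llm.yourcompany.com
```

With `curl -f`, an HTTP error status makes curl exit non-zero, so the wrapper keeps trying until the endpoint answers successfully or the attempts run out.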
Verify Database Connectivity
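# PostgreSQL (the Bitnami chart runs a StatefulSet, so select the pod by label)
PG_POD=$(kubectl get pod -n llm-platform -l app.kubernetes.io/name=postgresql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n llm-platform -it "$PG_POD" -- psql -U llm_user -d llm_platform -c "SELECT version();"

# Redis (also a StatefulSet; the master pod is redis-master-0 for a release named "redis")
kubectl exec -n llm-platform -it redis-master-0 -- redis-cli ping
# Should return: PONG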
Verify Agent Health
# Using BuildKit CLI
buildkit agents status --namespace agents

# Or kubectl
kubectl get pods -n agents -l ossa-agent=true
Load Testing
# Install k6 for load testing
brew install k6  # macOS
# or download from https://k6.io/

# Run basic load test
k6 run - <<'EOF'
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  const res = http.get('https://llm.yourcompany.com');
  check(res, { 'status was 200': (r) => r.status === 200 });
}
EOF
Scaling and High Availability
Horizontal Pod Autoscaling
# drupal-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: drupal-hpa
  namespace: llm-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: drupal
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
kubectl apply -f drupal-hpa.yaml
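To reason about how this autoscaler will behave, it helps to work through the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization), bounded by minReplicas and maxReplicas. The sketch below applies that rule with integer arithmetic. `hpa_desired` is a hypothetical helper, and it ignores the controller's tolerance band and stabilization window, so treat the result as an estimate.

```shell
# hpa_desired CURRENT UTILIZATION TARGET MIN MAX
# Prints ceil(CURRENT * UTILIZATION / TARGET), clamped to the range MIN..MAX.
hpa_desired() {
  hpa_current=$1
  hpa_metric=$2
  hpa_target=$3
  hpa_min=$4
  hpa_max=$5
  # Integer ceiling division
  hpa_result=$(( (hpa_current * hpa_metric + hpa_target - 1) / hpa_target ))
  if [ "$hpa_result" -lt "$hpa_min" ]; then hpa_result=$hpa_min; fi
  if [ "$hpa_result" -gt "$hpa_max" ]; then hpa_result=$hpa_max; fi
  echo "$hpa_result"
}

# 3 replicas at 90% average CPU against the 70% target above
hpa_desired 3 90 70 3 10   # prints 4
```

When several metrics are configured, as with CPU and memory here, the autoscaler computes a desired count for each metric and uses the largest one.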
Database High Availability
PostgreSQL with replication:
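# Recent Bitnami chart versions use architecture and readReplicas.replicaCount
# (older versions used replication.enabled and replication.readReplicas).
# Confirm the names with: helm show values bitnami/postgresql
# Switching architecture can rename the primary StatefulSet and Service,
# so test the upgrade on a non-production release first.
helm upgrade postgres bitnami/postgresql \
  --namespace llm-platform \
  --set architecture=replication \
  --set readReplicas.replicaCount=2 \
  --reuse-values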
Multi-Region Deployment
For global deployments:
- Deploy to multiple Kubernetes clusters (different regions)
- Use global load balancer (CloudFlare, AWS Global Accelerator)
- Implement database replication across regions
- Configure agent mesh for cross-region coordination
Backup and Disaster Recovery
Database Backups
# Install Velero for cluster backups
# Note: Velero also needs the provider plugin (velero-plugin-for-aws) and cloud
# credentials; the value names below vary between chart versions, so check the
# chart's values before installing.
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero \
  --namespace velero \
  --create-namespace \
  --set configuration.provider=aws \
  --set configuration.backupStorageLocation.bucket=your-backup-bucket \
  --set configuration.backupStorageLocation.config.region=us-east-1

# Create backup schedule
kubectl apply -f - <<EOF
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - llm-platform
      - agents
    storageLocation: default
    ttl: 720h0m0s
EOF
Drupal Files Backup
# Create CronJob for file backups
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: drupal-files-backup
  namespace: llm-platform
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: amazon/aws-cli
              command:
                - /bin/sh
                - -c
                - |
                  tar czf /tmp/files-backup.tar.gz /files
                  aws s3 cp /tmp/files-backup.tar.gz s3://your-backup-bucket/files/\$(date +%Y%m%d).tar.gz
              volumeMounts:
                - name: drupal-files
                  mountPath: /files
          volumes:
            - name: drupal-files
              persistentVolumeClaim:
                claimName: drupal-files-pvc
          restartPolicy: OnFailure
EOF
Maintenance and Updates
Rolling Updates
# Update Drupal image
kubectl set image deployment/drupal \
  drupal=yourregistry.com/llm-platform:v1.1.0 \
  -n llm-platform

# Monitor rollout
kubectl rollout status deployment/drupal -n llm-platform

# Rollback if needed
kubectl rollout undo deployment/drupal -n llm-platform
Run Drupal Updates
# Run database updates
kubectl exec -n llm-platform deployment/drupal -- drush updb -y

# Clear caches
kubectl exec -n llm-platform deployment/drupal -- drush cr
Troubleshooting
Pod Not Starting
# Check pod status
kubectl get pods -n llm-platform

# Describe pod for events
kubectl describe pod <pod-name> -n llm-platform

# Check logs
kubectl logs <pod-name> -n llm-platform

# Check previous logs if crashed
kubectl logs <pod-name> -n llm-platform --previous
Database Connection Issues
# Test from Drupal pod
kubectl exec -it -n llm-platform deployment/drupal -- /bin/bash

# Inside pod:
php -r "new PDO('pgsql:host=postgres;port=5432;dbname=llm_platform', 'llm_user', 'password');"
SSL Certificate Not Issuing
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Check certificate request
kubectl describe certificaterequest -n llm-platform

# Check challenge
kubectl describe challenge -n llm-platform
Security Hardening
Network Policies
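# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: drupal-network-policy
  namespace: llm-platform
spec:
  podSelector:
    matchLabels:
      app: drupal
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # kubernetes.io/metadata.name is set automatically on every namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Allow DNS lookups; without this rule service names cannot be resolved
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # The Bitnami charts label pods with app.kubernetes.io/name
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: redis
      ports:
        - protocol: TCP
          port: 6379
    # Add further egress rules for other services Drupal calls (for example Qdrant on 6333)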
kubectl apply -f network-policy.yaml
Pod Security Standards
# pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: llm-platform
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
Next Steps
Your production deployment is complete! Next:
- Monitor your deployment: Observability Guide
- Set up CI/CD: Automate deployments with GitLab CI/CD
- Scale agents: Add more OSSA agents based on workload
- Optimize performance: Tune resource limits and caching