Production Deployment

Separation of Duties: Getting started guides document onboarding only. They do NOT own agent manifests, execution, or infrastructure configuration. See Separation of Duties for details.

Complete guide to deploying the LLM Platform and OSSA agents to production Kubernetes environments.

Overview

This guide covers deploying:

  • LLM Platform (Drupal) - Web application and API
  • BuildKit Agents - OSSA-compliant autonomous agents
  • Supporting Services - Databases, caching, vector storage, monitoring

Prerequisites

Infrastructure Requirements

  • Kubernetes cluster: 1.28+ with 3+ worker nodes
  • kubectl: 1.28+ configured with cluster access
  • Helm: 3.13+ installed locally
  • Persistent storage: StorageClass with dynamic provisioning
  • Ingress controller: NGINX, Traefik, or similar
  • SSL certificates: Let's Encrypt or commercial CA
  • Container registry: Docker Hub, GitLab Container Registry, or private registry
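The version floors above can be checked in a script before starting. The following is a minimal sketch, assuming version strings of the form vMAJOR.MINOR.PATCH as printed by kubectl and Helm; the function name version_at_least is hypothetical and not part of any tool mentioned in this guide.

```shell
#!/usr/bin/env bash
# version_at_least VERSION MIN_MAJOR MIN_MINOR
# Succeeds when VERSION (for example "v1.28.4") is at least MIN_MAJOR.MIN_MINOR.
version_at_least() {
  local version="${1#v}"        # strip a leading "v" if present
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot
  if [ "$major" -gt "$2" ]; then
    return 0
  fi
  [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]
}

# Sample version strings; in practice, pass the versions your tools report.
version_at_least "v1.28.4" 1 28 && echo "Kubernetes version OK"
version_at_least "v3.13.2" 3 13 && echo "Helm version OK"
```

The function only compares the major and minor components, which is all the requirements list specifies.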

Access Requirements

  • Cluster admin access via kubectl
  • Container registry push access
  • DNS management for domain configuration
  • GitLab project with CI/CD enabled

Resource Requirements

Minimum cluster capacity:

  • CPU: 24 cores total (across all nodes)
  • RAM: 64 GB total
  • Storage: 500 GB persistent volumes
  • Network: 10 Gbps inter-node bandwidth

Recommended for production:

  • CPU: 48+ cores
  • RAM: 128+ GB
  • Storage: 1 TB+ SSD-backed PVs
  • Load balancer: Cloud provider LB or MetalLB
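To compare a cluster against the CPU figures above, per-node core counts can be summed and checked against the minimum. This is a rough sketch: the helper name sum_lines and the three-node sample input are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# sum_lines: add up one integer per line read from stdin.
sum_lines() {
  local total=0 value
  while read -r value; do
    if [ -n "$value" ]; then
      total=$((total + value))
    fi
  done
  echo "$total"
}

# Hypothetical three-node cluster with 8 cores per node.
cores="$(printf '8\n8\n8\n' | sum_lines)"
if [ "$cores" -ge 24 ]; then
  echo "CPU: $cores cores, meets the 24-core minimum"
else
  echo "CPU: $cores cores, below the 24-core minimum"
fi
```

On a live cluster the input could come from `kubectl get nodes -o jsonpath='{range .items[*]}{.status.capacity.cpu}{"\n"}{end}'`, assuming every node reports its CPU capacity as a whole number of cores.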

Deployment Architecture

Component Overview


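+------------------------------------------------------------------+
|                        Kubernetes Cluster                        |
+------------------------------------------------------------------+
|                                                                  |
|  +------------+    +------------+    +------------+              |
|  |  Ingress   |    |  Cert Mgr  |    | Monitoring |              |
|  | Controller |    |            |    |   Stack    |              |
|  +------------+    +------------+    +------------+              |
|                                                                  |
|  +------------------------------------------------------------+  |
|  |                   LLM Platform Namespace                   |  |
|  |  +------------+    +------------+    +------------+        |  |
|  |  |   Drupal   |    |   Redis    |    | PostgreSQL |        |  |
|  |  |    Pods    |    |   Cache    |    |     DB     |        |  |
|  |  +------------+    +------------+    +------------+        |  |
|  +------------------------------------------------------------+  |
|                                                                  |
|  +------------------------------------------------------------+  |
|  |                      Agents Namespace                      |  |
|  |  +------------+    +------------+    +------------+        |  |
|  |  |    OSSA    |    |   Vector   |    |   Kafka    |        |  |
|  |  |   Agents   |    |  Database  |    |   Stream   |        |  |
|  |  +------------+    +------------+    +------------+        |  |
|  +------------------------------------------------------------+  |
|                                                                  |
+------------------------------------------------------------------+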


Phase 1: Kubernetes Cluster Preparation

Step 1: Verify Cluster Access

# Check kubectl configuration
kubectl cluster-info
kubectl get nodes

# Expected output:
# NAME     STATUS   ROLES           AGE   VERSION
# node-1   Ready    control-plane   30d   v1.28.4
# node-2   Ready    worker          30d   v1.28.4
# node-3   Ready    worker          30d   v1.28.4

Step 2: Create Namespaces

# Create namespaces for organization
kubectl create namespace llm-platform
kubectl create namespace agents
kubectl create namespace monitoring

# Label namespaces
kubectl label namespace llm-platform environment=production
kubectl label namespace agents environment=production
kubectl label namespace monitoring environment=production

Step 3: Configure Storage Classes

Check available storage classes:

kubectl get storageclass

# Example output:
# NAME          PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE
# standard      kubernetes.io/gce-pd   Delete          Immediate
# ssd-storage   kubernetes.io/gce-pd   Delete          Immediate

Create custom SSD storage class (if needed):

# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

kubectl apply -f storage-class.yaml

Step 4: Install Cert Manager (SSL/TLS)

# Install cert-manager via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.3 \
  --set installCRDs=true

# Verify installation
kubectl get pods -n cert-manager

Configure Let's Encrypt issuer:

# letsencrypt-prod.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx

kubectl apply -f letsencrypt-prod.yaml

Step 5: Install Ingress Controller

# Install NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set controller.service.type=LoadBalancer

# Get external IP
kubectl get service -n ingress-nginx ingress-nginx-controller

Note the EXTERNAL-IP - you'll configure DNS to point to this IP.


Phase 2: Configure Secrets and ConfigMaps

Step 1: Database Credentials

# Create PostgreSQL secret
kubectl create secret generic postgres-credentials \
  --from-literal=POSTGRES_USER=llm_user \
  --from-literal=POSTGRES_PASSWORD="$(openssl rand -base64 32)" \
  --from-literal=POSTGRES_DB=llm_platform \
  --namespace llm-platform

# Create Drupal secret (reuses the password generated above)
kubectl create secret generic drupal-credentials \
  --from-literal=DRUPAL_DB_HOST=postgres \
  --from-literal=DRUPAL_DB_PORT=5432 \
  --from-literal=DRUPAL_DB_NAME=llm_platform \
  --from-literal=DRUPAL_DB_USER=llm_user \
  --from-literal=DRUPAL_DB_PASSWORD="$(kubectl get secret postgres-credentials -n llm-platform -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)" \
  --namespace llm-platform
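The Drupal secret is built by reading `.data.POSTGRES_PASSWORD` and piping it through a base64 decoder, because Kubernetes stores Secret values base64-encoded. The round trip can be tried locally without a cluster. This is an illustrative sketch only; the variable names are made up, and /dev/urandom stands in for `openssl rand`.

```shell
#!/usr/bin/env bash
# Generate a random password from 24 random bytes (32 base64 characters).
password="$(head -c 24 /dev/urandom | base64)"

# Kubernetes keeps Secret values base64-encoded under .data.<key>.
encoded="$(printf '%s' "$password" | base64)"

# Decoding the stored value recovers the original password.
decoded="$(printf '%s' "$encoded" | base64 --decode)"

if [ "$decoded" = "$password" ]; then
  echo "round trip OK"
fi
```

Skipping the decode step would store the encoded form as the Drupal password, and database logins would fail.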

Step 2: GitLab Integration Secrets

# Create GitLab OAuth secret
kubectl create secret generic gitlab-oauth-secret \
  --from-literal=client-id="$GITLAB_OAUTH_CLIENT_ID" \
  --from-literal=client-secret="$GITLAB_OAUTH_CLIENT_SECRET" \
  --namespace agents

# Create GitLab API token secret
kubectl create secret generic gitlab-api-token \
  --from-literal=token="$(cat ~/.tokens/gitlab)" \
  --namespace agents

Step 3: AI Provider API Keys (Optional)

# OpenAI API key
kubectl create secret generic openai-credentials \
  --from-literal=api-key="$(cat ~/.tokens/openai)" \
  --namespace llm-platform

# Anthropic API key
kubectl create secret generic anthropic-credentials \
  --from-literal=api-key="$(cat ~/.tokens/anthropic)" \
  --namespace llm-platform

Step 4: Application ConfigMap

# llm-platform-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-platform-config
  namespace: llm-platform
data:
  # Placeholder: replace with a random 64-character string before applying
  DRUPAL_HASH_SALT: "generate-random-64-char-string"
  # The Bitnami Redis chart installed in Phase 3 exposes the primary as "redis-master"
  REDIS_HOST: "redis-master"
  REDIS_PORT: "6379"
  QDRANT_URL: "http://qdrant:6333"
  LLM_GATEWAY_URL: "http://llm-gateway:4000/api/v1"
  ENVIRONMENT: "production"
  TRUSTED_HOST_PATTERNS: "^llm\\.yourcompany\\.com$"

kubectl apply -f llm-platform-config.yaml
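The DRUPAL_HASH_SALT value in the ConfigMap is a placeholder that has to be replaced with a random 64-character string. One possible way to produce such a string with standard Unix tools is sketched here; the function name generate_hash_salt is hypothetical, and a hexadecimal alphabet is an assumption rather than a Drupal requirement.

```shell
#!/usr/bin/env bash
# generate_hash_salt: print 64 hexadecimal characters derived from 32 random bytes.
generate_hash_salt() {
  # od prints each byte as two hex digits; tr strips the spaces and newlines.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

salt="$(generate_hash_salt)"
echo "Salt length: ${#salt}"
```

Paste the generated value into llm-platform-config.yaml before applying it, and keep it stable across deployments, since changing the salt invalidates values derived from it.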

Phase 3: Deploy Core Services

Step 1: Deploy PostgreSQL Database

Using Helm (recommended):

# Add Bitnami repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Deploy PostgreSQL
# fullnameOverride names the Service "postgres", matching DRUPAL_DB_HOST.
# metrics.serviceMonitor.enabled is left out here because it needs the Prometheus
# Operator CRDs from Phase 6; enable it afterwards with helm upgrade.
helm install postgres bitnami/postgresql \
  --namespace llm-platform \
  --set fullnameOverride=postgres \
  --set auth.username=llm_user \
  --set auth.password="$(kubectl get secret postgres-credentials -n llm-platform -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)" \
  --set auth.database=llm_platform \
  --set primary.persistence.size=50Gi \
  --set primary.persistence.storageClass=fast-ssd \
  --set metrics.enabled=true

# Verify deployment
kubectl get pods -n llm-platform -l app.kubernetes.io/name=postgresql

Step 2: Deploy Redis Cache

# Deploy Redis
helm install redis bitnami/redis \
  --namespace llm-platform \
  --set auth.enabled=false \
  --set master.persistence.size=10Gi \
  --set master.persistence.storageClass=fast-ssd \
  --set replica.replicaCount=2 \
  --set metrics.enabled=true

# Verify deployment
kubectl get pods -n llm-platform -l app.kubernetes.io/name=redis

Step 3: Deploy Qdrant Vector Database

# qdrant-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qdrant
  namespace: llm-platform
spec:
  # A single replica: the ReadWriteOnce volume below cannot be shared by several pods
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.7.4
          ports:
            - containerPort: 6333
            - containerPort: 6334
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi"
              cpu: "4000m"
      volumes:
        - name: qdrant-storage
          persistentVolumeClaim:
            claimName: qdrant-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: llm-platform
spec:
  selector:
    app: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qdrant-pvc
  namespace: llm-platform
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

kubectl apply -f qdrant-deployment.yaml

Phase 4: Deploy LLM Platform (Drupal)

Step 1: Build and Push Docker Image

Option A: Using CI/CD (GitLab)

Create .gitlab-ci.yml:

stages:
  - build
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com
  IMAGE_NAME: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # --password-stdin keeps the registry password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t ${IMAGE_NAME} .
    - docker push ${IMAGE_NAME}
  only:
    - main
    - tags

deploy:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    # Clear the image entrypoint (kubectl) so the runner can start a shell
    entrypoint: [""]
  script:
    - kubectl set image deployment/drupal drupal=${IMAGE_NAME} -n llm-platform
    - kubectl rollout status deployment/drupal -n llm-platform
  only:
    - main

Option B: Manual build

# Build image
docker build -t yourregistry.com/llm-platform:v1.0.0 .

# Push to registry
docker push yourregistry.com/llm-platform:v1.0.0

Step 2: Create Drupal Deployment

# drupal-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drupal
  namespace: llm-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: drupal
  template:
    metadata:
      labels:
        app: drupal
    spec:
      containers:
        - name: drupal
          image: yourregistry.com/llm-platform:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: DRUPAL_DB_HOST
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_HOST
            - name: DRUPAL_DB_USER
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_USER
            - name: DRUPAL_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: drupal-credentials
                  key: DRUPAL_DB_PASSWORD
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: llm-platform-config
                  key: REDIS_HOST
          volumeMounts:
            - name: drupal-files
              mountPath: /var/www/html/web/sites/default/files
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
      volumes:
        - name: drupal-files
          persistentVolumeClaim:
            claimName: drupal-files-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: drupal
  namespace: llm-platform
spec:
  selector:
    app: drupal
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-files-pvc
  namespace: llm-platform
spec:
  accessModes:
    - ReadWriteMany
  # ReadWriteMany needs a StorageClass whose provisioner supports shared access
  # (for example NFS-backed). GCE persistent disks such as fast-ssd are
  # ReadWriteOnce only, so substitute an RWX-capable class here.
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi

kubectl apply -f drupal-deployment.yaml

Step 3: Configure Ingress with SSL

# drupal-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: drupal-ingress
  namespace: llm-platform
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm.yourcompany.com
      secretName: llm-platform-tls
  rules:
    - host: llm.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: drupal
                port:
                  number: 80

kubectl apply -f drupal-ingress.yaml

# Check certificate issuance
kubectl describe certificate llm-platform-tls -n llm-platform

Phase 5: Deploy BuildKit Agents

Step 1: Deploy Agent Infrastructure with Helm

# Clone agent-buildkit repository
git clone https://gitlab.com/agentstudio/agent-buildkit.git
cd agent-buildkit

# Install Helm chart
helm install agent-buildkit charts/agents \
  --namespace agents \
  --set gitlab.oauthClientId="$GITLAB_OAUTH_CLIENT_ID" \
  --set gitlab.oauthClientSecret="$GITLAB_OAUTH_CLIENT_SECRET" \
  --set gitlab.apiToken="$(cat ~/.tokens/gitlab)" \
  --set ingress.enabled=true \
  --set ingress.host=agents.yourcompany.com \
  --set persistence.enabled=true \
  --set persistence.size=100Gi

# Verify deployment
helm status agent-buildkit -n agents
kubectl get pods -n agents

Step 2: Deploy OSSA-Compliant Agents

Using BuildKit CLI:

# Deploy agents to Kubernetes
buildkit agents deploy --all \
  --namespace agents \
  --registry yourregistry.com

# This deploys:
# - TDD Enforcer
# - API Builder
# - Documentation Synchronizer
# - Security Auditor
# - Performance Monitor
# - And 20+ more agents

Manual deployment example:

# tdd-enforcer-agent.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tdd-enforcer
  namespace: agents
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tdd-enforcer
      ossa-agent: "true"
  template:
    metadata:
      labels:
        app: tdd-enforcer
        ossa-agent: "true"
    spec:
      containers:
        - name: agent
          image: yourregistry.com/agents/tdd-enforcer:latest
          env:
            - name: GITLAB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitlab-api-token
                  key: token
            - name: OSSA_AGENT_ID
              value: "tdd-enforcer-001"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

kubectl apply -f tdd-enforcer-agent.yaml

Phase 6: Deploy Monitoring Stack

Step 1: Deploy Prometheus & Grafana

# Add Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Deploy kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set grafana.adminPassword="$(openssl rand -base64 20)" \
  --set grafana.ingress.enabled=true \
  --set "grafana.ingress.hosts[0]=grafana.yourcompany.com"

# Get Grafana password
kubectl get secret -n monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 -d; echo

Step 2: Deploy Jaeger for Distributed Tracing

# Add Jaeger Helm repo
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# Deploy Jaeger
helm install jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --set provisionDataStore.cassandra=false \
  --set allInOne.enabled=true \
  --set storage.type=memory \
  --set ingress.enabled=true \
  --set "ingress.hosts[0]=jaeger.yourcompany.com"

Phase 7: Post-Deployment Configuration

Step 1: Initialize Drupal Site

# Get a Drupal pod name
POD=$(kubectl get pod -n llm-platform -l app=drupal -o jsonpath='{.items[0].metadata.name}')

# Import configuration
kubectl exec -n llm-platform $POD -- drush cim -y

# Clear caches
kubectl exec -n llm-platform $POD -- drush cr

# Verify site status
kubectl exec -n llm-platform $POD -- drush status

Step 2: Configure DNS

Point your domains to the ingress controller's external IP:

# Get ingress IP
INGRESS_IP=$(kubectl get service -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo "Configure these DNS records:"
echo "llm.yourcompany.com A $INGRESS_IP"
echo "agents.yourcompany.com A $INGRESS_IP"
echo "grafana.yourcompany.com A $INGRESS_IP"
echo "jaeger.yourcompany.com A $INGRESS_IP"

In your DNS provider:

  • Create A records pointing to the ingress IP
  • Wait for DNS propagation (5-60 minutes)
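Because propagation can take anywhere from minutes to an hour, it may be easier to poll than to wait a fixed time. The helper below is a generic sketch, not part of any tool in this guide: the name retry and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (commented out because it needs network access):
# retry 12 300 curl -fsS -o /dev/null https://llm.yourcompany.com
```

With 12 attempts five minutes apart, the commented example covers roughly the upper end of the propagation window and stops as soon as the host responds.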

Step 3: Verify SSL Certificates

# Check certificate status
kubectl get certificate -n llm-platform
kubectl get certificate -n agents

# Describe certificate for details
kubectl describe certificate llm-platform-tls -n llm-platform
# Should show: Certificate is up to date and has not expired

Step 4: Create Initial Admin User

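# Create admin user with a generated password instead of a hard-coded one
# ($POD is the pod name captured in Step 1)
ADMIN_PASSWORD=$(openssl rand -base64 24)
kubectl exec -n llm-platform $POD -- drush user:create admin \
  --mail="admin@yourcompany.com" \
  --password="$ADMIN_PASSWORD"
echo "Admin password: $ADMIN_PASSWORD"

# Grant admin role
kubectl exec -n llm-platform $POD -- drush user:role:add administrator admin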

Phase 8: Verification and Testing

Verify Platform Access

# Test HTTP endpoints
curl -I https://llm.yourcompany.com
# Should return: HTTP/2 200

curl -I https://agents.yourcompany.com
# Should return: HTTP/2 200

Verify Database Connectivity

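# PostgreSQL (the Bitnami chart runs it as a StatefulSet, so select the pod by label)
PG_POD=$(kubectl get pod -n llm-platform -l app.kubernetes.io/name=postgresql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n llm-platform -it "$PG_POD" -- psql -U llm_user -d llm_platform -c "SELECT version();"

# Redis (redis-master-0 is the first pod of the redis-master StatefulSet)
kubectl exec -n llm-platform -it redis-master-0 -- redis-cli ping
# Should return: PONG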

Verify Agent Health

# Using BuildKit CLI
buildkit agents status --namespace agents

# Or kubectl
kubectl get pods -n agents -l ossa-agent=true

Load Testing

# Install k6 for load testing
brew install k6  # macOS
# or download from https://k6.io/

# Run basic load test
k6 run - <<EOF
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  let res = http.get('https://llm.yourcompany.com');
  check(res, { 'status was 200': (r) => r.status == 200 });
}
EOF

Scaling and High Availability

Horizontal Pod Autoscaling

# drupal-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: drupal-hpa
  namespace: llm-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: drupal
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

kubectl apply -f drupal-hpa.yaml
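To get a feel for how this autoscaler reacts, the core scaling rule from the Kubernetes documentation, desired = ceil(current replicas x observed utilization / target utilization), can be worked through by hand. The sketch below applies that rule with the 3 to 10 replica bounds from the manifest; the function name desired_replicas is hypothetical, and the controller's tolerance band and stabilization windows are deliberately left out.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT OBSERVED_UTILIZATION TARGET_UTILIZATION MIN MAX
# Integer version of ceil(CURRENT * OBSERVED / TARGET), clamped to [MIN, MAX].
desired_replicas() {
  local current="$1" observed="$2" target="$3" min="$4" max="$5"
  # Adding (target - 1) before dividing rounds the quotient up.
  local desired=$(( (current * observed + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  fi
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

# 3 replicas averaging 95% CPU against the 70% target.
desired_replicas 3 95 70 3 10   # prints 5
```

When several metrics are configured, as with CPU and memory here, the autoscaler computes a figure for each and uses the largest.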

Database High Availability

PostgreSQL with replication:

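# Current Bitnami chart versions use architecture=replication and
# readReplicas.replicaCount rather than the older replication.* values
helm upgrade postgres bitnami/postgresql \
  --namespace llm-platform \
  --reuse-values \
  --set architecture=replication \
  --set readReplicas.replicaCount=2

# Afterwards, confirm the primary service name with
# "kubectl get svc -n llm-platform"; it may change in replication mode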

Multi-Region Deployment

For global deployments:

  1. Deploy to multiple Kubernetes clusters (different regions)
  2. Use global load balancer (CloudFlare, AWS Global Accelerator)
  3. Implement database replication across regions
  4. Configure agent mesh for cross-region coordination

Backup and Disaster Recovery

Database Backups

# Install Velero for cluster backups
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

helm install velero vmware-tanzu/velero \
  --namespace velero \
  --create-namespace \
  --set configuration.provider=aws \
  --set configuration.backupStorageLocation.bucket=your-backup-bucket \
  --set configuration.backupStorageLocation.config.region=us-east-1

# Create backup schedule
kubectl apply -f - <<EOF
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - llm-platform
      - agents
    storageLocation: default
    ttl: 720h0m0s
EOF

Drupal Files Backup

# Create CronJob for file backups
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: drupal-files-backup
  namespace: llm-platform
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: amazon/aws-cli
              command:
                - /bin/sh
                - -c
                - |
                  tar czf /tmp/files-backup.tar.gz /files
                  aws s3 cp /tmp/files-backup.tar.gz s3://your-backup-bucket/files/\$(date +%Y%m%d).tar.gz
              volumeMounts:
                - name: drupal-files
                  mountPath: /files
          volumes:
            - name: drupal-files
              persistentVolumeClaim:
                claimName: drupal-files-pvc
          restartPolicy: OnFailure
EOF

Maintenance and Updates

Rolling Updates

# Update Drupal image
kubectl set image deployment/drupal \
  drupal=yourregistry.com/llm-platform:v1.1.0 \
  -n llm-platform

# Monitor rollout
kubectl rollout status deployment/drupal -n llm-platform

# Rollback if needed
kubectl rollout undo deployment/drupal -n llm-platform

Run Drupal Updates

# Run database updates
kubectl exec -n llm-platform deployment/drupal -- drush updb -y

# Clear caches
kubectl exec -n llm-platform deployment/drupal -- drush cr

Troubleshooting

Pod Not Starting

# Check pod status
kubectl get pods -n llm-platform

# Describe pod for events
kubectl describe pod <pod-name> -n llm-platform

# Check logs
kubectl logs <pod-name> -n llm-platform

# Check previous logs if crashed
kubectl logs <pod-name> -n llm-platform --previous

Database Connection Issues

# Test from Drupal pod
kubectl exec -it -n llm-platform deployment/drupal -- /bin/bash

# Inside pod (replace 'password' with the real database password):
php -r "new PDO('pgsql:host=postgres;port=5432;dbname=llm_platform', 'llm_user', 'password');"

SSL Certificate Not Issuing

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Check certificate request
kubectl describe certificaterequest -n llm-platform

# Check challenge
kubectl describe challenge -n llm-platform

Security Hardening

Network Policies

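# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: drupal-network-policy
  namespace: llm-platform
spec:
  podSelector:
    matchLabels:
      app: drupal
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label that Kubernetes sets automatically on every namespace
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Pod labels below are the ones applied by the Bitnami charts
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow DNS lookups; without this rule service names cannot be resolved
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

kubectl apply -f network-policy.yaml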

Pod Security Standards

# pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: llm-platform
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Next Steps

Your production deployment is complete! Next:

  1. Monitor your deployment: Observability Guide
  2. Set up CI/CD: Automate deployments with GitLab CI/CD
  3. Scale agents: Add more OSSA agents based on workload
  4. Optimize performance: Tune resource limits and caching

Additional Resources