M4 Pro AI Agent Machine - Complete Automation Guide

Separation of Duties: This infrastructure setup documentation is responsible only for documenting setup procedures. It does NOT own agent manifests, execution, or infrastructure configuration. See Separation of Duties.

Hardware: Apple M4 Pro, 14 cores, 48GB RAM
Goal: Deploy 100+ agents in parallel with full automation
Current: KEDA installed, metrics broken, manual lifecycle management

Phase 1: Replace OrbStack with Rancher Desktop

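Why: OrbStack runs containers in a VM layer (about 7GB of overhead in this setup). Rancher Desktop runs K3s on containerd. On macOS it also uses a lightweight Linux VM, so measure the actual savings after migrating.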

Option A: Rancher Desktop (Recommended)

# Backup OrbStack data first
kubectl get all -A -o yaml > ~/orbstack-backup-$(date +%Y%m%d).yaml

# Stop OrbStack
orb stop k8s
killall OrbStack

# Install Rancher Desktop (includes K3s, containerd, Helm, kubectl)
brew install --cask rancher

# Configure for M4 optimization
# Settings > Kubernetes > 12 CPUs, 32GB RAM (leave 16GB for macOS)
# Settings > Container Engine > containerd (not dockerd)
# Settings > Kubernetes > Enable Traefik, Metrics Server

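# Restore workloads
# (the filename embeds the backup date; if you restore on a later day,
# substitute the actual name of the file created in the backup step)
kubectl apply -f ~/orbstack-backup-$(date +%Y%m%d).yaml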
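
The restore command recomputes today's date, so it only finds the backup when both steps run on the same day. One way around that is to pick the newest backup file by name. This is a minimal sketch, assuming backups keep the `orbstack-backup-YYYYMMDD.yaml` naming used above; `latest_backup` is a hypothetical helper, not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: print the newest orbstack-backup-YYYYMMDD.yaml in a directory.
# The date stamp sorts lexicographically, and shell globs expand in sorted order,
# so the last match is the most recent backup.
latest_backup() {
  local dir="${1:-$HOME}"
  local newest=""
  local f
  for f in "$dir"/orbstack-backup-*.yaml; do
    if [ -e "$f" ]; then      # skip the literal pattern when nothing matches
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    return 1                  # no backup found
  fi
  printf '%s\n' "$newest"
}
```

Usage: `kubectl apply -f "$(latest_backup ~)"`.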

Option B: Colima (Lightweight Alternative)

# Even lighter than Rancher Desktop: CLI only, no GUI
# (Colima's default runtime is Docker; add --runtime containerd to use containerd)
brew install colima docker kubectl helm

# Start with M4 optimization
colima start \
  --cpu 12 \
  --memory 32 \
  --disk 100 \
  --arch aarch64 \
  --vm-type vz \
  --vz-rosetta \
  --network-address \
  --kubernetes \
  --kubernetes-version v1.28.3+k3s1

Winner: Rancher Desktop - Better UI, includes Traefik, auto-updates

Phase 2: Fix KEDA Autoscaling

KEDA is installed, but its metrics pipeline is broken. To fix it:

# 1. Check metrics-server (required for kubectl top and for KEDA's CPU/memory scalers)
kubectl get deployment metrics-server -n kube-system

# 2. If missing, install:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 3. Fix metrics-server for local K3s (insecure certs)
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# 4. Wait for metrics
kubectl top nodes
kubectl top pods -A

# 5. Test KEDA scaling
kubectl get scaledobjects -A
kubectl describe scaledobject ecosystem-cleanup-orchestrator-scaler

# 6. Create a test scaled deployment
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-test-metrics
  namespace: default
data:
  metric: "100"  # Placeholder only: KEDA reads the value from Prometheus, not from this ConfigMap
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: test-agent-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: test-agent
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service.ossa-agents:9090
      metricName: test_agent_queue_depth
      query: |
        test_agent_queue_depth
      threshold: "5"
EOF

# 7. Verify scaling works
kubectl get hpa -w
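
Step 4 above usually fails for the first minute or so after the patch, because `kubectl top` returns an error until metrics-server has scraped the kubelet. A small retry loop avoids re-running it by hand. This is a sketch only; `retry_until` is a hypothetical helper and the attempt count and delay are arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1
  # Output is discarded; only the exit status of the command matters here.
  while ! "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

Usage: `retry_until 30 10 kubectl top nodes && kubectl top pods -A` waits up to about five minutes before giving up.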

Phase 3: Install Temporal (Durable Agent Workflows)

WHY TEMPORAL?

Runs 100,000+ concurrent workflows (agents)
Auto-retry failed agents
Durable execution (survives crashes)
Event-driven agent orchestration
Built-in metrics for KEDA

# Install Temporal via Helm
helm repo add temporalio https://go.temporal.io/helm-charts
helm repo update

# Install with optimized settings for M4
helm install temporal temporalio/temporal \
  --namespace temporal-system \
  --create-namespace \
  --set server.replicaCount=1 \
  --set cassandra.enabled=false \
  --set postgresql.enabled=true \
  --set postgresql.primary.resources.requests.memory=2Gi \
  --set prometheus.enabled=true \
  --set grafana.enabled=true \
  --set elasticsearch.enabled=false \
  --set server.resources.requests.cpu=2 \
  --set server.resources.requests.memory=4Gi \
  --set server.resources.limits.cpu=8 \
  --set server.resources.limits.memory=16Gi

# Wait for Temporal to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=temporal -n temporal-system --timeout=300s

# Install Temporal CLI
brew install temporal

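# Connect to local Temporal (the CLI defaults to localhost:7233, so forward the frontend port first;
# the service name assumes the Helm release is called "temporal")
kubectl port-forward -n temporal-system svc/temporal-frontend 7233:7233 &
temporal operator namespace create default
temporal operator search-attribute list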

# Test workflow execution
# (--input-file carries the workflow's input payload, not its start options,
# so the ID, type, and task queue are passed as flags)
temporal workflow start \
  --workflow-id test-agent-workflow \
  --type AgentExecutionWorkflow \
  --task-queue agent-tasks

Scale Temporal with KEDA

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: temporal-worker-scaler
  namespace: temporal-system
spec:
  scaleTargetRef:
    name: temporal-worker
  minReplicaCount: 1
  maxReplicaCount: 100  # Scale to 100 workers for 100s of agents
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service.ossa-agents:9090
      query: |
        sum(temporal_task_queue_depth{task_queue="agent-tasks"})
      threshold: "10"  # 1 worker per 10 tasks
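
To make the "1 worker per 10 tasks" comment concrete: KEDA hands the threshold to a Horizontal Pod Autoscaler as a per-replica target, so the desired replica count is roughly the metric value divided by the threshold, rounded up and clamped to the min/max counts. The function below is a sketch of that arithmetic only (it ignores HPA tolerance and stabilization windows), and `desired_replicas` is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the scaling arithmetic, not KEDA's actual implementation.
# Usage: desired_replicas <queue_depth> <threshold> <min_replicas> <max_replicas>
desired_replicas() {
  local depth="$1" threshold="$2" min="$3" max="$4"
  # Integer ceiling division: ceil(depth / threshold)
  local want=$(( (depth + threshold - 1) / threshold ))
  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

desired_replicas 95 10 1 100     # 95 queued tasks   -> 10 workers
desired_replicas 0 10 1 100      # empty queue       -> 1 worker (minReplicaCount)
desired_replicas 5000 10 1 100   # 5000 queued tasks -> 100 workers (maxReplicaCount)
```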

Phase 4: Install Argo Workflows (Parallel Jobs)

WHY ARGO WORKFLOWS?

DAG-based parallel execution
Run 1000+ agents simultaneously
Better than K8s Jobs (reusable, templated)
Native Prometheus metrics

# Install Argo Workflows
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml

# Patch for LoadBalancer access
kubectl patch svc argo-server -n argo -p '{"spec": {"type": "LoadBalancer"}}'

# Install Argo CLI
brew install argo

# Configure RBAC for workflow execution
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=argo:default -n argo

# Test parallel agent execution
cat <<'EOF' | argo submit -n argo --watch -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-agents-
spec:
  entrypoint: agent-swarm
  arguments:
    parameters:
    - name: agent-count
      value: "100"
  templates:
  - name: agent-swarm
    inputs:
      parameters:
      - name: agent-count
    steps:
    - - name: spawn-agents
        template: agent-task
        arguments:
          parameters:
          - name: agent-id
            value: "{{item}}"
        withSequence:
          count: "{{inputs.parameters.agent-count}}"

  - name: agent-task
    inputs:
      parameters:
      - name: agent-id
    container:
      image: ghcr.io/your-org/ossa-agent:latest
      command: ["/bin/sh", "-c"]
      args:
      - |
        echo "Agent {{inputs.parameters.agent-id}} executing task"
        # Your agent code here
        sleep $((RANDOM % 10))
        echo "Agent {{inputs.parameters.agent-id}} completed"
      resources:
        requests:
          memory: "256Mi"
          cpu: "100m"
        limits:
          memory: "512Mi"
          cpu: "500m"
EOF
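
Before submitting 100 agents, it helps to check that their combined resource requests fit the cluster, since the scheduler leaves pods Pending once requests exceed what the node can allocate. The sketch below does that arithmetic for the request values in the workflow above and the 12 CPU / 32GB sizing from Phase 1. `swarm_fits` is a hypothetical helper, and it treats the whole VM as allocatable, which overstates real capacity because system pods, Temporal, and Prometheus also hold requests.

```shell
#!/bin/bash
# Hypothetical helper: do N agents' requests fit on a single node?
# Usage: swarm_fits <agents> <mem_request_Mi> <cpu_request_millicores> <node_mem_Gi> <node_cpus>
swarm_fits() {
  local agents="$1" mem_mi="$2" cpu_m="$3" node_gi="$4" node_cpus="$5"
  local total_mi=$(( agents * mem_mi ))
  local total_m=$(( agents * cpu_m ))
  echo "requests: ${total_mi}Mi memory, ${total_m}m CPU"
  # Succeeds only when both totals fit within the node's size
  [ "$total_mi" -le $(( node_gi * 1024 )) ] && [ "$total_m" -le $(( node_cpus * 1000 )) ]
}

swarm_fits 100 256 100 32 12 && echo "fits" || echo "does not fit"
```

With the request values (256Mi, 100m) the totals are 25600Mi and 10000m, which fit. Running the same check with the limit values (512Mi, 500m) gives 51200Mi and 50000m, which do not, so agents that all burst to their limits at once will be throttled or OOM-killed.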

KEDA + Argo Integration

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: argo-workflow-scaler
  namespace: argo
spec:
  scaleTargetRef:
    name: workflow-controller
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service.ossa-agents:9090
      query: |
        sum(argo_workflow_status{status="Running"})
      threshold: "50"  # Scale controller when >50 workflows running (only the elected leader reconciles; extra replicas are standbys)

Phase 5: Mac Automation with launchd

Auto-start/stop K3s based on schedule and resource usage

Create launchd job for resource monitoring

# Create monitoring script
cat > /usr/local/bin/k3s-resource-manager.sh <<'EOF'
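#!/bin/bash
set -e

# launchd runs jobs with a minimal PATH; add the Homebrew locations for kubectl and argo
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

MEMORY_THRESHOLD_GB=40
CPU_THRESHOLD_PERCENT=80

# Get current resource usage
# (16384 is the page size on Apple Silicon; this counts active pages only)
MEMORY_USED_GB=$(vm_stat | awk '/Pages active/ {print int($3 * 16384 / 1073741824)}')
# ps reports per-core percentages, so divide the sum by the logical core count
CPU_USED=$(ps -A -o %cpu= | awk -v n="$(sysctl -n hw.ncpu)" '{s+=$1} END {print int(s/n)}')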

# Check if K3s should scale down
if [ "$MEMORY_USED_GB" -gt "$MEMORY_THRESHOLD_GB" ] || [ "$CPU_USED" -gt "$CPU_THRESHOLD_PERCENT" ]; then
  echo "$(date): High resource usage - scaling down non-critical services"

  # Scale down development services
  kubectl scale deployment --all --replicas=0 -n agents-staging
  kubectl scale deployment --all --replicas=0 -n apidog

  # Keep production running
  kubectl scale deployment agent-gateway --replicas=3 -n ossa-prod
fi

# Auto-cleanup completed Argo workflows
argo delete --completed -n argo

# Cleanup old pods
kubectl delete pods --field-selector status.phase=Succeeded -A
kubectl delete pods --field-selector status.phase=Failed -A
EOF

chmod +x /usr/local/bin/k3s-resource-manager.sh

# Create launchd plist
cat > ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bluefly.k3s-resource-manager</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/k3s-resource-manager.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/k3s-resource-manager.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/k3s-resource-manager.err</string>
</dict>
</plist>
EOF

# Load launchd job
launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist
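
The two readings in the monitoring script are easier to trust once the conversions are isolated and checked against known input. The sketch below pulls them into functions that read from stdin, so they can be tested with canned text instead of live `vm_stat` and `ps` output. The function names are hypothetical, and the 16384-byte page size is an assumption that holds on Apple Silicon.

```shell
#!/bin/bash
# Hypothetical helper: convert the "Pages active" line of vm_stat output (stdin) to whole GiB.
active_gib() {
  awk '/Pages active/ {
    gsub(/\./, "", $3)                       # vm_stat prints counts with a trailing period
    print int($3 * 16384 / 1073741824)       # pages * page size / bytes per GiB
  }'
}

# Hypothetical helper: sum per-process CPU percentages (stdin, one per line)
# and express the total as a percentage of the whole machine.
# Usage: ... | machine_cpu_percent <logical_core_count>
machine_cpu_percent() {
  awk -v n="$1" '{ s += $1 } END { print int(s / n) }'
}
```

Usage: `vm_stat | active_gib` and `ps -A -o %cpu= | machine_cpu_percent "$(sysctl -n hw.ncpu)"`. Without the division by core count, a busy 14-core machine can report several hundred percent and trip an 80 percent threshold almost constantly.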

Schedule-based K3s management

# Create schedule script
cat > /usr/local/bin/k3s-scheduler.sh <<'EOF'
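#!/bin/bash

# launchd runs jobs with a minimal PATH; add the Homebrew locations for kubectl
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

HOUR=$(date +%H)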

# Business hours (9am-6pm): Full power
if [ "$HOUR" -ge 9 ] && [ "$HOUR" -lt 18 ]; then
  echo "$(date): Business hours - scaling up all services"
  kubectl scale deployment --all --replicas=3 -n ossa-agents
  kubectl scale deployment --all --replicas=5 -n ossa-prod

# Night time: Scale down to minimum
elif [ "$HOUR" -ge 22 ] || [ "$HOUR" -lt 6 ]; then
  echo "$(date): Night time - scaling down to minimum"
  kubectl scale deployment --all --replicas=0 -n agents-staging
  kubectl scale deployment --all --replicas=1 -n ossa-prod

# Default: Moderate scaling
else
  echo "$(date): Default hours - moderate scaling"
  kubectl scale deployment --all --replicas=1 -n ossa-agents
  kubectl scale deployment --all --replicas=2 -n ossa-prod
fi
EOF

chmod +x /usr/local/bin/k3s-scheduler.sh

# Create hourly launchd job
cat > ~/Library/LaunchAgents/com.bluefly.k3s-scheduler.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bluefly.k3s-scheduler</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/k3s-scheduler.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-scheduler.plist
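
The hour boundaries in the scheduler are easy to get wrong by one, and they are hard to verify when the script only runs at the top of each hour. Separating the decision from the `kubectl` calls makes it checkable for any hour. A minimal sketch, with `tier_for_hour` as a hypothetical helper that mirrors the conditions in the script above:

```shell
#!/bin/bash
# Hypothetical helper: map an hour (00-23, as printed by `date +%H`) to a scaling tier.
tier_for_hour() {
  local hour="${1#0}"     # drop one leading zero so "09" compares as 9
  hour="${hour:-0}"       # "0" and "00" both end up as 0
  if [ "$hour" -ge 9 ] && [ "$hour" -lt 18 ]; then
    echo business         # 09:00-17:59
  elif [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]; then
    echo night            # 22:00-05:59
  else
    echo default          # 06:00-08:59 and 18:00-21:59
  fi
}
```

Usage: `case "$(tier_for_hour "$(date +%H)")" in business) ... ;; night) ... ;; *) ... ;; esac`, with the existing `kubectl scale` commands in each branch.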

Phase 6: M4 Optimization

Use ARM-native images (3x faster)

# Check current images
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u

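# Record the ARM-native image names to switch to
# (this ConfigMap is only a reference list; update each Deployment's image field to apply the change)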
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: arm-optimized-images
  namespace: kube-system
data:
  # Replace x86 images with ARM equivalents
  redis: "arm64v8/redis:7-alpine"
  postgres: "arm64v8/postgres:16-alpine"
  nginx: "arm64v8/nginx:alpine"
  prometheus: "prom/prometheus:latest"  # Multi-arch
  grafana: "grafana/grafana:latest"  # Multi-arch
EOF

Pin critical workloads to performance cores

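# M4 Pro has 10 performance cores + 4 efficiency cores
# Label the node so agent workloads can target it. The label only steers pod
# scheduling; macOS, not Kubernetes, decides which physical cores the VM runs on.

# The node is no longer named "orbstack" after Phase 1, so look the name up
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl label nodes "$NODE_NAME" node-role.kubernetes.io/performance=true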

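# Apply affinity rules as a patch to the existing Deployment
# (the manifest below is partial: it has no selector or image, so kubectl apply would reject it)
cat <<EOF | kubectl patch deployment agent-gateway -n ossa-prod --patch-file /dev/stdin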
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-gateway
  namespace: ossa-prod
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/performance
                operator: In
                values:
                - "true"
      # Use guaranteed QoS (highest priority)
      containers:
      - name: agent-gateway
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
EOF
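
The "guaranteed QoS" comment above depends on requests and limits being identical for both CPU and memory. Kubernetes assigns one of three classes from those fields, and the sketch below mirrors that rule for a single-container pod. `qos_class` is a hypothetical helper, and it compares the strings literally, so `"2"` and `"2000m"` count as different even though Kubernetes treats them as equal quantities.

```shell
#!/bin/bash
# Hypothetical helper: classify a single-container pod's QoS class.
# Usage: qos_class <req_cpu> <req_mem> <lim_cpu> <lim_mem>   (pass "" for an unset field)
qos_class() {
  local req_cpu="$1" req_mem="$2" lim_cpu="$3" lim_mem="$4"
  if [ -z "$req_cpu$req_mem$lim_cpu$lim_mem" ]; then
    echo BestEffort            # nothing set at all
    return
  fi
  # Kubernetes defaults an unset request to the matching limit
  req_cpu="${req_cpu:-$lim_cpu}"
  req_mem="${req_mem:-$lim_mem}"
  if [ -n "$lim_cpu" ] && [ -n "$lim_mem" ] \
     && [ "$req_cpu" = "$lim_cpu" ] && [ "$req_mem" = "$lim_mem" ]; then
    echo Guaranteed            # both limits set and equal to the requests
  else
    echo Burstable
  fi
}

qos_class 2 4Gi 2 4Gi             # agent-gateway above      -> Guaranteed
qos_class 100m 256Mi 500m 512Mi   # Argo agent-task template -> Burstable
```

To confirm on a live cluster, `kubectl get pod <pod-name> -n ossa-prod -o jsonpath='{.status.qosClass}'` prints the class Kubernetes actually assigned.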

Enable Metal GPU acceleration for AI workloads

# For AI/ML agents using GPU
# Install Metal device plugin (if needed)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metal-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: metal-device-plugin
  template:
    metadata:
      labels:
        name: metal-device-plugin
    spec:
      containers:
      - name: metal-device-plugin
        image: ghcr.io/bluefly/metal-device-plugin:latest
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
EOF

Phase 7: BuildKit Agent Swarm Commands

Create buildkit golden swarm commands:

# Create swarm command implementation
cat > "$LLM_ROOT/agent-buildkit/lib/commands/golden/swarm.js" <<'EOF'
#!/usr/bin/env node

const { execSync } = require('child_process');
const fs = require('fs');

async function deployAgentSwarm(options) {
  const {
    agents = 100,
    workflow = 'argo',  // 'argo' or 'temporal'
    namespace = 'agent-swarm',
    image = 'ghcr.io/bluefly/ossa-agent:latest',
    task = 'coding',
    parallel = true
  } = options;

  console.log(`Deploying ${agents} agents to ${workflow}...`);

  if (workflow === 'argo') {
    // Use Argo Workflows for parallel execution
    const argoManifest = {
      apiVersion: 'argoproj.io/v1alpha1',
      kind: 'Workflow',
      metadata: {
        generateName: 'agent-swarm-',
        namespace
      },
      spec: {
        entrypoint: 'agent-swarm',
        arguments: {
          parameters: [
            { name: 'agent-count', value: agents.toString() },
            { name: 'task-type', value: task }
          ]
        },
        templates: [
          {
            name: 'agent-swarm',
            steps: parallel ? [
              [{
                name: 'spawn-agents',
                template: 'agent-task',
                arguments: {
                  parameters: [{ name: 'agent-id', value: '{{item}}' }]
                },
                withSequence: { count: agents.toString() }
              }]
            ] : [[]]
          },
          {
            name: 'agent-task',
            inputs: {
              parameters: [{ name: 'agent-id' }]
            },
            container: {
              image,
              command: ['ossa', 'execute'],
              args: [
                '--task', '{{workflow.parameters.task-type}}',
                '--agent-id', '{{inputs.parameters.agent-id}}'
              ],
              resources: {
                requests: { memory: '256Mi', cpu: '100m' },
                limits: { memory: '512Mi', cpu: '500m' }
              }
            }
          }
        ]
      }
    };

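    // JSON is a subset of YAML, so argo accepts this file as-is
    fs.writeFileSync('/tmp/argo-swarm.yaml', JSON.stringify(argoManifest));
    // Submit to the namespace passed in options instead of a hardcoded one
    execSync(`argo submit /tmp/argo-swarm.yaml -n ${namespace} --watch`, { stdio: 'inherit' });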
  }

  if (workflow === 'temporal') {
    // Use Temporal for durable workflows
    const temporalWorkflow = `
temporal workflow start \\
  --type AgentSwarmWorkflow \\
  --task-queue agent-tasks \\
  --workflow-id agent-swarm-${Date.now()} \\
  --input '{"agentCount": ${agents}, "taskType": "${task}"}'
`;
    execSync(temporalWorkflow, { stdio: 'inherit' });
  }

  console.log('Agent swarm deployed!');
}

module.exports = { deployAgentSwarm };
EOF

# Add to buildkit CLI
buildkit golden swarm deploy --agents 100 --workflow argo --task coding

Final Architecture


M4 Pro (14 cores, 48GB RAM)
|
+-- Rancher Desktop (K3s + containerd)
|     - 12 CPUs, 32GB RAM
|     - ARM-native images
|     - Metal GPU support
|     |
|     +-- KEDA Autoscaler | Temporal Workflows | Argo Flows
|     |
|     +-- 100+ OSSA Agents (Parallel)
|     |
|     +-- Observability (Prometheus, Grafana, Loki)
|
+-- launchd Automation
      - Resource monitoring (every 5min)
      - Schedule-based scaling (hourly)
      - Auto-cleanup (daily)

Quick Start Commands

# 1. Replace OrbStack with Rancher Desktop
brew install --cask rancher

# 2. Fix KEDA metrics
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# 3. Install Temporal
helm install temporal temporalio/temporal -n temporal-system --create-namespace

# 4. Install Argo Workflows
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml

# 5. Setup launchd automation
/usr/local/bin/k3s-resource-manager.sh  # Run once to test
launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist

# 6. Deploy 100 agents in parallel
buildkit golden swarm deploy --agents 100 --workflow argo --task coding

Monitoring & Debugging

# Watch KEDA scaling
kubectl get hpa -A -w

# Watch Argo workflows
argo list -n argo --watch

# Watch resource usage
kubectl top nodes
kubectl top pods -A --sort-by=memory

# Mac system resources
vm_stat 1  # Memory stats every 1 second
top -l 1 -s 0 -stats pid,command,cpu,mem | head -20

# launchd job status
launchctl list | grep bluefly
tail -f /tmp/k3s-resource-manager.log

Next Steps:

Migrate from OrbStack to Rancher Desktop (30min)
Fix KEDA metrics (5min)
Install Temporal + Argo (10min)
Test 100-agent deployment (5min)
Setup launchd automation (10min)

Result: Fully automated M4 Pro agent machine that can deploy 100+ agents in parallel with auto-scaling, monitoring, and lifecycle management.