Skip to main content

M4 Pro AI Agent Machine - Complete Automation Guide

M4 Pro AI Agent Machine - Complete Automation Guide

Separation of Duties: See Separation of Duties - Infrastructure setup documentation is responsible for documenting setup procedures. It does NOT own agent manifests, execution, or infrastructure configuration.

Hardware: Apple M4 Pro, 14 cores, 48GB RAM Goal: Deploy 100+ agents in parallel with full automation Current: KEDA installed, metrics broken, manual lifecycle management


Phase 1: Replace OrbStack Rancher Desktop

Why: OrbStack uses VM layer (7GB overhead). Rancher Desktop uses native containerd.

# Backup OrbStack data first kubectl get all -A -o yaml > ~/orbstack-backup-$(date +%Y%m%d).yaml # Stop OrbStack orb k8s stop killall OrbStack # Install Rancher Desktop (includes K3s, containerd, Helm, kubectl) brew install --cask rancher # Configure for M4 optimization # Settings Kubernetes 12 CPUs, 32GB RAM (leave 16GB for macOS) # Settings Container Engine containerd (not dockerd) # Settings Kubernetes Enable Traefik, Metrics Server # Restore workloads kubectl apply -f ~/orbstack-backup-$(date +%Y%m%d).yaml

Option B: Colima (Lightweight Alternative)

# Even lighter than Rancher - just containerd brew install colima docker kubectl helm # Start with M4 optimization colima start \ --cpu 12 \ --memory 32 \ --disk 100 \ --arch aarch64 \ --vm-type vz \ --vz-rosetta \ --network-address \ --kubernetes \ --kubernetes-version v1.28.3+k3s1

Winner: Rancher Desktop - Better UI, includes Traefik, auto-updates


Phase 2: Fix KEDA Autoscaling

Your KEDA is installed but metrics are broken. Fix:

# 1. Check metrics-server (required for KEDA) kubectl get deployment metrics-server -n kube-system # 2. If missing, install: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml # 3. Fix metrics-server for local K3s (insecure certs) kubectl patch deployment metrics-server -n kube-system --type='json' \ -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]' # 4. Wait for metrics kubectl top nodes kubectl top pods -A # 5. Test KEDA scaling kubectl get scaledobjects -A kubectl describe scaledobject ecosystem-cleanup-orchestrator-scaler # 6. Create a test scaled deployment cat <<EOF | kubectl apply -f - apiVersion: v1 kind: ConfigMap metadata: name: prometheus-test-metrics namespace: default data: metric: "100" # Trigger scaling --- apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: test-agent-scaler namespace: default spec: scaleTargetRef: name: test-agent minReplicaCount: 0 maxReplicaCount: 10 triggers: - type: prometheus metadata: serverAddress: http://prometheus-service.ossa-agents:9090 metricName: test_agent_queue_depth query: | test_agent_queue_depth threshold: "5" EOF # 7. Verify scaling works kubectl get hpa -w

Phase 3: Install Temporal (Durable Agent Workflows)

WHY TEMPORAL?

  • Runs 100,000+ concurrent workflows (agents)
  • Auto-retry failed agents
  • Durable execution (survives crashes)
  • Event-driven agent orchestration
  • Built-in metrics for KEDA
# Install Temporal via Helm helm repo add temporalio https://go.temporal.io/helm-charts helm repo update # Install with optimized settings for M4 helm install temporal temporalio/temporal \ --namespace temporal-system \ --create-namespace \ --set server.replicaCount=1 \ --set cassandra.enabled=false \ --set postgresql.enabled=true \ --set postgresql.primary.resources.requests.memory=2Gi \ --set prometheus.enabled=true \ --set grafana.enabled=true \ --set elasticsearch.enabled=false \ --set server.resources.requests.cpu=2 \ --set server.resources.requests.memory=4Gi \ --set server.resources.limits.cpu=8 \ --set server.resources.limits.memory=16Gi # Wait for Temporal to be ready kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=temporal -n temporal-system --timeout=300s # Install Temporal CLI brew install temporal # Connect to local Temporal temporal operator namespace create default temporal operator search-attribute list # Test workflow execution cat > /tmp/test-workflow.json <<EOF { "workflowId": "test-agent-workflow", "workflowType": "AgentExecutionWorkflow", "taskQueue": "agent-tasks" } EOF temporal workflow start --input-file /tmp/test-workflow.json

Scale Temporal with KEDA

apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: temporal-worker-scaler namespace: temporal-system spec: scaleTargetRef: name: temporal-worker minReplicaCount: 1 maxReplicaCount: 100 # Scale to 100 workers for 100s of agents triggers: - type: prometheus metadata: serverAddress: http://prometheus-service.ossa-agents:9090 query: | sum(temporal_task_queue_depth{task_queue="agent-tasks"}) threshold: "10" # 1 worker per 10 tasks

Phase 4: Install Argo Workflows (Parallel Jobs)

WHY ARGO WORKFLOWS?

  • DAG-based parallel execution
  • Run 1000+ agents simultaneously
  • Better than K8s Jobs (reusable, templated)
  • Native Prometheus metrics
# Install Argo Workflows kubectl create namespace argo kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml # Patch for LoadBalancer access kubectl patch svc argo-server -n argo -p '{"spec": {"type": "LoadBalancer"}}' # Install Argo CLI brew install argo # Configure RBAC for workflow execution kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=argo:default -n argo # Test parallel agent execution cat <<EOF | argo submit -n argo --watch - apiVersion: argoproj.io/v1alpha1 kind: Workflow metadata: generateName: parallel-agents- spec: entrypoint: agent-swarm arguments: parameters: - name: agent-count value: "100" templates: - name: agent-swarm inputs: parameters: - name: agent-count steps: - - name: spawn-agents template: agent-task arguments: parameters: - name: agent-id value: "{{item}}" withSequence: count: "{{inputs.parameters.agent-count}}" - name: agent-task inputs: parameters: - name: agent-id container: image: ghcr.io/your-org/ossa-agent:latest command: ["/bin/sh", "-c"] args: - | echo "Agent {{inputs.parameters.agent-id}} executing task" # Your agent code here sleep $((RANDOM % 10)) echo "Agent {{inputs.parameters.agent-id}} completed" resources: requests: memory: "256Mi" cpu: "100m" limits: memory: "512Mi" cpu: "500m" EOF

KEDA + Argo Integration

apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: argo-workflow-scaler namespace: argo spec: scaleTargetRef: name: workflow-controller minReplicaCount: 1 maxReplicaCount: 10 triggers: - type: prometheus metadata: serverAddress: http://prometheus-service.ossa-agents:9090 query: | sum(argo_workflow_status{status="Running"}) threshold: "50" # Scale controller when >50 workflows running

Phase 5: Mac Automation with launchd

Auto-start/stop K3s based on schedule and resource usage

Create launchd job for resource monitoring

# Create monitoring script cat > /usr/local/bin/k3s-resource-manager.sh <<'EOF' #!/bin/bash set -e MEMORY_THRESHOLD_GB=40 CPU_THRESHOLD_PERCENT=80 # Get current resource usage MEMORY_USED_GB=$(vm_stat | awk '/Pages active/ {print int($3 * 16384 / 1073741824)}') CPU_USED=$(ps -A -o %cpu | awk '{s+=$1} END {print int(s)}') # Check if K3s should scale down if [ "$MEMORY_USED_GB" -gt "$MEMORY_THRESHOLD_GB" ] || [ "$CPU_USED" -gt "$CPU_THRESHOLD_PERCENT" ]; then echo "$(date): High resource usage - scaling down non-critical services" # Scale down development services kubectl scale deployment --all --replicas=0 -n agents-staging kubectl scale deployment --all --replicas=0 -n apidog # Keep production running kubectl scale deployment agent-gateway --replicas=3 -n ossa-prod fi # Auto-cleanup completed Argo workflows argo delete --completed -n argo # Cleanup old pods kubectl delete pods --field-selector status.phase=Succeeded -A kubectl delete pods --field-selector status.phase=Failed -A EOF chmod +x /usr/local/bin/k3s-resource-manager.sh # Create launchd plist cat > ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist <<EOF <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>com.bluefly.k3s-resource-manager</string> <key>ProgramArguments</key> <array> <string>/usr/local/bin/k3s-resource-manager.sh</string> </array> <key>StartInterval</key> <integer>300</integer> <key>RunAtLoad</key> <true/> <key>StandardOutPath</key> <string>/tmp/k3s-resource-manager.log</string> <key>StandardErrorPath</key> <string>/tmp/k3s-resource-manager.err</string> </dict> </plist> EOF # Load launchd job launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist

Schedule-based K3s management

# Create schedule script cat > /usr/local/bin/k3s-scheduler.sh <<'EOF' #!/bin/bash HOUR=$(date +%H) # Business hours (9am-6pm): Full power if [ "$HOUR" -ge 9 ] && [ "$HOUR" -lt 18 ]; then echo "$(date): Business hours - scaling up all services" kubectl scale deployment --all --replicas=3 -n ossa-agents kubectl scale deployment --all --replicas=5 -n ossa-prod # Night time: Scale down to minimum elif [ "$HOUR" -ge 22 ] || [ "$HOUR" -lt 6 ]; then echo "$(date): Night time - scaling down to minimum" kubectl scale deployment --all --replicas=0 -n agents-staging kubectl scale deployment --all --replicas=1 -n ossa-prod # Default: Moderate scaling else echo "$(date): Default hours - moderate scaling" kubectl scale deployment --all --replicas=1 -n ossa-agents kubectl scale deployment --all --replicas=2 -n ossa-prod fi EOF chmod +x /usr/local/bin/k3s-scheduler.sh # Create hourly launchd job cat > ~/Library/LaunchAgents/com.bluefly.k3s-scheduler.plist <<EOF <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>com.bluefly.k3s-scheduler</string> <key>ProgramArguments</key> <array> <string>/usr/local/bin/k3s-scheduler.sh</string> </array> <key>StartCalendarInterval</key> <dict> <key>Minute</key> <integer>0</integer> </dict> <key>RunAtLoad</key> <true/> </dict> </plist> EOF launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-scheduler.plist

Phase 6: M4 Optimization

Use ARM-native images (3x faster)

# Check current images kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u # Replace with ARM-optimized versions cat <<EOF | kubectl apply -f - apiVersion: v1 kind: ConfigMap metadata: name: arm-optimized-images namespace: kube-system data: # Replace x86 images with ARM equivalents redis: "arm64v8/redis:7-alpine" postgres: "arm64v8/postgres:16-alpine" nginx: "arm64v8/nginx:alpine" prometheus: "prom/prometheus:latest" # Multi-arch grafana: "grafana/grafana:latest" # Multi-arch EOF

Pin critical workloads to performance cores

# M4 Pro has 10 performance cores + 4 efficiency cores # Pin agent workloads to performance cores kubectl label nodes orbstack node-role.kubernetes.io/performance=true # Apply affinity rules cat <<EOF | kubectl apply -f - apiVersion: apps/v1 kind: Deployment metadata: name: agent-gateway namespace: ossa-prod spec: template: spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/performance operator: In values: - "true" # Use guaranteed QoS (highest priority) containers: - name: agent-gateway resources: requests: cpu: "2" memory: "4Gi" limits: cpu: "2" memory: "4Gi" EOF

Enable Metal GPU acceleration for AI workloads

# For AI/ML agents using GPU # Install Metal device plugin (if needed) kubectl apply -f - <<EOF apiVersion: apps/v1 kind: DaemonSet metadata: name: metal-device-plugin namespace: kube-system spec: selector: matchLabels: name: metal-device-plugin template: metadata: labels: name: metal-device-plugin spec: containers: - name: metal-device-plugin image: ghcr.io/bluefly/metal-device-plugin:latest securityContext: privileged: true volumeMounts: - name: device-plugin mountPath: /var/lib/kubelet/device-plugins volumes: - name: device-plugin hostPath: path: /var/lib/kubelet/device-plugins EOF

Phase 7: BuildKit Agent Swarm Commands

Create buildkit golden swarm commands:

# Create swarm command implementation cat > $LLM_ROOT/agent-buildkit/lib/commands/golden/swarm.js <<'EOF' #!/usr/bin/env node const { execSync } = require('child_process'); const fs = require('fs'); async function deployAgentSwarm(options) { const { agents = 100, workflow = 'argo', // 'argo' or 'temporal' namespace = 'agent-swarm', image = 'ghcr.io/bluefly/ossa-agent:latest', task = 'coding', parallel = true } = options; console.log(` Deploying ${agents} agents to ${workflow}...`); if (workflow === 'argo') { // Use Argo Workflows for parallel execution const argoManifest = { apiVersion: 'argoproj.io/v1alpha1', kind: 'Workflow', metadata: { generateName: 'agent-swarm-', namespace }, spec: { entrypoint: 'agent-swarm', arguments: { parameters: [ { name: 'agent-count', value: agents.toString() }, { name: 'task-type', value: task } ] }, templates: [ { name: 'agent-swarm', steps: parallel ? [ [{ name: 'spawn-agents', template: 'agent-task', arguments: { parameters: [{ name: 'agent-id', value: '{{item}}' }] }, withSequence: { count: agents.toString() } }] ] : [[]] }, { name: 'agent-task', inputs: { parameters: [{ name: 'agent-id' }] }, container: { image, command: ['ossa', 'execute'], args: [ '--task', '{{workflow.parameters.task-type}}', '--agent-id', '{{inputs.parameters.agent-id}}' ], resources: { requests: { memory: '256Mi', cpu: '100m' }, limits: { memory: '512Mi', cpu: '500m' } } } } ] } }; fs.writeFileSync('/tmp/argo-swarm.yaml', JSON.stringify(argoManifest)); execSync('argo submit /tmp/argo-swarm.yaml -n agent-swarm --watch', { stdio: 'inherit' }); } if (workflow === 'temporal') { // Use Temporal for durable workflows const temporalWorkflow = ` temporal workflow start \\ --type AgentSwarmWorkflow \\ --task-queue agent-tasks \\ --workflow-id agent-swarm-${Date.now()} \\ --input '{"agentCount": ${agents}, "taskType": "${task}"}' `; execSync(temporalWorkflow, { stdio: 'inherit' }); } console.log(' Agent swarm deployed!'); } module.exports = { deployAgentSwarm }; EOF # Add to buildkit CLI buildkit golden swarm deploy --agents 100 --workflow argo --task coding

Final Architecture


              M4 Pro (14 cores, 48GB RAM)                    

                                                             
    
    Rancher Desktop (K3s + containerd)                   
    - 12 CPUs, 32GB RAM                                  
    - ARM-native images                                  
    - Metal GPU support                                  
    
                                                            
       
                                                           
               
        KEDA          Temporal       Argo          
     Autoscaler      Workflows      Flows          
               
                                                        
          
            100+ OSSA Agents (Parallel)                 
          
                                                           
            
      Observability (Prometheus, Grafana, Loki)         
            
       
                                                             
    
    launchd Automation                                   
    - Resource monitoring (every 5min)                   
    - Schedule-based scaling (hourly)                    
    - Auto-cleanup (daily)                               
    


Quick Start Commands

# 1. Replace OrbStack with Rancher Desktop brew install --cask rancher # 2. Fix KEDA metrics kubectl patch deployment metrics-server -n kube-system --type='json' \ -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]' # 3. Install Temporal helm install temporal temporalio/temporal -n temporal-system --create-namespace # 4. Install Argo Workflows kubectl create namespace argo kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml # 5. Setup launchd automation /usr/local/bin/k3s-resource-manager.sh # Run once to test launchctl load ~/Library/LaunchAgents/com.bluefly.k3s-resource-manager.plist # 6. Deploy 100 agents in parallel buildkit golden swarm deploy --agents 100 --workflow argo --task coding

Monitoring & Debugging

# Watch KEDA scaling kubectl get hpa -A -w # Watch Argo workflows argo list -n argo --watch # Watch resource usage kubectl top nodes kubectl top pods -A --sort-by=memory # Mac system resources vm_stat 1 # Memory stats every 1 second top -l 1 -s 0 -stats pid,command,cpu,mem | head -20 # launchd job status launchctl list | grep bluefly tail -f /tmp/k3s-resource-manager.log

Next Steps:

  1. Migrate from OrbStack Rancher Desktop (30min)
  2. Fix KEDA metrics (5min)
  3. Install Temporal + Argo (10min)
  4. Test 100-agent deployment (5min)
  5. Setup launchd automation (10min)

Result: Fully automated M4 Pro agent machine that can deploy 100+ agents in parallel with auto-scaling, monitoring, and lifecycle management.