Multi-Mac Agent Coordination: Technical Planning Document

Executive Summary

This document outlines the technical architecture and implementation plan for coordinating AI agents across two Macs (Mac M4, the BlueFly primary, and Mac M3, the GitLab secondary). The aim is to roughly double agent processing capacity through shared infrastructure and capability-aware load balancing.

Goal: Enable agents running on both Macs to coordinate through a shared Redis/PostgreSQL backend, with a coordinator service managing task distribution and resource monitoring.

Estimated Implementation Time: ~20 minutes for basic setup (the individual phase estimates below total about 60 minutes), ~2-4 hours for full production deployment

Architecture Overview

System Components


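+-----------------------------------------------------------+
|                   Shared Infrastructure                   |
|   +-----------+    +------------+    +-------------+      |
|   |   Redis   |    | PostgreSQL |    | Coordinator |      |
|   | (Router)  |    |  (Router)  |    |  (Mac M4)   |      |
|   +-----------+    +------------+    +-------------+      |
+-----------------------------+-----------------------------+
                              |
     +---------+---------+----+----+---------+---------+
     |         |         |         |         |         |
+---------+---------+---------+---------+---------+---------+
| Mac M4  | Mac M4  | Mac M3  | Mac M3  | Mac M4  | Mac M3  |
| Agent 1 | Agent 2 | Agent 3 | Agent 4 | Agent 5 | Agent 6 |
+---------+---------+---------+---------+---------+---------+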

Key Design Decisions

Coordinator Location: Mac M4 (primary work machine)
- Rationale: More stable, always-on machine
- Handles task distribution and health monitoring
Shared Infrastructure: Router-based services
- Redis: Task queue, agent state, coordination
- PostgreSQL: Persistent agent registry, metrics, history
Agent Registration: Both Macs register agents with coordinator
- Agents announce capabilities and health status
- Coordinator maintains registry in PostgreSQL
- Redis used for real-time coordination
Load Balancing: Capability-aware weighted round-robin
- Considers agent capabilities, current load, machine resources
- Failover to healthy agents on either machine

Infrastructure Setup

Phase 1: Shared Redis Setup (5 minutes)

Option A: Router-Based Redis (Recommended)

Requirements:

Router with Docker support OR
Router with ability to run Redis container OR
Dedicated small device (Raspberry Pi, etc.)

Setup Steps:

Install Redis on Router/Device:

# If router supports Docker
docker run -d \
  --name redis-coordinator \
  --restart unless-stopped \
  -p 6379:6379 \
  -v redis-data:/data \
  redis:7-alpine redis-server --appendonly yes

Configure Network Access:
- Ensure Redis port 6379 is accessible from both Macs
- Configure firewall rules if needed
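- Set a strong password: redis-cli CONFIG SET requirepass <strong-password> (this only changes the running instance; to keep it across restarts, append --requirepass <strong-password> to the redis-server arguments in the docker run command above)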

Test Connectivity from Both Macs:

# From Mac M4
redis-cli -h <router-ip> -p 6379 -a <password> PING

# From Mac M3
redis-cli -h <router-ip> -p 6379 -a <password> PING
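
Once a password is set, the coordinator's REDIS_URL has to carry it. The helper below is a minimal sketch of building that URL; buildRedisUrl is a hypothetical name, the IP address is a placeholder, and it assumes the client library decodes percent-encoded credentials, which URL-based Redis clients generally do.

```typescript
// Hypothetical helper (not part of agent-mesh): builds a Redis connection URL
// for a password-protected instance. The empty username before the colon
// means "authenticate with the password only".
function buildRedisUrl(host: string, port: number, password: string): string {
  // Percent-encode the password so characters such as "@" or "/" cannot be
  // mistaken for URL delimiters.
  return `redis://:${encodeURIComponent(password)}@${host}:${port}`;
}

// Placeholder host and password, for illustration only.
const exampleRedisUrl = buildRedisUrl('192.168.1.10', 6379, 'p@ss/word');
// exampleRedisUrl is "redis://:p%40ss%2Fword@192.168.1.10:6379"
```

The result can be exported as REDIS_URL on Mac M4 before the coordinator starts.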

Option B: Mac M4 Hosted Redis (Fallback)

If router-based setup isn't feasible:

Install Redis on Mac M4:

brew install redis
brew services start redis

Configure for Network Access:

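# Edit /opt/homebrew/etc/redis.conf
# 0.0.0.0 listens on every interface; binding to the Mac M4 LAN address is tighter
bind 0.0.0.0
requirepass <strong-password>
# Then apply the change with: brew services restart redis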

Configure Mac M4 Firewall:
- Allow incoming connections on port 6379 from Mac M3 IP

Phase 2: Shared PostgreSQL Setup (10 minutes)

Option A: Router-Based PostgreSQL

Install PostgreSQL on Router/Device:

docker run -d \
  --name postgres-coordinator \
  --restart unless-stopped \
  -p 5432:5432 \
  -e POSTGRES_DB=agent_coordinator \
  -e POSTGRES_USER=agent_user \
  -e POSTGRES_PASSWORD=<strong-password> \
  -v postgres-data:/var/lib/postgresql/data \
  postgres:16-alpine

Create Agent Registry Schema:

CREATE TABLE agent_registry (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_id VARCHAR(255) UNIQUE NOT NULL,
  agent_name VARCHAR(255) NOT NULL,
  machine_id VARCHAR(100) NOT NULL,
  capabilities JSONB NOT NULL,
  endpoint VARCHAR(500),
  status VARCHAR(50) NOT NULL,
  registered_at TIMESTAMPTZ DEFAULT NOW(),
  last_seen TIMESTAMPTZ DEFAULT NOW(),
  metadata JSONB
);

CREATE TABLE agent_metrics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_id VARCHAR(255) REFERENCES agent_registry(agent_id),
  timestamp TIMESTAMPTZ DEFAULT NOW(),
  cpu_percent FLOAT,
  memory_mb INTEGER,
  active_tasks INTEGER,
  completed_tasks INTEGER,
  failed_tasks INTEGER
);

CREATE INDEX idx_agent_capabilities ON agent_registry USING GIN(capabilities);
CREATE INDEX idx_agent_status ON agent_registry(status);
CREATE INDEX idx_agent_machine ON agent_registry(machine_id);
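
The GIN index on capabilities pays off when agents are looked up with the JSONB containment operator (@>). The sketch below builds such a query as a parameterized text/values pair; buildCapabilityQuery is a hypothetical helper, and it assumes capabilities are stored as a JSON array of strings, as the registration code later in this document does.

```typescript
// Shape accepted by most PostgreSQL clients for parameterized queries.
interface SqlQuery {
  text: string;
  values: string[];
}

// Hypothetical helper: finds active agents whose capabilities array contains
// every required capability. "@>" is JSONB containment, which the GIN index
// idx_agent_capabilities can serve.
function buildCapabilityQuery(requiredCapabilities: string[]): SqlQuery {
  return {
    text:
      'SELECT agent_id, machine_id, endpoint FROM agent_registry ' +
      "WHERE status = 'active' AND capabilities @> $1::jsonb",
    // The array is sent as JSON text and cast to jsonb on the server.
    values: [JSON.stringify(requiredCapabilities)],
  };
}

const capabilityQuery = buildCapabilityQuery(['testing', 'code_generation']);
```

The returned object could be passed to a client call such as pool.query(capabilityQuery.text, capabilityQuery.values).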

Option B: Mac M4 Hosted PostgreSQL (Fallback)

Install PostgreSQL on Mac M4:

brew install postgresql@16
brew services start postgresql@16

Configure for Network Access:

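# Edit /opt/homebrew/var/postgresql@16/postgresql.conf
listen_addresses = '*'

# Edit pg_hba.conf (same directory): admit only Mac M3 instead of 0.0.0.0/0
host    agent_coordinator    agent_user    <mac-m3-ip>/32    scram-sha-256
# Then apply the change with: brew services restart postgresql@16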

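Create the agent_user role and the agent_coordinator database (the Docker image creates both from its environment variables; with Homebrew they must be created by hand, for example with createuser and createdb), then apply the same schema as above.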

Coordinator Service Implementation

Phase 3: Coordinator Service on Mac M4 (15 minutes)

Service Architecture

The coordinator service will be built using existing agent-mesh infrastructure:

Location: common_npm/agent-mesh/backend/src/services/coordinator/

Key Components:

Agent Registry Manager
- Maintains PostgreSQL-backed registry
- Handles agent registration/deregistration
- Tracks agent health and capabilities
Task Queue Manager
- Redis-backed task queue
- Priority-based task distribution
- Task assignment and tracking
Load Balancer
- Capability-aware routing
- Weighted round-robin with health checks
- Failover logic
Resource Monitor
- Tracks CPU, memory, active tasks per agent
- Updates metrics in PostgreSQL
- Triggers alerts for unhealthy agents

Implementation Steps

Create Coordinator Service:

// common_npm/agent-mesh/backend/src/services/coordinator/coordinator-service.ts

import { EventEmitter } from 'events';
import Redis from 'ioredis';
import { Pool } from 'pg';
import { AgentRegistry } from '../../mesh/runtime/agent-registry';
import { LoadBalancer } from '../../mesh/routing/load-balancer';

export class CoordinatorService extends EventEmitter {
  private redis: Redis;
  private pgPool: Pool;
  private registry: AgentRegistry;
  private loadBalancer: LoadBalancer;
  
  constructor(config: {
    redisUrl: string;
    postgresUrl: string;
  }) {
    super();
    this.redis = new Redis(config.redisUrl);
    this.pgPool = new Pool({ connectionString: config.postgresUrl });
    this.registry = new AgentRegistry();
    this.loadBalancer = new LoadBalancer({
      strategy: 'weighted-round-robin',
      healthCheckEnabled: true,
      failoverEnabled: true
    });
  }
  
  async registerAgent(agentInfo: {
    agentId: string;
    agentName: string;
    machineId: string;
    capabilities: string[];
    endpoint: string;
  }): Promise<void> {
    // Register in PostgreSQL
    await this.pgPool.query(
      `INSERT INTO agent_registry 
       (agent_id, agent_name, machine_id, capabilities, endpoint, status)
       VALUES ($1, $2, $3, $4, $5, 'active')
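       ON CONFLICT (agent_id) 
       DO UPDATE SET
         agent_name = EXCLUDED.agent_name,
         machine_id = EXCLUDED.machine_id,
         capabilities = EXCLUDED.capabilities,
         endpoint = EXCLUDED.endpoint,
         last_seen = NOW(),
         status = 'active'`,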
      [
        agentInfo.agentId,
        agentInfo.agentName,
        agentInfo.machineId,
        JSON.stringify(agentInfo.capabilities),
        agentInfo.endpoint
      ]
    );
    
    // Register in Redis for fast lookup
    await this.redis.hset(
      `agent:${agentInfo.agentId}`,
      'status', 'active',
      'machine', agentInfo.machineId,
      'capabilities', JSON.stringify(agentInfo.capabilities),
      'endpoint', agentInfo.endpoint,
      'last_seen', Date.now()
    );
    
    // Add to capability index
    for (const capability of agentInfo.capabilities) {
      await this.redis.sadd(`capability:${capability}`, agentInfo.agentId);
    }
    
    this.emit('agentRegistered', agentInfo);
  }
  
  async distributeTask(task: {
    id: string;
    type: string;
    requiredCapabilities: string[];
    priority: number;
    payload: any;
  }): Promise<string> {
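    // Find eligible agents
    const eligibleAgents = await this.findAgentsByCapabilities(
      task.requiredCapabilities
    );
    if (eligibleAgents.length === 0) {
      throw new Error(
        `No active agent offers: ${task.requiredCapabilities.join(', ')}`
      );
    }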
    
    // Select agent using load balancer
    const selectedAgent = this.loadBalancer.selectEndpoint(
      eligibleAgents.map(agent => ({
        id: agent.agentId,
        weight: this.calculateWeight(agent),
        health: agent.status === 'active'
      }))
    );
    
    // Assign task
    await this.redis.lpush(
      `tasks:${selectedAgent.id}`,
      JSON.stringify(task)
    );
    
    // Track task assignment
    await this.redis.hset(
      `task:${task.id}`,
      'assigned_to', selectedAgent.id,
      'status', 'assigned',
      'assigned_at', Date.now()
    );
    
    return selectedAgent.id;
  }
  
  private async findAgentsByCapabilities(
    requiredCapabilities: string[]
  ): Promise<any[]> {
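    // SINTER needs at least one key, so an empty requirement list is handled here
    if (requiredCapabilities.length === 0) return [];

    // Get agents with all required capabilities
    const agentIds = await this.redis.sinter(
      ...requiredCapabilities.map(cap => `capability:${cap}`)
    );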
    
    // Fetch agent details
    const agents = await Promise.all(
      agentIds.map(async (agentId: string) => {
        const agentData = await this.redis.hgetall(`agent:${agentId}`);
        return {
          agentId,
          ...agentData,
          capabilities: JSON.parse(agentData.capabilities || '[]')
        };
      })
    );
    
    return agents.filter(agent => agent.status === 'active');
  }
  
  private calculateWeight(agent: any): number {
    // Weight based on:
    // - Machine performance (M4 > M3)
    // - Current load
    // - Health status
    let weight = 1.0;
    
    if (agent.machine === 'mac-m4') weight *= 1.2;
    if (agent.machine === 'mac-m3') weight *= 1.0;
    
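    // Reduce weight under load. active_tasks is read from the agent's Redis hash,
    // which registerAgent does not populate, so it counts as 0 until a heartbeat
    // handler writes it.
    const load = parseFloat(agent.active_tasks || '0');
    weight *= Math.max(0.5, 1.0 - (load / 10));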
    
    return weight;
  }
}
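
The capability lookup in findAgentsByCapabilities relies on set intersection: each capability maps to a set of agent IDs, and SINTER returns the IDs present in every set. The in-memory sketch below mirrors that behaviour so the logic can be reasoned about without Redis; CapabilityIndex is a hypothetical class, not part of agent-mesh.

```typescript
// In-memory stand-in for the Redis "capability:<name>" sets.
class CapabilityIndex {
  private byCapability = new Map<string, Set<string>>();

  // Equivalent of SADD capability:<cap> <agentId> for each capability.
  add(agentId: string, capabilities: string[]): void {
    for (const cap of capabilities) {
      let members = this.byCapability.get(cap);
      if (!members) {
        members = new Set<string>();
        this.byCapability.set(cap, members);
      }
      members.add(agentId);
    }
  }

  // Agents that hold ALL required capabilities (the SINTER result), sorted
  // here only to make the output deterministic.
  findAll(required: string[]): string[] {
    if (required.length === 0) return [];
    const sets = required.map(
      cap => this.byCapability.get(cap) ?? new Set<string>()
    );
    return Array.from(sets[0])
      .filter(id => sets.every(s => s.has(id)))
      .sort();
  }
}

const capabilityIndex = new CapabilityIndex();
capabilityIndex.add('test-agent-m4-1', ['code_generation', 'testing']);
capabilityIndex.add('test-agent-m4-2', ['code_review', 'refactoring']);
capabilityIndex.add('test-agent-m3-1', ['code_generation', 'testing']);
```

A task that requires both code_generation and testing matches the first and third agent; a task that requires code_generation and code_review matches none, because no single agent holds both.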

Create Coordinator CLI/Service Entry Point:

// common_npm/agent-mesh/backend/src/services/coordinator/index.ts

import { CoordinatorService } from './coordinator-service';
import express from 'express';

const app = express();
app.use(express.json());

const coordinator = new CoordinatorService({
  redisUrl: process.env.REDIS_URL || 'redis://localhost:6379',
  postgresUrl: process.env.POSTGRES_URL || 'postgresql://localhost:5432/agent_coordinator'
});

// Agent registration endpoint
app.post('/api/v1/agents/register', async (req, res) => {
  try {
    await coordinator.registerAgent(req.body);
    res.json({ success: true });
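  } catch (error) {
    // error is typed unknown under strict TypeScript, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error);
    res.status(500).json({ error: message });
  }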
});

// Task distribution endpoint
app.post('/api/v1/tasks/distribute', async (req, res) => {
  try {
    const agentId = await coordinator.distributeTask(req.body);
    res.json({ agentId, success: true });
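  } catch (error) {
    // error is typed unknown under strict TypeScript, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error);
    res.status(500).json({ error: message });
  }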
});

// Health check endpoint
app.get('/health', async (req, res) => {
  res.json({ status: 'healthy' });
});

const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Coordinator service running on port ${PORT}`);
});
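
The registration endpoint above passes req.body straight to registerAgent, so a malformed request only fails once it reaches the database. A small validation step in front of it is one way to return a clean 400 instead. The sketch below is an assumption about how that could look; parseRegistration and AgentRegistrationBody are hypothetical names.

```typescript
// Mirrors the agentInfo parameter of registerAgent.
interface AgentRegistrationBody {
  agentId: string;
  agentName: string;
  machineId: string;
  capabilities: string[];
  endpoint: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.length > 0;
}

// Returns a typed body when every field is present and well-formed, else null.
function parseRegistration(body: unknown): AgentRegistrationBody | null {
  if (typeof body !== 'object' || body === null) return null;
  const { agentId, agentName, machineId, capabilities, endpoint } =
    body as Record<string, unknown>;
  if (
    !isNonEmptyString(agentId) ||
    !isNonEmptyString(agentName) ||
    !isNonEmptyString(machineId) ||
    !isNonEmptyString(endpoint)
  ) {
    return null;
  }
  if (!Array.isArray(capabilities) || !capabilities.every(isNonEmptyString)) {
    return null;
  }
  return { agentId, agentName, machineId, capabilities, endpoint };
}
```

In the route handler, a null result would map to res.status(400) before registerAgent is called.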

Create a launchd LaunchAgent (macOS has no systemd):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- ~/Library/LaunchAgents/com.bluefly.agent-coordinator.plist -->
<!-- launchd does not expand shell variables: replace each $LLM_ROOT with the absolute path. -->
<!-- ROUTER_IP, REDIS_PASSWORD and POSTGRES_PASSWORD are placeholders, written without angle brackets so the XML stays well-formed. -->
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.bluefly.agent-coordinator</string>
  <key>ProgramArguments</key>
  <array>
    <!-- Use the path printed by "which node"; Homebrew on Apple Silicon installs to /opt/homebrew/bin/node -->
    <string>/usr/local/bin/node</string>
    <string>$LLM_ROOT/common_npm/agent-mesh/backend/dist/services/coordinator/index.js</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>$LLM_ROOT/logs/coordinator.log</string>
  <key>StandardErrorPath</key>
  <string>$LLM_ROOT/logs/coordinator.error.log</string>
  <key>EnvironmentVariables</key>
  <dict>
    <key>REDIS_URL</key>
    <string>redis://:REDIS_PASSWORD@ROUTER_IP:6379</string>
    <key>POSTGRES_URL</key>
    <string>postgresql://agent_user:POSTGRES_PASSWORD@ROUTER_IP:5432/agent_coordinator</string>
    <key>NODE_ENV</key>
    <string>production</string>
  </dict>
</dict>
</plist>

Load service:

launchctl load ~/Library/LaunchAgents/com.bluefly.agent-coordinator.plist
launchctl start com.bluefly.agent-coordinator

Agent Registration from Both Macs

Phase 4: Agent Registration System (10 minutes)

Mac M4 Agent Registration

Agents on Mac M4 will automatically register with the coordinator on startup:

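// common_npm/agent-mesh/src/runtime/agent-registry.ts (enhancement)

import axios from 'axios';
import { EventEmitter } from 'events';

export class AgentRegistry extends EventEmitter {
  private coordinatorUrl: string;
  // MACHINE_ID tells the two Macs apart; defaults to the primary machine
  private machineId: string = process.env.MACHINE_ID || 'mac-m4';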
  
  constructor(config?: { coordinatorUrl?: string }) {
    super();
    this.coordinatorUrl = config?.coordinatorUrl || 
      process.env.COORDINATOR_URL || 
      'http://localhost:8080';
  }
  
  async register(manifest: AgentManifest): Promise<AgentRegistration> {
    // Local registration (existing code)
    const registration = await this.localRegister(manifest);
    
    // Register with coordinator
    try {
      await axios.post(`${this.coordinatorUrl}/api/v1/agents/register`, {
        agentId: registration.id,
        agentName: manifest.name,
        machineId: this.machineId,
        capabilities: manifest.capabilities.map(c => c.name),
        endpoint: this.getAgentEndpoint(registration.id)
      });
      
      // Start heartbeat
      this.startHeartbeat(registration.id);
    } catch (error) {
      console.warn('Failed to register with coordinator:', error);
      // Continue with local registration only
    }
    
    return registration;
  }
  
  private startHeartbeat(agentId: string): void {
    setInterval(async () => {
      try {
        await axios.post(`${this.coordinatorUrl}/api/v1/agents/heartbeat`, {
          agentId,
          machineId: this.machineId,
          metrics: this.getCurrentMetrics(agentId)
        });
      } catch (error) {
        console.warn('Heartbeat failed:', error);
      }
    }, 30000); // Every 30 seconds
  }
  
  private getCurrentMetrics(agentId: string): any {
    const agent = this.agents.get(agentId);
    if (!agent) return {};
    
    return {
      cpu_percent: this.getCpuUsage(),
      memory_mb: this.getMemoryUsage(),
      active_tasks: this.getActiveTaskCount(agentId),
      status: agent.status
    };
  }
}
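
Heartbeats only help if the coordinator acts on their absence. The coordinator-side handler is not shown in this document, so the following is a sketch under assumptions: agents are considered stale after three missed 30-second heartbeats (the factor of three is an illustrative choice), and findStaleAgents is a hypothetical function operating on last-seen timestamps such as the last_seen field kept in Redis.

```typescript
const HEARTBEAT_INTERVAL_MS = 30000; // matches the 30-second agent heartbeat
const MISSED_BEATS_BEFORE_STALE = 3; // assumption: tolerate two lost heartbeats

// Returns the IDs of agents whose last heartbeat is older than the cutoff.
// lastSeenMs maps agent ID to the epoch-millisecond time of its last heartbeat.
function findStaleAgents(
  lastSeenMs: Map<string, number>,
  nowMs: number
): string[] {
  const cutoff = nowMs - HEARTBEAT_INTERVAL_MS * MISSED_BEATS_BEFORE_STALE;
  const stale: string[] = [];
  lastSeenMs.forEach((seen, agentId) => {
    if (seen < cutoff) stale.push(agentId);
  });
  return stale;
}
```

A periodic job on the coordinator could call this and set status to something other than 'active' for each returned ID, which removes those agents from task distribution.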

Mac M3 Agent Registration

Same code, with the machine ID and coordinator address supplied through the environment:

# On Mac M3, set environment variables:
export MACHINE_ID=mac-m3
export COORDINATOR_URL=http://<mac-m4-ip>:8080
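
Reading both settings in one place keeps the defaults consistent between the two Macs. The sketch below shows one way to do that; resolveAgentConfig is a hypothetical helper, and the defaults are the ones used elsewhere in this document.

```typescript
interface AgentEnvConfig {
  machineId: string;
  coordinatorUrl: string;
}

// Resolves agent settings from an environment-like object, falling back to
// the Mac M4 defaults when a variable is missing or empty.
function resolveAgentConfig(
  env: Record<string, string | undefined>
): AgentEnvConfig {
  const rawUrl = env.COORDINATOR_URL || 'http://localhost:8080';
  return {
    machineId: env.MACHINE_ID || 'mac-m4',
    // Trailing slashes are dropped so `${coordinatorUrl}/api/...` never
    // produces a double slash.
    coordinatorUrl: rawUrl.replace(/\/+$/, ''),
  };
}
```

In Node, the argument would typically be process.env.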

Agent Discovery and Task Execution

When an agent needs to execute a task, it queries the coordinator:

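// common_npm/agent-mesh/src/runtime/task-executor.ts

import axios from 'axios';

export class TaskExecutor {
  private coordinatorUrl: string;
  private agentId: string; // ID this executor's agent registered under

  constructor(config: { coordinatorUrl: string; agentId: string }) {
    this.coordinatorUrl = config.coordinatorUrl;
    this.agentId = config.agentId;
  }
  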
  async executeTask(task: Task): Promise<TaskResult> {
    // If task requires coordination, route through coordinator
    if (task.requiresCoordination) {
      const response = await axios.post(
        `${this.coordinatorUrl}/api/v1/tasks/distribute`,
        {
          id: task.id,
          type: task.type,
          requiredCapabilities: task.requiredCapabilities,
          priority: task.priority,
          payload: task.payload
        }
      );
      
      // Task assigned to an agent (may be on different machine)
      // Wait for result or execute locally if assigned to this agent
      if (response.data.agentId === this.agentId) {
        return this.executeLocally(task);
      } else {
        return this.waitForRemoteResult(task.id);
      }
    } else {
      // Execute locally
      return this.executeLocally(task);
    }
  }
}
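
waitForRemoteResult is not defined in this document. If it is implemented by polling for the task's status, an exponential backoff schedule with an overall time budget keeps the polling load low. The sketch below only computes that schedule; pollDelays and its parameters are hypothetical, and polling itself is an assumption rather than the documented design.

```typescript
// Returns the sequence of waits (in ms) between polls: each wait grows by
// `factor`, is capped at `capMs`, and the total never exceeds `budgetMs`.
function pollDelays(
  initialMs: number,
  factor: number,
  capMs: number,
  budgetMs: number
): number[] {
  if (initialMs <= 0 || capMs <= 0 || factor < 1) {
    throw new RangeError('initialMs and capMs must be > 0 and factor >= 1');
  }
  const delays: number[] = [];
  let next = initialMs;
  let remaining = budgetMs;
  while (remaining > 0) {
    // The final delay is shortened so the budget is used exactly.
    const delay = Math.min(next, capMs, remaining);
    delays.push(delay);
    remaining -= delay;
    next *= factor;
  }
  return delays;
}

const schedule = pollDelays(100, 2, 1000, 3000);
// schedule is [100, 200, 400, 800, 1000, 500]
```

When the schedule is exhausted without a result, the caller can treat the task as timed out and resubmit or fail it.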

Load Balancing Configuration

Phase 5: Load Balancing Strategy (5 minutes)

The coordinator implements capability-aware weighted round-robin:

Algorithm:

Capability Matching: Filter agents by required capabilities
Health Check: Only consider healthy agents (status = 'active', recent heartbeat)
Weight Calculation:
- Base weight: Machine type (M4 = 1.2, M3 = 1.0)
- Load factor: Multiply by max(0.5, 1 - active_tasks / 10), so the weight bottoms out at 0.5x once an agent has 5 or more active tasks
- Health factor: Unhealthy agents = 0 weight
Selection: Weighted round-robin from eligible agents
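
The steps above can be sketched end to end. The LoadBalancer class from agent-mesh is not shown in this document, so the selection step here uses smooth weighted round-robin (the variant popularised by nginx) as a stand-in; Candidate, agentWeight and SmoothWeightedRoundRobin are hypothetical names, and the weight formula is the one from calculateWeight.

```typescript
interface Candidate {
  id: string;
  machine: 'mac-m4' | 'mac-m3';
  activeTasks: number;
  healthy: boolean;
}

const MACHINE_WEIGHT: Record<Candidate['machine'], number> = {
  'mac-m4': 1.2,
  'mac-m3': 1.0,
};

// Base weight by machine, scaled down under load, zero when unhealthy.
function agentWeight(c: Candidate): number {
  if (!c.healthy) return 0;
  return MACHINE_WEIGHT[c.machine] * Math.max(0.5, 1 - c.activeTasks / 10);
}

class SmoothWeightedRoundRobin {
  private current = new Map<string, number>();

  // Each pick adds every agent's weight to its running score, selects the
  // highest score, then subtracts the total weight from the winner. Over
  // time each agent is chosen in proportion to its weight.
  pick(agents: { id: string; weight: number }[]): string | null {
    let total = 0;
    let bestId: string | null = null;
    let bestScore = -Infinity;
    for (const agent of agents) {
      if (agent.weight <= 0) continue; // skip unhealthy agents
      total += agent.weight;
      const score = (this.current.get(agent.id) ?? 0) + agent.weight;
      this.current.set(agent.id, score);
      if (score > bestScore) {
        bestScore = score;
        bestId = agent.id;
      }
    }
    if (bestId === null) return null;
    this.current.set(bestId, bestScore - total);
    return bestId;
  }
}
```

With weights of 3 and 1, four consecutive picks select the heavier agent three times and the lighter agent once, and the picks are interleaved rather than bunched together.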

Configuration:

// common_npm/agent-mesh/backend/src/services/coordinator/load-balancer-config.ts

export const LoadBalancerConfig = {
  strategy: 'weighted-round-robin',
  healthCheckInterval: 30000, // 30 seconds
  healthCheckTimeout: 5000,   // 5 seconds
  maxRetries: 3,
  failoverEnabled: true,
  
  weights: {
    'mac-m4': 1.2,
    'mac-m3': 1.0
  },
  
  loadThresholds: {
    maxActiveTasks: 10,
    cpuThreshold: 80,      // %
    memoryThreshold: 2048  // MB
  }
};
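
The document does not show where loadThresholds is consumed. One plausible use is an overload check run on each metrics report; the sketch below assumes that reading, and overloadReasons is a hypothetical helper.

```typescript
interface LoadThresholds {
  maxActiveTasks: number;
  cpuThreshold: number; // percent
  memoryThreshold: number; // MB
}

interface AgentLoad {
  activeTasks: number;
  cpuPercent: number;
  memoryMb: number;
}

// Lists which thresholds an agent currently exceeds; an empty array means
// the agent can take more work.
function overloadReasons(load: AgentLoad, limits: LoadThresholds): string[] {
  const reasons: string[] = [];
  if (load.activeTasks >= limits.maxActiveTasks) reasons.push('tasks');
  if (load.cpuPercent > limits.cpuThreshold) reasons.push('cpu');
  if (load.memoryMb > limits.memoryThreshold) reasons.push('memory');
  return reasons;
}

const sampleLimits: LoadThresholds = {
  maxActiveTasks: 10,
  cpuThreshold: 80,
  memoryThreshold: 2048,
};
```

Returning the reasons rather than a boolean makes the result usable both for routing decisions and for alert messages.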

Resource Monitoring

Phase 6: Resource Monitoring System (10 minutes)

Metrics Collection

Each agent reports metrics to the coordinator:

// common_npm/agent-mesh/src/runtime/metrics-collector.ts

import axios from 'axios';
import os from 'os';
import { performance } from 'perf_hooks';

export class MetricsCollector {
  private agentId: string;
  private coordinatorUrl: string;
  
  startCollection(): void {
    setInterval(() => {
      this.collectAndReport();
    }, 30000); // Every 30 seconds
  }
  
  private async collectAndReport(): Promise<void> {
    const metrics = {
      agentId: this.agentId,
      timestamp: new Date().toISOString(),
      cpu: {
        percent: this.getCpuUsage(),
        cores: os.cpus().length
      },
      memory: {
        used_mb: this.getMemoryUsage(),
        total_mb: os.totalmem() / 1024 / 1024,
        percent: (1 - os.freemem() / os.totalmem()) * 100 // share of memory in use
      },
      tasks: {
        active: this.getActiveTaskCount(),
        completed: this.getCompletedTaskCount(),
        failed: this.getFailedTaskCount()
      },
      network: {
        latency_ms: await this.measureLatency()
      }
    };
    
    // Report to coordinator
    try {
      await axios.post(
        `${this.coordinatorUrl}/api/v1/metrics/report`,
        metrics
      );
    } catch (error) {
      console.warn('Failed to report metrics:', error);
    }
  }
  
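  private getCpuUsage(): number {
    // os.cpus() times are cumulative since boot, so this returns the average
    // utilization since boot rather than the current load
    const cpus = os.cpus();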
    let totalIdle = 0;
    let totalTick = 0;
    
    for (const cpu of cpus) {
      for (const type in cpu.times) {
        totalTick += cpu.times[type as keyof typeof cpu.times];
      }
      totalIdle += cpu.times.idle;
    }
    
    const idle = totalIdle / cpus.length;
    const total = totalTick / cpus.length;
    const usage = 100 - ~~(100 * idle / total);
    
    return usage;
  }
  
  private getMemoryUsage(): number {
    // Rounded because agent_metrics.memory_mb is an INTEGER column
    return Math.round((os.totalmem() - os.freemem()) / 1024 / 1024);
  }
}
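
Because the counters returned by os.cpus() accumulate from boot, a reading of current CPU load needs two samples and the difference between them. The sketch below shows that calculation on plain numbers; CpuSnapshot and cpuUsageBetween are hypothetical names, and a snapshot would be built by summing the idle time and all time fields across os.cpus().

```typescript
// Aggregate CPU time counters at one moment, in the same units for both fields.
interface CpuSnapshot {
  idle: number;
  total: number;
}

// Percentage of CPU time spent non-idle between two snapshots.
function cpuUsageBetween(prev: CpuSnapshot, next: CpuSnapshot): number {
  const totalDelta = next.total - prev.total;
  if (totalDelta <= 0) return 0; // no time elapsed, or counters were reset
  const idleDelta = next.idle - prev.idle;
  return 100 * (1 - idleDelta / totalDelta);
}
```

For example, if total time advanced by 200 units and idle time by 50 between samples, the CPU was busy for 75% of the interval. The collector could keep the previous snapshot as a field and call this once per 30-second cycle.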

Coordinator Metrics Storage

Coordinator stores metrics in PostgreSQL:

// In coordinator-service.ts

async storeMetrics(metrics: AgentMetrics): Promise<void> {
  await this.pgPool.query(
    `INSERT INTO agent_metrics 
     (agent_id, timestamp, cpu_percent, memory_mb, active_tasks, completed_tasks, failed_tasks)
     VALUES ($1, $2, $3, $4, $5, $6, $7)`,
    [
      metrics.agentId,
      metrics.timestamp,
      metrics.cpu.percent,
      metrics.memory.used_mb,
      metrics.tasks.active,
      metrics.tasks.completed,
      metrics.tasks.failed
    ]
  );
}

Monitoring Dashboard (Optional)

Create a simple monitoring endpoint:

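// In coordinator-service.ts: pgPool is private to the service, so the queries live in a method

async getDashboard() {
  const agents = await this.pgPool.query(
    `SELECT 
       agent_id, agent_name, machine_id, status, 
       last_seen, capabilities
     FROM agent_registry
     WHERE status = 'active'
     ORDER BY machine_id, agent_name`
  );
  
  // active_tasks is a point-in-time gauge, so it is averaged rather than summed.
  // The SUMs assume completed/failed are per-interval counts; use MAX instead if
  // agents report running totals.
  const metrics = await this.pgPool.query(
    `SELECT 
       agent_id, 
       AVG(cpu_percent) as avg_cpu,
       AVG(memory_mb) as avg_memory,
       AVG(active_tasks) as avg_active_tasks,
       SUM(completed_tasks) as total_completed,
       SUM(failed_tasks) as total_failed
     FROM agent_metrics
     WHERE timestamp > NOW() - INTERVAL '1 hour'
     GROUP BY agent_id`
  );
  
  return {
    agents: agents.rows,
    metrics: metrics.rows,
    summary: {
      total_agents: agents.rows.length,
      agents_by_machine: {
        'mac-m4': agents.rows.filter(a => a.machine_id === 'mac-m4').length,
        'mac-m3': agents.rows.filter(a => a.machine_id === 'mac-m3').length
      }
    }
  };
}

// In index.ts

app.get('/api/v1/monitoring/dashboard', async (req, res) => {
  try {
    res.json(await coordinator.getDashboard());
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    res.status(500).json({ error: message });
  }
});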
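
The summary hardcodes the two machine IDs, so a third machine would be silently left out of agents_by_machine. A generic tally avoids that; countByMachine is a hypothetical helper that works on rows shaped like the agent_registry query result.

```typescript
// Counts registry rows per machine_id without assuming which machines exist.
function countByMachine(
  rows: { machine_id: string }[]
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.machine_id] = (counts[row.machine_id] ?? 0) + 1;
  }
  return counts;
}

const machineCounts = countByMachine([
  { machine_id: 'mac-m4' },
  { machine_id: 'mac-m3' },
  { machine_id: 'mac-m4' },
]);
// machineCounts is { 'mac-m4': 2, 'mac-m3': 1 }
```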

Testing and Validation

Phase 7: Testing Task Distribution (5 minutes)

Test Script

// test-coordination.ts

import axios from 'axios';

const COORDINATOR_URL = process.env.COORDINATOR_URL || 'http://localhost:8080';

async function testCoordination() {
  console.log('Testing agent coordination...');
  
  // 1. Register test agents from both Macs
  console.log('\n1. Registering agents...');
  
  // Mac M4 agents
  await axios.post(`${COORDINATOR_URL}/api/v1/agents/register`, {
    agentId: 'test-agent-m4-1',
    agentName: 'Test Agent M4-1',
    machineId: 'mac-m4',
    capabilities: ['code_generation', 'testing'],
    endpoint: 'http://mac-m4:3000/agent/test-agent-m4-1'
  });
  
  await axios.post(`${COORDINATOR_URL}/api/v1/agents/register`, {
    agentId: 'test-agent-m4-2',
    agentName: 'Test Agent M4-2',
    machineId: 'mac-m4',
    capabilities: ['code_review', 'refactoring'],
    endpoint: 'http://mac-m4:3000/agent/test-agent-m4-2'
  });
  
  // Mac M3 agents
  await axios.post(`${COORDINATOR_URL}/api/v1/agents/register`, {
    agentId: 'test-agent-m3-1',
    agentName: 'Test Agent M3-1',
    machineId: 'mac-m3',
    capabilities: ['code_generation', 'testing'],
    endpoint: 'http://mac-m3:3000/agent/test-agent-m3-1'
  });
  
  await axios.post(`${COORDINATOR_URL}/api/v1/agents/register`, {
    agentId: 'test-agent-m3-2',
    agentName: 'Test Agent M3-2',
    machineId: 'mac-m3',
    capabilities: ['documentation', 'analysis'],
    endpoint: 'http://mac-m3:3000/agent/test-agent-m3-2'
  });
  
  console.log('Agents registered');
  
  // 2. Distribute tasks
  console.log('\n2. Distributing tasks...');
  
  const tasks = [
    { id: 'task-1', type: 'code_generation', requiredCapabilities: ['code_generation'] },
    { id: 'task-2', type: 'code_review', requiredCapabilities: ['code_review'] },
    { id: 'task-3', type: 'testing', requiredCapabilities: ['testing'] },
    { id: 'task-4', type: 'documentation', requiredCapabilities: ['documentation'] },
  ];
  
  for (const task of tasks) {
    const response = await axios.post(
      `${COORDINATOR_URL}/api/v1/tasks/distribute`,
      {
        ...task,
        priority: 1,
        payload: { test: true }
      }
    );
    
    console.log(`Task ${task.id} assigned to: ${response.data.agentId}`);
  }
  
  // 3. Check monitoring dashboard
  console.log('\n3. Checking monitoring dashboard...');
  const dashboard = await axios.get(`${COORDINATOR_URL}/api/v1/monitoring/dashboard`);
  console.log('Dashboard data:', JSON.stringify(dashboard.data, null, 2));
  
  console.log('\nCoordination test complete!');
}

testCoordination().catch(console.error);

Security Considerations

Network Security

Redis Authentication: Always use strong passwords
PostgreSQL Authentication: Use strong passwords, limit network access
TLS/SSL: Consider Redis's built-in TLS support (Redis 6 and later) or an stunnel wrapper for production
Firewall Rules: Only allow connections from known Mac IPs
VPN Option: Consider using VPN for secure communication

Agent Authentication

API Keys: Each agent should authenticate with coordinator using API key
Token Rotation: Rotate tokens regularly
Rate Limiting: Implement rate limiting on coordinator endpoints

Implementation

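// Add authentication middleware to the coordinator.
// Register it before the /api/v1 routes: Express runs middleware in registration order.

// Placeholder keys; load real keys from the environment or a secrets store
const API_KEYS = new Map<string, string>([
  ['mac-m4-key', 'mac-m4'],
  ['mac-m3-key', 'mac-m3']
]);

app.use('/api/v1', (req, res, next) => {
  // Header values can be string, string[] or undefined
  const apiKey = req.headers['x-api-key'];
  if (typeof apiKey !== 'string' || !API_KEYS.has(apiKey)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  // res.locals carries per-request data without extending the Request type
  res.locals.machineId = API_KEYS.get(apiKey);
  next();
});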
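
Rate limiting is listed above without an implementation. A token bucket is one common approach: each client gets a bucket that refills at a steady rate and each request spends one token. The sketch below is an illustration under that assumption, not the coordinator's actual limiter; TokenBucket is a hypothetical class, and time is passed in explicitly so the behaviour is easy to test.

```typescript
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends a token if the request is allowed at time nowMs.
  tryConsume(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With a capacity of 2 and a refill rate of 1 token per second, two requests at the same instant pass, a third is rejected, and one more is allowed a second later. In the middleware, one bucket per API key (kept in a Map) would let a rejected request return HTTP 429.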

Deployment Checklist

Pre-Deployment

Redis installed and accessible from both Macs
PostgreSQL installed and accessible from both Macs
Database schema created
Coordinator service code implemented
Environment variables configured
Firewall rules configured
API keys generated and distributed

Mac M4 Setup

Coordinator service installed
LaunchAgent configured
Service running and healthy
Agents configured to register with coordinator
Test agent registration successful

Mac M3 Setup

Environment variables set (COORDINATOR_URL, MACHINE_ID, API_KEY)
Agents configured to register with coordinator
Test agent registration successful
Network connectivity verified

Validation

Agents from both Macs visible in coordinator
Task distribution working across machines
Load balancing distributing tasks correctly
Metrics collection working
Health checks functioning
Failover working (test by stopping one agent)

Troubleshooting Guide

Common Issues

1. Agents Not Registering

Symptoms: Agents don't appear in coordinator dashboard

Diagnosis:

# Check coordinator logs
tail -f ~/Sites/LLM/logs/coordinator.log

# Check Redis connectivity
redis-cli -h <router-ip> -p 6379 -a <password> PING

# Check PostgreSQL connectivity
psql -h <router-ip> -U agent_user -d agent_coordinator -c "SELECT COUNT(*) FROM agent_registry;"

Solutions:

Verify network connectivity
Check firewall rules
Verify credentials
Check coordinator service is running

2. Tasks Not Distributing

Symptoms: Tasks stay in queue, not assigned to agents

Diagnosis:

# Check Redis task queues
redis-cli -h <router-ip> -p 6379 -a <password> --scan --pattern "tasks:*"

# Check agent capabilities
redis-cli -h <router-ip> -p 6379 -a <password> SMEMBERS "capability:code_generation"

Solutions:

Verify agents have required capabilities
Check load balancer configuration
Verify agent health status

3. High Latency

Symptoms: Tasks take a long time to distribute

Diagnosis:

Check network latency between Macs and router
Check Redis/PostgreSQL performance
Review coordinator service logs for bottlenecks

Solutions:

Optimize database queries
Consider Redis connection pooling
Review load balancer algorithm

Performance Optimization

Redis Optimization

Connection Pooling: Use connection pools for Redis
Pipelining: Batch Redis operations
Memory Management: Configure Redis maxmemory policy

PostgreSQL Optimization

Indexes: Ensure proper indexes on frequently queried columns
Connection Pooling: Use pgBouncer or similar
Query Optimization: Review slow queries

Load Balancer Tuning

Weights: Adjust machine weights based on actual performance
Health Check Interval: Balance between responsiveness and overhead
Task Batching: Batch small tasks for efficiency

Future Enhancements

Phase 8: Advanced Features (Future)

Dynamic Scaling: Automatically spawn agents based on load
Task Prioritization: Implement priority queues
Agent Specialization: Route tasks to specialized agents
Cost Tracking: Track token usage per agent/machine
Auto-Failover: Automatically restart failed agents
Distributed Tracing: Full observability across machines
Web Dashboard: Real-time monitoring UI
Mobile Notifications: Alert on critical issues

Summary

This plan provides a complete technical approach to coordinating agents across two Mac machines:

Shared Infrastructure: Redis and PostgreSQL on router for coordination
Coordinator Service: Central service on Mac M4 managing task distribution
Agent Registration: Both Macs register agents with coordinator
Load Balancing: Capability-aware weighted distribution
Resource Monitoring: Real-time metrics collection and storage

Total Implementation Time:

Basic setup: ~20 minutes
Full production deployment: ~2-4 hours
Testing and validation: ~30 minutes

Expected Benefits:

Up to roughly double the agent processing capacity
Better resource utilization
Automatic failover and load balancing
Centralized monitoring and management