Infrastructure Migration Summary - 2026-01-23

Purpose: Complete guide for infrastructure migration to NAS - includes Docker services, git repositories, worktrees, and wikis

Status: ✅ MIGRATION COMPLETE - 100% NAS-Based Infrastructure


WHAT GOT MOVED

1. Git Infrastructure Moved to NAS (2026-01-23)

All Repositories (67 GitLab Projects)

  • Status: ✅ COMPLETE - All repos are bare repos on NAS
  • Location: /Volumes/AgentPlatform/repos/bare/blueflyio/
  • Size: 9.4GB total
  • Organization:
    • Top-level repos: gitlab_components.git, platform-agents.git, security-policies.git
    • Agent-platform group: /Volumes/AgentPlatform/repos/bare/blueflyio/agent-platform/[project].git
    • OSSA group: /Volumes/AgentPlatform/repos/bare/blueflyio/ossa/
  • Local Action: NO local clones exist or needed - all work via NAS worktrees

All Worktrees (NAS-Centralized Strategy)

  • Status: ✅ COMPLETE - 100% NAS-based worktree workflow
  • Location: /Volumes/AgentPlatform/worktrees/[DEVICE]/[DATE]/[PROJECT]/[BRANCH]/
  • Device Namespaces:
    • shared/ - Multi-device work (Mac M3, M4, code-server, phone, iPad)
    • m3/ - Mac M3 performance-specific work
    • m4/ - Mac M4 performance-specific work
  • Old Local Worktrees: ✅ REMOVED
    • Cleaned up: ~/Sites/blueflyio/.worktrees/ (all old local worktrees removed)
    • Freed: ~500MB
    • All bare repos pruned to remove stale registrations
  • Benefits: Work from ANY device - same files, same state, always in sync
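
The path convention above can be sketched as a small helper. This is an illustration only: the function name, the `NAS_ROOT` variable, the `YYYY-MM-DD` date format, and the branch name `my-branch` are assumptions, not part of the documented workflow.

```shell
# Sketch only: NAS_ROOT and worktree_path are illustrative names.
NAS_ROOT="${NAS_ROOT:-/Volumes/AgentPlatform}"

# Build a worktree path following [DEVICE]/[DATE]/[PROJECT]/[BRANCH].
worktree_path() {
  local device="$1" date="$2" project="$3" branch="$4"
  printf '%s/worktrees/%s/%s/%s/%s\n' \
    "$NAS_ROOT" "$device" "$date" "$project" "$branch"
}

# Example (commented out; needs the NAS mounted and an existing branch):
# bare="$NAS_ROOT/repos/bare/blueflyio/platform-agents.git"
# git --git-dir="$bare" worktree add \
#   "$(worktree_path shared "$(date +%F)" platform-agents my-branch)" my-branch
```

Keeping the path construction in one function makes it harder for devices to drift apart on directory layout, which matters when several machines register worktrees against the same bare repo.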

All Wikis (Documentation)

  • Status: ✅ COMPLETE - All wikis on NAS only
  • Location: /Volumes/AgentPlatform/wikis/blueflyio/
  • Wikis Migrated:
    • technical-docs.wiki/ - Platform documentation (already had GitLab origin)
    • api_normalization.wiki/ - API Normalization project docs
  • Old Local Wikis: ✅ REMOVED
    • Cleaned up: ~/Sites/blueflyio/_WIKI/ (entire directory removed)
    • Freed: 2.7GB (including old .zip backups)
    • No symlinks or local copies remain
  • Access: Direct git operations in NAS wiki directories (no worktrees needed for wikis)

Infrastructure Alignment Complete

  • AGENTS.md: ✅ Updated to NAS-centralized workflow (v1.1.0)
  • CLAUDE.md: ✅ Updated with NAS worktree paths and device namespaces
  • Wiki Docs: ✅ Updated separation-of-duties.md, system-overview.md, platform-overview.md
  • Architecture Decisions: ✅ worktree-strategy-nas-centralized.md marked APPROVED
  • Local Setup Required: MINIMAL
    • ~/Sites/blueflyio/CLAUDE.md (instructions)
    • ~/Sites/blueflyio/.claude/ (config)
    • Mount NAS: /Volumes/AgentPlatform/
    • NO repos, NO worktrees, NO wikis locally

2. Services Moved to NAS (blueflynas.tailcf98b3.ts.net)

Agent-Tracer (Knowledge Graph API)

  • Status: Deployed on NAS K3s cluster
  • Location: development namespace, port 3007
  • Pod: agent-tracer-67df67fc46-q9kzh
  • Local Docker: Can be removed (postgres-tracer, redis-tracer, agent-tracer containers)

KAGENT Platform (23 Agents)

  • Status: Running in K3s kagent namespace
  • Agents: Assessment, Onboarding, Renewal, ROI Calculator, K8s agents, Helm agents, etc.
  • API/UI: KAGENT API, UI, and controller all on NAS
  • Local Docker: Can be cleaned up (old KAGENT images)

Phoenix Observability

  • Status: Running in K3s monitoring namespace
  • Local Docker: Old Phoenix images can be removed

3. Services Moved to Vast.ai

LLMLingua Compression Service

  • Status: Deployed on Vast.ai RTX 4090
  • Instance ID: 30386215
  • SSH: ssh -p 26214 root@ssh9.vast.ai
  • Cost: $0.29/hr (roughly $207-209/month if run continuously)
  • Purpose: Token compression (79x reduction for large contexts)
  • Local Docker: Never existed locally (deployed directly to Vast.ai)
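
The monthly figure depends on how many hours the instance actually runs. A quick way to check the arithmetic, assuming a flat hourly rate and no storage or bandwidth charges (both are assumptions; the hour counts below are illustrative):

```shell
# Multiply an hourly rate by a number of hours and print dollars with two decimals.
monthly_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

monthly_cost 0.29 720   # continuous use over a 30-day month (24 x 30 hours)
monthly_cost 0.29 160   # hypothetical part-time use: 8 hours on 20 days
```

Stopping the instance when compression is not needed changes the bill roughly in proportion to the hours saved.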

WHAT TO DO ON YOUR OTHER MAC

Phase 0: Git Infrastructure Setup (REQUIRED FIRST)

NAS-centralized workflow means NOTHING local except instructions:

  1. Mount NAS (Finder: ⌘K):

    # Mount via NFS
    nfs://192.168.68.54/volume1/AgentPlatform
    # Or via Tailscale:
    nfs://blueflynas.tailcf98b3.ts.net/volume1/AgentPlatform
    # Verify mount
    ls /Volumes/AgentPlatform/
    # Should show: repos/, worktrees/, wikis/
  2. Create minimal local directory:

    mkdir -p ~/Sites/blueflyio/.claude
    cd ~/Sites/blueflyio/
  3. Copy CLAUDE.md from NAS:

    # Option A: From wiki
    cp /Volumes/AgentPlatform/wikis/blueflyio/technical-docs.wiki/getting-started/CLAUDE.md ~/Sites/blueflyio/
    # Option B: From this Mac (via Tailscale)
    scp bluefly.tailcf98b3.ts.net:~/Sites/blueflyio/CLAUDE.md ~/Sites/blueflyio/
  4. That's it! Everything else (repos, worktrees, wikis) is on NAS.
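
Before relying on the mount, it may help to confirm that the three expected directories are present. A minimal sketch, assuming the default mount point from step 1; the function name `check_nas_mount` is illustrative:

```shell
# Return 0 only if the mount point contains repos/, worktrees/, and wikis/.
check_nas_mount() {
  local root="${1:-/Volumes/AgentPlatform}" dir missing=0
  for dir in repos worktrees wikis; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $root/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_nas_mount && echo "NAS mount looks complete"
```

A check like this catches the case where /Volumes/AgentPlatform exists as an empty directory because the NFS mount silently failed.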

Complete setup guide: /Volumes/AgentPlatform/wikis/blueflyio/technical-docs.wiki/getting-started/nas-setup-guide.md

Also see: /tmp/other-computer-setup.md (quick reference created for Mac M3)


Phase 1: Docker Cleanup (Safe - Do This Second)

Run this single command to remove all unused Docker resources:

docker system prune -af --volumes

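This removes:

  • All images not used by at least one container (the -a flag goes beyond dangling images)
  • Stopped containers
  • Build cache
  • Unused volumes (on Docker Engine 23.0 and later, --volumes prunes only anonymous volumes)
  • Unused networks

Expected recovery: 20-30 GB (varies by Mac)

What it does NOT remove:

  • Running containers
  • Images used by running containers
  • Volumes attached to running containers

Before running it, start every stack you intend to keep (see WHAT TO KEEP LOCALLY). A stopped agent-router, agent-mesh, or DDEV stack counts as unused, so its containers and images would be removed.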
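
One way to guard the prune is to check first that the stacks listed under WHAT TO KEEP LOCALLY have running containers. The sketch below assumes container names begin with the Compose project name (for example `agent-router-...`), which is the usual Compose naming but has not been verified against these stacks; `missing_stacks` is an illustrative helper name.

```shell
# Print each required stack prefix that has no running container.
# stdin: container names, one per line,
#        e.g. from: docker ps --format '{{.Names}}'
missing_stacks() {
  local names prefix
  names="$(cat)"
  for prefix in "$@"; do
    printf '%s\n' "$names" | grep -q "^${prefix}" || printf '%s\n' "$prefix"
  done
}

# Usage (commented out; requires Docker):
# missing="$(docker ps --format '{{.Names}}' | missing_stacks agent-router agent-mesh)"
# [ -z "$missing" ] && docker system prune -af --volumes
```

If the function prints anything, start those stacks first; once they are running, their containers, images, and attached volumes are left untouched by the prune.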

Phase 2: Service-Specific Cleanup (After Verification)

A. Agent-Tracer Local Services

Verify it's on NAS first:

# Confirm it's running on NAS (ask primary Mac user to verify)
kubectl get pods -n development | grep agent-tracer

If confirmed on NAS, remove local:

# Stop and remove local agent-tracer stack
docker ps | grep tracer # Should show nothing after NAS deployment
# If any tracer containers exist locally:
docker stop $(docker ps -q --filter "name=tracer")
docker rm $(docker ps -aq --filter "name=tracer")
# Remove tracer images
docker images | grep tracer | awk '{print $3}' | xargs docker rmi -f

B. Old Phoenix Images

Keep only latest, remove old versions:

# List Phoenix images
docker images arizephoenix/phoenix
# Remove all but the latest (they're running on NAS now)
docker images arizephoenix/phoenix --format "{{.ID}} {{.CreatedAt}}" | tail -n +2 | awk '{print $1}' | xargs docker rmi -f

Expected recovery: ~5 GB

C. Old LibreChat Images

# Remove old versions
docker images ghcr.io/danny-avila/librechat | grep '<none>' | awk '{print $3}' | xargs docker rmi -f

Expected recovery: ~4-5 GB

D. Old ClickHouse Images

docker images clickhouse/clickhouse-server | grep '<none>' | awk '{print $3}' | xargs docker rmi -f

Expected recovery: ~1-2 GB

E. Old Prometheus Images

docker images prom/prometheus | grep '<none>' | awk '{print $3}' | xargs docker rmi -f

Expected recovery: ~1 GB


Phase 3: GitLab Runner Cache Cleanup (If Applicable)

Check if GitLab runners are active:

docker ps | grep gitlab-runner

If NO runners active, clean up cache volumes:

docker volume ls | grep runner- | awk '{print $2}' | xargs docker volume rm

Expected recovery: 1-2 GB


Phase 4: Old Compose Stack Volumes (Selective)

Only remove if you've confirmed these services are on NAS/K3s:

CSMA Platform Volumes (if on K3s)

docker volume rm csma-platform_grafana-data \
  csma-platform_langflow-data \
  csma-platform_loki-data \
  csma-platform_neo4j-data \
  csma-platform_neo4j-logs \
  csma-platform_phoenix-data \
  csma-platform_postgres-data \
  csma-platform_prometheus-data \
  csma-platform_redis-data

Old Infrastructure Volumes

docker volume rm infrastructure_librechat_logs \
  infrastructure_librechat_meilisearch_data \
  infrastructure_librechat_mongo_data \
  infrastructure_librechat_storage

Expected recovery: 5-10 GB
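
Since volume removal is irreversible, it can be worth previewing which of these volumes actually exist on a given Mac before deleting anything. A minimal sketch; the helper name `select_volumes` is illustrative, and the two prefixes are taken from the volume names listed above:

```shell
# Keep only volume names that start with one of the given prefixes.
# stdin: volume names, one per line,
#        e.g. from: docker volume ls --format '{{.Name}}'
select_volumes() {
  local line prefix
  while IFS= read -r line; do
    for prefix in "$@"; do
      case "$line" in
        "$prefix"*) printf '%s\n' "$line"; break ;;
      esac
    done
  done
}

# Preview (commented out; requires Docker):
# docker volume ls --format '{{.Name}}' | select_volumes csma-platform_ infrastructure_librechat_
```

Reviewing the printed names first avoids both "no such volume" errors and accidental removal of a similarly named volume that still holds data.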


WHAT TO KEEP LOCALLY

Active Docker Compose Stacks (DO NOT REMOVE)

1. agent-router Stack

  • Status: Keep for local development/monitoring
  • Services: Grafana, Prometheus, Postgres, Redis, Qdrant, Mongo, and others (10 services total)
  • Purpose: LLM Platform infrastructure

2. agent-mesh Stack

  • Status: Keep for local KAGENT development
  • Services: postgres, redis, qdrant (3 services)
  • Purpose: KAGENT local services

3. DDEV Stacks (If Active)

  • ddev-drupal-demo - Keep if actively used for demos
  • ddev-ssh-agent - Keep (DDEV infrastructure)

SERVICES STATUS REFERENCE

Running on NAS (K3s Cluster)

| Service | Namespace | Status | Access |
|---|---|---|---|
| agent-tracer | development | Running | http://blueflynas:3007 |
| KAGENT Platform | kagent | Running (23 pods) | Various ports |
| Phoenix | monitoring | Running | Monitoring stack |

Running on Vast.ai

| Service | Instance | Status | Access |
|---|---|---|---|
| LLMLingua | 30386215 | Running | ssh -p 26214 root@ssh9.vast.ai |

Deprecated Locally (Safe to Remove)

  • Agent-tracer local containers
  • Old Phoenix images
  • Old LibreChat images
  • Old ClickHouse images
  • Old Prometheus images
  • GitLab runner caches (if no active runners)
  • CSMA platform volumes (if on K3s)

COMPLETE CLEANUP COMMANDS

⚠️ FILE POLICY NOTE: The commands below were previously packaged as a .sh script, but .sh scripts are restricted per project FILE POLICY. Run these commands manually or via npm scripts instead.

Run these commands manually after verification:

# Complete Docker cleanup after migration to NAS and Vast.ai
set -e
echo "=== Phase 1: Remove Unused Resources ==="
docker system prune -af --volumes
echo "✅ Phase 1 Complete"
echo ""
echo "=== Phase 2: Remove Old Service Images ==="
# Phoenix
docker images arizephoenix/phoenix --format "{{.ID}}" | tail -n +2 | xargs -r docker rmi -f
# LibreChat
docker images ghcr.io/danny-avila/librechat | grep '<none>' | awk '{print $3}' | xargs -r docker rmi -f
# ClickHouse
docker images clickhouse/clickhouse-server | grep '<none>' | awk '{print $3}' | xargs -r docker rmi -f
# Prometheus
docker images prom/prometheus | grep '<none>' | awk '{print $3}' | xargs -r docker rmi -f
echo "✅ Phase 2 Complete"
echo ""
echo "=== Phase 3: Remove GitLab Runner Caches ==="
docker volume ls --format "{{.Name}}" | grep runner- | xargs -r docker volume rm
echo "✅ Phase 3 Complete"
echo ""
echo "=== Phase 4: Remove Old Stack Volumes ==="
# CSMA Platform (now on K3s)
docker volume rm -f csma-platform_grafana-data \
  csma-platform_langflow-data \
  csma-platform_loki-data \
  csma-platform_neo4j-data \
  csma-platform_neo4j-logs \
  csma-platform_phoenix-data \
  csma-platform_postgres-data \
  csma-platform_prometheus-data \
  csma-platform_redis-data 2>/dev/null || true
# Old infrastructure
docker volume rm -f infrastructure_librechat_logs \
  infrastructure_librechat_meilisearch_data \
  infrastructure_librechat_mongo_data \
  infrastructure_librechat_storage 2>/dev/null || true
echo "✅ Phase 4 Complete"
echo ""
echo "=== Cleanup Summary ==="
docker system df
echo ""
echo "✅ All cleanup complete!"

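Usage: Copy and paste the commands above into your terminal, or package them in an npm script in package.json (preferred over .sh files). Because the block starts with set -e, pasting it into an interactive shell closes that shell on the first failing command; wrap the commands in a subshell ( ... ) or omit set -e when pasting.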


VERIFICATION STEPS

1. Check What's Running Locally

docker ps

2. Check Disk Usage Before/After

docker system df

3. Verify Services on NAS (From Primary Mac)

# Check K3s pods
kubectl get pods -n development
kubectl get pods -n kagent
kubectl get pods -n monitoring
# Check NAS services
curl http://blueflynas.tailcf98b3.ts.net:3007/health
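
Reading pod listings by eye is error-prone with 23 or more pods, so the check can be reduced to a pass/fail result. A minimal sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE); the helper name `all_running` is illustrative:

```shell
# Succeed only if there is at least one pod and every pod has STATUS "Running".
# stdin: output of: kubectl get pods -n <namespace> --no-headers
all_running() {
  awk '$3 != "Running" { print "not running: " $1; bad = 1 }
       END { exit (bad || NR == 0) }'
}

# Usage (commented out; requires cluster access):
# for ns in development kagent monitoring; do
#   kubectl get pods -n "$ns" --no-headers | all_running || echo "check namespace: $ns"
# done
```

Note that this treats any status other than Running as a failure, including Completed pods left behind by finished jobs, so a namespace with job pods would need a looser filter.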

4. Verify LLMLingua on Vast.ai

ssh -p 26214 root@ssh9.vast.ai 'curl -s http://localhost:8000/health'

EXPECTED DISK RECOVERY

| Phase | Recovery | Safety |
|---|---|---|
| Phase 1: Unused Resources | 20-30 GB | ✅ Safe |
| Phase 2: Old Images | 10-15 GB | ✅ Safe (after NAS verification) |
| Phase 3: Runner Caches | 1-2 GB | ✅ Safe (if no active runners) |
| Phase 4: Old Volumes | 5-10 GB | ✅ Safe (after NAS verification) |
| TOTAL | 35-55 GB | - |
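
As a cross-check on the table, the per-phase ranges can be summed directly. They add up to 36-57 GB, so the 35-55 GB total is best read as a rounded estimate. The helper name `sum_ranges` is illustrative:

```shell
# Sum "low-high" ranges given as arguments and print the combined "low-high".
sum_ranges() {
  printf '%s\n' "$@" | awk -F- '{ lo += $1; hi += $2 } END { print lo "-" hi }'
}

sum_ranges 20-30 10-15 1-2 5-10   # Phases 1 through 4, in GB
```

The same helper applied to Phases 2-4 alone (`sum_ranges 10-15 1-2 5-10`) gives the post-verification portion of the recovery.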

TIMELINE

  1. Immediate (Safe): Run Phase 1 cleanup → ~20-30 GB recovery (5 minutes)
  2. After NAS Verification: Run Phase 2-4 cleanup → ~15-25 GB recovery (10 minutes)
  3. Total Active Work: 15-20 minutes
  4. Total Recovery: 35-55 GB

SUPPORT

Questions? Ask the primary Mac user (thomas.scola) to verify:

  • Services running on NAS K3s cluster
  • Services running on Vast.ai
  • What's safe to remove locally

NAS Access:

  • Host: blueflynas.tailcf98b3.ts.net
  • User: bluefly
  • Via Tailscale VPN

Vast.ai Access:

  • Instance: 30386215
  • SSH: ssh -p 26214 root@ssh9.vast.ai

SUMMARY

What Changed:

Git Infrastructure (2026-01-23):

  • ✅ All 67 repositories → NAS bare repos (/Volumes/AgentPlatform/repos/bare/blueflyio/)
  • ✅ All worktrees → NAS device namespaces (/Volumes/AgentPlatform/worktrees/[DEVICE]/)
  • ✅ All wikis → NAS only (/Volumes/AgentPlatform/wikis/blueflyio/)
  • ✅ Local worktrees removed (~/Sites/blueflyio/.worktrees/ - freed 500MB)
  • ✅ Local wikis removed (~/Sites/blueflyio/_WIKI/ - freed 2.7GB)
  • ✅ NAS-centralized worktree strategy APPROVED and DEPLOYED

Docker Services (2026-01-22):

  • ✅ Agent-tracer → NAS K3s
  • ✅ KAGENT Platform → NAS K3s
  • ✅ Phoenix → NAS K3s
  • ✅ LLMLingua → Vast.ai RTX 4090

What to Do on Other Mac:

  1. Phase 0: Mount NAS, create minimal local setup (REQUIRED FIRST)
  2. Phase 1: Run Docker cleanup immediately (safe, 20-30GB)
  3. Phase 2-4: Run Docker service cleanup after NAS verification (15-25GB)
  4. Total Recovery: 35-55 GB of disk space

What to Keep Locally:

  • ~/Sites/blueflyio/CLAUDE.md (instructions only)
  • ~/Sites/blueflyio/.claude/ (config)
  • Docker: agent-router stack, agent-mesh stack, active DDEV stacks
  • NO repos, NO worktrees, NO wikis - all on NAS

Benefits:

  • Work from ANY device (Mac M3, M4, code-server, phone, iPad)
  • Same files, same state, always in sync
  • Infrastructure independence achieved

Generated: 2026-01-23
Status: ✅ MIGRATION COMPLETE - 100% NAS-Based Infrastructure
For: Complete infrastructure migration guide (git + Docker)
Reference: See /Volumes/AgentPlatform/wikis/blueflyio/technical-docs.wiki/getting-started/nas-setup-guide.md