Vast.ai GPU Cluster Status

AUTHORITATIVE SOURCE: BULLETPROOF_VASTAI_PLAN.md

Complete Implementation Plan: See BULLETPROOF_VASTAI_PLAN.md for full details including Cloudflare Tunnel + Tailscale integration, agent-docker service, and CI/CD components.

Last Updated: 2026-01-04

Active Instances

| Instance ID | GPU | IP (Tailscale) | Hostname | Cost/hr | Status |
|---|---|---|---|---|---|
| 29484611 | RTX 4090 (24GB) | 100.113.211.78 | vastai-gpu-worker-1 | $0.25 | Running |

Service Discovery Registry

API Endpoint: https://mesh.bluefly.internal/api/v1/vastai/registry

Query active instances:

curl "https://mesh.bluefly.internal/api/v1/vastai/registry?environment=prod"

Register instance:

curl -X POST https://mesh.bluefly.internal/api/v1/vastai/registry/register \
  -H "Content-Type: application/json" \
  -d @instance-payload.json

OpenAPI Spec: See common_npm/agent-mesh/openapi/vastai-registry.openapi.yml

Network Configuration

  • Tailscale Mesh: bluefly tailnet
  • SSH Access: ssh root@100.113.211.78 (via Tailscale)
  • Public SSH: ssh -p 14610 root@ssh5.vast.ai
  • Registry API: https://mesh.bluefly.internal/api/v1/vastai/registry
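
The two SSH routes above can be wrapped so that the Tailscale address is tried first and the public proxy is used only as a fallback. The following is a minimal sketch, not part of the existing tooling: the function names (`tailscale_reachable`, `gpu_ssh_args`, `gpu_ssh`) and the 5-second timeout are assumptions made for illustration, and the probe simply attempts a non-interactive SSH login.

```shell
#!/usr/bin/env bash
# Sketch: prefer the Tailscale address, fall back to the public Vast.ai SSH proxy.
# Function names and the 5-second timeout are illustrative assumptions.

TAILSCALE_HOST="root@100.113.211.78"
PUBLIC_HOST="root@ssh5.vast.ai"
PUBLIC_PORT=14610

# Returns 0 if a non-interactive SSH login over Tailscale succeeds.
# BatchMode=yes prevents password prompts; ConnectTimeout bounds the wait.
tailscale_reachable() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$TAILSCALE_HOST" true 2>/dev/null
}

# Prints the ssh arguments for whichever route is currently usable.
gpu_ssh_args() {
  if tailscale_reachable; then
    echo "$TAILSCALE_HOST"
  else
    echo "-p $PUBLIC_PORT $PUBLIC_HOST"
  fi
}

# Runs a command on the worker over the selected route.
gpu_ssh() {
  # Word splitting of the arguments is intentional here.
  # shellcheck disable=SC2046
  ssh $(gpu_ssh_args) "$@"
}
```

For example, `gpu_ssh nvidia-smi` would run the GPU check over Tailscale when the mesh is up. Note that a key requiring an interactive passphrase makes the probe fail, so the sketch then falls back to the public proxy even if the tailnet is reachable.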

Environment Variables Required

# Vast.ai API tokens (set in CI/CD or .env)
VASTAI_CLUSTER_OP_KEY=       # Instance management
VASTAI_COST_MONITOR_KEY=     # Billing/cost access
VASTAI_TASK_DISPATCH_KEY=    # Task coordination

# Tailscale (optional - for automated joining)
TAILSCALE_AUTHKEY=           # Pre-auth key for mesh join
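
A script that depends on these variables can check them up front instead of failing midway through a run. The sketch below assumes nothing beyond the variable names listed above; `check_vastai_env` is a hypothetical helper name, and treating the three Vast.ai keys as mandatory while `TAILSCALE_AUTHKEY` stays optional follows the comments in the list.

```shell
# Sketch: fail fast when a required Vast.ai token is missing.
# check_vastai_env is a hypothetical helper, not an existing command.
check_vastai_env() {
  missing=0
  for name in VASTAI_CLUSTER_OP_KEY VASTAI_COST_MONITOR_KEY VASTAI_TASK_DISPATCH_KEY; do
    # Indirect variable lookup that also works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  # TAILSCALE_AUTHKEY is optional, so its absence is only reported.
  if [ -z "${TAILSCALE_AUTHKEY:-}" ]; then
    echo "note: TAILSCALE_AUTHKEY is not set (optional)" >&2
  fi
  return "$missing"
}
```

Calling `check_vastai_env || exit 1` at the top of a CI job stops the job with a list of the missing names.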

Quick Commands

# List instances via registry API
curl https://mesh.bluefly.internal/api/v1/vastai/registry

# SSH via Tailscale (preferred)
ssh root@100.113.211.78

# SSH via public proxy
ssh -p 14610 root@ssh5.vast.ai

# Check GPU status
ssh root@100.113.211.78 'nvidia-smi'

# Heartbeat (keeps instance in registry)
curl -X POST https://mesh.bluefly.internal/api/v1/vastai/registry/29484611/heartbeat

  • cluster-operator - Instance lifecycle management
  • cost-intelligence-monitor - Cost tracking and optimization
  • task-dispatcher - Workload distribution
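
Because the heartbeat command under Quick Commands is what keeps an instance listed in the registry, it is usually sent on a timer rather than by hand. The sketch below shows one way to do that. The 60-second default interval is an assumption (the registry's actual expiry window is not stated here), and the helper names are hypothetical.

```shell
# Sketch: send registry heartbeats on a fixed interval.
# The 60-second default and the function names are assumptions.
REGISTRY_URL="https://mesh.bluefly.internal/api/v1/vastai/registry"

# Builds the heartbeat endpoint for one instance ID.
heartbeat_url() {
  echo "$REGISTRY_URL/$1/heartbeat"
}

# Sends a single heartbeat; returns non-zero on HTTP or network errors.
send_heartbeat() {
  curl --fail --silent --show-error --max-time 10 \
    -X POST "$(heartbeat_url "$1")"
}

# Loops forever, logging failures without exiting so that a transient
# network error does not stop later heartbeats.
heartbeat_loop() {
  instance_id=$1
  interval=${2:-60}
  while true; do
    send_heartbeat "$instance_id" || echo "heartbeat failed for $instance_id" >&2
    sleep "$interval"
  done
}
```

On the worker, `heartbeat_loop 29484611 &` would run the loop in the background. A systemd timer or cron entry calling `send_heartbeat` is an equally reasonable design if a long-running shell loop is undesirable.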

Event Types

All instances emit canonical events (see agent-router/src/infrastructure/deployment/vastai/events.ts):

  • vastai.instance.created
  • vastai.instance.ready
  • vastai.instance.terminated
  • vastai.mesh.registered
  • vastai.mesh.heartbeat
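
When filtering logs or routing messages in a script, it can help to reject event names that are not in the canonical list above. This is a small illustrative sketch: `is_vastai_event` is a hypothetical helper, and the authoritative definitions remain those in events.ts.

```shell
# Sketch: check whether a string is one of the canonical event types above.
# is_vastai_event is a hypothetical helper; events.ts stays authoritative.
is_vastai_event() {
  case "$1" in
    vastai.instance.created | vastai.instance.ready | vastai.instance.terminated)
      return 0 ;;
    vastai.mesh.registered | vastai.mesh.heartbeat)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A caller might write `is_vastai_event "$event" || echo "unknown event: $event" >&2` before acting on an incoming name.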
