Hosting: Oracle vs Vast.ai for LLM Inference and Training

Related: technical-docs#172

Summary

| Role | Oracle (tunnel connector) | Vast.ai | NAS |
|---|---|---|---|
| LLM inference | No (calls out to Vast.ai or APIs) | Yes (Ollama RTX 4090, vLLM) | No (backup only) |
| Model training | No | Yes (A100/H100) | No |
| Tunnel, MCP, GKG, mesh | Yes | No | Backup only |
| Always-on services | Primary | Elastic GPU | Backup |

Oracle

  • Role: tunnel connector and public ingress (Cloudflare Tunnel); runs MCP, GKG, mesh, and the LangFlow UI.
  • No GPU. LLM inference runs elsewhere: foundation-bridge/agent-router calls Vast.ai (Ollama/vLLM) or hosted APIs (OpenAI, Anthropic, and Kimi when integrated).
  • Secrets: /root/.env.local on the tunnel host.
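
To make the call-out pattern above concrete, here is a minimal TypeScript sketch of how a process on the tunnel host might read a `.env`-style file and build a request for Ollama's `/api/generate` HTTP endpoint on a Vast.ai instance. It is an illustration under stated assumptions, not the foundation-bridge implementation: the function names `parseEnv` and `buildOllamaGenerate`, the variable name `OLLAMA_BASE_URL`, the example address, and the model name are all hypothetical. Only the endpoint path, the `model`/`prompt`/`stream` request fields, and Ollama's default port 11434 come from Ollama itself.

```typescript
// Hypothetical sketch; none of these names are taken from foundation-bridge.

interface OllamaGenerateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Parse KEY=VALUE lines in the style of a .env file.
// Blank lines and lines starting with "#" are skipped.
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

// Build (but do not send) a non-streaming request for Ollama's /api/generate.
function buildOllamaGenerate(
  baseUrl: string,
  model: string,
  prompt: string,
): OllamaGenerateRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/generate", // drop trailing slashes
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

// Example with made-up values (assumption: OLLAMA_BASE_URL is the variable name).
const sampleEnv = parseEnv('OLLAMA_BASE_URL="http://203.0.113.10:11434/"');
const sampleRequest = buildOllamaGenerate(
  sampleEnv["OLLAMA_BASE_URL"] ?? "http://localhost:11434",
  "llama3",
  "Say hello.",
);
```

The resulting object can be passed to any HTTP client, for example `fetch(sampleRequest.url, { method: sampleRequest.method, headers: sampleRequest.headers, body: sampleRequest.body })`. Keeping request construction separate from sending makes it testable without a live GPU instance.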

Vast.ai

  • Role: GPU compute for inference and training.
  • Ollama inference: RTX 4090; owner: @bluefly/foundation-bridge (src/providers/ollama/).
  • Embeddings: RTX 4090; owner: @bluefly/agent-brain.
  • Model training: A100/H100; code in models/ repo (training/).
  • Auto-scaling: @bluefly/agent-router (src/scaling/vastai.ts).
  • CI/CD: gitlab_components/templates/vastai-deploy/.

NAS

  • Role: backup only; not a primary host for LLM inference or training.
  • Services: MinIO, PostgreSQL, Redis, A2A hub, code-server, etc.

Provider List (Routing)

  • Self-hosted GPU: Ollama/vLLM on Vast.ai.
  • Hosted APIs: OpenAI, Anthropic, Kimi (when integrated); called from the Oracle tunnel host via foundation-bridge.
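
The two provider classes above can be modelled as a small routing table. The TypeScript sketch below is one possible selection rule, not the agent-router logic: it assumes that a healthy self-hosted GPU provider is preferred and that hosted APIs serve as the fallback, which this page does not state. The `Provider` type, the `healthy` flag, the provider names, and `pickProvider` are hypothetical.

```typescript
// Hypothetical sketch; the real routing rules live in agent-router and may differ.

type ProviderKind = "self-hosted" | "hosted-api";

interface Provider {
  name: string;
  kind: ProviderKind;
  healthy: boolean; // assumption: some health check sets this flag
}

// Assumed policy: first healthy self-hosted provider, else first healthy
// hosted API, else undefined. List order breaks ties within a kind.
function pickProvider(providers: Provider[]): Provider | undefined {
  const healthy = providers.filter((p) => p.healthy);
  return (
    healthy.find((p) => p.kind === "self-hosted") ??
    healthy.find((p) => p.kind === "hosted-api")
  );
}

// Example table mirroring the provider list (names are illustrative).
const exampleProviders: Provider[] = [
  { name: "ollama-vastai", kind: "self-hosted", healthy: true },
  { name: "vllm-vastai", kind: "self-hosted", healthy: true },
  { name: "openai", kind: "hosted-api", healthy: true },
  { name: "anthropic", kind: "hosted-api", healthy: true },
];

const chosen = pickProvider(exampleProviders);
```

If the Vast.ai instances are scaled down or unreachable, marking them unhealthy makes the same call return a hosted API instead, which matches the "Elastic GPU" role in the summary table.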

References

  • CLAUDE.md: Infrastructure, Vast.ai, Oracle, NAS layout.
  • separation-of-duties: Infrastructure ownership.