Hosting: Oracle vs Vast.ai for LLM Inference and Training

Related: technical-docs#172

Summary

Role	Oracle (tunnel connector)	Vast.ai	NAS
LLM inference	No (calls out to Vast.ai or APIs)	Yes (Ollama on RTX 4090, vLLM)	No (backup only)
Model training	No	Yes (A100/H100)	No
Tunnel, MCP, GKG, mesh	Yes	No	Backup only
Always-on services	Primary	Elastic GPU	Backup
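
As a rough illustration, the matrix above could be encoded as a typed lookup so that tooling can ask which host is primary for a given role. This is only a sketch: the type names, keys, and the primaryHostFor helper are hypothetical and do not come from any existing package. The "Always-on services" row is left out because it describes a posture (Primary / Elastic GPU / Backup) rather than a yes/no capability.

```typescript
// Hypothetical encoding of the summary table; all names are illustrative.
type Host = "oracle" | "vastai" | "nas";
type Role = "llm-inference" | "model-training" | "tunnel-mcp-gkg-mesh";
type Status = "yes" | "no" | "backup";

// One row per role, one column per host, mirroring the table above.
const roleMatrix: Record<Role, Record<Host, Status>> = {
  "llm-inference": { oracle: "no", vastai: "yes", nas: "no" },
  "model-training": { oracle: "no", vastai: "yes", nas: "no" },
  "tunnel-mcp-gkg-mesh": { oracle: "yes", vastai: "no", nas: "backup" },
};

// Return the host marked "yes" for a role; throw if the row has none.
function primaryHostFor(role: Role): Host {
  const row = roleMatrix[role];
  const hosts = Object.keys(row) as Host[];
  const primary = hosts.find((h) => row[h] === "yes");
  if (!primary) {
    throw new Error(`no primary host for ${role}`);
  }
  return primary;
}
```

For example, primaryHostFor("llm-inference") resolves to "vastai", while the NAS only ever appears with the "backup" status.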

Oracle

Role: Tunnel connector and public ingress (Cloudflare Tunnel); hosts MCP, GKG, mesh, and the LangFlow UI.
GPU: None. All LLM inference goes through foundation-bridge/agent-router, which call Vast.ai (Ollama/vLLM) or hosted APIs (OpenAI, Anthropic, Kimi).
Secrets: /root/.env.local on the tunnel host.

Vast.ai

Role: GPU compute for inference and training.
Ollama inference: RTX 4090; owner: @bluefly/foundation-bridge (src/providers/ollama/).
Embeddings: RTX 4090; owner: @bluefly/agent-brain.
Model training: A100/H100; code in models/ repo (training/).
Auto-scaling: @bluefly/agent-router (src/scaling/vastai.ts).
CI/CD: gitlab_components/templates/vastai-deploy/.
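
The workload-to-GPU assignments listed above can be summarized in a small lookup. The sketch below is illustrative only: the Workload and Placement types and the describePlacement helper are hypothetical and are not part of @bluefly/agent-router or any other package named here; only the GPU classes and owner/location strings are taken from the list.

```typescript
// Hypothetical types; only the GPU classes and locations come from the list above.
type Workload = "ollama-inference" | "embeddings" | "model-training";

interface Placement {
  gpus: string[];   // acceptable GPU classes on Vast.ai
  location: string; // owning package or repo, as listed above
}

const placements: Record<Workload, Placement> = {
  "ollama-inference": { gpus: ["RTX 4090"], location: "@bluefly/foundation-bridge" },
  embeddings: { gpus: ["RTX 4090"], location: "@bluefly/agent-brain" },
  "model-training": { gpus: ["A100", "H100"], location: "models/ repo" },
};

// Render a one-line summary for a workload.
function describePlacement(workload: Workload): string {
  const p = placements[workload];
  return `${workload}: ${p.gpus.join(" or ")} (${p.location})`;
}
```

Calling describePlacement("model-training") yields "model-training: A100 or H100 (models/ repo)".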

NAS

Role: Backup only; not primary for LLM inference or training.
Services: MinIO, PostgreSQL, Redis, A2A hub, code-server, etc.

Provider List (Routing)

Self-hosted GPU: Ollama/vLLM on Vast.ai.
Hosted APIs: OpenAI, Anthropic, Kimi (when integrated); called from the Oracle tunnel host via foundation-bridge.
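
One way to picture routing across these two provider groups is a simple ordered fallback. The sketch below is an assumption, not the foundation-bridge or agent-router implementation: the Provider type, the availability flags, and in particular the policy of preferring self-hosted GPU providers before hosted APIs are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical types and policy; not the actual foundation-bridge API.
type ProviderKind = "self-hosted-gpu" | "hosted-api";

interface Provider {
  name: string;
  kind: ProviderKind;
  available: boolean; // assumed health flag, e.g. set by a health check
}

// Example list mirroring the two groups above (availability values are made up).
const providers: Provider[] = [
  { name: "ollama", kind: "self-hosted-gpu", available: true },
  { name: "vllm", kind: "self-hosted-gpu", available: false },
  { name: "openai", kind: "hosted-api", available: true },
  { name: "anthropic", kind: "hosted-api", available: true },
];

// Assumed policy: try self-hosted GPU providers first, then hosted APIs.
function pickProvider(list: Provider[]): Provider | undefined {
  const order: ProviderKind[] = ["self-hosted-gpu", "hosted-api"];
  for (const kind of order) {
    const match = list.find((p) => p.kind === kind && p.available);
    if (match) {
      return match;
    }
  }
  return undefined; // nothing available in either group
}
```

With the sample list, pickProvider(providers) selects "ollama"; if every self-hosted entry is marked unavailable, it falls back to the first available hosted API.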

References

CLAUDE.md: Infrastructure, Vast.ai, Oracle, NAS layout.
separation-of-duties: Infrastructure ownership.