Qdrant Runbook
Overview
- Purpose: Vector database for semantic search, similarity matching, and AI memory retrieval. Stores embeddings for agent memories, document chunks, and knowledge base entries.
- Ports: 6333 (HTTP/REST), 6334 (gRPC)
- Health endpoint: GET /healthz or GET /readyz
- Namespace: data (Kubernetes)
- Version: Qdrant 1.7+
Dependencies
- Persistent Volume (PVC) - Vector storage
- Agent Brain (port 3001) - Primary consumer
Collection Layout
| Collection | Purpose | Dimensions |
|---|---|---|
| agent_memory | Agent long-term memory | 1536 (OpenAI) or 1024 (local) |
| documents | Document embeddings for RAG | 1536 |
| knowledge_base | Structured knowledge entries | 1536 |
| conversation_history | Chat history embeddings | 1536 |
Common Issues
Issue 1: Collection Not Found
- Symptoms:
- 404 errors on vector operations
- "Collection does not exist" in logs
- Agent memory retrieval failing
- Cause:
- Collection not created
- Wrong collection name in configuration
- Collection deleted accidentally
- Resolution:
# List existing collections
curl http://localhost:6333/collections
# Create missing collection
curl -X PUT http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"vectors": {
"size": 1536,
"distance": "Cosine"
},
"optimizers_config": {
"indexing_threshold": 20000
}
}'
# Verify collection exists
curl http://localhost:6333/collections/agent_memory
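The list, create, and verify steps above can be folded into one idempotent helper. This is a minimal sketch, not a definitive implementation: the `QDRANT_URL` variable and the function names `collection_action` and `ensure_collection` are assumptions introduced for illustration and are not part of Qdrant.

```shell
#!/bin/sh
# Sketch: create a collection only when Qdrant reports it missing.
# QDRANT_URL and both function names are hypothetical helpers.
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"

# Map the HTTP status of GET /collections/<name> to the next step.
collection_action() {
  case "$1" in
    200) echo "exists" ;;
    404) echo "create" ;;
    *)   echo "investigate" ;;
  esac
}

# Usage: ensure_collection <name> <vector_size>
ensure_collection() {
  name="$1"
  size="$2"
  # -w '%{http_code}' prints only the status code; 000 means no connection.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$QDRANT_URL/collections/$name")
  action=$(collection_action "$code")
  if [ "$action" = "create" ]; then
    curl -s -X PUT "$QDRANT_URL/collections/$name" \
      -H "Content-Type: application/json" \
      -d "{\"vectors\": {\"size\": $size, \"distance\": \"Cosine\"}}"
  else
    echo "$name: $action (HTTP $code)"
  fi
}
```

For example, `ensure_collection agent_memory 1536` leaves an existing collection untouched, so it should be safe to run repeatedly.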
Issue 2: Slow Vector Search
- Symptoms:
- Search queries taking >1 second
- High CPU usage during search
- Memory retrieval timeouts
- Cause:
- Index not optimized
- Too many vectors in collection
- HNSW index not built
- Resolution:
# Check collection status
curl http://localhost:6333/collections/agent_memory
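# Re-trigger the optimizers by re-applying the optimizer settings
# (POST .../index creates payload field indexes; it does not optimize vectors)
curl -X PATCH http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"optimizers_config": {
"indexing_threshold": 20000
}
}'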
# Check if indexing is in progress
curl http://localhost:6333/collections/agent_memory | jq '.result.status'
# Reduce search scope with filters
# (Application-level optimization)
# Scale up resources if collection is large
kubectl set resources deployment/qdrant -n data \
--limits=cpu=4000m,memory=8Gi
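Checking whether indexing is still in progress usually means polling the collection status until it returns to "green". The sketch below shows one way to do that; `fetch_status`, `wait_for_green`, `QDRANT_URL`, and `POLL_SECONDS` are hypothetical names chosen for this example, and the default of 30 attempts is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch: poll a collection until its status is "green" or attempts run out.
# All names and defaults here are illustrative assumptions.
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
POLL_SECONDS="${POLL_SECONDS:-10}"

# Print the collection status as reported in .result.status.
fetch_status() {
  curl -s "$QDRANT_URL/collections/$1" | jq -r '.result.status'
}

# Usage: wait_for_green <collection> [max_attempts]
wait_for_green() {
  name="$1"
  tries="${2:-30}"
  i=0
  status="unknown"
  while [ "$i" -lt "$tries" ]; do
    status=$(fetch_status "$name")
    if [ "$status" = "green" ]; then
      echo "green"
      return 0
    fi
    i=$((i + 1))
    sleep "$POLL_SECONDS"
  done
  echo "timeout (last status: $status)"
  return 1
}
```

Running `wait_for_green agent_memory` before scaling up resources helps distinguish a collection that is merely mid-optimization from one that is genuinely undersized.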
Issue 3: Out of Memory
- Symptoms:
- OOMKilled events
- Qdrant crashing during indexing
- "Not enough memory" errors
- Cause:
- Large collection exceeding RAM
- HNSW index too large for memory
- Memory mapped files not working
- Resolution:
# Check memory usage
kubectl top pods -n data -l app=qdrant
# Check collection size
curl http://localhost:6333/collections/agent_memory | jq '.result.points_count'
# Enable on-disk storage for large collections
curl -X PATCH http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"optimizers_config": {
"memmap_threshold": 10000
}
}'
# Reduce HNSW memory usage
curl -X PATCH http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"hnsw_config": {
"on_disk": true
}
}'
# Increase memory limits
kubectl set resources deployment/qdrant -n data \
--limits=memory=16Gi
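Before raising memory limits, it can help to estimate how much RAM the vectors need. The sketch below applies the commonly cited rule of thumb of points × dimension × 4 bytes × 1.5; the 1.5 multiplier for index and metadata overhead is approximate, and the function name `estimate_ram_mib` is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: rough RAM estimate (MiB) for keeping float32 vectors in memory.
# Formula: points * dimension * 4 bytes * 1.5 (approximate overhead factor).
# Usage: estimate_ram_mib <points_count> <dimension>
estimate_ram_mib() {
  points="$1"
  dim="$2"
  # "* 3 / 2" applies the 1.5 factor using integer arithmetic only.
  echo $(( $points * $dim * 4 * 3 / 2 / 1048576 ))
}
```

For instance, `estimate_ram_mib 1000000 1536` prints `8789`, roughly 8.6 GiB for one million 1536-dimensional vectors. If the estimate exceeds the pod's limit, the on-disk options above are likely a better first step than adding memory.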
Issue 4: Write Failures
- Symptoms:
- Upsert operations failing
- "Write ahead log full" errors
- Vector insertions timing out
- Cause:
- Disk full
- WAL corruption
- Collection locked for optimization
- Resolution:
# Check disk usage
kubectl exec -it qdrant-0 -n data -- df -h /qdrant/storage
# Check collection status
curl http://localhost:6333/collections/agent_memory | jq '.result.status'
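# If status is "yellow", optimization is still running; wait for it to return to "green"
# If the disk is full, list and clear old snapshots
kubectl exec -it qdrant-0 -n data -- ls -la /qdrant/storage/snapshots
# Run rm through a shell inside the container so the glob expands there
kubectl exec qdrant-0 -n data -- sh -c 'rm -rf /qdrant/storage/snapshots/old_*'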
# The WAL is flushed automatically (see optimizers_config.flush_interval_sec);
# the Qdrant REST API does not document a manual flush endpoint
# Restart if WAL corrupted
kubectl rollout restart deployment/qdrant -n data
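The disk check above can be turned into a decision using the thresholds this runbook lists (above 80% expand storage, above 95% treat as DiskFull). A minimal sketch, assuming the hypothetical helper names `parse_use_pct` and `disk_action`:

```shell
#!/bin/sh
# Sketch: classify disk usage with this runbook's 80% / 95% thresholds.
# Function names are illustrative assumptions.

# Read `df -P` output on stdin and print the usage percentage of the
# first filesystem line without the trailing "%".
parse_use_pct() {
  awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Usage: disk_action <percent_used>
disk_action() {
  if [ "$1" -gt 95 ]; then
    echo "critical"
  elif [ "$1" -gt 80 ]; then
    echo "expand"
  else
    echo "ok"
  fi
}
```

A possible invocation is `disk_action "$(kubectl exec qdrant-0 -n data -- df -P /qdrant/storage | parse_use_pct)"`; the `-P` flag keeps each filesystem on a single line so the column position stays stable.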
Issue 5: Embedding Dimension Mismatch
- Symptoms:
- "Vector size mismatch" errors
- 400 Bad Request on upsert
- Inconsistent search results
- Cause:
- Embedding model changed
- Wrong collection used
- Misconfigured client
- Resolution:
# Check expected dimension
curl http://localhost:6333/collections/agent_memory | jq '.result.config.params.vectors.size'
# Verify embedding from application
# (Check agent-brain configuration)
# If the embedding model changed, recreate the collection and reindex
# 1. Backup existing data
curl -X POST "http://localhost:6333/collections/agent_memory/snapshots"
# 2. Delete and recreate with new dimension
curl -X DELETE http://localhost:6333/collections/agent_memory
curl -X PUT http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"vectors": {
"size": 1024,
"distance": "Cosine"
}
}'
# 3. Trigger reindexing from source data
curl -X POST http://localhost:3001/api/v1/memory/reindex
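A quick preflight can catch a dimension mismatch before any upsert is attempted. The sketch below counts the elements of a sample embedding and compares the count with the collection's configured size; `vector_dim` and `check_dimension` are hypothetical names, and `vector_dim` assumes a flat, non-empty JSON array of numbers.

```shell
#!/bin/sh
# Sketch: compare an embedding's length with the collection's vector size.
# Function names are illustrative assumptions.

# Count elements of a flat, non-empty JSON number array read from stdin
# by counting the commas and adding one.
vector_dim() {
  commas=$(tr -cd ',' | wc -c)
  echo $(( $commas + 1 ))
}

# Usage: check_dimension <collection_size> <embedding_size>
check_dimension() {
  expected="$1"
  actual="$2"
  if [ "$expected" -eq "$actual" ]; then
    echo "ok"
  else
    echo "mismatch: collection expects $expected, embedding has $actual"
    return 1
  fi
}
```

In practice the first argument would come from the `jq '.result.config.params.vectors.size'` query shown above, and the second from a sample vector produced by the embedding model currently configured in the application.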
Issue 6: Cluster Partition (Distributed Mode)
- Symptoms:
- Some shards unavailable
- Partial search results
- "Shard not available" errors
- Cause:
- Network partition between nodes
- Node crashed
- Replication lag
- Resolution:
# Check cluster status
curl http://localhost:6333/cluster
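# Check peer status (peers are listed in the /cluster response)
curl http://localhost:6333/cluster | jq '.result.peers'
# Check shard state for the collection
curl http://localhost:6333/collections/agent_memory/cluster
# Attempt recovery of the current peer's consensus state
curl -X POST http://localhost:6333/cluster/recover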
# Remove unhealthy peer
curl -X DELETE http://localhost:6333/cluster/peer/{peer_id}
# Restart affected node
kubectl delete pod qdrant-{node} -n data
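Qdrant's distributed mode coordinates cluster metadata through Raft consensus, so removing a peer is safest while a majority of peers is still healthy. The sketch below is a simple majority check under that assumption; `has_quorum` is a hypothetical helper, and the peer counts would come from the `/cluster` output.

```shell
#!/bin/sh
# Sketch: succeed when the healthy peers still form a majority.
# Majority = floor(total / 2) + 1. The function name is an assumption.
# Usage: has_quorum <total_peers> <healthy_peers>
has_quorum() {
  total="$1"
  healthy="$2"
  [ "$healthy" -ge $(( $total / 2 + 1 )) ]
}
```

For example, `has_quorum 3 2 && echo "safe to remove the failed peer"` prints the message, while a 3-node cluster with only one healthy peer fails the check and likely needs nodes restored before any peer is removed.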
Restart Procedure
Graceful Restart (Recommended)
# 1. Check for ongoing operations
curl http://localhost:6333/collections | jq '.result.collections[].status'
# 2. Create snapshot for safety
curl -X POST "http://localhost:6333/collections/agent_memory/snapshots"
# 3. Wait for snapshot completion
sleep 30
# 4. Perform rolling restart
kubectl rollout restart deployment/qdrant -n data
# 5. Wait for ready
kubectl wait --for=condition=ready pod -l app=qdrant -n data --timeout=180s
# 6. Verify health
curl http://localhost:6333/healthz
curl http://localhost:6333/readyz
Emergency Restart
# Force restart
kubectl delete pod qdrant-0 -n data --force
# Wait for recovery
kubectl wait --for=condition=ready pod qdrant-0 -n data --timeout=180s
# Verify collection integrity
curl http://localhost:6333/collections/agent_memory | jq '.result.status'
# If corrupted, restore from snapshot
curl -X PUT "http://localhost:6333/collections/agent_memory/snapshots/recover" \
-H "Content-Type: application/json" \
-d '{"location": "file:///qdrant/storage/snapshots/agent_memory-SNAPSHOT_ID.snapshot"}'
Local Development Restart
# Docker
docker restart qdrant
# OrbStack
orb restart qdrant
# Using docker-compose
docker-compose restart qdrant
Logs Location
Kubernetes Logs
# Qdrant logs
kubectl logs -f deployment/qdrant -n data
# Filter for errors
kubectl logs deployment/qdrant -n data | grep -E "ERROR|WARN"
# Export logs
kubectl logs deployment/qdrant -n data > qdrant-logs-$(date +%Y%m%d).txt
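When triaging, a count per log level is often more useful than the raw filtered output. A minimal sketch, assuming the hypothetical function name `summarize_levels` and that log lines contain the literal strings ERROR and WARN as in the filter above:

```shell
#!/bin/sh
# Sketch: count ERROR and WARN lines in logs read from stdin.
# The function name is an illustrative assumption.
summarize_levels() {
  logs=$(cat)
  # grep -c prints 0 but exits non-zero on no match, hence "|| true".
  errors=$(printf '%s\n' "$logs" | grep -c "ERROR" || true)
  warns=$(printf '%s\n' "$logs" | grep -c "WARN" || true)
  echo "errors=$errors warnings=$warns"
}
```

It can be fed directly from the cluster, for example `kubectl logs deployment/qdrant -n data | summarize_levels`.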
Inside Container
# Check Qdrant storage directory
kubectl exec -it qdrant-0 -n data -- ls -la /qdrant/storage
# Check WAL files
kubectl exec -it qdrant-0 -n data -- ls -la /qdrant/storage/collections/agent_memory/0/wal/
Telemetry
# Get telemetry data
curl http://localhost:6333/telemetry
# Get detailed metrics
curl http://localhost:6333/metrics
Scaling
Vertical Scaling
# Increase resources for larger collections
kubectl set resources deployment/qdrant -n data \
--limits=cpu=8000m,memory=32Gi \
--requests=cpu=2000m,memory=8Gi
Horizontal Scaling (Distributed Mode)
# Scale to multiple nodes (requires cluster configuration)
kubectl scale statefulset/qdrant -n data --replicas=3
# Configure sharding
curl -X PUT http://localhost:6333/collections/agent_memory \
-H "Content-Type: application/json" \
-d '{
"vectors": {
"size": 1536,
"distance": "Cosine"
},
"shard_number": 3,
"replication_factor": 2
}'
Storage Scaling
# Expand PVC
kubectl patch pvc qdrant-storage -n data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
Scaling Guidelines
| Metric | Threshold | Action |
|---|---|---|
| Memory Usage | > 80% | Increase memory, enable memmap |
| CPU Usage | > 70% | Add replicas, increase CPU |
| Search Latency P99 | > 500ms | Optimize index, add shards |
| Disk Usage | > 80% | Expand storage |
| Points Count | > 10M | Add shards, enable on-disk |
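The guideline table can be encoded as a small lookup so that scripts and on-call engineers apply the same thresholds. This is a sketch only: the function name `scaling_action` and the metric keys (`memory_pct`, `cpu_pct`, `search_p99_ms`, `disk_pct`, `points_count`) are assumptions made for this example, and values must be integers.

```shell
#!/bin/sh
# Sketch: map a metric reading to the action from the Scaling Guidelines table.
# Metric keys and the function name are illustrative assumptions.
# Usage: scaling_action <metric> <integer_value>
scaling_action() {
  metric="$1"
  value="$2"
  case "$metric" in
    memory_pct)    limit=80;       action="Increase memory, enable memmap" ;;
    cpu_pct)       limit=70;       action="Add replicas, increase CPU" ;;
    search_p99_ms) limit=500;      action="Optimize index, add shards" ;;
    disk_pct)      limit=80;       action="Expand storage" ;;
    points_count)  limit=10000000; action="Add shards, enable on-disk" ;;
    *)
      echo "unknown metric: $metric"
      return 2
      ;;
  esac
  # The table uses strict ">" comparisons, so a value at the limit is fine.
  if [ "$value" -gt "$limit" ]; then
    echo "$action"
  else
    echo "No action"
  fi
}
```

For example, `scaling_action memory_pct 85` prints `Increase memory, enable memmap`, while `scaling_action cpu_pct 70` prints `No action`.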
Alerts
Critical Alerts
| Alert | Condition | Runbook Action |
|---|---|---|
| QdrantDown | Cannot connect for 2min | Emergency Restart |
| CollectionUnavailable | Collection status "red" | Recover from snapshot |
| DiskFull | Disk > 95% | Expand storage, cleanup |
Warning Alerts (Slack)
| Alert | Condition | Runbook Action |
|---|---|---|
| HighMemory | Memory > 80% | Enable memmap, increase limit |
| SlowSearch | P99 > 500ms | Optimize index |
| IndexingStuck | Optimizing > 1hr | Check resources |
| ShardUnhealthy | Shard status != green | Recover shard |
Prometheus Alert Rules
groups:
  - name: qdrant
    rules:
      - alert: QdrantDown
        expr: up{job="qdrant"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Qdrant is down"
          runbook_url: "https://gitlab.com/blueflyio/agent-platform/technical-docs/-/wikis/runbooks/qdrant"
      - alert: QdrantHighMemory
        expr: qdrant_memory_usage_bytes / qdrant_memory_limit_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Qdrant memory usage high"
      - alert: QdrantSlowSearch
        expr: histogram_quantile(0.99, rate(qdrant_search_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Qdrant search latency high"
Monitoring Dashboards
- Grafana: https://grafana.local/d/qdrant
- Qdrant Dashboard: http://localhost:6333/dashboard (built-in)
- Prometheus: https://prometheus.local/graph?g0.expr=up{job="qdrant"}
Backup & Recovery
Create Snapshot
# Snapshot specific collection
curl -X POST "http://localhost:6333/collections/agent_memory/snapshots"
# List snapshots
curl http://localhost:6333/collections/agent_memory/snapshots
# Download snapshot
curl -O http://localhost:6333/collections/agent_memory/snapshots/{snapshot_name}
Restore from Snapshot
# Restore collection from snapshot
curl -X PUT "http://localhost:6333/collections/agent_memory/snapshots/recover" \
-H "Content-Type: application/json" \
-d '{
"location": "file:///qdrant/storage/snapshots/agent_memory-SNAPSHOT.snapshot"
}'
# Or from URL
curl -X PUT "http://localhost:6333/collections/agent_memory/snapshots/recover" \
-H "Content-Type: application/json" \
-d '{
"location": "https://backup-storage/snapshots/agent_memory.snapshot"
}'
Useful API Endpoints
# Health check
curl http://localhost:6333/healthz
curl http://localhost:6333/readyz
# Collection info
curl http://localhost:6333/collections
curl http://localhost:6333/collections/{name}
curl http://localhost:6333/collections/{name}/points/count
# Search example
curl -X POST http://localhost:6333/collections/agent_memory/points/search \
-H "Content-Type: application/json" \
-d '{
"vector": [0.1, 0.2, ...],
"limit": 10
}'
# Cluster info (distributed mode)
curl http://localhost:6333/cluster
curl http://localhost:6333/cluster/peers
# Metrics (Prometheus format)
curl http://localhost:6333/metrics
Contacts
- On-call: PagerDuty rotation
- Slack: #platform-incidents
- Owner: AI/ML Team
Related Runbooks
- Agent Brain Runbook - Primary consumer
- PostgreSQL Runbook - Structured data
- Redis Runbook - Caching layer