Common Pitfalls
Separation of Duties: Getting started guides document onboarding only. They do NOT own agent manifests, execution, or infrastructure configuration. See Separation of Duties for details.
Learn the common mistakes made when working with the LLM Platform, BuildKit, and OSSA agents, and how to avoid them.
Table of Contents
- Installation & Setup
- OSSA Agent Development
- BuildKit CLI Usage
- Drupal Development
- Workflow Orchestration
- Production Deployment
- Performance & Optimization
- Security
Installation & Setup
Pitfall: Using Docker Desktop Instead of OrbStack (macOS)
Problem:
    # Slow file sync, high CPU usage, poor performance
    docker ps  # Takes 5+ seconds
Solution:
    # Switch to OrbStack for 10x better performance
    brew install orbstack

    # Uninstall Docker Desktop
    # macOS: Applications > Docker > Uninstall

    # Verify OrbStack
    docker ps  # Should be instant
Why it matters: Docker Desktop on macOS uses inefficient file mounting. OrbStack uses VirtIO-FS for native-speed file access.
Pitfall: Not Installing DDEV Addons
Problem:
    ddev drush status
    # Command not found: drush
Solution:
    # Install platform-specific DDEV addons (one-time)
    cd ~/Sites/LLM/llm-platform
    ./infrastructure/ddev-addons/install-addons.sh

    # Now available:
    ddev drush status
    ddev tddai check
    ddev git-safe commit
Pitfall: Wrong PHP Version
Problem:
    composer install
    # Your requirements could not be resolved to an installable set of packages
    # drupal/core requires php >=8.3
Solution:
    # Check PHP version
    php --version

    # macOS: Install PHP 8.3
    brew install php@8.3
    brew link php@8.3 --force --overwrite

    # Verify
    php --version  # Should show 8.3.x
Pitfall: Missing Environment Variables
Problem:
    buildkit agents deploy
    # Error: GITLAB_TOKEN not set
Solution:
    # Store tokens in ~/.tokens/
    mkdir -p ~/.tokens
    echo "your-gitlab-token" > ~/.tokens/gitlab
    chmod 600 ~/.tokens/gitlab

    # Set environment variables
    export GITLAB_URL="https://gitlab.com"
    export GITLAB_TOKEN=$(cat ~/.tokens/gitlab)

    # Persist in shell profile
    echo 'export GITLAB_URL="https://gitlab.com"' >> ~/.zshrc
    echo 'export GITLAB_TOKEN=$(cat ~/.tokens/gitlab)' >> ~/.zshrc
OSSA Agent Development
Pitfall: Invalid OSSA Manifest
Problem:
    # agent.ossa.yaml
    ossaVersion: "0.4.9"
    agent:
      id: my-agent
      name: My Agent
      # Missing required fields!
Solution:
    ossaVersion: "0.4.9"
    agent:
      id: my-agent          # Required: DNS-1123 format
      name: My Agent        # Required: Human-readable name
      version: "1.0.0"      # Required: Semantic version
      role: worker          # Required: worker, governor, critic, observer
      runtime:              # Required
        type: local         # Required
        node:
          version: "20.x"
        entrypoint: "dist/index.js"
      capabilities:         # Required: At least one
        - name: process_data
          description: Process data
          input_schema: { type: object }
          output_schema: { type: object }
Validate:
    ossa validate agent.ossa.yaml
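The manifest comments call for a DNS-1123 `id` and a semantic `version`. As a rough illustration of what those two format rules mean (this is not the logic that `ossa validate` runs, which is not shown in this guide), the sketch below checks both with regular expressions. The function names and the simplified version pattern are assumptions made for this example.

```javascript
// Hypothetical helpers illustrating the two format rules named in the manifest comments.

// DNS-1123 label: lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character, at most 63 characters long.
const DNS_1123_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Simplified semantic version: MAJOR.MINOR.PATCH with an optional
// pre-release suffix. The full SemVer grammar is stricter than this.
const SIMPLE_SEMVER = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function isDns1123Label(id) {
  return typeof id === 'string' && id.length <= 63 && DNS_1123_LABEL.test(id);
}

function isSemver(version) {
  return typeof version === 'string' && SIMPLE_SEMVER.test(version);
}

console.log(isDns1123Label('my-agent')); // true
console.log(isDns1123Label('My_Agent')); // false
console.log(isSemver('1.0.0'));          // true
console.log(isSemver('1.0'));            // false
```

A check like this can catch a mistyped `id` early, but the validator remains the authority on what a manifest must contain.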
Pitfall: Missing Capability Input/Output Schemas
Problem:
    capabilities:
      - name: validate_code
        description: Validate code
        # Missing input_schema and output_schema!
Impact: Without schemas, other agents cannot tell what a capability accepts or returns, so workflow orchestration breaks and there is no type safety.
Solution:
    capabilities:
      - name: validate_code
        description: Validate code quality
        input_schema:
          type: object
          properties:
            files:
              type: array
              items:
                type: string
              description: List of file paths
            standards:
              type: string
              enum: [drupal, javascript, python]
          required: [files]
        output_schema:
          type: object
          properties:
            valid:
              type: boolean
            violations:
              type: array
              items:
                type: object
            summary:
              type: string
          required: [valid]
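To make the type-safety benefit concrete, here is a minimal sketch of how a caller or agent could reject a malformed payload using an `input_schema` like the one for `validate_code`. It only checks `required` and top-level property types; a real agent would more likely rely on a full JSON Schema validator. `checkInput` and `jsonType` are hypothetical helpers, not part of OSSA or BuildKit.

```javascript
// Hypothetical helpers: check only `required` and top-level `type`.
function jsonType(value) {
  if (Array.isArray(value)) return 'array';
  if (value === null) return 'null';
  return typeof value; // 'string', 'number', 'boolean', 'object', ...
}

function checkInput(schema, input) {
  const errors = [];
  for (const key of schema.required || []) {
    if (!(key in input)) errors.push(`missing required field: ${key}`);
  }
  for (const [key, rule] of Object.entries(schema.properties || {})) {
    if (key in input && rule.type && jsonType(input[key]) !== rule.type) {
      errors.push(`${key} should be ${rule.type}, got ${jsonType(input[key])}`);
    }
  }
  return errors;
}

// Same shape as the validate_code input_schema, written as a JS object.
const inputSchema = {
  type: 'object',
  properties: {
    files: { type: 'array', items: { type: 'string' } },
    standards: { type: 'string', enum: ['drupal', 'javascript', 'python'] },
  },
  required: ['files'],
};

console.log(checkInput(inputSchema, { files: ['src/a.js'] })); // []
console.log(checkInput(inputSchema, { standards: 42 }));
// [ 'missing required field: files', 'standards should be string, got number' ]
```

Without a declared schema there is nothing to check the payload against, which is why the missing fields surface only later as runtime failures.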
Pitfall: Not Handling Agent Errors
Problem:
    // Agent crashes on error
    app.post('/capabilities/validate', async (req, res) => {
      const result = await validateCode(req.body.files); // Might throw!
      res.json(result);
    });
Solution:
    app.post('/capabilities/validate', async (req, res) => {
      try {
        const { files, standards = 'javascript' } = req.body;

        if (!files || !Array.isArray(files)) {
          return res.status(400).json({
            error: 'Invalid input: files array required',
            code: 'INVALID_INPUT'
          });
        }

        const result = await validateCode(files, standards);
        res.json(result);
      } catch (error) {
        logger.error('Validation failed', { error: error.message, stack: error.stack });
        res.status(500).json({
          error: 'Validation failed',
          code: 'VALIDATION_ERROR',
          message: error.message
        });
      }
    });
Pitfall: Ignoring Agent Health Checks
Problem:
    # Kubernetes kills agent repeatedly
    kubectl get pods -n agents
    # agent-pod   0/1   CrashLoopBackOff
Solution:
    // Add health check endpoint
    app.get('/health', (req, res) => {
      const health = {
        status: 'ok',
        agent: 'my-agent',
        version: '1.0.0',
        uptime: process.uptime(),
        memory: process.memoryUsage(),
      };
      res.json(health);
    });

    // Add readiness check
    app.get('/ready', async (req, res) => {
      try {
        // Check dependencies
        await checkDatabaseConnection();
        await checkExternalServices();
        res.json({ ready: true });
      } catch (error) {
        res.status(503).json({ ready: false, error: error.message });
      }
    });
Kubernetes config:
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 15
      periodSeconds: 5
BuildKit CLI Usage
Pitfall: Not Checking Agent Status Before Deployment
Problem:
    buildkit agents deploy my-agent
    # Deploys broken agent to production!
Solution:
    # Always validate first
    buildkit ossa validate agent.ossa.yaml

    # Test locally
    buildkit agents start my-agent --local

    # Run health check
    curl http://localhost:3000/health

    # Then deploy
    buildkit agents deploy my-agent --namespace agents
Pitfall: Hardcoding Configuration
Problem:
    // Hardcoded values
    const DATABASE_URL = 'postgresql://user:password@localhost:5432/db';
    const API_KEY = 'sk-1234567890';
Solution:
    // Use environment variables
    const DATABASE_URL = process.env.DATABASE_URL;
    const API_KEY = process.env.API_KEY;

    // Validate at startup
    if (!DATABASE_URL || !API_KEY) {
      console.error('Missing required environment variables');
      process.exit(1);
    }
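A startup check is easier to act on when it names the variables that are missing. The following is a small sketch of that idea; `missingEnv` is a hypothetical helper, and the second argument exists only so the example can run without touching the real environment.

```javascript
// Hypothetical helper: returns the names that are unset or empty.
function missingEnv(names, env = process.env) {
  return names.filter((name) => !env[name]);
}

// Example with an explicit object standing in for process.env.
const missing = missingEnv(['DATABASE_URL', 'API_KEY'], {
  DATABASE_URL: 'postgresql://localhost:5432/db',
});
console.log(missing); // [ 'API_KEY' ]
```

In an agent, a non-empty result would be logged and followed by `process.exit(1)`, as in the startup check above.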
Set in Kubernetes:
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
      - name: API_KEY
        valueFrom:
          secretKeyRef:
            name: api-credentials
            key: key
Pitfall: Not Using BuildKit Golden Commands
Problem:
    # Doing manual work that BuildKit automates
    grep -r "TODO" src/
    find . -name "*.ts" -exec eslint {} \;
    git add . && git commit -m "Update"
Solution:
    # Use BuildKit golden commands instead
    buildkit golden audit              # Comprehensive security + quality audit
    buildkit golden fix                # Auto-fix issues
    buildkit golden test               # Run all tests
    buildkit golden sync               # Sync GitLab (issues + wiki)
    buildkit golden deploy --env dev   # Deploy with checks
Drupal Development
Pitfall: Editing Composer-Managed Modules
Problem:
    # Editing files in web/modules/custom/
    cd $LLM_ROOT/llm-platform/web/modules/custom/llm
    nano llm.module
    # Changes will be LOST on composer install!
Solution:
    # Edit source files instead
    cd $LLM_ROOT/all_drupal_custom/modules/llm
    nano llm.module

    # Sync to llm-platform
    buildkit drupal sync --modules

    # Or manually
    cd $LLM_ROOT/llm-platform
    composer update drupal/llm
Why: web/modules/custom/* is managed by Composer and will be overwritten.
Pitfall: Not Clearing Drupal Cache
Problem:
    # Made changes but don't see them
    # Updated routing, added service, changed config
Solution:
    # Always clear cache after changes
    ddev drush cr

    # Or use DDEV shortcut
    ddev restart
When to clear cache:
- After configuration import (drush cim)
- After module enable/disable
- After routing changes
- After service definition changes
- After pretty much anything!
Pitfall: Skipping Configuration Export
Problem:
    # Made configuration changes in UI
    # Didn't export to code
    # Lost on next deployment!
Solution:
    # After any UI configuration changes
    ddev drush cex -y

    # Commit changes
    git add config/
    git commit -m "feat: update content type configuration"
Automate with Git hook:
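    #!/bin/bash
    # .git/hooks/pre-commit (the shebang must be the first line of the file)
    # Make the hook executable: chmod +x .git/hooks/pre-commit
    ddev drush cex -y
    git add config/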
Workflow Orchestration
Pitfall: Missing Workflow Dependencies
Problem:
    stages:
      - name: deploy
        # Forgot depends_on!
        steps:
          - name: deploy_to_prod
            agent: deployment-orchestrator
Impact: The deploy stage can run before tests complete and ship broken code.
Solution:
    stages:
      - name: validate
        steps: [...]
      - name: test
        depends_on: [validate]
        steps: [...]
      - name: deploy
        depends_on: [test]
        condition: "{{ stages.test.status == 'passed' }}"
        steps: [...]
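How the orchestrator turns `depends_on` into an execution order is not described in this guide. One common approach is a topological sort, sketched below under the assumption that stage names are unique. `orderStages` is a hypothetical function written for illustration, not the platform's scheduler.

```javascript
// Hypothetical sketch: order stages so each runs after everything it depends on.
function orderStages(stages) {
  const byName = new Map(stages.map((s) => [s.name, s]));
  const done = new Set();     // stages already placed in the order
  const visiting = new Set(); // stages on the current path (cycle detection)
  const order = [];

  function visit(name) {
    if (done.has(name)) return;
    if (!byName.has(name)) throw new Error(`unknown stage: ${name}`);
    if (visiting.has(name)) throw new Error(`dependency cycle at: ${name}`);
    visiting.add(name);
    for (const dep of byName.get(name).depends_on || []) visit(dep);
    visiting.delete(name);
    done.add(name);
    order.push(name);
  }

  for (const stage of stages) visit(stage.name);
  return order;
}

const stages = [
  { name: 'deploy', depends_on: ['test'] },
  { name: 'test', depends_on: ['validate'] },
  { name: 'validate' },
];
console.log(orderStages(stages)); // [ 'validate', 'test', 'deploy' ]
```

If `deploy` declared no `depends_on`, nothing in this ordering would force it after `test`, which is the failure mode this pitfall describes.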
Pitfall: No Timeout Configuration
Problem:
    steps:
      - name: run_tests
        agent: test-runner
        # No timeout! Hangs forever if tests freeze.
Solution:
    steps:
      - name: run_tests
        agent: test-runner
        capability: run_tests
        timeout: 10m            # Fail after 10 minutes
        retry:
          max_attempts: 2
          backoff: exponential
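For readers unfamiliar with what `timeout` and `backoff: exponential` imply, the sketch below shows the general technique in plain JavaScript. It is an illustration only: the function names, the one-second base delay, and the doubling factor are assumptions, and the orchestrator's actual retry behavior may differ.

```javascript
// Hypothetical sketch of a step timeout plus retry with exponential backoff.

// Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ...
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * 2 ** (attempt - 1);
}

// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `fn` up to `maxAttempts` times, waiting longer after each failure.
async function runWithRetry(fn, { maxAttempts = 2, timeoutMs = 600000, baseMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, baseMs)));
    }
  }
}

console.log([1, 2, 3].map((n) => backoffDelay(n))); // [ 1000, 2000, 4000 ]
```

The defaults mirror the YAML above: `timeoutMs` of 600000 is ten minutes, and `maxAttempts` of 2 allows one retry.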
Pitfall: Not Handling Workflow Failures
Problem:
    # No failure handling
    # Leaves deployments in inconsistent state
Solution:
    on_workflow_failure:
      - name: rollback
        agent: deployment-orchestrator
        capability: rollback
        input:
          environment: "{{ env.ENVIRONMENT }}"
      - name: notify_team
        agent: slack-notifier
        capability: send_message
        input:
          channel: "#incidents"
          message: "Deployment failed: {{ workflow.error }}"
Production Deployment
Pitfall: No Resource Limits
Problem:
    # Kubernetes deployment without limits
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          # No resources! Agent can consume all cluster resources!
Solution:
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
Pitfall: Missing Health Checks in Kubernetes
Problem:
    # Kubernetes doesn't know if pod is healthy
    kubectl get pods
    # Pod shows Running but agent is crashed inside
Solution:
    spec:
      containers:
        - name: agent
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 5
            failureThreshold: 3
Pitfall: Not Using Persistent Volumes
Problem:
    # Pod restarts, loses all data!
    kubectl delete pod my-agent-xyz
    # Files, databases, logs - all gone!
Solution:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: agent-storage
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: fast-ssd
    ---
    # Pod spec fragment that mounts the claim
    spec:
      containers:
        - name: agent
          volumeMounts:
            - name: storage
              mountPath: /data
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: agent-storage
Pitfall: No SSL/TLS Configuration
Problem:
    # Accessing service via HTTP
    curl http://agents.yourcompany.com
    # Insecure! Credentials sent in plaintext!
Solution:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: agents-ingress
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      tls:
        - hosts:
            - agents.yourcompany.com
          secretName: agents-tls
      rules:
        - host: agents.yourcompany.com
          http:
            paths:
              - path: /
                pathType: Prefix   # Required in networking.k8s.io/v1
                backend:
                  service:
                    name: agents
                    port:
                      number: 80
Performance & Optimization
Pitfall: Not Enabling Caching
Problem:
    // Recalculates expensive operation on every request
    app.get('/data', async (req, res) => {
      const data = await expensiveCalculation(); // Takes 5 seconds!
      res.json(data);
    });
Solution:
    import Redis from 'ioredis';

    const redis = new Redis(process.env.REDIS_URL);

    app.get('/data', async (req, res) => {
      // Check cache first
      const cached = await redis.get('expensive-data');
      if (cached) {
        return res.json(JSON.parse(cached));
      }

      // Calculate and cache for one hour (3600 seconds)
      const data = await expensiveCalculation();
      await redis.setex('expensive-data', 3600, JSON.stringify(data));
      res.json(data);
    });
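The pattern in the Redis example is "look up, and on a miss compute and store with an expiry". To show the expiry behavior without a Redis server, here is a small in-memory stand-in. `createTtlCache` is a hypothetical helper for illustration, and the injectable clock is an assumption that lets the example run instantly; it is not a replacement for Redis in production.

```javascript
// Hypothetical in-memory stand-in for the two Redis calls used above (get / setex).
function createTtlCache(now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return null;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // expired: behave like a cache miss
        return null;
      }
      return entry.value;
    },
    setex(key, ttlSeconds, value) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}

// Fake clock so the expiry can be shown without waiting an hour.
let clock = 0;
const cache = createTtlCache(() => clock);
cache.setex('expensive-data', 3600, '{"total":42}');
console.log(cache.get('expensive-data')); // {"total":42}
clock = 3600 * 1000;
console.log(cache.get('expensive-data')); // null
```

Once the entry expires, the next request pays the full calculation cost again, so the TTL should reflect how stale the data is allowed to be.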
Pitfall: Blocking Event Loop
Problem:
    // Synchronous file operations block event loop
    app.post('/process', (req, res) => {
      const files = fs.readdirSync('./large-directory'); // Blocks!
      files.forEach(file => {
        const content = fs.readFileSync(file); // Blocks!
        processFile(content);
      });
      res.json({ done: true });
    });
Solution:
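    // Assumes fs and path are imported from 'node:fs' and 'node:path'
    app.post('/process', async (req, res) => {
      // Use async operations
      const dir = './large-directory';
      const files = await fs.promises.readdir(dir);
      await Promise.all(
        files.map(async file => {
          // readdir returns bare file names, so resolve them against the directory
          const content = await fs.promises.readFile(path.join(dir, file));
          await processFile(content);
        })
      );
      res.json({ done: true });
    });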
Pitfall: Not Monitoring Memory Usage
Problem:
    # Agent crashes with OOM (Out of Memory)
    kubectl get pods
    # agent-xyz   0/1   OOMKilled
Solution:
    // Monitor memory usage
    setInterval(() => {
      const usage = process.memoryUsage();
      const mbUsed = Math.round(usage.heapUsed / 1024 / 1024);

      logger.info('Memory usage', { heapUsed: mbUsed });

      if (mbUsed > 1500) { // 1.5 GB threshold
        logger.warn('High memory usage', { heapUsed: mbUsed });
        // Trigger cleanup, restart, or scale
      }
    }, 60000); // Check every minute
Security
Pitfall: Committing Secrets to Git
Problem:
    # Committed .env file with secrets!
    git add .env
    git commit -m "Add config"
    git push
    # Secrets now in Git history forever!
Solution:
    # Add to .gitignore
    echo ".env" >> .gitignore
    echo ".env.*" >> .gitignore
    echo "*.pem" >> .gitignore
    echo "*.key" >> .gitignore

    # Remove from Git history if already committed
    # (also rotate any secret that was pushed; rewriting history does not un-leak it)
    git filter-branch --force --index-filter \
      "git rm --cached --ignore-unmatch .env" \
      --prune-empty --tag-name-filter cat -- --all
Store secrets securely:
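    # Use ~/.tokens/ directory
    mkdir -p ~/.tokens
    chmod 700 ~/.tokens
    echo "secret-value" > ~/.tokens/service-name
    chmod 600 ~/.tokens/service-name

Reference in code:

    import fs from 'node:fs';
    import os from 'node:os';
    import path from 'node:path';

    // Read a stored token, e.g. the GitLab token saved to ~/.tokens/gitlab during setup
    const token = fs.readFileSync(path.join(os.homedir(), '.tokens', 'gitlab'), 'utf8').trim();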
Pitfall: Not Validating Input
Problem:
    // Vulnerable to injection attacks
    app.post('/execute', (req, res) => {
      const command = req.body.command;
      exec(command); // Command injection!
    });
Solution:
    import { execFile } from 'node:child_process';

    // Allowlist of permitted commands
    const allowedCommands = ['test', 'build', 'deploy'];

    app.post('/execute', (req, res) => {
      const { command } = req.body;

      // Validate input
      if (!command || typeof command !== 'string') {
        return res.status(400).json({ error: 'Invalid command' });
      }

      // Reject anything not on the allowlist
      if (!allowedCommands.includes(command)) {
        return res.status(403).json({ error: 'Command not allowed' });
      }

      // execFile runs the program without a shell, so shell metacharacters
      // are never interpreted. (validator.escape() only escapes HTML and
      // does not make a string safe for a shell.)
      execFile(command, [], { timeout: 30000 }, (error, stdout) => {
        if (error) {
          return res.status(500).json({ error: error.message });
        }
        res.json({ output: stdout });
      });
    });
Pitfall: Missing Rate Limiting
Problem:
    // No rate limiting - vulnerable to DDoS
    app.post('/webhook', async (req, res) => {
      await processWebhook(req.body);
      res.json({ received: true });
    });
Solution:
    import rateLimit from 'express-rate-limit';

    const limiter = rateLimit({
      windowMs: 15 * 60 * 1000, // 15 minutes
      max: 100, // Limit each IP to 100 requests per windowMs
      message: 'Too many requests, please try again later',
    });

    app.post('/webhook', limiter, async (req, res) => {
      await processWebhook(req.body);
      res.json({ received: true });
    });
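To show what `windowMs` and `max` mean in practice, the sketch below implements a simple fixed-window counter. It is a conceptual illustration and not how `express-rate-limit` is implemented; `createLimiter`, the injectable clock, and the example IP address are all assumptions made for this example.

```javascript
// Hypothetical sketch of fixed-window counting; NOT express-rate-limit's code.
function createLimiter({ windowMs, max, now = () => Date.now() }) {
  const hits = new Map(); // key (e.g. client IP) -> { count, windowStart }
  return function allow(key) {
    const t = now();
    const entry = hits.get(key);
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: t }); // start a new window
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Fake clock so the window reset can be shown without waiting 15 minutes.
let currentTime = 0;
const allow = createLimiter({ windowMs: 15 * 60 * 1000, max: 2, now: () => currentTime });
console.log(allow('203.0.113.7')); // true
console.log(allow('203.0.113.7')); // true
console.log(allow('203.0.113.7')); // false
currentTime = 15 * 60 * 1000;
console.log(allow('203.0.113.7')); // true
```

Each client key gets its own counter, so one noisy caller is throttled without affecting the others.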
Quick Reference: Troubleshooting Commands
    # DDEV
    ddev describe                       # Show DDEV project info
    ddev logs                           # View container logs
    ddev restart                        # Restart containers
    ddev delete -O && ddev start        # Nuclear option: rebuild everything

    # BuildKit
    buildkit agents status <name>       # Check agent health
    buildkit agents logs <name>         # View agent logs
    buildkit agents restart <name>      # Restart agent
    buildkit ossa validate <file>       # Validate OSSA manifest

    # Kubernetes
    kubectl get pods -n agents          # List agent pods
    kubectl describe pod <pod>          # Pod details
    kubectl logs <pod> --follow         # Stream logs
    kubectl exec -it <pod> -- /bin/sh   # Shell into pod

    # Drupal
    ddev drush status                   # Drupal status
    ddev drush cr                       # Clear cache
    ddev drush cex -y                   # Export config
    ddev drush cim -y                   # Import config
    ddev drush updb -y                  # Run database updates
Next Steps
- Review System Requirements for optimal setup
- Follow Development Setup for best practices
- Learn Production Deployment patterns