# Drupal AI Integration Guides
This directory contains comprehensive technical guides for integrating AI capabilities with Drupal, with a focus on vector database integration, semantic search, and retrieval-augmented generation (RAG).
## Available Guides

### Vector Databases Integration Guide

**File:** `vector-databases-integration-guide.md`

Complete reference for implementing vector database integration with Drupal, including:
- **Supported Databases:**
  - ChromaDB (lightweight, local development)
  - Pinecone (managed SaaS service)
  - Milvus (open-source, high-performance)
  - Weaviate (with hybrid search capabilities)
- **Core Implementations:**
  - Full PHP service classes for each vector database
  - Docker deployment configurations
  - API key management and security
- **Semantic Search:**
  - Complete implementation from text extraction to ranking
  - Multiple embedding providers (OpenAI, Cohere, Sentence Transformers)
  - Snippet generation and result enrichment
- **RAG (Retrieval-Augmented Generation):**
  - End-to-end RAG implementation
  - Document chunking strategies (fixed-size, semantic, structured)
  - Context retrieval and ranking
  - LLM integration patterns
- **Configuration:**
  - Drupal schema configuration
  - Services registration
  - Composer dependencies
  - Environment variable management
- **Performance Optimization:**
  - Batch indexing strategies
  - Index pruning and maintenance
  - Query result caching
  - Vector deduplication
- **Troubleshooting:**
  - Common issues and solutions
  - Performance tuning tips
  - Comparison matrix
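As a concrete illustration of the fixed-size chunking strategy listed above, here is a minimal sketch in plain PHP. Character counts stand in for token counts, and the function name and defaults are illustrative rather than part of the guide's API:

```php
<?php

/**
 * Fixed-size chunking with overlap (sizes in characters for simplicity;
 * a real implementation would count tokens).
 */
function chunkText(string $text, int $chunkSize = 512, float $overlapRatio = 0.15): array {
  // Advance by the chunk size minus the overlap so consecutive chunks
  // share some context at their boundaries.
  $step = max(1, (int) ($chunkSize * (1 - $overlapRatio)));
  $chunks = [];
  for ($offset = 0; $offset < strlen($text); $offset += $step) {
    $chunks[] = substr($text, $offset, $chunkSize);
  }
  return $chunks;
}
```

The overlap ratio trades storage for retrieval quality: larger overlaps reduce the chance that a relevant sentence is split across chunk boundaries.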
## Quick Start by Use Case

### For Local Development

Start with ChromaDB:
- No external services required
- Docker Compose setup included
- Perfect for prototyping
```bash
docker-compose up chromadb
```
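For reference, a minimal Compose service definition might look like the following sketch; the image name, port, and volume path follow ChromaDB's upstream defaults at the time of writing and may need adjusting:

```yaml
# docker-compose.yml -- minimal ChromaDB service for local development.
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma

volumes:
  chroma_data:
```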
### For Production (Small-Medium Scale)

Use Pinecone or Milvus:
- **Pinecone**: Managed service, zero ops
- **Milvus**: Self-hosted, more control
### For Complex Queries

Use Weaviate:
- GraphQL API for complex queries
- Hybrid search combining vector + keyword
- Multi-reference relationships
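As an illustration of the kind of hybrid query Weaviate supports, the GraphQL below blends keyword (BM25) and vector scoring via the `alpha` parameter; the `Article` class and its fields are hypothetical examples, not names from the guide:

```graphql
{
  Get {
    Article(
      hybrid: { query: "vector databases in drupal", alpha: 0.5 }
      limit: 5
    ) {
      title
      _additional { score }
    }
  }
}
```

An `alpha` of 0 is pure keyword search, 1 is pure vector search, and 0.5 weights them equally.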
## Implementation Checklist
- [ ] Choose vector database provider
- [ ] Set up database infrastructure (local or cloud)
- [ ] Install Drupal module dependencies
- [ ] Configure API keys securely (use the Key module)
- [ ] Implement `SemanticSearchService`
- [ ] Index existing Drupal content
- [ ] Set up RAG service (optional)
- [ ] Test search functionality
- [ ] Monitor performance and adjust chunk sizes
- [ ] Set up automated indexing hooks
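The `SemanticSearchService` registration step might look like the following sketch; the module name, class paths, and injected services are placeholders, not the guide's actual definitions:

```yaml
# my_ai_module.services.yml -- names and injected services are illustrative.
services:
  my_ai_module.semantic_search:
    class: Drupal\my_ai_module\SemanticSearchService
    arguments: ['@http_client', '@entity_type.manager', '@logger.factory']
  my_ai_module.rag:
    class: Drupal\my_ai_module\RagService
    arguments: ['@my_ai_module.semantic_search']
```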
## Key Files in This Documentation
- **Vector Database Services** (ChromaDB, Pinecone, Milvus, Weaviate)
  - Production-ready PHP classes
  - Error handling and logging
  - Batch operations support
- **Semantic Search Implementation**
  - Node indexing service
  - Text extraction and chunking
  - Multi-field search support
  - Result ranking and enrichment
- **RAG Service**
  - Document ingestion with chunking
  - Context retrieval
  - LLM integration
  - Deduplication and ranking
- **Configuration Management**
  - Drupal schema definitions
  - Service registration
  - Environment variable handling
  - Secure key management
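The similarity score that ranking and deduplication rely on is typically cosine similarity between embedding vectors. Vector stores normally compute this server-side, but a self-contained sketch clarifies what the number means:

```php
<?php

/**
 * Cosine similarity between two equal-length embedding vectors.
 * Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
 */
function cosineSimilarity(array $a, array $b): float {
  $dot = 0.0;
  $normA = 0.0;
  $normB = 0.0;
  foreach ($a as $i => $value) {
    $dot += $value * $b[$i];
    $normA += $value * $value;
    $normB += $b[$i] * $b[$i];
  }
  return $dot / (sqrt($normA) * sqrt($normB));
}
```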
## Database Comparison
| Feature | ChromaDB | Pinecone | Milvus | Weaviate |
|---|---|---|---|---|
| Setup | 5 min | 2 min | 15 min | 10 min |
| Scaling | Limited | Unlimited | Unlimited | Good |
| Cost | Free | $$ | Free | Free |
| GraphQL | No | No | No | Yes |
| GPU Support | No | No | Yes | No |
| Best For | Dev/Prototype | Enterprise | High-scale | Complex Queries |
## Integration Examples

### Basic Semantic Search
```php
$searchService = \Drupal::service('my_ai_module.semantic_search');
$results = $searchService->search('What is AI?', limit: 10);

foreach ($results as $result) {
  echo $result['node']->getTitle();
  echo "Score: " . round($result['similarity'] * 100) . "%";
}
```
### RAG-Based Response Generation
```php
$ragService = \Drupal::service('my_ai_module.rag');
$response = $ragService->generateResponse(
  query: 'Explain machine learning',
  retrievalFilters: ['content_type' => 'article'],
  llmOptions: ['temperature' => 0.7],
);
```
### Indexing a Node
```php
use Drupal\node\Entity\Node;

$searchService = \Drupal::service('my_ai_module.semantic_search');
$node = Node::load($nid);
$searchService->indexNode($node);
```
## Performance Recommendations
- **Batch Size:** 100-500 documents per batch (depends on vector dimension)
- **Chunk Size:** 256-1024 tokens (aim for ~512)
- **Overlap:** 10-20% of chunk size for continuity
- **Caching:** Enable for queries repeated within 1 hour
- **Index Rebuild:** Monthly for optimal performance
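The batch-size guidance above amounts to splitting the work with `array_chunk` before calling the indexer. In this sketch, `indexInBatches` and the `$indexBatch` callable are illustrative stand-ins for the real indexing service:

```php
<?php

/**
 * Splits document IDs into batches and hands each batch to an indexing
 * callback. Returns the number of batches processed.
 */
function indexInBatches(array $ids, callable $indexBatch, int $batchSize = 250): int {
  $batches = array_chunk($ids, $batchSize);
  foreach ($batches as $batch) {
    // In a real module this would call the vector database service,
    // ideally inside a queue worker so long runs survive timeouts.
    $indexBatch($batch);
  }
  return count($batches);
}
```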
## Security Considerations
- Store API keys using Drupal Key module or environment variables
- Implement access control on search endpoints
- Filter search results based on node access permissions
- Sanitize user input before embedding
- Monitor API key rotation and expiration
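A minimal take on "sanitize user input before embedding" is to strip markup, collapse whitespace, and cap the length before the text reaches an embedding API. The limits and function name below are illustrative:

```php
<?php

/**
 * Normalizes untrusted text before it is sent to an embedding provider:
 * removes HTML tags, collapses runs of whitespace, and caps the length.
 */
function sanitizeForEmbedding(string $input, int $maxChars = 8000): string {
  $text = strip_tags($input);
  $text = preg_replace('/\s+/', ' ', $text);
  $text = trim($text);
  return substr($text, 0, $maxChars);
}
```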
## Monitoring & Debugging
- Check logs in *Recent Log Messages* for errors
- Use Drupal's Database Logging module
- Monitor vector database health endpoints
- Track embedding costs if using paid providers
- Set up alerts for API failures
## Additional Resources
- See `drupal-graphql-ai-integration-guide.md` for GraphQL integration
- See `drupal-key-management-guide.md` for secure API key handling
- See `drupal-eca-integration-guide.md` for automated indexing workflows
## Contributing
When adding new vector database providers:
- Implement the `VectorDatabaseInterface`
- Implement all required methods
- Add Docker Compose configuration
- Include security/performance recommendations
- Add comparison data to the matrix
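For orientation, a provider contract along these lines is what a new implementation would fill in; the method names and signatures below are hypothetical, and the actual `VectorDatabaseInterface` in the guide may differ:

```php
<?php

/**
 * Hypothetical shape of the contract each vector database provider
 * implements. Consult the guide for the authoritative definition.
 */
interface VectorDatabaseInterface {

  /** Inserts or updates a vector with optional metadata. */
  public function upsert(string $id, array $vector, array $metadata = []): void;

  /** Returns the $limit nearest matches for a query vector. */
  public function query(array $vector, int $limit = 10, array $filters = []): array;

  /** Removes a vector by ID. */
  public function delete(string $id): void;

  /** Reports whether the backing store is reachable and healthy. */
  public function healthCheck(): bool;

}
```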
Last Updated: January 8, 2026