Drupal AI Integration Guides

This directory contains comprehensive technical guides for integrating AI capabilities with Drupal, with a focus on vector database integration, semantic search, and retrieval-augmented generation (RAG).

Available Guides

Vector Databases Integration Guide

File: vector-databases-integration-guide.md

Complete reference for implementing vector database integration with Drupal, including:

  • Supported Databases:

    • ChromaDB (lightweight, local development)
    • Pinecone (managed SaaS service)
    • Milvus (open-source, high-performance)
    • Weaviate (with hybrid search capabilities)
  • Core Implementations:

    • Full PHP service classes for each vector database
    • Docker deployment configurations
    • API key management and security
  • Semantic Search:

    • Complete implementation from text extraction to ranking
    • Multiple embedding providers (OpenAI, Cohere, Sentence Transformers)
    • Snippet generation and result enrichment
  • RAG (Retrieval Augmented Generation):

    • End-to-end RAG implementation
    • Document chunking strategies (fixed-size, semantic, structured)
    • Context retrieval and ranking
    • LLM integration patterns
  • Configuration:

    • Drupal schema configuration
    • Services registration
    • Composer dependencies
    • Environment variable management
  • Performance Optimization:

    • Batch indexing strategies
    • Index pruning and maintenance
    • Query result caching
    • Vector deduplication
  • Troubleshooting:

    • Common issues and solutions
    • Performance tuning tips
    • Comparison matrix

Quick Start by Use Case

For Local Development

Start with ChromaDB:

  • No external services required
  • Docker Compose setup included
  • Perfect for prototyping
```bash
docker-compose up chromadb
```
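A minimal Compose service for ChromaDB might look like the following sketch (the image tag, port mapping, and volume name are assumptions — check the Docker configuration included in the full guide):

```yaml
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      # ChromaDB serves its HTTP API on port 8000 by default.
      - "8000:8000"
    volumes:
      # Persist the collection data across container restarts.
      - chroma-data:/chroma/chroma

volumes:
  chroma-data:
```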

For Production (Small-Medium Scale)

Use Pinecone or Milvus:

  • Pinecone: Managed service, zero ops
  • Milvus: Self-hosted, more control

For Complex Queries

Use Weaviate:

  • GraphQL API for complex queries
  • Hybrid search combining vector + keyword
  • Multi-reference relationships
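Weaviate's hybrid search is expressed directly in GraphQL; a query combining vector and keyword relevance might look like this (illustrative only — the `Article` class name and returned fields are assumptions about your schema):

```graphql
{
  Get {
    Article(
      hybrid: { query: "machine learning", alpha: 0.5 }
      limit: 10
    ) {
      title
      _additional { score }
    }
  }
}
```

Here `alpha` balances the two signals: 0 is pure keyword (BM25) search, 1 is pure vector search.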

Implementation Checklist

  • Choose vector database provider
  • Set up database infrastructure (local or cloud)
  • Install Drupal module dependencies
  • Configure API keys securely (use Key module)
  • Implement SemanticSearchService
  • Index existing Drupal content
  • Set up RAG service (optional)
  • Test search functionality
  • Monitor performance and adjust chunk sizes
  • Set up automated indexing hooks
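The last checklist item usually means reindexing content whenever it is saved. A sketch of what that could look like in a `.module` file (the `my_ai_module` name and service ID are assumptions carried over from the examples below):

```php
<?php

// my_ai_module.module

use Drupal\Core\Entity\EntityInterface;

/**
 * Implements hook_entity_insert().
 */
function my_ai_module_entity_insert(EntityInterface $entity) {
  if ($entity->getEntityTypeId() === 'node') {
    \Drupal::service('my_ai_module.semantic_search')->indexNode($entity);
  }
}

/**
 * Implements hook_entity_update().
 */
function my_ai_module_entity_update(EntityInterface $entity) {
  if ($entity->getEntityTypeId() === 'node') {
    \Drupal::service('my_ai_module.semantic_search')->indexNode($entity);
  }
}
```

For high-traffic sites, consider pushing the `indexNode()` call into a queue worker instead of indexing synchronously on save.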

Key Files in This Documentation

  1. Vector Database Services (ChromaDB, Pinecone, Milvus, Weaviate)

    • Production-ready PHP classes
    • Error handling and logging
    • Batch operations support
  2. Semantic Search Implementation

    • Node indexing service
    • Text extraction and chunking
    • Multi-field search support
    • Result ranking and enrichment
  3. RAG Service

    • Document ingestion with chunking
    • Context retrieval
    • LLM integration
    • Deduplication and ranking
  4. Configuration Management

    • Drupal schema definitions
    • Service registration
    • Environment variable handling
    • Secure key management

Database Comparison

| Feature     | ChromaDB      | Pinecone   | Milvus     | Weaviate        |
|-------------|---------------|------------|------------|-----------------|
| Setup       | 5 min         | 2 min      | 15 min     | 10 min          |
| Scaling     | Limited       | Unlimited  | Unlimited  | Good            |
| Cost        | Free          | $$         | Free       | Free            |
| GraphQL     | No            | No         | No         | Yes             |
| GPU Support | No            | No         | Yes        | No              |
| Best For    | Dev/Prototype | Enterprise | High-scale | Complex Queries |

Integration Examples

Semantic Search

```php
$searchService = \Drupal::service('my_ai_module.semantic_search');
$results = $searchService->search('What is AI?', limit: 10);

foreach ($results as $result) {
  echo $result['node']->getTitle();
  echo "Score: " . round($result['similarity'] * 100) . "%";
}
```
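The `similarity` value in the example above is typically the cosine similarity between the query embedding and a stored vector. As a self-contained illustration (plain PHP, independent of the services in this guide):

```php
/**
 * Cosine similarity between two equal-length vectors, in [-1, 1].
 */
function cosineSimilarity(array $a, array $b): float {
  $dot = 0.0;
  $normA = 0.0;
  $normB = 0.0;
  foreach ($a as $i => $value) {
    $dot += $value * $b[$i];
    $normA += $value * $value;
    $normB += $b[$i] * $b[$i];
  }
  // Guard against zero vectors, which have no defined direction.
  return ($normA > 0.0 && $normB > 0.0)
    ? $dot / (sqrt($normA) * sqrt($normB))
    : 0.0;
}
```

Identical directions score 1.0, orthogonal vectors 0.0 — which is why the example renders the score as a percentage.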

RAG-Based Response Generation

```php
$ragService = \Drupal::service('my_ai_module.rag');
$response = $ragService->generateResponse(
  query: 'Explain machine learning',
  retrievalFilters: ['content_type' => 'article'],
  llmOptions: ['temperature' => 0.7],
);
```

Indexing a Node

```php
use Drupal\node\Entity\Node;

$searchService = \Drupal::service('my_ai_module.semantic_search');
$node = Node::load($nid);
$searchService->indexNode($node);
```

Performance Recommendations

  • Batch Size: 100-500 documents per batch (depends on vector dimension)
  • Chunk Size: 256-1024 tokens (aim for ~512)
  • Overlap: 10-20% of chunk size for continuity
  • Caching: Enable for queries repeated within 1 hour
  • Index Rebuild: Monthly for optimal performance
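The chunk-size and overlap recommendations above can be sketched as a simple word-based chunker (a simplified illustration — the function name is hypothetical, and production chunkers usually count model tokens rather than words):

```php
/**
 * Splits text into overlapping chunks of roughly $chunkSize words.
 *
 * @return string[] Chunks where consecutive chunks share
 *   $chunkSize * $overlapRatio words for continuity.
 */
function chunkText(string $text, int $chunkSize = 512, float $overlapRatio = 0.15): array {
  $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  $overlap = (int) floor($chunkSize * $overlapRatio);
  // Advance by chunk size minus overlap so chunks share context.
  $step = max(1, $chunkSize - $overlap);
  $chunks = [];
  for ($i = 0; $i < count($words); $i += $step) {
    $chunks[] = implode(' ', array_slice($words, $i, $chunkSize));
    // Stop once a chunk reaches the end of the text.
    if ($i + $chunkSize >= count($words)) {
      break;
    }
  }
  return $chunks;
}
```

The 10-20% overlap keeps a sentence that straddles a boundary retrievable from either chunk.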

Security Considerations

  1. Store API keys using Drupal Key module or environment variables
  2. Implement access control on search endpoints
  3. Filter search results based on node access permissions
  4. Sanitize user input before embedding
  5. Monitor API key rotation and expiration
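Point 4 above might look like the following sketch (a hypothetical helper — adapt the length cap and cleanup rules to your embedding provider):

```php
/**
 * Normalizes user input before sending it to an embedding API.
 */
function prepareForEmbedding(string $input, int $maxChars = 2000): string {
  // Strip markup so tags don't skew the embedding.
  $clean = strip_tags($input);
  // Collapse runs of whitespace and newlines into single spaces.
  $clean = preg_replace('/\s+/u', ' ', $clean);
  // Cap the length to bound per-request embedding cost.
  return mb_substr(trim($clean), 0, $maxChars);
}
```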

Monitoring & Debugging

  • Check logs in Recent Log Messages for errors
  • Use Drupal's Database Logging module
  • Monitor vector database health endpoints
  • Track embedding costs if using paid providers
  • Set up alerts for API failures

Additional Resources

  • See drupal-graphql-ai-integration-guide.md for GraphQL integration
  • See drupal-key-management-guide.md for secure API key handling
  • See drupal-eca-integration-guide.md for automated indexing workflows

Contributing

When adding new vector database providers:

  1. Extend the VectorDatabaseInterface
  2. Implement all required methods
  3. Add Docker Compose configuration
  4. Include security/performance recommendations
  5. Add comparison data to the matrix
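A minimal shape for such a provider might look like this (an illustrative sketch with a toy in-memory implementation — the actual `VectorDatabaseInterface` defined in the guide may declare different methods):

```php
interface VectorDatabaseInterface {

  /**
   * Stores or replaces a vector with optional metadata.
   */
  public function upsert(string $id, array $vector, array $metadata = []): void;

  /**
   * Returns up to $limit nearest items as ['id', 'score', 'metadata'] arrays.
   */
  public function query(array $vector, int $limit = 10, array $filters = []): array;

  public function delete(string $id): void;

  public function healthCheck(): bool;

}

/**
 * Toy in-memory provider, useful in tests; real providers call the
 * database's HTTP or gRPC API instead.
 */
class InMemoryVectorDatabase implements VectorDatabaseInterface {

  private array $items = [];

  public function upsert(string $id, array $vector, array $metadata = []): void {
    $this->items[$id] = ['vector' => $vector, 'metadata' => $metadata];
  }

  public function query(array $vector, int $limit = 10, array $filters = []): array {
    $results = [];
    foreach ($this->items as $id => $item) {
      // Dot-product scoring stands in for a real similarity metric.
      $score = 0.0;
      foreach ($vector as $i => $value) {
        $score += $value * $item['vector'][$i];
      }
      $results[] = ['id' => $id, 'score' => $score, 'metadata' => $item['metadata']];
    }
    usort($results, fn ($a, $b) => $b['score'] <=> $a['score']);
    return array_slice($results, 0, $limit);
  }

  public function delete(string $id): void {
    unset($this->items[$id]);
  }

  public function healthCheck(): bool {
    return TRUE;
  }

}
```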

Last Updated: January 8, 2026