Webhook System Architecture

Status: Production Ready
Version: 1.1.0
Last Updated: 2025-12-26
OpenAPI Spec: openapi/webhook-management.openapi.yaml
Location: Moved from architecture/webhook-system.md to infrastructure/cloudflare/webhook-system.md


Overview

The BuildKit webhook system handles GitLab events automatically through a Cloudflare Tunnel. It lets agents react to repository events (pushes, merge requests, pipelines, issues) in real time.


Critical Architecture Principle

Cloudflare Tunnel = Public Ingress ONLY
Tailscale = Private Access ONLY
These planes must NEVER be mixed.


How Cloudflare Tunnel Works (Important)

The domain api.blueflyagents.com does NOT point to your computer or network. It points to Cloudflare.

GitLab SaaS
   ↓ HTTPS POST
Cloudflare DNS (api.blueflyagents.com)
   ↓
Cloudflare Edge (handles TLS, WAF, DDoS)
   ↓ (existing OUTBOUND tunnel from your Mac)
cloudflared (running on Mac M4)
   ↓
http://localhost:3847

Key Properties:

  • cloudflared makes an OUTBOUND connection to Cloudflare
  • No inbound ports opened on your network
  • No router/NAT/port forwarding involved
  • Domain never resolves to your public IP
  • Works from home, hotel, coffee shop, LTE, Starlink
  • If your IP changes every 30 seconds, it still works

Architecture Diagram


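+------------------------------------------------------------------+
| PUBLIC INTERNET                                                  |
+------------------------------------------------------------------+
                                 |
                             HTTPS POST
                                 ↓
+------------------------------------------------------------------+
| GITLAB SaaS                                                      |
| Projects: openstandardagents, agent-buildkit, gitlab_components  |
+------------------------------------------------------------------+
                                 |
                       api.blueflyagents.com
                                 ↓
+------------------------------------------------------------------+
| CLOUDFLARE EDGE                                                  |
| • DNS Resolution                                                 |
| • TLS Termination                                                |
| • WAF / DDoS Protection                                          |
| • Rate Limiting                                                  |
+------------------------------------------------------------------+
                                 |
                     OUTBOUND TLS Tunnel (443)
                      (initiated BY your Mac)
                                 ↓
+------------------------------------------------------------------+
| cloudflared (Mac M4)                                             |
| Tunnel ID: f6da7bdf-d0f8-4796-a804-afb7984bbe11                  |
| Config: ~/.cloudflared/config.yml                                |
+------------------------------------------------------------------+
                                 |
                       http://localhost:3847
                                 ↓
+------------------------------------------------------------------+
| WEBHOOK SERVER                                                   |
| Port: 3847                                                       |
| Endpoint: /api/webhooks/gitlab                                   |
| Source: src/webhook-server.ts                                    |
+------------------------------------------------------------------+
                                 |
                                 ↓
+------------------------------------------------------------------+
| AGENT ROUTING                                                    |
| Push → CI Validation Agent                                       |
| MR → Review Agent                                                |
| Pipeline → Notification Agent                                    |
| Issue → Triage Agent                                             |
+------------------------------------------------------------------+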
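
The last hop in the diagram is the webhook server itself. The following is a minimal sketch of what that hop might look like, not the contents of src/webhook-server.ts. It assumes Node's built-in http module, a hypothetical `handleWebhook` helper, and a plain comparison against GitLab's X-Gitlab-Token header as a stand-in for whatever secret check the real server performs.

```typescript
import { createServer, Server } from "node:http";

// Outcome of one request, kept separate from the HTTP layer so it is easy to test.
interface WebhookResult {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical helper: decides the response for a request that reached localhost via the tunnel.
// `authorized` is the result of the server's secret check.
function handleWebhook(method: string, path: string, authorized: boolean): WebhookResult {
  if (method === "GET" && path === "/health") {
    return { status: 200, body: { status: "healthy", timestamp: new Date().toISOString() } };
  }
  if (path !== "/api/webhooks/gitlab") return { status: 404, body: { error: "not found" } };
  if (method !== "POST") return { status: 405, body: { error: "method not allowed" } };
  if (!authorized) return { status: 401, body: { error: "invalid webhook secret" } };
  return { status: 200, body: { status: "accepted" } };
}

// Binds to loopback only: cloudflared is the sole expected caller, so nothing listens publicly.
function startWebhookServer(secret: string, port = 3847): Server {
  const server = createServer((req, res) => {
    // Assumption: the secret arrives in X-Gitlab-Token; a production check should be constant-time.
    const authorized = req.headers["x-gitlab-token"] === secret;
    const result = handleWebhook(req.method ?? "GET", req.url ?? "/", authorized);
    res.writeHead(result.status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(result.body));
  });
  return server.listen(port, "127.0.0.1");
}
```

Listening on 127.0.0.1 rather than 0.0.0.0 keeps the server unreachable from the LAN, which matches the rule that Cloudflare stops at localhost.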


What NOT To Do (Critical)

| Do NOT | Why |
|---|---|
| Use DDNS (tplinkdns.com) | Requires inbound ports, exposes network |
| Point Cloudflare at Tailscale hostnames | Creates chain, breaks separation |
| Use *.cfargotunnel.com as service URL | Creates tunnel-to-tunnel loop |
| Enable Tailscale Funnel for webhooks | Wrong tool for public ingress |
| Open inbound ports on router | Unnecessary attack surface |
| Use LAN IPs as service URL | Only works locally |
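
Several of these rules can be checked mechanically against a tunnel's service URL. The sketch below is one possible check, not part of BuildKit. The function name `checkServiceUrl` is hypothetical, and the `.ts.net` suffix (Tailscale MagicDNS names) and the RFC 1918 ranges are assumptions about how the forbidden targets would appear in practice.

```typescript
// Returns the list of rule violations for a tunnel service URL; an empty list means it looks safe.
function checkServiceUrl(serviceUrl: string): string[] {
  const problems: string[] = [];
  const host = new URL(serviceUrl).hostname.toLowerCase();

  if (host.endsWith(".cfargotunnel.com")) problems.push("tunnel-to-tunnel loop");
  if (host.endsWith(".ts.net")) problems.push("Tailscale hostname");
  if (host.endsWith(".tplinkdns.com")) problems.push("DDNS hostname");
  // Private IPv4 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
  if (/^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(host)) problems.push("LAN IP");

  return problems;
}
```

For example, `checkServiceUrl("http://localhost:3847")` returns an empty list, while a `*.cfargotunnel.com` target is reported as a tunnel-to-tunnel loop.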

Cloudflare vs Tailscale Separation

| Tool | Responsibility | Traffic Type |
|---|---|---|
| Cloudflare Tunnel | Public webhook ingress | GitLab → localhost |
| Tailscale | Private admin/travel access | You → home network |

Mental Model:

  • Cloudflare stops at localhost
  • Tailscale never accepts public traffic
  • No overlap between these planes

Quick Start

    # One command to start everything
    buildkit webhook start

This command:

  1. Starts Cloudflare tunnel (cloudflared tunnel run agent-webhook)
  2. Starts webhook server on port 3847 (the port the tunnel forwards to)
  3. Runs in the foreground in daemon mode (Ctrl+C to stop)

CLI Commands

Start System

    buildkit webhook start               # Start in foreground (daemon mode)
    buildkit webhook start --port 8080   # Custom port
    buildkit webhook start --detach      # Background mode (WIP)

Stop System

buildkit webhook stop

Check Status

    buildkit webhook status          # Human-readable
    buildkit webhook status --json   # JSON output

List Webhooks

    buildkit webhook list               # All projects
    buildkit webhook list -p 76265294   # Specific project
    buildkit webhook list --json        # JSON output

Create Webhook

    buildkit webhook create -p 76265294
    buildkit webhook create -p 76265294 -u https://custom.example.com/webhook
    buildkit webhook create -p 76265294 -s "custom-secret-token"

Delete Webhook

    buildkit webhook delete -p 76265294 -h 12345
    buildkit webhook delete -p 76265294 -h 12345 -f   # Force (no confirm)

Test Webhook

    buildkit webhook test                    # Test default endpoint
    buildkit webhook test -e push            # Test push event
    buildkit webhook test -e merge_request   # Test MR event

Interactive Setup

buildkit webhook setup # Guided setup wizard

Configuration

Token Files

| File | Purpose |
|---|---|
| ~/.tokens/gitlab | GitLab API token |
| ~/.tokens/gitlab-webhook-secret | Webhook secret |
| ~/.cloudflared/config.yml | Tunnel configuration |
| ~/.cloudflared/*.json | Tunnel credentials |

Environment Variables

    GITLAB_WEBHOOK_SECRET=...   # Webhook secret token
    PORT=3847                   # Server port (must match the tunnel service URL)
    GITLAB_TOKEN=...            # GitLab API token (for CRUD)
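
One way these variables and the token files could be combined is sketched below. The precedence (environment variable first, token file second), the default port, and the names `loadConfig` and `readTokenFile` are assumptions for illustration; the actual server may resolve its configuration differently.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface WebhookConfig {
  port: number;
  webhookSecret: string;
  gitlabToken: string;
}

// Reads ~/.tokens/<name> and trims the trailing newline; returns "" if the file is missing.
function readTokenFile(name: string): string {
  try {
    return readFileSync(join(homedir(), ".tokens", name), "utf8").trim();
  } catch {
    return "";
  }
}

// Assumed precedence: environment variable first, then the token file.
function loadConfig(
  env: Record<string, string | undefined> = process.env,
  readToken: (name: string) => string = readTokenFile,
): WebhookConfig {
  // Assumed default: the port the tunnel config forwards to.
  const port = Number(env.PORT ?? 3847);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    webhookSecret: env.GITLAB_WEBHOOK_SECRET ?? readToken("gitlab-webhook-secret"),
    gitlabToken: env.GITLAB_TOKEN ?? readToken("gitlab"),
  };
}
```

Passing `env` and `readToken` as parameters keeps the function testable without touching the real home directory.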

Cloudflare Tunnel Config

Located at ~/.cloudflared/config.yml:

    tunnel: f6da7bdf-d0f8-4796-a804-afb7984bbe11
    credentials-file: ~/.cloudflared/f6da7bdf-d0f8-4796-a804-afb7984bbe11.json
    ingress:
      - hostname: api.blueflyagents.com
        service: http://localhost:3847
      - service: http_status:404
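
Ingress rules are evaluated top to bottom and the first match wins, which is why the catch-all `http_status:404` rule comes last. The sketch below models only that ordering for the two rules above. It is a simplification (cloudflared also supports wildcard hostnames and path matching), and `resolveService` is a hypothetical name, not a cloudflared API.

```typescript
interface IngressRule {
  hostname?: string; // omitted on the catch-all rule
  service: string;
}

// Mirrors the two rules in ~/.cloudflared/config.yml.
const ingress: IngressRule[] = [
  { hostname: "api.blueflyagents.com", service: "http://localhost:3847" },
  { service: "http_status:404" },
];

// Returns the service of the first rule whose hostname matches (or that has no hostname).
function resolveService(rules: IngressRule[], requestHost: string): string {
  for (const rule of rules) {
    if (rule.hostname === undefined || rule.hostname === requestHost) {
      return rule.service;
    }
  }
  throw new Error("ingress must end with a catch-all rule");
}
```

Any request for a hostname other than api.blueflyagents.com falls through to the 404 rule and never reaches the webhook server.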

Group Webhooks (Dual Architecture)

Last Updated: 2025-12-21

The BlueFly group uses two webhooks for separation of concerns:

| ID | Name | Branch Filter | Issues | Purpose |
|---|---|---|---|---|
| 66989164 | Milestone Review | release/* | OFF | Release workflow, milestone automation |
| 67391807 | Agent Core Intake | ALL | ON | Full agent coverage (CI, MR, Issue, Pipeline) |

Webhook 1: Milestone Review (Scoped)

Purpose: Release gating, milestone review

Description: Handles release/* branch events for milestone automation, release gating, and tag management. Scoped to release workflow only.

| Setting | Value |
|---|---|
| URL | https://api.blueflyagents.com/api/webhooks/gitlab |
| Branch Filter | release/* |
| Push Events | ON |
| MR Events | ON |
| Pipeline Events | ON |
| Issues Events | OFF |
| Tags/Releases | ON |
| Milestones | ON |

Custom Template:

    {
      "source": "gitlab",
      "webhook": "milestone-review",
      "event": "{{object_kind}}",
      "project": "{{project.path_with_namespace}}",
      "ref": "{{ref}}",
      "sha": "{{checkout_sha}}",
      "user": "{{user_username}}",
      "action": "{{object_attributes.action}}",
      "state": "{{object_attributes.state}}",
      "iid": "{{object_attributes.iid}}",
      "title": "{{object_attributes.title}}",
      "url": "{{object_attributes.url}}",
      "milestone": "{{object_attributes.milestone.title}}",
      "routing": {
        "agent": "release-manager",
        "priority": "high"
      }
    }

Webhook 2: Agent Core Intake (Broad)

Purpose: Feed CI, MR, Issue, and general automation agents

Description: Primary intake for all agent automation. Receives push, MR, pipeline, and issue events from all branches. Routes to CI validation, MR reviewer, pipeline monitor, and issue triage agents.

| Setting | Value |
|---|---|
| URL | https://api.blueflyagents.com/api/webhooks/gitlab |
| Branch Filter | NONE (all branches) |
| Push Events | ON |
| MR Events | ON |
| Pipeline Events | ON |
| Issues Events | ON |
| Tags/Releases | ON |

Custom Template:

    {
      "source": "gitlab",
      "webhook": "agent-core-intake",
      "event": "{{object_kind}}",
      "project": {
        "id": "{{project.id}}",
        "path": "{{project.path_with_namespace}}",
        "url": "{{project.web_url}}"
      },
      "ref": "{{ref}}",
      "sha": "{{checkout_sha}}",
      "user": {
        "id": "{{user_id}}",
        "username": "{{user_username}}"
      },
      "object": {
        "iid": "{{object_attributes.iid}}",
        "action": "{{object_attributes.action}}",
        "state": "{{object_attributes.state}}",
        "title": "{{object_attributes.title}}",
        "url": "{{object_attributes.url}}"
      },
      "pipeline": {
        "id": "{{object_attributes.pipeline_id}}",
        "status": "{{object_attributes.status}}"
      },
      "routing": {
        "push": "ci-validation-agent",
        "merge_request": "mr-reviewer-agent",
        "pipeline": "pipeline-monitor-agent",
        "issue": "issue-triage-agent",
        "tag_push": "release-manager-agent"
      }
    }
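
Because this template carries its own routing map, the receiver can pick an agent by looking up the `event` value in `routing`. The sketch below shows that lookup under the assumption that the server receives the rendered template as its request body. The `IntakePayload` type lists only the fields used here, and `resolveAgent` is a hypothetical helper.

```typescript
// Subset of the body produced by the Agent Core Intake template.
interface IntakePayload {
  source: string;
  webhook: string;
  event: string; // rendered from {{object_kind}}
  routing: Record<string, string>;
}

// Looks up the agent for this event in the routing map carried by the payload itself.
// Uses an own-property check so keys such as "constructor" never resolve to an agent.
function resolveAgent(payload: IntakePayload): string | undefined {
  return Object.prototype.hasOwnProperty.call(payload.routing, payload.event)
    ? payload.routing[payload.event]
    : undefined;
}

const examplePayload: IntakePayload = {
  source: "gitlab",
  webhook: "agent-core-intake",
  event: "merge_request",
  routing: {
    push: "ci-validation-agent",
    merge_request: "mr-reviewer-agent",
    pipeline: "pipeline-monitor-agent",
    issue: "issue-triage-agent",
    tag_push: "release-manager-agent",
  },
};
```

With the example payload, `resolveAgent` returns "mr-reviewer-agent". An event with no routing entry, such as a note event, yields `undefined` and can be logged without dispatch.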

Agent Coverage Matrix

| Agent | Webhook | Coverage |
|---|---|---|
| CI Validation | Agent Core Intake | All branches |
| MR Reviewer | Agent Core Intake | All MRs |
| Pipeline Monitor | Both | All pipelines |
| Issue Triage | Agent Core Intake | All issues |
| Release Manager | Milestone Review | Release branches only |

Why Two Webhooks?

  1. Clean separation: Release workflow isolated from noisy development events
  2. Lower blast radius: Release agent only sees release/* branches
  3. Easier auditing: Each webhook has clear responsibility
  4. Scalability: Add new webhooks for new concerns without touching existing ones

Legacy Project Webhooks (Reference Only)

| Project | Project ID | Webhook ID |
|---|---|---|
| openstandardagents | 76265294 | 67331295 |
| agent-buildkit | 76270744 | 67331296 |
| gitlab_components | 76267142 | 67331297 |

Note: Group-level webhooks cover all projects. Per-project webhooks are only needed for project-specific routing.

Supported Events

| Event Type | Handler | Agent |
|---|---|---|
| Push Hook | Log + route | CI Validation |
| Merge Request Hook | Log + route | MR Reviewer → Auto-Fix System |
| Pipeline Hook | Log + route | Pipeline Monitor → Auto-Fix System |
| Issue Hook | Log + route | Issue Triage |
| Note Hook | Log + route | Comment Handler |
| Tag Push Hook | Log + route | Release Manager |
| Deployment Hook | Log + route | Deploy Monitor |
| Release Hook | Log + route | Release Notes |
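
GitLab sends the event type in the X-Gitlab-Event header using the names in the first column, so the table can be expressed as a lookup. This is an illustrative sketch rather than the server's actual dispatch code. `agentsForEvent` is a hypothetical name, and modelling the "→ Auto-Fix System" entries as a second step in an ordered list is an assumption.

```typescript
// X-Gitlab-Event header value -> agents to notify, in order.
const EVENT_AGENTS = new Map<string, string[]>([
  ["Push Hook", ["CI Validation"]],
  ["Merge Request Hook", ["MR Reviewer", "Auto-Fix System"]],
  ["Pipeline Hook", ["Pipeline Monitor", "Auto-Fix System"]],
  ["Issue Hook", ["Issue Triage"]],
  ["Note Hook", ["Comment Handler"]],
  ["Tag Push Hook", ["Release Manager"]],
  ["Deployment Hook", ["Deploy Monitor"]],
  ["Release Hook", ["Release Notes"]],
]);

// Returns the agents for an event header, or an empty list for unknown or missing headers.
function agentsForEvent(eventHeader: string | undefined): string[] {
  if (!eventHeader) return [];
  return EVENT_AGENTS.get(eventHeader) ?? [];
}
```

Unknown event types return an empty list, so the handler can still log them without routing anywhere.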

GitLab Webhook Auto-Fix System

Status: Production Ready
Version: 1.0.0
Last Updated: 2026-01-07

Overview

The GitLab Webhook Auto-Fix System automatically detects and fixes broken merge requests using 16 canonical agents, orchestrated via task-dispatcher, with vast.ai cloud execution for GPU-intensive operations.

Architecture

The system consists of:

  1. GitLab Webhook Receiver

    • Agent-Mesh Endpoint: /webhooks/gitlab/group/:groupId (handles group-level webhooks)
    • Legacy Endpoint: /api/webhooks/gitlab (project-level webhooks)
    • Location: common_npm/agent-mesh/src/server.ts
    • Validates GitLab webhook signatures (optional for test requests)
    • Accepts test requests (empty payloads) and returns 200 OK
    • Routes MR events to MR Analysis Service via duo-gateway
  2. MR Analysis Service (agent-buildkit/src/services/mr-analysis.service.ts)

    • Detects MR issues: conflicts, failing pipelines, review comments, code quality issues
    • Classifies issue types and routes to appropriate agents
    • Returns structured analysis with recommended agents
  3. MR Remediation Orchestrator (agent-buildkit/src/services/mr-remediation-orchestrator.service.ts)

    • Orchestrates multi-agent MR fix workflow
    • Coordinates parallel and sequential agent execution
    • Uses agent-mesh (common_npm/agent-mesh) for A2A communication
    • Each agent communicates via A2A protocol with structured messages
  4. Webhook Manager (agent-buildkit/src/infrastructure/webhook-manager.service.ts)

    • Manages GitLab webhook lifecycle
    • Auto-creates/updates webhooks via GitLab API
    • Health checks and automatic re-registration
  5. Service Account Integration (agent-buildkit/src/services/gitlab/service-account-token.service.ts)

    • Manages GitLab service account tokens per agent
    • Each agent uses its own service account (from OSSA manifests)
    • Token rotation and refresh
    • Agents appear as real developers in GitLab (username, email, display_name)
  6. Activity Stream Service (agent-buildkit/src/services/gitlab/gitlab-activity-stream.service.ts)

    • Posts GitLab issue/MR comments for each agent action
    • Creates activity stream showing agent-to-agent communication
    • Tracks agent actions: task assignment, status updates, results
    • Format: @agent-name @target-agent: [action] - [details]

Agent Workflow

GitLab Webhook → MR Analysis → Task Dispatcher → Agent Chain → Vast.ai Execution → GitLab Update
                                                                    ↓
                                                          Activity Stream (GitLab Comments)

Agent-to-Agent Communication Flow

The system uses agent-mesh for A2A communication with full activity stream visibility:

  1. Task Assignment: @task-dispatcher @mr-reviewer: Assigned review task for !123
  2. Agent Action: @mr-reviewer: Completed review - Found 3 issues, 2 auto-fixable
  3. Agent Result: @pipeline-remediation: Fixed pipeline failure in job test:unit
  4. A2A Message: @vulnerability-scanner @mr-reviewer: Security scan complete - 0 vulnerabilities
  5. Status Update: @task-dispatcher: All agents completed - MR ready for merge

Agent Mapping

| Issue Type | Primary Agent | Supporting Agents | Vast.ai Required |
|---|---|---|---|
| Conflicts | Task Dispatcher → MR Reviewer | Pipeline Remediation | No |
| Pipeline Failures | Pipeline Remediation | Code Quality Reviewer | Maybe |
| Review Comments | MR Reviewer | Code Quality Reviewer, Drupal Standards Checker | No |
| Code Quality | Code Quality Reviewer | OSSA Validator | No |
| Security Issues | Vulnerability Scanner | MR Reviewer | Yes |
| Missing Modules | Module Generator | Recipe Publisher | No |
| Recipe Issues | Recipe Publisher | Module Generator | No |
| Release Coordination | Release Coordinator | Issue Lifecycle Manager | No |
| Documentation | Documentation Aggregator | - | No |
| Infrastructure | Cluster Operator | Kagent Catalog Sync | Maybe |
| Cost Monitoring | Cost Intelligence Monitor | - | No |
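
The mapping lends itself to a small lookup structure. The sketch below encodes three rows of the table as an example and derives the combined agent list for an MR with several issue types. The type names and the `agentsFor` and `requiresVastAi` helpers are hypothetical, not taken from mr-analysis.service.ts.

```typescript
type VastAi = "yes" | "no" | "maybe";

interface AgentPlan {
  primary: string[];    // in execution order
  supporting: string[];
  vastAi: VastAi;
}

// Three rows from the table, keyed by the "Issue Type" column.
const AGENT_MAP = new Map<string, AgentPlan>([
  ["Conflicts", { primary: ["Task Dispatcher", "MR Reviewer"], supporting: ["Pipeline Remediation"], vastAi: "no" }],
  ["Pipeline Failures", { primary: ["Pipeline Remediation"], supporting: ["Code Quality Reviewer"], vastAi: "maybe" }],
  ["Security Issues", { primary: ["Vulnerability Scanner"], supporting: ["MR Reviewer"], vastAi: "yes" }],
]);

// Collects primary then supporting agents for each issue type, without duplicates.
function agentsFor(issueTypes: string[]): string[] {
  const agents = new Set<string>();
  for (const type of issueTypes) {
    const plan = AGENT_MAP.get(type);
    if (!plan) continue; // unknown issue types are skipped
    for (const agent of [...plan.primary, ...plan.supporting]) agents.add(agent);
  }
  return [...agents];
}

// True if any detected issue type definitely needs vast.ai ("maybe" is left to the caller).
function requiresVastAi(issueTypes: string[]): boolean {
  return issueTypes.some((type) => AGENT_MAP.get(type)?.vastAi === "yes");
}
```

An MR with both conflicts and security issues would involve Task Dispatcher, MR Reviewer, Pipeline Remediation, and Vulnerability Scanner, and would be flagged as requiring vast.ai.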

Service Account Integration

  • Per-Agent Tokens: Each agent uses its own GitLab service account token
  • Service Account Mapping: Load from OSSA manifests (registry.yaml)
    • Example: pipeline-remediation uses service account ID 31840513
    • Example: merge-request-reviewer uses username merge-request-reviewer
  • GitLab Identity: Agents appear as real developers in GitLab
    • Username: from service_account.username in manifest
    • Email: from service_account.email in manifest
    • Display Name: from service_account.display_name in manifest
  • Token Management:
    • Load tokens from env vars (per agent: GITLAB_{AGENT_NAME}_TOKEN)
    • Fallback to vault: secret/agents/{agent-name}/gitlab_token
    • Auto-refresh and rotation per agent's rotation policy
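
The env-var-then-vault lookup can be sketched as follows. The rule for turning an agent name into an environment variable name (upper-case, hyphens replaced by underscores) is an assumption, since the document only gives the `GITLAB_{AGENT_NAME}_TOKEN` pattern. `locateToken` is a hypothetical helper that reports where the token should come from rather than calling a vault client.

```typescript
// Assumed naming rule: "pipeline-remediation" -> "GITLAB_PIPELINE_REMEDIATION_TOKEN".
function tokenEnvVar(agentName: string): string {
  return `GITLAB_${agentName.toUpperCase().replace(/-/g, "_")}_TOKEN`;
}

// Vault path pattern from the list above.
function vaultPath(agentName: string): string {
  return `secret/agents/${agentName}/gitlab_token`;
}

type TokenSource =
  | { kind: "env"; token: string }
  | { kind: "vault"; path: string };

// Prefers the per-agent environment variable and falls back to the vault path.
function locateToken(
  agentName: string,
  env: Record<string, string | undefined> = process.env,
): TokenSource {
  const token = env[tokenEnvVar(agentName)];
  return token
    ? { kind: "env", token }
    : { kind: "vault", path: vaultPath(agentName) };
}
```

Returning a descriptor instead of the secret itself keeps the vault client out of the lookup logic, so rotation and refresh can be handled by whichever component owns the vault connection.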

Activity Stream

  • GitLab Comments: Each agent action posts a GitLab comment using the agent's own service account token
  • MR Comments: For MR-related actions (fixes, reviews, rebases) - posted directly to MR
  • Issue Comments: For issue-related actions (analysis, implementation) - posted to related issue
  • A2A Communication: Agent-to-agent messages posted as threaded comments showing the communication flow
  • Comment Format:
    • Task Assignment: @task-dispatcher @mr-reviewer: Assigned MR review task for !123
    • Agent Action: @mr-reviewer: Completed review - Found 3 issues, 2 auto-fixable
    • Agent Result: @pipeline-remediation: Fixed pipeline failure in job test:unit
    • A2A Message: @vulnerability-scanner @mr-reviewer: Security scan complete - 0 vulnerabilities
    • Status Update: @task-dispatcher: All agents completed - MR ready for merge
  • Threading: Related comments are threaded (reply to parent comment) to show conversation flow
  • Agent Attribution: Each comment shows which agent posted it (via service account username)
  • Timestamps: All comments include timestamps for full audit trail
  • Aggregation: Activity stream service aggregates all agent actions per MR/issue
  • Visibility: Full audit trail in GitLab UI, searchable and filterable
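
The comment format above is regular enough to generate from a small record. The sketch below is one possible formatter. The `ActivityEvent` shape and the `formatActivity` name are assumptions and are not taken from gitlab-activity-stream.service.ts.

```typescript
interface ActivityEvent {
  from: string;     // agent posting the comment
  to?: string;      // target agent, present for assignments and A2A messages
  action: string;
  details?: string;
}

// Renders "@from @to: action - details", omitting the parts that are absent.
function formatActivity(event: ActivityEvent): string {
  const mentions = event.to ? `@${event.from} @${event.to}` : `@${event.from}`;
  const text = event.details ? `${event.action} - ${event.details}` : event.action;
  return `${mentions}: ${text}`;
}
```

For instance, `formatActivity({ from: "mr-reviewer", action: "Completed review", details: "Found 3 issues, 2 auto-fixable" })` produces the Agent Action line shown in the list.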

Implementation Files

| Path | Description |
|---|---|
| agent-buildkit/src/servers/gitlab-webhook-server.ts | GitLab webhook handler |
| agent-buildkit/src/infrastructure/webhook-manager.service.ts | Webhook lifecycle management |
| agent-buildkit/src/services/mr-analysis.service.ts | MR issue detection |
| agent-buildkit/src/services/mr-remediation-orchestrator.service.ts | Multi-agent orchestration |
| agent-buildkit/src/infrastructure/gitlab-service-account-manager.service.ts | Per-agent token management |
| agent-buildkit/src/services/gitlab/gitlab-activity-stream.service.ts | Activity stream via GitLab comments |
| agent-buildkit/src/integrations/gitlab-mr-fix.service.ts | MR operations with service accounts |
| agent-buildkit/src/infrastructure/vastai-agent-deployer.service.ts | Vast.ai deployment |
| agent-buildkit/src/services/cost-aware-agent-router.service.ts | Cost-aware routing |
| agent-buildkit/src/services/mr-fix-metrics.service.ts | Metrics and monitoring |

Cost Management

  • Use vastai-cost-enforcement.service.ts for admission control
  • Set budgets per trigger ID (e.g., mr-auto-fix-{projectId})
  • Prefer local execution for non-GPU tasks
  • Only use vast.ai for: vulnerability scanning, complex code analysis, GPU-accelerated operations
  • Monitor costs via cost-intelligence-monitor agent
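
As a rough illustration of per-trigger budgets and the local-first rule, the sketch below makes an admission decision for each task. It is not the logic of vastai-cost-enforcement.service.ts. The `Task` fields, the single shared budget value, and the `BudgetTracker` class are all assumptions.

```typescript
interface Task {
  triggerId: string;        // e.g. "mr-auto-fix-76265294"
  needsVastAi: boolean;     // true only for scanning, complex analysis, or GPU work
  estimatedCostUsd: number;
}

type Decision = "local" | "vastai" | "rejected";

class BudgetTracker {
  private spent = new Map<string, number>();
  private budgetUsd: number;

  constructor(budgetUsd: number) {
    this.budgetUsd = budgetUsd;
  }

  // Runs non-GPU work locally; admits vast.ai work only while the trigger's budget allows it.
  decide(task: Task): Decision {
    if (!task.needsVastAi) return "local";
    const used = this.spent.get(task.triggerId) ?? 0;
    if (used + task.estimatedCostUsd > this.budgetUsd) return "rejected";
    this.spent.set(task.triggerId, used + task.estimatedCostUsd);
    return "vastai";
  }
}
```

Tracking spend per trigger ID means one noisy MR can exhaust its own budget without blocking remediation for other projects.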

Security & Compliance

  • Webhook signature verification (required)
  • Service account token rotation (per-agent rotation policies)
  • Audit logging for all MR modifications (via activity stream)
  • Approval gates for production branches
  • Rate limiting per project/MR
  • Each agent action attributed to its service account (full traceability)

Files

| Path | Description |
|---|---|
| src/webhook-server.ts | Minimal webhook server |
| src/services/webhook/webhook.service.ts | CRUD + management service |
| src/cli/commands/webhook.command.ts | CLI commands |
| openapi/webhook-management.openapi.yaml | OpenAPI specification |

Related Issues

  • #137 - Configure project-migrator agent webhook integration
  • #147 - Multi-Machine K3s Cluster Setup

Security

  • HMAC SHA-256 signature verification
  • Token stored in ~/.tokens/gitlab-webhook-secret
  • SSL verification enabled
  • Rate limiting (100 req/min per project)
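
The two mechanisms at the core of this list, signature verification and per-project rate limiting, can be sketched as below. This is an illustration under stated assumptions, not the server's code. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body (the header that carries it is not specified here), and it uses a simple fixed-window counter for the 100 requests per minute limit.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex-encoded HMAC-SHA256 of the raw request body.
function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Constant-time comparison; lengths are checked first because timingSafeEqual throws on mismatch.
function verifySignature(body: string, secret: string, received: string): boolean {
  const expected = Buffer.from(sign(body, secret), "hex");
  const actual = Buffer.from(received, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Fixed-window limiter: at most `limit` requests per project per window.
class RateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(projectId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(projectId);
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(projectId, { start: now, count: 1 }); // start a new window
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.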

Group Webhook Endpoint Fix (2026-01-08)

Issue: GitLab group webhook failing with "Ensure the project has merge requests" error

Root Cause:

  • Webhook URL: https://mesh.bluefly.internal/webhooks/gitlab/group/blueflyio
  • Agent-mesh only had /api/webhooks/gitlab endpoint
  • GitLab test requests were failing because endpoint didn't exist

Fix Applied:

  • Added /webhooks/gitlab/group/:groupId route to common_npm/agent-mesh/src/server.ts
  • Endpoint now accepts test requests (empty payloads) and returns 200 OK
  • Made token verification optional for test requests
  • Routes real events properly via duo-gateway

Deployment:

  • Fix committed to feature/cli-tunnel-commands branch
  • Will auto-deploy when merged to release/v0.1.x or main
  • Webhook will work once agent-mesh is redeployed

Verification:

    # Test the endpoint
    curl -X POST https://mesh.bluefly.internal/webhooks/gitlab/group/blueflyio \
      -H "Content-Type: application/json" \
      -d '{}'
    # Should return: {"status":"ok","message":"Webhook endpoint is active"}
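
The behaviour described in the fix (match the group route, answer empty test payloads with 200, verify the token only for real events) might look roughly like this. The sketch is not the code in common_npm/agent-mesh/src/server.ts. `matchGroupRoute` and `handleGroupHook` are hypothetical names, and every response body other than the documented "Webhook endpoint is active" message is made up for illustration.

```typescript
interface GroupHookResponse {
  status: number;
  body: Record<string, string>;
}

// Extracts :groupId from /webhooks/gitlab/group/:groupId, or returns undefined if the path differs.
function matchGroupRoute(path: string): string | undefined {
  const groupId = /^\/webhooks\/gitlab\/group\/([^/]+)\/?$/.exec(path)?.[1];
  return groupId === undefined ? undefined : decodeURIComponent(groupId);
}

function handleGroupHook(
  path: string,
  payload: Record<string, unknown>,
  tokenValid: boolean,
): GroupHookResponse {
  const groupId = matchGroupRoute(path);
  if (groupId === undefined) {
    return { status: 404, body: { status: "error", message: "Not found" } };
  }
  // Test requests arrive with an empty payload and skip token verification.
  if (Object.keys(payload).length === 0) {
    return { status: 200, body: { status: "ok", message: "Webhook endpoint is active" } };
  }
  if (!tokenValid) {
    return { status: 401, body: { status: "error", message: "Invalid token" } };
  }
  return { status: 200, body: { status: "ok", message: `Event accepted for group ${groupId}` } };
}
```

Only the empty-payload branch bypasses the token check, so real events still require a valid secret.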

Troubleshooting

502 Bad Gateway Error

Symptoms:

  • GitLab webhook returns "Hook executed successfully but returned HTTP 502"
  • Cloudflare shows "Bad gateway" error page
  • Error mentions api.blueflyagents.com | 502: Bad gateway

Diagnosis Steps:

    # 1. Check if webhook server is running
    curl -s http://localhost:3847/health
    # Should return: {"status":"healthy","timestamp":"..."}

    # 2. Check what's listening on port 3847
    lsof -i :3847

    # 3. Check cloudflared tunnel status
    cloudflared tunnel list

    # 4. Check local cloudflared config
    cat ~/.cloudflared/config.yml
    # Should show: service: http://localhost:3847

Common Causes:

| Cause | Solution |
|---|---|
| Webhook server not running | Start it: npx tsx webhook-server.ts or buildkit webhook start |
| Cloudflare Dashboard has wrong service URL | Fix in dashboard: set to http://localhost:3847 |
| cloudflared not running | Start it: cloudflared tunnel run agent-webhook |
| Dashboard overrides local config | Check Cloudflare Zero Trust → Tunnels → Public Hostnames |

The Most Common Issue (December 2025):

The remotely managed configuration in the Cloudflare Dashboard had the wrong service URL:

  • Wrong: http://f6da7bdf-...cfargotunnel.com (tunnel-to-tunnel loop)
  • Correct: http://localhost:3847

Fix: Go to Cloudflare Dashboard → Tunnels → Edit → Public Hostnames → Change service to http://localhost:3847


Tunnel not connecting

    # Check tunnel status
    cloudflared tunnel info agent-webhook

    # Recreate if needed
    cloudflared tunnel delete agent-webhook
    cloudflared tunnel create agent-webhook
    cloudflared tunnel route dns agent-webhook api.blueflyagents.com

Webhook returning 401

  1. Check secret matches: cat ~/.tokens/gitlab-webhook-secret
  2. Verify GitLab webhook token matches
  3. Restart server with correct secret

Port already in use

    lsof -i :3847
    kill -9 <PID>

Preflight Checklist (Regression Prevention)

Run this checklist anytime webhooks break.

Cloudflare Tunnel Checks

  • Tunnel service URL = http://localhost:3847
  • No cfargotunnel.com targets
  • No Tailscale hostnames referenced
  • Tunnel is connected (cloudflared tunnel list)
  • cloudflared running as service
  • Origin service bound to 127.0.0.1:3847

DNS / Domain Checks

  • api.blueflyagents.com resolves to Cloudflare (not public IP)
  • No DDNS domains used for production ingress
  • No router port forwards exist

Tailscale Checks

  • Tailscale Serve is tailnet-only (no Funnel for webhooks)
  • GitLab is NOT referenced anywhere in Tailscale config
  • Tailscale and Cloudflare are separate planes

Mental Model Check

Ask: "If my public IP changed every 30 seconds, would this still work?"

  • If NO → something is misconfigured
  • If YES → architecture is correct