Webhook System Architecture
Status: Production Ready
Version: 1.1.0
Last Updated: 2025-12-26
OpenAPI Spec: openapi/webhook-management.openapi.yaml
Location: Moved from architecture/webhook-system.md to infrastructure/cloudflare/webhook-system.md
Overview
The BuildKit webhook system provides automated GitLab event handling through a Cloudflare Tunnel. It lets agents react to repository events (pushes, MRs, pipelines, issues) in real time.
Critical Architecture Principle
Cloudflare Tunnel = Public Ingress ONLY
Tailscale = Private Access ONLY
These planes must NEVER be mixed.
How Cloudflare Tunnel Works (Important)
The domain api.blueflyagents.com does NOT point to your computer or network.
It points to Cloudflare.
    GitLab SaaS
      ↓ HTTPS POST
    Cloudflare DNS (api.blueflyagents.com)
      ↓
    Cloudflare Edge (handles TLS, WAF, DDoS)
      ↓ (existing OUTBOUND tunnel from your Mac)
    cloudflared (running on Mac M4)
      ↓
    http://localhost:3847
Key Properties:
- cloudflared makes an OUTBOUND connection to Cloudflare
- No inbound ports opened on your network
- No router/NAT/port forwarding involved
- Domain never resolves to your public IP
- Works from home, hotel, coffee shop, LTE, Starlink
- If your IP changes every 30 seconds, it still works
Architecture Diagram
    PUBLIC INTERNET

    GITLAB SaaS
      Projects: openstandardagents, agent-buildkit, gitlab_components
        ↓ HTTPS POST to api.blueflyagents.com
    CLOUDFLARE EDGE
      - DNS Resolution
      - TLS Termination
      - WAF / DDoS Protection
      - Rate Limiting
        ↓ OUTBOUND TLS Tunnel (443), initiated BY your Mac
    cloudflared (Mac M4)
      Tunnel ID: f6da7bdf-d0f8-4796-a804-afb7984bbe11
      Config: ~/.cloudflared/config.yml
        ↓ http://localhost:3847
    WEBHOOK SERVER
      Port: 3847
      Endpoint: /api/webhooks/gitlab
      Source: src/webhook-server.ts
        ↓
    AGENT ROUTING
      Push → CI Validation Agent
      MR → Review Agent
      Pipeline → Notification Agent
      Issue → Triage Agent
What NOT To Do (Critical)
| Do NOT | Why |
|---|---|
| Use DDNS (tplinkdns.com) | Requires inbound ports, exposes network |
| Point Cloudflare at Tailscale hostnames | Creates chain, breaks separation |
| Use *.cfargotunnel.com as service URL | Creates tunnel-to-tunnel loop |
| Enable Tailscale Funnel for webhooks | Wrong tool for public ingress |
| Open inbound ports on router | Unnecessary attack surface |
| Use LAN IPs as service URL | Only works locally |
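The rules in the table above can be checked mechanically before a tunnel config is applied. The following is a minimal sketch, not part of BuildKit: `checkServiceUrl` is a hypothetical helper, and the `.ts.net` suffix is assumed to be how Tailscale hostnames appear in your setup.

```typescript
// Hypothetical helper: flags service URLs that break the "What NOT To Do" rules.
// RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16.
const LAN_IP = /^(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/;

function checkServiceUrl(serviceUrl: string): string[] {
  const host = new URL(serviceUrl).hostname.toLowerCase();
  const problems: string[] = [];
  if (host.endsWith(".cfargotunnel.com")) problems.push("tunnel-to-tunnel loop");
  if (host.endsWith(".ts.net")) problems.push("Tailscale hostname");
  if (host.endsWith(".tplinkdns.com")) problems.push("DDNS hostname");
  if (LAN_IP.test(host)) problems.push("LAN IP");
  return problems;
}
```

An empty result means the URL passed every check; `http://localhost:3847` is the only target this document treats as correct.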
Cloudflare vs Tailscale Separation
| Tool | Responsibility | Traffic Type |
|---|---|---|
| Cloudflare Tunnel | Public webhook ingress | GitLab → localhost |
| Tailscale | Private admin/travel access | You → home network |
Mental Model:
- Cloudflare stops at localhost
- Tailscale never accepts public traffic
- No overlap between these planes
Quick Start
    # One command to start everything
    buildkit webhook start
This command:
- Starts the Cloudflare tunnel (cloudflared tunnel run agent-webhook)
- Starts the webhook server on port 3847, the port the tunnel forwards to
- Runs in the foreground in daemon mode (Ctrl+C to stop)
CLI Commands
Start System
    buildkit webhook start              # Start in foreground (daemon mode)
    buildkit webhook start --port 8080  # Custom port
    buildkit webhook start --detach     # Background mode (WIP)
Stop System
buildkit webhook stop
Check Status
    buildkit webhook status         # Human-readable
    buildkit webhook status --json  # JSON output
List Webhooks
    buildkit webhook list              # All projects
    buildkit webhook list -p 76265294  # Specific project
    buildkit webhook list --json       # JSON output
Create Webhook
    buildkit webhook create -p 76265294
    buildkit webhook create -p 76265294 -u https://custom.example.com/webhook
    buildkit webhook create -p 76265294 -s "custom-secret-token"
Delete Webhook
    buildkit webhook delete -p 76265294 -h 12345
    buildkit webhook delete -p 76265294 -h 12345 -f  # Force (no confirm)
Test Webhook
    buildkit webhook test                   # Test default endpoint
    buildkit webhook test -e push           # Test push event
    buildkit webhook test -e merge_request  # Test MR event
Interactive Setup
buildkit webhook setup # Guided setup wizard
Configuration
Token Files
| File | Purpose |
|---|---|
| ~/.tokens/gitlab | GitLab API token |
| ~/.tokens/gitlab-webhook-secret | Webhook secret |
| ~/.cloudflared/config.yml | Tunnel configuration |
| ~/.cloudflared/*.json | Tunnel credentials |
Environment Variables
    GITLAB_WEBHOOK_SECRET=...  # Webhook secret token
    PORT=3847                  # Server port (must match the tunnel service URL)
    GITLAB_TOKEN=...           # GitLab API token (for CRUD)
Cloudflare Tunnel Config
Located at ~/.cloudflared/config.yml:
    tunnel: f6da7bdf-d0f8-4796-a804-afb7984bbe11
    credentials-file: ~/.cloudflared/f6da7bdf-d0f8-4796-a804-afb7984bbe11.json
    ingress:
      - hostname: api.blueflyagents.com
        service: http://localhost:3847
      - service: http_status:404
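Two properties of this config matter: cloudflared expects the final ingress rule to be a catch-all with no hostname, and every hostname rule here should forward to localhost. A minimal sketch of that check follows, assuming the YAML has already been parsed into plain objects; `IngressRule` and `validateIngress` are illustrative names, not BuildKit APIs.

```typescript
// Assumed shape of one parsed ingress entry from ~/.cloudflared/config.yml.
interface IngressRule {
  hostname?: string;
  service: string;
}

function validateIngress(rules: IngressRule[]): string[] {
  const errors: string[] = [];
  const last = rules[rules.length - 1];
  // The final rule must match everything, so it cannot carry a hostname.
  if (!last || last.hostname !== undefined) {
    errors.push("last ingress rule must be a catch-all with no hostname");
  }
  // Per this document's principle, Cloudflare stops at localhost.
  for (const rule of rules) {
    if (rule.hostname && !rule.service.startsWith("http://localhost:")) {
      errors.push(`${rule.hostname} must forward to localhost, got ${rule.service}`);
    }
  }
  return errors;
}
```

Running it against the config shown above should yield no errors.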
Group Webhooks (Dual Architecture)
Last Updated: 2025-12-21
The BlueFly group uses two webhooks for separation of concerns:
| ID | Name | Branch Filter | Issues | Purpose |
|---|---|---|---|---|
| 66989164 | Milestone Review | release/* | OFF | Release workflow, milestone automation |
| 67391807 | Agent Core Intake | ALL | ON | Full agent coverage (CI, MR, Issue, Pipeline) |
Webhook 1: Milestone Review (Scoped)
Purpose: Release gating, milestone review
Description: Handles release/* branch events for milestone automation, release gating, and tag management. Scoped to release workflow only.
| Setting | Value |
|---|---|
| URL | https://api.blueflyagents.com/api/webhooks/gitlab |
| Branch Filter | release/* |
| Push Events | ON |
| MR Events | ON |
| Pipeline Events | ON |
| Issues Events | OFF |
| Tags/Releases | ON |
| Milestones | ON |
Custom Template:
    {
      "source": "gitlab",
      "webhook": "milestone-review",
      "event": "{{object_kind}}",
      "project": "{{project.path_with_namespace}}",
      "ref": "{{ref}}",
      "sha": "{{checkout_sha}}",
      "user": "{{user_username}}",
      "action": "{{object_attributes.action}}",
      "state": "{{object_attributes.state}}",
      "iid": "{{object_attributes.iid}}",
      "title": "{{object_attributes.title}}",
      "url": "{{object_attributes.url}}",
      "milestone": "{{object_attributes.milestone.title}}",
      "routing": {
        "agent": "release-manager",
        "priority": "high"
      }
    }
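GitLab applies the release/* branch filter on its side, but a receiver may want to re-check the ref defensively. The sketch below is a local approximation only: `matchesBranchFilter` is a hypothetical helper, it treats `*` as "any characters", and it assumes push refs arrive as `refs/heads/<branch>`.

```typescript
// Hypothetical defensive check; GitLab's own filter semantics are authoritative.
function matchesBranchFilter(ref: string, pattern: string): boolean {
  const branch = ref.replace(/^refs\/heads\//, "");
  // Escape regex metacharacters except "*", then turn "*" into ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(branch);
}
```

For example, `matchesBranchFilter("refs/heads/release/v0.1.x", "release/*")` is true, while a ref on `main` is rejected.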
Webhook 2: Agent Core Intake (Broad)
Purpose: Feed CI, MR, Issue, and general automation agents
Description: Primary intake for all agent automation. Receives push, MR, pipeline, and issue events from all branches. Routes to CI validation, MR reviewer, pipeline monitor, and issue triage agents.
| Setting | Value |
|---|---|
| URL | https://api.blueflyagents.com/api/webhooks/gitlab |
| Branch Filter | NONE (all branches) |
| Push Events | ON |
| MR Events | ON |
| Pipeline Events | ON |
| Issues Events | ON |
| Tags/Releases | ON |
Custom Template:
    {
      "source": "gitlab",
      "webhook": "agent-core-intake",
      "event": "{{object_kind}}",
      "project": {
        "id": "{{project.id}}",
        "path": "{{project.path_with_namespace}}",
        "url": "{{project.web_url}}"
      },
      "ref": "{{ref}}",
      "sha": "{{checkout_sha}}",
      "user": {
        "id": "{{user_id}}",
        "username": "{{user_username}}"
      },
      "object": {
        "iid": "{{object_attributes.iid}}",
        "action": "{{object_attributes.action}}",
        "state": "{{object_attributes.state}}",
        "title": "{{object_attributes.title}}",
        "url": "{{object_attributes.url}}"
      },
      "pipeline": {
        "id": "{{object_attributes.pipeline_id}}",
        "status": "{{object_attributes.status}}"
      },
      "routing": {
        "push": "ci-validation-agent",
        "merge_request": "mr-reviewer-agent",
        "pipeline": "pipeline-monitor-agent",
        "issue": "issue-triage-agent",
        "tag_push": "release-manager-agent"
      }
    }
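Because the template carries its own routing table, a receiver can pick the target agent by looking up the event name in `routing`. This is a minimal sketch of that lookup, assuming the rendered payload has the shape shown above; `IntakePayload` and `resolveAgent` are illustrative names.

```typescript
// Only the two fields needed for routing; the real payload has more.
interface IntakePayload {
  event: string;
  routing: Record<string, string>;
}

function resolveAgent(payload: IntakePayload): string | undefined {
  // Own-property check so keys like "constructor" never resolve by accident.
  if (!Object.prototype.hasOwnProperty.call(payload.routing, payload.event)) {
    return undefined;
  }
  return payload.routing[payload.event];
}
```

An event kind with no routing entry returns `undefined`, which the caller can log and drop.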
Agent Coverage Matrix
| Agent | Webhook | Coverage |
|---|---|---|
| CI Validation | Agent Core Intake | All branches |
| MR Reviewer | Agent Core Intake | All MRs |
| Pipeline Monitor | Both | All pipelines |
| Issue Triage | Agent Core Intake | All issues |
| Release Manager | Milestone Review | Release branches only |
Why Two Webhooks?
- Clean separation: Release workflow isolated from noisy development events
- Lower blast radius: Release agent only sees release/* branches
- Easier auditing: Each webhook has clear responsibility
- Scalability: Add new webhooks for new concerns without touching existing ones
Legacy Project Webhooks (Reference Only)
| Project | ID | Webhook ID |
|---|---|---|
| openstandardagents | 76265294 | 67331295 |
| agent-buildkit | 76270744 | 67331296 |
| gitlab_components | 76267142 | 67331297 |
Note: Group-level webhooks cover all projects. Per-project webhooks are only needed for project-specific routing.
Supported Events
| Event Type | Handler | Agent |
|---|---|---|
| Push Hook | Log + route | CI Validation |
| Merge Request Hook | Log + route | MR Reviewer → Auto-Fix System |
| Pipeline Hook | Log + route | Pipeline Monitor → Auto-Fix System |
| Issue Hook | Log + route | Issue Triage |
| Note Hook | Log + route | Comment Handler |
| Tag Push Hook | Log + route | Release Manager |
| Deployment Hook | Log + route | Deploy Monitor |
| Release Hook | Log + route | Release Notes |
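GitLab identifies the event type in the `X-Gitlab-Event` request header, using the names in the first column. A minimal sketch of a lookup built from the table above follows; `EVENT_AGENTS` and `agentForEvent` are hypothetical names, and the real server's handler wiring is not shown in this document.

```typescript
// Event names are the X-Gitlab-Event header values; agents come from the table above.
const EVENT_AGENTS = new Map<string, string>([
  ["Push Hook", "CI Validation"],
  ["Merge Request Hook", "MR Reviewer"],
  ["Pipeline Hook", "Pipeline Monitor"],
  ["Issue Hook", "Issue Triage"],
  ["Note Hook", "Comment Handler"],
  ["Tag Push Hook", "Release Manager"],
  ["Deployment Hook", "Deploy Monitor"],
  ["Release Hook", "Release Notes"],
]);

function agentForEvent(gitlabEvent: string): string | undefined {
  return EVENT_AGENTS.get(gitlabEvent);
}
```

Unsupported event types return `undefined`, so the server can log them without routing.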
GitLab Webhook Auto-Fix System
Status: Production Ready
Version: 1.0.0
Last Updated: 2026-01-07
Overview
The GitLab Webhook Auto-Fix System automatically detects and fixes broken merge requests using 16 canonical agents, orchestrated via task-dispatcher, with vast.ai cloud execution for GPU-intensive operations.
Architecture
The system consists of:
- GitLab Webhook Receiver
  - Agent-Mesh Endpoint: /webhooks/gitlab/group/:groupId (handles group-level webhooks)
  - Legacy Endpoint: /api/webhooks/gitlab (project-level webhooks)
  - Location: common_npm/agent-mesh/src/server.ts
  - Validates GitLab webhook signatures (optional for test requests)
  - Accepts test requests (empty payloads) and returns 200 OK
  - Routes MR events to MR Analysis Service via duo-gateway
- MR Analysis Service (agent-buildkit/src/services/mr-analysis.service.ts)
  - Detects MR issues: conflicts, failing pipelines, review comments, code quality issues
  - Classifies issue types and routes to appropriate agents
  - Returns structured analysis with recommended agents
- MR Remediation Orchestrator (agent-buildkit/src/services/mr-remediation-orchestrator.service.ts)
  - Orchestrates multi-agent MR fix workflow
  - Coordinates parallel and sequential agent execution
  - Uses agent-mesh (common_npm/agent-mesh) for A2A communication
  - Each agent communicates via A2A protocol with structured messages
- Webhook Manager (agent-buildkit/src/infrastructure/webhook-manager.service.ts)
  - Manages GitLab webhook lifecycle
  - Auto-creates/updates webhooks via GitLab API
  - Health checks and automatic re-registration
- Service Account Integration (agent-buildkit/src/services/gitlab/service-account-token.service.ts)
  - Manages GitLab service account tokens per agent
  - Each agent uses its own service account (from OSSA manifests)
  - Token rotation and refresh
  - Agents appear as real developers in GitLab (username, email, display_name)
- Activity Stream Service (agent-buildkit/src/services/gitlab/gitlab-activity-stream.service.ts)
  - Posts GitLab issue/MR comments for each agent action
  - Creates activity stream showing agent-to-agent communication
  - Tracks agent actions: task assignment, status updates, results
  - Format: @agent-name @target-agent: [action] - [details]
Agent Workflow
    GitLab Webhook → MR Analysis → Task Dispatcher → Agent Chain → Vast.ai Execution → GitLab Update
                                                         ↓
                                          Activity Stream (GitLab Comments)
Agent-to-Agent Communication Flow
The system uses agent-mesh for A2A communication with full activity stream visibility:
- Task Assignment: @task-dispatcher @mr-reviewer: Assigned review task for !123
- Agent Action: @mr-reviewer: Completed review - Found 3 issues, 2 auto-fixable
- Agent Result: @pipeline-remediation: Fixed pipeline failure in job `test:unit`
- A2A Message: @vulnerability-scanner @mr-reviewer: Security scan complete - 0 vulnerabilities
- Status Update: @task-dispatcher: All agents completed - MR ready for merge
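The message shapes above all follow one pattern: the sending agent, an optional target agent, then the message text. A minimal sketch of a formatter that produces them is shown here; `formatActivity` is a hypothetical helper, not the activity stream service's actual API.

```typescript
// Builds "@agent @target: message" or "@agent: message" when there is no target.
function formatActivity(agent: string, message: string, target?: string): string {
  const mentions = target ? `@${agent} @${target}` : `@${agent}`;
  return `${mentions}: ${message}`;
}
```

Calling `formatActivity("task-dispatcher", "Assigned review task for !123", "mr-reviewer")` reproduces the Task Assignment example above.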
Agent Mapping
| Issue Type | Primary Agent | Supporting Agents | Vast.ai Required |
|---|---|---|---|
| Conflicts | Task Dispatcher → MR Reviewer | Pipeline Remediation | No |
| Pipeline Failures | Pipeline Remediation | Code Quality Reviewer | Maybe |
| Review Comments | MR Reviewer | Code Quality Reviewer, Drupal Standards Checker | No |
| Code Quality | Code Quality Reviewer | OSSA Validator | No |
| Security Issues | Vulnerability Scanner | MR Reviewer | Yes |
| Missing Modules | Module Generator | Recipe Publisher | No |
| Recipe Issues | Recipe Publisher | Module Generator | No |
| Release Coordination | Release Coordinator | Issue Lifecycle Manager | No |
| Documentation | Documentation Aggregator | - | No |
| Infrastructure | Cluster Operator | Kagent Catalog Sync | Maybe |
| Cost Monitoring | Cost Intelligence Monitor | - | No |
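One way to use this mapping is as a lookup that turns detected issue types into a de-duplicated agent list. The sketch below encodes only three rows of the table for brevity; the kebab-case keys, `AGENT_PLANS`, and `agentsFor` are illustrative assumptions rather than the orchestrator's real data model.

```typescript
type VastAi = "yes" | "no" | "maybe";

interface AgentPlan {
  primary: string;
  supporting: string[];
  vastAi: VastAi;
}

// Three rows copied from the table above; keys are hypothetical identifiers.
const AGENT_PLANS = new Map<string, AgentPlan>([
  ["pipeline-failures", { primary: "Pipeline Remediation", supporting: ["Code Quality Reviewer"], vastAi: "maybe" }],
  ["review-comments", { primary: "MR Reviewer", supporting: ["Code Quality Reviewer", "Drupal Standards Checker"], vastAi: "no" }],
  ["security-issues", { primary: "Vulnerability Scanner", supporting: ["MR Reviewer"], vastAi: "yes" }],
]);

// Returns each agent once, in the order it is first needed.
function agentsFor(issueTypes: string[]): string[] {
  const agents = new Set<string>();
  for (const issueType of issueTypes) {
    const plan = AGENT_PLANS.get(issueType);
    if (!plan) continue;
    agents.add(plan.primary);
    plan.supporting.forEach((name) => agents.add(name));
  }
  return Array.from(agents);
}
```

An MR with both a failing pipeline and review comments would therefore involve Code Quality Reviewer only once.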
Service Account Integration
- Per-Agent Tokens: Each agent uses its own GitLab service account token
- Service Account Mapping: Load from OSSA manifests (registry.yaml)
  - Example: pipeline-remediation uses service account ID 31840513
  - Example: merge-request-reviewer uses username merge-request-reviewer
- GitLab Identity: Agents appear as real developers in GitLab
  - Username: from service_account.username in manifest
  - Email: from service_account.email in manifest
  - Display Name: from service_account.display_name in manifest
- Token Management:
  - Load tokens from env vars (per agent: GITLAB_{AGENT_NAME}_TOKEN)
  - Fallback to vault: secret/agents/{agent-name}/gitlab_token
  - Auto-refresh and rotation per agent's rotation policy
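The env-then-vault lookup order can be sketched as follows. The exact transformation from agent name to `{AGENT_NAME}` is not specified in this document; the sketch assumes upper-casing with non-alphanumeric characters replaced by underscores, and all function names are hypothetical.

```typescript
// Assumption: "pipeline-remediation" maps to GITLAB_PIPELINE_REMEDIATION_TOKEN.
function tokenEnvVar(agentName: string): string {
  return `GITLAB_${agentName.toUpperCase().replace(/[^A-Z0-9]+/g, "_")}_TOKEN`;
}

function vaultPath(agentName: string): string {
  return `secret/agents/${agentName}/gitlab_token`;
}

type TokenSource =
  | { kind: "env"; token: string }
  | { kind: "vault"; path: string };

// Prefers the env var; otherwise reports which vault path to read.
function resolveToken(
  agentName: string,
  env: Record<string, string | undefined>,
): TokenSource {
  const token = env[tokenEnvVar(agentName)];
  return token
    ? { kind: "env", token }
    : { kind: "vault", path: vaultPath(agentName) };
}
```

In a Node process the second argument would typically be `process.env`; the vault read itself is left to whatever client the deployment uses.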
Activity Stream
- GitLab Comments: Each agent action posts a GitLab comment using the agent's own service account token
- MR Comments: For MR-related actions (fixes, reviews, rebases) - posted directly to MR
- Issue Comments: For issue-related actions (analysis, implementation) - posted to related issue
- A2A Communication: Agent-to-agent messages posted as threaded comments showing the communication flow
- Comment Format:
  - Task Assignment: @task-dispatcher @mr-reviewer: Assigned MR review task for !123
  - Agent Action: @mr-reviewer: Completed review - Found 3 issues, 2 auto-fixable
  - Agent Result: @pipeline-remediation: Fixed pipeline failure in job `test:unit`
  - A2A Message: @vulnerability-scanner @mr-reviewer: Security scan complete - 0 vulnerabilities
  - Status Update: @task-dispatcher: All agents completed - MR ready for merge
- Threading: Related comments are threaded (reply to parent comment) to show conversation flow
- Agent Attribution: Each comment shows which agent posted it (via service account username)
- Timestamps: All comments include timestamps for full audit trail
- Aggregation: Activity stream service aggregates all agent actions per MR/issue
- Visibility: Full audit trail in GitLab UI, searchable and filterable
Implementation Files
| Path | Description |
|---|---|
| agent-buildkit/src/servers/gitlab-webhook-server.ts | GitLab webhook handler |
| agent-buildkit/src/infrastructure/webhook-manager.service.ts | Webhook lifecycle management |
| agent-buildkit/src/services/mr-analysis.service.ts | MR issue detection |
| agent-buildkit/src/services/mr-remediation-orchestrator.service.ts | Multi-agent orchestration |
| agent-buildkit/src/infrastructure/gitlab-service-account-manager.service.ts | Per-agent token management |
| agent-buildkit/src/services/gitlab/gitlab-activity-stream.service.ts | Activity stream via GitLab comments |
| agent-buildkit/src/integrations/gitlab-mr-fix.service.ts | MR operations with service accounts |
| agent-buildkit/src/infrastructure/vastai-agent-deployer.service.ts | Vast.ai deployment |
| agent-buildkit/src/services/cost-aware-agent-router.service.ts | Cost-aware routing |
| agent-buildkit/src/services/mr-fix-metrics.service.ts | Metrics and monitoring |
Cost Management
- Use vastai-cost-enforcement.service.ts for admission control
- Set budgets per trigger ID (e.g., mr-auto-fix-{projectId})
- Prefer local execution for non-GPU tasks
- Only use vast.ai for: vulnerability scanning, complex code analysis, GPU-accelerated operations
- Monitor costs via cost-intelligence-monitor agent
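Per-trigger budgets amount to admission control: a job is admitted only if its estimated cost still fits the trigger's remaining budget. The following is a minimal in-memory sketch of that idea, not the behaviour of vastai-cost-enforcement.service.ts; `BudgetGate`, the USD unit, and the single shared limit are assumptions.

```typescript
// Hypothetical in-memory gate; a real service would persist spend and reset it per period.
class BudgetGate {
  private readonly limitUsd: number;
  private readonly spent = new Map<string, number>();

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Admits the job and records its cost only if the trigger stays within budget.
  admit(triggerId: string, estimatedUsd: number): boolean {
    const used = this.spent.get(triggerId) ?? 0;
    if (used + estimatedUsd > this.limitUsd) return false;
    this.spent.set(triggerId, used + estimatedUsd);
    return true;
  }
}
```

With a limit of 5, a trigger such as `mr-auto-fix-76265294` can run one 3-unit job but not a second, while other triggers are unaffected.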
Security & Compliance
- Webhook signature verification (required)
- Service account token rotation (per-agent rotation policies)
- Audit logging for all MR modifications (via activity stream)
- Approval gates for production branches
- Rate limiting per project/MR
- Each agent action attributed to its service account (full traceability)
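Rate limiting per project or MR can be implemented in several ways; this document does not say which one the system uses. As one possibility, here is a minimal fixed-window limiter keyed by an arbitrary string such as a project ID. `FixedWindowLimiter` is a hypothetical class, and the limit and window are parameters (for example 100 requests per 60,000 ms, the figure this document gives for the webhook server).

```typescript
// Fixed-window counter: each key gets `limit` requests per `windowMs` window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is injectable so the limiter can be tested without real clocks.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A fixed window is simple but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.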
Files
| Path | Description |
|---|---|
| src/webhook-server.ts | Minimal webhook server |
| src/services/webhook/webhook.service.ts | CRUD + management service |
| src/cli/commands/webhook.command.ts | CLI commands |
| openapi/webhook-management.openapi.yaml | OpenAPI specification |
Related Issues
Security
- HMAC SHA-256 signature verification
- Token stored in ~/.tokens/gitlab-webhook-secret
- SSL verification enabled
- Rate limiting (100 req/min per project)
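HMAC SHA-256 verification means recomputing the digest of the raw request body with the shared secret and comparing it to the received signature in constant time. The sketch below shows that technique with Node's crypto module. How the signature reaches the server (header name, hex versus base64) is not stated in this document, so hex encoding and the function names are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the hex HMAC-SHA256 of the raw body under the shared secret.
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Constant-time comparison; the length check guards timingSafeEqual, which throws on mismatch.
function verifySignature(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The body must be verified exactly as received, before any JSON parsing or re-serialization, or the digests will not match.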
Group Webhook Endpoint Fix (2026-01-08)
Issue: GitLab group webhook failing with "Ensure the project has merge requests" error
Root Cause:
- Webhook URL: https://mesh.bluefly.internal/webhooks/gitlab/group/blueflyio
- Agent-mesh only had the /api/webhooks/gitlab endpoint
- GitLab test requests were failing because the group endpoint did not exist
Fix Applied:
- Added /webhooks/gitlab/group/:groupId route to common_npm/agent-mesh/src/server.ts
- Endpoint now accepts test requests (empty payloads) and returns 200 OK
- Made token verification optional for test requests
- Routes real events properly via duo-gateway
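The behaviour of the fixed endpoint can be summarised as a small decision function: empty payloads are treated as test requests and answered with 200 without a token check, while real events still require a valid token. This is a sketch under assumptions, not the code in server.ts; `handleGroupWebhook`, the 401 body, and the "Event accepted" message are hypothetical, and forwarding via duo-gateway is omitted.

```typescript
interface WebhookResponse {
  status: number;
  body: { status: string; message: string };
}

function handleGroupWebhook(
  payload: Record<string, unknown>,
  tokenValid: boolean,
): WebhookResponse {
  // Test requests carry an empty payload and skip token verification.
  if (Object.keys(payload).length === 0) {
    return { status: 200, body: { status: "ok", message: "Webhook endpoint is active" } };
  }
  if (!tokenValid) {
    return { status: 401, body: { status: "error", message: "Invalid webhook token" } };
  }
  // A real handler would forward the event to duo-gateway here.
  return { status: 200, body: { status: "ok", message: "Event accepted" } };
}
```

Keeping the decision separate from the HTTP framework makes the test-request path easy to unit test.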
Deployment:
- Fix committed to feature/cli-tunnel-commands branch
- Will auto-deploy when merged to release/v0.1.x or main
- Webhook will work once agent-mesh is redeployed
Verification:
    # Test the endpoint
    curl -X POST https://mesh.bluefly.internal/webhooks/gitlab/group/blueflyio \
      -H "Content-Type: application/json" \
      -d '{}'
    # Should return: {"status":"ok","message":"Webhook endpoint is active"}
Troubleshooting
502 Bad Gateway Error
Symptoms:
- GitLab webhook returns "Hook executed successfully but returned HTTP 502"
- Cloudflare shows "Bad gateway" error page
- Error mentions api.blueflyagents.com | 502: Bad gateway
Diagnosis Steps:
    # 1. Check if webhook server is running
    curl -s http://localhost:3847/health
    # Should return: {"status":"healthy","timestamp":"..."}

    # 2. Check what's listening on port 3847
    lsof -i :3847

    # 3. Check cloudflared tunnel status
    cloudflared tunnel list

    # 4. Check local cloudflared config
    cat ~/.cloudflared/config.yml
    # Should show: service: http://localhost:3847
Common Causes:
| Cause | Solution |
|---|---|
| Webhook server not running | Start it: npx tsx webhook-server.ts or buildkit webhook start |
| Cloudflare Dashboard has wrong service URL | Fix in dashboard: set to http://localhost:3847 |
| cloudflared not running | Start it: cloudflared tunnel run agent-webhook |
| Dashboard overrides local config | Check Cloudflare Zero Trust → Tunnels → Public Hostnames |
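The causes in the table above can be turned into a small triage function that maps observed state to the matching fixes. This is an illustrative sketch only: `TunnelState` and `diagnose502` are hypothetical, and gathering the state (health check, process check, dashboard lookup) is left to the caller.

```typescript
// Observed state collected by the diagnosis steps; field names are assumptions.
interface TunnelState {
  serverHealthy: boolean;
  cloudflaredRunning: boolean;
  dashboardServiceUrl: string;
}

const EXPECTED_SERVICE_URL = "http://localhost:3847";

function diagnose502(state: TunnelState): string[] {
  const fixes: string[] = [];
  if (!state.serverHealthy) {
    fixes.push("Start the webhook server: buildkit webhook start");
  }
  if (!state.cloudflaredRunning) {
    fixes.push("Start the tunnel: cloudflared tunnel run agent-webhook");
  }
  if (state.dashboardServiceUrl !== EXPECTED_SERVICE_URL) {
    fixes.push(`Set the dashboard service URL to ${EXPECTED_SERVICE_URL}`);
  }
  return fixes;
}
```

An empty list means none of the common causes apply and the problem lies elsewhere.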
The Most Common Issue (December 2025):
The Cloudflare Dashboard was remotely configured with the wrong service URL:
- Wrong: http://f6da7bdf-...cfargotunnel.com (tunnel-to-tunnel loop)
- Correct: http://localhost:3847
Fix: Go to Cloudflare Dashboard → Tunnels → Edit → Public Hostnames → Change service to http://localhost:3847
Tunnel not connecting
    # Check tunnel status
    cloudflared tunnel info agent-webhook

    # Recreate if needed
    cloudflared tunnel delete agent-webhook
    cloudflared tunnel create agent-webhook
    cloudflared tunnel route dns agent-webhook api.blueflyagents.com
Webhook returning 401
- Check secret matches: cat ~/.tokens/gitlab-webhook-secret
- Verify GitLab webhook token matches
- Restart server with correct secret
Port already in use
    lsof -i :3847
    kill -9 <PID>
Preflight Checklist (Regression Prevention)
Run this checklist anytime webhooks break.
Cloudflare Tunnel Checks
- Tunnel service URL = http://localhost:3847
- No cfargotunnel.com targets
- No Tailscale hostnames referenced
- Tunnel is connected (cloudflared tunnel list)
- cloudflared running as service
- Origin service bound to 127.0.0.1:3847
DNS / Domain Checks
- api.blueflyagents.com resolves to Cloudflare (not public IP)
- No DDNS domains used for production ingress
- No router port forwards exist
Tailscale Checks
- Tailscale Serve is tailnet-only (no Funnel for webhooks)
- GitLab is NOT referenced anywhere in Tailscale config
- Tailscale and Cloudflare are separate planes
Mental Model Check
Ask: "If my public IP changed every 30 seconds, would this still work?"
- If NO → something is misconfigured
- If YES → architecture is correct