CI/CD Analytics in GitLab

Overview

GitLab CI/CD Analytics provides comprehensive insights into pipeline performance, success rates, and efficiency metrics. These analytics help teams identify bottlenecks, optimize build times, and improve overall DevOps performance.

What is CI/CD Analytics?

CI/CD Analytics enables you to:

  • Track pipeline performance: Monitor execution times and success rates
  • Identify bottlenecks: Find slow jobs and stages
  • Optimize efficiency: Reduce pipeline duration and costs
  • Monitor trends: Track improvements over time
  • Correlate with deployments: Link CI/CD performance to releases

Accessing CI/CD Analytics

Project-Level Analytics

Navigate to: Analyze > CI/CD analytics

Group-Level Analytics

Navigate to: Group > Analyze > CI/CD analytics

Key Metrics

1. Pipeline Statistics

Track overall pipeline health:

Pipeline Overview (Last 30 days)

 Total Pipelines: 1,234                  
 Success Rate: 87.5%                     
 Failure Rate: 12.5%                     
 Median Duration: 8m 45s                 
 P95 Duration: 15m 30s                   

Metrics Included:

  • Total pipeline runs: Count of all pipelines
  • Success rate: Percentage of successful pipelines
  • Failure rate: Percentage of failed pipelines
  • Duration statistics: Median and 95th percentile
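
As a rough illustration of how these four metrics relate, the sketch below computes them from a list of pipeline records. The record shape (a dict with `status` and `duration` in seconds) and the function name are assumptions made for this example, not GitLab's data model, and the P95 uses the simple nearest-rank method, which may differ slightly from what the analytics page reports.

```python
import math
from statistics import median


def summarize_pipelines(pipelines):
    """Summarize pipeline records (hypothetical shape: status + duration in seconds)."""
    total = len(pipelines)
    if total == 0:
        raise ValueError("no pipelines to summarize")
    succeeded = sum(1 for p in pipelines if p["status"] == "success")
    failed = sum(1 for p in pipelines if p["status"] == "failed")
    durations = sorted(p["duration"] for p in pipelines)
    # Nearest-rank P95: smallest value with at least 95% of runs at or below it.
    p95_index = math.ceil(0.95 * total) - 1
    return {
        "total": total,
        "success_rate_pct": succeeded * 100 / total,
        "failure_rate_pct": failed * 100 / total,
        "median_duration_sec": median(durations),
        "p95_duration_sec": durations[p95_index],
    }


# Seven successful runs (100s to 700s) and one failed run (800s).
runs = [{"status": "success", "duration": d} for d in range(100, 800, 100)]
runs.append({"status": "failed", "duration": 800})
summary = summarize_pipelines(runs)
print(summary)
```

Success and failure rates only add up to 100% when every pipeline ended in one of those two states; canceled or skipped pipelines would lower both.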

2. Pipeline Duration Trends

Visualize performance over time:

[Chart: Pipeline Duration Over Time. Y-axis: duration, 2m to 20m; x-axis: Week 1 to Week 4.]

Median (P50): 8m 45s
P95: 15m 30s
Trend: +15% increase 
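
A trend figure like the one above is typically a percentage change between two periods. The following is a minimal sketch under that assumption; the previous-period and current-period medians (400s and 460s) are hypothetical values chosen for illustration, not data from the chart.

```python
def percent_change(previous, current):
    """Return the change from previous to current as a percentage of previous."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) * 100 / previous


# Hypothetical median durations in seconds for two consecutive periods.
change = percent_change(400, 460)
print(f"Trend: {change:+.0f}%")  # prints "Trend: +15%"
```

For durations a positive value means pipelines are getting slower, so the sign should be read in the opposite direction from a success-rate trend.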

3. Success/Failure Rates

Track reliability:

[Chart: Pipeline Success Rate. Y-axis: 60% to 100%; x-axis: Jan to May.]

Success: 87.5%
Failures: 12.5%
  - Build errors: 5.5%
  - Test failures: 4.0%
  - Deployment errors: 3.0%
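
The breakdown above expresses each failure category as a share of all pipelines, so the categories sum to the overall failure rate. A small sketch of that arithmetic follows; the pipeline count and per-category failure counts are hypothetical, picked so the result matches the percentages shown.

```python
def failure_breakdown(total_pipelines, failures_by_category):
    """Return each category's failures as a percentage of all pipelines."""
    if total_pipelines <= 0:
        raise ValueError("total_pipelines must be positive")
    return {
        category: count * 100 / total_pipelines
        for category, count in failures_by_category.items()
    }


# Hypothetical counts: 200 pipelines, 25 of which failed.
shares = failure_breakdown(200, {"build": 11, "test": 8, "deployment": 6})
overall_failure_rate = sum(shares.values())
print(shares, overall_failure_rate)
```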

Enhanced Analytics with ClickHouse

GitLab 18.0+ uses ClickHouse for improved analytics performance.

Benefits

  • Faster queries: Sub-second response times
  • Larger datasets: Analyze years of history
  • Complex aggregations: Multi-dimensional analysis
  • Real-time updates: Near-instantaneous data

Example Queries

    -- Pipeline duration by branch
    SELECT
        ref AS branch,
        count() AS pipeline_count,
        avg(duration) AS avg_duration_sec,
        quantile(0.95)(duration) AS p95_duration_sec
    FROM ci_pipelines
    WHERE created_at >= now() - INTERVAL 30 DAY
    GROUP BY branch
    ORDER BY pipeline_count DESC
    LIMIT 10;

    -- Failure rate by stage
    SELECT
        stage,
        countIf(status = 'failed') AS failures,
        count() AS total,
        round(failures / total * 100, 2) AS failure_rate_pct
    FROM ci_jobs
    WHERE created_at >= now() - INTERVAL 7 DAY
    GROUP BY stage
    ORDER BY failure_rate_pct DESC;

    -- Most time-consuming jobs
    SELECT
        name,
        count() AS executions,
        avg(duration) AS avg_duration_sec,
        sum(duration) AS total_duration_sec
    FROM ci_jobs
    WHERE created_at >= now() - INTERVAL 30 DAY
    GROUP BY name
    ORDER BY total_duration_sec DESC
    LIMIT 20;

Filtering and Segmentation

Filter by Pipeline Trigger

Analyze pipelines by trigger source:

  • Push: Commits to branches
  • Merge request: MR pipelines
  • Schedule: Scheduled pipelines
  • Manual: Manually triggered
  • API: API-triggered pipelines
  • Web: GitLab UI triggers

    -- Performance by trigger source
    SELECT
        source,
        count() AS pipeline_count,
        avg(duration) AS avg_duration_sec,
        countIf(status = 'success') / count() AS success_rate
    FROM ci_pipelines
    WHERE created_at >= now() - INTERVAL 30 DAY
    GROUP BY source
    ORDER BY pipeline_count DESC;

Filter by Branch

Compare pipeline performance across branches:

    -- Main vs. feature branch performance
    SELECT
        CASE
            WHEN ref IN ('main', 'master', 'development') THEN 'Protected'
            ELSE 'Feature'
        END AS branch_type,
        count() AS pipeline_count,
        avg(duration) AS avg_duration_sec,
        countIf(status = 'failed') / count() AS failure_rate
    FROM ci_pipelines
    WHERE created_at >= now() - INTERVAL 30 DAY
    GROUP BY branch_type;

Filter by Date Range

Analyze specific time periods:

  • Last 7 days
  • Last 30 days
  • Last 90 days
  • Custom date range

Job-Level Analytics

Job Performance

Identify slow jobs:

Top 10 Slowest Jobs (Last 30 days)

 Job Name              Avg Duration  Runs     

 build:production      12m 45s       234      
 test:integration      8m 30s        456      
 security:sast         7m 15s        234      
 test:e2e              6m 50s        456      
 deploy:production     5m 30s        89       
 build:docker          4m 45s        234      
 test:unit             3m 20s        456      
 lint:code             2m 10s        456      
 security:dependency   1m 45s        234      
 deploy:staging        1m 30s        178      
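
Average duration alone can hide where time actually goes, because a moderately slow job that runs often may cost more in total than the slowest job. The sketch below multiplies average duration by run count for four rows taken from the table above; the helper names are assumptions made for this example.

```python
import re


def parse_duration(text):
    """Convert a string such as '12m 45s' or '30s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def rank_by_total_time(jobs):
    """Return (name, total_seconds) pairs, largest total first."""
    totals = [(name, parse_duration(avg) * runs) for name, avg, runs in jobs]
    return sorted(totals, key=lambda item: item[1], reverse=True)


# Rows copied from the table above: (job name, average duration, runs).
jobs = [
    ("build:production", "12m 45s", 234),
    ("test:integration", "8m 30s", 456),
    ("test:e2e", "6m 50s", 456),
    ("deploy:staging", "1m 30s", 178),
]
ranked = rank_by_total_time(jobs)
for name, total_seconds in ranked:
    print(f"{name}: {total_seconds / 3600:.1f} hours")
```

On these rows `test:integration` ranks first by total time even though `build:production` has the longest average, which suggests looking at both orderings when choosing what to optimize.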

Job Failure Analysis

Find unreliable jobs:

    -- Jobs with highest failure rates
    SELECT
        name,
        count() AS total_runs,
        countIf(status = 'failed') AS failures,
        round(failures / total_runs * 100, 2) AS failure_rate_pct,
        -- Three most frequent failure reasons among failed runs only
        topKIf(3)(failure_reason, status = 'failed') AS common_reasons
    FROM ci_jobs
    WHERE created_at >= now() - INTERVAL 30 DAY
      AND status IN ('failed', 'success')
    GROUP BY name
    HAVING failure_rate_pct > 5
    ORDER BY failure_rate_pct DESC
    LIMIT 10;

Stage-Level Analytics

Stage Performance

Analyze pipeline stages:

Stage Performance (Last 30 days)

 Stage          Avg Duration  Success Rate    

 build          5m 30s        95.2%           
 test           8m 45s        87.5%           
 security       4m 20s        92.1%           
 deploy         3m 15s        98.5%           


Bottleneck: test stage (longest duration)
Unreliable: test stage (lowest success rate)
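
The two conclusions above come from taking the maximum average duration and the minimum success rate across stages. A minimal sketch using the values from the table (durations converted to seconds; the dictionary layout and function names are assumptions for illustration):

```python
# Values from the stage table above, with durations in seconds.
stages = {
    "build": {"avg_duration_sec": 330, "success_rate_pct": 95.2},
    "test": {"avg_duration_sec": 525, "success_rate_pct": 87.5},
    "security": {"avg_duration_sec": 260, "success_rate_pct": 92.1},
    "deploy": {"avg_duration_sec": 195, "success_rate_pct": 98.5},
}


def find_bottleneck(stages):
    """Return the stage with the longest average duration."""
    return max(stages, key=lambda name: stages[name]["avg_duration_sec"])


def find_least_reliable(stages):
    """Return the stage with the lowest success rate."""
    return min(stages, key=lambda name: stages[name]["success_rate_pct"])


print("Bottleneck:", find_bottleneck(stages))
print("Unreliable:", find_least_reliable(stages))
```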

Stage Optimization

Optimize slow stages:

    # Before: Sequential tests (slow)
    test:
      stage: test
      script:
        - npm run test:unit          # 2 minutes
        - npm run test:integration   # 5 minutes
        - npm run test:e2e           # 8 minutes
    # Total: 15 minutes

    # After: Parallel tests (fast)
    test:unit:
      stage: test
      script:
        - npm run test:unit          # 2 minutes

    test:integration:
      stage: test
      script:
        - npm run test:integration   # 5 minutes

    test:e2e:
      stage: test
      script:
        - npm run test:e2e           # 8 minutes
    # Total: 8 minutes (parallel execution)
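
The totals in the comments follow from a simple rule: commands in one job run back to back, so their durations add up, while jobs in the same stage can run side by side, so the stage takes as long as its slowest job. The sketch below shows that arithmetic; it assumes enough free runners to start all three jobs at once and ignores queue time and per-job startup overhead.

```python
def sequential_duration(job_minutes):
    """Total time when every step runs one after another."""
    return sum(job_minutes.values())


def parallel_duration(job_minutes):
    """Total time when all jobs start together: the slowest job decides."""
    return max(job_minutes.values())


# Durations in minutes, taken from the comments in the configuration above.
tests = {"test:unit": 2, "test:integration": 5, "test:e2e": 8}
print(sequential_duration(tests), "minutes sequential")
print(parallel_duration(tests), "minutes parallel")
```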

Pipeline Efficiency Metrics

1. Cycle Time

Time from commit to deployment:

Cycle Time Breakdown

 Commit to Pipeline Start: 30s
 Build Stage: 5m 30s                     
 Test Stage: 8m 45s                      
 Security Stage: 4m 20s                  
 Deploy Stage: 3m 15s                    
 Total Cycle Time: 22m 20s               


Target: < 20 minutes
Current: 22m 20s (above target) 
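
The total is the sum of the individual segments, and the gap to the target follows directly. A short sketch of that calculation using the figures above (the variable and function names are assumptions for this example):

```python
def format_duration(total_seconds):
    """Render seconds in the 'Xm Ys' style used in this document."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


# Segment durations in seconds, from the breakdown above.
breakdown = {
    "commit to pipeline start": 30,
    "build": 5 * 60 + 30,
    "test": 8 * 60 + 45,
    "security": 4 * 60 + 20,
    "deploy": 3 * 60 + 15,
}
target_seconds = 20 * 60

cycle_time = sum(breakdown.values())
over_target = cycle_time - target_seconds
print("Total cycle time:", format_duration(cycle_time))
print("Over target by:", format_duration(over_target))
```

With these numbers the pipeline misses the 20-minute target by 2m 20s, which is less than the security stage alone takes.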

2. Queue Time

Time waiting for runner:

    -- Jobs with longest queue times
    SELECT
        name,
        avg(queued_duration) AS avg_queue_sec,
        max(queued_duration) AS max_queue_sec,
        count() AS job_count
    FROM ci_jobs
    WHERE created_at >= now() - INTERVAL 7 DAY
    GROUP BY name
    ORDER BY avg_queue_sec DESC
    LIMIT 10;

3. Runner Utilization

Track runner efficiency:

    -- Runner utilization
    SELECT
        runner_id,
        count() AS jobs_executed,
        sum(duration) AS total_busy_time_sec,
        avg(duration) AS avg_job_duration_sec
    FROM ci_jobs
    WHERE created_at >= now() - INTERVAL 7 DAY
      AND runner_id IS NOT NULL
    GROUP BY runner_id
    ORDER BY jobs_executed DESC;
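
The query returns busy time per runner; turning that into a utilization percentage means dividing by the length of the reporting window. The sketch below does this for a 7-day window. It assumes each runner executes one job at a time (with concurrent jobs the result can exceed 100%), and the runner IDs and busy times are hypothetical.

```python
WEEK_SEC = 7 * 24 * 3600


def utilization_pct(total_busy_time_sec, window_sec=WEEK_SEC):
    """Share of the window a runner spent executing jobs, in percent."""
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    return total_busy_time_sec * 100 / window_sec


# Hypothetical query results: runner_id -> total_busy_time_sec.
busy_by_runner = {101: 302_400, 102: 60_480}
utilization = {rid: utilization_pct(sec) for rid, sec in busy_by_runner.items()}
print(utilization)
```

A runner near 100% is a likely source of queue time, while one near 0% may be a candidate for removal.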

Pipeline Optimization Strategies

1. Parallelize Jobs

Run independent jobs concurrently:

    # Use 'needs' for fine-grained parallelism
    build:app:
      stage: build
      script: npm run build

    test:unit:
      stage: test
      needs: [build:app]
      script: npm run test:unit

    test:integration:
      stage: test
      needs: [build:app]
      script: npm run test:integration
    # test:unit and test:integration run in parallel

2. Cache Dependencies

Reduce installation time:

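    .node_cache:
      # Anchor the cache mapping itself so that '<<: *node_cache'
      # merges key, paths, and policy into each job's cache
      cache: &node_cache
        key:
          files:
            - package-lock.json
        paths:
          - node_modules/
        policy: pull

    install:
      stage: .pre
      script:
        - npm ci
      cache:
        <<: *node_cache
        policy: push    # Only this job uploads the cache

    build:
      stage: build
      script:
        - npm run build
      cache:
        <<: *node_cache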

3. Optimize Docker Builds

Use layer caching:

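    build:docker:
      stage: build
      script:
        # Pull the previous image so its layers are available to --cache-from;
        # '|| true' keeps the first build from failing when no image exists yet
        - docker pull $CI_REGISTRY_IMAGE:latest || true
        - >
          docker build
          --cache-from $CI_REGISTRY_IMAGE:latest
          --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
          --tag $CI_REGISTRY_IMAGE:latest
          .
        - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        - docker push $CI_REGISTRY_IMAGE:latest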

4. Skip Unnecessary Jobs

Use rules to conditionally run jobs:

    test:e2e:
      stage: test
      script:
        - npm run test:e2e
      rules:
        # Only run E2E tests for MRs and main branch
        - if: $CI_PIPELINE_SOURCE == "merge_request_event"
        - if: $CI_COMMIT_BRANCH == "main"
        # Skip for feature branches
        - when: never

5. Use Faster Runners

Optimize runner configuration:

  • Use SSD storage
  • Increase CPU cores
  • Add more RAM
  • Use spot/preemptible instances for cost savings

Integration with External Tools

GitLab CI Pipelines Exporter

Export metrics to Prometheus:

    # Install gitlab-ci-pipelines-exporter
    # https://github.com/mvisonneau/gitlab-ci-pipelines-exporter

    # Config file
    gitlab:
      url: https://gitlab.com
      token: ${GITLAB_TOKEN}

    projects:
      - name: my-org/my-project
        refs:
          - main
          - development

    metrics:
      - name: pipeline_duration_seconds
        kind: gauge
        labels:
          - project
          - ref
          - status

Grafana Dashboards

Visualize CI/CD metrics:

    {
      "dashboard": {
        "title": "CI/CD Performance",
        "panels": [
          {
            "title": "Pipeline Success Rate",
            "targets": [
              {
                "expr": "sum(rate(gitlab_ci_pipeline_status{status=\"success\"}[5m])) / sum(rate(gitlab_ci_pipeline_status[5m]))"
              }
            ]
          },
          {
            "title": "Pipeline Duration (P95)",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, rate(gitlab_ci_pipeline_duration_seconds_bucket[5m]))"
              }
            ]
          }
        ]
      }
    }

CI/CD Best Practices

1. Monitor Key Metrics

Track essential KPIs:

  • Build frequency: Pipeline runs per day
  • Build duration: Time to complete pipeline
  • Build success rate: Percentage of passing pipelines
  • Queue time: Time waiting for runners
  • Recovery time: Time to fix broken builds

2. Set Performance Targets

Establish acceptable thresholds:

    performance_targets:
      pipeline_duration:
        target: 15m
        acceptable: 20m
        critical: 30m
      success_rate:
        target: 95%
        acceptable: 90%
        critical: 85%
      queue_time:
        target: 30s
        acceptable: 2m
        critical: 5m
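
One way to apply thresholds like these is to map a measured value onto a status band. The sketch below is an interpretation, not a GitLab feature: it assumes values up to `target` are on target, values up to `acceptable` are acceptable, values short of `critical` are a warning, and anything at or beyond `critical` is critical. For metrics where higher is better, such as success rate, the comparison is mirrored.

```python
def classify(value, target, acceptable, critical, higher_is_better=False):
    """Map a measurement onto a status band (band semantics are an assumption)."""
    if higher_is_better:
        # Negating every number lets the same comparisons handle both directions.
        value, target, acceptable, critical = -value, -target, -acceptable, -critical
    if value <= target:
        return "on target"
    if value <= acceptable:
        return "acceptable"
    if value < critical:
        return "warning"
    return "critical"


# Pipeline duration in seconds against the 15m / 20m / 30m thresholds.
duration_status = classify(22 * 60 + 20, 15 * 60, 20 * 60, 30 * 60)
# Success rate in percent against the 95% / 90% / 85% thresholds.
success_status = classify(87.5, 95, 90, 85, higher_is_better=True)
print(duration_status, success_status)
```

Both sample values come from figures used earlier in this document: a 22m 20s pipeline and an 87.5% success rate each land in the warning band.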

3. Regular Optimization

Schedule pipeline reviews:

  • Weekly: Review failed pipelines
  • Monthly: Analyze duration trends
  • Quarterly: Major optimization initiatives

4. Cost Monitoring

Track CI/CD costs:

    -- Daily compute cost across all pipelines
    SELECT
        DATE(created_at) AS date,
        count() AS pipeline_count,
        sum(duration) / 3600 AS compute_hours,
        compute_hours * 0.10 AS cost_usd  -- $0.10/hour
    FROM ci_pipelines
    WHERE created_at >= now() - INTERVAL 30 DAY
    GROUP BY date
    ORDER BY date;
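
The same calculation can be sketched outside the database: group durations by calendar day, convert seconds to hours, and multiply by an hourly rate. The $0.10/hour rate comes from the query above; the sample records and the function name are hypothetical. Note that pipeline duration is wall-clock time, so this is only a rough proxy when jobs run in parallel on several runners.

```python
from collections import defaultdict
from datetime import datetime


def daily_cost(pipelines, rate_per_hour=0.10):
    """Return {date: cost_usd} for (created_at, duration_sec) records."""
    seconds_by_day = defaultdict(int)
    for created_at, duration_sec in pipelines:
        seconds_by_day[created_at.date()] += duration_sec
    return {
        day: round(seconds / 3600 * rate_per_hour, 2)
        for day, seconds in sorted(seconds_by_day.items())
    }


# Hypothetical records: two 30-minute pipelines on one day, one 2-hour pipeline the next.
records = [
    (datetime(2024, 1, 1, 9, 0), 1800),
    (datetime(2024, 1, 1, 15, 0), 1800),
    (datetime(2024, 1, 2, 10, 0), 7200),
]
costs = daily_cost(records)
for day, cost in costs.items():
    print(day, f"${cost:.2f}")
```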

Alerting on CI/CD Metrics

Alert Configuration

Set up alerts for CI/CD issues:

    # Prometheus alert rules
    groups:
      - name: cicd_alerts
        rules:
          # High failure rate
          - alert: HighPipelineFailureRate
            expr: |
              sum(rate(gitlab_ci_pipeline_status{status="failed"}[1h]))
              /
              sum(rate(gitlab_ci_pipeline_status[1h])) > 0.2
            for: 30m
            labels:
              severity: warning
            annotations:
              summary: "High pipeline failure rate"
              description: "Pipeline failure rate is {{ $value | humanizePercentage }}"

          # Slow pipelines
          - alert: SlowPipelines
            expr: |
              histogram_quantile(0.95,
                rate(gitlab_ci_pipeline_duration_seconds_bucket[1h])
              ) > 1800
            for: 1h
            labels:
              severity: warning
            annotations:
              summary: "Pipelines running slowly"
              description: "P95 pipeline duration is {{ $value }}s"

          # Runner queue backup
          - alert: RunnerQueueBackup
            expr: gitlab_ci_runner_jobs_queued > 10
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "CI runner queue backed up"
              description: "{{ $value }} jobs waiting in queue"

Troubleshooting Slow Pipelines

Common Issues

1. Slow Dependency Installation

Problem: npm install takes 3+ minutes

Solution:

    # Cache node_modules
    cache:
      key:
        files:
          - package-lock.json
      paths:
        - node_modules/

    # Or use npm ci with cache
    script:
      - npm ci --cache .npm --prefer-offline

2. Unnecessary Test Runs

Problem: All tests run for small changes

Solution:

    # Use changed files detection
    test:unit:
      script:
        - npm run test:changed
      rules:
        - changes:
            - src/**/*.js
            - test/**/*.js

3. Sequential Job Execution

Problem: Jobs run serially when they could be parallel

Solution:

    # Use 'needs' for parallel execution
    test:unit:
      needs: [build]

    test:integration:
      needs: [build]
    # Both run in parallel after build

4. Large Docker Images

Problem: Docker pulls take 2+ minutes

Solution:

    # Multi-stage build
    FROM node:18-alpine AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci --only=production

    FROM node:18-alpine
    # Set the working directory so relative paths resolve under /app
    WORKDIR /app
    COPY --from=builder /app/node_modules ./node_modules
    COPY . .
    # Final image only contains production dependencies
