
CI/CD Analytics in GitLab

Overview

GitLab CI/CD Analytics reports pipeline performance, success rates, and efficiency metrics. Teams use these figures to identify bottlenecks, shorten build times, and track how delivery performance changes over time.

What is CI/CD Analytics?

CI/CD Analytics enables you to:

Track pipeline performance: Monitor execution times and success rates
Identify bottlenecks: Find slow jobs and stages
Optimize efficiency: Reduce pipeline duration and costs
Monitor trends: Track improvements over time
Correlate with deployments: Link CI/CD performance to releases

Accessing CI/CD Analytics

Project-Level Analytics

Navigate to: Analyze > CI/CD analytics

Group-Level Analytics

Navigate to: Group > Analyze > CI/CD analytics

Key Metrics

1. Pipeline Statistics

Track overall pipeline health:

Pipeline Overview (Last 30 days)

 Total Pipelines: 1,234                  
 Success Rate: 87.5%                     
 Failure Rate: 12.5%                     
 Median Duration: 8m 45s                 
 P95 Duration: 15m 30s

Metrics Included:

Total pipeline runs: Count of all pipelines
Success rate: Percentage of successful pipelines
Failure rate: Percentage of failed pipelines
Duration statistics: Median and 95th percentile
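
As a rough illustration of how these figures are derived, the sketch below computes them from a list of pipeline records. The record shape (`status`, `duration_sec`) and the function names are assumptions made for this example, not GitLab's data model, and the percentile uses the nearest-rank method, which may differ from what the GitLab UI uses.

```python
import math
from statistics import median


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of a non-empty list (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_pipelines(pipelines):
    """Summarize a non-empty list of dicts with 'status' and 'duration_sec'."""
    total = len(pipelines)
    succeeded = sum(1 for p in pipelines if p["status"] == "success")
    failed = sum(1 for p in pipelines if p["status"] == "failed")
    durations = [p["duration_sec"] for p in pipelines]
    return {
        "total": total,
        "success_rate_pct": round(succeeded / total * 100, 1),
        "failure_rate_pct": round(failed / total * 100, 1),
        "median_duration_sec": median(durations),
        "p95_duration_sec": nearest_rank_percentile(durations, 95),
    }


# Hypothetical sample: seven successful pipelines and one failed pipeline
sample = [
    {"status": "success", "duration_sec": d}
    for d in (300, 420, 480, 510, 540, 600, 660)
]
sample.append({"status": "failed", "duration_sec": 930})
print(summarize_pipelines(sample))
```

With only eight records the P95 equals the slowest run; on real data the percentile is less sensitive to a single outlier.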

2. Pipeline Duration Trends

Visualize performance over time:

[Chart: Pipeline Duration Over Time. Y-axis: duration, 2m to 20m; x-axis: Week 1 to Week 4]

Median (P50): 8m 45s
P95: 15m 30s
Trend: +15% increase
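
The trend figure is a relative change between two periods. A minimal sketch, assuming the trend compares the median duration of the last period with that of the first (the exact comparison GitLab uses is not stated here, and the weekly values are made up for illustration):

```python
def trend_pct(first_period_sec, last_period_sec):
    """Relative change from the first period to the last, in percent."""
    return (last_period_sec - first_period_sec) / first_period_sec * 100


# Hypothetical weekly median durations in seconds (Week 1 to Week 4)
weekly_median_sec = [460, 480, 505, 529]
change = trend_pct(weekly_median_sec[0], weekly_median_sec[-1])
print(f"Trend: {change:+.0f}%")
```

A positive value means pipelines are getting slower; a negative value means they are getting faster.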

3. Success/Failure Rates

Track reliability:

[Chart: Pipeline Success Rate. Y-axis: 60% to 100%; x-axis: Jan to May]

Success: 87.5%
Failures: 12.5%
  - Build errors: 5.5%
  - Test failures: 4.0%
  - Deployment errors: 3.0%

Enhanced Analytics with ClickHouse

GitLab 18.0+ uses ClickHouse for improved analytics performance.

Benefits

Faster queries: Sub-second response times
Larger datasets: Analyze years of history
Complex aggregations: Multi-dimensional analysis
Real-time updates: Near-instantaneous data

Example Queries

-- Pipeline duration by branch
SELECT
  ref as branch,
  count() as pipeline_count,
  avg(duration) as avg_duration_sec,
  quantile(0.95)(duration) as p95_duration_sec
FROM ci_pipelines
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY branch
ORDER BY pipeline_count DESC
LIMIT 10;

-- Failure rate by stage
SELECT
  stage,
  countIf(status = 'failed') as failures,
  count() as total,
  round(failures / total * 100, 2) as failure_rate_pct
FROM ci_jobs
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY stage
ORDER BY failure_rate_pct DESC;

-- Most time-consuming jobs
SELECT
  name,
  count() as executions,
  avg(duration) as avg_duration_sec,
  sum(duration) as total_duration_sec
FROM ci_jobs
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY name
ORDER BY total_duration_sec DESC
LIMIT 20;

Filtering and Segmentation

Filter by Pipeline Trigger

Analyze pipelines by trigger source:

Push: Commits to branches
Merge request: MR pipelines
Schedule: Scheduled pipelines
Manual: Manually triggered
API: API-triggered pipelines
Web: GitLab UI triggers

-- Performance by trigger source
SELECT
  source,
  count() as pipeline_count,
  avg(duration) as avg_duration_sec,
  countIf(status = 'success') / count() as success_rate
FROM ci_pipelines
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY source
ORDER BY pipeline_count DESC;

Filter by Branch

Compare pipeline performance across branches:

-- Main vs. feature branch performance
SELECT
  CASE
    WHEN ref IN ('main', 'master', 'development') THEN 'Protected'
    ELSE 'Feature'
  END as branch_type,
  count() as pipeline_count,
  avg(duration) as avg_duration_sec,
  countIf(status = 'failed') / count() as failure_rate
FROM ci_pipelines
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY branch_type;

Filter by Date Range

Analyze specific time periods:

Last 7 days
Last 30 days
Last 90 days
Custom date range

Job-Level Analytics

Job Performance

Identify slow jobs:

Top 10 Slowest Jobs (Last 30 days)

 Job Name              Avg Duration  Runs     

 build:production      12m 45s       234      
 test:integration      8m 30s        456      
 security:sast         7m 15s        234      
 test:e2e              6m 50s        456      
 deploy:production     5m 30s        89       
 build:docker          4m 45s        234      
 test:unit             3m 20s        456      
 lint:code             2m 10s        456      
 security:dependency   1m 45s        234      
 deploy:staging        1m 30s        178

Job Failure Analysis

Find unreliable jobs:

-- Jobs with highest failure rates
SELECT
  name,
  count() as total_runs,
  countIf(status = 'failed') as failures,
  round(failures / total_runs * 100, 2) as failure_rate_pct,
  topKIf(3)(failure_reason, status = 'failed') as common_reasons  -- most frequent reasons among failed runs only
FROM ci_jobs
WHERE created_at >= now() - INTERVAL 30 DAY
  AND status IN ('failed', 'success')
GROUP BY name
HAVING failure_rate_pct > 5
ORDER BY failure_rate_pct DESC
LIMIT 10;

Stage-Level Analytics

Stage Performance

Analyze pipeline stages:

Stage Performance (Last 30 days)

 Stage          Avg Duration  Success Rate    

 build          5m 30s        95.2%           
 test           8m 45s        87.5%           
 security       4m 20s        92.1%           
 deploy         3m 15s        98.5%           


Bottleneck: test stage (longest duration)
Unreliable: test stage (lowest success rate)
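
The two conclusions above come from a simple max/min over the table. A small sketch using the table's values, with durations converted to seconds; the dictionary layout and function name are assumptions for illustration:

```python
def find_problem_stages(stages):
    """Return (slowest stage, least reliable stage) from per-stage stats."""
    bottleneck = max(stages, key=lambda name: stages[name]["avg_duration_sec"])
    unreliable = min(stages, key=lambda name: stages[name]["success_rate_pct"])
    return bottleneck, unreliable


# Values from the stage table above
stages = {
    "build": {"avg_duration_sec": 330, "success_rate_pct": 95.2},
    "test": {"avg_duration_sec": 525, "success_rate_pct": 87.5},
    "security": {"avg_duration_sec": 260, "success_rate_pct": 92.1},
    "deploy": {"avg_duration_sec": 195, "success_rate_pct": 98.5},
}
bottleneck, unreliable = find_problem_stages(stages)
print(f"Bottleneck: {bottleneck}, least reliable: {unreliable}")
```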

Stage Optimization

Optimize slow stages:

# Before: Sequential tests (slow)
test:
  stage: test
  script:
    - npm run test:unit      # 2 minutes
    - npm run test:integration  # 5 minutes
    - npm run test:e2e       # 8 minutes
  # Total: 15 minutes

# After: Parallel tests (fast)
test:unit:
  stage: test
  script:
    - npm run test:unit
  # 2 minutes

test:integration:
  stage: test
  script:
    - npm run test:integration
  # 5 minutes

test:e2e:
  stage: test
  script:
    - npm run test:e2e
  # 8 minutes

# Total: 8 minutes (parallel execution)
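
The totals in the comments above follow from a sum versus a maximum. A short sketch of that arithmetic, assuming enough runners are available to start all three jobs at once and ignoring per-job startup overhead:

```python
def sequential_minutes(job_minutes):
    """Steps run one after another: durations add up."""
    return sum(job_minutes.values())


def parallel_minutes(job_minutes):
    """Jobs run concurrently: the stage ends when the slowest job ends."""
    return max(job_minutes.values())


# Durations taken from the comments in the configuration above
job_minutes = {"test:unit": 2, "test:integration": 5, "test:e2e": 8}
print(sequential_minutes(job_minutes), parallel_minutes(job_minutes))
```

The parallel total cannot drop below the slowest job, so further gains come from splitting or speeding up `test:e2e`.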

Pipeline Efficiency Metrics

1. Cycle Time

Time from commit to deployment:

Cycle Time Breakdown

 Commit → Pipeline Start: 30s
 Build Stage: 5m 30s                     
 Test Stage: 8m 45s                      
 Security Stage: 4m 20s                  
 Deploy Stage: 3m 15s                    
 Total Cycle Time: 22m 20s               


Target: < 20 minutes
Current: 22m 20s (above target)
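
To check the breakdown against the target, the durations can be summed and compared. A minimal sketch; the parsing helper and the dictionary keys are assumptions for this example:

```python
import re


def to_seconds(text):
    """Convert a duration such as '5m 30s' or '30s' to seconds."""
    minutes = re.search(r"(\d+)m", text)
    seconds = re.search(r"(\d+)s", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


# Values from the cycle time breakdown above
breakdown = {
    "commit_to_pipeline_start": "30s",
    "build": "5m 30s",
    "test": "8m 45s",
    "security": "4m 20s",
    "deploy": "3m 15s",
}
total_sec = sum(to_seconds(value) for value in breakdown.values())
target_sec = 20 * 60
print(f"Total: {total_sec // 60}m {total_sec % 60}s")
print(f"Over target by: {max(0, total_sec - target_sec)}s")
```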

2. Queue Time

Time waiting for runner:

-- Jobs with longest queue times
SELECT
  name,
  avg(queued_duration) as avg_queue_sec,
  max(queued_duration) as max_queue_sec,
  count() as job_count
FROM ci_jobs
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY name
ORDER BY avg_queue_sec DESC
LIMIT 10;

3. Runner Utilization

Track runner efficiency:

-- Runner utilization
SELECT
  runner_id,
  count() as jobs_executed,
  sum(duration) as total_busy_time_sec,
  avg(duration) as avg_job_duration_sec
FROM ci_jobs
WHERE created_at >= now() - INTERVAL 7 DAY
  AND runner_id IS NOT NULL
GROUP BY runner_id
ORDER BY jobs_executed DESC;
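
The query returns busy time per runner but not a utilization percentage. One way to derive it is to divide busy time by the time the runner was available. This sketch assumes the runner was online for the whole window; the `concurrent` parameter stands in for the number of jobs the runner may execute at once and is an assumption of this example:

```python
def utilization_pct(total_busy_time_sec, window_days=7, concurrent=1):
    """Share of available runner time spent executing jobs, in percent."""
    available_sec = window_days * 86400 * concurrent
    return total_busy_time_sec / available_sec * 100


# Hypothetical runner: 42 hours of job execution over the 7-day window
busy_sec = 42 * 3600
print(f"Utilization: {utilization_pct(busy_sec):.1f}%")
```

Low utilization combined with long queue times usually points at scheduling or tagging issues rather than a lack of capacity.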

Pipeline Optimization Strategies

1. Parallelize Jobs

Run independent jobs concurrently:

# 'needs' lets a job start as soon as the jobs it depends on finish,
# without waiting for the rest of the previous stage
build:app:
  stage: build
  script: npm run build

test:unit:
  stage: test
  needs: [build:app]
  script: npm run test:unit

test:integration:
  stage: test
  needs: [build:app]
  script: npm run test:integration

# test:unit and test:integration run in parallel
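
With `needs`, the minimum pipeline duration is the longest chain of dependent jobs (the critical path). The sketch below estimates it for the job graph above. The durations are hypothetical, and the model assumes an acyclic graph and enough runners to start every job as soon as its dependencies finish:

```python
from functools import lru_cache


def critical_path_minutes(durations, needs):
    """Longest chain of dependent job durations in an acyclic job graph."""

    @lru_cache(maxsize=None)
    def finish_time(job):
        # A job starts when the last of its dependencies has finished
        start = max((finish_time(dep) for dep in needs.get(job, ())), default=0)
        return start + durations[job]

    return max(finish_time(job) for job in durations)


# Hypothetical durations in minutes for the jobs defined above
durations = {"build:app": 5, "test:unit": 2, "test:integration": 5}
needs = {"test:unit": ["build:app"], "test:integration": ["build:app"]}
print(critical_path_minutes(durations, needs))
```

Shortening a job that is not on the critical path (here `test:unit`) does not reduce the overall pipeline duration.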

2. Cache Dependencies

Reduce installation time:

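# The anchor is placed on the cache mapping itself, so that
# "<<: *node_cache" merges key, paths, and policy directly into each job's cache
.node_cache:
  cache: &node_cache
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull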

install:
  stage: .pre
  script:
    - npm ci
  cache:
    <<: *node_cache
    policy: push

build:
  stage: build
  script:
    - npm run build
  cache:
    <<: *node_cache

3. Optimize Docker Builds

Use layer caching:

build:docker:
  stage: build
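  script:
    # Pull the previous image so its layers are available to --cache-from
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build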
      --cache-from $CI_REGISTRY_IMAGE:latest
      --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      --tag $CI_REGISTRY_IMAGE:latest
      .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

4. Skip Unnecessary Jobs

Use rules to conditionally run jobs:

test:e2e:
  stage: test
  script:
    - npm run test:e2e
  rules:
    # Only run E2E tests for MRs and main branch
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
    # Skip for feature branches
    - when: never

5. Use Faster Runners

Optimize runner configuration:

Use SSD storage
Increase CPU cores
Add more RAM
Use spot/preemptible instances for cost savings

Integration with External Tools

GitLab CI Pipelines Exporter

Export metrics to Prometheus:

# Install gitlab-ci-pipelines-exporter
# https://github.com/mvisonneau/gitlab-ci-pipelines-exporter

# Config file (illustrative; verify key names against the exporter's documentation)
gitlab:
  url: https://gitlab.com
  token: ${GITLAB_TOKEN}

projects:
  - name: my-org/my-project
    refs:
      - main
      - development

metrics:
  - name: pipeline_duration_seconds
    kind: gauge
    labels:
      - project
      - ref
      - status

Grafana Dashboards

Visualize CI/CD metrics:

{
  "dashboard": {
    "title": "CI/CD Performance",
    "panels": [
      {
        "title": "Pipeline Success Rate",
        "targets": [
          {
            "expr": "sum(rate(gitlab_ci_pipeline_status{status=\"success\"}[5m])) / sum(rate(gitlab_ci_pipeline_status[5m]))"
          }
        ]
      },
      {
        "title": "Pipeline Duration (P95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(gitlab_ci_pipeline_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}

CI/CD Best Practices

1. Monitor Key Metrics

Track essential KPIs:

Build frequency: Pipelines run per day
Build duration: Time to complete pipeline
Build success rate: Percentage of passing pipelines
Queue time: Time waiting for runners
Recovery time: Time to fix broken builds
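
Recovery time is the least obvious of these KPIs to compute. One possible definition, used in this sketch, is the time from the first failed pipeline on a branch to the next successful one. The input format (a time-ordered list of `(finished_at, status)` tuples) is an assumption for illustration:

```python
from datetime import datetime, timedelta


def recovery_times(runs):
    """Return the time from each first failure to the next success.

    runs: list of (finished_at, status) tuples sorted by finished_at.
    """
    recoveries = []
    broken_since = None
    for finished_at, status in runs:
        if status == "failed" and broken_since is None:
            broken_since = finished_at  # the build just broke
        elif status == "success" and broken_since is not None:
            recoveries.append(finished_at - broken_since)
            broken_since = None  # the build is healthy again
    return recoveries


# Hypothetical pipeline history for one branch
day = datetime(2024, 1, 15)
runs = [
    (day.replace(hour=9), "success"),
    (day.replace(hour=10), "failed"),
    (day.replace(hour=10, minute=30), "failed"),
    (day.replace(hour=11, minute=15), "success"),
    (day.replace(hour=12), "failed"),
    (day.replace(hour=12, minute=20), "success"),
]
for recovery in recovery_times(runs):
    print(recovery)
```

Repeated failures while the build is already broken do not reset the clock, so the first outage here counts as 1 hour 15 minutes.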

2. Set Performance Targets

Establish acceptable thresholds:

performance_targets:
  pipeline_duration:
    target: 15m
    acceptable: 20m
    critical: 30m
  success_rate:
    target: 95%
    acceptable: 90%
    critical: 85%
  queue_time:
    target: 30s
    acceptable: 2m
    critical: 5m
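
A measured value can then be classified against these thresholds. The sketch below shows one possible interpretation; the level names and the boundary handling (for example, treating a value exactly at the critical threshold as critical) are assumptions, not part of any GitLab feature:

```python
def classify(value, target, acceptable, critical, higher_is_better=False):
    """Classify a measurement against target/acceptable/critical thresholds."""
    if higher_is_better:
        # Negate everything so the same "lower is better" comparisons apply
        value, target, acceptable, critical = -value, -target, -acceptable, -critical
    if value <= target:
        return "on target"
    if value <= acceptable:
        return "acceptable"
    if value < critical:
        return "warning"
    return "critical"


# Pipeline duration in seconds: target 15m, acceptable 20m, critical 30m
print(classify(22 * 60 + 20, 15 * 60, 20 * 60, 30 * 60))
# Success rate in percent: target 95, acceptable 90, critical 85
print(classify(87.5, 95, 90, 85, higher_is_better=True))
```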

3. Regular Optimization

Schedule pipeline reviews:

Weekly: Review failed pipelines
Monthly: Analyze duration trends
Quarterly: Major optimization initiatives

4. Cost Monitoring

Track CI/CD costs:

-- Estimated compute cost per day
SELECT
  DATE(created_at) as date,
  count() as pipeline_count,
  sum(duration) / 3600 as compute_hours,
  compute_hours * 0.10 as cost_usd  -- assumes $0.10/hour; pipeline wall-clock time only approximates runner compute time
FROM ci_pipelines
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;

Alerting on CI/CD Metrics

Alert Configuration

Set up alerts for CI/CD issues:

# Prometheus alert rules
groups:
  - name: cicd_alerts
    rules:
      # High failure rate
      - alert: HighPipelineFailureRate
        expr: |
          sum(rate(gitlab_ci_pipeline_status{status="failed"}[1h]))
          /
          sum(rate(gitlab_ci_pipeline_status[1h]))
          > 0.2
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "High pipeline failure rate"
          description: "Pipeline failure rate is {{ $value | humanizePercentage }}"

      # Slow pipelines
      - alert: SlowPipelines
        expr: |
          histogram_quantile(0.95,
            rate(gitlab_ci_pipeline_duration_seconds_bucket[1h])
          ) > 1800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Pipelines running slowly"
          description: "P95 pipeline duration is {{ $value }}s"

      # Runner queue backup
      - alert: RunnerQueueBackup
        expr: gitlab_ci_runner_jobs_queued > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CI runner queue backed up"
          description: "{{ $value }} jobs waiting in queue"

Troubleshooting Slow Pipelines

Common Issues

1. Slow Dependency Installation

Problem: npm install takes 3+ minutes

Solution:

# Cache node_modules
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

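# Or use npm ci with a local npm cache directory
# (npm ci deletes node_modules before installing, so cache .npm/ in that case)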
script:
  - npm ci --cache .npm --prefer-offline

2. Unnecessary Test Runs

Problem: All tests run for small changes

Solution:

# Use changed files detection
test:unit:
  script:
    - npm run test:changed
  rules:
    - changes:
        - src/**/*.js
        - test/**/*.js

3. Sequential Job Execution

Problem: Jobs run serially when they could be parallel

Solution:

# Use 'needs' for parallel execution
test:unit:
  needs: [build]

test:integration:
  needs: [build]

# Both run in parallel after build

4. Large Docker Images

Problem: Docker pulls take 2+ minutes

Solution:

# Multi-stage build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
# Final image only contains production dependencies

References

DORA Metrics - DevOps performance measurement
Metrics - Prometheus metrics collection
Dashboards - Visualization and reporting
Value Stream Analytics - End-to-end workflow analysis
