Cost Monitoring and Analytics

Overview

Continuous monitoring ensures optimizations remain effective and identifies new opportunities for cost reduction.


Key Metrics to Track

1. Total Compute Minutes per Month

Target: Consistent or decreasing

Track:

# Monthly usage
glab api /namespaces/:id/ci_minutes | jq '.minutes_used'
# Percentage of quota
glab api /namespaces/:id/ci_minutes | \
  jq '(.minutes_used / .monthly_minutes_limit) * 100'

Trend Analysis:

# Save monthly (automate this)
echo "$(date +%Y-%m),$(glab api /namespaces/:id/ci_minutes | jq '.minutes_used')" >> ci-usage.csv
# Summarize (plot with gnuplot or similar); the file has no header row,
# so fields are addressed by position
mlr --csv --implicit-csv-header stats1 -a mean,min,max -f 2 ci-usage.csv
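Once ci-usage.csv has a few rows, the month-over-month change is often easier to read than summary statistics. The sketch below is one way to compute it with awk. The function name usage_trend is hypothetical, and it assumes the headerless month,minutes layout that the echo command above writes.

```shell
#!/bin/sh
# Print the month-over-month change for a headerless CSV whose rows look
# like "YYYY-MM,minutes_used". Output rows: month,delta,percent_change.
usage_trend() {
  awk -F, '
    NR > 1 {
      delta = $2 - prev
      # Percentage change is undefined when the previous month was zero.
      pct = (prev == 0) ? "n/a" : sprintf("%+.1f%%", 100 * delta / prev)
      printf "%s,%+d,%s\n", $1, delta, pct
    }
    { prev = $2 }
  ' "$1"
}
```

Call it as usage_trend ci-usage.csv. The first month has no predecessor, so output starts at the second row.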

2. Top Projects by Usage

Identify cost centers:

# Get top 10 projects
glab api /groups/:id/usage_stats | \
  jq -r '.projects[] | "\(.ci_minutes)\t\(.name)"' | \
  sort -rn | head -10

Export to CSV:

# Write a header row first so Miller can refer to fields by name
glab api /groups/:id/usage_stats | \
  jq -r '["name", "ci_minutes"], (.projects[] | [.name, .ci_minutes]) | @csv' > project-usage.csv
# Analyze with Miller
mlr --csv --opprint \
  sort -nr ci_minutes then \
  head -n 20 project-usage.csv

3. Pipeline Duration

Track over time:

# Average duration in minutes of successful pipelines (most recent 100 pipelines)
glab api "/projects/:id/pipelines?per_page=100" | \
  jq '[.[] | select(.status == "success" and .duration != null) | .duration] | add / length / 60'
# By branch
glab api "/projects/:id/pipelines?per_page=100" | \
  jq -r '["ref", "minutes"], (.[] | select(.duration != null) | [.ref, .duration / 60]) | @tsv' | \
  mlr --tsv stats1 -a mean,p95 -f minutes -g ref

4. Job Failure Rate

Wasted minutes from failures:

# Job count by status (failed count divided by total gives the failure rate)
glab api "/projects/:id/jobs?per_page=100" | \
  jq '[.[] | .status] | group_by(.) | map({status: .[0], count: length})'
# Cost of failures, in minutes (jobs that never started have no duration)
glab api "/projects/:id/jobs?per_page=100" | \
  jq '[.[] | select(.status == "failed") | ((.duration // 0) / 60)] | add'
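The status counts can be reduced to a single failure percentage. The helper below is a minimal sketch: failure_rate is a hypothetical name, it reads one status per line on stdin, and it treats every non-empty line as a job, so canceled or skipped jobs count toward the total.

```shell
#!/bin/sh
# Read one job status per line on stdin and print the percentage that failed.
failure_rate() {
  awk '
    NF { total++ }
    $1 == "failed" { failed++ }
    END {
      # No input at all: report 0.0 instead of dividing by zero.
      if (total == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * failed / total
    }
  '
}
```

One way to feed it, assuming the jobs endpoint used above: glab api "/projects/:id/jobs?per_page=100" | jq -r '.[].status' | failure_rate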

5. Cache Hit Rate

Effectiveness of caching:

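# Add to .gitlab-ci.yml
test:
  before_script:
    - |
      # tee writes the flag to the job log as well as to metrics.env
      if [ -d "node_modules" ]; then
        echo "CACHE_HIT=true" | tee -a metrics.env
      else
        echo "CACHE_HIT=false" | tee -a metrics.env
      fi
  artifacts:
    reports:
      dotenv: metrics.env

# Analyze cache hits: count recent "test" jobs whose log contains CACHE_HIT=true.
# The jobs list does not include the log, so fetch each job's trace separately.
glab api "/projects/:id/jobs?per_page=100" | \
  jq -r '.[] | select(.name == "test") | .id' | \
  while read -r job_id; do
    glab api "/projects/:id/jobs/$job_id/trace" | grep -c "CACHE_HIT=true"
  done | awk '$1 > 0 { hits++ } END { print hits + 0 }'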

6. Cost per Deploy

Efficiency metric:

# Minutes consumed per successful deployment
DEPLOY_COUNT=$(glab api "/projects/:id/deployments?status=success&per_page=100" | jq 'length')
TOTAL_MINUTES=$(glab api /namespaces/:id/ci_minutes | jq '.minutes_used')
if [ "$DEPLOY_COUNT" -gt 0 ]; then
  echo "scale=2; $TOTAL_MINUTES / $DEPLOY_COUNT" | bc
else
  echo "No successful deployments found"
fi

Automated Monitoring Dashboard

GitLab CI Job for Tracking

Create monitoring job:

# .gitlab-ci.yml
monitor:usage:
  image: alpine:latest
  stage: .post
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"  # Daily
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      echo "CI/CD Cost Report - $(date)"
      echo "================================"
      # Get usage data
      USAGE=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | jq '.minutes_used')
      QUOTA=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | jq '.monthly_minutes_limit')
      PERCENT=$((100 * USAGE / QUOTA))
      echo "Usage: $USAGE / $QUOTA minutes ($PERCENT%)"
      # Alert if high
      if [ "$PERCENT" -gt 75 ]; then
        echo "WARNING: High usage detected!"
        curl -X POST "$SLACK_WEBHOOK_URL" \
          -H 'Content-Type: application/json' \
          -d "{\"text\":\"CI Minutes at $PERCENT% of quota\"}"
      fi
      # Top projects
      echo ""
      echo "Top 5 Projects by Usage:"
      curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/groups/$CI_PROJECT_NAMESPACE_ID/usage_stats" | \
        jq -r '.projects[] | "\(.name): \(.ci_minutes) min"' | \
        sort -t: -k2 -rn | head -5
      # Save to artifact for trending
      echo "$USAGE,$QUOTA,$PERCENT,$(date +%Y-%m-%d)" >> usage-history.csv
  artifacts:
    paths:
      - usage-history.csv
    expire_in: 1 year

Schedule daily:

  • Navigate to: CI/CD → Schedules
  • Add schedule: 0 9 * * * (daily at 9 AM)
  • Target branch: main

Slack Notifications

Alert on milestones:

notify:slack:
  image: alpine:latest
  stage: .post
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  before_script:
    - apk add --no-cache curl jq  # not included in the base alpine image
  script:
    - |
      USAGE=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | jq '.minutes_used')
      QUOTA=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | jq '.monthly_minutes_limit')
      PERCENT=$((100 * USAGE / QUOTA))
      MESSAGE=""
      if [ "$PERCENT" -ge 90 ]; then
        MESSAGE="CRITICAL: CI minutes at $PERCENT% ($USAGE/$QUOTA)"
      elif [ "$PERCENT" -ge 75 ]; then
        MESSAGE="WARNING: CI minutes at $PERCENT% ($USAGE/$QUOTA)"
      elif [ "$PERCENT" -ge 50 ]; then
        MESSAGE="INFO: CI minutes at $PERCENT% ($USAGE/$QUOTA)"
      fi
      if [ -n "$MESSAGE" ]; then
        COLOR=$([ "$PERCENT" -ge 90 ] && echo 'danger' || echo 'warning')
        curl -X POST "$SLACK_WEBHOOK_URL" \
          -H 'Content-Type: application/json' \
          -d "{
            \"text\": \"$MESSAGE\",
            \"attachments\": [{
              \"color\": \"$COLOR\",
              \"fields\": [
                {\"title\": \"Used\", \"value\": \"$USAGE min\", \"short\": true},
                {\"title\": \"Quota\", \"value\": \"$QUOTA min\", \"short\": true}
              ]
            }]
          }"
      fi

GitLab Ultimate Analytics

CI/CD Analytics Dashboard

Navigate to: Group → Analytics → CI/CD Analytics

Metrics Available:

  • Pipeline duration trends
  • Success/failure rates
  • DORA metrics (deployment frequency, lead time, MTTR)
  • Job duration by stage

Value Stream Analytics

Navigate to: Group → Analytics → Value Stream

Track:

  • Issue to deploy time
  • Code review time
  • Testing time
  • Deployment time

Use for: Identifying bottlenecks that waste CI minutes

Usage Quotas Page

Navigate to: Group → Settings → Usage Quotas → Pipelines

Shows:

  • Monthly compute usage
  • Projects sorted by usage
  • Storage usage (artifacts)
  • Minutes consumed per day (graph)

Custom Dashboards with Prometheus/Grafana

GitLab Runner Metrics

Enable Prometheus metrics on runners:

config.toml:

listen_address = ":9252"

Prometheus scrape config:

scrape_configs:
  - job_name: 'gitlab-runner'
    static_configs:
      - targets:
          - 'runner1.example.com:9252'
          - 'runner2.example.com:9252'

Key Metrics to Scrape

Runner metrics:

gitlab_runner_jobs{state="running"}
gitlab_runner_jobs{state="failed"}
gitlab_runner_job_duration_seconds

Project metrics (via API exporter):

gitlab_project_pipeline_duration_seconds
gitlab_project_pipeline_status
gitlab_ci_minutes_used

Grafana Dashboard

Panels to include:

  1. CI Minute Usage (Gauge)
     gitlab_ci_minutes_used / gitlab_ci_minutes_quota * 100
  2. Usage Trend (Graph)
     rate(gitlab_ci_minutes_used[1d])
  3. Top Projects (Table)
     topk(10, gitlab_ci_minutes_per_project)
  4. Pipeline Duration (Graph)
     avg(gitlab_project_pipeline_duration_seconds) by (project)
  5. Failure Rate (Gauge)
     sum(rate(gitlab_runner_jobs{state="failed"}[1h])) / sum(rate(gitlab_runner_jobs[1h])) * 100

Import dashboard: Grafana.com dashboard #12833 (GitLab Runner)


Cost Attribution Reports

By Team

Tag projects with team labels:

# .gitlab-ci.yml
variables:
  TEAM: "platform-engineering"
  COST_CENTER: "infrastructure"

Generate report:

#!/bin/bash
# cost-by-team.sh
# The header and the rows go through the same pipe so Miller sees named fields.
# Costs are emitted without a currency symbol so they can be summed.
{
  echo "Team,Project,CI Minutes,Estimated Cost"
  for team in platform-engineering agent-team ossa-team; do
    # Get all projects for team (customize query)
    glab api "/groups/blueflyio/projects?search=$team" | \
      jq -r --arg team "$team" '.[] | "\($team),\(.name),\(.statistics.ci_minutes_used // 0)"' | \
      while IFS=, read -r team project minutes; do
        cost=$(echo "scale=2; $minutes * 10 / 1000" | bc)
        echo "$team,$project,$minutes,$cost"
      done
  done
} | mlr --csv stats1 -a sum -f 'CI Minutes,Estimated Cost' -g Team

By Project Type

Categorize projects:

# Projects by category
echo "Category,Projects,Total Minutes,Cost"
# Frontend projects ("add // 0" yields 0 when a topic has no projects)
FRONTEND=$(glab api "/groups/blueflyio/projects?topic=frontend" | \
  jq '[.[] | .statistics.ci_minutes_used // 0] | add // 0')
# Backend projects
BACKEND=$(glab api "/groups/blueflyio/projects?topic=backend" | \
  jq '[.[] | .statistics.ci_minutes_used // 0] | add // 0')
# Infrastructure projects
INFRA=$(glab api "/groups/blueflyio/projects?topic=infrastructure" | \
  jq '[.[] | .statistics.ci_minutes_used // 0] | add // 0')
echo "Frontend,$(glab api '/groups/blueflyio/projects?topic=frontend' | jq 'length'),$FRONTEND,\$$(echo "scale=2; $FRONTEND * 10 / 1000" | bc)"
echo "Backend,$(glab api '/groups/blueflyio/projects?topic=backend' | jq 'length'),$BACKEND,\$$(echo "scale=2; $BACKEND * 10 / 1000" | bc)"
echo "Infrastructure,$(glab api '/groups/blueflyio/projects?topic=infrastructure' | jq 'length'),$INFRA,\$$(echo "scale=2; $INFRA * 10 / 1000" | bc)"
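Both attribution reports price usage at 10 dollars per 1,000 minutes. A small helper can keep that rate in one place. This is only a sketch: estimate_cost and RATE_PER_1000 are hypothetical names, and the default of 10 is simply the figure the scripts in this section use, not a published price.

```shell
#!/bin/sh
# Convert compute minutes to an estimated cost with two decimal places.
# RATE_PER_1000 (hypothetical variable) is the price of 1,000 minutes; default 10.
estimate_cost() {
  awk -v minutes="$1" -v rate="${RATE_PER_1000:-10}" \
    'BEGIN { printf "%.2f\n", minutes * rate / 1000 }'
}
```

For example, estimate_cost 1234 prints 12.34. Multiplying before dividing matters when the same arithmetic is done in bc with scale=2: 1234 / 1000 * 10 truncates to 12.30, while 1234 * 10 / 1000 gives 12.34.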

Optimization Impact Tracking

Before/After Comparison

Create baseline:

# Save baseline before optimization
date=$(date +%Y-%m)
glab api /namespaces/:id/ci_minutes > "baseline-$date.json"
echo "Baseline saved: baseline-$date.json"
jq --arg month "$date" '{
  month: $month,
  used: .minutes_used,
  quota: .monthly_minutes_limit,
  percent: (.minutes_used / .monthly_minutes_limit * 100)
}' "baseline-$date.json"

Track improvement:

# After optimizations ($date must still hold the baseline month, formatted YYYY-MM)
date_new=$(date +%Y-%m)
glab api /namespaces/:id/ci_minutes > "current-$date_new.json"
# Compare
echo "Optimization Impact Report"
echo "=========================="
OLD_USAGE=$(jq '.minutes_used' "baseline-$date.json")
NEW_USAGE=$(jq '.minutes_used' "current-$date_new.json")
SAVED=$((OLD_USAGE - NEW_USAGE))
PERCENT_SAVED=$((100 * SAVED / OLD_USAGE))
echo "Before: $OLD_USAGE minutes"
echo "After: $NEW_USAGE minutes"
echo "Saved: $SAVED minutes ($PERCENT_SAVED%)"
echo "Cost savings: \$$(echo "scale=2; $SAVED * 10 / 1000" | bc)"

A/B Testing Optimizations

Test optimization in one project:

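# Feature flag for optimization: exactly one variant of the test job runs per pipeline.
# interruptible and cache:when do not expand CI/CD variables, so the flag is
# applied through rules instead.
test:
  # existing script, image, and so on
  rules:
    - if: $ENABLE_OPTIMIZATION != "true"

test:optimized:
  extends: test
  interruptible: true
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
  rules:
    - if: $ENABLE_OPTIMIZATION == "true"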

Compare metrics:

# With optimization
OPTIMIZED=$(glab api "/projects/:id/pipelines?per_page=50&variables[][key]=ENABLE_OPTIMIZATION&variables[][value]=true" | \
  jq '[.[] | .duration] | add / length / 60')
# Without optimization
BASELINE=$(glab api "/projects/:id/pipelines?per_page=50&variables[][key]=ENABLE_OPTIMIZATION&variables[][value]=false" | \
  jq '[.[] | .duration] | add / length / 60')
echo "Baseline: $BASELINE minutes"
echo "Optimized: $OPTIMIZED minutes"
echo "Improvement: $(echo "scale=2; ($BASELINE - $OPTIMIZED) / $BASELINE * 100" | bc)%"

Alerting and Notifications

Email Alerts

Built-in GitLab alerts:

  • 75% quota: Warning email
  • 95% quota: Critical email
  • 100% quota: Exhausted email

Recipients: Namespace owners and maintainers

Custom Alerts

API-based monitoring:

alert:high-usage:
  image: alpine:latest
  stage: .post
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  before_script:
    - apk add --no-cache curl jq bc  # not included in the base alpine image
  script:
    - |
      PERCENT=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | \
        jq '(.minutes_used / .monthly_minutes_limit) * 100')
      # bc prints 1 when the comparison holds; the POSIX [ ] test avoids
      # relying on the non-POSIX (( )) construct
      if [ "$(echo "$PERCENT > 80" | bc -l)" -eq 1 ]; then
        # Send email
        curl -X POST https://api.sendgrid.com/v3/mail/send \
          -H "Authorization: Bearer $SENDGRID_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{
            "personalizations": [{
              "to": [{"email": "ops@example.com"}],
              "subject": "GitLab CI Minutes Alert"
            }],
            "from": {"email": "noreply@example.com"},
            "content": [{
              "type": "text/plain",
              "value": "CI minutes at '"$PERCENT"'% of quota"
            }]
          }'
      fi

PagerDuty Integration

Critical usage alert:

alert:critical:
  image: alpine:latest
  stage: .post
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  before_script:
    - apk add --no-cache curl jq bc
  script:
    - |
      PERCENT=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | \
        jq '(.minutes_used / .monthly_minutes_limit) * 100')
      # bc prints 1 when the comparison holds
      if [ "$(echo "$PERCENT > 95" | bc -l)" -eq 1 ]; then
        curl -X POST https://events.pagerduty.com/v2/enqueue \
          -H "Content-Type: application/json" \
          -d '{
            "routing_key": "'"$PAGERDUTY_KEY"'",
            "event_action": "trigger",
            "payload": {
              "summary": "GitLab CI minutes at critical level",
              "severity": "critical",
              "source": "gitlab-ci-monitor",
              "custom_details": {
                "usage_percent": "'"$PERCENT"'",
                "namespace": "'"$CI_PROJECT_NAMESPACE"'"
              }
            }
          }'
      fi

Monthly Cost Reports

Automated Report Generation

Scheduled pipeline:

report:monthly:
  image: alpine:latest
  stage: .post
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule" && $REPORT_TYPE == "monthly"
  before_script:
    - apk add --no-cache curl jq bc
  script:
    - |
      echo "# GitLab CI/CD Monthly Cost Report" > report.md
      echo "Generated: $(date)" >> report.md
      echo "" >> report.md
      # Overall usage
      USAGE=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/namespaces/$CI_PROJECT_NAMESPACE_ID/ci_minutes" | jq '.minutes_used')
      COST=$(echo "scale=2; $USAGE * 10 / 1000" | bc)
      echo "## Summary" >> report.md
      echo "- Total Minutes: $USAGE" >> report.md
      echo "- Total Cost: \$$COST" >> report.md
      echo "" >> report.md
      # Top projects
      echo "## Top 10 Projects" >> report.md
      echo "| Project | Minutes | Cost |" >> report.md
      echo "|---------|---------|------|" >> report.md
      curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/groups/$CI_PROJECT_NAMESPACE_ID/usage_stats" | \
        jq -r '.projects[] | "\(.name)|\(.ci_minutes)"' | \
        sort -t'|' -k2 -rn | head -10 | \
        while IFS='|' read -r name minutes; do
          cost=$(echo "scale=2; $minutes * 10 / 1000" | bc)
          echo "| $name | $minutes | \$$cost |" >> report.md
        done
      # Recommendations
      echo "" >> report.md
      echo "## Recommendations" >> report.md
      if [ "$USAGE" -gt 40000 ]; then
        echo "- High usage detected. Review top projects for optimization opportunities." >> report.md
      fi
      cat report.md
  artifacts:
    paths:
      - report.md
    expire_in: 1 year

Schedule: 1st of each month at 9 AM

0 9 1 * *

Continuous Improvement

Weekly Review Checklist

## Weekly CI/CD Cost Review
- [ ] Check current usage vs quota (target <70%)
- [ ] Review top 5 projects by usage
- [ ] Identify any unusual spikes
- [ ] Check pipeline failure rate (target <10%)
- [ ] Review cache hit rate (target >80%)
- [ ] Look for optimization opportunities
- [ ] Update team on findings
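The three numeric targets in the checklist can be checked mechanically before the review meeting. The sketch below assumes the three percentages have already been collected by other means. weekly_check is a hypothetical helper, not part of any GitLab tooling.

```shell
#!/bin/sh
# Compare weekly metrics against the checklist targets:
# usage below 70% of quota, failure rate below 10%, cache hit rate above 80%.
# Arguments: usage percent, failure rate percent, cache hit rate percent.
weekly_check() {
  awk -v usage="$1" -v fail="$2" -v cache="$3" 'BEGIN {
    printf "usage %s%% (target <70%%): %s\n", usage, (usage < 70) ? "ok" : "review"
    printf "failure rate %s%% (target <10%%): %s\n", fail, (fail < 10) ? "ok" : "review"
    printf "cache hit rate %s%% (target >80%%): %s\n", cache, (cache > 80) ? "ok" : "review"
  }'
}
```

Running weekly_check 65 12 85 marks usage and cache hit rate as ok and flags the failure rate for review.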

Quarterly Optimization Sprint

Every quarter, dedicate time to:

  1. Deep dive on top 10 projects

    • Profile each pipeline
    • Identify optimization opportunities
    • Implement improvements
  2. Review pipeline patterns

    • Are best practices being followed?
    • Are components being reused?
    • Is caching configured correctly?
  3. Update documentation

    • Share learnings
    • Update guidelines
    • Create examples
  4. Track ROI

    • Measure time invested
    • Calculate minutes saved
    • Document success stories
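For the ROI item, one rough way to relate time invested to minutes saved is a payback period. The sketch below is illustrative only: optimization_roi is a hypothetical helper, the hourly rate is an input you supply, and the default price of 10 per 1,000 minutes is an assumption carried over from the cost scripts in this document.

```shell
#!/bin/sh
# Estimate how many months of savings it takes to repay an optimization effort.
# Arguments: minutes saved per month, hours invested, hourly rate.
# RATE_PER_1000 (hypothetical variable) is the price of 1,000 minutes; default 10.
optimization_roi() {
  awk -v saved="$1" -v hours="$2" -v hourly="$3" -v rate="${RATE_PER_1000:-10}" 'BEGIN {
    monthly = saved * rate / 1000   # value of the minutes saved each month
    invested = hours * hourly       # one-off cost of the engineering time
    # With no savings the effort never pays back, so report n/a.
    payback = (monthly > 0) ? sprintf("%.1f", invested / monthly) : "n/a"
    printf "monthly_savings=%.2f invested=%.2f payback_months=%s\n", monthly, invested, payback
  }'
}
```

A call such as optimization_roi 20000 8 75 reports the monthly savings, the amount invested, and the number of months until the savings cover the investment.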

Tools and Scripts

ci-cost-analyzer (Custom Tool)

Create analysis tool:

#!/bin/bash
# ci-cost-analyzer.sh
NAMESPACE_ID="12345"
API_URL="https://gitlab.com/api/v4"

function usage_summary() {
  echo "CI/CD Cost Summary"
  echo "===================="
  USAGE=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    "$API_URL/namespaces/$NAMESPACE_ID/ci_minutes" | jq '.minutes_used')
  QUOTA=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    "$API_URL/namespaces/$NAMESPACE_ID/ci_minutes" | jq '.monthly_minutes_limit')
  # Multiply before dividing so bc does not truncate the ratio to two decimals first
  PERCENT=$(echo "scale=2; $USAGE * 100 / $QUOTA" | bc)
  COST=$(echo "scale=2; $USAGE * 10 / 1000" | bc)
  echo "Used: $USAGE / $QUOTA minutes ($PERCENT%)"
  echo "Cost: \$$COST this month"
}

function top_projects() {
  echo ""
  echo "Top 10 Projects"
  echo "=================="
  curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    "$API_URL/groups/$NAMESPACE_ID/usage_stats" | \
    jq -r '.projects[] | "\(.name):\t\(.ci_minutes) min"' | \
    sort -t: -k2 -rn | head -10
}

function optimization_tips() {
  echo ""
  echo "Optimization Tips"
  echo "==================="
  # Check for common issues: failed pipelines as a percentage of those returned
  FAILED_RATE=$(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    "$API_URL/groups/$NAMESPACE_ID/pipelines?per_page=100" | \
    jq 'if length == 0 then 0 else ([.[] | select(.status == "failed")] | length) * 100 / length | floor end')
  if [ "$FAILED_RATE" -gt 20 ]; then
    echo "High failure rate detected ($FAILED_RATE%). Consider:"
    echo "  - Pre-commit hooks"
    echo "  - Better local testing"
  fi
  # PERCENT is set by usage_summary, which must run first
  if (( $(echo "$PERCENT > 75" | bc -l) )); then
    echo "Usage above 75%. Consider:"
    echo "  - Auto-cancel redundant pipelines"
    echo "  - Aggressive caching"
    echo "  - Self-hosted runners"
  fi
}

# Main
usage_summary
top_projects
optimization_tips

Usage:

chmod +x ci-cost-analyzer.sh
./ci-cost-analyzer.sh

Next Steps