DORA Metrics in GitLab
Overview
DORA (DevOps Research and Assessment) metrics provide evidence-based insights into DevOps performance. GitLab provides built-in DORA metrics tracking to help teams measure and improve software delivery capabilities.
What are DORA Metrics?
DORA metrics measure four key indicators of DevOps performance, validated through eight years of research analyzing thousands of organizations worldwide.
The Four Key Metrics
- Deployment Frequency: How often you deploy to production
- Lead Time for Changes: Time from commit to production deployment
- Change Failure Rate: Percentage of deployments causing failures
- Mean Time to Recovery (MTTR): Time to recover from production incidents
Why DORA Metrics Matter
Performance Benchmarking
DORA metrics categorize teams into performance tiers:
Performance Tiers (2026 Standards)
Elite Performers:
Deployment Frequency: Multiple times per day
Lead Time: < 1 hour
Change Failure Rate: < 5%
MTTR: < 1 hour
High Performers:
Deployment Frequency: Once per day to once per week
Lead Time: < 1 day
Change Failure Rate: < 10%
MTTR: < 1 day
Medium Performers:
Deployment Frequency: Once per week to once per month
Lead Time: < 1 week
Change Failure Rate: < 15%
MTTR: < 1 week
Low Performers:
Deployment Frequency: Less than once per month
Lead Time: > 1 month
Change Failure Rate: > 15%
MTTR: > 1 week
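The tier thresholds above translate directly into code. This is a minimal sketch. It assumes the metrics are already aggregated into deployments per day, lead time and MTTR in hours, and change failure rate in percent, and the function names are hypothetical. The table leaves some values unassigned: lead times between one week and one month, and values sitting exactly on a boundary such as 15% or one week. This sketch places those in Low.

```javascript
// Classify aggregated DORA metrics into the performance tiers listed above.
// Inputs: deployments per day, lead time and MTTR in hours, failure rate in percent.
const HOURS_PER_DAY = 24;
const HOURS_PER_WEEK = 24 * 7;

function classifyDeploymentFrequency(deploysPerDay) {
  if (deploysPerDay > 1) return 'Elite';        // multiple times per day
  if (deploysPerDay >= 1 / 7) return 'High';    // once per day to once per week
  if (deploysPerDay >= 1 / 30) return 'Medium'; // once per week to once per month
  return 'Low';                                 // less than once per month
}

function classifyLeadTime(hours) {
  if (hours < 1) return 'Elite';
  if (hours < HOURS_PER_DAY) return 'High';
  if (hours < HOURS_PER_WEEK) return 'Medium';
  return 'Low'; // includes the unassigned 1 week to 1 month range
}

function classifyChangeFailureRate(percent) {
  if (percent < 5) return 'Elite';
  if (percent < 10) return 'High';
  if (percent < 15) return 'Medium';
  return 'Low';
}

function classifyMttr(hours) {
  if (hours < 1) return 'Elite';
  if (hours < HOURS_PER_DAY) return 'High';
  if (hours < HOURS_PER_WEEK) return 'Medium';
  return 'Low';
}

function classifyAll({ deploysPerDay, leadTimeHours, changeFailurePct, mttrHours }) {
  return {
    deploymentFrequency: classifyDeploymentFrequency(deploysPerDay),
    leadTime: classifyLeadTime(leadTimeHours),
    changeFailureRate: classifyChangeFailureRate(changeFailurePct),
    mttr: classifyMttr(mttrHours),
  };
}

console.log(classifyAll({ deploysPerDay: 10, leadTimeHours: 2.5, changeFailurePct: 3.2, mttrHours: 0.75 }));
```

The dashboard's sample values are 10 deployments/day, 2.5 hours, 3.2%, and 45 minutes. For those inputs, `classifyAll` returns Elite, High, Elite, Elite, which matches the labels shown there.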
Business Value
- Faster time-to-market: Ship features quickly
- Higher quality: Fewer production incidents
- Better reliability: Recover from failures faster
- Improved productivity: Streamlined workflows
- Competitive advantage: Outpace competitors
Accessing DORA Metrics in GitLab
Navigation
- Project-level: Analyze > Value stream analytics > DORA metrics
- Group-level: Group > Analyze > Value stream analytics > DORA metrics
Dashboard View
DORA Metrics Dashboard
Deployment Frequency (chart of deployments per day, January to June)
- Current: 10 deployments/day (Elite)
- Trend: +15% vs. last month
Lead Time for Changes
- Current: 2.5 hours (High Performer)
- Target: < 1 hour (Elite)
- P50: 1.5 hours | P95: 6 hours
Change Failure Rate
- Current: 3.2% (Elite)
- Target: < 5%
- Failed: 8 / Total: 250
Mean Time to Recovery (MTTR)
- Current: 45 minutes (Elite)
- Target: < 1 hour
- Incidents: 8 | Avg recovery: 45m
Metric 1: Deployment Frequency
Definition
The frequency of successful deployments to production over a given period.
How GitLab Measures It
GitLab tracks deployments through:
- Environment deployments: Deployments to the `production` environment
- Release creation: GitLab releases tagged as production
- API tracking: Custom deployment tracking via API
Calculation
```sql
-- Deployments per day (last 30 days)
SELECT DATE(finished_at) as date, COUNT(*) as deployments
FROM deployments
WHERE environment = 'production'
  AND status = 'success'
  AND finished_at >= NOW() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;

-- Average deployments per day
SELECT COUNT(*) / 30 as avg_deployments_per_day
FROM deployments
WHERE environment = 'production'
  AND status = 'success'
  AND finished_at >= NOW() - INTERVAL 30 DAY;
```
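The same calculation can be done in application code on exported deployment records. This is a sketch. It assumes each record carries `environment`, `status`, and an ISO 8601 `finished_at`, like the columns in the queries above; that record shape is an assumption, not a GitLab API format. Days are bucketed in UTC. The average divides by the whole window, as `COUNT(*) / 30` does, so days without deployments pull the average down.

```javascript
// Deployment frequency over a trailing window, mirroring the SQL above.
const DAY_MS = 24 * 60 * 60 * 1000;

function deploymentFrequency(deployments, now, windowDays = 30) {
  const since = now.getTime() - windowDays * DAY_MS;
  const perDay = new Map();

  for (const d of deployments) {
    if (d.environment !== 'production' || d.status !== 'success') continue;
    const finished = new Date(d.finished_at);
    if (finished.getTime() < since) continue; // outside the window
    const day = finished.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
    perDay.set(day, (perDay.get(day) || 0) + 1);
  }

  const total = [...perDay.values()].reduce((sum, n) => sum + n, 0);
  return {
    perDay: [...perDay.entries()].sort(([a], [b]) => a.localeCompare(b)),
    avgPerDay: total / windowDays, // like COUNT(*) / 30: empty days count as zero
  };
}

// Usage with a few made-up records
const now = new Date('2026-01-31T00:00:00Z');
const sample = [
  { environment: 'production', status: 'success', finished_at: '2026-01-30T10:00:00Z' },
  { environment: 'production', status: 'success', finished_at: '2026-01-30T15:00:00Z' },
  { environment: 'production', status: 'failed', finished_at: '2026-01-29T09:00:00Z' },
  { environment: 'staging', status: 'success', finished_at: '2026-01-29T09:00:00Z' },
  { environment: 'production', status: 'success', finished_at: '2026-01-28T12:00:00Z' },
];
console.log(deploymentFrequency(sample, now));
```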
Improving Deployment Frequency
1. Automate Deployments
```yaml
# .gitlab-ci.yml - Automatic production deployment
deploy:production:
  stage: deploy
  script:
    - kubectl apply -f k8s/production/
  environment:
    name: production
    url: https://app.example.com
  only:
    - main
  when: on_success
```
2. Implement Feature Flags
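```javascript
// Deploy code without activating features (Express assumed for the routing)
const express = require('express');
const featureFlags = require('./featureFlags'); // the application's own flag module

const app = express();

// handleNewFeature and handleOldFeature are the application's route handlers;
// req.user is assumed to be set by authentication middleware
app.get('/api/new-feature', (req, res) => {
  if (featureFlags.isEnabled('new-feature', req.user)) {
    return handleNewFeature(req, res);
  }
  return handleOldFeature(req, res);
});
```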
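The `./featureFlags` module is the application's own and is not shown. One common way to implement `isEnabled` is a deterministic percentage rollout: the flag name is hashed together with the user ID, so each user lands consistently inside or outside the rollout. This is a minimal sketch using Node's built-in `crypto`. The `rollouts` table, the `user.id` field, and treating users without an ID as outside the rollout are all assumptions.

```javascript
// featureFlags.js - hypothetical percentage-rollout implementation
const crypto = require('crypto');

// Rollout percentage per flag (0-100); in practice this would be loaded from config.
const rollouts = {
  'new-feature': 25,
};

// Map (flag, user) to a stable bucket in [0, 100).
function bucketFor(flagName, userId) {
  const digest = crypto.createHash('sha256').update(`${flagName}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

function isEnabled(flagName, user) {
  const percentage = rollouts[flagName] ?? 0; // unknown flags are off
  if (!user || user.id === undefined) return false; // no ID: keep the old behavior
  return bucketFor(flagName, user.id) < percentage;
}

module.exports = { isEnabled, bucketFor, rollouts };
```

A user's bucket depends only on the flag name and user ID. As a result, raising a flag's percentage from 25 to 50 keeps every already-enabled user enabled.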
3. Use Progressive Delivery
```yaml
# Canary deployment
deploy:canary:
  stage: deploy
  script:
    - kubectl apply -f k8s/canary/
  environment:
    name: production-canary
  only:
    - main

# Full rollout after validation
deploy:production:
  stage: deploy
  script:
    - kubectl apply -f k8s/production/
  environment:
    name: production
  only:
    - main
  when: manual
  needs: [deploy:canary]
```
Metric 2: Lead Time for Changes
Definition
Time from when a change is committed to when it is running successfully in production.
How GitLab Measures It
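GitLab's built-in metric measures from merge to deployment: the median time between a merge request being merged and its commits running in production.
Lead Time = Deployment Time - Merge Time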
GitLab tracks:
- First commit: When code changes begin
- Merge to main: When MR merges to main/development
- Production deployment: When deployed to production
Calculation
```sql
-- Lead time for changes (last 30 days), measured from MR creation to production deployment
SELECT AVG(
  TIMESTAMPDIFF(SECOND, merge_requests.created_at, deployments.finished_at)
) / 3600 as avg_lead_time_hours
FROM merge_requests
JOIN deployments ON merge_requests.merge_commit_sha = deployments.sha
WHERE deployments.environment = 'production'
  AND deployments.status = 'success'
  AND deployments.finished_at >= NOW() - INTERVAL 30 DAY;

-- Lead time distribution
SELECT
  CASE
    WHEN lead_time_hours < 1 THEN '< 1 hour'
    WHEN lead_time_hours < 24 THEN '< 1 day'
    WHEN lead_time_hours < 168 THEN '< 1 week'
    ELSE '> 1 week'
  END as lead_time_bucket,
  COUNT(*) as deployment_count
FROM (
  SELECT TIMESTAMPDIFF(HOUR, merge_requests.created_at, deployments.finished_at) as lead_time_hours
  FROM merge_requests
  JOIN deployments ON merge_requests.merge_commit_sha = deployments.sha
  WHERE deployments.environment = 'production'
    AND deployments.status = 'success' -- successful deployments only, as in the query above
    AND deployments.finished_at >= NOW() - INTERVAL 30 DAY
) lead_times
GROUP BY lead_time_bucket;
```
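The dashboard shows a P50/P95 line next to the average. To report those figures alongside the buckets, the durations can be summarized in code. This is a sketch. It assumes an array of `{ startedAt, deployedAt }` ISO timestamps, where the start is whichever event you measure from (first commit, MR creation, or merge), and the function names are hypothetical. Percentiles use the nearest-rank method, one of several common definitions, so results can differ slightly from an interpolating `PERCENTILE_CONT`.

```javascript
// Summarize lead times: mean, nearest-rank P50/P95, and the same buckets as the SQL.
const HOUR_MS = 60 * 60 * 1000;

// Nearest-rank percentile on an ascending array (p in 0-100).
function percentile(sorted, p) {
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

function bucketFor(hours) {
  if (hours < 1) return '< 1 hour';
  if (hours < 24) return '< 1 day';
  if (hours < 168) return '< 1 week';
  return '> 1 week';
}

function leadTimeSummary(changes) {
  if (changes.length === 0) return null;

  const hours = changes
    .map((c) => (new Date(c.deployedAt) - new Date(c.startedAt)) / HOUR_MS)
    .sort((a, b) => a - b);

  const buckets = {};
  for (const h of hours) {
    const name = bucketFor(h);
    buckets[name] = (buckets[name] || 0) + 1;
  }

  return {
    meanHours: hours.reduce((sum, h) => sum + h, 0) / hours.length,
    p50Hours: percentile(hours, 50),
    p95Hours: percentile(hours, 95),
    buckets,
  };
}
```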
Improving Lead Time
1. Reduce Batch Size
```yaml
# Small, frequent changes
# Bad: Large MR with 50+ files
# Good: Small MRs with 5-10 files each

# CI validation for MR size
# Assumes a `validate` stage is declared under `stages:`. The base SHA must be
# present in the clone; a shallow clone may need a larger GIT_DEPTH.
check:mr-size:
  stage: validate
  script:
    - |
      FILES_CHANGED=$(git diff --name-only "$CI_MERGE_REQUEST_DIFF_BASE_SHA" HEAD | wc -l)
      if [ "$FILES_CHANGED" -gt 20 ]; then
        echo "MR too large: $FILES_CHANGED files changed"
        echo "Consider splitting into smaller MRs"
        exit 1
      fi
  only:
    - merge_requests
```
2. Optimize CI/CD Pipeline
```yaml
# Parallel job execution
build:
  stage: build
  script: npm run build

test:unit:
  stage: test
  needs: [build]
  script: npm run test:unit

test:integration:
  stage: test
  needs: [build]
  script: npm run test:integration
# Both tests run in parallel
```
3. Streamline Code Review
```yaml
# Automated code review
code_quality:
  stage: test
  script:
    - npm run lint
    - npm run type-check
  artifacts:
    reports:
      # Assumes the lint step writes its findings to this file in Code Climate JSON format
      codequality: gl-code-quality-report.json
  only:
    - merge_requests
```
Metric 3: Change Failure Rate
Definition
Percentage of deployments that result in failure requiring a hotfix, rollback, or patch.
How GitLab Measures It
GitLab tracks failures through:
- Failed deployments: Deployments with `failed` status
- Rollbacks: Redeployment of a previous version
- Incidents: Issues labeled as incidents linked to deployments
Calculation
```sql
-- Change failure rate (last 30 days)
SELECT
  COUNT(CASE WHEN status = 'failed' THEN 1 END) as failed_deployments,
  COUNT(*) as total_deployments,
  ROUND(
    COUNT(CASE WHEN status = 'failed' THEN 1 END) * 100.0 / COUNT(*), 2
  ) as failure_rate_pct
FROM deployments
WHERE environment = 'production'
  AND finished_at >= NOW() - INTERVAL 30 DAY;

-- Deployments that failed or have a linked incident
SELECT
  d.id as deployment_id,
  d.finished_at,
  d.sha,
  i.title as incident_title,
  i.created_at as incident_created_at
FROM deployments d
LEFT JOIN issues i
  ON d.id = i.deployment_id
  AND i.labels LIKE '%incident%'
WHERE d.environment = 'production'
  AND d.finished_at >= NOW() - INTERVAL 30 DAY
  AND (d.status = 'failed' OR i.id IS NOT NULL);
```
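The two queries above can be combined into a single rate, in which a deployment counts as a change failure if it failed outright or has a linked incident. This is a sketch over in-memory records. It assumes each deployment has an `id` and `status`, and each incident carries the `deploymentId` it is linked to; these field names are hypothetical. Each deployment is counted once, so a deployment with several incidents does not inflate the rate.

```javascript
// Change failure rate: share of production deployments that failed or have a linked incident.
function changeFailureRate(deployments, incidents) {
  if (deployments.length === 0) return null;

  // A deployment with several incidents still counts as one failure.
  const withIncident = new Set(incidents.map((i) => i.deploymentId));
  const failed = deployments.filter(
    (d) => d.status === 'failed' || withIncident.has(d.id)
  ).length;

  return {
    failed,
    total: deployments.length,
    // Percentage rounded to two decimals, like ROUND(..., 2) in the SQL
    ratePct: Math.round((failed * 10000) / deployments.length) / 100,
  };
}
```

Take 250 deployments, of which 8 failed or caused an incident. This returns a rate of 3.2%, the figure shown on the dashboard.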
Improving Change Failure Rate
1. Comprehensive Testing
```yaml
# Multi-layer testing strategy
test:unit:
  stage: test
  script:
    - npm run test:unit
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

test:integration:
  stage: test
  script:
    - npm run test:integration

test:e2e:
  stage: test
  script:
    - npm run test:e2e
  only:
    - merge_requests
    - main
```
2. Automated Quality Gates
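```yaml
# Enforce quality standards
# Assumes a `validate` stage, and that gl-code-quality-report.json is available
# to this job (for example as an artifact of an earlier job)
quality_gate:
  stage: validate
  script:
    - |
      # Check test coverage (--silent hides npm's banner, which contains the package version)
      COVERAGE=$(npm run --silent test:coverage:summary | grep -oE '[0-9]+\.[0-9]+' | head -1)
      if awk -v c="$COVERAGE" 'BEGIN { exit !(c < 80) }'; then
        echo "Coverage $COVERAGE% below 80% threshold"
        exit 1
      fi

      # Check code quality: the report is a JSON array with one entry per issue
      QUALITY_SCORE=$(jq 'length' gl-code-quality-report.json)
      if [ "$QUALITY_SCORE" -gt 10 ]; then
        echo "$QUALITY_SCORE code quality issues found"
        exit 1
      fi
  only:
    - merge_requests
```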
3. Progressive Rollout
```yaml
# Gradual deployment with monitoring
deploy:canary:
  stage: deploy
  script:
    - kubectl apply -f k8s/canary/
    - sleep 300  # Monitor for 5 minutes
    # Roll back the canary if the error-rate check fails, and still fail the job
    - ./scripts/check-error-rate.sh || { ./scripts/rollback-canary.sh; exit 1; }
  environment:
    name: production-canary

deploy:production:
  stage: deploy
  script:
    - kubectl apply -f k8s/production/
  environment:
    name: production
  when: manual
  needs: [deploy:canary]
```
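The canary job relies on `./scripts/check-error-rate.sh`, which is not shown. The decision such a script makes can be sketched as two checks on the canary's 5xx ratio: an absolute ceiling, and a limit relative to the stable baseline. In this sketch the request counts are passed in directly; a real script would query your monitoring system. Two values are assumptions: the 5% ceiling, borrowed from the `HighErrorRate` alert rule, and the 2x baseline tolerance.

```javascript
// Hypothetical canary gate: promote only if the canary's 5xx ratio looks healthy.
function errorRatio({ errors, requests }) {
  return requests === 0 ? 0 : errors / requests;
}

function canaryVerdict(canary, baseline, { maxRatio = 0.05, maxRelative = 2 } = {}) {
  if (canary.requests === 0) {
    return { promote: false, reason: 'no canary traffic observed' };
  }
  const canaryRatio = errorRatio(canary);
  const baselineRatio = errorRatio(baseline);

  // Absolute ceiling, same ratio as the HighErrorRate alert
  if (canaryRatio > maxRatio) {
    return { promote: false, reason: `canary error ratio ${canaryRatio.toFixed(3)} exceeds ${maxRatio}` };
  }
  // Relative check: the canary should not be much worse than the stable version
  if (baselineRatio > 0 && canaryRatio > baselineRatio * maxRelative) {
    return { promote: false, reason: `canary error ratio is over ${maxRelative}x the baseline` };
  }
  return { promote: true, reason: 'canary within thresholds' };
}

// Usage with made-up counts: 0.3% canary errors vs 0.2% on the stable version
const verdict = canaryVerdict({ errors: 3, requests: 1000 }, { errors: 20, requests: 10000 });
console.log(verdict);
```

A wrapper script could exit non-zero whenever `promote` is false. That non-zero exit is what makes the canary job fail and triggers the rollback.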
Metric 4: Mean Time to Recovery (MTTR)
Definition
Average time to recover from a production failure, measured from incident start to resolution.
How GitLab Measures It
GitLab tracks recovery through:
- Incidents: Issues labeled as incidents
- Resolution time: Time from incident creation to closure
- Deployment recovery: Time from failed deployment to successful rollback/fix
Calculation
```sql
-- MTTR for incidents (last 30 days), MySQL 8+
-- MySQL has no PERCENTILE_CONT, so the median is computed with window functions
WITH incidents as (
  SELECT TIMESTAMPDIFF(SECOND, created_at, closed_at) as recovery_seconds
  FROM issues
  WHERE labels LIKE '%incident%'
    AND closed_at IS NOT NULL
    AND created_at >= NOW() - INTERVAL 30 DAY
),
ranked as (
  SELECT recovery_seconds,
         ROW_NUMBER() OVER (ORDER BY recovery_seconds) as rn,
         COUNT(*) OVER () as cnt
  FROM incidents
)
SELECT
  COUNT(*) as total_incidents,
  AVG(recovery_seconds) / 3600 as avg_mttr_hours,
  (SELECT AVG(recovery_seconds)
     FROM ranked
    WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))) / 3600 as median_mttr_hours
FROM incidents;

-- MTTR by severity
SELECT
  severity,
  COUNT(*) as incident_count,
  AVG(TIMESTAMPDIFF(SECOND, created_at, closed_at)) / 60 as avg_mttr_minutes
FROM issues
WHERE labels LIKE '%incident%'
  AND closed_at IS NOT NULL
  AND created_at >= NOW() - INTERVAL 30 DAY
GROUP BY severity
ORDER BY avg_mttr_minutes DESC;
```
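The same statistics can be computed in application code. This is useful when the database has no percentile function, or when incidents are pulled from an API rather than SQL. This is a sketch. It assumes incident objects with ISO 8601 `createdAt` and `closedAt` timestamps and a `severity` string; these field names are hypothetical. Open incidents are skipped, as in the queries. For an even count, the median averages the two middle values, which is what `PERCENTILE_CONT(0.5)` returns.

```javascript
// MTTR statistics in minutes: overall mean and median, plus mean per severity.
const MINUTE_MS = 60 * 1000;

// Median of an ascending array; averages the two middle values for an even count.
function median(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function mttrStats(incidents) {
  // Skip incidents that are still open, like closed_at IS NOT NULL in the SQL
  const resolved = incidents
    .filter((i) => i.closedAt)
    .map((i) => ({
      severity: i.severity,
      minutes: (new Date(i.closedAt) - new Date(i.createdAt)) / MINUTE_MS,
    }));
  if (resolved.length === 0) return null;

  const sorted = resolved.map((r) => r.minutes).sort((a, b) => a - b);

  const groups = {};
  for (const { severity, minutes } of resolved) {
    if (!groups[severity]) groups[severity] = { count: 0, total: 0 };
    groups[severity].count += 1;
    groups[severity].total += minutes;
  }
  const bySeverity = {};
  for (const [severity, g] of Object.entries(groups)) {
    bySeverity[severity] = { count: g.count, meanMinutes: g.total / g.count };
  }

  return {
    count: resolved.length,
    meanMinutes: sorted.reduce((sum, m) => sum + m, 0) / sorted.length,
    medianMinutes: median(sorted),
    bySeverity,
  };
}
```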
Improving MTTR
1. Automated Rollback
```yaml
# Automatic rollback on failure
deploy:production:
  stage: deploy
  script:
    - kubectl apply -f k8s/production/
    - sleep 60
    # Roll back when the health check fails, and still fail the job so the failure is visible
    - ./scripts/health-check.sh || { ./scripts/rollback.sh; exit 1; }
  environment:
    name: production

# One-click rollback for problems found after a successful deploy
rollback:production:
  stage: deploy
  script:
    - kubectl rollout undo deployment/app
  environment:
    name: production
  needs: [deploy:production]
  when: manual
```
2. Incident Response Runbooks
```yaml
# .gitlab/incident-runbooks/database-down.yml
name: Database Down Runbook
severity: critical
steps:
  - Check database status: kubectl get pods -l app=postgres
  - Review logs: kubectl logs -l app=postgres --tail=100
  - Restart if needed: kubectl rollout restart deployment/postgres
  - Verify recovery: ./scripts/db-health-check.sh
  # Replace 123 with the incident's issue number
  - Update incident: glab issue note 123 -m "Database recovered"
```
3. Observability and Alerting
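```yaml
# Prometheus alert rules
groups:
  - name: production_alerts
    rules:
      - alert: HighErrorRate
        # Aggregate with sum(): without it, each 5xx series is matched to the identical
        # series in the denominator and the ratio is 1 whenever that series has traffic
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          runbook: "https://gitlab.com/runbooks/high-error-rate"
          action: "Create incident and page on-call"
```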
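The alert condition is a ratio of two aggregated per-second rates. This is a simplified sketch of that arithmetic. It assumes each series is given as two counter samples taken five minutes apart, and the sample values are made up. Prometheus's `rate()` also compensates for counter resets and extrapolates to the window edges; this sketch omits both. The `^5..$` regular expression mirrors the fully anchored `=~"5.."` matcher.

```javascript
// Simplified model of:
//   sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
// Each series is one label set with two counter samples taken 5 minutes apart.
const WINDOW_SECONDS = 5 * 60;

// Per-second increase over the window (no counter-reset handling or extrapolation).
function simpleRate(series) {
  return (series.end - series.start) / WINDOW_SECONDS;
}

function aggregateErrorRatio(seriesList) {
  let errorRate = 0;
  let totalRate = 0;
  for (const s of seriesList) {
    const r = simpleRate(s);
    totalRate += r; // denominator: sum over every status
    if (/^5..$/.test(s.status)) errorRate += r; // numerator: 5xx statuses only
  }
  return totalRate === 0 ? 0 : errorRate / totalRate;
}

// Usage with made-up counter samples
const series = [
  { status: '200', start: 10000, end: 12700 },
  { status: '404', start: 500, end: 560 },
  { status: '500', start: 100, end: 190 },
  { status: '503', start: 40, end: 100 },
];
const ratio = aggregateErrorRatio(series);
console.log(ratio.toFixed(4), ratio > 0.05 ? 'above threshold' : 'below threshold');
```

Here 150 of the 2,910 requests in the window are 5xx, a ratio of about 0.052, so the condition holds. The `for: 5m` clause then requires it to stay true for five minutes before the alert fires.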
Tracking DORA Metrics via API
GraphQL API
```graphql
query {
  project(fullPath: "my-org/my-project") {
    dora {
      metrics {
        deploymentFrequency
        leadTimeForChanges
        changeFailureRate
        timeToRestoreService
      }
    }
  }
}
```
REST API
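```shell
# Get DORA metrics (one metric per request). `metric` is required: one of
# deployment_frequency, lead_time_for_changes, time_to_restore_service, change_failure_rate
curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/projects/123/dora/metrics?metric=deployment_frequency&environment_tier=production&interval=monthly"
```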
Custom Tracking Script
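```javascript
// scripts/track-dora-metrics.js
// Assumes PROJECT_ID and GITLAB_TOKEN are set in the environment.
const axios = require('axios');

const PROJECT_ID = process.env.PROJECT_ID;
const METRICS = ['deployment_frequency', 'lead_time_for_changes', 'change_failure_rate', 'time_to_restore_service'];

// The endpoint returns one metric per request, as an array of { date, value } points.
async function fetchMetric(metric) {
  const response = await axios.get(
    `https://gitlab.com/api/v4/projects/${PROJECT_ID}/dora/metrics`,
    {
      headers: { 'PRIVATE-TOKEN': process.env.GITLAB_TOKEN },
      params: {
        metric,
        environment_tier: 'production',
        interval: 'monthly',
        start_date: '2026-01-01',
        end_date: '2026-01-31',
      },
    }
  );
  // A one-month range with monthly interval yields a single point
  return response.data.length > 0 ? response.data[0].value : null;
}

// Placeholder for your monitoring integration (not defined in this script)
async function sendToDatadog(metrics) {
  console.log('Sending to Datadog:', JSON.stringify(metrics));
}

const toHours = (seconds) => (seconds == null ? 'n/a' : (seconds / 3600).toFixed(1));

async function getDORAMetrics() {
  const metrics = {};
  for (const metric of METRICS) {
    metrics[metric] = await fetchMetric(metric);
  }
  const cfr = metrics.change_failure_rate; // incidents / deployments, as a ratio
  console.log('DORA Metrics Summary:');
  console.log(`Deployments this month: ${metrics.deployment_frequency}`);
  // Lead time and time to restore are reported as median seconds
  console.log(`Lead Time: ${toHours(metrics.lead_time_for_changes)} hours`);
  console.log(`Change Failure Rate: ${cfr == null ? 'n/a' : (cfr * 100).toFixed(1)}%`);
  console.log(`MTTR: ${toHours(metrics.time_to_restore_service)} hours`);
  await sendToDatadog(metrics);
}

getDORAMetrics().catch((err) => {
  console.error('Failed to fetch DORA metrics:', err.message);
  process.exitCode = 1;
});
```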
DORA Metrics Dashboard
Creating Custom Dashboards
```yaml
# .gitlab/analytics/dashboards/dora.yaml
# Illustrative layout: the query names are placeholders for your own data sources
title: DORA Metrics Dashboard
description: DevOps performance indicators
visualizations:
  - title: Deployment Frequency
    type: line_chart
    data:
      query: deployment_frequency_per_day
      xaxis: date
      yaxis: deployments
    options:
      goal: 10
  - title: Lead Time Distribution
    type: histogram
    data:
      query: lead_time_distribution
      buckets: [0, 1, 4, 8, 24, 168]
    options:
      labels: ['< 1h', '< 4h', '< 8h', '< 1d', '< 1w', '> 1w']
  - title: Change Failure Rate
    type: gauge
    data:
      query: change_failure_rate_pct
    options:
      max: 100
      ranges:
        - from: 0
          to: 5
          color: green
        - from: 5
          to: 15
          color: yellow
        - from: 15
          to: 100
          color: red
  - title: MTTR Trend
    type: line_chart
    data:
      query: mttr_by_week
      xaxis: week
      yaxis: hours
    options:
      goal: 1
```
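The gauge's ranges share their endpoints: 5 and 15 each appear in two bands, so a renderer has to pick a side. This sketch assumes half-open bands [from, to), and the function name is hypothetical. With that choice the colors line up with the change failure rate tiers: green for Elite (< 5%), yellow for High and Medium, and red for 15% and above.

```javascript
// Color bands from dora.yaml, treated as half-open intervals [from, to).
const CFR_RANGES = [
  { from: 0, to: 5, color: 'green' },
  { from: 5, to: 15, color: 'yellow' },
  { from: 15, to: 100, color: 'red' },
];

function gaugeColor(pct) {
  // First band whose upper bound lies above the value; 100% falls through to the last band
  const band = CFR_RANGES.find((r) => pct < r.to) ?? CFR_RANGES[CFR_RANGES.length - 1];
  return band.color;
}
```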
Continuous Improvement
Monthly DORA Review
Schedule regular reviews:
```markdown
# Monthly DORA Metrics Review - January 2026

## Current Performance
- Deployment Frequency: 10/day (Elite)
- Lead Time: 2.5 hours (High)
- Change Failure Rate: 3.2% (Elite)
- MTTR: 45 minutes (Elite)

## Improvement Goals
1. Reduce lead time to < 1 hour (Elite tier)
   - Action: Optimize CI/CD pipeline
   - Owner: DevOps team
   - Target: February 2026
2. Maintain current performance in other metrics
   - Continue monitoring and alerting
   - Refine incident response procedures

## Initiatives
- Implement parallel testing (Week 1-2)
- Optimize Docker builds (Week 3)
- Review and update runbooks (Week 4)
```
Team Performance Tracking
```sql
-- Team performance comparison
-- Assumes teams, projects, and a monthly dora_metrics table maintained by your own tooling
SELECT
  t.name as team,
  AVG(dm.deployment_frequency) as avg_deployments_per_day,
  AVG(dm.lead_time_hours) as avg_lead_time_hours,
  AVG(dm.change_failure_rate) as avg_failure_rate_pct,
  AVG(dm.mttr_hours) as avg_mttr_hours
FROM teams t
JOIN projects p ON t.id = p.team_id
JOIN dora_metrics dm ON p.id = dm.project_id
WHERE dm.month = '2026-01'
GROUP BY t.name
ORDER BY avg_deployments_per_day DESC;
```
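The same comparison can be run on per-project rows already loaded into memory. This is a sketch. It assumes rows with a `team` name and the four metric fields used in the query; these field names are hypothetical. Like the SQL `AVG`, this is an unweighted mean of per-project values, so a project with a handful of deployments counts as much as one with hundreds. If that skews the ranking, weighting by deployment count is an alternative.

```javascript
// Average per-project DORA metrics by team, ranked by deployment frequency.
function teamComparison(rows) {
  const byTeam = new Map();
  for (const row of rows) {
    if (!byTeam.has(row.team)) byTeam.set(row.team, []);
    byTeam.get(row.team).push(row);
  }

  // Unweighted mean, like SQL AVG over one row per project
  const avg = (list, key) => list.reduce((sum, r) => sum + r[key], 0) / list.length;

  return [...byTeam.entries()]
    .map(([team, list]) => ({
      team,
      avgDeploymentsPerDay: avg(list, 'deploymentFrequency'),
      avgLeadTimeHours: avg(list, 'leadTimeHours'),
      avgFailureRatePct: avg(list, 'changeFailureRate'),
      avgMttrHours: avg(list, 'mttrHours'),
    }))
    .sort((a, b) => b.avgDeploymentsPerDay - a.avgDeploymentsPerDay);
}
```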
Related Documentation
- CI/CD Analytics - Pipeline performance metrics
- Value Stream Analytics - End-to-end workflow analysis
- Dashboards - Metric visualization
- Alerting - Performance alerting