Caching Deep Dive

Overview

Caching is one of the highest-impact optimizations for reducing CI/CD costs. Properly configured caches can reduce job duration by 30-70%, directly saving compute minutes.

Key Principle: Never download/install what you already have.


How GitLab Caching Works

Cache vs Artifacts

Feature      Cache                 Artifacts
Purpose      Speed up jobs         Pass data between jobs
Scope        Global (by key)       Pipeline-specific
Guarantee    Best-effort           Guaranteed
Storage      External (S3, etc)    GitLab storage
Size limit   Varies                1 GB default
Use for      Dependencies          Build outputs

Rule of Thumb:

  • Use cache for dependencies (node_modules, pip cache, etc)
  • Use artifacts for build outputs (dist/, binaries, etc)

Cache Lifecycle

Job Start
  ↓
Check cache key
  ↓
Cache exists? → No → Download dependencies → Run job → Upload cache
  ↓
 Yes
  ↓
Download cache → Run job → Upload cache (if policy allows)
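
The lifecycle above can be sketched as a small Python model. This is an illustration only, not GitLab Runner code: the function `run_job`, the in-memory `store` set, and the key `deps-v1` are hypothetical stand-ins for a job, the remote cache storage, and a cache key.

```python
def run_job(key, store, policy="pull-push"):
    """Return the steps a job performs for a given cache key and policy."""
    steps = []
    # A cache can only be restored if the policy allows pulling and the key exists.
    hit = policy in ("pull", "pull-push") and key in store
    steps.append("download cache" if hit else "download dependencies")
    steps.append("run job")
    # The cache is only uploaded if the policy allows pushing.
    if policy in ("push", "pull-push"):
        store.add(key)
        steps.append("upload cache")
    return steps


store = set()                        # stands in for the remote cache storage
first = run_job("deps-v1", store)    # cold start: nothing cached yet
second = run_job("deps-v1", store)   # warm start: the first run uploaded the cache
read_only = run_job("deps-v1", store, policy="pull")

print(first)
print(second)
print(read_only)
```

The first run misses and uploads, the second run hits, and the pull-only run hits without uploading.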

Cache Storage

GitLab stores caches in:

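  • GitLab.com hosted runners: a shared distributed cache in Google Cloud Storage
  • Self-managed runners: the runner's local filesystem by default, or object storage (S3, GCS, Azure) when a distributed cache is configured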

Important: Caches are NOT guaranteed. If the cache is unavailable, the job proceeds without it, so every job must still be able to install its dependencies from scratch.


Cache Key Strategies

The cache key determines when caches are shared or regenerated.

Static Keys (Simple)

Same cache for all branches/jobs:

cache:
  key: "global-cache"
  paths:
    - node_modules/

Pros: Maximum reuse
Cons: No invalidation when dependencies change

File-Based Keys

Automatic invalidation when the listed files change:

cache:
  key:
    files:
      - package-lock.json  # Regenerate when lockfile changes
  paths:
    - node_modules/

How it works:

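  1. GitLab computes a SHA checksum from the most recent commits that changed package-lock.json
  2. Uses that checksum as the cache key (after the prefix, if one is set)
  3. When the file changes in a new commit, the checksum changes and a new cache is created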
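
To make the invalidation behaviour concrete, here is a minimal Python sketch. It is a simplified model and not GitLab's implementation: it hashes file contents with SHA-1, whereas GitLab derives the checksum from the commits that last changed each listed file. The function `cache_key` and the sample lockfile strings are hypothetical.

```python
import hashlib


def cache_key(file_contents, prefix=""):
    """Model a file-based cache key: a digest of the tracked files, optionally prefixed."""
    digest = hashlib.sha1()
    for content in file_contents:
        digest.update(content.encode("utf-8"))
    checksum = digest.hexdigest()
    return f"{prefix}-{checksum}" if prefix else checksum


old_lock = '{"lodash": "4.17.20"}'
new_lock = '{"lodash": "4.17.21"}'

key_before = cache_key([old_lock], prefix="test")
key_repeat = cache_key([old_lock], prefix="test")
key_after = cache_key([new_lock], prefix="test")

print(key_before == key_repeat)  # unchanged lockfile: same key, cache is reused
print(key_before != key_after)   # changed lockfile: new key, new cache is created
```

The same idea explains why a prefix such as the job name splits one lockfile into several independent caches: the prefix is part of the key.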

Key with Prefix

Per-job or per-branch caching:

cache:
  key:
    files:
      - package-lock.json
    prefix: $CI_JOB_NAME  # Different cache per job
  paths:
    - node_modules/

Per-branch:

cache:
  key:
    files:
      - package-lock.json
    prefix: $CI_COMMIT_REF_SLUG  # Different cache per branch
  paths:
    - node_modules/

Composite Keys

Multiple files (key:files accepts at most two entries):

cache:
  key:
    files:
      - package-lock.json
      - .gitlab-ci.yml  # Invalidate when CI config changes
    prefix: $CI_JOB_NAME
  paths:
    - node_modules/

Cache Key Hierarchy

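Several caches, from specific to general (GitLab allows up to four per job):

cache:
  - key: "$CI_COMMIT_REF_SLUG-$CI_JOB_NAME"
    paths:
      - node_modules/
  - key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
  - key: "default"
    paths:
      - node_modules/

Note: each list entry is a separate cache. GitLab restores every entry whose key exists instead of stopping at the first match, so entries that share a path overwrite one another. For a first-match lookup (Branch+Job → Branch → Default), use fallback_keys (see Cache Fallback Keys).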


Cache Scope

Project-Scoped (Default)

Cache shared across all branches/jobs in the same project.

cache:
  key: "shared"
  paths:
    - node_modules/

Best for: Single-project workflows

Branch-Scoped

Separate cache per branch.

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - node_modules/

Best for: Long-lived feature branches with different dependencies

Job-Scoped

Separate cache per job.

cache:
  key:
    prefix: "$CI_JOB_NAME"
    files:
      - package-lock.json
  paths:
    - node_modules/

Best for: Jobs with different dependency sets


Cache Fallback Keys

GitLab 16.0+: Use fallback keys for better cache reuse.

Basic Fallback

cache:
  - key: "cache-$CI_COMMIT_REF_SLUG"
    fallback_keys:
      - "cache-$CI_DEFAULT_BRANCH"  # Try main branch
      - "cache-default"             # Last resort
    paths:
      - node_modules/

How it works:

  1. Try branch-specific cache
  2. If not found, try main branch cache
  3. If not found, try default cache
  4. If none exist, proceed without cache
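
The lookup order above amounts to a first-match search. The following Python sketch models it under stated assumptions: `resolve_cache` is a hypothetical helper, and the `stored` set stands in for the keys that currently exist in cache storage.

```python
def resolve_cache(primary_key, fallback_keys, stored_keys):
    """Return the first key that exists in storage, trying the primary key first."""
    for key in [primary_key, *fallback_keys]:
        if key in stored_keys:
            return key
    return None  # no cache found: the job proceeds without one


stored = {"cache-main", "cache-default"}

# A new branch has no cache of its own yet, so the main branch cache is used.
hit = resolve_cache("cache-feature-login", ["cache-main", "cache-default"], stored)

# With empty storage every lookup fails and the job starts cold.
miss = resolve_cache("cache-feature-login", ["cache-main", "cache-default"], set())

print(hit)
print(miss)
```

Ordering the fallback keys from most to least specific keeps the restored cache as close as possible to what the branch actually needs.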

Advanced Fallback Chain

cache:
  - key:
      files:
        - package-lock.json
      prefix: "$CI_COMMIT_REF_SLUG"
    fallback_keys:
      # fallback_keys accepts plain strings only, not files-based keys
      # Any cache from main branch
      - "$CI_DEFAULT_BRANCH-default"
      # Global fallback
      - "global-cache"
    paths:
      - node_modules/

Benefits:

  • New branches inherit cache from main
  • Reduces initial build time on new branches
  • Graceful degradation

Cache Policy

Controls when cache is downloaded/uploaded.

pull-push (Default)

Download before job, upload after job:

cache:
  policy: pull-push  # Default

Use for: Jobs that modify dependencies (install, update)

Cost: 2x cache operations per job

pull

Download only, don't upload:

cache:
  policy: pull

Use for: Jobs that only read dependencies (test, lint)

Benefit: 50% cache operation reduction

push

Upload only, don't download:

cache:
  policy: push

Use for: Initial setup jobs that create cache

Combined Strategy

# Job that installs dependencies
install:
  stage: .pre
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push  # Create/update cache
  script:
    - npm ci --prefer-offline

# Jobs that use dependencies
test:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull  # Only download
  script:
    - npm test

lint:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull  # Only download
  script:
    - npm run lint

Savings: 40-60% fewer cache upload operations in a typical pipeline; the exact figure depends on how many jobs only read the cache.
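
As a rough check on where the savings come from, this Python sketch counts cache operations for the three-job pipeline above. It is a back-of-the-envelope model: the table `OPS_PER_POLICY` and the function `count_cache_ops` are hypothetical, and each policy is assumed to cost exactly one download and/or one upload per job.

```python
# (downloads, uploads) performed by one job under each cache policy
OPS_PER_POLICY = {"pull-push": (1, 1), "pull": (1, 0), "push": (0, 1)}


def count_cache_ops(policies):
    """Sum the cache downloads and uploads for a list of job policies."""
    downloads = sum(OPS_PER_POLICY[p][0] for p in policies)
    uploads = sum(OPS_PER_POLICY[p][1] for p in policies)
    return downloads, uploads


# Every job left on the default policy
baseline = count_cache_ops(["pull-push", "pull-push", "pull-push"])

# install uses pull-push; test and lint use pull
combined = count_cache_ops(["pull-push", "pull", "pull"])

print(baseline)  # (3, 3)
print(combined)  # (3, 1)
```

For this particular pipeline, uploads drop from three to one while downloads stay the same. Pipelines with a different mix of read-only and write jobs will see a different ratio.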


Dependency Caching by Language

Node.js / npm

Basic:

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

Advanced (with npm cache):

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
    - .npm/  # npm cache directory

before_script:
  - npm ci --prefer-offline --no-audit

Yarn:

cache:
  key:
    files:
      - yarn.lock
  paths:
    - node_modules/
    - .yarn/cache/

before_script:
  - yarn install --frozen-lockfile --cache-folder .yarn/cache

Python / pip

Basic:

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key:
    files:
      - requirements.txt
  paths:
    - .cache/pip/
    - venv/

before_script:
  - python -m venv venv
  - source venv/bin/activate
  - pip install -r requirements.txt

Poetry:

variables:
  POETRY_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pypoetry"

cache:
  key:
    files:
      - poetry.lock
    prefix: "$CI_JOB_NAME"
  paths:
    - .venv/
    - .cache/pypoetry/

before_script:
  - poetry config virtualenvs.in-project true
  - poetry install --no-root

Ruby / Bundler

variables:
  BUNDLE_PATH: "$CI_PROJECT_DIR/vendor/bundle"

cache:
  key:
    files:
      - Gemfile.lock
  paths:
    - vendor/bundle/

before_script:
  # BUNDLE_PATH already sets the install location; the --path flag is deprecated in Bundler 2
  - bundle install --jobs $(nproc)

Go

variables:
  GOPATH: "$CI_PROJECT_DIR/.go"

cache:
  key:
    files:
      - go.sum
  paths:
    - .go/pkg/mod/

before_script:
  - go mod download

Rust / Cargo

variables:
  CARGO_HOME: "$CI_PROJECT_DIR/.cargo"

cache:
  key:
    files:
      - Cargo.lock
  paths:
    - .cargo/
    - target/

before_script:
  - cargo fetch

Docker Layer Caching

Docker layer caching is separate from the GitLab cache: layers are stored in and pulled from the Docker registry.

Problem

Building Docker images from scratch every time:

FROM node:20
COPY package*.json ./
RUN npm install  # Downloads packages every time
COPY . .
RUN npm run build

Solution 1: BuildKit Registry Cache

Enable BuildKit:

variables:
  DOCKER_BUILDKIT: 1

build:
  image: docker:24
  services:
    - docker:24-dind
  script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - |
      docker buildx create --use
      docker buildx build \
        --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:cache \
        --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:cache,mode=max \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --push \
        .

Benefits:

  • Reuses all layers across builds
  • mode=max stores intermediate layers
  • 40-70% faster builds

Solution 2: Multi-Stage Build Caching

build:
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    # Pull previous images for cache
    - docker pull $CI_REGISTRY_IMAGE:builder || true
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    # Build with cache. Under BuildKit, --cache-from only works on images
    # built with BUILDKIT_INLINE_CACHE=1, which embeds the cache metadata.
    - |
      docker build \
        --target builder \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from $CI_REGISTRY_IMAGE:builder \
        --tag $CI_REGISTRY_IMAGE:builder \
        .
    - |
      docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from $CI_REGISTRY_IMAGE:builder \
        --cache-from $CI_REGISTRY_IMAGE:latest \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --tag $CI_REGISTRY_IMAGE:latest \
        .
    - docker push $CI_REGISTRY_IMAGE:builder
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

Dockerfile:

FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
RUN npm run build

Solution 3: Kaniko

Google's Kaniko for caching without Docker daemon:

build:
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - |
      /kaniko/executor \
        --context $CI_PROJECT_DIR \
        --dockerfile $CI_PROJECT_DIR/Dockerfile \
        --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --cache=true \
        --cache-repo $CI_REGISTRY_IMAGE/cache

Benefits:

  • No Docker-in-Docker needed
  • Built-in layer caching
  • More efficient in GitLab

Build Artifact Caching

Incremental Builds

Cache build outputs between runs:

build:
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
      - .next/cache/  # Next.js build cache
      - dist/.cache/  # Custom build cache
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day

Webpack/Rollup Cache

cache:
  key:
    files:
      - package-lock.json
      - webpack.config.js
  paths:
    - node_modules/
    - .webpack-cache/

webpack.config.js:

const path = require('path');

module.exports = {
  cache: {
    type: 'filesystem',
    // Must match the .webpack-cache/ path listed under cache:paths
    cacheDirectory: path.resolve(__dirname, '.webpack-cache'),
  },
};

Cache Expiration and Cleanup

Automatic Expiration

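Caches are removed when they are:

  • Not updated for 14 days on GitLab.com hosted runners (on self-managed runners, retention depends on how the cache storage is configured)
  • Exceeding storage quota
  • Manually cleared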

Manual Cleanup

UI: Project → CI/CD → Pipelines → Clear runner caches

API:

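# Clear project cache
# Replace :id with the numeric project ID. Confirm this endpoint in the
# API reference for your GitLab version before relying on it.
curl --request POST \
  --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "https://gitlab.com/api/v4/projects/:id/jobs/cache"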

Versioned Caches

Force cache invalidation with version prefix:

cache:
  key:
    files:
      - package-lock.json
    prefix: "v2-$CI_JOB_NAME"  # Increment when needed
  paths:
    - node_modules/

When to bump version:

  • Major dependency changes
  • CI configuration changes
  • Cache corruption suspected

Troubleshooting Cache Issues

Cache Not Being Used

Symptoms: Jobs always download dependencies

Causes:

  1. Cache key changes every run
  2. Cache upload failed (quota, permissions)
  3. Cache storage unavailable

Debug:

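test:
  script:
    # There is no predefined variable holding the cache key; the resolved key
    # appears in the job log ("Checking cache for ...")
    - echo "ref=$CI_COMMIT_REF_SLUG job=$CI_JOB_NAME"
    - ls -la node_modules/ || echo "Cache miss"
    - npm ci
    - ls -la node_modules/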

Fix:

  • Use stable cache keys (file-based)
  • Check runner logs for upload errors
  • Verify cache storage configuration

Cache Corruption

Symptoms: Jobs fail with "module not found" errors

Fix:

# Clear cache and rebuild
cache:
  key:
    files:
      - package-lock.json
    prefix: "v2"  # Increment version
  paths:
    - node_modules/

Or manually clear: Project → CI/CD → Pipelines → Clear runner caches

Slow Cache Download

Symptoms: 5+ minutes to download cache

Causes:

  • Cache too large (>500 MB)
  • Network latency to cache storage

Fix:

  • Reduce cache size (exclude unnecessary files)
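  • Cache narrower paths or glob patterns; cache:paths has no exclusion syntax, so delete unwanted files in after_script, which runs before the cache is uploaded
  • Consider splitting into multiple caches

cache:
  - key:
      files:
        - package-lock.json
    paths:
      - node_modules/

after_script:
  - rm -rf node_modules/.cache/                     # Drop large subdirs
  - find node_modules -name "*.md" -type f -delete  # Drop docs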

Cache Quota Exceeded

Symptoms: Warning in job logs about cache quota

Fix:

  • Delete old caches
  • Reduce cache size
  • Use cache expiration
  • Contact admin to increase quota

Advanced Patterns

Monorepo Caching

Problem: Different services have different dependencies

# Shared cache config
.cache_template:
  cache:
    key:
      prefix: "$CI_JOB_NAME"
    paths:
      - $SERVICE_DIR/node_modules/
    policy: pull-push  # Each job has its own key, so it must also upload

# Service-specific jobs
# key:files does not expand CI/CD variables, so each job lists its lockfile literally
test:agent-mesh:
  extends: .cache_template
  variables:
    SERVICE_DIR: "services/agent-mesh"
  cache:
    key:
      files:
        - services/agent-mesh/package-lock.json
  script:
    - cd services/agent-mesh
    - npm test

test:agent-router:
  extends: .cache_template
  variables:
    SERVICE_DIR: "services/agent-router"
  cache:
    key:
      files:
        - services/agent-router/package-lock.json
  script:
    - cd services/agent-router
    - npm test

Matrix Builds with Caching

Different Node versions:

test:
  parallel:
    matrix:
      - NODE_VERSION: ["18", "20", "22"]
  image: node:${NODE_VERSION}
  cache:
    key:
      files:
        - package-lock.json
      prefix: "node-${NODE_VERSION}"
    paths:
      - node_modules/  # Without paths, nothing is cached
  script:
    - npm ci
    - npm test

Conditional Caching

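# Only update the cache on the default branch
# cache:policy accepts a CI/CD variable in GitLab 16.1 and later
build:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      variables:
        CACHE_POLICY: pull-push
    - if: $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
      variables:
        CACHE_POLICY: pull  # All other branches: pull only
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    fallback_keys:
      - "$CI_DEFAULT_BRANCH"  # Other branches read the default branch's cache
    paths:
      - node_modules/
    policy: $CACHE_POLICY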

Measuring Cache Effectiveness

Key Metrics

Cache Hit Rate:

Cache Hit Rate = (Jobs with cache / Total jobs) × 100%

Target: >80%

Time Savings:

Time Saved = (Avg time without cache - Avg time with cache) × Job count

Cost Savings:

Cost Saved = Time Saved × Cost Factor × ($10 / 1000 minutes)
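
The three formulas can be combined into a short calculation. The Python sketch below is illustrative: the function names are hypothetical, times are assumed to be in minutes, and the sample figures (1000 jobs, 850 cache hits, 6.0 vs 3.5 minutes, cost factor 1.0) are invented inputs rather than measurements.

```python
def cache_hit_rate(jobs_with_cache, total_jobs):
    """Cache hit rate as a percentage."""
    if total_jobs == 0:
        return 0.0
    return 100 * jobs_with_cache / total_jobs


def time_saved_minutes(avg_without_cache, avg_with_cache, job_count):
    """Total minutes saved across all jobs."""
    return (avg_without_cache - avg_with_cache) * job_count


def cost_saved_dollars(time_saved, cost_factor, price_per_1000_minutes=10.0):
    """Dollar savings at the given price per 1000 compute minutes."""
    return time_saved * cost_factor * price_per_1000_minutes / 1000


hit_rate = cache_hit_rate(jobs_with_cache=850, total_jobs=1000)
saved = time_saved_minutes(avg_without_cache=6.0, avg_with_cache=3.5, job_count=1000)
cost = cost_saved_dollars(saved, cost_factor=1.0)

print(hit_rate)  # 85.0
print(saved)     # 2500.0
print(cost)      # 25.0
```

With these inputs the hit rate clears the 80% target, and the cache saves 2500 compute minutes.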

Monitoring

Add cache hit detection:

test:
  before_script:
    - |
      if [ -d "node_modules" ]; then
        echo "Cache hit"
      else
        echo "Cache miss"
      fi
    - npm ci --prefer-offline

Track in CI/CD variables:

test:
  script:
    - |
      if [ -d "node_modules" ]; then
        export CACHE_HIT=1
      else
        export CACHE_HIT=0
      fi
    - echo "CACHE_HIT=$CACHE_HIT" >> metrics.env
  artifacts:
    reports:
      dotenv: metrics.env  # dotenv reports are declared under artifacts:reports

Best Practices Summary

  1. Use file-based cache keys (package-lock.json, poetry.lock)
  2. Add fallback keys for better reuse across branches
  3. Use pull-only policy for read-only jobs
  4. Cache both dependencies and package manager cache (.npm, .cache/pip)
  5. Enable Docker layer caching for build jobs
  6. Version your cache keys when forcing invalidation
  7. Monitor cache hit rate (target >80%)
  8. Keep cache size reasonable (<500 MB)
  9. Use separate caches for different dependency sets
  10. Clear cache when corrupted

Example: Complete Caching Setup

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"
  DOCKER_BUILDKIT: 1

# Default cache configuration
default:
  cache:
    - key:
        files:
          - package-lock.json  # Same key as the install job, so pulls find its cache
      fallback_keys:           # Plain strings only
        - "default-cache"
      paths:
        - node_modules/
        - .npm/
      policy: pull

# Install job creates cache
install:
  stage: .pre
  cache:
    - key:
        files:
          - package-lock.json
      paths:
        - node_modules/
        - .npm/
      policy: pull-push
  script:
    - npm ci --prefer-offline --no-audit
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 hour

# All other jobs use cache (pull-only from default)
lint:
  script:
    - npm run lint

test:
  script:
    - npm test

# Docker build with layer caching
build:docker:
  image: docker:24
  services:
    - docker:24-dind
  cache: []  # No npm cache needed
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - |
      docker buildx create --use
      docker buildx build \
        --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:cache \
        --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:cache,mode=max \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --push \
        .

Next Steps