Fixing Time-to-First-PR Regressions After a Dependency Upgrade
A dependency or toolchain upgrade slowed onboarding. Measure the setup-time regression, bisect the cause, fix it with lockfile and cache changes, and gate it in CI.
New hires used to be productive in twenty minutes; after a dependency or toolchain upgrade, first-run setup now takes an hour — a measurable time-to-first-PR regression you can attribute to a specific commit.
Diagnostic
Time a cold bootstrap (no caches) and compare it against the pre-upgrade baseline.
#!/usr/bin/env bash
set -euo pipefail
# cold-bootstrap-timer.sh — measure setup from a pristine state.
docker compose down -v >/dev/null 2>&1 || true
docker builder prune -af >/dev/null 2>&1 || true
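rm -rf node_modules
npm cache clean --force >/dev/null 2>&1 || true  # also drop npm's download cache (assumes npm, as used elsewhere here)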
start=$(date +%s)
make bootstrap >/dev/null
end=$(date +%s)
echo "cold_bootstrap_seconds=$((end - start))"
Expected BAD output — the cold setup time has roughly tripled versus the recorded baseline:
cold_bootstrap_seconds=3120
# baseline (pre-upgrade): cold_bootstrap_seconds=1080
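To turn the two numbers into a single figure you can track, a small helper can compute the slowdown ratio and check it against a tolerance. This is a minimal sketch: the file name compare-to-baseline.sh and the functions regression_ratio and within_tolerance are hypothetical, and the 10% tolerance mentioned in the usage note is an arbitrary assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail
# compare-to-baseline.sh (hypothetical helper): quantify a cold-setup slowdown.

# Print current/baseline as a ratio with two decimals.
regression_ratio() {
  local baseline=$1 current=$2
  awk -v b="$baseline" -v c="$current" 'BEGIN { printf "%.2f\n", c / b }'
}

# Succeed only when current is at most pct percent above baseline.
# Integer arithmetic only, so all arguments must be whole numbers.
within_tolerance() {
  local baseline=$1 current=$2 pct=$3
  [ $(( current * 100 )) -le $(( baseline * (100 + pct) )) ]
}

# Values from the diagnostic output above.
regression_ratio 1080 3120   # prints 2.89
```

With the values above, within_tolerance 1080 3120 10 fails, while a later timing of 1015 seconds would pass.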
Root cause
A dependency or toolchain upgrade slows onboarding when it changes how the dependency graph resolves or how caches are keyed. Common triggers:
- A major version bump pulls in a transitive dependency that now compiles a native module from source (minutes of node-gyp/cffi build per cold install).
- Lockfile churn invalidates every entry, so the package manager re-resolves the whole tree instead of replaying the lock.
- A base-image or tool bump changes a cache key (hashFiles('**/package-lock.json'), a .tool-versions line, a Dockerfile layer), so every Docker layer and CI cache misses and rebuilds.
The regression is invisible on a warm machine, because the author's caches are already populated. It only appears on a cold clone, which is exactly the new-hire path. Bisecting cold-bootstrap time across the suspect commit range pinpoints the offending change.
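One way to spot the native-build trigger is to list the lockfile entries that run install scripts, since those are the packages that may compile on a cold install. The sketch below assumes an npm lockfile (lockfileVersion 2 or 3) in npm's default two-space pretty-printed layout, where such entries carry "hasInstallScript": true. The file and function names are hypothetical, and the awk parsing is deliberately naive rather than a real JSON parser.

```shell
#!/usr/bin/env bash
set -euo pipefail
# list-install-scripts.sh (hypothetical helper)

# Print the path of every package in the given lockfile whose entry
# contains "hasInstallScript": true.
# Assumes package entries open on a line indented by exactly four spaces,
# as in npm's default lockfile formatting.
packages_with_install_scripts() {
  awk '
    /^    "[^"]*": [{]$/ { split($0, parts, "\""); pkg = parts[2] }
    /"hasInstallScript": true/ { print pkg }
  ' "$1"
}
```

Running it against the lockfile from a good commit and from a bad commit, then diffing the two lists, shows which packages newly gained an install step.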
Resolution
- Bisect the commit range with the cold-bootstrap timer as the test.
- Inspect what the bad commit changed in the lockfile or cache key.
- Restore deterministic resolution (commit the lockfile, pin the native dep to a prebuilt) and stable cache keys.
- Re-time a cold bootstrap to confirm the regression is gone.
Use git bisect run with the timer wrapped to fail above a threshold:
#!/usr/bin/env bash
set -euo pipefail
# bisect-test.sh — exit non-zero when cold setup exceeds the SLA (seconds).
SLA=1500
secs=$(./cold-bootstrap-timer.sh | sed 's/cold_bootstrap_seconds=//')
[ "$secs" -le "$SLA" ]
#!/usr/bin/env bash
set -euo pipefail
git bisect start HEAD <last-good-tag>
git bisect run ./bisect-test.sh # prints the first commit that blew the SLA
git bisect reset
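git bisect run treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. A commit in the range whose bootstrap fails outright would therefore be counted as bad by bisect-test.sh, even though it says nothing about setup time. The sketch below shows one possible way to separate the two cases; bisect_verdict is a hypothetical helper, not part of the scripts above.

```shell
#!/usr/bin/env bash
set -euo pipefail
# bisect-verdict.sh (hypothetical helper)

# Map a timer result to a git bisect run exit code:
#   0 = good, 1 = bad, 125 = skip (commit cannot be tested).
bisect_verdict() {
  local secs=$1 sla=$2
  case "$secs" in
    ''|*[!0-9]*) return 125 ;;   # no usable timing: skip this commit
  esac
  if [ "$secs" -le "$sla" ]; then
    return 0
  else
    return 1
  fi
}

# Possible use inside bisect-test.sh (assumption: an empty timing means
# the bootstrap itself failed):
#   secs=$(./cold-bootstrap-timer.sh | sed 's/cold_bootstrap_seconds=//' || true)
#   bisect_verdict "$secs" "$SLA"
```

Skipped commits can leave bisect with a range of candidates instead of a single commit, so this trades precision for not blaming an unrelated build break.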
Once found, restore deterministic install and a prebuilt binary so cold installs stop compiling:
In package.json, pin a version that ships a prebuilt arm64/x86_64 binary and keep the lock authoritative:
{
  "dependencies": {
    "sharp": "0.33.4"
  },
  "overrides": {
    "node-gyp": "10.1.0"
  }
}
Stabilize the Docker/CI cache key so a tool bump no longer busts every layer:
# .github/workflows/ci.yml: key on the lockfile only, restore on partial match
steps:
  - uses: actions/cache@v4
    with:
      path: |
        ~/.npm
        node_modules
      key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-
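To check locally that a key built this way stays put when only tooling files change, you can approximate it with a plain digest of the lockfile. This is a sketch under stated assumptions: local_cache_key is a hypothetical helper, and its output is not the value hashFiles() produces. It only shares the property that matters here, which is that it depends on the OS name and the lockfile content and nothing else.

```shell
#!/usr/bin/env bash
set -euo pipefail
# cache-key.sh (hypothetical helper): approximate a lockfile-derived key locally.

# Print "<os>-node-<sha256 of the lockfile>".
local_cache_key() {
  local lockfile=$1 digest
  if command -v sha256sum >/dev/null 2>&1; then
    digest=$(sha256sum "$lockfile" | cut -d' ' -f1)
  else
    digest=$(shasum -a 256 "$lockfile" | cut -d' ' -f1)  # fallback where sha256sum is absent (e.g. macOS)
  fi
  printf '%s-node-%s\n' "$(uname -s)" "$digest"
}
```

Compute the key on the commit before and after a tool bump: if the two values match, the bump did not touch the lockfile and a key of this shape would still hit.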
Expected output
A cold bootstrap after the fix returns to (or below) the baseline:
$ ./cold-bootstrap-timer.sh
cold_bootstrap_seconds=1015
The bisect run names the exact regressing commit, making the cause auditable:
$ git bisect run ./bisect-test.sh
4f2a9c1 is the first bad commit
chore: bump image-lib to 5.x (drops prebuilt binaries)
Prevention
- Add a CI job that runs cold-bootstrap-timer.sh and fails when setup exceeds an SLA, so a future upgrade that slows onboarding is blocked at PR time.
- Commit lockfiles and prefer dependencies that ship prebuilt binaries for both architectures; coordinate with debugging works-on-my-machine runtime drift.
- Watch the dependency graph for newly introduced heavy transitive deps with mapping microservice dependencies for local dev.
# .github/workflows/onboarding-sla.yml
name: Onboarding Time Gate
on: [pull_request]
jobs:
  cold-setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Time a cold bootstrap
        run: |
          chmod +x ./cold-bootstrap-timer.sh ./bisect-test.sh
          ./bisect-test.sh
- macOS (Docker Desktop): cold timings include VirtioFS sync overhead; record the baseline on the same OS you compare against, and never mix host platforms.
- WSL2: measure on the Linux filesystem; node_modules extraction over /mnt/c inflates cold setup several-fold and pollutes the bisect signal.
- Apple Silicon (ARM64): a dep that dropped its arm64 prebuilt forces compilation locally but not on x86_64 CI; bisect on the developer architecture, not the runner's, to see the real regression.
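The platform caveats above can be partly enforced before a timing is recorded. The sketch below is illustrative only: is_windows_mount and platform_label are hypothetical helpers, and the /mnt/<letter> pattern assumes WSL2's default drive mount points.

```shell
#!/usr/bin/env bash
set -euo pipefail
# platform-guard.sh (hypothetical helper)

# Succeed when a path sits on a Windows drive mounted into WSL2
# (/mnt/<letter> or anything below it).
is_windows_mount() {
  case "$1" in
    /mnt/[a-z]|/mnt/[a-z]/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a label to store next to each timing, so baselines are only
# compared with runs from the same OS and architecture.
platform_label() {
  printf 'os=%s arch=%s\n' "$(uname -s)" "$(uname -m)"
}
```

A timer could call is_windows_mount "$PWD" first and refuse to run on a match, then print platform_label next to cold_bootstrap_seconds.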
Rollback
#!/usr/bin/env bash
set -euo pipefail
git revert <bad-commit> && npm ci # revert the upgrade and restore the prior lockfile state