Visual regression pixel-diff tool — parked sketch (2026-05-15)
DARE.CO.UK · PARKED SKETCH · 2026-05-18
Mirrored from ~/.claude/.../memory/project_visual_regression_pixel_diff_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
Automated pixel-level comparison of staging and production (or canonical-page and new-page) screenshots, surfacing variance % per viewport. Parked because the canonical-header rollout reached roughly 96% visual parity by eyeballing; the remaining ~4% is below the human-noticing threshold but worth catching automatically when stakes rise.
Sketched 2026-05-15 at the close of the canonical-header rollout when Dan flagged residual ~4% pixel-level differences between methods-of-business-design (gold standard) and the section indexes:
“4% difference is nit-picking, and the subject of a script that can detect such variances.”
The ~4% gap is below the human-noticing threshold: the eye toggles between pages and senses a “shift” without being able to name it. Automation closes that gap.
What it would do
For a list of (canonical page, comparison page) pairs at multiple viewports:
1. Take screenshots via headless browser (Playwright in Python)
2. Diff each pair pixel-by-pixel via pixelmatch (or similar)
3. Compute variance % + surface a heatmap visualisation
4. Output ~/Downloads/dare_visual_regression_<DATE>.md (+ HTML via publish pipeline)
5. Per-pair: pass/fail against a threshold (e.g., <0.5% variance = pass)
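Steps 2, 3 and 5 can be sketched in pure Python. This is a stand-in for pixelmatch, not a port of it: it uses a plain per-channel tolerance, which is cruder than a perceptual comparison. It assumes both screenshots are already decoded into equal-sized RGBA byte buffers. The names `DiffResult` and `diff_rgba`, the tolerance of 16, and the default threshold are illustrative assumptions (the 0.5% figure echoes the example in step 5).

```python
from dataclasses import dataclass


@dataclass
class DiffResult:
    variance_pct: float   # share of pixels that differ, 0-100
    mask: list[bool]      # one flag per pixel, True where the pair differs
    passed: bool          # True when variance is under the threshold


def diff_rgba(a: bytes, b: bytes, tolerance: int = 16,
              threshold_pct: float = 0.5) -> DiffResult:
    """Compare two equal-sized RGBA buffers pixel by pixel (hypothetical helper)."""
    if len(a) != len(b) or len(a) % 4:
        raise ValueError("buffers must be RGBA and the same size")
    mask = []
    for i in range(0, len(a), 4):
        # A pixel counts as changed if any channel moves by more than `tolerance`.
        changed = any(abs(a[i + c] - b[i + c]) > tolerance for c in range(4))
        mask.append(changed)
    variance = 100.0 * sum(mask) / len(mask) if mask else 0.0
    return DiffResult(variance, mask, variance < threshold_pct)
```

The `mask` is the raw material for a heatmap: a real run would paint the flagged pixels onto a copy of the canonical screenshot before uploading it.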
Naming: ~/bin/dare_pixel_diff_audit.py or ~/bin/dare_visual_regression.py.
When this becomes useful (resume conditions)
- Third visual incident in a single rollout — today the rollout hit two (font-family inheritance, background-tone). Third would suggest a systematic gap eyeballing misses.
- Client engagement at scale: when a paid client engagement needs guaranteed visual consistency across hundreds of pages, the audit becomes a contractual artefact.
- Cross-portfolio rollouts — when applying a canonical template to dogwood / audrey / clients. The pattern transfers; the diff matters per-brand.
- Pre-production gate — once it exists, add to CI as a fast-fail before main-branch promotion.
Approach sketch
Stack:
- Playwright (Python) — same library Claude Code can drive via MCP
- pixelmatch (npm package; callable from Python via subprocess, or through one of the existing Python ports)
- markdown reporting via existing publish pipeline
- screenshot artefacts → R2 grabs bucket under grabs/internal/dare/<date>/regression/ (durable evidence)
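The Playwright half of this stack might look like the sketch below, using the synchronous Playwright API for Python. The function names, the file-naming scheme, and the choice of `networkidle` as the wait condition are assumptions for illustration, not decisions recorded in this note.

```python
from pathlib import Path


def shot_path(out_dir: Path, label: str, width: int, height: int) -> Path:
    """Hypothetical naming scheme: <label>_<width>x<height>.png."""
    return out_dir / f"{label}_{width}x{height}.png"


def capture(url: str, label: str, viewports: list[tuple[int, int]],
            out_dir: Path) -> list[Path]:
    """Save one full-page screenshot of `url` per viewport."""
    # Imported lazily so shot_path() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        for width, height in viewports:
            page = browser.new_page(viewport={"width": width, "height": height})
            page.goto(url, wait_until="networkidle")
            path = shot_path(out_dir, label, width, height)
            page.screenshot(path=str(path), full_page=True)
            page.close()
            paths.append(path)
        browser.close()
    return paths
```

Full-page captures of two different pages can differ in height, so a real run would need to crop or pad each pair to a common size before diffing.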
Config:
[config block not preserved in this mirror]
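A hypothetical shape for such a config, expressed as Python dataclasses. Every value here (the example.com URLs, the viewport sizes, the class and field names) is a placeholder invented for illustration; only the 0.5% threshold echoes a figure from this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    name: str
    canonical_url: str    # the gold-standard page
    comparison_url: str   # the page being checked against it


@dataclass(frozen=True)
class AuditConfig:
    pairs: tuple[Pair, ...]
    viewports: tuple[tuple[int, int], ...] = ((390, 844), (1440, 900))
    threshold_pct: float = 0.5   # echoes the "<0.5% variance = pass" example

    def jobs(self):
        """Yield one (pair, viewport) job per combination."""
        for pair in self.pairs:
            for viewport in self.viewports:
                yield pair, viewport


# Placeholder values only; example.com stands in for the real pages.
CONFIG = AuditConfig(pairs=(
    Pair("section-index",
         "https://example.com/canonical/",
         "https://example.com/section/"),
))
```

Keeping pairs and viewports as separate lists means adding a viewport re-checks every pair without touching the pair definitions.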
Outputs:
- Markdown table per pair: viewport, variance %, pass/fail, link to heatmap image
- Heatmaps uploaded to grabs/ for in-report embedding
- Exit code: 0 if all pass, 1 if any fail (for CI gating)
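The table and exit code could be produced together, as in this minimal sketch. The `render_report` name, the row keys (`pair`, `viewport`, `variance_pct`, `heatmap_url`) and the column order are assumptions; the 0/1 exit-code convention is the one listed above.

```python
def render_report(rows: list[dict], threshold_pct: float = 0.5) -> tuple[str, int]:
    """Return (markdown_table, exit_code) for a list of per-pair results."""
    lines = [
        "| Pair | Viewport | Variance % | Result | Heatmap |",
        "| --- | --- | --- | --- | --- |",
    ]
    any_fail = False
    for row in rows:
        passed = row["variance_pct"] < threshold_pct
        any_fail = any_fail or not passed
        lines.append(
            f"| {row['pair']} | {row['viewport']} | {row['variance_pct']:.2f} "
            f"| {'pass' if passed else 'fail'} | [heatmap]({row['heatmap_url']}) |"
        )
    # Exit code 1 if any pair fails, so CI can gate on the result.
    return "\n".join(lines), (1 if any_fail else 0)
```

A caller would write the table into the Markdown report and hand the second value to `sys.exit`.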
What’s already in place
- Playwright MCP available in this session — proven to work
- R2 grabs bucket (built earlier today) — heatmap evidence has a durable home
- publish pipeline + REPORT_PATTERNS: adding dare_visual_regression_*.html is one line
Linked memories
- project_grabs_bucket_pipeline_built: heatmap evidence destination
- feedback_intelligence_framework: third-occurrence build rule
- feedback_park_with_resume_conditions: the discipline this park honours
- feedback_automation_maturity_ladder: would start at v1 (ad-hoc local run), graduate to v2 (programmatic + weekly review), then v3 (CI gate)
The aphorism
Eyeballing scales to ten pages; pixel-diffing scales to a thousand. Until the corpus crosses that threshold, eyeballing is the right tool.