Visual regression pixel-diff tool — parked sketch (2026-05-15)

DARE.CO.UK · PARKED SKETCH · 2026-05-18

Mirrored from ~/.claude/.../memory/project_visual_regression_pixel_diff_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

Automated pixel-level comparison of staging and production (or canonical-page and new-page) screenshots, surfacing variance % per viewport. Parked because the canonical-header rollout reached 96% visual parity by eyeballing; the remaining 4% is below the human-noticing threshold but worth catching automatically when stakes rise.

Sketched 2026-05-15 at the close of the canonical-header rollout when Dan flagged residual ~4% pixel-level differences between methods-of-business-design (gold standard) and the section indexes:

“4% difference is nit-picking, and the subject of a script that can detect such variances.”

The 4% gap is below the human-noticing threshold (the eye toggles between pages and senses a “shift” without being able to name it). Automation closes that gap.

What it would do

For a list of (canonical page, comparison page) pairs at multiple viewports:
1. Take screenshots via headless browser (Playwright in Python)
2. Diff each pair pixel-by-pixel via pixelmatch (or similar)
3. Compute variance % + surface a heatmap visualisation
4. Output ~/Downloads/dare_visual_regression_<DATE>.md (+ HTML via publish pipeline)
5. Per-pair: pass/fail against a threshold (e.g., <0.5% variance = pass)
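Steps 2, 3 and 5 can be illustrated without any imaging library. The sketch below is a simple per-channel comparison standing in for pixelmatch, not pixelmatch's own algorithm; the function names, the nested-list image representation, and the `tolerance` parameter are assumptions made for illustration. The returned mask is the raw material a heatmap would be drawn from.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0-255
Image = List[List[Pixel]]      # rows of pixels

def variance_percent(a: Image, b: Image, tolerance: int = 0) -> Tuple[float, List[List[bool]]]:
    """Return (% of pixels that differ, boolean mask marking the differing pixels)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    mask: List[List[bool]] = []
    differing = 0
    total = 0
    for row_a, row_b in zip(a, b):
        mask_row = []
        for pa, pb in zip(row_a, row_b):
            # A pixel counts as changed if any channel moves by more than `tolerance`.
            changed = any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb))
            mask_row.append(changed)
            differing += changed
            total += 1
        mask.append(mask_row)
    if total == 0:
        raise ValueError("images are empty")
    return 100.0 * differing / total, mask

def passes(variance: float, threshold: float = 0.5) -> bool:
    """Pass rule from the sketch: strictly less than the threshold (default 0.5%)."""
    return variance < threshold
```

In a real run the two grids would come from decoded screenshots of identical size; full-page captures of different heights would need cropping or padding first, which this sketch deliberately rejects rather than guesses at.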

Naming: ~/bin/dare_pixel_diff_audit.py or ~/bin/dare_visual_regression.py.

When this becomes useful (resume conditions)

Third visual incident in a single rollout: today the rollout hit two (font-family inheritance, background-tone). A third would suggest a systematic gap that eyeballing misses.
Client engagement at scale: when a paid client engagement needs guaranteed visual consistency across hundreds of pages. The audit becomes a contractual artefact.
Cross-portfolio rollouts — when applying a canonical template to dogwood / audrey / clients. The pattern transfers; the diff matters per-brand.
Pre-production gate — once it exists, add to CI as a fast-fail before main-branch promotion.

Approach sketch

Stack:
- Playwright (Python): same library Claude Code can drive via MCP
- pixelmatch (npm package, callable from Python via subprocess; Python ports also exist)
- markdown reporting via existing publish pipeline
- screenshot artefacts → R2 grabs bucket under grabs/internal/dare/<date>/regression/ (durable evidence)
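As a rough illustration of the capture side of this stack, the sketch below uses Playwright's synchronous Python API to take one full-page screenshot at a given viewport. The helper names, the file-naming scheme, the ISO date format in the key, and the `networkidle` wait are assumptions, not decisions recorded in this note; only the key prefix comes from the stack list above.

```python
from pathlib import Path

def screenshot_path(out_dir: Path, slug: str, width: int, height: int) -> Path:
    """Local filename for one capture (hypothetical naming scheme)."""
    return out_dir / f"{slug}_{width}x{height}.png"

def evidence_key(date: str, filename: str) -> str:
    """Object key under the prefix named in the stack list; `date` assumed ISO (YYYY-MM-DD)."""
    return f"grabs/internal/dare/{date}/regression/{filename}"

def capture(url: str, dest: Path, width: int, height: int) -> Path:
    """Screenshot `url` at the given viewport size and save it to `dest`."""
    # Imported lazily so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    dest.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page(viewport={"width": width, "height": height})
            page.goto(url, wait_until="networkidle")
            page.screenshot(path=str(dest), full_page=True)
        finally:
            browser.close()
    return dest
```

Launching a browser per capture keeps the sketch short; a real audit over many pairs would more likely reuse one browser and open a fresh page per viewport.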

Config:

(Config block not preserved in this copy.)
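The sketch names three things a config would have to carry: page pairs, viewports, and a pass threshold. One hypothetical shape follows, with every class name, field name, viewport size, and URL invented for illustration rather than taken from the original config:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Viewport:
    name: str
    width: int
    height: int

@dataclass(frozen=True)
class PagePair:
    label: str
    canonical_url: str    # the gold-standard page
    comparison_url: str   # the page being checked against it

@dataclass
class AuditConfig:
    pairs: List[PagePair]
    viewports: List[Viewport]
    threshold_percent: float = 0.5   # example threshold mentioned in the sketch

    def jobs(self) -> List[Tuple[PagePair, Viewport]]:
        """Every (pair, viewport) combination the audit would screenshot and diff."""
        return [(pair, vp) for pair in self.pairs for vp in self.viewports]

# Placeholder values only: example.com URLs and viewport sizes are hypothetical.
CONFIG = AuditConfig(
    pairs=[
        PagePair("section-index", "https://example.com/canonical", "https://example.com/section"),
    ],
    viewports=[Viewport("mobile", 390, 844), Viewport("desktop", 1440, 900)],
)
```

Keeping the threshold in the config rather than in code would let a stricter value be set for a client audit without touching the tool.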

Outputs:
- Markdown table per pair: viewport, variance %, pass/fail, link to heatmap image
- Heatmaps uploaded to grabs/ for in-report embedding
- Exit code: 0 if all pass, 1 if any fail (for CI gating)
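The reporting and gating outputs could look roughly like the following. This is a minimal sketch: the `PairResult` record, the column layout, and the function names are assumptions, and the 0.5% default simply reuses the example threshold given earlier.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PairResult:
    pair: str
    viewport: str
    variance_percent: float
    heatmap_url: str

def is_pass(result: PairResult, threshold: float = 0.5) -> bool:
    return result.variance_percent < threshold

def render_table(results: List[PairResult], threshold: float = 0.5) -> str:
    """Markdown table with one row per (pair, viewport) result."""
    lines = [
        "| Pair | Viewport | Variance % | Result | Heatmap |",
        "| --- | --- | --- | --- | --- |",
    ]
    for r in results:
        verdict = "pass" if is_pass(r, threshold) else "FAIL"
        lines.append(
            f"| {r.pair} | {r.viewport} | {r.variance_percent:.2f} "
            f"| {verdict} | [heatmap]({r.heatmap_url}) |"
        )
    return "\n".join(lines)

def exit_code(results: List[PairResult], threshold: float = 0.5) -> int:
    """0 if every result passes, 1 if any fails, so CI can gate on it."""
    return 0 if all(is_pass(r, threshold) for r in results) else 1
```

A command-line entry point would write `render_table(...)` into the report file and finish with `sys.exit(exit_code(results))`.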

What’s already in place

Playwright MCP available in this session — proven to work
R2 grabs bucket (built earlier today) — heatmap evidence has a durable home
publish pipeline + REPORT_PATTERNS — adding dare_visual_regression_*.html is one line

Linked memories

project_grabs_bucket_pipeline_built — heatmap evidence destination
feedback_intelligence_framework — third-occurrence build rule
feedback_park_with_resume_conditions — the discipline this park honours
feedback_automation_maturity_ladder — would start at v1 (ad-hoc local run), graduate to v2 (programmatic + weekly review), then v3 (CI gate)

The aphorism

Eyeballing scales to ten pages; pixel-diffing scales to a thousand. Until the corpus crosses that threshold, eyeballing is the right tool.

Source: parked_sketch_visual_regression_pixel_diff_2026-05-15.md · Rendered 2026-05-18 12:53