A/B testing on Cloudflare Workers + GCP — parked sketch (2026-05-15)

DARE.CO.UK · PARKED SKETCH · 2026-05-18

Mirrored from ~/.claude/.../memory/project_ab_testing_cf_workers_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

Edge-A/B framework using CF Workers + KV stickiness + HTMLRewriter + Logpush→BigQuery. Parked because dare lacks the traffic to support hypothesis testing (~11k req/day); the resume condition is a client engagement at >100k req/day, where small-effect tests become tractable.


Sketched 2026-05-15 during the canonical-header rollout when the mobile-kicker question surfaced (“hide on mobile or keep?”). Dan’s framing:

“I would deliberate over the decision. Typically I would serve A/B tests to 1000 users and study the data… mostly this is about client projects, where I test it on dare.co.uk (but don’t have the traffic to support the assertions from the hypothesis).”

The senior shape: dare is the methodology testbed, but dare’s own traffic (~11k req/day per project_dare_traffic_baseline) doesn’t support statistically valid A/B tests for small UI tweaks. The framework matters for client projects (audrey commerce at scale, future paid engagements) where 100k+ daily sessions make small-effect detection feasible.

The framework (for resume)

Edge-A/B on CF Workers, fronting CF Pages:

  1. Worker intercepts every request at the dare/dogwood/audrey edge
  2. Read sticky-bucket cookie (or assign one on first visit; key = anonymous user ID)
  3. Use HTMLRewriter API to modify response HTML inline per bucket — e.g., strip <span class="kicker"> for the B group, or swap the wordmark text, or rewrite a CTA button
  4. Set cookie on response so the same visitor sees the same variant across sessions
  5. Emit a log line: {user_id, variant, path, ts} to CF Workers logging
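
A minimal sketch of the five steps above, not a built implementation. The cookie name `ab_uid`, the 30-day `Max-Age`, and the helpers `readCookie` and `bucketFor` are hypothetical. To stay self-contained it derives the bucket by hashing the visitor ID (FNV-1a) instead of looking it up in KV, and it assumes the log line can simply be written with `console.log` for Workers logging to pick up.

```typescript
// Hypothetical names: COOKIE_NAME, VARIANTS, readCookie, bucketFor.
const COOKIE_NAME = "ab_uid";
const VARIANTS = ["A", "B"] as const;
type Variant = (typeof VARIANTS)[number];

// HTMLRewriter is a global in the Workers runtime. This ambient declaration
// only lets the sketch type-check elsewhere; drop it when using Workers types.
declare const HTMLRewriter: any;

/** Return the value of one cookie from a Cookie header, or null if absent. */
function readCookie(header: string | null, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

/** Deterministic bucket from a visitor ID, using a 32-bit FNV-1a hash. */
function bucketFor(userId: string): Variant {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return VARIANTS[(hash >>> 0) % VARIANTS.length];
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    // Step 2: read the sticky ID, or mint one on first visit.
    const existing = readCookie(request.headers.get("Cookie"), COOKIE_NAME);
    const userId = existing ?? crypto.randomUUID();
    const variant = bucketFor(userId);

    // Step 1: the Worker sits in front of the origin and forwards the request.
    const origin = await fetch(request);
    const isHtml = (origin.headers.get("Content-Type") ?? "").includes("text/html");

    // Step 3: for the B group, strip <span class="kicker"> from HTML responses.
    const rewritten: Response =
      variant === "B" && isHtml
        ? new HTMLRewriter()
            .on("span.kicker", { element(el: any) { el.remove(); } })
            .transform(origin)
        : origin;

    // Step 4: copy the response so headers are mutable, then set the cookie.
    const response = new Response(rewritten.body, rewritten);
    if (!existing) {
      response.headers.append(
        "Set-Cookie",
        `${COOKIE_NAME}=${userId}; Path=/; Max-Age=2592000; Secure; HttpOnly; SameSite=Lax`,
      );
    }

    // Step 5: emit the assignment log line.
    console.log(JSON.stringify({
      user_id: userId,
      variant,
      path: new URL(request.url).pathname,
      ts: Date.now(),
    }));
    return response;
  },
};

export default worker;
```

Hashing the ID keeps assignment stateless, so the same visitor lands in the same bucket without a storage read. The KV route described next trades that for the ability to change or audit assignments.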

Stickiness via KV (or D1 if richer queries needed):

  - Key: anonymous user ID (hashed IP + UA, or first-visit-generated UUID)
  - Value: assigned variant string
  - TTL: experiment duration (e.g., 30 days)
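
The key/value/TTL layout above could look roughly like this. `KVLike` is a hypothetical interface covering only the `get` and `put` (with `expirationTtl`) calls of a Workers KV binding, and `getOrAssignVariant`, `MemoryKV`, and the `experiment:userId` key format are illustrative names, not part of the sketch as parked.

```typescript
/** Hypothetical minimal subset of a Workers KV binding: get, and put with a TTL. */
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// TTL = experiment duration; 30 days is the example figure from the note.
const EXPERIMENT_TTL_SECONDS = 30 * 24 * 60 * 60;

/**
 * Return the stored variant for this visitor, or assign one uniformly at
 * random and persist it for the lifetime of the experiment.
 * `pick` is injectable so the assignment can be made deterministic in tests.
 */
async function getOrAssignVariant(
  kv: KVLike,
  experiment: string,
  userId: string,
  variants: readonly string[],
  pick: () => number = Math.random,
): Promise<string> {
  const key = `${experiment}:${userId}`; // assumed key format
  const stored = await kv.get(key);
  if (stored !== null) return stored;

  const variant = variants[Math.floor(pick() * variants.length)];
  await kv.put(key, variant, { expirationTtl: EXPERIMENT_TTL_SECONDS });
  return variant;
}

/** In-memory stand-in for trying the logic locally; it ignores the TTL. */
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.has(key) ? (this.store.get(key) as string) : null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

One caveat worth weighing before building: KV is eventually consistent, so a visitor whose first two requests hit different edge locations in quick succession could briefly be assigned twice. Carrying the variant in the cookie as well would likely cover that gap.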

Analytics pipeline:

  - CF Logpush → Cloudflare R2 (or GCS) → BigQuery
  - BigQuery view joins variant-assignment logs with downstream-event logs (bounce, time-on-page, conversion)
  - Looker Studio dashboard for the funnel + significance test
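
The note does not say which significance test the dashboard would run. As one common choice, here is a hedged sketch of a two-proportion z-test over per-variant conversion counts. The `Arm` shape and function names are hypothetical, and the normal CDF uses the Abramowitz-Stegun 7.1.26 approximation rather than a statistics library.

```typescript
/** Per-variant totals, as they might come out of the joined BigQuery view. */
interface Arm {
  sessions: number;
  conversions: number;
}

/** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation. */
function normalCdf(x: number): number {
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-(x * x) / 2); // erf(|x| / sqrt(2))
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

/** Two-sided, pooled two-proportion z-test comparing arm B against arm A. */
function twoProportionZTest(a: Arm, b: Arm): { z: number; pValue: number } {
  const rateA = a.conversions / a.sessions;
  const rateB = b.conversions / b.sessions;
  const pooled = (a.conversions + b.conversions) / (a.sessions + b.sessions);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / a.sessions + 1 / b.sessions),
  );
  const z = (rateB - rateA) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { z, pValue };
}

// Made-up counts purely for illustration: 2.0% vs 2.5% on 10k sessions each.
const result = twoProportionZTest(
  { sessions: 10000, conversions: 200 },
  { sessions: 10000, conversions: 250 },
);
console.log(result.z.toFixed(2), result.pValue.toFixed(3));
```

The counts in the usage line are invented, not dare data. In practice the same calculation could live in the BigQuery view or a Looker Studio calculated field instead.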

GCP-side considerations:

  - BigQuery is the natural home for experiment data (CF Logpush integrates with GCS)
  - Looker Studio for analysis (free)
  - Could lean on Optimize or similar if migrating to GCP-native A/B (note that Google Optimize itself was sunset in September 2023), but CF-native + BQ is cheaper at start

Statistical reality at small scale (why this is parked for dare)

At 11k req/day on dare:

  - Detecting a 5% effect at 95% confidence needs ~10k sessions per variant. At a 50/50 split that is about 2 days on all traffic, or about 4.5 days for mobile-only changes (mobile is ~40% of traffic, so roughly 4.4k eligible requests per day)
  - For small visual tweaks (kicker visibility, active-state colour), the effect size is likely <2%, which needs weeks to reach significance
  - Experiment overhead (Worker code + analytics setup) doesn’t pay back if the test takes 3 weeks and Dan would have shipped a confident design judgment in 15 minutes
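
The duration arithmetic behind these figures can be sketched as below. It treats requests as sessions, as the numbers above do, and the scaling helper assumes required sample size grows with the inverse square of the effect size at a fixed baseline, which is a standard approximation rather than anything measured on dare. Function names are hypothetical.

```typescript
/** Days needed to fill every variant, given daily traffic and the eligible share. */
function daysToFill(
  sessionsPerVariant: number,
  variantCount: number,
  dailySessions: number,
  eligibleShare: number,
): number {
  return (sessionsPerVariant * variantCount) / (dailySessions * eligibleShare);
}

/** Rescale a known sample size to a different effect size (n is proportional to 1 / effect^2). */
function scaleSampleSize(
  referenceSessions: number,
  referenceEffect: number,
  targetEffect: number,
): number {
  return referenceSessions * Math.pow(referenceEffect / targetEffect, 2);
}

// 5% effect, ~10k sessions per variant, 50/50 split, 11k req/day.
const allTraffic = daysToFill(10000, 2, 11000, 1.0); // ≈ 1.8 days
const mobileOnly = daysToFill(10000, 2, 11000, 0.4); // ≈ 4.5 days

// Same test shape at a 2% effect: ≈ 62.5k sessions per variant.
const smallEffectSessions = scaleSampleSize(10000, 0.05, 0.02);
const smallEffectMobile = daysToFill(smallEffectSessions, 2, 11000, 0.4); // ≈ 28 days

// The same mobile-only 5% test at 100k sessions per day: 0.5 days.
const atResumeThreshold = daysToFill(10000, 2, 100000, 0.4);

console.log({ allTraffic, mobileOnly, smallEffectMobile, atResumeThreshold });
```

Under these assumptions a 2% effect on mobile-only traffic takes about four weeks on dare, while the 5% test fills in half a day at 100k daily sessions.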

The math reverses at 100k+ sessions per day or for high-effect-size tests (homepage redesign, hero-copy rewrite, conversion-flow change). That’s the resume threshold.

Resume conditions

  1. A paid client engagement at >100k daily sessions where editorial decisions carry a multi-thousand-hour cost (e.g., e-commerce homepage redesign, B2B SaaS hero rewrite)
  2. Audrey commerce traffic crosses ~50k daily sessions and a meaningful conversion-rate test is on the table
  3. Cross-portfolio big-bet like “wordmark change site-wide” — the framework’s setup cost amortises across all three sites

What’s already in place (zero-cost ground)

Linked memories

The aphorism

Methodology lives on the testbed; statistical significance lives on the production traffic. dare gives us one; clients give us the other.

Source: parked_sketch_ab_testing_cf_workers_2026-05-15.md · Rendered 2026-05-18 12:53