dashboard.audreyinc.com — A/B testing + editorial + Google/Meta ads experimentation surface (parked sketch)
DARE.CO.UK · PARKED SKETCH · 2026-05-18
Mirrored from ~/.claude/.../memory/project_dashboard_audreyinc_com_ab_editorial_ads_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
2026-05-18 sketch. dashboard.audreyinc.com is reserved for commerce / experimentation (distinct from health.audreyinc.com which hosts Site Health metrics). This surface needs plumbing for A/B testing landing pages + editorial copy variants + Google/Meta ad-creative testing + organic social-pattern testing, with conversion attribution back to Shopify. Unparks the previously-parked dare A/B framework (dare lacked traffic; audrey has commerce). Resume on Dan’s first commerce-experimentation engagement, OR when stages 4-8 of the audrey commerce flywheel come online.
Dan, 2026-05-18 evening: “dashboard.audreyinc.com has a landing page, but we need to start and sketch the plumbing/schematics for how A/B testing, editorial and ways to test google and meta ad’s can be designed and deployed, to evaluate copy/ideas/social patterns.”
Subdomain delineation (clarified 2026-05-18)
- health.audreyinc.com: Site Health metrics surface. Tripwires, GSC markers, content-quality signals. Pattern lifted from dare’s Site Health row + click-through-to-latest-authoritative-report.
- dashboard.audreyinc.com: commerce / experimentation surface. A/B tests, editorial variants, ad attribution, social patterns. This sketch.
- bookings.audreyinc.com: Phase A booking page (live; per project_booking_product_evolution.md).
The three surfaces talk to overlapping data (same Shopify backend, same GSC + GA4) but serve different mental modes: health = “is the site OK,” dashboard = “what’s working,” bookings = “transact.”
Architecture — 5-stage experimentation pipeline
DEFINE → ASSIGN → TRACK → AGGREGATE → REPORT
Builds on project_ab_testing_cf_workers_parked.md (originally parked because dare lacked traffic — audrey has commerce, so the trigger is now met).
Stage 1: DEFINE — what’s being tested
Variants live in the CF KV namespace audrey_experiments, one entry per variant.
Variants seeded via a small CLI helper (audrey_experiment_define.py) that writes to KV via Worker API. No UI for v1.
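A minimal sketch of what a variant entry and the seeding helper's validation could look like. The field names (`experiment_id`, `variant_id`, `surface`, `weight`, `payload`), the `exp:<experiment_id>:<variant_id>` key layout, and the function name `build_kv_entries` are assumptions for illustration, not a settled schema. The actual write to KV via the Worker API is left out.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Variant:
    # All field names are hypothetical; adjust to the real KV schema.
    experiment_id: str  # descriptive slug, e.g. "homepage_hero_2026_q2"
    variant_id: str     # e.g. "control", "b"
    surface: str        # route the experiment applies to
    weight: int         # share of traffic, in percent
    payload: dict = field(default_factory=dict)  # copy / image / CTA overrides


def build_kv_entries(variants: list[Variant]) -> dict[str, str]:
    """Validate one experiment's variants and return {kv_key: json_value}."""
    if not variants:
        raise ValueError("an experiment needs at least one variant")
    if len({v.experiment_id for v in variants}) != 1:
        raise ValueError("all variants must belong to the same experiment")
    if sum(v.weight for v in variants) != 100:
        raise ValueError("variant weights must sum to 100")
    # Assumed key layout: one KV entry per variant.
    return {
        f"exp:{v.experiment_id}:{v.variant_id}": json.dumps(asdict(v), sort_keys=True)
        for v in variants
    }
```

Rejecting weights that do not sum to 100 at define time keeps the Worker's assignment step free of normalisation logic.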
Stage 2: ASSIGN — traffic split + stickiness
CF Worker bound to audreyinc.com/* routes. For requests matching an active experiment’s surface:
- Read the _audrey_exp cookie. If present and valid → serve that variant.
- If absent → hash visitor ID (IP + UA → SHA-256, first 8 bytes) % 100; assign per weight; set sticky cookie (30-day TTL).
- Rewrite the response HTML using HTMLRewriter (CF-native): swap hero copy / image / CTA based on variant.
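The assignment steps above can be sketched as follows. This is Python for readability; the Worker itself would be JavaScript. The `|` separator between IP and UA, the big-endian byte order, and the function names are assumptions, not decisions already made.

```python
import hashlib


def bucket_for(ip: str, user_agent: str) -> int:
    """Map a visitor to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).digest()
    # First 8 bytes of the digest as an unsigned integer, then modulo 100.
    return int.from_bytes(digest[:8], "big") % 100


def assign_variant(bucket: int, weights: list[tuple[str, int]]) -> str:
    """Pick a variant by walking cumulative weights (expected to sum to 100)."""
    cumulative = 0
    for variant_id, weight in weights:
        cumulative += weight
        if bucket < cumulative:
            return variant_id
    raise ValueError("weights must sum to 100")


def resolve_variant(cookie_value, ip, user_agent, weights):
    """Honour a valid sticky cookie; otherwise assign from the hash bucket.

    Returns (variant_id, needs_cookie); needs_cookie tells the caller
    whether to set the 30-day sticky cookie on the response.
    """
    valid_ids = {variant_id for variant_id, _ in weights}
    if cookie_value in valid_ids:
        return cookie_value, False
    return assign_variant(bucket_for(ip, user_agent), weights), True
```

Because the bucket is derived from IP + UA, a visitor who loses the cookie but keeps the same IP and browser lands in the same variant again.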
For ad-creative tests: rewriting isn’t needed — the variant IS the ad creative; the Worker only needs to capture the gclid/fbclid and pass it through to Shopify’s add-to-cart endpoint with the variant tag.
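A rough sketch of the click-ID capture for ad-creative tests, again in Python for illustration. The function name `cart_attributes`, the attribute keys for the click IDs, and the `experiment:variant` tag format are all assumptions; only the `_audrey_exp` name comes from this sketch.

```python
from urllib.parse import urlsplit, parse_qs

CLICK_ID_PARAMS = ("gclid", "fbclid")


def cart_attributes(landing_url: str, variant_tag: str) -> dict[str, str]:
    """Build the attributes to attach to the cart for an ad landing.

    Carries the variant tag plus whichever ad click IDs are present in
    the landing URL's query string; other parameters are ignored.
    """
    query = parse_qs(urlsplit(landing_url).query)
    attrs = {"_audrey_exp": variant_tag}
    for name in CLICK_ID_PARAMS:
        values = query.get(name)
        if values:
            attrs[name] = values[0]  # first occurrence wins
    return attrs
```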
For editorial tests (newsletter subject lines, social captions): the Worker isn’t involved — these are tested in the publishing tool itself (Mailchimp / Klaviyo / Buffer / whatever audrey uses). The pipeline just RECORDS the variant tag against the conversion outcome.
Stage 3: TRACK — conversion events
Per-event, fire to multiple destinations:
- Shopify Analytics: native conversion attribution (add_to_cart, purchase). Inject _audrey_exp into Shopify cart attributes so it carries through the funnel.
- GA4: custom event with experiment_id + variant_id parameters. Enables Looker Studio reporting.
- Meta Pixel: same shape; required for Meta’s audience targeting to learn off the variants.
- Google Ads conversion endpoint: if the test surface is a Google Ad landing.
- CF Analytics Engine: our own time-series store for fast / cheap queries on traffic-split health (per-variant request count, error rate, etc.). Free tier covers our volume.
A small audrey_event_logger.js library in the Worker handles the fan-out: one event in → up to five destinations out (Google Ads only when the surface is a Google Ad landing). Idempotent (same event ID → no duplicate fires).
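The fan-out plus idempotency idea, sketched in Python rather than the Worker's JavaScript. The `EventLogger` class, the `event_id` field name, and the in-memory seen-set are assumptions; a real Worker would need shared storage for deduplication, since isolates do not share memory.

```python
from typing import Callable

Event = dict
Sender = Callable[[Event], None]


class EventLogger:
    """Fan one event out to every destination, at most once per event ID."""

    def __init__(self, destinations: dict[str, Sender]) -> None:
        self.destinations = destinations
        # In-memory only: stands in for whatever shared store the Worker uses.
        self._seen: set[str] = set()

    def log(self, event: Event) -> list[str]:
        """Send the event everywhere; return the destinations fired."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return []  # duplicate: nothing fires
        # Marked as seen before sending, so a failed send is not retried here.
        self._seen.add(event_id)
        delivered = []
        for name, send in self.destinations.items():
            send(event)
            delivered.append(name)
        return delivered
```

Marking the ID before sending favours "no duplicate fires" over guaranteed delivery; retries would need per-destination bookkeeping.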
Stage 4: AGGREGATE — daily roll-ups
Logpush from CF Analytics Engine → BigQuery audrey.experiments.events_raw (free for CF, ~$0 GCS storage at our volume).
Daily Cloud Run job (sibling to dare-devreports per the existing pattern):
- Reads raw events for the previous UTC day.
- Pivots into audrey.experiments.daily table: (experiment_id, variant_id, date, impressions, conversions, conversion_rate, lower_95ci, upper_95ci).
- Computes statistical significance per pair-wise comparison (chi-square or two-proportion z-test). Outputs winner_recommendation field: null | "control" | "<variant_id>" | "no_winner_yet".
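A minimal sketch of the significance step, using the two-proportion z-test option with a Wald interval for the 95% CI columns. The `alpha` and `min_impressions` defaults, and the choice to map "too little data" to null (`None`), are assumptions rather than settled thresholds.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_ci_95(conversions: int, impressions: int) -> tuple[float, float]:
    """Wald 95% interval for a conversion rate, clamped to [0, 1]."""
    p = conversions / impressions
    half = 1.96 * math.sqrt(p * (1 - p) / impressions)
    return max(0.0, p - half), min(1.0, p + half)


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for rate B versus rate A."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (conv_b / n_b - conv_a / n_a) / se
    return z, 2 * (1 - normal_cdf(abs(z)))


def recommend(control, variant, variant_id, alpha=0.05, min_impressions=1000):
    """control / variant are (conversions, impressions) pairs.

    Returns None (assumed meaning: too little data), "no_winner_yet",
    "control", or the variant ID.
    """
    (conv_c, n_c), (conv_v, n_v) = control, variant
    if min(n_c, n_v) < min_impressions:
        return None
    z, p_value = two_proportion_z_test(conv_c, n_c, conv_v, n_v)
    if p_value >= alpha:
        return "no_winner_yet"
    return variant_id if z > 0 else "control"
```

With more than two variants, each pairwise comparison adds another chance of a false positive, so the threshold would likely need a correction (for example Bonferroni) before trusting winner_recommendation.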
Stage 5: REPORT — dashboard.audreyinc.com surface
Reuses every UI pattern shipped on dare’s dashboard today:
- Card-row of running experiments: each card a single experiment. Headline number = current leading variant’s conversion lift. Pill verdict: RUNNING / CONCLUSIVE WINNER / INSUFFICIENT DATA / STOPPED. Sparkline = the time-series of conversion-rate delta over the test window. Click-through to per-experiment detail report.
- Window toggle (24h / 7d / 30d / 90d): same shape as Edge Health. Different windows reveal different stories: 24h surfaces today’s spend efficiency; 30d/90d show the engagement-arc Dan named in feedback_window_toggles_are_high_value.md.
- Smoothed-area trend chart (per feedback_smoothed_area_chart_over_time.md): overlay of total impressions × converted impressions per experiment. The visual idiom from NextDNS’s chart applied to audrey’s commerce data.
- Editorial-test results table: newsletter / social / caption variant winners with confidence intervals.
- Ad-attribution row: Google + Meta spend × conversions × CAC per variant. Highlights underperforming ad creative that can be paused.
- Social-pattern panel: organic post variants per channel (Instagram / Threads / Twitter/X / Pinterest), engagement deltas. Cross-references with which posts drove site visits via UTM.
Authentication
dashboard.audreyinc.com should be CF-Access gated (same pattern as dashboard.dare.co.uk). Dan + collaborators by email; service token for CLI / agent access.
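For the CLI / agent path, Cloudflare Access service tokens are sent as the `CF-Access-Client-Id` and `CF-Access-Client-Secret` request headers. A minimal sketch of building such a request; the function name and any URL path passed in are assumptions, and nothing is sent over the network here.

```python
import urllib.request


def access_request(url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a request carrying Cloudflare Access service-token headers.

    Pass the result to urllib.request.urlopen() to actually send it.
    """
    return urllib.request.Request(
        url,
        headers={
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
        },
    )
```

The secret would normally come from the environment or a secrets store rather than being hard-coded in the helper.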
Surfaces being tested — initial scope
When the build unparks, these are the priority experiment surfaces (~6 worth shipping in v1):
- Homepage hero copy variants — register/voice tests. The DARE-style 4-pillar framing won’t fit; audrey’s voice is more luxury-craft. Workshop the variant set with chat-Claude before defining.
- Product page descriptions — short-vs-medium-vs-long form. Connects to the GSC crawled-not-indexed cohort (19 audrey pages).
- Add-to-cart CTA microcopy — single-word changes (“Add to bag” vs “Order” vs “Get yours”). Small variants, big delta historically.
- Newsletter subject lines — variant pairs per send, tracked via Klaviyo or similar.
- Google Ads creative — same product, three copy variants. Win → scale spend; lose → kill.
- Meta Ads creative — Reels vs static vs carousel formats. Same product, three creative shapes.
Integration with existing audrey work
- Shopify UCP/llms.txt (per project_shopify_ucp_audrey_native.md): agent-discoverability surface is independent of the experimentation pipeline, but both feed the broader commerce flywheel.
- Judge.me reviews (per project_audrey_reviews_app_install.md): review counts + ratings per product become an experiment variable (does adding reviews-snippet to a product card move conversion?).
- Booking phase A (per project_booking_product_evolution.md): bookings page is a candidate experiment surface once it has traffic.
Why this gets built now (vs deferred indefinitely)
Per user_invest_in_content_quality_not_more_tooling.md: don’t build more tooling unless it directly supports content investment. This framework directly supports content investment — it’s how we know whether new content (rewritten product descriptions, new homepage hero, new ad creative) actually moves the needle. Without it, content investment is gut-feel; with it, content investment is measurable + iterable.
So: this is tooling in service of editorial work. The exception that proves the rule.
Cost envelope (verified-or-best-guess)
| Component | Cost shape |
|---|---|
| CF Workers requests | ~$0 at our volume (under free tier) |
| CF KV reads/writes | ~$0 |
| CF Analytics Engine | ~$0 |
| Logpush → BigQuery | CF Logpush free; GCS storage ~$0.02/GB/mo (negligible at our volume) |
| BigQuery query cost | first 1 TB scanned/mo free; we’ll scan MB |
| Cloud Run daily aggregator | ~$0 at our run frequency |
| Meta + Google Ads conversion endpoints | free (just API calls) |
| Shopify analytics native | included in Shopify plan |
| Total recurring | ~$0/mo at audrey’s expected volume |
Open design questions
- HTMLRewriter vs JS-side variant rendering. Worker rewrite is faster + works without JS; JS-side allows richer client-side experiments (animation variants, interaction variants). v1 = HTMLRewriter; v2 = JS hook if needed.
- Statistical significance method. Frequentist (z-test) vs Bayesian. Frequentist is simpler + matches GA4’s native reporting. v1 = frequentist; v2 = Bayesian if we want richer “probability of winner” reads.
- Stickiness duration. 30 days is generous; 14 might be better for fast-cycling tests. Calibrate after v1.
- Holdout group. Should we reserve 10% of traffic as a control-of-controls so we can measure baseline drift independent of any active experiment?
- Editorial-test integration shape. Newsletter / social variants happen in OTHER tools (Klaviyo / Buffer). Do we ingest from their APIs, or do we just record variant tags + match conversions by UTM?
Sibling memories
- project_ab_testing_cf_workers_parked.md: original A/B framework, parked for lack-of-traffic on dare. This sketch is the audrey-flavoured unpark.
- project_audrey_commerce_flywheel.md: 8-stage chain; this dashboard is the instrumentation layer for stages 4-8 (reviews → JSON-LD → SERP → programmatic recommendation → conversion).
- feedback_window_toggles_are_high_value.md: window-toggle pattern lifts into the experiment-status cards.
- feedback_smoothed_area_chart_over_time.md: visual idiom for the per-experiment trend chart.
- feedback_show_the_future_grey_it_out.md: pending-state cards for experiments not yet started.
- feedback_internal_seo.md: experiment IDs follow descriptive-slug naming (homepage_hero_2026_q2, not exp_001).
- user_invest_in_content_quality_not_more_tooling.md: this framework qualifies as tooling-in-service-of-content per the exception clause.
Resume conditions
- ✅ First commerce-experimentation engagement starts (Dan begins testing a homepage variant or ad creative).
- ✅ Stages 4-8 of project_audrey_commerce_flywheel.md come online (reviews → JSON-LD → SERP visibility creates A/B-testable surface).
- ✅ Client engagement asks for a commerce-experimentation dashboard (lifts the pattern as portfolio-portable, like Site Health did today).
- Earliest qualifying trigger gets v1 build; subsequent triggers exercise the cross-site portability.