Purchase-archive enrichment — from hand-rules to Haiku Layer-2 in one session (2026-05-22)

DARE.CO.UK · SESSION REPORT · 2026-05-22

Today’s arc on the Amazon archive (5,476 items / 25 years / $228K lifetime spend): three cycles of hand-rule nit-picking → architectural pivot → one-shot Haiku 4.5 batch over 4,351 unique ASINs at ~$5. Also covered: the Audrey eras Wayback restoration and the 9 architectural principles banked. Status of long-running jobs is at the end of the report.

TL;DR

Amazon Layer-1 (rule engine): fixed three classes of false positive (Shipment-Subtotal double-count, “compatible with” brand mis-attribution, generic “mouse” matching computer mice); added 22 brands and 10+ categories; shipped the address-audit diagnostic as the foundation metric.
Pivot moment: Dan said, “No point running hundreds of hours fixing when a single batch run will get us 95-98% correct.” Switched substrate from hand-rules to Haiku.
Amazon Layer-2 (LLM enrichment): one-shot Haiku 4.5 batch with a rich JSON schema (10+ structured fields per item). In flight at time of writing: 898 of 4,351 done at 0.91 avg confidence, with the lifecycle and use_context dimensions opening up. Total projected cost ~$5.50.
Audrey eras: diagnosed why the 2017-2025 snapshots were rendering broken (Wayback was returning HTML wrappers instead of raw CSS/JS); fixed the downloader to use the cs_/js_/im_/id_ flavor modifiers plus content-type validation. 2017 redeployed; 2018-2025 re-fetching in background.
9 architectural principles banked to memory: see end of report.

The Amazon arc — three cycles of nit-pick → rule → re-render → archive improves

Cycle 1 — Shipment-Subtotal double-count (€125.95 × 4 = €503.80 bug)

Dan flagged order 403-6740866-2197145 — a Dublin shipment of 4 items showing items_subtotal as €503.80. The actual shipment was €125.95.

Root cause: Amazon’s GDPR Order History.csv columns Shipment Item Subtotal / Shipment Item Subtotal Tax are shipment-level totals duplicated on every item row, not per-item amounts. A 4-item shipment has 4 rows each carrying the same €125.95, so naïve sum() multiplies by item count.

Fix: in pa_amazon_gdpr_process.py aggregate_orders(), compute line_subtotal = unit_price × max(qty, 1) per item, sum those instead. Similar for tax_total. Renderer at pa_purchases_dashboard.py:882 now displays line_subtotal instead of the shipment-level repeat.

Banked as: feedback_amazon_gdpr_shipment_columns_are_repeated.md with a sanity check (if items_subtotal == shipment_subtotal × item_count, the bug is present).
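The fix and its sanity check can be sketched as follows. This is a minimal illustration, not the actual body of aggregate_orders() in pa_amazon_gdpr_process.py: the row keys (order_id, unit_price, qty) and the helper name looks_double_counted are assumptions, and the per-item prices in the usage note are invented so that they sum to €125.95.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_orders(rows):
    """Sum per-line subtotals per order, ignoring the repeated shipment column.

    `rows` is an iterable of dicts with hypothetical keys
    order_id, unit_price and qty (all strings, as read from a CSV).
    """
    totals = defaultdict(Decimal)  # Decimal() defaults to 0
    for row in rows:
        qty = max(int(row["qty"]), 1)  # treat a zero quantity as 1
        totals[row["order_id"]] += Decimal(row["unit_price"]) * qty
    return dict(totals)


def looks_double_counted(items_subtotal, shipment_subtotal, item_count):
    """Sanity check: the bug is present when the order total equals
    the shipment subtotal multiplied by the number of item rows."""
    return item_count > 1 and items_subtotal == shipment_subtotal * item_count
```

With four illustrative rows priced 50.00, 40.95, 20.00 and 15.00, aggregate_orders returns 125.95 for the order, while looks_double_counted(Decimal("503.80"), Decimal("125.95"), 4) flags the old behaviour.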

Cycle 2 — “compatible with Brand X, Y, Z” false-positives

Dan flagged order 114-3857118-8876251 — espresso cleaning tablets tagged Philips because the product name read “compatible with Breville, Bosch, Philips, Casabrews, KitchenAid, De’Longhi, Ninja”. Generic third-party accessory. Wrong brand.

Root cause: Layer-1 keyword matching has no context awareness. Any product mentioning “philips” gets tagged Philips.

Fix: added _brand_in_compat_context() to apply_tags(). It suppresses a brand match when every occurrence of the keyword falls inside an 80-char window after one of: compatible with, compatible for, fits, fit for, works with, replacement for, substitute for, alternative to, for use with, designed for, made for, to fit, to replace, equivalent to, oem replacement.

Verified against:
- “Cleaning Tablets … compatible with Breville, Bosch, Philips, Casabrews…” → no brand tag ✓
- “Philips Handheld Steamer 1000 Series” → Philips brand kept ✓
- “Replacement Filter for use with Brita Maxtra+ Pitchers” → no Brita tag ✓
- “Brita Maxtra+ Replacement Water Filter Cartridges” → Brita brand kept ✓
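A minimal sketch of how such a suppression check might work, assuming the window is measured from the end of the trigger phrase and that brand keywords are matched on word boundaries. The function name mirrors _brand_in_compat_context(), but the body is an illustration, not the shipped code.

```python
import re

# Trigger phrases taken from the fix description above.
COMPAT_TRIGGERS = [
    "compatible with", "compatible for", "fits", "fit for", "works with",
    "replacement for", "substitute for", "alternative to", "for use with",
    "designed for", "made for", "to fit", "to replace", "equivalent to",
    "oem replacement",
]
_TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in COMPAT_TRIGGERS) + r")\b",
    re.IGNORECASE,
)
WINDOW = 80  # characters after the trigger phrase


def brand_in_compat_context(name, brand_kw):
    """True when every occurrence of brand_kw sits inside a compat window."""
    windows = [(m.end(), m.end() + WINDOW) for m in _TRIGGER_RE.finditer(name)]
    brand_re = re.compile(r"\b" + re.escape(brand_kw) + r"\b", re.IGNORECASE)
    hits = [m.start() for m in brand_re.finditer(name)]
    if not hits or not windows:
        return False  # nothing to suppress
    return all(any(lo <= h < hi for lo, hi in windows) for h in hits)
```

Requiring every occurrence to be inside a window is what keeps a genuine Philips or Brita product tagged even when its title also lists compatible brands later on.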

Cycle 3 — “mouse” matched every Logitech computer mouse

Dan flagged order 114-4457192-8701066 — Logitech B100 wired mouse tagged Pest Control because the Pest Control category had "mouse" as a keyword.

Root cause: keyword too broad. Computer mouse / mouse trap / mouse poison all share the word.

Fix: tightened Pest Control to require specific pest variants — "mouse trap", "mousetrap", "mouse bait", "mouse poison", "rodent deterrent", "pest deterrent", "insect deterrent", "fly trap", "fruit fly trap", "moth trap", "rodenticide". Removed bare "mouse".

Added Tech · Input Device category ("wired mouse", "wireless mouse", "bluetooth mouse", "gaming mouse", "computer mouse", "logitech mouse", "magic mouse", "trackball", "trackpad", "keyboard", "mechanical keyboard", "wireless keyboard", "bluetooth keyboard") so computer peripherals have a real home.
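The effect of the tightened keywords can be illustrated with a small lookup. The keyword lists below are abbreviated from the ones quoted above, and categories_for is a hypothetical helper rather than the real apply_tags().

```python
import re

# Abbreviated keyword lists; the full lists are given in the text above.
CATEGORY_KEYWORDS = {
    "Pest Control": ["mouse trap", "mousetrap", "mouse bait", "mouse poison", "fly trap"],
    "Tech · Input Device": ["wired mouse", "wireless mouse", "computer mouse", "trackball"],
}


def categories_for(name):
    """Return every category whose keyword appears as a whole phrase."""
    lowered = name.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords)
    ]
```

With the bare "mouse" keyword gone, a title such as "Logitech B100 Wired Mouse" only lands in Tech · Input Device, and a mouse trap still lands in Pest Control.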

Cumulative Layer-1 rule additions today

22 new brands: Acer, Dualit, VonShef, Russell Hobbs, Silentnight, Hama, Victorinox, Fender, Helly Hansen, Continental Tyres, Bulk, Sevenhills Wholefoods, Drayton, TOAKS, Nokian, Vamvo, OKI, Brother, HP, Filtrete, Honeywell, Brita
10+ new categories: Tech · Laptop, Tech · TV, Tech · TV mount, Tech · Audio accessories, Tech · Projector, Tech · Input Device, Home · Furnishing, Home · Small Appliances, Home · Consumables, Home · HVAC controls, Apparel · Accessories, Outdoor · Camping, Equipment · Ladders, Vehicle · Winter Tires, Office · Consumables
Expanded existing: Home · Lighting (downlight, under cabinet light, ceiling lights), Home · Bedding (mattress topper, mattress pad), Cycling (bike parts, clipless pedal, pedal plate, bike tour), Hardware · DIY (grout reviver, tile adhesive, grout), Home · Bathroom (kitchen tap, kitchen mixer, shower caddy)
Plural tolerance: _kw_matches now uses \b<kw>(?:s|es)?\b so “ceiling light” matches “ceiling lights” without manual duplication
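A small sketch of the plural-tolerant pattern, assuming the keyword is regex-escaped before being dropped into \b<kw>(?:s|es)?\b and that matching is case-insensitive. The function is a stand-in for _kw_matches, not a copy of it.

```python
import re


def kw_matches(keyword, text):
    """Match keyword as a whole phrase, allowing a trailing 's' or 'es'."""
    pattern = r"\b" + re.escape(keyword) + r"(?:s|es)?\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

The trailing \b is what stops "ceiling light" from matching inside "ceiling lighting", so the suffix tolerance does not turn into a general prefix match.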

Per-bucket Misc% improvements after rule passes

Ship-to	Before today	After Layer-1
London	78% Misc	35% Misc (large rule-additions pass)
Solebury PO Box	n/a	28%
New Hope	n/a	52%
Brooklyn	n/a	68%
Dublin	n/a	70%

Brooklyn and Dublin are next targets — Layer-2 batch results will inform whether to add more Layer-1 rules or let Layer-2 absorb the long tail.
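For reference, the per-bucket Misc% in the table above is a simple ratio. The sketch below assumes each item is reduced to a (ship-to bucket, category) pair and that untagged items carry the category "Misc"; both are assumptions about the data shape, and the counts in the usage note are invented.

```python
from collections import Counter


def misc_pct_by_bucket(items):
    """items: iterable of (bucket, category) pairs. Returns {bucket: Misc %}."""
    totals, misc = Counter(), Counter()
    for bucket, category in items:
        totals[bucket] += 1
        if category == "Misc":
            misc[bucket] += 1
    return {bucket: round(100 * misc[bucket] / totals[bucket]) for bucket in totals}
```

A bucket with 7 Misc items out of 10 reports 70, which is the shape of figure shown for Dublin.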

Address-audit diagnostic — the FIRST metric

Banked as feedback_address_audit_first_diagnostic.md. Live at /amazon/_diagnostics/address-audit.html.

Principle: every order’s ship-to bucket assignment must be y/n verifiable BEFORE any tag-quality measurement. If bucket is wrong, every downstream metric (Misc%, brand coverage, ship-to totals) is corrupted.

Three confidence tiers per order:
- HIGH: clear single-keyword match ("6279 GREENHILL", "SOLEBURY", "BROOKLYN")
- MEDIUM: compound AND-logic match ("LONDON" + UK indicator)
- LOW / fallback: no keyword matched; suspicious, needs eyeballing

Page renders: per-bucket confidence summary table (green ≥95%, amber ≥80%, red <80%) plus a full list of every low-confidence order with raw address visible, so each is a candidate for adding a new bucket keyword.
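The tiering and the colour thresholds can be sketched as below. The keyword-to-bucket mapping, the UK indicator strings, and the "Unassigned" fallback name are all assumptions made for illustration; only the tier logic and the 95/80 thresholds come from the description above.

```python
# Hypothetical mapping: which keyword implies which bucket is assumed here.
HIGH_KEYWORDS = {"BROOKLYN": "Brooklyn", "SOLEBURY": "Solebury PO Box"}
UK_INDICATORS = ("UNITED KINGDOM", "GREAT BRITAIN")  # assumed indicator strings


def classify_address(raw_address):
    """Return (bucket, confidence tier) for a raw ship-to address."""
    addr = raw_address.upper()
    for keyword, bucket in HIGH_KEYWORDS.items():
        if keyword in addr:
            return bucket, "HIGH"  # single unambiguous keyword
    if "LONDON" in addr and any(ind in addr for ind in UK_INDICATORS):
        return "London", "MEDIUM"  # compound AND-logic match
    return "Unassigned", "LOW"  # fallback: needs eyeballing


def tier_colour(pct):
    """Colour for a bucket's confidence percentage on the summary table."""
    if pct >= 95:
        return "green"
    if pct >= 80:
        return "amber"
    return "red"
```

The AND-logic on "LONDON" is what keeps an address in London, Ontario out of the London bucket; it falls through to LOW and shows up in the review list instead.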

The architectural pivot — Layer-2 Haiku batch

After three Layer-1 cycles cleared the high-frequency clusters, each further cycle would have bought only ~5% improvement, i.e. hours of work per percentage point. Dan called the inflection:

“No point running hundreds of hours fixing when a single batch run will get us I assume 95-98% correct, with layers of JSON friendly syntax for you to expand and grow the connected-threads.”

Then:

“Its the conditional-logic that needs strengthening. It’s difficult problem-assessment as the signal-to-noise can be harder to infer without solid secondary look-ups.”

Together: stop hand-iterating, switch substrate. One Haiku call per unique ASIN, rich JSON output.

Schema designed for “connected-threads”

[JSON schema block not preserved in this rendering]

Each new field opens a new dimension of analysis:
- lifecycle → “what subscriptions / consumables am I running?”
- use_context → “cost per room / per life-area”
- compat_brands → “I keep buying things compatible with Brand X — should I switch?”
- confidence_overall → flag the 2-5% residue for human review

Live results (898 of 4,351 in)

Quality: avg confidence 0.91, only 7 of 898 below 0.7. Two of those self-flagged at 0.05: “Unknown Product” and “Heat (Ambiguous)” — the system knows what it doesn’t know.
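A quality summary like the one above could be computed from the JSONL output along these lines. This is a sketch under assumptions: each line is taken to be one JSON object with a confidence_overall field (named in the schema notes), while the asin and product_type keys and the sample values in the usage note are illustrative.

```python
import json
from statistics import mean


def summarize_confidence(jsonl_lines, threshold=0.7):
    """Average confidence plus the records that fall below the threshold."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    scores = [record["confidence_overall"] for record in records]
    flagged = [r for r in records if r["confidence_overall"] < threshold]
    return {
        "count": len(records),
        "avg_confidence": round(mean(scores), 2),
        "needs_review": flagged,
    }
```

Passing it the lines of layer2_tags.jsonl would give the item count, the running average, and the low-confidence residue to hand to a human reviewer.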

Top-level category distribution:

[category distribution block not preserved in this rendering]

Lifecycle distribution (the new substrate):

[lifecycle distribution block not preserved in this rendering]

The ~40% consumable + replacement-part bucket is genuinely new signal — Layer-1 rules had no way to surface this.

Use-context distribution (new orthogonal axis):

[use-context distribution block not preserved in this rendering]

compat_brands was captured on 171 of 898 items. This is exactly the false-positive class that the Layer-1 anti-pattern was suppressing:
- USB Wall Charger → brand=Anker, compat=[Apple, Samsung, HTC, Motorola, LG, Google]
- USB-C to USB-A Cable → brand=CableCreation, compat=[Apple, Google]
- Hybrid Protective Phone Case → brand=SupCase, compat=[Palm]
- Slatwall Hooks Set → brand=Intpro, compat=[Generic Slatwall Systems]

Top brands (high-confidence): Apple 9, Stella & Chewy’s 8, Philips 7, Bona 6, LEGO 6, Lutron 5, Miele 4, Anker 4, Lavazza 4, The Honest Kitchen 4.

Dan’s reaction: “It’s doing amazing. Its high-value signal intelligence.”

Audrey eras — Wayback restoration

Diagnosis

The archive.audreyinc.com/versions/shopify-original-2011-2026/eras/ page renders 15 era snapshots (one per year 2011-2025) via iframes pointing at local snapshot.html + downloaded assets per year. Dan flagged: only 2016 and 2024 render visually correctly.

Investigation: the 2017-2025 era folders had ~25-31 “asset” files each, but the file command reported every one as HTML document text. The original downloader was fetching CSS/JS/image URLs without Wayback’s flavor modifiers, so Wayback returned its wrapper HTML page (with toolbar/playback header) instead of the raw content. The files were saved under sha1-hash filenames, and the browser couldn’t use any of them.

Fix

audrey_wayback_era_fetch.py updates:

find_asset_urls now tracks asset kind — returns (url, wb_url, kind) triples where kind ∈ {css, js, image, font, other}. Determined by tag (<script src> → js, <img> → image, <link rel="stylesheet"> → css), then by URL extension, then by <link rel="preload" as="...">.
apply_wb_modifier injects a Wayback flavor modifier before fetch:
- cs_ for CSS: forces raw text/css
- js_ for JS: forces raw application/javascript
- im_ for image: forces raw image bytes
- id_ for everything else (identity, original Content-Type)
looks_like_html validates response — if we asked for CSS/JS/image and got HTML back, skip + don’t pollute the assets dir.
Tightened <link> tag parser — only follows rel="stylesheet|icon|shortcut icon|apple-touch-icon|mask-icon|manifest|preload|prefetch". Skips canonical/alternate/dns-prefetch/next/prev (those are navigation pointers, not assets).
JSON-embedded URL extractor now only matches URLs ending in image extensions (.png|.jpg|.gif|.webp|.svg|.ico|.bmp|.avif). Was previously fetching every embedded /web/<ts>/... URL, most of which were navigation refs returning HTML pages saved as fake assets.
_absolutize_wayback handles protocol-relative URLs (//web.archive.org/...) — was previously dropped, missing the main CSS files on modern Shopify themes.
local_asset_path falls back to a kind-derived extension (.css/.js/.png/.woff2), so CF Pages serves the right Content-Type instead of application/octet-stream.
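The core of the fix (absolutize, inject the modifier, reject HTML responses) can be sketched as follows. The function names follow the ones listed above, but the bodies are an illustration under assumptions, not the code in audrey_wayback_era_fetch.py; in particular the regex and the 512-byte sniff length are choices made for this example.

```python
import re

WB_MODIFIERS = {"css": "cs_", "js": "js_", "image": "im_"}  # anything else gets id_
_WB_URL_RE = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{1,14})(?:[a-z]{2}_)?(/.*)$"
)


def absolutize_wayback(url):
    """Turn protocol-relative or root-relative Wayback URLs into absolute ones."""
    if url.startswith("//"):
        return "https:" + url
    if url.startswith("/web/"):
        return "https://web.archive.org" + url
    return url


def apply_wb_modifier(wb_url, kind):
    """Insert the flavor modifier right after the snapshot timestamp."""
    match = _WB_URL_RE.match(absolutize_wayback(wb_url))
    if not match:
        return wb_url  # not a recognisable Wayback URL; leave untouched
    prefix, timestamp, rest = match.groups()
    return prefix + timestamp + WB_MODIFIERS.get(kind, "id_") + rest


def looks_like_html(body, content_type=""):
    """True when a response that should be an asset is really an HTML page."""
    if "text/html" in content_type.lower():
        return True
    head = body[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

A downloader would call apply_wb_modifier before each fetch and skip the write whenever looks_like_html returns True for a css, js or image asset.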

State

2017 deployed; got 8 real assets (SVGs, GIF, JS), 12 files 404 directly from Wayback (those were never archived — the main styles.scss.css is one of these, so 2017 may still render mostly unstyled even with the fix). Honest limit of the source.
2018-2025 re-fetching in background; expected to land with higher real-asset counts since modern Shopify themes are better preserved on Wayback.

Pending sweep

Some 2011-2015 eras have mixed-content issues — their snapshot.html references http://www.audreyinc.com/... and http://fonts.googleapis.com/... that browsers block when iframe loads via HTTPS. Banked as next-sweep work; would rewrite remaining http:// → https:// or localize the external CSS via Wayback.
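One possible shape for the http:// → https:// rewrite, offered as a sketch of the pending work rather than a decided approach. It only touches src and href attribute values, and it assumes the referenced hosts actually answer over HTTPS; the function name is hypothetical.

```python
import re

# Match src="http://... or href='http://... and capture everything before the scheme.
_ATTR_HTTP_RE = re.compile(r"""((?:src|href)\s*=\s*["'])http://""", re.IGNORECASE)


def upgrade_http_refs(html):
    """Rewrite http:// to https:// inside src/href attributes only."""
    return _ATTR_HTTP_RE.sub(r"\1https://", html)
```

Plain-text mentions of http:// URLs in the page body are left alone, since only attribute values trigger mixed-content blocking. Hosts that no longer serve HTTPS would still need the localize-via-Wayback route.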

Hero image crop — small but representative

Dan: “how’s your cropping skills of photographs? Can we trim the picture hero down. Too much house.”

Cropped pa/vehicles/documents/bmw-2013/2026-01-17-winter-driveway-snow.jpg: removed top 700px of dormers + roof; kept garage doors as architectural backdrop. 1500x2000 → 1500x1300; 1MB → 324KB (PIL, quality 85, progressive JPEG). BMW + winter tires + driveway is now the visual story.
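The crop itself is a few lines with Pillow. The sketch below assumes Pillow is installed and uses hypothetical helper names; the 700px trim, quality 85 and progressive JPEG settings are the ones reported above.

```python
def crop_box(width, height, trim_top):
    """Box (left, upper, right, lower) that drops trim_top pixels from the top."""
    return (0, trim_top, width, height)


def crop_hero(src_path, dst_path, trim_top=700):
    """Crop the top of a photo and save it as a progressive JPEG."""
    from PIL import Image  # Pillow, imported lazily so crop_box works without it

    with Image.open(src_path) as img:
        cropped = img.crop(crop_box(img.width, img.height, trim_top))
        cropped.save(dst_path, "JPEG", quality=85, progressive=True)
        return cropped.size
```

For a 1500x2000 source, the box is (0, 700, 1500, 2000), which yields the 1500x1300 result. No rotation is applied, in line with the crop-and-resize-only rule.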

Cross-references Dan’s earlier banked rule: “iPhone photo orientation — don’t auto-rotate, just resize/scale/crop” — explicit authority granted for crop+resize moves, no rotation.

Principles banked to memory today

feedback_amazon_gdpr_shipment_columns_are_repeated.md — Shipment Item Subtotal columns are shipment-level, repeated per item row. Use unit_price × qty for per-line math.
feedback_small_batch_iteration_tempo.md — quality improvements compound faster in small batches (20 items) than whole-archive runs. Dan’s nit-picking IS small-batch iteration in action.
feedback_small_bucket_as_tagging_diagnostic.md — bucket Misc% >25% in a small bucket = real coverage gap. Three lifts (rule expansion, plural matcher, LLM long-tail). Bucket-level Misc% is the diagnostic signal.
feedback_retagging_enrichment_scripts_no_sidecar_bloat.md — three-layer architecture: rule files (central, audit-friendly) → render-time application (idempotent) → sidecars (hand-edits ONLY, never rule-derived facts). Improve rules once, archive re-tags.
feedback_iterative_engine_compounding_rate.md — Dan-named rate: every rule-pass drops Misc% 15-30%. Diminishing returns plateau ~5% means switch substrate. Multi-cycle nit-picking IS what builds resource-intelligence.
feedback_address_audit_first_diagnostic.md — every order’s bucket assignment must be y/n verifiable BEFORE tag-quality. If bucket is wrong, every downstream metric is corrupted.
feedback_two_layer_decision_architecture.md — Layer 1 (rules, fast/free/blind) + Layer 2 (LLM/external taxonomy, slow/cheap/contextual). Layer 2 runs only on items where L1 returns Misc or low-confidence. Disagreements are diagnostic.
feedback_one_batch_llm_beats_hand_iteration.md — after 3 rule cycles, hand-rules give 5%/cycle = bad ROI. One Haiku batch = ~$5 + 95-98% coverage + rich JSON. Don’t keep iterating past the inflection.
feedback_high_value_signal_intelligence.md — Dan’s name for the rich-JSON LLM output. Three properties stacked: high-value (10+ structured fields each opens a new analysis axis), signal not raw data (LLM does the disambiguation), intelligence (confidence + rationale = system knows what it knows).

Status of long-running jobs (snapshot at report write-time)

Layer-2 Haiku batch (b7tfu0fbl) — running. 898 of 4,351 done. Avg confidence 0.91. ETA ~28 more minutes. Output: pa/amazon/_data/layer2_tags.jsonl.
Audrey eras 2018-2025 re-fetch (bd7h1gk16) — running. ETA ~25 more minutes.
pa.gf.cx — deployed at 1c5dd0d5 (hero crop landed after the address-audit + Layer-1 fixes commit at 850657ab).
audrey-archive — deployed at c71bb40 (2017 era fix).

Next moves (after batch lands)

Render Layer-2 data into Amazon dashboard — load layer2_tags.jsonl keyed by ASIN; per-order display gets product_type + Google-taxonomy path + brand+confidence + compat_brands as separate chips + lifecycle/use_context badges. New diagnostic: Layer-1 vs Layer-2 disagreements audit page.
New landing pages per Layer-2 axis — /amazon/lifecycle/consumable/, /amazon/use-context/garden/, etc. Each becomes a new way to slice the substrate.
eBay ingest with Layer-2 wired in from day one — eBay’s chaotic titles + structured category_id + item_specifics → ideal Haiku input. Build after Dan’s eBay GDPR export arrives.
Layer-2 disagreement-driven Layer-1 expansion — items where L1=Misc but L2=specific become candidate keyword additions; items where L1=BrandX but L2=BrandY become candidate anti-patterns.
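The disagreement audit in the last item could start from a classifier like this. It is a sketch only: the per-item dict shape (category and brand keys) and the label strings are assumptions, since the audit page has not been built yet.

```python
def classify_disagreement(l1, l2):
    """Label how a Layer-1 tag and a Layer-2 tag for the same item disagree.

    l1 and l2 are dicts with hypothetical keys 'category' and 'brand'.
    Returns None when there is nothing to act on.
    """
    if l1["category"] == "Misc" and l2["category"] != "Misc":
        return "candidate_keyword"  # L2 found something L1 rules missed
    if l1.get("brand") and l2.get("brand") and l1["brand"] != l2["brand"]:
        return "candidate_anti_pattern"  # L1 likely matched the wrong brand
    return None
```

Grouping items by the returned label would give one review queue for new keywords and another for new suppression patterns.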

Session captured by Claude. Live state: pa.gf.cx Amazon dashboard renders Layer-1 result + audit diagnostic; Layer-2 batch + Wayback re-fetch continuing in background. Update this report (or follow-up dated) once both jobs complete.

Source: dare_session_report_purchases_layer2_2026-05-22.md · Rendered 2026-05-22 21:48