Purchase-archive enrichment — from hand-rules to Haiku Layer-2 in one session (2026-05-22)
DARE.CO.UK · SESSION REPORT · 2026-05-22
Today’s arc on the Amazon archive (5,476 items / 25 years / $228K lifetime spend): three cycles of hand-rule nit-picking → architectural pivot → one-shot Haiku 4.5 batch over 4,351 unique ASINs at ~$5. Plus the Audrey eras Wayback restoration and 7 architectural principles banked. Status of long-running jobs is at the end of the report.
TL;DR
- Amazon Layer-1 (rule engine) — fixed three classes of false-positive (Shipment-Subtotal double-count, “compatible with” brand mis-attribution, generic “mouse” matching computer mice); added 14 brands + 10 categories; shipped the address-audit diagnostic as the foundation metric.
- Pivot moment — Dan: “No point running hundreds of hours fixing when a single batch run will get us 95-98% correct.” Switched substrate from hand-rules to Haiku.
- Amazon Layer-2 (LLM enrichment) — one-shot Haiku 4.5 batch with rich JSON schema (10+ structured fields per item). In flight at time of writing; 898 of 4,351 done with 0.91 avg confidence, lifecycle/use_context dimensions opening up. Total cost projected ~$5.50.
- Audrey eras — diagnosed why 2017-2025 snapshots were rendering broken (Wayback was returning HTML wrappers instead of raw CSS/JS); fixed downloader to use cs_/js_/im_/id_ flavor modifiers + content-type validation. 2017 redeployed; 2018-2025 re-fetching in background.
- 7 architectural principles banked to memory — see end of report.
The Amazon arc — three cycles of nit-pick → rule → re-render → archive improves
Cycle 1 — Shipment-Subtotal double-count (€125.95 × 4 = €503.80 bug)
Dan flagged order 403-6740866-2197145 — a Dublin shipment of 4 items showing items_subtotal as €503.80. The actual shipment was €125.95.
Root cause: Amazon’s GDPR Order History.csv columns Shipment Item Subtotal / Shipment Item Subtotal Tax are shipment-level totals duplicated on every item row, not per-item amounts. A 4-item shipment has 4 rows each carrying the same €125.95, so naïve sum() multiplies by item count.
Fix: in pa_amazon_gdpr_process.py aggregate_orders(), compute line_subtotal = unit_price × max(qty, 1) per item, sum those instead. Similar for tax_total. Renderer at pa_purchases_dashboard.py:882 now displays line_subtotal instead of the shipment-level repeat.
Banked as: feedback_amazon_gdpr_shipment_columns_are_repeated.md with a sanity check (if items_subtotal == shipment_subtotal × item_count, the bug is present).
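A minimal sketch of the fix and the banked sanity check, assuming each CSV row has already been parsed into a dict. The field names (order_id, unit_price, qty, shipment_item_subtotal) and the helper looks_double_counted are illustrative, not the actual code in pa_amazon_gdpr_process.py, and the sketch assumes one shipment per order.

```python
from collections import defaultdict


def aggregate_orders(rows):
    """Sum per-item line subtotals instead of the repeated shipment column."""
    orders = defaultdict(
        lambda: {"items_subtotal": 0.0, "item_count": 0, "shipment_subtotal": None}
    )
    for row in rows:
        order = orders[row["order_id"]]
        # Per-item amount: unit price times quantity (a quantity of 0 counts as 1).
        order["items_subtotal"] += row["unit_price"] * max(row["qty"], 1)
        order["item_count"] += 1
        # Shipment-level total: identical on every row, so store it, never sum it.
        order["shipment_subtotal"] = row["shipment_item_subtotal"]
    return dict(orders)


def looks_double_counted(items_subtotal, shipment_subtotal, item_count):
    """True when a subtotal equals the shipment total multiplied by item count."""
    if item_count < 2:
        return False
    return abs(items_subtotal - shipment_subtotal * item_count) < 0.005


# Hypothetical unit prices that add up to the real shipment total of 125.95.
rows = [
    {"order_id": "403-6740866-2197145", "unit_price": price, "qty": 1,
     "shipment_item_subtotal": 125.95}
    for price in (50.00, 40.00, 25.95, 10.00)
]
order = aggregate_orders(rows)["403-6740866-2197145"]
naive_total = sum(row["shipment_item_subtotal"] for row in rows)
```

Here naive_total reproduces the 4 × 125.95 bug, while order["items_subtotal"] stays at the shipment's real value; looks_double_counted flags the former and not the latter.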
Cycle 2 — “compatible with Brand X, Y, Z” false-positives
Dan flagged order 114-3857118-8876251 — espresso cleaning tablets tagged Philips because the product name read “compatible with Breville, Bosch, Philips, Casabrews, KitchenAid, De’Longhi, Ninja”. Generic third-party accessory. Wrong brand.
Root cause: Layer-1 keyword matching has no context awareness. Any product mentioning “philips” gets tagged Philips.
Fix: added _brand_in_compat_context() to apply_tags(). Suppresses a brand match when every occurrence of the keyword falls inside an 80-char window after compatible with, compatible for, fits, fit for, works with, replacement for, substitute for, alternative to, for use with, designed for, made for, to fit, to replace, equivalent to, oem replacement.
Verified against:
- “Cleaning Tablets … compatible with Breville, Bosch, Philips, Casabrews…” → no brand tag ✓
- “Philips Handheld Steamer 1000 Series” → Philips brand kept ✓
- “Replacement Filter for use with Brita Maxtra+ Pitchers” → no Brita tag ✓
- “Brita Maxtra+ Replacement Water Filter Cartridges” → Brita brand kept ✓
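The suppression rule can be sketched as follows. This is an approximation under stated assumptions, not the real _brand_in_compat_context(): the phrase list is copied from the report, but the function name, the word-boundary regexes, and the case handling are guesses at how such a check might be written.

```python
import re

# Phrases from the report that introduce a compatibility list.
COMPAT_PHRASES = [
    "compatible with", "compatible for", "fits", "fit for", "works with",
    "replacement for", "substitute for", "alternative to", "for use with",
    "designed for", "made for", "to fit", "to replace", "equivalent to",
    "oem replacement",
]
WINDOW = 80  # characters after a phrase in which a brand mention is discounted

_COMPAT_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in COMPAT_PHRASES) + r")\b",
    re.IGNORECASE,
)


def brand_in_compat_context(name, brand_kw):
    """True when every occurrence of brand_kw sits inside a compat window."""
    windows = [(m.end(), m.end() + WINDOW) for m in _COMPAT_RE.finditer(name)]
    brand_re = re.compile(r"\b" + re.escape(brand_kw) + r"\b", re.IGNORECASE)
    hits = [m.start() for m in brand_re.finditer(name)]
    if not hits or not windows:
        return False  # nothing to suppress
    # A single mention outside every window keeps the brand tag.
    return all(any(lo <= hit < hi for lo, hi in windows) for hit in hits)
```

Requiring every occurrence to be inside a window is what keeps a title such as “Philips Steamer, compatible with Philips pads” tagged Philips: the leading mention falls outside any window.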
Cycle 3 — “mouse” matched every Logitech computer mouse
Dan flagged order 114-4457192-8701066 — Logitech B100 wired mouse tagged Pest Control because the Pest Control category had "mouse" as a keyword.
Root cause: keyword too broad. Computer mouse / mouse trap / mouse poison all share the word.
Fix: tightened Pest Control to require specific pest variants — "mouse trap", "mousetrap", "mouse bait", "mouse poison", "rodent deterrent", "pest deterrent", "insect deterrent", "fly trap", "fruit fly trap", "moth trap", "rodenticide". Removed bare "mouse".
Added Tech · Input Device category ("wired mouse", "wireless mouse", "bluetooth mouse", "gaming mouse", "computer mouse", "logitech mouse", "magic mouse", "trackball", "trackpad", "keyboard,", "mechanical keyboard", "wireless keyboard", "bluetooth keyboard") so computer peripherals have a real home.
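To illustrate why the tightened keywords resolve the collision, here is a small sketch using a subset of the keywords listed above. The dictionary layout, the categories_for helper, and the plain substring match are assumptions for illustration; the real rule engine matches through its own keyword function.

```python
# Subset of the report's keywords; the bare "mouse" entry is deliberately absent.
CATEGORY_KEYWORDS = {
    "Pest Control": [
        "mouse trap", "mousetrap", "mouse bait", "mouse poison",
        "rodenticide", "fly trap",
    ],
    "Tech · Input Device": [
        "wired mouse", "wireless mouse", "computer mouse",
        "trackball", "mechanical keyboard",
    ],
}


def categories_for(name):
    """Return every category with at least one keyword present in the name."""
    text = name.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
```

With multi-word keywords on both sides, a title like "Logitech B100 Wired Mouse" only lands in Tech · Input Device, and a title like "Humane Mouse Trap, 2 Pack" only lands in Pest Control.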
Cumulative Layer-1 rule additions today
- 14 new brands: Acer, Dualit, VonShef, Russell Hobbs, Silentnight, Hama, Victorinox, Fender, Helly Hansen, Continental Tyres, Bulk, Sevenhills Wholefoods, Drayton, TOAKS, Nokian, Vamvo, OKI, Brother, HP, Filtrete, Honeywell, Brita
- 10+ new categories: Tech · Laptop, Tech · TV, Tech · TV mount, Tech · Audio accessories, Tech · Projector, Tech · Input Device, Home · Furnishing, Home · Small Appliances, Home · Consumables, Home · HVAC controls, Apparel · Accessories, Outdoor · Camping, Equipment · Ladders, Vehicle · Winter Tires, Office · Consumables
- Expanded existing: Home · Lighting (downlight, under cabinet light, ceiling lights), Home · Bedding (mattress topper, mattress pad), Cycling (bike parts, clipless pedal, pedal plate, bike tour), Hardware · DIY (grout reviver, tile adhesive, grout), Home · Bathroom (kitchen tap, kitchen mixer, shower caddy)
- Plural tolerance: _kw_matches now uses \b<kw>(?:s|es)?\b so “ceiling light” matches “ceiling lights” without manual duplication
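The plural-tolerant pattern can be sketched as below. The regex is the one quoted in the report; the function name kw_matches, the use of re.escape, and case-insensitive matching are assumptions about the surrounding code.

```python
import re


def kw_matches(text, kw):
    """Match kw as a whole word or phrase, allowing a trailing 's' or 'es'."""
    pattern = r"\b" + re.escape(kw) + r"(?:s|es)?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

The trailing \b is what stops "ceiling light" from matching "ceiling lighting": after the optional suffix the next character must not be a word character.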
Per-bucket Misc% improvements after rule passes
| Ship-to | Before today | After Layer-1 |
|---|---|---|
| London | 78% Misc | 35% Misc (large rule-additions pass) |
| Solebury PO Box | n/a | 28% |
| New Hope | n/a | 52% |
| Brooklyn | n/a | 68% |
| Dublin | n/a | 70% |
Brooklyn and Dublin are next targets — Layer-2 batch results will inform whether to add more Layer-1 rules or let Layer-2 absorb the long tail.
Address-audit diagnostic — the FIRST metric
Banked as feedback_address_audit_first_diagnostic.md. Live at /amazon/_diagnostics/address-audit.html.
Principle: every order’s ship-to bucket assignment must be y/n verifiable BEFORE any tag-quality measurement. If bucket is wrong, every downstream metric (Misc%, brand coverage, ship-to totals) is corrupted.
Three confidence tiers per order:
- HIGH — clear single-keyword match ("6279 GREENHILL", "SOLEBURY", "BROOKLYN")
- MEDIUM — compound AND-logic match ("LONDON" + UK indicator)
- LOW / fallback — no keyword matched; suspicious, needs eyeballing
Page renders: per-bucket confidence summary table (green ≥95%, amber ≥80%, red <80%) plus a full list of every low-confidence order with raw address visible, so each is a candidate for adding a new bucket keyword.
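A rough sketch of the three tiers and the colour bands, assuming simple uppercase substring rules. The keyword-to-bucket mapping, the UK indicator string, the "Unassigned" fallback name, and both function names are hypothetical; only the tier definitions and the 95%/80% thresholds come from the report.

```python
# Single-keyword rules: one unambiguous token decides the bucket (HIGH).
HIGH_KEYWORDS = {
    "SOLEBURY": "Solebury PO Box",  # assumed mapping
    "BROOKLYN": "Brooklyn",         # assumed mapping
}

# Compound rules: keyword AND at least one indicator must be present (MEDIUM).
COMPOUND_RULES = [
    ("LONDON", ("UNITED KINGDOM",), "London"),  # indicator is hypothetical
]


def classify_address(raw_address):
    """Return (bucket, confidence tier) for a raw ship-to address."""
    text = raw_address.upper()
    for keyword, bucket in HIGH_KEYWORDS.items():
        if keyword in text:
            return bucket, "HIGH"
    for keyword, indicators, bucket in COMPOUND_RULES:
        if keyword in text and any(ind in text for ind in indicators):
            return bucket, "MEDIUM"
    # Nothing matched: fall back and surface the order for manual review.
    return "Unassigned", "LOW"


def confidence_band(pct):
    """Colour band for a per-bucket confidence percentage."""
    if pct >= 95:
        return "green"
    if pct >= 80:
        return "amber"
    return "red"
```

An address containing "London" without a UK indicator falls through to LOW rather than being guessed into the London bucket, which is the behaviour the audit page relies on to surface candidates for new keywords.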
The architectural pivot — Layer-2 Haiku batch
After three Layer-1 cycles cleared the high-frequency clusters, each further cycle would have gained only ~5%, which works out to hours of hand-tuning per percentage point. Dan called the inflection:
“No point running hundreds of hours fixing when a single batch run will get us I assume 95-98% correct, with layers of JSON friendly syntax for you to expand and grow the connected-threads.”
Then:
“Its the conditional-logic that needs strengthening. It’s difficult problem-assessment as the signal-to-noise can be harder to infer without solid secondary look-ups.”
Together: stop hand-iterating, switch substrate. One Haiku call per unique ASIN, rich JSON output.
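The "one call per unique ASIN" shape can be sketched without committing to any particular API client. In this sketch enrich_one is a hypothetical callable standing in for the model call and is expected to return a JSON string; the item fields (asin, name) and the function name are likewise assumptions, not the batch job's actual code.

```python
import json
from collections import defaultdict


def enrich_by_asin(items, enrich_one):
    """Call enrich_one once per unique ASIN and fan results back out to items."""
    by_asin = defaultdict(list)
    for item in items:
        by_asin[item["asin"]].append(item)

    results = {}
    for asin, group in by_asin.items():
        # Repeat purchases share one call; the first item's name is the input.
        raw_json = enrich_one(asin, group[0]["name"])
        results[asin] = json.loads(raw_json)  # raises on malformed output

    # Every purchase row gets the enrichment for its ASIN attached.
    return [{**item, "enrichment": results[item["asin"]]} for item in items]
```

Deduplicating first is what brings the call count down from the archive's 5,476 items to 4,351 unique ASINs, and parsing the response with json.loads means a malformed record fails loudly instead of being stored silently.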