6-quality content framework + audit/suggest/apply pipeline — parked 2026-05-17
DARE.CO.UK · PARKED SKETCH · 2026-05-18
Mirrored from ~/.claude/.../memory/project_content_quality_framework_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
Sketch parked at Dan’s request. For pages with no body image (253 today) + low-content pages broadly, audit each on 6 qualities, generate LLM-suggested improvements, human-approve. Phase 1 = audit script. Triggers when the next round of content improvement starts.
Dan, 2026-05-17, while triaging the SEO audit’s “253 pages with no body image” leftover: “we need to sketch a plan for ‘pages with no body image’ and figure out what to do. I have high confidence that a script can inspect the context, look for prior intent, and do some gap-analysis on what creatively can be added.”
The 6-quality framework — proposed today, parked for build:
Dan’s original triad (categorical: every page has these by definition):
1. Why: the motivation, what question the page answers
2. What: the specific content / insight
3. Framed-perspective: the writer’s POV / register
My 3 refinements (demonstrable — detect-and-count, auditable):
4. EVIDENCE — every assertion has a receipt: a load-bearing image, a primary-source quote, or a specific data anchor (named person + year + place). Without it, the page is assertion-shaped, not knowledge-shaped.
5. LATERAL — every page connects outward: ≥1 internal <a> to another dare.co.uk page AND ≥1 named entity outside the immediate topic. Activates the archive’s cross-network; orphaned pages dilute it.
6. CARRY-AWAY — every page leaves the reader with a quotable line — not a summary, an aphorism. The post-tab-close test: would a reader quote this at a dinner party?
Why mine improve Dan’s:
- His 3 are categorical (existence-of, can’t fail). Mine are demonstrable (detect-and-count, can score).
- Mine fit dare.co.uk’s archive nature specifically: evidence makes assertion knowable, lateral activates the cross-network, carry-away is the line that travels.
- Each maps to a script signal: EVIDENCE is HTML-detectable, LATERAL is link/entity detection, CARRY-AWAY is an LLM call against the closing paragraph.
The 3-script pipeline (same audit-then-batch shape as today’s SEO work):
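1. Audit: score each page on the 6 qualities (Phase 1, the audit script).
2. Suggest: generate LLM-suggested improvements for the gaps the audit finds (Phase 2).
3. Apply: ship only the suggestions a human has approved.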
Per-quality detection rules (audit Phase 1, mechanical):
| Quality | Signal |
|---|---|
| Why | First paragraph contains a verb + a “because” / “matters” / “happens when” framing OR the kicker contains the page’s claim. LLM-graded for nuance. |
| What | Body length > 80 words AND contains ≥2 specific nouns (entities, places, years). |
| Framed-perspective | First-person register OR distinctive opinion-marker phrases. LLM-graded — hardest to detect mechanically. |
| Evidence | ≥1 of: <img> in body, <blockquote>, OR specific anchor (proper noun + 4-digit year). |
| Lateral | ≥1 internal <a href="/..."> + ≥1 named entity from a different section/topic. |
| Carry-away | Closing paragraph contains a quotable line. LLM-graded; benchmark = the dare brand-voice four pillars. |
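The mechanical rows of the table can be sketched in a few lines. What follows is a minimal illustration, not the parked script itself: it covers only the Evidence row and the internal-link half of the Lateral row, and it assumes the input is the page's body HTML. The names `BodySignals`, `audit_body` and `ANCHOR_RE` are hypothetical, and the anchor regex is a deliberately crude stand-in for "proper noun + 4-digit year" (it will also fire on a capitalised sentence opener).

```python
import re
from html.parser import HTMLParser

# Assumption: a capitalised word followed, within the same sentence,
# by a year between 1500 and 2099 counts as a "specific anchor".
ANCHOR_RE = re.compile(r"\b[A-Z][a-z]+\b[^.?!]{0,80}?\b(1[5-9]\d{2}|20\d{2})\b")


class BodySignals(HTMLParser):
    """Counts the tags the Evidence and Lateral rows look for."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.blockquotes = 0
        self.internal_links = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "blockquote":
            self.blockquotes += 1
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Root-relative links only; "//host/..." is protocol-relative.
            if href.startswith("/") and not href.startswith("//"):
                self.internal_links += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def audit_body(html):
    """Return pass/fail flags for the two HTML-detectable signals."""
    parser = BodySignals()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    has_anchor = bool(ANCHOR_RE.search(text))
    return {
        "evidence": parser.images > 0 or parser.blockquotes > 0 or has_anchor,
        "lateral_link": parser.internal_links > 0,
    }
```

The named-entity half of Lateral and the three LLM-graded rows are left out on purpose, since they need either an entity list per section or a model call.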
Cost estimate (Phase 2, LLM-graded): 253 affected pages × (~1k input tokens + ~500 output tokens) at Sonnet-4.6 pricing ≈ a few dollars per full-portfolio audit run. Negligible.
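The arithmetic behind that estimate can be written out as follows. The token counts come from the note; the per-million-token rates are parameters because the note does not state them, and the function name `run_cost` is hypothetical.

```python
PAGES = 253
INPUT_TOKENS_PER_PAGE = 1_000   # "~1k input tokens" from the estimate
OUTPUT_TOKENS_PER_PAGE = 500    # "~500 output tokens" from the estimate


def run_cost(input_usd_per_mtok, output_usd_per_mtok):
    """Cost in USD of one full audit run at the given per-million-token rates."""
    input_tokens = PAGES * INPUT_TOKENS_PER_PAGE    # 253,000 tokens
    output_tokens = PAGES * OUTPUT_TOKENS_PER_PAGE  # 126,500 tokens
    total = input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    return total / 1_000_000
```

With placeholder rates of 3 and 15 USD per million tokens (an assumption, not a confirmed price), `run_cost(3.0, 15.0)` comes to roughly 2.66 USD, which is consistent with "a few dollars".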
Breadth + depth audit run 2026-05-17 (dare_content_breadth_audit.py, registered as 7th hygiene check):
Surveyed all 697 article-shape pages. Bucketed by body word count:
| Bucket | Count | % of total | Missing body image |
|---|---|---|---|
| Stub (<30 words) | 244 | 35% | 104 (43%) |
| Brief (30-99) | 266 | 38% | 87 (33%) |
| Medium (100-399) | 169 | 24% | 57 (34%) |
| Long-form (400+) | 18 | 3% | 5 (28%) |
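For reference, the bucket boundaries in the table reduce to a small function. This is a sketch of the thresholds only, under the assumption that each page's body word count is already known; `bucket` and `bucket_counts` are hypothetical names, not taken from dare_content_breadth_audit.py.

```python
from collections import Counter


def bucket(word_count):
    """Map a body word count to the content class used in the table."""
    if word_count < 30:
        return "stub"
    if word_count < 100:
        return "brief"
    if word_count < 400:
        return "medium"
    return "long-form"


def bucket_counts(word_counts):
    """Tally how many pages fall into each class."""
    return Counter(bucket(n) for n in word_counts)
```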
Key finding: the archive is fundamentally short-form. Median page = 56 words. The “253 pages missing body image” cohort from the SEO audit is mostly stubs (104) + briefs (87) — together 75% of the no-image total. The framework’s quality bar must shift per content-class:
- Stub class (244 pages, 35% of all) — single-thought capture. Evidence/Lateral/Carry-away tests don’t fit; this class is BY DEFINITION the stripped-down register. Maybe they want consolidation rather than improvement — themed collections that aggregate stubs into something coherent (e.g. observations 2010 / cinema-noted / etc).
- Brief class (266 pages, 38%) — short observational posts. EVIDENCE + LATERAL apply lightly; CARRY-AWAY is the right test for register.
- Medium class (169 pages, 24%) — actual short articles. Full 6-quality framework applies.
- Long-form class (18 pages, 3%) — load-bearing articles. The 5 long-form pages without body images are the real creative-content gap. This is where the SUGGEST phase has the highest leverage.
Section distribution (top 5):
- observations: 213 pages, median 45 words
- cinema: 85 pages, median 50 words
- daring-acts: 66 pages, median 56 words
- people: 45 pages, median 76 words
- fine-arts: 44 pages, median 26 words
The data justifies a content-strategy distinction: archive-section pages (observations / cinema / fine-arts) are stub-shaped by intent; methods / culture / field-notes / people are where long-form work happens.
Phase 2 (suggest) targeting strategy (informed by the audit):
- First target: 5 long-form pages without body image. Highest leverage, smallest scope. LLM-suggest an image per page; Dan approves; ship.
- Second target: 57 medium-class pages without body image. Same framework, larger batch.
- Skip the stub + brief class entirely for image-suggestion. Instead, propose a consolidation script that groups them into themed collections (separate project: project_dare_stub_consolidation_parked.md candidate).
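The targeting order above amounts to a filter over the audit output. A minimal sketch, assuming each audited page is a dict with hypothetical `word_count` and `has_body_image` keys (the real audit output format is not specified in this note):

```python
def phase2_targets(pages):
    """Split no-image pages into first (long-form) and second (medium) targets.

    Stub and brief pages are skipped for image suggestion.
    """
    no_image = [p for p in pages if not p["has_body_image"]]
    first = [p for p in no_image if p["word_count"] >= 400]
    second = [p for p in no_image if 100 <= p["word_count"] < 400]
    return first, second
```

Run against the 2026-05-17 audit data, this split should yield 5 pages in the first list and 57 in the second, matching the bucket table.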
Resume conditions (build out when):
- The next round of content improvement begins (Dan flagged this as the foundational bedrock for dare.co.uk’s archive)
- audrey/dogwood/gf.cx need the same pattern applied to their content
- A specific page-quality concern surfaces in another audit run (e.g. SEO audit’s 253-pages-no-body-image leftover finds an obvious creative-content gap)
Inputs the suggester would need (Phase 2):
- The page’s current body text (HTML stripped)
- The page’s section (architecture / cinema / etc.) for register context
- 2-3 canonical “good” sibling pages from the same section (to model the register)
- The brand-voice memory (feedback_dare_brand_voice_four_pillars.md)
- Optional: the page’s git log for prior intent (when last edited, what was the commit message)
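One way to pin these inputs down is a small record type. The sketch below is illustrative only: `SuggesterInput` and its field names are hypothetical, and the 2-3 sibling check simply encodes the range stated in the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuggesterInput:
    body_text: str              # current body text, HTML already stripped
    section: str                # e.g. "cinema", used for register context
    sibling_pages: List[str]    # 2-3 canonical "good" pages from the same section
    brand_voice: str            # contents of the brand-voice memory file
    git_log: Optional[str] = None  # optional prior-intent context

    def __post_init__(self):
        # Enforce the 2-3 sibling range named in the input list.
        if not 2 <= len(self.sibling_pages) <= 3:
            raise ValueError("expected 2-3 canonical sibling pages")
```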
Sibling memories:
- feedback_audit_first_then_batch.md — methodology that makes this pipeline shape work.
- feedback_audit_tripwire_pattern.md — the daily-hygiene wiring this would join.
- feedback_dare_brand_voice_four_pillars.md — the LLM register / quality benchmark.
- feedback_dare_content_strategy.md — what content qualifies as load-bearing.
- feedback_reports_evaluate_visuals.md — visual-as-evidence ties into the EVIDENCE quality.
- user_lost_cats_stray_cats_archival_recovery.md — the recovery thread this would compound with (recover the image, then ensure it’s load-bearing not decorative).
Portfolio applicability: the same shape ports across sites with a --repo <path> flag. dogwood / audrey / gf.cx and future client work would inherit the framework; only the per-brand register and the canonical-sibling sample differ.
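If the scripts are written in Python, the flag could be wired up with argparse along these lines. This is a sketch under that assumption; `parse_args` and the default of the current directory are illustrative choices, not something the note specifies.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; --repo selects which site checkout to audit."""
    parser = argparse.ArgumentParser(description="Content-quality audit (sketch)")
    parser.add_argument(
        "--repo",
        type=Path,
        default=Path("."),  # assumption: default to the current checkout
        help="path to the site repository to audit",
    )
    return parser.parse_args(argv)
```

The per-brand register and sibling sample would then be looked up relative to `args.repo`, so the audit logic itself stays unchanged between sites.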