6-quality content framework + audit/suggest/apply pipeline — parked 2026-05-17

DARE.CO.UK · PARKED SKETCH · 2026-05-18

Mirrored from ~/.claude/.../memory/project_content_quality_framework_parked.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

Sketch parked at Dan’s request. For pages with no body image (253 today) + low-content pages broadly, audit each on 6 qualities, generate LLM-suggested improvements, human-approve. Phase 1 = audit script. Triggers when the next round of content improvement starts.

Dan, 2026-05-17, while triaging the SEO audit’s “253 pages with no body image” leftover: “we need to sketch a plan for ‘pages with no body image’ and figure out what to do. I have high confidence that a script can inspect the context, look for prior intent, and do some gap-analysis on what creatively can be added.”

The 6-quality framework — proposed today, parked for build:

Dan’s original triad (categorical: every page has these by definition):
1. Why: the motivation, what question the page answers
2. What: the specific content / insight
3. Framed-perspective: the writer’s POV / register

My 3 refinements (demonstrable: detect-and-count, auditable):
4. EVIDENCE: every assertion has a receipt, whether a load-bearing image, a primary-source quote, or a specific data anchor (named person + year + place). Without it, the page is assertion-shaped, not knowledge-shaped.
5. LATERAL: every page connects outward, with ≥1 internal <a> to another dare.co.uk page AND ≥1 named entity outside the immediate topic. Activates the archive’s cross-network; orphaned pages dilute it.
6. CARRY-AWAY: every page leaves the reader with a quotable line (an aphorism, not a summary). The post-tab-close test: would a reader quote this at a dinner party?

Why mine improve Dan’s:
- His 3 are categorical (existence-of, can’t fail). Mine are demonstrable (detect-and-count, can score).
- Mine fit dare.co.uk’s archive nature specifically: evidence makes assertion knowable, lateral activates the cross-network, carry-away is the line that travels.
- Each maps to a script signal: EVIDENCE is HTML-detectable, LATERAL is link/entity detection, CARRY-AWAY is an LLM call against the closing paragraph.

The 3-script pipeline (same audit-then-batch shape as today’s SEO work):

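[Pipeline code block not preserved in this render. Per the title, the three scripts are audit, suggest, and apply.]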

Per-quality detection rules (audit Phase 1, mechanical):

| Quality | Signal |
| --- | --- |
| Why | First paragraph contains a verb + a “because” / “matters” / “happens when” framing OR the kicker contains the page’s claim. LLM-graded for nuance. |
| What | Body length > 80 words AND contains ≥2 specific nouns (entities, places, years). |
| Framed-perspective | First-person register OR distinctive opinion-marker phrases. LLM-graded; hardest to detect mechanically. |
| Evidence | ≥1 of: `<img>` in body, `<blockquote>`, OR specific anchor (proper noun + 4-digit year). |
| Lateral | ≥1 internal `<a href="/...">` + ≥1 named entity from a different section/topic. |
| Carry-away | Closing paragraph contains a quotable line. LLM-graded; benchmark = the dare brand-voice four pillars. |
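
The mechanical rows of the table above (What, Evidence, and the link half of Lateral) could be approximated with the standard library alone. What follows is a minimal sketch, not the parked audit script: `BodyScan`, `audit_body`, and both regexes are hypothetical names, and the capitalised-word heuristic is a crude stand-in for real entity detection. It counts only root-relative links (`href="/..."`), as in the table.

```python
import re
from html.parser import HTMLParser

MIN_WORDS = 80  # "What" threshold from the table
# ASSUMPTION: a "year" is any 4-digit number from 1500 to 2099.
YEAR_RE = re.compile(r"\b(?:1[5-9]\d{2}|20\d{2})\b")
# ASSUMPTION: a capitalised word that does not open a sentence is a proper noun.
PROPER_RE = re.compile(r"(?<=[a-z,] )[A-Z][a-z]+")


class BodyScan(HTMLParser):
    """Collects the tags and text the mechanical checks need."""

    def __init__(self):
        super().__init__()
        self.has_img = False
        self.has_blockquote = False
        self.internal_links = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.has_img = True
        elif tag == "blockquote":
            self.has_blockquote = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Root-relative only; "//host/..." is protocol-relative, so external.
            if href.startswith("/") and not href.startswith("//"):
                self.internal_links += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def audit_body(html: str) -> dict:
    """Return pass/fail flags for the mechanically detectable qualities."""
    scan = BodyScan()
    scan.feed(html)
    scan.close()
    text = " ".join(" ".join(scan.text_parts).split())
    words = text.split()
    years = YEAR_RE.findall(text)
    proper = PROPER_RE.findall(text)
    specifics = len(set(years)) + len(set(proper))
    return {
        "what": len(words) > MIN_WORDS and specifics >= 2,
        "evidence": scan.has_img or scan.has_blockquote or bool(years and proper),
        "lateral_link": scan.internal_links >= 1,
    }
```

The entity half of Lateral and the three LLM-graded qualities are deliberately left out; they need either a section-aware entity list or a model call.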

Cost estimate (Phase 2, LLM-graded):
- 253 affected pages × (~1k input tokens + ~500 output tokens) per page, at Sonnet-4.6 pricing ≈ a few dollars per full-portfolio audit run. Negligible.
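
The arithmetic behind that estimate can be checked with a few lines. The page and token counts come from the note; the two per-million-token prices are placeholder assumptions, not quoted rates, and should be replaced with the current price sheet before relying on the result.

```python
PAGES = 253
INPUT_TOKENS_PER_PAGE = 1_000
OUTPUT_TOKENS_PER_PAGE = 500

# ASSUMPTION: illustrative USD prices per million tokens, not real quotes.
ASSUMED_INPUT_USD_PER_MTOK = 3.00
ASSUMED_OUTPUT_USD_PER_MTOK = 15.00


def audit_run_cost(
    pages=PAGES,
    in_tok=INPUT_TOKENS_PER_PAGE,
    out_tok=OUTPUT_TOKENS_PER_PAGE,
    in_price=ASSUMED_INPUT_USD_PER_MTOK,
    out_price=ASSUMED_OUTPUT_USD_PER_MTOK,
):
    """Cost in USD of one LLM-graded pass over `pages` pages."""
    input_cost = pages * in_tok / 1_000_000 * in_price
    output_cost = pages * out_tok / 1_000_000 * out_price
    return input_cost + output_cost
```

Under the placeholder rates this comes to roughly $2.66 per run, which is consistent with "a few dollars".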

Breadth + depth audit run 2026-05-17 (dare_content_breadth_audit.py, registered as 7th hygiene check):

Surveyed all 697 article-shape pages. Bucketed by body word count:

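| Bucket | Count | % of total | Missing body image (% of bucket) |
| --- | --- | --- | --- |
| Stub (<30 words) | 244 | 35% | 104 (43%) |
| Brief (30-99 words) | 266 | 38% | 87 (33%) |
| Medium (100-399 words) | 169 | 24% | 57 (34%) |
| Long-form (400+ words) | 18 | 3% | 5 (28%) |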
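
The bucketing above reduces to a threshold function plus a tally. This is a sketch of that shape only; `bucket_for` and `summarise` are hypothetical names, and the real `dare_content_breadth_audit.py` may count words or detect images differently.

```python
from collections import Counter


def bucket_for(word_count: int) -> str:
    """Map a body word count onto the four classes in the table."""
    if word_count < 30:
        return "stub"
    if word_count < 100:
        return "brief"
    if word_count < 400:
        return "medium"
    return "long-form"


def summarise(pages):
    """pages: iterable of (word_count, has_body_image) tuples."""
    totals, missing = Counter(), Counter()
    for word_count, has_body_image in pages:
        bucket = bucket_for(word_count)
        totals[bucket] += 1
        if not has_body_image:
            missing[bucket] += 1
    n = sum(totals.values())
    return {
        bucket: {
            "count": totals[bucket],
            "share_of_total": totals[bucket] / n,
            "missing_image": missing[bucket],
        }
        for bucket in totals
    }
```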

Key finding: the archive is fundamentally short-form. Median page = 56 words. The “253 pages missing body image” cohort from the SEO audit is mostly stubs (104) + briefs (87) — together 75% of the no-image total. The framework’s quality bar must shift per content-class:

Stub class (244 pages, 35% of all) — single-thought capture. Evidence/Lateral/Carry-away tests don’t fit; this class is BY DEFINITION the stripped-down register. Maybe they want consolidation rather than improvement — themed collections that aggregate stubs into something coherent (e.g. observations 2010 / cinema-noted / etc).
Brief class (266 pages, 38%) — short observational posts. EVIDENCE + LATERAL apply lightly; CARRY-AWAY is the right test for register.
Medium class (169 pages, 24%) — actual short articles. Full 6-quality framework applies.
Long-form class (18 pages, 3%) — load-bearing articles. The 5 long-form pages without body images are the real creative-content gap. This is where the SUGGEST phase has the highest leverage.
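
One way to encode the per-class quality bar is a lookup table the audit consults before scoring. The sketch below is an interpretation, not a settled design: the `Weight` levels and all names are hypothetical, the triad is assumed to apply to every class, and long-form is assumed to take the full framework since it is described as load-bearing.

```python
from enum import Enum


class Weight(Enum):
    SKIP = 0   # test does not fit this class
    LIGHT = 1  # test applies, with a relaxed bar
    FULL = 2   # test applies as written


QUALITIES = ("why", "what", "framed_perspective", "evidence", "lateral", "carry_away")


def _bar(evidence, lateral, carry_away):
    # ASSUMPTION: the categorical triad is always scored at full weight.
    return {
        "why": Weight.FULL,
        "what": Weight.FULL,
        "framed_perspective": Weight.FULL,
        "evidence": evidence,
        "lateral": lateral,
        "carry_away": carry_away,
    }


QUALITY_BAR = {
    "stub": _bar(Weight.SKIP, Weight.SKIP, Weight.SKIP),
    "brief": _bar(Weight.LIGHT, Weight.LIGHT, Weight.FULL),
    "medium": _bar(Weight.FULL, Weight.FULL, Weight.FULL),
    "long-form": _bar(Weight.FULL, Weight.FULL, Weight.FULL),  # assumed
}


def required_qualities(bucket: str) -> list:
    """Qualities that should be scored at all for this content class."""
    return [q for q in QUALITIES if QUALITY_BAR[bucket][q] is not Weight.SKIP]
```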

Section distribution (top 5):
- observations: 213 pages, median 45 words
- cinema: 85 pages, median 50 words
- daring-acts: 66 pages, median 56 words
- people: 45 pages, median 76 words
- fine-arts: 44 pages, median 26 words

The data justifies a content-strategy distinction: archive-section pages (observations / cinema / fine-arts) are stub-shaped by intent; methods / culture / field-notes / people are where long-form work happens.

Phase 2 (suggest) targeting strategy (informed by the audit):

First target: 5 long-form pages without body image. Highest leverage, smallest scope. LLM-suggest an image per page; Dan approves; ship.
Second target: 57 medium-class pages without body image. Same framework, larger batch.
Skip the stub + brief class entirely for image-suggestion. Instead, propose a consolidation script that groups them into themed collections (separate project — project_dare_stub_consolidation_parked.md candidate).
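
The targeting order above could be expressed as a small planner over the audit output. This is a sketch under assumptions: the record shape (`path`, `bucket`, `has_body_image`) and the name `plan_phase2` are hypothetical, and every stub and brief page is treated as a consolidation candidate regardless of image status.

```python
# Lower number = earlier in the image-suggestion queue.
IMAGE_PRIORITY = {"long-form": 0, "medium": 1}
CONSOLIDATE = {"stub", "brief"}


def plan_phase2(pages):
    """Split audited pages into an image-suggestion queue and a consolidation pool.

    pages: list of dicts with 'path', 'bucket', and 'has_body_image' keys.
    """
    image_queue = sorted(
        (
            p
            for p in pages
            if p["bucket"] in IMAGE_PRIORITY and not p["has_body_image"]
        ),
        key=lambda p: (IMAGE_PRIORITY[p["bucket"]], p["path"]),
    )
    consolidation = [p for p in pages if p["bucket"] in CONSOLIDATE]
    return image_queue, consolidation
```

Keeping the two outputs separate mirrors the note: image suggestion and stub consolidation are different projects with different approval steps.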

Resume conditions (build out when):
- The next round of content improvement begins (Dan flagged this as the foundational bedrock for dare.co.uk’s archive)
- audrey/dogwood/gf.cx need the same pattern applied to their content
- A specific page-quality concern surfaces in another audit run (e.g. SEO audit’s 253-pages-no-body-image leftover finds an obvious creative-content gap)

Inputs the suggester would need (Phase 2):
- The page’s current body text (HTML stripped)
- The page’s section (architecture / cinema / etc.) for register context
- 2-3 canonical “good” sibling pages from the same section (to model the register)
- The brand-voice memory (feedback_dare_brand_voice_four_pillars.md)
- Optional: the page’s git log for prior intent (when last edited, what was the commit message)
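
Those inputs map naturally onto a single record that is assembled once per page and then rendered into a prompt. The sketch below assumes that shape; `SuggesterInput`, `build_prompt`, and the section labels in the prompt are hypothetical, and no particular LLM client is implied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuggesterInput:
    body_text: str            # current body, HTML already stripped
    section: str              # e.g. "cinema", used for register context
    sibling_texts: List[str]  # 2-3 canonical "good" pages from the same section
    brand_voice: str          # contents of the brand-voice memory file
    git_log: Optional[str] = None  # optional prior-intent hint


def build_prompt(inp: SuggesterInput) -> str:
    """Render the suggester inputs into one plain-text prompt."""
    if not 2 <= len(inp.sibling_texts) <= 3:
        raise ValueError("expected 2-3 sibling pages")
    parts = [
        f"SECTION: {inp.section}",
        f"BRAND VOICE:\n{inp.brand_voice}",
    ]
    for i, sibling in enumerate(inp.sibling_texts, start=1):
        parts.append(f"GOOD SIBLING {i}:\n{sibling}")
    if inp.git_log:
        parts.append(f"PRIOR INTENT (git log):\n{inp.git_log}")
    # The page under review goes last so it sits closest to the instruction.
    parts.append(f"PAGE TO IMPROVE:\n{inp.body_text}")
    return "\n\n".join(parts)
```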

Sibling memories:
- feedback_audit_first_then_batch.md: methodology that makes this pipeline shape work.
- feedback_audit_tripwire_pattern.md: the daily-hygiene wiring this would join.
- feedback_dare_brand_voice_four_pillars.md: the LLM register / quality benchmark.
- feedback_dare_content_strategy.md: what content qualifies as load-bearing.
- feedback_reports_evaluate_visuals.md: visual-as-evidence ties into the EVIDENCE quality.
- user_lost_cats_stray_cats_archival_recovery.md: the recovery thread this would compound with (recover the image, then ensure it’s load-bearing not decorative).

Portfolio applicability: the same shape is portable across sites via a --repo <path> flag. dogwood / audrey / gf.cx and future client work would inherit the framework; only the per-brand register and the canonical-sibling sample differ.
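
A command-line surface for that portability might look like the sketch below. Only `--repo` is named in the note; `--brand-voice` and `--sibling` are hypothetical flags added to show where the two per-site differences could plug in, and the program name is a placeholder.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse options for a site-portable audit run."""
    parser = argparse.ArgumentParser(prog="content_quality_audit")
    parser.add_argument(
        "--repo",
        type=Path,
        default=Path("."),
        help="root of the site repository to audit",
    )
    # HYPOTHETICAL: per-brand register file (differs per site).
    parser.add_argument(
        "--brand-voice",
        type=Path,
        default=None,
        help="path to the brand-voice memory for this site",
    )
    # HYPOTHETICAL: repeatable flag for the canonical "good" sibling pages.
    parser.add_argument(
        "--sibling",
        type=Path,
        action="append",
        default=[],
        help="path to a canonical sibling page (repeat for 2-3 pages)",
    )
    return parser.parse_args(argv)
```

With this shape, switching sites means changing only the three paths, for example `--repo ../dogwood --brand-voice voice.md --sibling a.html --sibling b.html`.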

Source: parked_sketch_content_quality_framework_2026-05-17.md · Rendered 2026-05-18 12:53