COUPLING metrics — day/week/month tracking of Opus-orchestrates-Sonnet-subagent ratio on claude.gf.cx

DARE.CO.UK · PARKED SKETCH · 2026-06-11

Mirrored from ~/.claude/.../memory/parked_sketch_coupling_metrics_day_week_month_2026-06-06.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

As the model-tier split pattern (Opus reviews, Sonnet subagents heavy-lift) takes hold across happiness.gf.cx voicing + future editorial work, the visible signal of “is the operating model actually saving Opus tokens” is a COUPLING ratio — Sonnet-subagent share of total work — tracked over rolling 24h / 7d / 30d windows. Today’s pilot (Ch1 + Ch2 of Vitality Blueprint) added ~62k Sonnet tokens; they’re already in the 72% weekly envelope but invisible as Sonnet contribution. The metric makes the coupling visible and gives Dan a forward indicator of efficiency.

Sketch: add a COUPLING metric to claude_usage_dashboard.py (and surface on claude.gf.cx + the status.gf.cx Claude cockpit card) — the share of work delivered via Sonnet subagents vs Opus directly, tracked over rolling 24h / 7d / 30d windows.

Why (Dan 2026-06-06):

“we should track COUPLING metrics day-week-month”
“looking forward to catching some sonnet-percentage from COUPLING models”

Today’s pilot validated the Opus-orchestrates-Sonnet-subagent operating model (memo: feedback_happiness_pipeline_model_tier_split_2026-06-06). Two Ch1+Ch2 Sonnet dispatches added ~62k Sonnet tokens to the weekly bucket — they’re already counted in the 72% envelope but rolled into a single aggregate number. Without a coupling view, there’s no signal that “the right kind of work is being done by the right model” — only that total spend is whatever it is.

The metric becomes a leading indicator: as coupling rises, Opus tokens stretch further for the same output volume. It’s the operational mirror of feedback_compounding_kb_is_the_kernel applied to model spend.

Metric definition:

coupling_ratio(window) = Sonnet-subagent tokens in window / total tokens (all models) in window

Higher is “more leveraged.” Track:

Day (24h rolling) — current-shift signal; volatile but immediate feedback
Week (7d rolling) — pairs with the weekly envelope; the load-bearing window
Month (30d rolling) — trend; should drift upward as the pattern compounds across more editorial work
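A minimal sketch of the ratio over the three windows, assuming coupling means Sonnet tokens divided by total tokens across all models. The record shape `(timestamp, family, tokens)` and the names `WINDOWS`, `coupling_ratio` and `coupling_ratios` are hypothetical, not taken from claude_usage_dashboard.py.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record shape: (timestamp, model_family, tokens).
WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}

def coupling_ratio(records, now, window):
    """Sonnet share of all tokens whose timestamp falls in [now - window, now]."""
    cutoff = now - window
    total = 0
    sonnet = 0
    for ts, family, tokens in records:
        if cutoff <= ts <= now:
            total += tokens
            if family == "sonnet":
                sonnet += tokens
    # An empty window returns None so it cannot be mistaken for "0% coupled".
    return sonnet / total if total else None

def coupling_ratios(records, now):
    """Compute the ratio for every window, keyed like the JSON fields in this sketch."""
    return {
        f"coupling_ratio_{name}": coupling_ratio(records, now, window)
        for name, window in WINDOWS.items()
    }
```

Returning `None` for an empty window is a design choice in this sketch: a quiet day then renders as "no data" rather than as a coupling drop.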

Optional supporting metrics for the card:

Coupled tokens absolute (raw Sonnet subagent count) — how much heavy-lifting was outsourced
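Cost-multiplier saved — coupled_tokens × opus_$/tok ≈ what Opus would have cost for the same output; subtract the actual Sonnet spend (coupled_tokens × sonnet_$/tok) to get the saving. The multiplier itself is opus_$/tok / sonnet_$/tok.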
Coupling streak — consecutive days where Sonnet share was ≥X% (gamifies the habit)
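The supporting metrics could be computed roughly as below. This sketch reads "cost-multiplier saved" as the gap between what Opus would have charged and what Sonnet did charge for the same tokens. The function names are hypothetical, and prices are passed in as parameters rather than hard-coded, since no rates are given here.

```python
def opus_equivalent_cost(coupled_tokens, opus_price_per_token):
    """What the coupled (Sonnet) tokens would have cost at the Opus rate."""
    return coupled_tokens * opus_price_per_token

def cost_saved(coupled_tokens, opus_price_per_token, sonnet_price_per_token):
    """Opus-equivalent cost minus the actual Sonnet spend for the same tokens."""
    return coupled_tokens * (opus_price_per_token - sonnet_price_per_token)

def coupling_streak(daily_ratios, threshold):
    """Count consecutive days, ending with the most recent, at or above threshold.

    daily_ratios is ordered oldest-first; a None entry (no usage) ends the streak.
    """
    streak = 0
    for ratio in reversed(daily_ratios):
        if ratio is None or ratio < threshold:
            break
        streak += 1
    return streak
```

This treats input and output tokens as one price, which is a simplification; a real build would probably price the two separately.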

Data source:

Anthropic admin API already returns per-model cost breakdown — auto_calibrate_from_admin_api() in claude_usage_dashboard.py calls it for ratio calibration. The breakdown is there; the dashboard just doesn’t surface it. So this is parsing-and-rendering work, not API integration.
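The parsing half might look like the sketch below, which buckets per-model rows into families. The row shape (dicts with `model` and `tokens` keys) is an assumption for illustration only. The actual response handled by auto_calibrate_from_admin_api() is not shown here and may differ, including whether it carries tokens or only cost.

```python
from collections import defaultdict

FAMILIES = ("opus", "sonnet", "haiku")

def model_family(model_name):
    """Map a full model identifier to its family by substring match."""
    lowered = model_name.lower()
    for family in FAMILIES:
        if family in lowered:
            return family
    return "other"

def totals_by_family(rows, value_key="tokens"):
    """Sum one numeric field per model family.

    rows: iterable of dicts with a 'model' key and a numeric value_key
    (hypothetical shape, not the admin API's documented schema).
    """
    totals = defaultdict(int)
    for row in rows:
        totals[model_family(row["model"])] += row[value_key]
    return dict(totals)
```

Passing `value_key="cost"` would aggregate spend instead of tokens with the same function.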

Build sketch:

Extend claude-usage-latest.json schema with per-model fields:

    "models": {
      "opus": {"tokens_24h": ..., "tokens_7d": ..., "tokens_30d": ..., "cost_24h": ...},
      "sonnet": {"tokens_24h": ..., "tokens_7d": ..., "tokens_30d": ..., "cost_24h": ...},
      "haiku": { ... }
    },
    "coupling_ratio_24h": 0.18,
    "coupling_ratio_7d": 0.09,
    "coupling_ratio_30d": 0.04
Render on claude.gf.cx as a sub-bar beneath the main weekly envelope:

    weekly · all models ████████████████░░░░ 72%
    opus ██████████████░░░░░░ 91%
    sonnet (coupled) ██░░░░░░░░░░░░░░░░░░ 9%
Add coupling card to the status.gf.cx hub: three numbers (24h / 7d / 30d), trend arrow, gamified streak.
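The first two build steps can be sketched as follows: one function assembles the per-model JSON fragment, another draws a 20-cell text bar. The helper names, the input shape and the rounding to two decimals are assumptions. Cost fields are left out for brevity.

```python
import json

WINDOW_NAMES = ("24h", "7d", "30d")

def build_usage_fragment(tokens):
    """Build the per-model schema fragment.

    tokens: {family: {"24h": n, "7d": n, "30d": n}} (hypothetical input shape).
    """
    fragment = {
        "models": {
            family: {f"tokens_{w}": counts[w] for w in WINDOW_NAMES}
            for family, counts in tokens.items()
        }
    }
    for w in WINDOW_NAMES:
        total = sum(counts[w] for counts in tokens.values())
        sonnet = tokens.get("sonnet", {}).get(w, 0)
        # None marks a window with no usage at all.
        fragment[f"coupling_ratio_{w}"] = round(sonnet / total, 2) if total else None
    return fragment

def render_bar(label, fraction, width=20):
    """Render 'label ███░░░ NN%' with the fraction clamped to the bar width."""
    filled = max(0, min(width, round(fraction * width)))
    return f"{label} {'█' * filled}{'░' * (width - filled)} {round(fraction * 100)}%"

if __name__ == "__main__":
    sample = {
        "opus": {"24h": 820, "7d": 9100, "30d": 96000},
        "sonnet": {"24h": 180, "7d": 900, "30d": 4000},
    }
    fragment = build_usage_fragment(sample)
    print(json.dumps(fragment, indent=2))
    print(render_bar("sonnet (coupled)", fragment["coupling_ratio_7d"]))
```

The sample numbers are invented so that the ratios land on the 0.18 / 0.09 / 0.04 values used in the schema example; they are not real usage data.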

Trigger to build: the cleanest time is just after Chapter 3 voicing (more Sonnet usage to render against). Don’t wait for “enough data” — the metric works at 1% as well as at 30%.

Anti-pattern: don’t optimise for the coupling ratio as a target. It’s a leading indicator of the operating model working, not a goal to maximise. Some work genuinely belongs to Opus end-to-end (final voice polish, architectural decisions, the editorial through-line). Coupling at 100% would mean Opus stopped reviewing — which would destroy quality.

Sister memories:

feedback_happiness_pipeline_model_tier_split_2026-06-06 — the operating model this metric makes visible
feedback_claude_usage_calibration_re_anchor_runbook_2026-06-06 — the same dashboard’s calibration story; coupling rollout shouldn’t disrupt the anchor logic
feedback_antifragile_tripwires_after_silent_regression_2026-06-05 — coupling-stall (sudden drop) is the kind of thing that warrants a tripwire once the metric is live

Source: parked_sketch_coupling_metrics_day_week_month_2026-06-06.md · Rendered 2026-06-11 09:38 EDT