COUPLING metrics — day/week/month tracking of Opus-orchestrates-Sonnet-subagent ratio on claude.gf.cx
DARE.CO.UK · PARKED SKETCH · 2026-06-07
Mirrored from ~/.claude/.../memory/parked_sketch_coupling_metrics_day_week_month_2026-06-06.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
As the model-tier split pattern (Opus reviews, Sonnet subagents heavy-lift) takes hold across happiness.gf.cx voicing + future editorial work, the visible signal of “is the operating model actually saving Opus tokens” is a COUPLING ratio — Sonnet-subagent share of total work — tracked over rolling 24h / 7d / 30d windows. Today’s pilot (Ch1 + Ch2 of Vitality Blueprint) added ~62k Sonnet tokens; they’re already in the 72% weekly envelope but invisible as Sonnet contribution. The metric makes the coupling visible and gives Dan a forward indicator of efficiency.
Sketch: add a COUPLING metric to claude_usage_dashboard.py (and surface on claude.gf.cx + the status.gf.cx Claude cockpit card) — the share of work delivered via Sonnet subagents vs Opus directly, tracked over rolling 24h / 7d / 30d windows.
Why (Dan 2026-06-06):
“we should track COUPLING metrics day-week-month”
“looking forward to catching some sonnet-percentage from COUPLING models”
Today’s pilot validated the Opus-orchestrates-Sonnet-subagent operating model (memo: feedback_happiness_pipeline_model_tier_split_2026-06-06). Two Ch1+Ch2 Sonnet dispatches added ~62k Sonnet tokens to the weekly bucket — they’re already counted in the 72% envelope but rolled into a single aggregate number. Without a coupling view, there’s no signal that “the right kind of work is being done by the right model” — only that total spend is whatever it is.
The metric becomes a leading indicator: as coupling rises, Opus tokens stretch further for the same output volume. It’s the operational mirror of feedback_compounding_kb_is_the_kernel applied to model spend.
Metric definition:
coupling_ratio(window) = Sonnet-subagent tokens in window / total tokens in window
Higher is “more leveraged.” Track:
- Day (24h rolling) — current-shift signal; volatile but immediate feedback
- Week (7d rolling) — pairs with the weekly envelope; the load-bearing window
- Month (30d rolling) — trend; should drift upward as the pattern compounds across more editorial work
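A minimal sketch of the three rolling windows, assuming the ratio is the Sonnet share of all tokens spent inside the window. The UsageEvent record shape and both function names are hypothetical, and the sketch treats every Sonnet token as subagent work, which may overcount if Sonnet is ever used directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical record shape: one row per usage event (not the dashboard's real schema).
@dataclass(frozen=True)
class UsageEvent:
    at: datetime   # timezone-aware timestamp of the spend
    tier: str      # "opus", "sonnet" or "haiku"
    tokens: int    # tokens spent in this event

# The three windows named in the sketch.
WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}

def coupling_ratio(events, now, window):
    """Sonnet share of all tokens spent in the half-open interval (now - window, now]."""
    cutoff = now - window
    recent = [e for e in events if cutoff < e.at <= now]
    total = sum(e.tokens for e in recent)
    if total == 0:
        return None  # no usage in the window, so the ratio is undefined
    sonnet = sum(e.tokens for e in recent if e.tier == "sonnet")
    return sonnet / total

def coupling_by_window(events, now):
    """Return {"24h": ..., "7d": ..., "30d": ...} for one point in time."""
    return {name: coupling_ratio(events, now, span) for name, span in WINDOWS.items()}
```

Returning None for an empty window keeps "no usage" distinct from "0% coupled", which matters for the 24h number on quiet days.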
Optional supporting metrics for the card:
- Coupled tokens absolute (raw Sonnet subagent count) — how much heavy-lifting was outsourced
- Cost-multiplier saved: (opus_$/tok / sonnet_$/tok) × coupled_tokens × opus_$/tok ≈ what Opus would have cost for the same output
- Coupling streak: consecutive days where Sonnet share was ≥X% (gamifies the habit)
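A rough sketch of two of the supporting metrics. This is one reading of the cost bullet, not a transcription of its formula: it takes coupled_tokens × Opus price as the counterfactual cost and subtracts what Sonnet would charge for the same tokens. It also assumes Opus would have needed the same token count for the same output. All function names are hypothetical, and any prices passed in are the caller's to supply.

```python
def counterfactual_opus_cost(coupled_tokens, opus_usd_per_token):
    """What the coupled tokens would have cost at the Opus price."""
    return coupled_tokens * opus_usd_per_token

def estimated_saving(coupled_tokens, opus_usd_per_token, sonnet_usd_per_token):
    """Counterfactual Opus cost minus the Sonnet cost of the same tokens."""
    return coupled_tokens * (opus_usd_per_token - sonnet_usd_per_token)

def coupling_streak(daily_shares, threshold):
    """Count consecutive most-recent days with Sonnet share >= threshold.

    daily_shares is ordered oldest first; None marks a day with no usage
    and is treated as breaking the streak (an assumption).
    """
    streak = 0
    for share in reversed(daily_shares):
        if share is None or share < threshold:
            break
        streak += 1
    return streak
```

For example, coupling_streak([0.02, 0.12, 0.0, 0.10, 0.15], threshold=0.05) counts only the last two days, because the zero-share day in the middle resets the run.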
Data source:
Anthropic admin API already returns per-model cost breakdown — auto_calibrate_from_admin_api() in claude_usage_dashboard.py calls it for ratio calibration. The breakdown is there; the dashboard just doesn’t surface it. So this is parsing-and-rendering work, not API integration.
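The parsing half might look like the sketch below. The row shape (a dict with "model" and "tokens" keys) is a placeholder, not the admin API's actual response format, and the helper names are hypothetical. It only assumes that a model identifier contains its tier name somewhere in the string.

```python
from collections import defaultdict

TIERS = ("opus", "sonnet", "haiku")

def tier_of(model_id):
    """Map a model identifier to a tier by substring match; unknown models become "other"."""
    lowered = model_id.lower()
    for tier in TIERS:
        if tier in lowered:
            return tier
    return "other"

def tokens_by_tier(rows):
    """Sum tokens per tier from rows shaped like {"model": str, "tokens": int}."""
    totals = defaultdict(int)
    for row in rows:
        totals[tier_of(row["model"])] += row["tokens"]
    return dict(totals)
```

Keeping an explicit "other" bucket means a new or renamed model shows up in the totals instead of silently inflating or deflating the coupling ratio.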
Build sketch:
- Extend the claude-usage-latest.json schema with per-model fields:

    "models": {
      "opus":   {"tokens_24h": ..., "tokens_7d": ..., "tokens_30d": ..., "cost_24h": ...},
      "sonnet": {"tokens_24h": ..., "tokens_7d": ..., "tokens_30d": ..., "cost_24h": ...},
      "haiku":  { ... }
    },
    "coupling_ratio_24h": 0.18,
    "coupling_ratio_7d": 0.09,
    "coupling_ratio_30d": 0.04

- Render on claude.gf.cx as a sub-bar beneath the main weekly envelope:

    weekly · all models  ████████████████░░░░ 72%
    opus                 ██████████████░░░░░░ 91%
    sonnet (coupled)     ██░░░░░░░░░░░░░░░░░░ 9%

- Add coupling card to the status.gf.cx hub: three numbers (24h / 7d / 30d), trend arrow, gamified streak.
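The first two build steps could be prototyped roughly as below. The field names follow the schema sketched above; render_bar, render_row and build_payload are hypothetical helpers, and the 20-cell bar width is an assumption taken from the mock-up. Only the 7d window is shown to keep the sketch short.

```python
import json

def render_bar(share, width=20):
    """Text bar with round(share * width) filled cells; share is clamped to [0, 1]."""
    share = min(max(share, 0.0), 1.0)
    filled = round(share * width)
    return "█" * filled + "░" * (width - filled)

def render_row(label, share, label_width=21):
    """One dashboard line: padded label, bar, then the share as a whole percent."""
    return f"{label:<{label_width}}{render_bar(share)} {share:.0%}"

def build_payload(tokens_7d):
    """Per-model 7d token counts plus the 7d coupling ratio (None if there was no usage)."""
    total = sum(tokens_7d.values())
    ratio = tokens_7d.get("sonnet", 0) / total if total else None
    return {
        "models": {tier: {"tokens_7d": count} for tier, count in tokens_7d.items()},
        "coupling_ratio_7d": ratio,
    }

if __name__ == "__main__":
    payload = build_payload({"opus": 910, "sonnet": 90})
    print(json.dumps(payload, indent=2))
    print(render_row("sonnet (coupled)", payload["coupling_ratio_7d"]))
```

The bar is proportional to the share, so it will not necessarily match the hand-drawn mock-up cell for cell.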
Trigger to build: the cleanest time is just after Chapter 3 voicing (more Sonnet usage to render against). Don’t wait for “enough data” — the metric works at 1% as well as at 30%.
Anti-pattern: don’t optimise for the coupling ratio as a target. It’s a leading indicator of the operating model working, not a goal to maximise. Some work genuinely belongs to Opus end-to-end (final voice polish, architectural decisions, the editorial through-line). Coupling at 100% would mean Opus stopped reviewing — which would destroy quality.
Sister memories:
- feedback_happiness_pipeline_model_tier_split_2026-06-06: the operating model this metric makes visible
- feedback_claude_usage_calibration_re_anchor_runbook_2026-06-06: the same dashboard’s calibration story; coupling rollout shouldn’t disrupt the anchor logic
- feedback_antifragile_tripwires_after_silent_regression_2026-06-05: coupling-stall (sudden drop) is the kind of thing that warrants a tripwire once the metric is live