Parked sketch — audreylam Workspace ingest needs disk + token-hold rebuild (2026-06-05)

DARE.CO.UK · PARKED SKETCH · 2026-06-06

Mirrored from ~/.claude/.../memory/parked_sketch_audreylam_workspace_rebuild_2026-06-05.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

The existing gvr-ingest-from-singapore orchestrator broke against audreylam’s 395 GB Workspace export: the 50 GB VM disk is too small for staging, and gcloud hits the Workspace SSO reauth policy mid-transfer. A sketch-first rebuild is needed before the next fire.


Dan, at 157M replay cost after a long bounce through gcloud errors:

“Rebuild using the sketch first model. We need disc space and find a long-term token ‘hold’ solution”

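Right call. The two assumptions baked into the original orchestrator (that the export fits on the VM disk for staging, and that gcloud credentials hold for the whole transfer) turned out to be wrong for audreylam.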

What broke today (the empirical edges)

  1. Multi-line paste: shell line-continuations don’t survive markdown→terminal copy. Affects every multi-line command I hand back. Default to single-line or write-to-file-then-source.
  2. OAuth account-binding drift: gcloud auth login without an explicit --account defaults to the browser’s active Google account. It bound to dare.co.uk first, then audreyinc became “active” after the second login. Forced an explicit --account flag on the orchestrator (commit 03f3af7), then Dan ran gcloud config set account on the VM as belt+suspenders.
  3. Workspace SSO reauth policy: gcloud storage cp errored with “Reauthentication failed. cannot prompt during non-interactive execution” even with valid refresh tokens. This is gcloud’s hardened reauth-on-sensitive-operations check, not ordinary token expiry. Workspace admins can extend the session policy, but the default fires on sensitive scopes.
  4. Bucket not in our project: dwt-takeout-export-* buckets live in Google’s Data Warehouse for Takeout (DWT) project, not the customer’s GCP project. Even a Workspace superadmin lacks storage.buckets.getIamPolicy on it. Compute SA grant path = DEAD.
  5. Disk size mismatch: the VM has a 50 GB boot+work disk. The audreylam export is 395 GB. Stage-then-rclone fails disk-full inside 5 minutes.
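The account-binding drift in item 2 could be caught before any transfer starts. Below is a minimal sketch of such a preflight, assuming the orchestrator knows which account it expects. The function names `active_gcloud_account` and `require_account` are hypothetical and are not part of the existing orchestrator; the only real CLI call is `gcloud auth list` with its documented `--filter` and `--format` flags.

```shell
# Hypothetical preflight: refuse to continue if gcloud's active account
# is not the one the ingest expects.

# Prints the currently active gcloud account (real gcloud command).
active_gcloud_account() {
  gcloud auth list --filter=status:ACTIVE --format='value(account)'
}

# $1 = expected account. Returns non-zero and explains why on mismatch.
require_account() {
  local expected="$1"
  local actual
  actual="$(active_gcloud_account)"
  if [ "$actual" != "$expected" ]; then
    echo "active account is '${actual}', expected '${expected}'" >&2
    return 1
  fi
}

# Usage (the expected account is supplied by the caller, not guessed here):
#   require_account "$EXPECTED_ACCOUNT" || exit 1
```

Checking the binding up front turns a mid-transfer surprise into an immediate, readable failure. It does not address the reauth policy in item 3.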

The two architectural needs

A. Disk strategy

Three viable approaches:

| Approach | Trade-off |
|---|---|
| Resize VM boot disk to 500 GB pd-standard | $20/mo extra; same fragile staging pattern; survives one big export but not the audreylam Photos 1.1 TB or audreyinc unknown size |
| Streaming copy (no local stage) | rclone GCS→OneDrive direct, zero local persistence, scales to any export size; needs rclone GCS remote configured on VM (one-time OAuth) |
| Mounted GCS (gcsfuse) | Treat bucket as filesystem; rclone reads via fuse mount; cleanest but adds a moving piece |

Pick: streaming copy. It doesn’t grow VM costs, scales to any export size, and removes disk size as a future failure mode. The orchestrator stages first because it was originally built for the audreyinc 24 GiB pilot, where streaming wasn’t needed.
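The disk arithmetic behind that pick can be made explicit. The sketch below is illustrative only: `choose_copy_mode` is a hypothetical helper, the 20% headroom is an assumption rather than anything the orchestrator does today, and GB and GiB are treated as interchangeable for a rough comparison.

```shell
# Decide between staged and streaming copy from export size and free disk,
# both given as whole GB. Headroom of 20% is an assumed safety margin.
choose_copy_mode() {
  local export_gb="$1"
  local free_gb="$2"
  local needed_gb=$(( export_gb + export_gb / 5 ))
  if [ "$needed_gb" -le "$free_gb" ]; then
    echo "staged"
  else
    echo "streaming"
  fi
}

choose_copy_mode 395 50    # audreylam export vs the 50 GB VM disk
choose_copy_mode 24 50     # audreyinc 24 GiB pilot vs the same disk
choose_copy_mode 1100 500  # audreylam Photos (1.1 TB) vs a resized 500 GB disk
```

Under these assumptions only the 24 GiB pilot fits the staged pattern. The 395 GB export does not fit today, and the 1.1 TB Photos export would not fit even after the resize.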

B. Token-hold strategy

The constraint: the source bucket is Google-owned (DWT), only the admin who initiated the export has read access, and gcloud hits SSO reauth on sensitive operations.

| Approach | Status |
|---|---|
| Service account on audreylam project | ❌ Admin can’t grant IAM on DWT bucket |
| Service account JSON key delegated by admin | ❌ Same constraint: bucket IAM is Google-managed |
| rclone OAuth (refresh token flow) | ✅ Standard OAuth refresh, NOT subject to gcloud’s sensitive-op reauth check. Empirically: rclone OAuth survives where gcloud SSO bombs. |
| Extend Workspace session-length policy | ✅ admin.google.com → Security → Session control → set to “Session never expires” for the admin user. Belt for the rclone path. |

Pick: rclone OAuth + extended session policy. rclone’s GCS remote uses plain OAuth2 refresh-token flow, which Workspace doesn’t gate the way gcloud does. Pair with the session-length policy bump as belt+suspenders.
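A rough sketch of how the rclone refresh token might be minted headlessly follows. The helper names (`mint_token_on_mac`, `create_remote_on_vm`, `remote_exists`) are hypothetical. The rclone subcommands `authorize`, `config create` and `listremotes` are real, but whether `config create` accepts the pasted token without further prompts depends on the rclone version, so treat this as a starting point rather than a tested recipe.

```shell
# Step 1 (machine with a browser, e.g. the Mac): run the consent flow.
# rclone prints a JSON token blob to copy across.
mint_token_on_mac() {
  rclone authorize "google cloud storage"
}

# Step 2 (VM): create the remote from the pasted token JSON.
# $1 = remote name (e.g. gcs-audreylam), $2 = token JSON from step 1.
create_remote_on_vm() {
  local remote="$1"
  local token_json="$2"
  rclone config create "$remote" "google cloud storage" token "$token_json"
}

# Cheap check that the remote is defined on this machine.
# rclone listremotes prints one "name:" per line.
remote_exists() {
  local remote="$1"
  rclone listremotes | grep -qx "${remote}:"
}
```

Once the remote exists, listing the source bucket with `rclone lsd` before the real transfer is a low-cost way to confirm the token actually reads the DWT bucket.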

The intended architecture (sketch)

Two-step fire (was four-step staged):

  1. One-time setup: configure gcs-audreylam: rclone remote on VM (headless OAuth: rclone authorize on Mac → paste JSON to rclone config create on VM).
  2. Fire: rclone copy gcs-audreylam:<bucket>/ gvr-google-audrey-lam:<dest>/ in tmux on VM. Detach. Monitor via log tail or status card.
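The fire step could look roughly like the sketch below. `build_stream_cmd` and `fire_in_tmux` are hypothetical helpers, and the log path and tmux session name in the usage comment are placeholders. The remote names come from the steps above; the bucket and destination are passed in because the sketch leaves them unspecified. The rclone flags (`--log-file`, `--log-level`, `--stats`) and `tmux new-session -d -s` are standard.

```shell
# Build the streaming copy command as a single line, so it never depends on
# shell line-continuations. Arguments containing spaces are not handled.
# $1 = source bucket, $2 = destination path, $3 = log file.
build_stream_cmd() {
  local src_bucket="$1"
  local dest_path="$2"
  local log_file="$3"
  printf 'rclone copy gcs-audreylam:%s/ gvr-google-audrey-lam:%s/ --log-file %s --log-level INFO --stats 1m' "$src_bucket" "$dest_path" "$log_file"
}

# Run a command in a detached tmux session so it survives an SSH disconnect.
# $1 = session name, $2 = command string.
fire_in_tmux() {
  local session="$1"
  local cmd="$2"
  tmux new-session -d -s "$session" "$cmd"
}

# Usage (BUCKET and DEST must be supplied by the operator):
#   fire_in_tmux audreylam-ingest "$(build_stream_cmd "$BUCKET" "$DEST" "$HOME/audreylam-ingest.log")"
#   tail -f "$HOME/audreylam-ingest.log"
```

Keeping the command builder separate from the tmux launcher means the exact command line can be printed and reviewed before anything is fired.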

Touch points when picked up

| File | Change |
|---|---|
| ~/bin/gvr-ingest-from-singapore.sh | Add --no-stage flag that switches to streaming rclone copy gcs-remote:... instead of gcloud storage cp → local → rclone copy. Default stays staged for backward compat with audreyinc 24 GiB pilot. |
| ~/bin/gvr-takeout-audreylam-fire.sh | Pass --no-stage to the orchestrator |
| New: ~/bin/gvr-rclone-gcs-mint.sh | Wraps the OAuth dance (rclone authorize Mac → config create VM) for any new audrey* tenant. Codify the painful one-time step. |
| Status card | Add throughput metrics (parked sibling). This’ll be the first transfer that actually empirically validates the streaming approach. |
| Memory: this sketch + parked_sketch_status_card_throughput_metrics_2026-06-05 + project_apac_window_hypothesis_under_20pct_variance_2026-06-05 | All three converge on the audreylam transfer as the empirical test bed. |
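The --no-stage switch on the orchestrator could be parsed along these lines. This is a sketch only: `parse_stage_mode` is a hypothetical function, and the real script’s existing argument handling is not shown in this note, so the sketch simply ignores every flag it does not recognise.

```shell
# Staged by default (backward compatible with the pilot); streaming only
# when --no-stage is present anywhere in the argument list.
parse_stage_mode() {
  local mode="staged"
  local arg
  for arg in "$@"; do
    case "$arg" in
      --no-stage) mode="streaming" ;;
    esac
  done
  echo "$mode"
}

# Usage inside the orchestrator (illustrative):
#   mode="$(parse_stage_mode "$@")"
#   if [ "$mode" = "streaming" ]; then ...streaming rclone copy...; else ...existing staged path...; fi
```

Defaulting to staged keeps existing callers unchanged, and the audreylam fire script opts in explicitly.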

Workspace session-policy bump (do before fire)

Dan, while in admin.google.com:

  - Security → Authentication → Google session control
  - For studio@audreylam.com (or the OU it lives in): set Session length to “Session never expires” OR longest available (14 days typical).
  - This extends the gcloud SSO reauth window so future ingests don’t hit the same wall mid-transfer.

Stop rules

Cross-references

Source: parked_sketch_audreylam_workspace_rebuild_2026-06-05.md · Rendered 2026-06-06 14:45