Parked sketch — audreylam Workspace ingest needs disk + token-hold rebuild (2026-06-05)
DARE.CO.UK · PARKED SKETCH · 2026-06-06
Mirrored from ~/.claude/.../memory/parked_sketch_audreylam_workspace_rebuild_2026-06-05.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
The existing gvr-ingest-from-singapore orchestrator broke against audreylam’s 395 GB Workspace export: the 50 GB VM disk is too small for staging, and gcloud hits the Workspace SSO reauth policy mid-transfer. A sketch-first rebuild is needed before the next fire.
Dan, at 157M replay cost after a long bounce through gcloud errors:
“Rebuild using the sketch first model. We need disc space and find a long-term token ‘hold’ solution”
Right call. The two assumptions baked into the original orchestrator turned out to be wrong for audreylam.
What broke today (the empirical edges)
- Multi-line paste: shell line continuations don’t survive a markdown→terminal copy. This affects every multi-line command I hand back, so default to single-line commands or write-to-file-then-source.
- OAuth account-binding drift: `gcloud auth login` without an explicit `--account` defaults to the browser’s active Google account. It bound to dare.co.uk first, then audreyinc became “active” after the second login. This forced an explicit `--account` flag on the orchestrator (commit 03f3af7); Dan then ran `gcloud config set account` on the VM as belt+suspenders.
- Workspace SSO reauth policy: `gcloud storage cp` errored with “Reauthentication failed. cannot prompt during non-interactive execution” even with valid refresh tokens. This is gcloud’s hardened reauth-on-sensitive-operations check, not ordinary token expiry. Workspace admins can extend the session policy, but the default fires on sensitive scopes.
- Bucket not in our project: `dwt-takeout-export-*` buckets live in Google’s Data Warehouse for Takeout (DWT) project, not the customer’s GCP project. Even a Workspace superadmin lacks `storage.buckets.getIamPolicy` on them. Compute SA grant path = DEAD.
- Disk size mismatch: the VM has a 50 GB boot+work disk, and the audreylam export is 395 GB. Stage-then-rclone fails with disk-full inside 5 minutes.
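A minimal Python sketch of guardrails for the gcloud failure modes above, assuming the orchestrator shells out to gcloud: every command carries an explicit `--account` (gcloud's global account-override flag), is rendered as one shell-safe line rather than a backslash-continued block, and a failure is checked for the reauth error text quoted above before anyone retries. The helper names and the example account and bucket are hypothetical placeholders, not the orchestrator's actual code.

```python
import shlex

# Leading phrase of the gcloud error quoted above; matched case-insensitively.
REAUTH_MARKER = "reauthentication failed"


def pinned_gcloud(account: str, *args: str) -> list[str]:
    """Build a gcloud argv that always names the account explicitly (hypothetical helper).

    --account overrides the active account for this one invocation, so a later
    browser login cannot silently rebind the command to a different identity.
    """
    if "@" not in account:
        raise ValueError(f"expected an email-style account, got {account!r}")
    return ["gcloud", *args, f"--account={account}"]


def one_line(argv: list[str]) -> str:
    """Render argv as a single shell-quoted line, with no backslash continuations."""
    return shlex.join(argv)


def is_reauth_failure(stderr: str) -> bool:
    """True when gcloud failed on the SSO reauth check rather than something else."""
    return REAUTH_MARKER in stderr.lower()


if __name__ == "__main__":
    # Placeholder account and bucket, for illustration only.
    cmd = pinned_gcloud("admin@example.com", "storage", "ls", "gs://example-bucket")
    print(one_line(cmd))
```

A reauth hit is a signal to switch to the non-gcloud path, not to retry the same command.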
The two architectural needs
A. Disk strategy
Three viable approaches:
| Approach | Trade-off |
|---|---|
| Resize VM boot disk to 500 GB pd-standard | $20/mo extra; same fragile staging pattern; survives one big export, but not the 1.1 TB audreylam Photos export or the audreyinc export of unknown size |
| Streaming copy (no local stage) | rclone GCS→OneDrive direct, zero local persistence, scales to any export size; needs rclone GCS remote configured on VM (one-time OAuth) |
| Mounted GCS (gcsfuse) | Treat bucket as filesystem; rclone reads via fuse mount; cleanest but adds a moving piece |
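The resize row reduces to two checks: staging only works when the whole export fits on local disk, and growing the disk costs per GB-month. A minimal sketch, assuming a pd-standard price of $0.04/GB-month (an assumption; check current pricing for the VM's region). At that price, going from 50 GB to 500 GB adds about $18/mo, in line with the ~$20/mo above. Function names and the `reserved_gb` parameter are hypothetical.

```python
# Assumed pd-standard list price in USD per GB-month; verify for the VM's region.
PD_STANDARD_USD_PER_GB_MONTH = 0.04


def stage_fits(export_gb: float, disk_gb: float, reserved_gb: float = 0.0) -> bool:
    """Stage-then-copy needs the entire export on local disk at once.

    reserved_gb (hypothetical) covers the OS, logs and anything else already on the disk.
    """
    return export_gb <= disk_gb - reserved_gb


def resize_cost_per_month(current_gb: int, target_gb: int,
                          usd_per_gb_month: float = PD_STANDARD_USD_PER_GB_MONTH) -> float:
    """Extra monthly spend for growing a persistent disk from current_gb to target_gb."""
    if target_gb < current_gb:
        raise ValueError("a persistent disk can be grown but not shrunk")
    return (target_gb - current_gb) * usd_per_gb_month


if __name__ == "__main__":
    print(stage_fits(395, 50))     # audreylam export vs today's disk
    print(stage_fits(395, 500))    # audreylam export vs a resized disk
    print(stage_fits(1100, 500))   # 1.1 TB Photos export vs a resized disk
    print(round(resize_cost_per_month(50, 500), 2))
```

With this note's numbers, the 395 GB export fits a 500 GB disk but the 1.1 TB Photos export does not, which is the "survives one big export" limit in the table.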
Pick: streaming copy. It doesn’t grow VM costs, scales to any export size, and removes disk size as a future failure mode. The orchestrator stages first because it was originally built for the 24 GiB audreyinc pilot, where streaming wasn’t needed.
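A sketch of the streaming path, assuming two already-configured rclone remotes named `gcs` and `onedrive` (hypothetical names) and placeholder bucket and destination paths. `rclone copy` between two remotes moves each object from source to destination without staging it on the VM's disk. `--transfers`, `--progress` and `--dry-run` are standard rclone flags; 4 transfers is rclone's default, not a tuned value.

```python
import shlex
import subprocess


def rclone_stream_argv(src_remote: str, bucket: str, prefix: str,
                       dst_remote: str, dst_path: str,
                       transfers: int = 4, dry_run: bool = False) -> list[str]:
    """Build an `rclone copy` argv that reads GCS and writes OneDrive directly.

    Remote names, bucket and paths are hypothetical placeholders supplied by the caller.
    """
    src = f"{src_remote}:{bucket}/{prefix}".rstrip("/")  # empty prefix -> whole bucket
    dst = f"{dst_remote}:{dst_path}"
    argv = ["rclone", "copy", src, dst, f"--transfers={transfers}", "--progress"]
    if dry_run:
        argv.append("--dry-run")  # report what would be copied, transfer nothing
    return argv


def stream_copy(argv: list[str]) -> None:
    """Run the copy; raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print a dry-run command as a single line, safe to paste into a terminal.
    print(shlex.join(rclone_stream_argv(
        "gcs", "EXPORT_BUCKET", "", "onedrive", "Takeout/audreylam", dry_run=True)))
```

Running the dry-run first confirms the GCS remote can list the export before any bytes move.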
B. Token-hold strategy
The constraint: the source bucket is Google-owned (DWT), only the admin who initiated the export has read access, and gcloud hits SSO reauth on sensitive operations.
| Approach | Status |
|---|---|
| Service account on audreylam project | ❌ Admin can’t grant IAM on DWT bucket |
| Service account JSON key delegated by admin | ❌ Same constraint — bucket IAM is Google-managed |
| rclone OAuth (refresh token flow) | ✅ Standard OAuth refresh, NOT subject to gcloud’s sensitive-op reauth check. Empirically: rclone OAuth survives where gcloud SSO bombs. |
| Extend Workspace session-length policy | ✅ admin.google.com → Security → Session control → set to “Session never expires” for the admin user. Belt for the rclone path. |
Pick: rclone OAuth + extended session policy. rclone’s GCS remote uses the plain OAuth2 refresh-token flow, which Workspace doesn’t gate the way it gates gcloud. Pair it with the session-length policy bump as belt+suspenders.
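A sketch of a pre-flight check on the token hold, assuming the remote is defined in rclone's INI-style config (its path is printed by `rclone config file`) and that the `token` value is the JSON blob rclone writes after the OAuth flow. The long-lived piece is `refresh_token`; rclone renews the short-lived access token from it. The function and remote names are hypothetical, and the check only inspects the file: it cannot prove Google will still accept the token.

```python
import configparser
import json

GCS_TYPE = "google cloud storage"  # rclone's backend type name for GCS


def check_token_hold(conf_text: str, remote: str) -> list[str]:
    """Return problems with an rclone GCS remote's stored token; empty list means OK."""
    # interpolation=None so '%' characters inside tokens are taken literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(conf_text)
    if not cfg.has_section(remote):
        return [f"remote [{remote}] not found"]
    section = cfg[remote]
    problems = []
    if section.get("type") != GCS_TYPE:
        problems.append(f"[{remote}] type is {section.get('type')!r}, expected {GCS_TYPE!r}")
    raw = section.get("token")
    if not raw:
        problems.append(f"[{remote}] has no token; run `rclone config reconnect {remote}:`")
        return problems
    try:
        token = json.loads(raw)
    except json.JSONDecodeError:
        problems.append(f"[{remote}] token is not valid JSON")
        return problems
    if not token.get("refresh_token"):
        problems.append(f"[{remote}] token has no refresh_token, so it cannot renew itself")
    return problems
```

An empty result before a long transfer means the remote at least holds a renewable credential; anything else should stop the run before it starts.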