Parallel ingest-user identities · xlab.studio (APAC) lane multiplication

SIGNAL · HYPOTHESIS TESTED · 5 JUNE 2026

Working hypothesis (Dan, 2026-06-05) plus its same-day empirical test result. Premise (confirmed via JWT decode): all 13 of today’s gvr-* rclone remotes share one Azure AD app and one service principal. Original proposed move: provision multiple Azure AD apps for parallel lanes. The empirical bake-off (gf-cx-singapore → xlab.studio, 1 GB blobs, 16:55-17:05 ET) showed that multiple destination drives under the SAME app+SP scale nearly linearly: 5 lanes hit ~4.9× single-lane throughput, and 8 lanes hit ~6× before running into the VM memory ceiling. Multiple AD apps are NOT required, and the architecture for Immich RAID backups is simplified accordingly.

🔬 Empirical bake-off results (run 2026-06-05)

Five tests were run from gf-cx-singapore to xlab.studio (APAC), each transferring one 1 GB random blob per lane to a distinct destination drive. All lanes used the same app + service principal (data-gfcx-ingest, sp 2ecde9c4…).

| Lanes | Wall clock | Aggregate (MiB/s) | Per-lane avg (MiB/s) | Speedup | Notes |
|---|---|---|---|---|---|
| 1 (baseline) | 73 s | 14.03 | 14.03 | 1.0× | gvr-admin: |
| 2 | 87 s | 23.54 | 11.77 | 1.68× | + gvr-dan: |
| 3 | 82 s | 37.46 | 12.49 | 2.67× | + gvr-luke: |
| 5 | 75 s | 68.27 | 13.65 | 4.87× | + gvr-cindy:, + gvr-google-audrey-inc: |
| 8 | 97 s | 84.45 | 10.56 | 6.02×¹ | + gvr-google-audrey-lam:, + gvr-icloud-audrey-inc:, + gvr-icloud-audrey-lam: |

¹ One lane (gvr-google-audrey-inc:) OOM-killed at the 8-lane test — VM has 7.8 GiB RAM and 8 concurrent rclones with --onedrive-chunk-size 250M --transfers=4 buffer ~1+ GiB each. True 8-lane sustained aggregate without OOM constraint is likely a touch higher.
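The footnote's memory argument can be made concrete with a rough estimate. The sketch below assumes that each in-flight transfer holds about one upload chunk in memory (rclone's OneDrive backend buffers chunks in RAM) and ignores all other process overhead, so it is a lower bound rather than a measurement. The function names are illustrative.

```python
# Rough chunk-buffer estimate for concurrent rclone uploads.
# ASSUMPTION: one chunk is buffered per in-flight transfer; other
# process memory is ignored, so real usage will be somewhat higher.

CHUNK_MIB = 250    # --onedrive-chunk-size 250M
TRANSFERS = 4      # --transfers=4
VM_RAM_GIB = 7.8   # RAM on the bake-off VM


def buffer_gib(lanes: int, transfers: int = TRANSFERS, chunk_mib: int = CHUNK_MIB) -> float:
    """Approximate chunk-buffer memory (GiB) for `lanes` rclone processes."""
    return lanes * transfers * chunk_mib / 1024


def max_lanes(ram_gib: float, transfers: int = TRANSFERS, chunk_mib: int = CHUNK_MIB) -> int:
    """Largest lane count whose chunk buffers alone fit in `ram_gib`."""
    per_lane_gib = transfers * chunk_mib / 1024
    return int(ram_gib // per_lane_gib)


for lanes in (1, 5, 8):
    print(f"{lanes} lanes: ~{buffer_gib(lanes):.2f} GiB of chunk buffers")
print("lanes that fit in the current VM:", max_lanes(VM_RAM_GIB))
```

Under these assumptions, 8 lanes need about 7.8 GiB for chunk buffers alone, which is the entire RAM of the VM. Lowering the chunk size or `--transfers` per lane is the other lever besides resizing.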

Key reads:

What this means for Immich RAID backups

(Original section below stands; the architectural simplifications are listed here.)

TL;DR — the huge discovery

Empirical evidence — JWT decode of all gvr-* remotes

Decoded the access_token JWT payload for every gvr-* remote in ~/.config/rclone/rclone.conf today (2026-06-05 16:36 ET):

| Remote name | app_displayname | appid (first 8) | sub (first 8) | idtyp | upn |
|---|---|---|---|---|---|
| gvr-dan | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-luke | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-cindy | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-admin | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-dropbox | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-google-audrey-inc | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-google-audrey-lam | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-icloud-audrey-inc | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-icloud-audrey-lam | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-timemachine-dan-m2 | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-timemachine-mac-mini | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |

Every row is identical for the identity fields. The remote name is just an rclone label; it does NOT change who Microsoft sees on the wire.
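For reference, a decode of this kind needs only the standard library. The sketch below is not the script that produced the table. It assumes that each remote's `token` entry in rclone.conf is a JSON object with an `access_token` field and that the config file is unencrypted. The helper names are hypothetical. The payload is read without signature verification, which is fine for inspection but not for authentication.

```python
import base64
import configparser
import json

IDENTITY_CLAIMS = ("app_displayname", "appid", "sub", "idtyp", "upn")


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def identity_fields(token: str) -> dict:
    """Pick out the claims compared in the table (missing ones become None)."""
    claims = decode_jwt_payload(token)
    return {name: claims.get(name) for name in IDENTITY_CLAIMS}


def gvr_access_tokens(conf_path: str):
    """Yield (remote, access_token) pairs. ASSUMES an unencrypted rclone.conf."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(conf_path)
    for remote in cfg.sections():
        if remote.startswith("gvr-") and cfg.has_option(remote, "token"):
            yield remote, json.loads(cfg[remote]["token"])["access_token"]


def _b64url(obj: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (demo helper only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Synthetic token for demonstration; it is NOT a real credential.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"app_displayname": "data-gfcx-ingest", "idtyp": "app"}),
    "signature",
])
print(identity_fields(demo_token))
```

Running `identity_fields` over every pair from `gvr_access_tokens` and comparing the resulting dicts is enough to show whether two remotes present the same identity.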

What the ceilings really mean

Per project_azure_tenant_cap_investigation_2026-06-04 and confirmed via empirical measurement on gf-cx-singapore:

| Cap | Value | Applies to |
|---|---|---|
| Per-user ingress | 50 GB/hr ≈ 14 MB/s | A single identity (user OR service principal). |
| Per-app + tenant | 400 GB/hr ≈ 111 MB/s | All calls from one Azure AD app, summed across all identities it acts as. |
| Tenant RU budget | 18,750 / 5min | Tenant-wide Graph operations rate. |

Today we’re using one app + one service principal → we’re capped by the per-user line (~14 MB/s, observed ~7-9 MB/s sustained). The per-app line (111 MB/s) is untouched: inside the tenant that is ~8× headroom over the nominal per-user cap, or ~13× over the observed sustained rate, before we hit it.
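The conversions in the table, and the headroom ratios they imply, can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1000 MB) and takes 8 MB/s as the midpoint of the observed 7-9 MB/s range. Both are assumptions made for illustration.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert a GB/hr cap to MB/s, using decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000 / 3600


per_user_mb_s = gb_per_hour_to_mb_per_s(50)    # about 13.9 MB/s
per_app_mb_s = gb_per_hour_to_mb_per_s(400)    # about 111.1 MB/s

OBSERVED_MB_S = 8.0  # ASSUMPTION: midpoint of the observed 7-9 MB/s

print(f"per-user cap:            {per_user_mb_s:.1f} MB/s")
print(f"per-app cap:             {per_app_mb_s:.1f} MB/s")
print(f"headroom vs nominal cap: {per_app_mb_s / per_user_mb_s:.1f}x")
print(f"headroom vs observed:    {per_app_mb_s / OBSERVED_MB_S:.1f}x")
```

The ratio against the nominal per-user cap is exactly 8, so about eight fully loaded lanes would saturate the per-app line if each lane really reaches its cap.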

Two architectural moves that unlock lanes

Move A — multiple Azure AD apps (cleaner)

Register N apps in xlab.studio’s Azure AD:
- data-gfcx-ingest-audrey-inc
- data-gfcx-ingest-audrey-lam
- data-gfcx-ingest-dan-archives
- data-gfcx-ingest-immich-shard-{1..N}
- (etc.)

Each app:
- Gets its own per-app + tenant 400 GB/hr ceiling (separate quota buckets).
- Acts as its own service principal (oid differs) → separate per-user buckets too.

Each gvr-* remote is then re-minted against the matching new app.

Concurrency math: 8 apps × 14 MB/s (per-principal) = ~112 MB/s aggregate, sitting right at the per-tenant 18,750 RU/5min budget. Diminishing returns past ~6-7 apps because the shared RU budget becomes the next bottleneck.

Move B — user-delegated tokens (more accounts, simpler quota)

Mint actual M365 user accounts in xlab.studio for each ingest identity:
- gvr-google-audrey-inc@xlab.studio
- gvr-google-audrey-lam@xlab.studio
- gvr-icloud-audrey-inc@xlab.studio
- gvr-immich-shard-{1..N}@xlab.studio

OAuth each one separately; rclone keeps the user-delegated refresh token. Each user is a separate per-user 14 MB/s bucket. Capacity: xlab.studio has 25 E5-Dev licenses × 5TB = 125TB pool, more than enough to allocate ingest-only users.

Trade-off vs Move A:
- Move B uses license slots (cheap, we have 25).
- Move B has the OAuth account-binding-drift risk from feedback_oauth_account_drift_safari_private_2026-06-05.md: more accounts = more mint operations = more chances to bind to the wrong tenant.
- Move A scales without burning license slots, but provisioning an app per shard at Immich scale (10s of shards) may hit Azure AD app registration limits.

Recommendation: Move A for the existing gvr-* surface (~5-8 apps, one per logical source). Move B (or a hybrid where Move A apps act AS specific Move B users) for Immich’s RAID shards if the shard count grows past ~10.

Immich RAID backup design — bake parallelism in from day one

Dan, 2026-06-05: “when we structure immich raid back-up’s it will have to take advantage of this fact”

Design implications:

  1. Shard the asset library by identity-aligned partitions. Per-owner is the obvious split (one identity per Immich owner — matches the existing gvr-google-audrey-inc / audrey-lam pattern). For single-owner archives, shard by year or by alphabetic asset-hash prefix.
  2. Each shard binds 1:1 to a gvr-immich-* remote that uses its own Azure AD app (Move A) or user (Move B).
  3. Run parallel rclone instances on gf-cx-singapore. A single rclone copy --transfers=N won’t multiply throughput once the per-principal cap kicks in, but N CONCURRENT rclone invocations against distinct remotes will. Wrap in parallel or a small Python orchestrator (sibling to gvr-ingest-from-singapore.sh).
  4. Per-shard transfer log keyed by remote — feeds the existing rclone_logspeed.py history with shard granularity, so the APAC-window dashboard chart can show per-lane throughput.
  5. RAID restore is symmetric. Pulling FROM xlab.studio is the same shape — N parallel identities reading distinct shards ≈ N × 14 MB/s effective restore rate. A multi-TB Immich restore that takes ~1.5 days at today’s single-lane rate could complete in ~4-5 hours at 6-lane parallel.
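The concurrent-invocation pattern in item 3 could look roughly like the sketch below. It is a minimal illustration, not the planned orchestrator. The manifest paths and remote names are placeholders, and the default runner only prints the commands so the example is safe to execute. Passing `runner=subprocess.run` would launch real rclone processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(source: str, remote: str, transfers: int = 4) -> list[str]:
    """One rclone invocation per lane."""
    return ["rclone", "copy", source, remote, f"--transfers={transfers}"]


def dry_run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Stand-in runner: print the command instead of executing it."""
    print("would run:", " ".join(cmd))
    return subprocess.CompletedProcess(cmd, 0)


def run_lanes(manifest, lanes: int, runner=dry_run) -> dict:
    """Run each (source, remote) pair as its own process, `lanes` at a time.

    Returns a mapping of remote name to exit code.
    """
    remotes = [remote for _, remote in manifest]
    if len(set(remotes)) != len(remotes):
        # Under the memo's model, lanes only add throughput when each one
        # writes to its own destination drive.
        raise ValueError("each lane needs a distinct destination remote")

    def one_lane(pair):
        source, remote = pair
        return remote, runner(build_command(source, remote)).returncode

    # Threads only wait on child processes here, so a thread pool is enough.
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return dict(pool.map(one_lane, manifest))


# HYPOTHETICAL manifest: paths and remote names are placeholders.
manifest = [
    ("/data/shard-1", "gvr-immich-shard-1:"),
    ("/data/shard-2", "gvr-immich-shard-2:"),
]
print(run_lanes(manifest, lanes=2))
```

Keeping the runner injectable makes it straightforward to swap in a logging wrapper per lane later without changing the scheduling logic.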

Test plan (empirical falsification)

Before shipping any provisioning, confirm the model with a tight empirical test:

# Provision app #2 in xlab.studio Azure AD (Portal → App
# Registrations → New Registration, name = "data-gfcx-ingest-test").
# Mint a fresh rclone remote against it: `gvr-admin-test`.

# Test 1 — sequential single-lane (baseline)
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin: \
    --transfers=4 --checkers=8'

# Test 2 — parallel two-lane (1× gvr-admin, 1× gvr-admin-test)
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin: \
    --transfers=4 &
  rclone_logspeed copy /test/1gb gvr-admin-test: \
    --transfers=4 ; wait'

# Expected:
#   Test 1: ~7-9 MB/s sustained (matches today's measurements)
#   Test 2: ~14-18 MB/s aggregate (each lane at ~7-9, sum ~doubles)
#
# If Test 2 doesn't ~double, the bottleneck is NOT per-principal —
# revisit the model (could be network leg, OneDrive backend write
# rate, or tenant RU budget hit).

Cross-references

Status — post-bake-off

Hypothesis validated + refined the same day. Multi-lane writes to distinct destination drives DO multiply throughput nearly linearly up to the per-app+tenant cap. Move A (multiple Azure AD apps) is unnecessary — collapsed out of the plan. Move B remains relevant (distinct destination drives = distinct destination users) and is the path forward.
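The "nearly linearly" claim can be quantified from the bake-off wall-clock times. The sketch below recomputes aggregate throughput, speedup, and parallel efficiency (speedup divided by lane count). It assumes every lane moved one full 1024 MiB blob, which reproduces the published figures but overstates the 8-lane run, where one lane was OOM-killed.

```python
BLOB_MIB = 1024  # ASSUMPTION: one 1 GB blob per lane, taken as 1024 MiB

# lanes -> wall-clock seconds, from the bake-off table
WALL_CLOCK_S = {1: 73, 2: 87, 3: 82, 5: 75, 8: 97}


def aggregate_mib_s(lanes: int, seconds: float) -> float:
    """Total MiB moved by all lanes divided by wall-clock time."""
    return lanes * BLOB_MIB / seconds


baseline = aggregate_mib_s(1, WALL_CLOCK_S[1])

results = {}
for lanes, seconds in WALL_CLOCK_S.items():
    aggregate = aggregate_mib_s(lanes, seconds)
    speedup = aggregate / baseline
    results[lanes] = (aggregate, speedup, speedup / lanes)
    print(f"{lanes} lanes: {aggregate:6.2f} MiB/s  "
          f"speedup {speedup:.2f}x  efficiency {speedup / lanes:.0%}")
```

Efficiency is about 97% at 5 lanes and about 75% at 8 lanes, which matches the reading that the 8-lane run was limited by VM memory rather than by a Microsoft-side cap.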

Outstanding action items:

  1. Resize gf-cx-singapore to ≥16 GiB RAM (n2-standard-4 or similar) to unblock 7-8 lane parallelism. A rerun at that size would show whether the empirical trajectory continues toward the ~111 MB/s per-app ceiling.
  2. Decide Immich shard taxonomy — per-owner / per-year / hash-prefix. Mint the matching gvr-immich-*@xlab.studio users ahead of Immich provisioning so the destination drives exist when the first restore/backup fires.
  3. Document the multi-lane orchestrator — sibling to gvr-ingest-from-singapore.sh. Takes a --lanes N flag and a manifest of (source, destination-remote) pairs. Logs per-lane throughput via rclone_logspeed.py.
  4. Update memory — the rotation-collision pathology (feedback_gvr_destination_remote_token_rotation_2026-06-05) becomes higher-stakes when N lanes are in flight; pre-flight destination smoke-test now needs to run per-lane, not per-source.
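For the hash-prefix option in item 2, shard assignment can be a pure function of the asset identifier, so backup and restore always agree on which remote holds an asset. The sketch below is one possible scheme, not a decided design. SHA-256 over an asset ID, the modulo mapping, and the `gvr-immich-shard-N:` remote naming are all assumptions made for illustration.

```python
import hashlib


def shard_for(asset_id: str, shard_count: int) -> int:
    """Map an asset ID to a shard index in [0, shard_count)."""
    digest = hashlib.sha256(asset_id.encode("utf-8")).hexdigest()
    # The first 8 hex characters (32 bits) are plenty for an even spread.
    return int(digest[:8], 16) % shard_count


def remote_for(asset_id: str, shard_count: int, prefix: str = "gvr-immich-shard-") -> str:
    """HYPOTHETICAL remote naming: shards are numbered from 1."""
    return f"{prefix}{shard_for(asset_id, shard_count) + 1}:"


# Placeholder asset IDs; real IDs would come from Immich.
for asset_id in ("asset-0001", "asset-0002", "asset-0003"):
    print(asset_id, "->", remote_for(asset_id, shard_count=6))
```

One caveat with a modulo mapping is that changing the shard count reassigns most assets, so the count is best fixed before the first backup runs.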

Generated 2026-06-05 16:50 ET. Builds on the 2026-06-04 Azure tenant cap investigation + the 2026-06-05 traceroute. Empirical premise (single shared identity) decoded directly from rclone token JWTs. Architectural proposal validated against the published Microsoft caps; awaits empirical confirmation via the test plan.

Source: dare_parallel_ingest_identities_2026-06-05.md · Rendered 2026-06-05 19:03