Parallel ingest-user identities · xlab.studio (APAC) lane multiplication

SIGNAL · HYPOTHESIS TESTED · 5 JUNE 2026

Working hypothesis (Dan, 2026-06-05) plus its same-day empirical test result. Premise (confirmed via JWT decode): all 13 of today’s gvr-* rclone remotes share one Azure AD app and one service principal. Original proposed move: provision multiple Azure AD apps for parallel lanes. An empirical bake-off (gf-cx-singapore → xlab.studio, 1 GiB blobs, 16:55-17:05 ET) showed that multiple destination drives under the SAME app+SP scale nearly linearly: 5 lanes hit ~4.9× single-lane throughput, and 8 lanes hit ~6× before the VM hit its memory ceiling. Multiple AD apps are NOT required, which simplifies the architecture for Immich RAID backups accordingly.

🔬 Empirical bake-off results (run 2026-06-05)

Five tests were run from gf-cx-singapore to xlab.studio (APAC); in each, every lane copied one 1 GiB (1,024 MiB) random blob to its own destination drive. All lanes used the same app + service principal (data-gfcx-ingest, sp 2ecde9c4…).

Lanes	Wall clock	Aggregate (MiB/s)	Per-lane avg (MiB/s)	Speedup	Remotes added
1 (baseline)	73 s	14.03	14.03	1.0×	gvr-admin:
2	87 s	23.54	11.77	1.68×	+ gvr-dan:
3	82 s	37.46	12.49	2.67×	+ gvr-luke:
5	75 s	68.27	13.65	4.87×	+ gvr-cindy:, + gvr-google-audrey-inc:
8	97 s	84.45	10.56	6.02×¹	+ gvr-google-audrey-lam:, + gvr-icloud-audrey-inc:, + gvr-icloud-audrey-lam:

¹ One lane (gvr-google-audrey-inc:) was OOM-killed during the 8-lane test. The VM has 7.8 GiB RAM, and each rclone running with --onedrive-chunk-size 250M --transfers=4 buffers ~1+ GiB (250 MiB chunk × 4 in-flight transfers), so 8 concurrent rclones need ~8 GiB for upload buffers alone. The true sustained 8-lane aggregate without the OOM constraint is likely somewhat higher.
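The footnote's memory arithmetic can be made explicit. This is a minimal sketch that assumes upload chunk buffers (chunk size × --transfers) dominate each rclone process and that a hypothetical 1 GiB is reserved for the OS and everything else; real rclone RSS will be somewhat higher, so treat the lane count as an upper bound. The chunk-size check encodes rclone's documented OneDrive constraint (a multiple of 320 KiB, not above 250M).

```python
MIB = 1024 * 1024
ONEDRIVE_CHUNK_QUANTUM = 320 * 1024   # rclone docs: chunk size must be a multiple of 320 KiB
ONEDRIVE_CHUNK_MAX = 250 * MIB        # rclone docs: should not exceed 250M (262,144,000 bytes)


def valid_onedrive_chunk(chunk_mib: int) -> bool:
    """True if --onedrive-chunk-size <chunk_mib>M satisfies rclone's documented limits."""
    size = chunk_mib * MIB
    return size % ONEDRIVE_CHUNK_QUANTUM == 0 and size <= ONEDRIVE_CHUNK_MAX


def lane_buffer_mib(chunk_mib: int, transfers: int) -> int:
    """Approximate upload-buffer memory for one rclone process (one lane)."""
    return chunk_mib * transfers


def max_lanes(ram_gib: float, chunk_mib: int, transfers: int, reserve_gib: float = 1.0) -> int:
    """Lanes whose chunk buffers fit in RAM after a fixed (hypothetical) reserve."""
    usable_mib = (ram_gib - reserve_gib) * 1024
    return int(usable_mib // lane_buffer_mib(chunk_mib, transfers))


if __name__ == "__main__":
    print(8 * lane_buffer_mib(250, 4), "MiB of buffers for 8 lanes vs", 7.8 * 1024, "MiB of RAM")
    for ram, chunk in [(7.8, 250), (7.8, 100), (16, 250)]:
        print(f"{ram} GiB RAM, {chunk}M chunks -> at most {max_lanes(ram, chunk, 4)} lanes")
```

On the current VM this gives an upper bound of 6 lanes at 250M chunks; the bake-off's practical optimum was 5, consistent with rclone needing memory beyond its chunk buffers.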

Key reads:

Per-lane throughput stays in a narrow ~11.8-14 MiB/s band from 1 through 5 lanes, regardless of how many lanes run concurrently; it dips to ~10.6 MiB/s only at 8 lanes, where VM memory pressure forced a kill.
Aggregate throughput scales near-linearly through 5 lanes, then bends as it approaches the published per-app + tenant ceiling (400 GB/hr ≈ 111 MB/s ≈ 106 MiB/s). At 8 lanes we measured 84 MiB/s, ~80% of that ceiling: close to the theoretical wall.
The hypothesis is REFINED, not refuted. Multiple identities DO unlock more throughput; what we got wrong was assuming we needed multiple Azure AD APPS. The actual unlock is multiple destination DRIVES (each user’s OneDrive in the tenant has its own write bucket).
Multiple Azure AD apps are unnecessary: the same app+SP drove 8 destination drives to ~6× single-lane throughput. Move A (multi-app) is moot.
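The table's derived columns follow from the wall-clock times alone. A minimal sketch, assuming (as the table's own figures do) that every lane moved one full 1 GiB blob, including the 8-lane run with the OOM-killed lane. Efficiency (speedup ÷ lanes) and share of the 400 GB/hr ceiling are added here for illustration; they are not columns in the original table.

```python
BLOB_MIB = 1024                        # one 1 GiB blob per lane
CEILING_MIB_S = 400e9 / 3600 / 2**20   # 400 GB/hr (decimal) expressed in MiB/s, ~106

# (lanes, wall-clock seconds) from the bake-off table
RUNS = [(1, 73), (2, 87), (3, 82), (5, 75), (8, 97)]


def derive(runs):
    """Recompute aggregate, per-lane, speedup, efficiency and % of ceiling per run."""
    base = BLOB_MIB / runs[0][1]       # single-lane MiB/s
    rows = []
    for lanes, secs in runs:
        agg = lanes * BLOB_MIB / secs
        speedup = agg / base
        rows.append({
            "lanes": lanes,
            "aggregate": round(agg, 2),
            "per_lane": round(agg / lanes, 2),
            "speedup": round(speedup, 2),
            "efficiency": round(speedup / lanes, 2),
            "pct_of_ceiling": round(100 * agg / CEILING_MIB_S),
        })
    return rows


if __name__ == "__main__":
    for row in derive(RUNS):
        print(row)
```

Efficiency peaks at 5 lanes (~0.97) and falls to ~0.75 at 8, which is the "bend" described in the key reads.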

What this means for Immich RAID backups

(Original section below stands; the architectural simplifications are listed here.)

No new Azure AD apps needed. Use the existing data-gfcx-ingest app + service principal.
Shard by destination drive. Mint enough gvr-immich-shard-{1..N}@xlab.studio USERS (xlab.studio’s 25 E5-Dev license pool covers up to ~17 more user slots without buying anything). Each gets a 5 TB drive; shard assets across them by year, owner, or hash prefix.
Optimal lane count for the current VM: 5. That hits ~68 MiB/s ≈ 258 GB/hr (≈ 240 GiB/hr), or ~24 hours to ship 6 TB. Adding lanes 6-8 needs more RAM than gf-cx-singapore’s 7.8 GiB.
Theoretical aggregate ceiling: ~400 GB/hr ≈ 111 MB/s (≈ 106 MiB/s), i.e. ~2.5 hours per TB (~15 hours for 6 TB). Reach it with ~8 lanes on a VM with ≥16 GiB RAM, OR with a smaller --onedrive-chunk-size (50-100 MiB) so more concurrent rclones fit in memory.
VM resize is a short gcloud change: stop the instance, run gcloud compute instances set-machine-type, then start it again. Bumping asia-southeast1-b’s instance from its current size to n2-standard-4 (16 GiB RAM) costs ~$50/month and unblocks 8-lane parallelism. At that rate $300 of credit would last ~6 months, but GCP’s $300 free-trial credit expires 90 days after sign-up, so the trial window rather than the spend rate is the binding limit.
Restore math: at 5-lane = 68 MiB/s, a 6 TB Immich library restores in ~24 h instead of today’s single-lane ~5 days.
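The figures above mix measured MiB/s with published GB/hr caps, which is where unit slips creep in. A small helper keeps them straight; it assumes decimal TB/GB for library sizes and caps and binary MiB for rclone's measured rates. Function names are illustrative.

```python
MIB = 2**20
TB = 10**12   # decimal terabyte


def hours_at_mib_s(tb: float, mib_per_s: float) -> float:
    """Hours to move `tb` decimal terabytes at a sustained MiB/s rate."""
    return tb * TB / (mib_per_s * MIB) / 3600


def hours_at_gb_hr(tb: float, gb_per_hr: float) -> float:
    """Hours to move `tb` decimal terabytes at a decimal GB/hr rate."""
    return tb * 1000 / gb_per_hr


def gb_per_hr(mib_per_s: float) -> float:
    """Convert a MiB/s rate to decimal GB/hr."""
    return mib_per_s * MIB * 3600 / 1e9


if __name__ == "__main__":
    print(f"5 lanes: {gb_per_hr(68.27):.0f} GB/hr, 6 TB in {hours_at_mib_s(6, 68.27):.1f} h")
    print(f"1 lane: 6 TB in {hours_at_mib_s(6, 14.03) / 24:.1f} days")
    print(f"ceiling: {hours_at_gb_hr(1, 400)} h/TB, {hours_at_gb_hr(6, 400)} h for 6 TB")
```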

TL;DR — the huge discovery

All 13 gvr-* rclone remotes today are app-only auth (idtyp=app) via the same Azure AD app data-gfcx-ingest (appid=e935a3e3…) and the same service principal (sub/oid=2ecde9c4…). Microsoft sees them as one identity.
That means the per-principal ~14 MB/s ingress cap (Azure tenant cap memory, 2026-06-04) applies to all gvr-* traffic combined, not per-remote. Today’s empirical ~7-9 MB/s sustained from gf-cx-singapore is consistent with that cap.
Splitting work across multiple identities multiplies lanes. Two paths:
1. Multiple Azure AD apps (each is a separate per-app + tenant bucket: 400 GB/hr ≈ 111 MB/s ceiling per app; easily provisioned).
2. User-delegated tokens against distinct UPNs (per-user 14 MB/s each; scales linearly with the number of UPNs).
Headroom: the per-app+tenant cap is 400 GB/hr ≈ 111 MB/s, so 5-8 parallel identities running at their per-principal 14 MB/s each give ≈ 70-110 MB/s aggregate before hitting that ceiling. That is up to ~13× the current measured rate (~7-9 MB/s), all empirically reachable.
Architectural implication for Immich RAID backups (Dan’s highlighted next surface): design the ingest with the N-parallel-identities pattern baked in from day one. Don’t retrofit. Each Immich asset subset (per-owner / per-album / per-year shard) maps to its own identity and pulls in parallel.

Empirical evidence — JWT decode of all gvr-* remotes

Decoded the access_token JWT payload for every gvr-* remote in ~/.config/rclone/rclone.conf today (2026-06-05 16:36 ET):

Remote name	app_displayname	appid (first 8)	sub (first 8)	idtyp	upn
`gvr-dan`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-luke`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-cindy`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-admin`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-dropbox`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-google-audrey-inc`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-google-audrey-lam`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-icloud-audrey-inc`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-icloud-audrey-lam`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-timemachine-dan-m2`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)
`gvr-timemachine-mac-mini`	data-gfcx-ingest	`e935a3e3`	`2ecde9c4`	app	(app-only)

Every row is identical for the identity fields. The remote name is just an rclone label; it does NOT change who Microsoft sees on the wire.
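The decode behind this table can be reproduced offline. A minimal sketch, assuming an unencrypted rclone.conf in which each OneDrive remote stores its OAuth token as a JSON `token = {...}` line containing `access_token`; the payload is base64url-decoded without signature verification, which is enough to read identity claims (an expired token still decodes). Function names are illustrative.

```python
import base64
import configparser
import json
import os

CLAIMS = ("app_displayname", "appid", "sub", "oid", "idtyp", "upn")


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature."""
    segment = token.split(".")[1]
    segment += "=" * (-len(segment) % 4)   # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(segment))


def identity_table(conf_text: str, prefix: str = "gvr-") -> dict:
    """Map each matching rclone remote to the identity claims in its access token."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(conf_text)
    table = {}
    for remote in conf.sections():
        if not remote.startswith(prefix) or "token" not in conf[remote]:
            continue
        access_token = json.loads(conf[remote]["token"])["access_token"]
        claims = jwt_claims(access_token)
        table[remote] = {name: claims.get(name) for name in CLAIMS}
    return table


def distinct_identities(table: dict) -> set:
    """(appid, sub) pairs seen on the wire; a single pair means a single identity."""
    return {(row["appid"], row["sub"]) for row in table.values()}


if __name__ == "__main__":
    with open(os.path.expanduser("~/.config/rclone/rclone.conf")) as fh:
        table = identity_table(fh.read())
    for remote, row in sorted(table.items()):
        print(remote, row["app_displayname"], (row["appid"] or "")[:8],
              (row["sub"] or "")[:8], row["idtyp"], row["upn"] or "(app-only)")
    print("distinct identities:", len(distinct_identities(table)))
```

If `distinct_identities` returns one pair, every remote is the same principal no matter how many rclone labels exist.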

What the ceilings really mean

Per project_azure_tenant_cap_investigation_2026-06-04 and confirmed via empirical measurement on gf-cx-singapore:

Cap	Value	Applies to
Per-user ingress	50 GB/hr ≈ 14 MB/s	A single identity (user OR service principal).
Per-app + tenant	400 GB/hr ≈ 111 MB/s	All calls from one Azure AD app, summed across all identities it acts as.
Tenant RU budget	18,750 / 5min	Tenant-wide Graph operations rate.

Today we’re using one app + one service principal → we’re capped by the per-user line (~14 MB/s; observed ~7-9 MB/s sustained). The per-app line (111 MB/s) is untouched: we have ~13× headroom inside the tenant before hitting it.
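The cap table and the headroom claim reduce to a couple of unit conversions and a min(). This sketch encodes the model exactly as stated in this section (each identity capped per-principal, everything capped per-app); the bake-off later showed same-SP lanes to distinct drives exceeding the single per-principal line, so this is the pre-bake-off model, not the refined one. The 8.5 MB/s "observed" value is a hypothetical midpoint of the ~7-9 MB/s range.

```python
MB = 10**6
MIB = 2**20

PER_PRINCIPAL_GB_HR = 50   # per-user ingress cap (table above)
PER_APP_GB_HR = 400        # per-app + tenant cap (table above)


def gb_hr_to(rate_gb_hr: float, unit: int) -> float:
    """Convert a decimal GB/hr cap to per-second throughput in MB or MiB."""
    return rate_gb_hr * 1e9 / 3600 / unit


def modeled_aggregate_mb_s(identities: int) -> float:
    """Pre-bake-off model: N identities at the per-principal cap, bounded by the per-app cap."""
    per_principal = gb_hr_to(PER_PRINCIPAL_GB_HR, MB)
    return min(identities * per_principal, gb_hr_to(PER_APP_GB_HR, MB))


if __name__ == "__main__":
    print(f"per-user: {gb_hr_to(50, MB):.1f} MB/s = {gb_hr_to(50, MIB):.1f} MiB/s")
    print(f"per-app:  {gb_hr_to(400, MB):.1f} MB/s = {gb_hr_to(400, MIB):.1f} MiB/s")
    print(f"headroom vs 8.5 MB/s observed: {gb_hr_to(400, MB) / 8.5:.0f}x")
    for n in (1, 5, 8, 10):
        print(n, "identities ->", round(modeled_aggregate_mb_s(n), 1), "MB/s")
```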

Two architectural moves that unlock lanes

Move A — multiple Azure AD apps (cleaner)

Register N apps in xlab.studio’s Azure AD:
- data-gfcx-ingest-audrey-inc
- data-gfcx-ingest-audrey-lam
- data-gfcx-ingest-dan-archives
- data-gfcx-ingest-immich-shard-{1..N}
- (etc.)

Each app:
- Gets its own per-app + tenant 400 GB/hr ceiling (separate quota buckets).
- Acts as its own service principal (oid differs) → separate per-user buckets too.
- Requires re-minting each gvr-* remote against the matching new app.

Concurrency math: 8 apps × 14 MB/s (per-principal) = ~112 MB/s aggregate, sitting right at the per-tenant 18,750 RU/5min budget. Diminishing returns past ~6-7 apps because the shared RU budget becomes the next bottleneck.

Move B — user-delegated tokens (more accounts, simpler quota)

Mint actual M365 user accounts in xlab.studio for each ingest identity:
- gvr-google-audrey-inc@xlab.studio
- gvr-google-audrey-lam@xlab.studio
- gvr-icloud-audrey-inc@xlab.studio
- gvr-immich-shard-{1..N}@xlab.studio

OAuth each one separately; rclone keeps the user-delegated refresh token. Each user is a separate per-user 14 MB/s bucket. Capacity: xlab.studio has 25 E5-Dev licenses × 5 TB = a 125 TB pool, more than enough to allocate ingest-only users.

Trade-off vs Move A:
- Move B uses license slots (cheap; we have 25).
- Move B carries the OAuth account-binding-drift risk from feedback_oauth_account_drift_safari_private_2026-06-05.md: more accounts = more mint operations = more chances to bind to the wrong tenant.
- Move A scales without burning license slots, but provisioning an app per shard at Immich scale (10s of shards) may hit Azure AD app registration limits.

Recommendation: Move A for the existing gvr-* surface (~5-8 apps, one per logical source). Move B (or a hybrid where Move A apps act AS specific Move B users) for Immich’s RAID shards if the shard count grows past ~10.

Immich RAID backup design — bake parallelism in from day one

Dan, 2026-06-05: “when we structure immich raid back-up’s it will have to take advantage of this fact”

Design implications:

Shard the asset library by identity-aligned partitions. Per-owner is the obvious split (one identity per Immich owner — matches the existing gvr-google-audrey-inc / audrey-lam pattern). For single-owner archives, shard by year or by alphabetic asset-hash prefix.
Each shard binds 1:1 to a gvr-immich-* remote that uses its own Azure AD app (Move A) or user (Move B).
Parallel rclone instances on gf-cx-singapore — rclone copy --transfers=N won’t multiply throughput once the per-principal cap kicks in, but N CONCURRENT rclone invocations against distinct remotes will. Wrap them in GNU parallel or a small Python orchestrator (sibling to gvr-ingest-from-singapore.sh).
Per-shard transfer log keyed by remote — feeds the existing rclone_logspeed.py history with shard granularity, so the APAC-window dashboard chart can show per-lane throughput.
RAID restore is symmetric. Pulling FROM xlab.studio is the same shape — N parallel identities reading distinct shards ≈ N × 14 MB/s effective restore rate. A multi-TB Immich restore that takes ~1.5 days at today’s single-lane rate could complete in ~4-5 hours at 6-lane parallel.
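For single-owner archives, the hash-prefix option can be a pure function of a stable per-asset key, so every run (backup or restore) routes an asset to the same shard remote. A minimal sketch, assuming each asset has a stable string key (which Immich field to use is an open choice) and that remotes follow the gvr-immich-shard-{1..N} naming from the architecture notes; per-owner or per-year sharding would just swap the key-to-shard function.

```python
import hashlib


def shard_index(asset_key: str, n_shards: int) -> int:
    """Deterministic 1-based shard for an asset, from a SHA-256 prefix of its key."""
    digest = hashlib.sha256(asset_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards + 1


def shard_remote(asset_key: str, n_shards: int) -> str:
    """rclone destination remote for the asset (gvr-immich-shard-{1..N} naming)."""
    return f"gvr-immich-shard-{shard_index(asset_key, n_shards)}:"


def plan(asset_keys, n_shards: int) -> dict:
    """Group asset keys by destination remote: one group per parallel lane."""
    groups = {}
    for key in asset_keys:
        groups.setdefault(shard_remote(key, n_shards), []).append(key)
    return groups


if __name__ == "__main__":
    keys = [f"asset-{i:05d}" for i in range(10_000)]
    for remote, members in sorted(plan(keys, 5).items()):
        print(remote, len(members))
```

Design note: modulo-N reassigns most assets whenever N changes. If the shard count is expected to grow, fixed hash-prefix ranges per shard or consistent hashing keep existing shards stable.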

Test plan (empirical falsification)

Before shipping any provisioning, confirm the model with a tight empirical test:

# Provision app #2 in xlab.studio Azure AD (Portal → App
# Registrations → New Registration, name = "data-gfcx-ingest-test").
# Mint a fresh rclone remote against it: `gvr-admin-test`.

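# Test 1 — sequential single-lane (baseline)
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin:bakeoff/t1 \
    --transfers=4 --checkers=8'

# Test 2 — parallel two-lane (1× gvr-admin, 1× gvr-admin-test).
# Each run writes to its own destination folder (folder names are
# hypothetical); copying /test/1gb to gvr-admin: again would otherwise
# be skipped by rclone as unchanged after Test 1 and measure nothing.
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin:bakeoff/t2a \
    --transfers=4 &
                 rclone_logspeed copy /test/1gb gvr-admin-test:bakeoff/t2b \
                     --transfers=4 ; wait'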

# Expected:
#   Test 1: ~7-9 MB/s sustained (matches today's measurements)
#   Test 2: ~14-18 MB/s aggregate (each lane at ~7-9, sum ~doubles)
#
# If Test 2 doesn't ~double, the bottleneck is NOT per-principal —
# revisit the model (could be network leg, OneDrive backend write
# rate, or tenant RU budget hit).

Cross-references

Network Trace · gf-cx-singapore vs Florida Mac → xlab.studio (APAC) — empirical traceroute showing 2 ms VM→MS-Singapore-POP path.
Azure tenant cap investigation (2026-06-04) — full three-layer throttling model.
APAC transfer window policy (2026-06-04) — JST 23-07 = NYC 10-18 EDT window.
xlab.studio tenant lives in Japan (NextDNS attribution) — backend geography.
xlab.studio tenant naming taxonomy (2026-06-02) — the gvr-<service>-<owner> naming pattern this proposes to operationalise as real identities.
GCP audreylam Singapore relay — production pattern (2026-06-04) — the relay this parallelism would apply to.

Status — post-bake-off

Hypothesis validated + refined the same day. Multi-lane writes to distinct destination drives DO multiply throughput nearly linearly up to the per-app+tenant cap. Move A (multiple Azure AD apps) is unnecessary — collapsed out of the plan. Move B remains relevant (distinct destination drives = distinct destination users) and is the path forward.

Outstanding action items:

Resize gf-cx-singapore to ≥16 GiB RAM (n2-standard-4 or similar) to unblock 7-8 lane parallelism. This would show whether the empirical trajectory continues toward the ~111 MB/s (≈ 106 MiB/s) ceiling.
Decide Immich shard taxonomy — per-owner / per-year / hash-prefix. Mint the matching gvr-immich-*@xlab.studio users ahead of Immich provisioning so the destination drives exist when the first restore/backup fires.
Document the multi-lane orchestrator — sibling to gvr-ingest-from-singapore.sh. Takes a --lanes N flag and a manifest of (source, destination-remote) pairs. Logs per-lane throughput via rclone_logspeed.py.
Update memory — the rotation-collision pathology (feedback_gvr_destination_remote_token_rotation_2026-06-05) becomes higher-stakes when N lanes are in flight; pre-flight destination smoke-test now needs to run per-lane, not per-source.
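A minimal sketch of the multi-lane orchestrator described in the action items. Assumptions, all hypothetical: a tab-separated manifest of `source<TAB>destination-remote` rows, a default of 5 lanes, and plain `rclone copy` calls (rclone_logspeed.py's interface isn't shown here, so the wrapper records per-lane wall-clock time itself). Each manifest row should target a distinct destination remote, since distinct destination drives are what multiply throughput.

```python
import argparse
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def read_manifest(path: str) -> list:
    """Parse 'source<TAB>destination-remote' lines, skipping blanks and # comments."""
    pairs = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                src, dst = line.split("\t", 1)
                pairs.append((src, dst))
    return pairs


def rclone_argv(src: str, dst: str, transfers: int = 4) -> list:
    """One lane = one rclone copy process."""
    return ["rclone", "copy", src, dst, f"--transfers={transfers}"]


def run_lane(pair, runner=subprocess.run) -> dict:
    """Run one rclone copy and record its exit code and wall-clock seconds."""
    src, dst = pair
    start = time.monotonic()
    proc = runner(rclone_argv(src, dst), check=False)
    return {"src": src, "dst": dst, "rc": proc.returncode,
            "seconds": time.monotonic() - start}


def run_all(pairs, lanes: int, runner=subprocess.run) -> list:
    """Run at most `lanes` rclone processes at once; results keep manifest order."""
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(lambda pair: run_lane(pair, runner), pairs))


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Multi-lane rclone copy (sketch)")
    parser.add_argument("--lanes", type=int, default=5)
    parser.add_argument("manifest")
    args = parser.parse_args(argv)
    results = run_all(read_manifest(args.manifest), args.lanes)
    for r in results:
        print(f"{r['dst']}\trc={r['rc']}\t{r['seconds']:.0f}s")
    return max((r["rc"] for r in results), default=0)


if __name__ == "__main__":
    raise SystemExit(main())
```

Threads are sufficient here because each lane's work happens in a separate rclone process; the pool size is the lane cap, which is what the VM's memory limits in practice.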

Generated 2026-06-05 16:50 ET; bake-off results (run 16:55-17:05 ET) were added afterwards. Builds on the 2026-06-04 Azure tenant cap investigation + the 2026-06-05 traceroute. Empirical premise (single shared identity) decoded directly from rclone token JWTs. Architectural proposal validated against the published Microsoft caps, then tested empirically in the bake-off above.

Source: dare_parallel_ingest_identities_2026-06-05.md · Rendered 2026-06-05 19:03