Parallel ingest-user identities · xlab.studio (APAC) lane multiplication
SIGNAL · HYPOTHESIS TESTED · 5 JUNE 2026
Working hypothesis (Dan, 2026-06-05) plus its same-day empirical
test result. Premise (confirmed via JWT decode): all 13 of today’s
gvr-* rclone remotes share one Azure AD app + one service
principal. Original proposed move: provision multiple Azure AD apps
for parallel lanes. Empirical bake-off (gf-cx-singapore → xlab.studio,
1 GB blobs, 16:55-17:05 ET) showed multiple destination drives
under the SAME app+SP give nearly-linear scaling — 5 lanes hit
~4.9× single-lane throughput, 8 lanes hit ~6× before VM memory
ceiling. Multiple AD apps NOT required. The architecture
implication for Immich RAID backups is simplified accordingly.
🔬 Empirical bake-off results (run 2026-06-05)
5 tests run from gf-cx-singapore to xlab.studio (APAC), each
transferring 1 GB random blobs to distinct destination drives.
All lanes used the same app + service principal (data-gfcx-ingest,
sp 2ecde9c4…).
| Lanes | Wall clock | Aggregate MiB/s | Per-lane avg | Speedup | Notes |
|---|---|---|---|---|---|
| 1 (baseline) | 73 s | 14.03 | 14.03 | 1.0× | gvr-admin: |
| 2 | 87 s | 23.54 | 11.77 | 1.68× | + gvr-dan: |
| 3 | 82 s | 37.46 | 12.49 | 2.67× | + gvr-luke: |
| 5 | 75 s | 68.27 | 13.65 | 4.87× | + gvr-cindy:, + gvr-google-audrey-inc: |
| 8 | 97 s | 84.45 | 10.56 | 6.02×¹ | + gvr-google-audrey-lam:, + gvr-icloud-audrey-inc:, + gvr-icloud-audrey-lam: |
¹ One lane (gvr-google-audrey-inc:) OOM-killed at the 8-lane test
— VM has 7.8 GiB RAM and 8 concurrent rclones with --onedrive-chunk-size
250M --transfers=4 buffer ~1+ GiB each. True 8-lane sustained
aggregate without OOM constraint is likely a touch higher.
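The memory arithmetic behind the footnote can be sketched as follows. This is a rough budget, not a measurement: it assumes each rclone process holds one full upload chunk in RAM per active transfer (chunk size × transfers) and ignores any other process overhead. The function names are illustrative only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def rclone_buffer_bytes(chunk_size_mib: int, transfers: int) -> int:
    """Assumed worst case: every active transfer buffers one full chunk."""
    return chunk_size_mib * MIB * transfers

def max_lanes(vm_ram_gib: float, chunk_size_mib: int, transfers: int) -> int:
    """How many concurrent rclone processes fit in RAM under that assumption."""
    usable = vm_ram_gib * GIB
    return int(usable // rclone_buffer_bytes(chunk_size_mib, transfers))

# Settings from the bake-off: --onedrive-chunk-size 250M --transfers=4
print(rclone_buffer_bytes(250, 4) / GIB)  # just under 1 GiB of buffers per lane
print(max_lanes(7.8, 250, 4))             # lanes that fit on the current VM
print(max_lanes(16, 250, 4))              # lanes that fit with 16 GiB RAM
print(max_lanes(7.8, 100, 4))             # lanes that fit with 100 MiB chunks
```

Under this model the current 7.8 GiB VM fits 7 lanes of buffers before anything else is counted, which is consistent with the eighth lane being the one that tipped the VM into an OOM kill.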
Key reads:
- Per-lane throughput stays in a tight 11.8-14.0 MiB/s band through 5 concurrent lanes and only drops (to ~10.6 MiB/s) at 8 lanes, where VM memory pressure forced an OOM kill.
- Aggregate scales near-linearly through 5 lanes, then bends as it approaches the published per-app + tenant ceiling (400 GB/hr ≈ 111 MB/s ≈ 106 MiB/s). At 8 lanes we measured 84.45 MiB/s (≈ 88.6 MB/s), about 80% of that ceiling and close to the theoretical wall.
- The hypothesis is REFINED, not refuted. Multiple identities DO unlock more throughput; what we got wrong was assuming we needed multiple Azure AD APPS. The actual unlock is multiple destination DRIVES (each user’s OneDrive in the tenant has its own write bucket).
- Multiple Azure AD apps unnecessary. Same app+SP can saturate ~6 destination drives. Move A (multi-app) is moot.
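The table figures can be re-derived from the lane counts and wall-clock times alone. The sketch below assumes each "1 GB" blob is 1024 MiB, which is the value that reproduces the reported aggregates. It also counts all 8 blobs in the 8-lane run, as the table does, even though one lane was OOM-killed.

```python
BLOB_MIB = 1024  # assumption: each "1 GB" test blob is 1024 MiB

# lanes -> wall clock in seconds, copied from the bake-off table
RUNS = {1: 73, 2: 87, 3: 82, 5: 75, 8: 97}

def aggregate_mib_s(lanes: int, wall_s: float) -> float:
    """Total MiB moved by all lanes divided by the wall-clock time."""
    return lanes * BLOB_MIB / wall_s

def summarize(runs: dict) -> list:
    """Return (lanes, aggregate MiB/s, per-lane MiB/s, speedup) rows."""
    baseline = aggregate_mib_s(1, runs[1])
    rows = []
    for lanes, wall_s in sorted(runs.items()):
        agg = aggregate_mib_s(lanes, wall_s)
        rows.append((lanes, round(agg, 2), round(agg / lanes, 2),
                     round(agg / baseline, 2)))
    return rows

for row in summarize(RUNS):
    print(row)
```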
What this means for Immich RAID backups
(Original section below stands; the architectural simplifications are listed here.)
- No new Azure AD apps needed. Use the existing data-gfcx-ingest app + service principal.
- Shard by destination drive. Mint enough gvr-immich-shard-{1..N}@xlab.studio USERS (xlab.studio’s 25 E5-Dev license pool covers up to ~17 more user slots without buying anything). Each gets a 5 TB drive; shard by year, owner, or hash prefix.
- Optimal lane count for current VM: 5. Hits ~68 MiB/s ≈ 245 GB/hr ≈ ~24 hours to ship 6 TB. Adding lanes 6-8 needs more RAM than gf-cx-singapore’s 7.8 GiB.
- Theoretical aggregate ceiling = ~111 MB/s = ~400 GB/hr = ~2.5 hours per TB (~15 hours for 6 TB). Reach it with ~8 lanes on a VM with ≥16 GiB RAM, OR with a smaller --onedrive-chunk-size (50-100 MiB), which lets more concurrent rclones fit in memory.
- VM resize is a one-line gcloud change. Bumping asia-southeast1-b’s instance from its current size to n2-standard-4 (16 GiB RAM) costs ~$50/month and unblocks 8-lane parallelism. Inside the GCP $300 trial we’re still ~250 days from burning it out at that rate.
- Restore math: at 5-lane = 68 MiB/s, a 6 TB Immich library restores in ~24 h instead of today’s single-lane ~5 days.
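The ship and restore times in the list above follow from a single division. A minimal sketch, treating the quoted rates as decimal MB/s and sizes as decimal TB (the same simplification the arithmetic in this note makes); the function name is illustrative:

```python
def hours_to_ship(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate_mb_s."""
    seconds = size_tb * 1e12 / (rate_mb_s * 1e6)
    return seconds / 3600

five_lane_hours = hours_to_ship(6, 68)          # 5 lanes at ~68 MB/s
single_lane_days = hours_to_ship(6, 14) / 24    # 1 lane at the ~14 MB/s cap

print(f"6 TB at 5 lanes: {five_lane_hours:.1f} h")
print(f"6 TB at 1 lane:  {single_lane_days:.1f} days")
```

The single-lane figure uses the 14 MB/s cap rather than the ~7-9 MB/s observed in practice, so it is the optimistic end of today's restore time.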
TL;DR — the huge discovery
- All 13 gvr-* rclone remotes today are app-only auth (idtyp=app) via the same Azure AD app data-gfcx-ingest (appid=e935a3e3…) and the same service principal (sub/oid=2ecde9c4…). Microsoft sees them as one identity.
- That means the per-principal ~14 MB/s ingress cap (Azure tenant cap memory, 2026-06-04) applies to all gvr-* traffic combined, not per-remote. Today’s empirical ~7-9 MB/s sustained from gf-cx-singapore is consistent with that cap.
- Splitting work across multiple identities multiplies lanes. Two paths:
  1. Multiple Azure AD apps (each is a separate per-app + tenant bucket: 400 GB/hr / ~111 MB/s ceiling per app, easily provisioned).
  2. User-delegated tokens against distinct UPNs (per-user 14 MB/s, scales linearly with number of UPNs).
- Headroom: per-app+tenant cap is 400 GB/hr ≈ 111 MB/s, so 5-8 parallel identities running their per-principal 14 MB/s each ≈ 70-110 MB/s aggregate before hitting the tenant ceiling. ~13× the current measured rate, all empirically reachable.
- Architectural implication for Immich RAID backups (Dan’s highlighted next surface): design the ingest with the N-parallel-identities pattern baked in from day one. Don’t retrofit. Each Immich asset subset (per-owner / per-album / per-year shard) maps to its own identity and pulls in parallel.
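The headroom estimate above is a capped linear model: each lane adds one per-principal bucket until the per-app + tenant ceiling takes over. The sketch below uses only the two published caps quoted in this note. It is a model, not a measurement, and it ignores the tenant RU budget.

```python
PER_PRINCIPAL_GB_HR = 50    # per-user / per-principal ingress cap
PER_APP_TENANT_GB_HR = 400  # per-app + tenant cap

def modelled_aggregate_mb_s(lanes: int) -> float:
    """Aggregate MB/s under the model min(lanes x per-principal, per-app)."""
    gb_per_hr = min(lanes * PER_PRINCIPAL_GB_HR, PER_APP_TENANT_GB_HR)
    return gb_per_hr * 1000 / 3600  # GB/hr -> MB/s (decimal units)

for lanes in (1, 5, 8, 12):
    print(lanes, round(modelled_aggregate_mb_s(lanes), 1))
```

In this model the two caps meet at exactly 8 lanes (8 × 50 GB/hr = 400 GB/hr), so lanes beyond the eighth add nothing.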
Empirical evidence — JWT decode of all gvr-* remotes
Decoded the access_token JWT payload for every gvr-* remote in
~/.config/rclone/rclone.conf today (2026-06-05 16:36 ET):
| Remote name | app_displayname | appid (first 8) | sub (first 8) | idtyp | upn |
|---|---|---|---|---|---|
| gvr-dan | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-luke | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-cindy | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-admin | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-dropbox | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-google-audrey-inc | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-google-audrey-lam | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-icloud-audrey-inc | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-icloud-audrey-lam | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-timemachine-dan-m2 | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
| gvr-timemachine-mac-mini | data-gfcx-ingest | e935a3e3 | 2ecde9c4 | app | (app-only) |
Every row is identical for the identity fields. The remote name is just an rclone label; it does NOT change who Microsoft sees on the wire.
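For reference, reading the identity claims out of an access token needs nothing beyond base64 and JSON. The sketch below is one way to do it, not necessarily how the table was produced. It decodes the payload only and does NOT verify the signature, so it is suitable for inspection and nothing else. The helper names are illustrative.

```python
import base64
import json

# Claims compared in the table above.
IDENTITY_CLAIMS = ("app_displayname", "appid", "oid", "sub", "idtyp", "upn")

def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def identity_fingerprint(token: str) -> dict:
    """Keep only the claims that tell Microsoft who is calling."""
    claims = decode_jwt_payload(token)
    return {name: claims.get(name) for name in IDENTITY_CLAIMS}
```

Two remotes share an identity when their fingerprints are equal; for app-only tokens the upn entry comes back as None.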
What the ceilings really mean
Per project_azure_tenant_cap_investigation_2026-06-04 and confirmed
via empirical measurement on gf-cx-singapore:
| Cap | Value | Applies to |
|---|---|---|
| Per-user ingress | 50 GB/hr ≈ 14 MB/s | A single identity (user OR service principal). |
| Per-app + tenant | 400 GB/hr ≈ 111 MB/s | All calls from one Azure AD app, summed across all identities it acts as. |
| Tenant RU budget | 18,750 / 5min | Tenant-wide Graph operations rate. |
Today we’re using one app + one service principal → we’re capped by the per-user line (~14 MB/s, observed ~7-9 MB/s sustained). The per-app line (111 MB/s) is untouched; we have ~13× of headroom inside the tenant before hitting it.
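The conversions in the table are worth keeping explicit, because MB/s (decimal) and MiB/s (binary, which rclone reports) differ by about 5%. A small sketch, assuming the published GB figures are decimal gigabytes:

```python
def gb_per_hr_to_mb_s(gb_per_hr: float) -> float:
    """Decimal GB per hour -> decimal MB per second."""
    return gb_per_hr * 1e9 / 3600 / 1e6

def gb_per_hr_to_mib_s(gb_per_hr: float) -> float:
    """Decimal GB per hour -> binary MiB per second (rclone's unit)."""
    return gb_per_hr * 1e9 / 3600 / 2 ** 20

for label, cap in (("per-user", 50), ("per-app + tenant", 400)):
    print(label, round(gb_per_hr_to_mb_s(cap), 1), "MB/s =",
          round(gb_per_hr_to_mib_s(cap), 1), "MiB/s")

# Tenant RU budget expressed per second: 18,750 RU per 5-minute window.
print(18750 / 300, "RU/s")
```

Read this way, the 400 GB/hr cap is about 111 MB/s but only about 106 MiB/s, which matters when comparing it against rclone's MiB/s readouts.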
Two architectural moves that unlock lanes
Move A — multiple Azure AD apps (cleaner)
Register N apps in xlab.studio’s Azure AD:
- data-gfcx-ingest-audrey-inc
- data-gfcx-ingest-audrey-lam
- data-gfcx-ingest-dan-archives
- data-gfcx-ingest-immich-shard-{1..N}
- (etc.)
Each app:
- Gets its own per-app + tenant 400 GB/hr ceiling (separate quota
buckets).
- Acts as its own service principal (oid differs) → separate
per-user buckets too.
- Re-mint each gvr-* remote against the matching new app.
Concurrency math: 8 apps × 14 MB/s (per-principal) = ~112 MB/s aggregate, sitting right at the per-tenant 18,750 RU/5min budget. Diminishing returns past ~6-7 apps because the shared RU budget becomes the next bottleneck.
Move B — user-delegated tokens (more accounts, simpler quota)
Mint actual M365 user accounts in xlab.studio for each ingest
identity:
- gvr-google-audrey-inc@xlab.studio
- gvr-google-audrey-lam@xlab.studio
- gvr-icloud-audrey-inc@xlab.studio
- gvr-immich-shard-{1..N}@xlab.studio
OAuth each one separately; rclone keeps the user-delegated refresh token. Each user is a separate per-user 14 MB/s bucket. Capacity: xlab.studio has 25 E5-Dev licenses × 5TB = 125TB pool, more than enough to allocate ingest-only users.
Trade-off vs Move A:
- Move B uses license slots (cheap, we have 25).
- Move B has the OAuth account-binding-drift risk from
feedback_oauth_account_drift_safari_private_2026-06-05.md —
more accounts = more mint operations = more chances to bind to
the wrong tenant.
- Move A scales without burning license slots, but provisioning
an app per shard at Immich scale (10s of shards) may hit Azure AD
app registration limits.
Recommendation: Move A for the existing gvr-* surface
(~5-8 apps, one per logical source). Move B (or a hybrid where
Move A apps act AS specific Move B users) for Immich’s RAID
shards if the shard count grows past ~10.
Immich RAID backup design — bake parallelism in from day one
Dan, 2026-06-05: “when we structure immich raid back-up’s it will have to take advantage of this fact”
Design implications:
- Shard the asset library by identity-aligned partitions. Per-owner is the obvious split (one identity per Immich owner, matching the existing gvr-google-audrey-inc / audrey-lam pattern). For single-owner archives, shard by year or by alphabetic asset-hash prefix.
- Each shard binds 1:1 to a gvr-immich-* remote that uses its own Azure AD app (Move A) or user (Move B).
- Parallel rclone instances on gf-cx-singapore: rclone copy --transfers=N won’t multiply throughput once the per-principal cap kicks in, but N CONCURRENT rclone invocations against distinct remotes will. Wrap in parallel or a small Python orchestrator (sibling to gvr-ingest-from-singapore.sh).
- Per-shard transfer log keyed by remote: feeds the existing rclone_logspeed.py history with shard granularity, so the APAC-window dashboard chart can show per-lane throughput.
- RAID restore is symmetric. Pulling FROM xlab.studio is the same shape: N parallel identities reading distinct shards ≈ N × 14 MB/s effective restore rate. A multi-TB Immich restore that takes ~1.5 days at today’s single-lane rate could complete in ~4-5 hours at 6-lane parallel.
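For the hash-prefix option, one possible assignment rule is sketched below. It is an illustration, not a decided taxonomy: the use of SHA-256 over an asset identifier and the rclone remote names of the form gvr-immich-shard-N: are assumptions made for the example.

```python
import hashlib

def shard_for_asset(asset_id: str, shard_count: int) -> str:
    """Map an asset to one of shard_count remotes, stably across runs."""
    digest = hashlib.sha256(asset_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shard_count + 1
    return f"gvr-immich-shard-{index}:"  # hypothetical remote name

# The same asset always lands on the same shard, so a restore can find it
# without a lookup table, as long as shard_count stays fixed.
print(shard_for_asset("example-asset-0001", 5))
```

The trade-off is that changing the shard count later remaps most assets, so the lane count is best fixed before the first backup fires.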
Test plan (empirical falsification)
Before shipping any provisioning, confirm the model with a tight empirical test:
# Provision app #2 in xlab.studio Azure AD (Portal → App
# Registrations → New Registration, name = "data-gfcx-ingest-test").
# Mint a fresh rclone remote against it: `gvr-admin-test`.
# Test 1 — sequential single-lane (baseline)
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin: \
--transfers=4 --checkers=8'
# Test 2 — parallel two-lane (1× gvr-admin, 1× gvr-admin-test)
ssh vm-asia-gcp 'rclone_logspeed copy /test/1gb gvr-admin: \
--transfers=4 &
rclone_logspeed copy /test/1gb gvr-admin-test: \
--transfers=4 ; wait'
# Expected:
# Test 1: ~7-9 MB/s sustained (matches today's measurements)
# Test 2: ~14-18 MB/s aggregate (each lane at ~7-9, sum ~doubles)
#
# If Test 2 doesn't ~double, the bottleneck is NOT per-principal —
# revisit the model (could be network leg, OneDrive backend write
# rate, or tenant RU budget hit).
Cross-references
- Network Trace · gf-cx-singapore vs Florida Mac → xlab.studio (APAC) — empirical traceroute showing 2 ms VM→MS-Singapore-POP path.
- Azure tenant cap investigation (2026-06-04) — full three-layer throttling model.
- APAC transfer window policy (2026-06-04) — JST 23-07 = NYC 10-18 EDT window.
- xlab.studio tenant lives in Japan (NextDNS attribution) — backend geography.
- xlab.studio tenant naming taxonomy (2026-06-02) — the gvr-<service>-<owner> naming pattern this proposes to operationalise as real identities.
- GCP audreylam Singapore relay — production pattern (2026-06-04) — the relay this parallelism would apply to.
Status — post-bake-off
Hypothesis validated + refined the same day. Multi-lane writes to distinct destination drives DO multiply throughput nearly linearly up to the per-app+tenant cap. Move A (multiple Azure AD apps) is unnecessary — collapsed out of the plan. Move B remains relevant (distinct destination drives = distinct destination users) and is the path forward.
Outstanding action items:
- Resize gf-cx-singapore to ≥16 GiB RAM (n2-standard-4 or similar) to unblock 7-8 lane parallelism. Confirms the empirical trajectory toward the 111 MiB/s ceiling.
- Decide Immich shard taxonomy: per-owner / per-year / hash-prefix. Mint the matching gvr-immich-*@xlab.studio users ahead of Immich provisioning so the destination drives exist when the first restore/backup fires.
- Document the multi-lane orchestrator, a sibling to gvr-ingest-from-singapore.sh. Takes a --lanes N flag and a manifest of (source, destination-remote) pairs. Logs per-lane throughput via rclone_logspeed.py.
- Update memory: the rotation-collision pathology (feedback_gvr_destination_remote_token_rotation_2026-06-05) becomes higher-stakes when N lanes are in flight; pre-flight destination smoke-test now needs to run per-lane, not per-source.
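A minimal sketch of what the orchestrator's core could look like, assuming one plain rclone copy process per manifest entry and at most N running at once. The function names and the injectable runner are illustrative. The per-lane logging through rclone_logspeed.py is left out because its interface is not described here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_command(source: str, dest_remote: str, transfers: int = 4) -> list:
    """One rclone invocation per lane; the flags mirror the bake-off settings."""
    return ["rclone", "copy", source, dest_remote, f"--transfers={transfers}"]

def run_lanes(manifest, lanes: int, runner=subprocess.run) -> dict:
    """Run one rclone per (source, destination-remote) pair, `lanes` at a time.

    Returns {destination-remote: exit code}. `runner` is injectable so the
    scheduling can be exercised without rclone installed.
    """
    def run_one(pair):
        source, dest_remote = pair
        result = runner(build_command(source, dest_remote))
        return dest_remote, result.returncode

    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return dict(pool.map(run_one, manifest))

# Example call (not executed here):
# run_lanes([("/data/shard1", "gvr-admin:"), ("/data/shard2", "gvr-dan:")], lanes=2)
```

Threads are enough here because each lane spends its time waiting on a child process. Each lane must target a distinct destination drive for the parallelism to pay off, per the bake-off result.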
Generated 2026-06-05 16:50 ET. Builds on the 2026-06-04 Azure tenant cap investigation + the 2026-06-05 traceroute. Empirical premise (single shared identity) decoded directly from rclone token JWTs. Architectural proposal validated against the published Microsoft caps; awaits empirical confirmation via the test plan.