Audrey’s 4TB photo library — migrate from Google/iCloud-dupe-mess to R2 (parked 2026-05-24)
DARE.CO.UK · PARKED SKETCH · 2026-05-26
Mirrored from ~/.claude/.../memory/parked_sketch_audrey_4tb_photo_library_to_r2_2026-05-24.md. This is a design sketch parked for future build — read for context, not as a current deliverable.
Audrey’s photo library is a duplicates-of-duplicates mess scattered across Google Photos + iCloud + multiple Drive locations. Three architectural options sketched (R2-native + Worker · Docker Immich + cloudflared · flat files + rclone sync + read-only viewer). Recommendation: Option C (flat files + rclone) — matches the portfolio’s “flat files in git, hosting swappable” promise, graduates to B when face-search / on-this-day become missed features. First reversible move: create R2 bucket + one rclone sync. ~$5-10/mo for moderate library, no hardware required for C.
Dan 2026-05-24: “Audrey has duplicate problem, copies-of-copies-of-copies in multiple google locations, plus icloud, it’s a troublesome mess. so I’m keen to go into sketch-mode.”
The problem
| Where photos live today | Status |
|---|---|
| Google Drive (multiple folders) | Duplicates-of-duplicates, no canonical version |
| Google Photos | Auto-uploaded copies overlapping with Drive |
| iCloud | Yet another copy plane |
| Local Mac | Where actual edits + imports happen |
The duplicate sprawl is the real pain — same photo lives in 3-5 places, no single source of truth, no clear “what’s been backed up” status, no programmatic dedupe path.
Three architectural options (from Claude-on-desktop transcript)
Option A — R2-native, no always-on hardware
- R2 bucket holds originals
- Tiny Cloudflare Worker provides API (upload, list, signed-URL fetch, thumbnail variants via Cloudflare Images)
- Pages app at `photos.gf.cx` is the UI
- CF Access policy on hostname keyed to Dan’s email; Worker validates JWT
- Bucket stays private; Worker is only reader
Trade-offs:
- ✓ Zero hardware, zero always-on
- ✓ ~$0.015/GB/month, zero egress · 500GB ≈ $7.50/mo
- ✗ No Immich-like app (no face recognition, no on-this-day, no search-by-content)
- ✗ Fine as working archive, weak as daily-driver Google Photos replacement
Option B — Docker box at home running Immich, behind cloudflared
- Mac mini / Pi 5 / corner of dev box
- Immich (dominant self-hosted Google Photos clone in 2026): face detection, iOS/Android apps with auto-upload, smart search
- Docker compose, very actively developed
- cloudflared exposes `immich.gf.cx`; CF Access in front
- Immich machine binds to localhost only; the only path in is the outbound tunnel
Trade-offs:
- ✓ Full Google-Photos-equivalent UX
- ✓ Mobile auto-upload, face search, smart albums
- ✗ Machine has to be on (electricity + maintenance)
- ✗ Immich keeps its own database mapping files → metadata; on-disk layout isn’t quite “just files in dated folders” (exportable, not transparent)
Option C — Flat files locally, rclone to R2, tiny read-only Pages viewer ⭐ RECOMMENDED
- Photos live in `~/Photos/2026/05/...` on Mac: just HEIC/JPG in dated folders
- `rclone sync ~/Photos r2:photos-gf-cx` runs hourly via launchd or Hazel rule
- Pages site at `photos.gf.cx` reads R2 (via signed-URL Worker), behind CF Access, browse-anywhere
- Claude Code does filesystem operations (`mv`, `cp`, `find`) directly because files are right there
Trade-offs:
- ✓ Matches the portfolio’s “flat files in git, hosting swappable” promise (architectural consistency)
- ✓ Mac is write surface (already is: that’s where import + edit happen)
- ✓ R2 is the durable copy
- ✓ Viewer is near-static read of what’s in R2
- ✓ Vendor-independent (rclone speaks B2, S3, Azure, local disk)
- ✓ Claude Code is best at filesystem ops
- ✓ ~$5-10/mo for moderate library, no hardware
- ✗ No face search, no on-this-day (graduate to B if those become missed features)
- ✗ Read-only from mobile (graduate to A when upload-from-mobile matters)
Recommendation — REVISED 2026-05-24 (after second pass on classification)
Go with B (Immich) directly. The previous “start with C, graduate later” recommendation is superseded.
The flip is driven by the classification problem that wasn’t fully accounted for in pass 1:
- Audrey’s 4TB library mixes client work, personal life, baby photos (10,000+ for first two years)
- Without face clustering + semantic search, the library is “vendor-portable” but unfindable
- The directory tree is movable; the index that makes it usable is not
- Once index volume crosses ~tens of thousands of photos, the flat-file promise quietly breaks
The compromise that preserves the portfolio’s “files-not-platform” promise:
| Layer | What |
|---|---|
| Working | Immich on a Mac mini at home (Docker) — face clusters, CLIP search, mobile auto-upload, dated/EXIF-correct flat files underneath |
| Durable mirror | rclone sync of Immich’s storage folder → R2 (tier 3) — same files Immich operates on, mirrored for vendor independence |
| Immutable origin | Takeout dump kept frozen in R2 Archive tier — the T=0 ground truth, never deleted |
If Immich disappeared tomorrow: you lose face-cluster + CLIP index, but the underlying organised flat files survive. Same trade as everything else in the portfolio — underlying record durable, index on top replaceable.
The Mac mini becomes the photographic equivalent of the dare.co.uk Worker: small, single-purpose, behind a CF Access tunnel, nothing public, files on disk underneath. Same architectural shape as the rest of the gf.cx stack.
Why C alone falls short
| Workflow | C (flat files) | B (Immich) |
|---|---|---|
| “All photos of Audrey + baby with grandma” | ✗ manual scroll | ✓ face cluster intersection |
| “Whiteboards from client meetings” | ✗ rgrep filenames + hope | ✓ CLIP semantic search |
| “Receipts from work travel 2024” | ✗ manual triage | ✓ CLIP “receipts” + date filter |
| “Photos of the broken sprayer for insurance” | ✗ remember when it broke | ✓ CLIP “sprayer” or face/place |
| Mobile auto-upload | ✗ doesn’t exist | ✓ standard Immich feature |
| Vendor independence | ✓ R2-mirrored flat files | ✓ same — rclone mirror of Immich storage |
C is still a fine FIRST step (the immutable origin lands in R2 regardless), but the working layer should be Immich from day one.
Maps directly onto the gf.cx tier diagram
The photo work is tier 3 + tier 4 territory in the payload.gf.cx framing:
| Tier | Role in photos work |
|---|---|
| Tier 1 (git) | Not used — too large for git |
| Tier 2 (payload.gf.cx public R2) | Not used — photos are private |
| Tier 3 (CF-Access-gated R2) | The destination: photos.gf.cx bucket, Worker-fronted, Access policy keyed to Dan’s email. Audrey + Dan have a browse-anywhere copy. No one else does. |
| Tier 4 (off-cloud) | The source of truth — Mac local ~/Photos/... + Time Machine + encrypted external. Stays authoritative even if R2 disappears. |
All 3 options (A / B / C) keep tier 4 intact. They differ in WHAT lives in tier 3:
- A: R2 bucket + minimal Worker API (no app, just access)
- B: Immich app running on Docker box, R2 as Immich’s backing store
- C: flat-file R2 mirror via rclone + read-only viewer
The “first reversible move” is purely a tier-3 setup — create the bucket, sync once, you have tier-3 today.
The common architecture across all three
“the CF Access tunnel pattern is doing the same trick in all three options: it lets the actual service run with zero internal auth, because the network path itself is the auth”
Cloudflare Access on the hostname = the auth. The photos app, the worker, the Immich instance — none of them know who Dan is. Access knows, that’s enough. This is the de-facto homelab shape in 2026.
The “first reversible move” (no-commitment first step)
- Create R2 bucket `photos-gf-cx`
- Generate R2 API token
- `rclone sync ~/Photos r2:photos-gf-cx` of whatever’s currently in the library
- Now there’s a vendor-independent backup TODAY, while you decide whether C is enough or you want Immich on top
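Because `rclone sync` makes the destination match the source (including deleting destination files that are gone locally), a dry run before the first real sync is a cheap safeguard. The following is a minimal sketch, not a finished tool: it assumes an rclone remote named `r2` has already been configured, and `build_sync_command` is an illustrative helper name rather than anything from rclone itself.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_sync_command(source: Path, dest: str, dry_run: bool = True) -> list[str]:
    """Build the rclone argv. With --dry-run, rclone only reports what it would change."""
    cmd = ["rclone", "sync", str(source), dest]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Assumption: remote "r2" and bucket "photos-gf-cx" already exist.
preview = build_sync_command(Path.home() / "Photos", "r2:photos-gf-cx")
print(shlex.join(preview))
```

Passing the returned list to `subprocess.run(...)` would execute it; flipping `dry_run=False` only after reading the dry-run output keeps the first move reversible.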
Cost: essentially $0 for the first sync (R2 PUTs free under 1M/month). Storage cost ramps with size:
- 100 GB ≈ $1.50/mo
- 500 GB ≈ $7.50/mo
- 1 TB ≈ $15/mo
- 4 TB (Audrey’s full library scope) ≈ $60/mo
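The storage figures are a straight multiplication at the $0.015/GB/month rate quoted earlier. This small sketch reproduces them, assuming decimal units (1 TB = 1000 GB) and ignoring any free allowance or per-operation charges. The rate is the one used in this sketch, not a verified price list.

```python
# Rate as quoted in this document; check current pricing before relying on it.
R2_STANDARD_USD_PER_GB_MONTH = 0.015


def monthly_storage_cost(gigabytes: float, rate: float = R2_STANDARD_USD_PER_GB_MONTH) -> float:
    """Flat-rate monthly storage cost in USD, rounded to cents."""
    return round(gigabytes * rate, 2)


for label, gb in [("100 GB", 100), ("500 GB", 500), ("1 TB", 1000), ("4 TB", 4000)]:
    print(f"{label}: ${monthly_storage_cost(gb):.2f}/mo")
```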
Dedupe — the actual pain point (3-pass strategy)
| Pass | What | Auto-resolve? |
|---|---|---|
| 1. Exact byte-hash (SHA256) | Catches album-duplication + trivial copies. Run at ingest. | ✓ safe to auto-dedupe |
| 2. Perceptual hash (pHash / dHash) | Catches edited-vs-original, crops, same shot at different resolutions | ✗ surface for review — never auto-delete |
| 3. EXIF heuristics | Same camera + timestamp ± 2s + same dimensions = almost certainly same shot in different formats | ✗ surface for review |
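The pass 3 rule (same camera, same dimensions, timestamps within 2 seconds) can be sketched as a grouping problem. This is only an illustration: it assumes the metadata has already been extracted into records (for instance with exiftool), and `PhotoMeta` and `exif_candidates` are hypothetical names. It returns candidate pairs for review and deletes nothing.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PhotoMeta:
    path: str
    camera: str
    taken_at: datetime
    width: int
    height: int


def exif_candidates(photos, tolerance=timedelta(seconds=2)):
    """Return (path, path) pairs that share camera + dimensions and sit within `tolerance`."""
    groups = defaultdict(list)
    for photo in photos:
        groups[(photo.camera, photo.width, photo.height)].append(photo)

    pairs = []
    for group in groups.values():
        group.sort(key=lambda p: p.taken_at)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                # The group is sorted, so once one gap is too large all later ones are too.
                if second.taken_at - first.taken_at > tolerance:
                    break
                pairs.append((first.path, second.path))
    return pairs
```

Surfacing pairs rather than auto-resolving matches the "surface for review" column: burst shots legitimately share camera, size, and near-identical timestamps.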
Tools:
- Immich does pass 1 + 2 natively at ingest + ongoing maintenance task
- czkawka (free, Rust, GUI + CLI) — standalone perceptual dedup; best-in-class image dedup including near-dupes
- rclone dedupe — built-in for the rclone-sync path; file-level hash only
- fdupes / jdupes — classic CLI hash dedup
- exiftool — manual surgery when automated tools don’t quite get it right
Pattern: hash-dedupe AT INGEST so we don’t import duplicates in the first place; perceptual-dedupe as a maintenance task once library is loaded; manual review of perceptual matches when in the mood.
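As a rough illustration of the exact-hash pass, the sketch below groups files under a folder by SHA-256 and reports only groups with more than one member. It is not how Immich or rclone implement their checks, and the function names are illustrative. It only reports; deciding which copy to keep is left to the caller.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under `root` whose bytes are identical."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

At 4TB, hashing everything is I/O-bound; grouping by file size first and hashing only size collisions is a common shortcut that this sketch leaves out for brevity.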
Google Takeout — the five known sharp edges
Critical to know BEFORE running migration:
- Metadata isn’t in the photos. Each photo exports with a `*.json` sidecar carrying real date, GPS, description, album memberships, favorites flag. The EXIF inside the image is often stripped or wrong because Google modified it server-side. Tools must reconcile sidecar → image and write metadata back. Naive ingestion loses dates entirely.
- Filenames get truncated. Google chops filenames around 46 chars, and the JSON sidecar may use a different truncation. Pairing `long_filename_2023.jpg` to `long_filename_20.json` is non-trivial. Tools handle this with varying quality.
- Album duplication. A photo in 3 albums → exported 3 times in 3 folders. Without smart ingestion you triple your library. You usually want to preserve album structure but as tags, not duplicates.
- Live Photos split into .HEIC + .MOV (sometimes recombined, sometimes not). Edited versions exported alongside originals. Both inconsistent.
- Downloads are chunked. A 200GB library is ~4 × 50GB zips; a 4TB library is roughly 80. Scripting the assembly is required. Takeout itself can take days to generate for a large account (Audrey’s 4TB → several days minimum).
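To make the truncated-filename problem concrete, here is a minimal pairing sketch built around the `long_filename_2023.jpg` / `long_filename_20.json` example. The matching rule (longest shared prefix, with ties and weak matches rejected) is an assumption for illustration, not how immich-go or google-photos-takeout-helper actually pair files, and `pair_sidecars` is a hypothetical helper.

```python
from __future__ import annotations

import os
from pathlib import PurePath


def pair_sidecars(media, sidecars, min_prefix=8):
    """Map each sidecar name to its single best media match, or None if unclear."""
    pairs = {}
    for sidecar in sidecars:
        stem = PurePath(sidecar).stem  # "long_filename_20.json" -> "long_filename_20"
        scored = sorted(
            ((len(os.path.commonprefix([stem, name])), name) for name in media),
            reverse=True,
        )
        best_len, best_name = scored[0] if scored else (0, None)
        runner_up_len = scored[1][0] if len(scored) > 1 else -1
        # Weak or tied matches are left unresolved for manual review.
        if best_len < min_prefix or best_len == runner_up_len:
            pairs[sidecar] = None
        else:
            pairs[sidecar] = best_name
    return pairs


print(pair_sidecars(
    ["long_filename_2023.jpg", "long_filename_1999.jpg", "beach.jpg"],
    ["long_filename_20.json"],
))
```

Returning `None` for ambiguous cases is deliberate: a wrong pairing writes the wrong date into a photo, which is harder to notice later than a missing one.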
Migration tools — two clean paths
| Tool | When to use |
|---|---|
| immich-go | Community-built CLI for importing Takeout into Immich (a third-party project, not from the core Immich team). Sidecar reconciliation + album mapping + hash dedupe + edited-version pairing. Cleanest path if Immich is the destination. |
| google-photos-takeout-helper (TheLastGimbus / GitHub) | Flat-file path: reconciles JSON back into EXIF, organises by date, dedupes within albums, outputs “real” files. Use if going flat-file route OR as a pre-step before Immich import. |
Both are well-maintained, both have edge-case bugs you’ll discover if the library has anything weird in it.
The 90-day guardrail
Don’t delete from Google for 90 days minimum after migration.
- Once a week, spot-check date ranges (oldest month, newest, a known-volume holiday) and verify counts
- If something’s missing, pull from Google before source goes away
- Google won’t delete on you, just keep charging — the value of being able to query the old source during verification is enormous
- People have stories about discovering missing batches three weeks in
Detecting what’s missing
| Check | How |
|---|---|
| Total counts | Google Storage panel = total media count. Takeout file count should match minus JSON sidecars. Discrepancy = export failure |
| Sample by date | Pick months from 2018 / 2022 / last year, count + compare to Google’s date browser |
| Sample by album | “Italy 2019” had 312 photos in Google? Count yours |
| Exhaustive verification | NOT easily possible — Google’s API isn’t designed for it. 90-day rule is the practical mitigation |
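The count and sample checks in the table can be partly scripted on the local side. The sketch below counts non-JSON files per top-level folder of an extracted export and compares them with numbers read by hand from Google’s UI (such as the 312 for “Italy 2019”). Folder layout and helper names are assumptions; nothing here talks to Google.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def media_counts_by_folder(root: Path) -> Counter:
    """Count non-sidecar files under each top-level folder of `root`."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() != ".json":
            parts = path.relative_to(root).parts
            # Files sitting directly in `root` are grouped under ".".
            counts[parts[0] if len(parts) > 1 else "."] += 1
    return counts


def discrepancies(local: Counter, expected: dict[str, int]) -> dict[str, int]:
    """Return {folder: local - expected} for every folder whose count differs."""
    return {
        name: local.get(name, 0) - want
        for name, want in expected.items()
        if local.get(name, 0) != want
    }
```

A negative value means the local copy is short of what Google reported, which is the case worth chasing while the old source is still available.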
Source of truth — 3 states, not 1
The architectural clarification that matters most:
| State | What | Where |
|---|---|---|
| T=0 Takeout dump (frozen) | Ground truth on the day of export. Never touched. The “if I screwed up the import, I can re-derive everything” backstop. Keep for at least a year, ideally forever. | R2 Archive tier (~$0.0036/GB/mo) + external SSD in a drawer (belt-and-braces) |
| T=0+1 working library (going forward) | Source of truth from here on. Where new photos land, where edits + tags accrue. | Immich instance (in option B); the Mac filesystem (in option C) |
| R2 durable mirror | NOT the truth itself — a copy of working state. Continuous sync. | R2 standard storage; rclone or Immich → S3 backend |
Don’t conflate “I have a copy on R2” with “I have a working archive.” R2 backs up state; the working library is state.
For 4TB Audrey-scale:
- T=0 frozen in R2 Archive: ~$14/mo forever
- Working library + R2 mirror at standard: ~$60/mo
Classification — why Immich wins (the killer features)
| Feature | Why it earns its keep |
|---|---|
| Face clustering | Tag a cluster ONCE (“Wife” / “Client X” / “Baby”) → 8,000 photos instantly searchable. For baby photos this is enormous — first two years = 10,000+ photos, manual tagging impossible |
| CLIP semantic search | “Whiteboards” finds every client meeting whiteboard → instant client-work segregation. “Beach” finds vacations. “Receipts” finds receipts. Free text query, no manual tagging required |
| Folder-based at ingest | Google albums “Client — Acme” / “Personal — 2024” become Immich albums automatically via immich-go |
| Geographic | Photos at client addresses during business hours = client. Photos at home = personal. Crude but useful for first-pass triage |
Realistic workflow: face-cluster first (one evening → ~70% of human-photo classification). Album-preserve at ingest (another ~15%). Manual tagging for the rest, ambient as you encounter photos during normal browsing. Don’t try to do it all at once.
Can it run on RAID inside Docker? (Dan’s question)
Yes — standard pattern. Immich + Docker Compose + RAID is the dominant Immich-at-home shape.
- Immich runs as a Docker Compose stack (web + ML + database + storage)
- The `UPLOAD_LOCATION` env var points at any host path you mount in: a RAID array, NAS over NFS/SMB, or just plain disk
- RAID gives you intra-tier-4 redundancy on the working set: if one drive in the mirror fails, no data loss + no downtime
- Time Machine on the Mac hosting Docker gives a separate snapshot layer
- R2 mirror gives off-site copy
Typical home setup:
- Mac mini (or Synology NAS, or Linux box) with 2× large drives in RAID 1 mirror, mounted as /srv/photos
- Docker Compose stack with Immich pointing UPLOAD_LOCATION=/srv/photos
- cloudflared tunneling immich.gf.cx → localhost:2283 (Immich’s port)
- CF Access policy on the hostname
If the Mac mini host dies: the RAID drives can be physically moved to a replacement host, Docker stack comes back up reading from same paths, no data loss. If a single drive fails: hot-swap, RAID rebuilds.
Dan’s layer model (2026-05-24, mid-sketch refinement)
“Layer 0 could involve a lot of batching and building independent library that can live on photos.gf.cx, and we serve behind a CF Access”
| Layer | What | Where | Auth |
|---|---|---|---|
| Layer 0 — infrastructure | RAID array + Docker host + cloudflared tunnel | Mac mini @ home | physical access |
| Layer 1 — application + library | Immich app + working library (the indexed substrate, where face clusters / CLIP / tags / albums live) | Container reaching the RAID + photos.gf.cx hostname served via tunnel | CF Access keyed to Dan + Audrey |
| Layer 2 — durable mirror | rclone sync to R2 standard bucket | R2 region us-east | bucket private; Worker-fronted if browseable |
| Layer 3 — immutable origin | T=0 Takeout dump frozen | R2 Archive tier | private bucket, manual retrieval |
The CF Access boundary at Layer 1 means photos.gf.cx is the only public hostname: the Docker host’s internal network, the RAID mounts, the Postgres metadata DB are all unreachable except via the tunnel. Same architectural promise as claim.gf.cx (signed-content / PII-bearing surfaces) and ask-opus.gf.cx.
It IS a massive undertaking — Dan’s framing is correct. The smallest first useful step that doesn’t commit to the whole stack:
- Today / this week: Trigger Google Takeout export (takes Google days to generate). No infrastructure decisions yet.
- Within the 90-day window: Set up Mac mini + RAID + Docker + Immich locally (one weekend). Ingest just one year of photos via `immich-go` as a feel-test.
- After feel-test passes: Bind `photos.gf.cx` via the same recipe we used for `amazon-evidence.gf.cx` + CF Access policy.
- After photos.gf.cx is live + happy: Backfill remaining ~3.9 TB. Set up rclone → R2 sync (Layer 2).
- After Layer 2 is verified: Begin deletion from Google (one folder at a time, never the whole library at once).
Each step is reversible. Each step delivers a working capability. The full migration spans probably 2-3 months of evenings + weekends, not a single sprint.
Multi-axis addressability — sub-subdomains as smart views (Dan 2026-05-24)
“This is where audrey.photos.gf.cx — and dan.photos.gf.cx comes into play, as you can say, work.photos.gf.cx/2010, or school.photos.gf.cx/1995, which are tailored and smart”
The photo library isn’t a single surface — it’s a substrate with N addressable views, each pre-filtered to a persona, category, or era:
| URL shape | Filter applied at edge | Use |
|---|---|---|
| `photos.gf.cx/` | unfiltered (default Immich view) | full browse |
| `audrey.photos.gf.cx/` | face-cluster: Audrey | “everything of Audrey” |
| `dan.photos.gf.cx/` | face-cluster: Dan | “everything of Dan” |
| `baby.photos.gf.cx/` | face-cluster: baby | the 10K-photo cohort |
| `work.photos.gf.cx/2010` | album-tag: work + year: 2010 | client-work photos from a specific year |
| `school.photos.gf.cx/1995` | album-tag: school + year: 1995 | era-specific recall |
| `wedding.photos.gf.cx/` | album: wedding | guest-share-friendly |
| `claim-evidence.photos.gf.cx/` | tag: insurance-claim | tier-3-gated subset for claims work |
Each URL is a shareable, bookmarkable, tailored view of the same substrate. Audrey doesn’t navigate to “the photos site, click filters” — she goes to audrey.photos.gf.cx and is already there. Dan texts a tenant “check bathroom.photos.gf.cx” for a specific damage view.
This tips the ACM ($10/mo) decision
Earlier in the session (feedback_cf_pages_subdomain_setup_recipe.md) we noted: ACM becomes worth it when you have 8+ sub-subdomain candidates. The sketch above lists 8 immediately and implies many more (bathroom, kitchen, 2008, vacation, food, etc.). Photos library is the use case that justifies enabling ACM for the gf.cx zone.
Two implementation paths
Path A — one Immich, filter at the Worker edge (RECOMMENDED):
- One Immich instance running on the Mac mini + RAID + Docker
- All photos in one library with face clusters + albums + tags
- Each `<persona>.photos.gf.cx` is a tiny Cloudflare Worker that hits Immich’s REST API with a pre-baked filter (face cluster ID / album ID / tag)
- Worker URL pattern: `<persona>.photos.gf.cx/*` → Worker translates request → fetches from Immich behind cloudflared tunnel → returns pre-filtered HTML/JSON
- Cheap to spin up new views: deploy another Worker route in ~5 min
- Cross-references work naturally: a photo of Audrey + Dan together shows on both `audrey.` AND `dan.` subdomains
Path B — multiple Immich instances:
- One Immich per persona/category
- Heavier (multiple Postgres instances, multiple ML model copies, multiple Docker stacks)
- Watertight isolation if persona-level privacy ever mattered
- Probably overkill for a married couple sharing a library; needed only if guest-share without leaking adjacent content becomes important
Recommendation: Path A. Same architectural promise as the existing tier-3 surfaces — one substrate, many views, Workers filter at the edge. Single source of truth, many smart entry points.
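The core of Path A is a lookup from hostname (plus an optional year in the path) to a pre-baked filter. A real Worker would be written in JavaScript or TypeScript and would then call Immich; the Python sketch below only illustrates the mapping step. The `VIEWS` table, the filter dictionary shape, and `resolve_view` are all hypothetical, and no Immich API call is shown.

```python
from __future__ import annotations

from typing import Optional

BASE_DOMAIN = "photos.gf.cx"

# Hypothetical view table; real entries would carry Immich person/album/tag IDs.
VIEWS = {
    "audrey": {"kind": "face", "value": "Audrey"},
    "dan": {"kind": "face", "value": "Dan"},
    "baby": {"kind": "face", "value": "baby"},
    "work": {"kind": "album-tag", "value": "work"},
    "school": {"kind": "album-tag", "value": "school"},
    "wedding": {"kind": "album", "value": "wedding"},
}


def resolve_view(host: str, path: str = "/") -> Optional[dict]:
    """Translate a request's host and path into a filter, or None for unknown hosts."""
    host = host.lower().rstrip(".")
    if host == BASE_DOMAIN:
        return {"kind": "unfiltered"}
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    view = VIEWS.get(host[: -len(suffix)])
    if view is None:
        return None
    result = dict(view)  # copy so callers cannot mutate the table
    first_segment = path.strip("/").split("/")[0]
    if len(first_segment) == 4 and first_segment.isascii() and first_segment.isdigit():
        result["year"] = int(first_segment)
    return result


print(resolve_view("work.photos.gf.cx", "/2010"))
```

Keeping the table as data is what makes a new view cheap: adding a persona is one more entry plus a route, with no change to the lookup logic.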
Public OR private per hostname — same architecture, different policy
Dan 2026-05-24: “I can managed these sets, at it will always load super-fast, edge-cache, protect behind CF access, or publically available for some images.”
The architecture supports both CF-Access-gated (private) AND fully-public hostnames, identically, with one config flag:
| Hostname | Filter | Access policy | Use |
|---|---|---|---|
| `audrey.photos.gf.cx` | face: Audrey | CF Access (Dan + Audrey emails) | private daily view |
| `dan.photos.gf.cx` | face: Dan | CF Access (Dan + Audrey emails) | private daily view |
| `baby.photos.gf.cx` | face: baby | CF Access (Dan + Audrey + immediate family) | semi-private |
| `wedding.photos.gf.cx` | album: wedding | CF Access (guest list email allowlist) | broader semi-private |
| `claim-evidence.photos.gf.cx` | tag: insurance | CF Access (Dan + temp adjuster grant) | tightly-scoped private |
| `portfolio.photos.gf.cx` | tag: portfolio-public | NONE (fully public) | audreyinc photography portfolio |
| `press.photos.gf.cx` | album: press-kit | NONE (fully public) | brand press kit for Audrey’s work |
| `landscape-art.photos.gf.cx` | tag: showcase-landscape | NONE (fully public) | exhibition showcase |
Adding a public-facing photo surface is the same 5-minute deploy as adding a private one: just skip the CF Access policy add. The Worker reads the same R2 mirror; the filter logic is the same shape.
This means the architecture supports BOTH halves of a creative person’s photo life:
- Private: family, baby, household inventory, claim evidence
- Public: portfolio work, press, brand assets, exhibition showcases
…from ONE library, ONE source of truth, ONE substrate. The public/private cut is a deploy-time decision per hostname, not a storage-time decision per file.
CF Access policies can be per-subdomain (private side detail)
The ACM cert covers the whole second-level zone, but each sub-subdomain still gets its own CF Access policy:
- audrey.photos.gf.cx — gates to Audrey’s email + Dan’s
- dan.photos.gf.cx — gates to Dan’s email + Audrey’s
- baby.photos.gf.cx — gates to Dan + Audrey + immediate-family allowlist
- wedding.photos.gf.cx — gates to a broader guest list (or maybe a service token + share-link)
- claim-evidence.photos.gf.cx — gates to Dan + the insurance adjuster’s email (temp)
The CF Access boundary becomes the share-control plane, sub-subdomain by sub-subdomain.
Adds to the layer model
Updated Layer 1 row in the table above:
| Layer | What |
|---|---|
| Layer 1 — application + library | Immich (single instance) on Mac mini + RAID + Docker; many small Worker filters per sub-subdomain at *.photos.gf.cx (Path A); CF Access policy per hostname; ACM enabled for the gf.cx zone to cover the sub-subdomain SSL |
The compromise architecture (consolidated)
Daily reality: phone auto-uploads to Immich · iOS/Android apps for browsing · Immich CLI / API for bulk ops · Claude Code SSHs in via Access-for-SSH to do filesystem work directly · rclone keeps R2 in sync.
Cost at 10TB — honest scaling math (Dan 2026-05-24)
“On a more serious note, will a 10TB R2 cost $84 per month…”
No — the $74-84/mo numbers above were the 4TB scenario. R2 scales linearly:
| Tier | $/TB/mo | 10TB total |
|---|---|---|
| Standard (hot, working) | $15.00 | $150/mo |
| Infrequent Access (warm, weekly) | $10.00 | $100/mo |
| Archive (cold, rarely-touched) | $3.60 | $36/mo |
Realistic 10TB with tier optimization (the way Google Photos auto-archives transparently):
- Recent 2 years working in Standard (~2TB): $30/mo
- Years 3-7 in IA (~3TB): $30/mo
- Years 8+ + frozen Takeout in Archive (~5TB): $18/mo
- Total: ~$78/mo at 10TB with proper lifecycle policies
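The tiered total is a weighted sum over the per-TB rates in the table above. The sketch below reproduces it so that other splits can be tried. The rates are copied from this document (not verified against current pricing), and the tier keys and function name are illustrative.

```python
# USD per TB-month, as listed in the tier table of this sketch.
RATES = {"standard": 15.00, "infrequent": 10.00, "archive": 3.60}


def tiered_monthly_cost(tb_by_tier):
    """Sum rate * terabytes across tiers, rounded to cents."""
    return round(sum(RATES[tier] * tb for tier, tb in tb_by_tier.items()), 2)


plan = {"standard": 2, "infrequent": 3, "archive": 5}
print(tiered_monthly_cost(plan))              # 30 + 30 + 18 = 78.0
print(tiered_monthly_cost({"standard": 10}))  # all-hot baseline: 150.0
```

The gap between the two results (roughly half) is what the lifecycle policies buy; it depends entirely on how much of the library can really sit in the colder tiers.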
Comparison at 10TB (so trade-offs are visible)
| Provider | 10TB/mo | Trade-off |
|---|---|---|
| Google One 2TB × 5 | ~$50/mo | Capped at 2TB each, Google ToS lock-in (the problem we’re escaping) |
| iCloud+ 12TB | $59.99/mo | Apple-ecosystem only, no S3 API |
| Dropbox Business 9TB | $24/mo | Cheapest pure storage, no developer API |
| Backblaze B2 | $50/mo | Cheap storage, but cross-network egress to CF stack |
| R2 with tiering | ~$78/mo | $0 egress + same-network as Pages/Workers/Access |
R2 wins on integration, not on raw storage price. Since photos.gf.cx will run through CF Workers + Access anyway, R2 is the same-network tier: no bandwidth fees when serving photos to browsers. Backblaze B2 would add cross-network egress costs that erode the storage savings at any meaningful viewing volume.
The R2 question isn’t “is it cheapest” — it’s “does it save me from the next vendor-lock problem.” And yes: S3-compatible API + $0 egress + no ToS-driven feature changes = files-portable-forever.
Daily-UX replacement — Immich as the new “catch-all” (Dan 2026-05-24)
“I assume immich app becomes our new google photos, we are so used to using Google as the catch-all”
The habit change is the slowest part of the migration, not the technical work. What changes for Dan + Audrey day-to-day:
What stays the same (Immich actually delivers Google-Photos-grade UX)
| Daily habit | Google Photos | Immich (after setup) |
|---|---|---|
| Phone snap → auto-uploaded | ✓ | ✓ (iOS + Android apps, Wi-Fi auto-backup) |
| Search “beach photos” | ✓ | ✓ (CLIP semantic search) |
| Search “Audrey + baby” | ✓ | ✓ (face cluster intersection) |
| Albums | ✓ | ✓ (full albums + smart albums) |
| Share an album with grandma | ✓ | ✓ (shared link or shared user) |
| “On this day” / memories | ✓ | ✓ (Immich added this in 2024) |
| Map view of where photos were taken | ✓ | ✓ |
| Browse on iPad / web / phone | ✓ | ✓ (responsive web + native apps) |
What’s different (mostly better, one notably worse)
| Aspect | Google Photos | Immich |
|---|---|---|
| Where photos go after upload | Google’s cloud, you-don’t-know-where | YOUR Mac mini’s RAID, then R2 mirror |
| Privacy of face data | Google trains on it | Stays on YOUR ML container, never leaves the home network |
| Pricing model | Tiered, gets expensive past 2TB | Linear R2 storage cost, $0 egress |
| Multi-axis URLs | “Click filters, scroll, find” | audrey.photos.gf.cx, baby.photos.gf.cx — URL IS the filter |
| Algorithmic “look at this!” surfacing | Aggressive — Google decides what’s important | Gentle — Immich shows memories but doesn’t push them |
| Mobile upload speed | Fast (Google’s CDN) | Depends on home upload bandwidth + mini being on |
| Mini going offline = uploads queue | N/A (always available) | Uploads queue locally on phone until mini reachable ← the one regression |
The transition cost (in weeks of habit-change)
| Phase | Calendar time | Notes |
|---|---|---|
| Install Immich apps on both phones | 10 min | Side-by-side with Google Photos initially |
| Turn ON Immich auto-backup | 1 click each | Both apps backup simultaneously, double-store |
| Use Immich app daily, leave Google running | 2-4 weeks | Build trust + verify nothing’s missing |
| Turn OFF Google Photos auto-backup | 1 click each | The flip moment |
| Keep Google Photos APP installed read-only | ~3 months | Reference during the 90-day Google retention window |
| Delete Google Photos app + cancel Google One storage tier | once verified | The exit completion |
The “where’s that photo of…” reflex shifts gradually. Probably 4-6 weeks before opening Immich becomes the unconscious default instead of Google Photos.
The “share with grandma” use case
Often the #1 unstated requirement when leaving Google Photos. Three options inside Immich:
- Shared albums (in-Immich) — grandma gets an Immich account, sees specific albums you grant her. Works if grandma’s tech-savvy enough to install another app.
- Public share links — Immich generates a URL with optional password + expiry. Grandma clicks, sees photos in browser, no account needed. Best fit for casual relatives.
- `grandma.photos.gf.cx`, where a Worker filters Immich to a face-cluster (grandma’s pre-selected favorites), behind a wider-allowlist CF Access policy (or a temp-grant token). Same architectural pattern as the persona subdomains.
Most likely production answer: shared public link for casual sharing, persona subdomain for repeat-viewer relatives.
Directional flow — RAID is the central management surface (Dan 2026-05-24)
“And we would be syncing always from a home RAID, that’s the non-cloud, private access, where the management and library central is doing its work.”
Critical architectural rule: the Mac mini + RAID is the SINGLE WRITE SURFACE. Everything else is either an input source (writes THROUGH the mini) or a read mirror (mini writes OUT to it).
Why this directionality matters
| Rule | Consequence |
|---|---|
| RAID is the single source of truth | Conflicts can’t arise (no merging from multiple write surfaces). The “where’s the latest version?” question has one answer. |
| R2 is read-only mirror | If R2 gets corrupted / hacked / deleted, rebuild from RAID — no data loss. The mirror failing is recoverable; the source failing is not. |
| Inputs (phones, Takeout) write THROUGH the mini | All ingestion passes Immich’s dedup + classify + EXIF-reconcile pipeline. No raw bytes land in R2 without going through the library’s brain first. |
| Worker-views read from R2, not from the mini | Mini doesn’t get hammered by serving photo browsing. ML training + ingest runs on mini; viewers run on R2’s edge cache. Separation of concerns. |
| Phones can be offline; mini can be offline | Different failure modes. Phone-offline: photos queue locally on phone. Mini-offline: photos queue on phone for whenever mini comes back. R2-offline: viewers can’t load (rare for CF). Each is recoverable independently. |
The “layered access” framing (NOT “air-gap” — Dan’s terminology correction)
Important precision (Dan 2026-05-24): “air-gap is reserved more for off-site, non-connected, is that fair?” — Yes, that’s correct.
The Mac mini + RAID + cloudflared + CF Access architecture is layered access, not air-gap:
| Term | Meaning | The mini-with-tunnel setup |
|---|---|---|
| Air-gapped | Physically disconnected. Data moves only via removable media. | ✗ The mini IS reachable via the cloudflared tunnel |
| Layered access / conditionally accessible | Connected, but reach is mediated by multiple gates (network policy → tunnel auth → CF Access → app auth) | ✓ This is what we have |
| Self-hosted private | On your own infra, no cloud vendor in path | ✓ Also true (the data lives on YOUR disk) |
| Cloud-vendor-managed | Vendor controls infra + access | ✗ Definitely not |
What’s true:
- The RAID is air-gappable on demand — pull the network cable and it becomes air-gapped instantly, library still fully usable from the Mac mini console
- Day-to-day it operates layered-access: connected for the phone-auto-upload and remote browse use cases, gated by multiple auth layers
- True air-gap belongs to tier 4 (the off-cloud row on payload.gf.cx): family video masters on a USB drive in a drawer, signed tax-return PDFs on encrypted external
This precision matters because “air-gap” implies stronger isolation than the architecture actually provides. Describing it accurately = no false security claims when someone (Audrey, a tenant, an auditor) asks “is this safe?”
Why this still beats the cloud-vendor model
Even though it’s not truly air-gapped, the layered-access shape gives:
- Vendor independence: pull the plug, library still works locally
- Defense in depth: a compromised CF account doesn’t get the data (still gated by app auth + filesystem ACLs)
- Auditable access: every fetch passes the cloudflared tunnel; logs exist
- Reversible: can go to fully-air-gapped mode in 3 seconds (yank the cable) without losing the library
Multi-region option — Mac minis in both NH + London (Dan 2026-05-24)
“Do you think I can run a mac-mini in london, and one in new hope, and have them stay in sync via rsync, so that two identical systems are online”
Yes — but raw rsync is the wrong tool. Three patterns, each with different trade-offs:
DECISION 2026-05-24: Pattern A chosen. Dan: “I like A. Log that to the report.”
Pattern A — Active/Passive · NH canonical, London read replica ⭐ CHOSEN
- NH mini + RAID = only place writes land (preserves single-write-surface rule from Dan’s directional-flow diagram)
- R2 = sync intermediary; NH pushes hourly via rclone
- London mini + RAID = pulls hourly from R2 via rclone, runs Immich reading from local cache
- Phones in London still upload via Immich app → London mini → R2 → NH eventually catches up (~hourly latency)
- Disaster recovery: if NH burns down, manually promote London to canonical (~30 min: reverse the rclone direction so London pushes to R2 instead of pulling from it)
- Cost: ~$5/mo extra R2 GET operations for the doubled-read pattern
Best fit for the architecture we just established.
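The push/pull halves of Pattern A can be sketched as one function that builds the hourly `rclone sync` command for each site. This is a sketch under assumptions: the remote name `r2:photo-library` and the RAID mount path are placeholders, and only the `rclone sync`, `--checksum`, and `--dry-run` pieces are real rclone syntax.

```python
import shlex

# Assumptions (not from the sketch): remote name, bucket, and RAID mount path.
R2_REMOTE = "r2:photo-library"
RAID_PATH = "/Volumes/RAID/library"


def rclone_sync_argv(role: str, dry_run: bool = True) -> list[str]:
    """Build the hourly `rclone sync` command for one site.

    canonical (NH):     RAID -> R2   (push)
    replica (London):   R2   -> RAID (pull)
    """
    if role == "canonical":
        src, dst = RAID_PATH, R2_REMOTE
    elif role == "replica":
        src, dst = R2_REMOTE, RAID_PATH
    else:
        raise ValueError(f"unknown role: {role!r}")
    argv = ["rclone", "sync", src, dst, "--checksum"]
    if dry_run:
        # Default to a dry run so a mis-set role cannot delete anything.
        argv.append("--dry-run")
    return argv


if __name__ == "__main__":
    for role in ("canonical", "replica"):
        print(role, "->", shlex.join(rclone_sync_argv(role)))
```

Promoting London then amounts to swapping the two roles. One caveat: `rclone sync` makes the destination match the source, deletions included, so this models only the read-replica half. How London-originated phone uploads travel back to NH is left open here.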
Pattern B — Active/Active · R2 as shared metadata backend
- Both minis treat R2 as Immich’s S3-compatible storage backend
- Postgres metadata DB replicated between minis via cloudflared tunnel
- Both sites can write directly; conflicts (rare for photos) handled via Immich’s UUID-keyed records
- Heaviest setup: needs PG replication, careful Immich config
- Cost: ~$50-100 extra/mo in R2 GET operations + cross-tunnel bandwidth
True HA, but probably overkill for a family library where conflict-risk is near-zero.
Pattern C — Syncthing peer-to-peer · NOT RECOMMENDED
- Syncthing keeps photo directories in sync between minis
- Each Immich has its OWN Postgres DB → face clusters and tags will diverge
- Need periodic “re-index from sidecar JSON” to converge tag state
- Simplest tooling but breaks the “one library” promise
The metadata divergence is the killer. Two libraries that disagree about who’s in a photo isn’t one library — it’s two libraries that share a file tree.
Decision matrix
| Question | A | B | C |
|---|---|---|---|
| Preserves single-write-surface rule? | ✓ | ✗ (R2 is the surface now) | ✗ (two surfaces) |
| Phones work in both locations? | ✓ | ✓ | ✓ |
| Cross-site write visibility | hourly | near-real-time | as fast as Syncthing propagates |
| Face cluster / tag state consistency | ✓ (single DB) | ✓ (replicated DB) | ✗ (diverges) |
| Setup complexity | low | high | medium |
| Disaster recovery | manual promote (~30min) | automatic | unclear |
| Extra R2 cost/mo at 4TB | ~$5 | ~$50-100 | ~$0 |
A’s failure modes (so you know what you’re signing up for)
| Failure | Impact |
|---|---|
| NH ISP outage | London can still read locally; new writes from London queue at London’s mini until R2 reachable. Eventual consistency within hours. |
| London ISP outage | NH unaffected. London queues for whenever it comes back. |
| NH mini hardware dies | London is a warm replica. Restore is “promote London + buy replacement NH mini + reverse rclone direction.” ~Days, not hours. |
| Both ISPs out simultaneously | Each mini browses its local RAID independently. Phones queue. World hasn’t ended. |
| R2 outage (rare) | Both minis still serve from local RAID. Sync resumes when R2 returns. |
This is genuinely “the best of both worlds” — local-disk autonomy on TWO continents, with the R2 layer doing the eventual-consistency lift between them.
Pilot target — dare.co.uk Google Photos first (Dan 2026-05-24)
“dare.co.uk on google photos has to be the first using takeout.”
dare.co.uk = Daniel Dare Ltd, the UK design brand. Its Google Photos library is the right pilot before Audrey’s 4TB family library because:
| Why dare.co.uk first | Why NOT Audrey’s library first |
|---|---|
| Smaller scope (probably 100-500 GB vs 4TB) | 4TB initial Takeout = days to generate, weeks to ingest |
| Business/portfolio content = naturally public-shareable | Family photos = high “must not lose anything” risk |
| Existing dare.co.uk domain = photos.dare.co.uk natural fit | photos.gf.cx requires more architectural decisions upfront |
| Lower emotional risk = good test case for Takeout knives | Mistakes on family library are unrecoverable in feeling |
| Validates the Worker-filter persona pattern on a real use | Validates nothing until the architecture is already proven |
| Result: public portfolio surface that audreyinc/Dan can share | Result: private family archive (no immediate public value) |
Sequence for the dare.co.uk pilot:
1. Trigger Google Takeout for dare.co.uk Google account (takes hours-to-days)
2. Set up Mac mini + RAID + Docker + Immich (one weekend) — this is the LAYER-0 infrastructure that BOTH dare.co.uk and audreyinc photos will use
3. Ingest dare.co.uk Takeout via immich-go — test the dedup + sidecar reconcile pipeline on real data
4. Stand up photos.dare.co.uk (cdn-router instance) — first deployed photo surface
5. Maybe spin up portfolio.dare.co.uk (public) as the proving ground for the public/private hostname pattern
6. Lessons learned from the pilot inform the audreyinc 4TB migration — knowns are knowns by then
Note: photos.dare.co.uk uses the dare.co.uk zone (not gf.cx), but the architecture is identical — different DNS zone, same cdn-router Worker + R2 bucket + CF Access policy + Immich filter. Pattern-portable across all Dan-owned domains.
Immich install — Docker is the right answer (Dan 2026-05-24)
“I’ll need to spin up the RAID and install docker, I assume that’s the best way to connect it, unless you have a preferred way of running immich?”
Docker Compose is the official + most-supported install path for Immich. Use it.
| Option | Verdict |
|---|---|
| Docker Compose | ✓ Official. The way Immich is built + tested + documented. docker compose pull && docker compose up -d for upgrades. Standard config; huge community support. |
| Unraid | Fine if the host OS is Unraid; uses Docker underneath anyway. Not a fit for Mac mini. |
| TrueNAS Scale | Same as Unraid — Linux-only NAS OS, not relevant for Mac mini. |
| Native macOS install | Possible but not supported; you’d be on your own for bugs. Skip. |
| Kubernetes | Overkill for home + family-library scope. Skip. |
| Proxmox LXC | Linux homelab pattern; not relevant for Mac mini. |
On Mac mini specifically: use OrbStack, not Docker Desktop
OrbStack is the modern preferred Docker runtime on macOS:
| | OrbStack | Docker Desktop |
|---|---|---|
| Native Apple Silicon | ✓ optimized | ✓ but heavier |
| Memory footprint | ~1 GB | 4-8 GB |
| Boot time | ~1 sec | 10-30 sec |
| License | Free for personal | Free for personal, $5/user/mo for business |
| Compatibility | Drop-in for docker CLI + Compose | Same |
| File-share performance | Significantly faster | Slower (especially on RAID-mounted volumes) |
For Audrey’s full 4TB scale, OrbStack’s file-share performance is a real win — Immich indexing scans need to read every photo file, and Docker Desktop’s VirtIO-FS layer adds latency on every read.
RAID enclosure recommendations for Mac mini
Mac mini has no internal bays, so external Thunderbolt RAID is the answer:
| Option | Notes |
|---|---|
| OWC ThunderBay 4 Mini | 4-bay TB4 enclosure, software RAID via SoftRAID Pro. ~$300 + drives. Solid + widely-used. |
| OWC ThunderBlade | SSD-only, very fast but $$ |
| Sabrent 4-bay TB4 | Cheaper alternative; less polished software |
| Synology DS923+ (over network) | NAS instead of direct-attached; slower for Immich indexing scans but offers more services (Synology Photos as a fallback) |
| MacBook + external = NO | Don’t run a home-server on a portable laptop |
For Audrey’s 4TB scale: 2× 8TB drives in RAID 1 mirror in an OWC ThunderBay = ~8TB usable, single-drive-failure tolerant, ~$600 all-in for enclosure + drives. Add Time Machine to a separate external for full belt-and-braces.
Build cost / sequence
| Phase | Effort | Cost |
|---|---|---|
| Inventory current copy locations (Google Drive folders, iCloud, local) | 1-2 hr | $0 |
| Dedupe local pass (czkawka or rclone dedupe) | 4-8 hr CPU + manual review | $0 |
| First rclone sync to R2 (~4 TB if full library) | overnight on gigabit upload (~9 hr); ~89 hr at 100 Mbps up | ~$0 (within free PUT tier) |
| Tiny Pages viewer at photos.gf.cx | 2-3 hr build | $0 |
| CF Access policy on photos.gf.cx | 10 min | $0 |
| (Optional) hourly launchd sync | 30 min | $0 |
| Total to Option C live | ~2-3 days incl. dedupe | ~$60/mo R2 storage |
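The local dedupe pass in the table above comes down to grouping files by content hash. A minimal sketch of that idea, using only the standard library: it is not how czkawka or `rclone dedupe` are implemented, and it only catches byte-identical copies. A re-encoded or metadata-stripped export of the same photo will not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never load fully into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map sha256 -> paths for every group of byte-identical files under root."""
    # Stage 1: bucket by size, since files of different sizes cannot be identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Stage 2: hash only the files that share a size with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[sha256_of(p)].append(p)

    return {h: sorted(ps) for h, ps in by_hash.items() if len(ps) > 1}
```

The function only reports groups; deciding which copy is canonical stays a manual review step, matching the "CPU + manual review" effort in the table.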
Empirical throughput data (2026-05-25 disk-cleanup rehearsal)
While freeing the Mac’s internal SSD by archiving large media folders to the current 4TB external (the same drive that would host RAID for the build phase), measured sustained ditto throughput is ~22-25 MB/s on the 52GB 4K Stogram folder.
That implies for a 4TB local-attach copy (mini ↔ RAID enclosure) at the current drive speed: ~44-50 hours sustained — NOT “overnight.” Three consequences for the build plan:
- Pre-flight before the real Immich migration: verify the 4TB drive is on a USB-C / Thunderbolt port at native speed; current rate suggests USB 2.0 or a slower spinning enclosure. The recommended OWC ThunderBay 4 Mini over TB4 should hit 500-1000+ MB/s and collapse this to a single overnight.
- Cable-swap is cheap when timed right: if the current enclosure has a TB port that’s running over USB-A/2.0 by accident (common with bundled cables), swapping to a TB4 cable BETWEEN copy operations (never during) can 10-50× the rate. Dan flagged this 2026-05-25 — worth a quick port + cable check before the next bulk move.
- Throughput is a real planning variable, not a footnote. When the time comes, run a 100GB synthetic copy to the new enclosure first and measure — the build plan should size all phases against actual hardware, not assumed numbers.
Note: this only bounds local-attach copies. The R2-sync phase is bounded by home upload bandwidth instead (100 Mbps up = ~89 hrs for 4TB; gigabit fiber = ~9 hrs).
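The hour figures in this section follow from simple division. A small sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly sustained rates with no per-file overhead:

```python
def hours_at_mb_per_s(size_tb: float, mb_per_s: float) -> float:
    """Copy time in hours for a disk rate given in megabytes per second."""
    return size_tb * 1e12 / (mb_per_s * 1e6) / 3600


def hours_at_mbps(size_tb: float, mbps: float) -> float:
    """Upload time in hours for a link rate given in megabits per second."""
    return size_tb * 1e12 * 8 / (mbps * 1e6) / 3600


if __name__ == "__main__":
    for rate in (22, 25, 500):
        print(f"4 TB at {rate} MB/s: {hours_at_mb_per_s(4, rate):.1f} h")
    for link in (100, 1000):
        print(f"4 TB at {link} Mbps up: {hours_at_mbps(4, link):.1f} h")
```

At the measured 22-25 MB/s this gives roughly 44 to 50 hours for 4 TB; at 100 Mbps up it gives about 89 hours, and at gigabit about 9. Plugging in the rate from the 100GB synthetic copy gives the real planning number.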
SYNTHESIS — the four-layer storage model (Dan 2026-05-24, late-session compounding)
Four patterns compose into a single architecture that’s substantially cheaper and more honest than the original “mirror everything to R2” first-cut. Each refinement was a Dan-driven correction; together they collapse the monthly bill from ~$60/mo to ~$2-12/mo for the same 4TB library.
| Layer | What it does | What it touches |
|---|---|---|
| L0 — process-at-source (the meta-rule) | Raw inputs never write directly to cloud — they pass through the brain (Immich) which dedupes, classifies, emits sidecar metadata | Everything below it depends on this gate |
| L1 — derivatives-not-masters (size tier) | R2 mirrors web-display derivatives (1500px JPG + 1080p H.264 + thumb + sidecar); RAID keeps RAW/HEIC/HEVC masters | Cuts R2 storage ~8x (4TB→500GB) |
| L2 — curation tier (judgment filter) | Only items tagged publish (or in an album, favorited, etc.) trigger the derivative pipeline; the other 70% live only as RAID masters | Cuts R2 storage another 3-5x (500GB→150GB) |
| L3 — video-codec normalization | Mac mini transcodes HEVC → H.264 baseline + faststart at ingest so derivatives play in every browser; raw HEVC stays untouched on RAID | Makes the derivatives actually usable across Safari/Chrome/Firefox |
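The L3 transcode step can be sketched as a function that builds an ffmpeg command line. This is one plausible set of options, not a tested pipeline: the 1080p target and H.264 baseline + faststart come from the table, while the AAC audio choice, the `yuv420p` pixel format, and the function name are assumptions made here.

```python
def h264_derivative_argv(src: str, dst: str, height: int = 1080) -> list[str]:
    """Build an ffmpeg argv that writes a browser-friendly H.264 derivative.

    The source file is only read; the master on RAID stays untouched.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"scale=-2:{height}",   # fixed height, width kept even and proportional
        "-c:v", "libx264",
        "-profile:v", "baseline",      # widest decoder compatibility
        "-pix_fmt", "yuv420p",         # 8-bit 4:2:0, required by baseline profile
        "-c:a", "aac",                 # assumption: AAC audio for the derivative
        "-movflags", "+faststart",     # move the moov atom up front for streaming
        dst,
    ]
```

For example, `h264_derivative_argv("IMG_0001.MOV", "IMG_0001_1080p.mp4")` returns a list that can be handed to `subprocess.run`. Note that `scale=-2:1080` would also upscale smaller clips, so a real pipeline would probably check the source height first.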
What R2 actually holds (per item)
For a surfaced item:
�STASH84�
For an archived item (curation tier filtered out):
�STASH85�
The view always knows the count: “showing 10 of 300 from camping trip — full archive on RAID.”
Honest-by-construction property
The compound result is an architecture where no layer can lose track of what exists vs what’s a copy:
- R2 always knows it’s the mirror (sidecar marks it as derivative)
- The mirror always knows the way home (sidecar.original.on_raid_at + sha256 + access paths)
- The view always knows what it’s hiding (stub sidecars + footer count)
- Masters never leave the RAID (the tier that fails gracefully)
- Curation never destroys anything (tag toggles, not file deletes)
Revised cost at 4TB
| Tier | Storage | Monthly |
|---|---|---|
| RAID masters (sunk hardware) | 4 TB | $0 |
| R2 derivatives (curation tier ~30%) | ~150 GB | ~$2.25 |
| R2 metadata mirror (DB dumps + sidecars) | ~1 GB | ~$0.02 |
| ACM for sub-subdomain SSL | — | ~$10 |
| Total monthly bill | | ~$12 |
Down from the original ~$60-84/mo estimate. Cheaper than the Google One subscription it replaces.
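The cost table above can be reproduced with a few lines of arithmetic. The per-GB rate of $0.015/month is the one implied by the table's own rows (150 GB at ~$2.25); treat it and the flat $10 ACM line as assumptions to re-check against current pricing before building.

```python
R2_USD_PER_GB_MONTH = 0.015  # assumed storage rate, implied by the table rows
ACM_USD_PER_MONTH = 10.0     # flat ACM line item from the table


def r2_storage_usd(gb: float) -> float:
    """Monthly R2 storage cost for a given number of gigabytes."""
    return gb * R2_USD_PER_GB_MONTH


def monthly_bill(derivatives_gb: float = 150, metadata_gb: float = 1) -> float:
    """Total monthly bill for the four-layer model (RAID hardware is sunk cost)."""
    return r2_storage_usd(derivatives_gb) + r2_storage_usd(metadata_gb) + ACM_USD_PER_MONTH


if __name__ == "__main__":
    print(f"mirror everything (4000 GB): ${r2_storage_usd(4000):.2f}/mo")
    print(f"four-layer model:            ${monthly_bill():.2f}/mo")
```

Storage for the full 4 TB mirror comes out at about $60/month, while the curated-derivatives model lands near $12, most of which is the ACM line rather than storage.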
Critical durability note — back up Immich’s DB
The curation metadata (tags, albums, face IDs, favorites) lives ONLY in Immich’s PostgreSQL DB. If that DB dies, masters survive on RAID but curation is lost. Two defenses to wire in from day one:
- Nightly pg_dump → R2 — small, cheap, restores Immich state
- XMP sidecar export — write tags+ratings into XMP files next to each master so curation survives Immich entirely (Lightroom-standard, vendor-independent)
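The nightly dump defense could look roughly like the sketch below, which only builds the two shell commands and a dated object name. Several things here are assumptions: the container name `immich_postgres` and the `postgres` user (defaults in the stock Immich compose file, worth verifying locally), the hypothetical `r2:photo-library-backups` remote, and the use of `pg_dumpall` to capture the whole cluster where the bullet above says `pg_dump`.

```python
from datetime import date

DB_CONTAINER = "immich_postgres"  # assumption: default Immich compose container name
DB_USER = "postgres"              # assumption: default DB user
R2_BACKUP_PREFIX = "r2:photo-library-backups/immich-db"  # hypothetical remote + path


def dump_object_name(day: date) -> str:
    """Dated file name so each night's dump is kept side by side."""
    return f"immich-db-{day.isoformat()}.sql.gz"


def backup_commands(day: date, workdir: str = "/tmp") -> list[str]:
    """Return the shell commands for one nightly backup, in run order."""
    name = dump_object_name(day)
    local = f"{workdir}/{name}"
    return [
        # 1. Dump every database in the container and compress it locally.
        f"docker exec {DB_CONTAINER} pg_dumpall --clean --if-exists "
        f"--username={DB_USER} | gzip > {local}",
        # 2. Copy that single file to R2 under its dated name.
        f"rclone copyto {local} {R2_BACKUP_PREFIX}/{name}",
    ]
```

A launchd job could run these two commands in order each night. Restoring from one of these dumps should be rehearsed once before it is relied on.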
Cross-references
- feedback_process_at_source_sync_clean_outputs.md — L0 meta-rule (the brain pattern)
- feedback_derivatives_in_r2_originals_on_raid.md — L1 size-tier (negatives-stay-home)
- feedback_curation_tier_album_as_worker_query.md — L2 curation-tier (albums as Worker queries)
- feedback_video_in_r2_photo_library_substrate.md — L3 video codec normalization
- feedback_image_evidence_via_r2_pattern.md — sibling pattern (R2 + cdn-router for image substrate); same Worker shape
- parked_sketch_sandbox_gfcx_subdomain_template.md — scaffold could grow a --photo-library mode
- feedback_cf_pages_subdomain_setup_recipe.md — photos.gf.cx subdomain setup recipe
- parked_sketch_audrey_eras_few_clicks_deep.md — sibling audreyinc archive work
- feedback_api_burn_rate_throttle_chunks.md — dedupe + initial-sync should be chunked per this rule
- feedback_pattern_library_composes_into_exits.md — meta-observation that this whole sketch is composition of named patterns, not invention
Build trigger
Build when ANY:
- Audrey’s frustration with the duplicate problem hits a “fix this now” threshold
- A specific photo is needed urgently + can’t be found across the 5 copy planes
- Google or iCloud quota costs spike from the duplicates
- Dan + Audrey have a clean weekend to do the inventory + dedupe pass together
The aphorism
Three plans, one auth layer. The network path is the policy. Pick the storage shape that matches how the library actually gets used today, graduate when the missing feature becomes the bottleneck.