Audrey’s 4TB photo library — migrate from Google/iCloud-dupe-mess to R2 (parked 2026-05-24)

DARE.CO.UK · PARKED SKETCH · 2026-07-15

Mirrored from ~/.claude/.../memory/parked_sketch_audrey_4tb_photo_library_to_r2_2026-05-24.md. This is a design sketch parked for future build — read for context, not as a current deliverable.

Audrey’s photo library is a duplicates-of-duplicates mess scattered across Google Photos + iCloud + multiple Drive locations. Three architectural options sketched (R2-native + Worker · Docker Immich + cloudflared · flat files + rclone sync + read-only viewer). Recommendation: Option C (flat files + rclone) — matches the portfolio’s “flat files in git, hosting swappable” promise, graduates to B when face-search / on-this-day become missed features. First reversible move: create R2 bucket + one rclone sync. ~$5-10/mo for moderate library, no hardware required for C.

Dan 2026-05-24: “Audrey has duplicate problem, copies-of-copies-of-copies in multiple google locations, plus icloud, it’s a troublesome mess. so I’m keen to go into sketch-mode.”

The problem

Where photos live today	Status
Google Drive (multiple folders)	Duplicates-of-duplicates, no canonical version
Google Photos	Auto-uploaded copies overlapping with Drive
iCloud	Yet another copy plane
Local Mac	Where actual edits + imports happen

The duplicate sprawl is the real pain — same photo lives in 3-5 places, no single source of truth, no clear “what’s been backed up” status, no programmatic dedupe path.

Three architectural options (from Claude-on-desktop transcript)

Option A — R2-native, no always-on hardware

R2 bucket holds originals
Tiny Cloudflare Worker provides API (upload, list, signed-URL fetch, thumbnail variants via Cloudflare Images)
Pages app at photos.gf.cx is the UI
CF Access policy on hostname keyed to Dan’s email; Worker validates JWT
Bucket stays private; Worker is only reader

Trade-offs:
- ✓ Zero hardware, zero always-on
- ✓ ~$0.015/GB/month, zero egress · 500GB ≈ $7.50/mo
- ✗ No Immich-like app (no face recognition, no on-this-day, no search-by-content)
- ✗ Fine as working archive, weak as daily-driver Google Photos replacement

Option B — Docker box at home running Immich, behind cloudflared

Mac mini / Pi 5 / corner of dev box
Immich (dominant self-hosted Google Photos clone in 2026): face detection, iOS/Android apps with auto-upload, smart search
Docker compose, very actively developed
cloudflared exposes immich.gf.cx; CF Access in front
Immich machine binds to localhost only — only path in is outbound tunnel

Trade-offs:
- ✓ Full Google-Photos-equivalent UX
- ✓ Mobile auto-upload, face search, smart albums
- ✗ Machine has to be on (electricity + maintenance)
- ✗ Immich keeps its own database mapping files → metadata; on-disk layout isn’t quite “just files in dated folders” (exportable, not transparent)

Option C — Flat files locally, rclone to R2, tiny read-only Pages viewer ⭐ RECOMMENDED

Photos live in ~/Photos/2026/05/... on Mac — just HEIC/JPG in dated folders
rclone sync ~/Photos r2:photos-gf-cx runs hourly via launchd or Hazel rule
Pages site at photos.gf.cx reads R2 (via signed-URL Worker), behind CF Access, browse-anywhere
Claude Code does filesystem operations (mv, cp, find) directly because files are right there

Trade-offs:
- ✓ Matches the portfolio’s “flat files in git, hosting swappable” promise (architectural consistency)
- ✓ Mac is write surface (already is — that’s where import + edit happen)
- ✓ R2 is the durable copy
- ✓ Viewer is near-static read of what’s in R2
- ✓ Vendor-independent (rclone speaks B2, S3, Azure, local disk)
- ✓ Claude Code is best at filesystem ops
- ✓ ~$5-10/mo for moderate library, no hardware
- ✗ No face search, no on-this-day (graduate to B if those become missed features)
- ✗ Read-only from mobile (graduate to A when upload-from-mobile matters)

Recommendation — REVISED 2026-05-24 (after second pass on classification)

Go with B (Immich) directly. The previous “start with C, graduate later” recommendation is superseded.

The flip is driven by the classification problem that wasn’t fully accounted for in pass 1:

Audrey’s 4TB library mixes client work, personal life, baby photos (10,000+ for first two years)
Without face clustering + semantic search, the library is “vendor-portable” but unfindable
The directory tree is movable; the index that makes it usable is not
Once index volume crosses ~tens of thousands of photos, the flat-file promise quietly breaks

The compromise that preserves the portfolio’s “files-not-platform” promise:

Layer	What
Working	Immich on a Mac mini at home (Docker) — face clusters, CLIP search, mobile auto-upload, dated/EXIF-correct flat files underneath
Durable mirror	rclone sync of Immich’s storage folder → R2 (tier 3) — same files Immich operates on, mirrored for vendor independence
Immutable origin	Takeout dump kept frozen in R2 Archive tier — the T=0 ground truth, never deleted

If Immich disappeared tomorrow: you lose face-cluster + CLIP index, but the underlying organised flat files survive. Same trade as everything else in the portfolio — underlying record durable, index on top replaceable.

The Mac mini becomes the photographic equivalent of the dare.co.uk Worker — small, single-purpose, behind a CF Access tunnel, nothing public, files on disk underneath. Same architectural shape as the rest of the gf.cx stack.

Why C alone falls short

Workflow	C (flat files)	B (Immich)
“All photos of Audrey + baby with grandma”	✗ manual scroll	✓ face cluster intersection
“Whiteboards from client meetings”	✗ rgrep filenames + hope	✓ CLIP semantic search
“Receipts from work travel 2024”	✗ manual triage	✓ CLIP “receipts” + date filter
“Photos of the broken sprayer for insurance”	✗ remember when it broke	✓ CLIP “sprayer” or face/place
Mobile auto-upload	✗ doesn’t exist	✓ standard Immich feature
Vendor independence	✓ R2-mirrored flat files	✓ same — rclone mirror of Immich storage

C is still a fine FIRST step (the immutable origin lands in R2 regardless), but the working layer should be Immich from day one.

Maps directly onto the gf.cx tier diagram

The photo work is tier 3 + tier 4 territory in the payload.gf.cx framing:

Tier	Role in photos work
Tier 1 (git)	Not used — too large for git
Tier 2 (payload.gf.cx public R2)	Not used — photos are private
Tier 3 (CF-Access-gated R2)	The destination — `photos.gf.cx` bucket, Worker-fronted, Access policy keyed to Dan’s email. Audrey + Dan have a browse-anywhere copy. No one else does.
Tier 4 (off-cloud)	The source of truth — Mac local `~/Photos/...` + Time Machine + encrypted external. Stays authoritative even if R2 disappears.

All 3 options (A / B / C) keep tier 4 intact. They differ in WHAT lives in tier 3:
- A — R2 bucket + minimal Worker API (no app, just access)
- B — Immich app running on Docker box, R2 as Immich’s backing store
- C — flat-file R2 mirror via rclone + read-only viewer

The “first reversible move” is purely a tier-3 setup — create the bucket, sync once, you have tier-3 today.

The common architecture across all three

“the CF Access tunnel pattern is doing the same trick in all three options — it lets the actual service run with zero internal auth, because the network path itself is the auth”

Cloudflare Access on the hostname = the auth. The photos app, the worker, the Immich instance — none of them know who Dan is. Access knows, that’s enough. This is the de-facto homelab shape in 2026.

The “first reversible move” (no-commitment first step)

Create R2 bucket photos-gf-cx
Generate R2 API token
rclone sync ~/Photos r2:photos-gf-cx of whatever’s currently in the library
Now there’s a vendor-independent backup TODAY, while you decide whether C is enough or you want Immich on top
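The sync step above can be wrapped so the first run is always a rehearsal. This is a minimal sketch, assuming an rclone remote named `r2` is already configured with the API token from the previous step; the helper names `build_sync_command` and `run_sync` are hypothetical. Note that `rclone sync` makes the destination match the source, which includes deleting remote files that no longer exist locally, so the default here is `--dry-run`.

```python
import shlex
import subprocess
from pathlib import Path


def build_sync_command(source: Path, remote: str, bucket: str, dry_run: bool = True) -> list[str]:
    """Build the argv for a one-way `rclone sync` from a local folder to a bucket."""
    cmd = ["rclone", "sync", str(source), f"{remote}:{bucket}"]
    if dry_run:
        # --dry-run reports what would be copied or deleted without doing it.
        cmd.append("--dry-run")
    return cmd


def run_sync(cmd: list[str]) -> int:
    """Run the command and return rclone's exit code (0 means success)."""
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Usage (not executed here, because it needs rclone installed and configured):
#   run_sync(build_sync_command(Path.home() / "Photos", "r2", "photos-gf-cx"))
#   run_sync(build_sync_command(Path.home() / "Photos", "r2", "photos-gf-cx", dry_run=False))
```

Reading the dry-run output before the real run is what keeps this move reversible: nothing changes in the bucket until the second call.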

Cost: essentially $0 for the first sync (R2 PUTs free under 1M/month). Storage cost ramps with size:
- 100 GB ≈ $1.50/mo
- 500 GB ≈ $7.50/mo
- 1 TB ≈ $15/mo
- 4 TB (Audrey’s full library scope) ≈ $60/mo
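The storage figures are a straight multiplication, sketched here so other library sizes can be plugged in. It assumes the $0.015/GB/month rate quoted in this sketch and decimal units (1 TB = 1000 GB); the function name is hypothetical, and request charges are ignored.

```python
R2_STANDARD_USD_PER_GB_MONTH = 0.015  # rate quoted in this sketch, not fetched from anywhere


def monthly_storage_cost(size_gb: float, usd_per_gb: float = R2_STANDARD_USD_PER_GB_MONTH) -> float:
    """Linear storage cost in USD per month, rounded to cents."""
    return round(size_gb * usd_per_gb, 2)


for label, size_gb in [("100 GB", 100), ("500 GB", 500), ("1 TB", 1000), ("4 TB", 4000)]:
    print(f"{label}: ${monthly_storage_cost(size_gb):.2f}/mo")
```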

Dedupe — the actual pain point (3-pass strategy)

Pass	What	Auto-resolve?
1. Exact byte-hash (SHA256)	Catches album-duplication + trivial copies. Run at ingest.	✓ safe to auto-dedupe
2. Perceptual hash (pHash / dHash)	Catches edited-vs-original, crops, same shot at different resolutions	✗ surface for review — never auto-delete
3. EXIF heuristics	Same camera + timestamp ± 2s + same dimensions = almost certainly same shot in different formats	✗ surface for review
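
Pass 1 in the table is simple enough to sketch with the standard library alone. This is an illustration, not what Immich or rclone do internally; `sha256_of` and `exact_duplicates` are hypothetical names. Files are hashed in chunks so large videos are never loaded into memory whole.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose bytes are identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only hashes seen more than once are duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group holds byte-identical files, so keeping one and dropping the rest loses no pixels. Which copy to keep (for example the one in the best-named folder) is still a policy choice.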

Tools:
- Immich does pass 1 + 2 natively at ingest + ongoing maintenance task
- czkawka (free, Rust, GUI + CLI) — standalone perceptual dedup; best-in-class image dedup including near-dupes
- rclone dedupe — built-in for the rclone-sync path; file-level hash only
- fdupes / jdupes — classic CLI hash dedup
- exiftool — manual surgery when automated tools don’t quite get it right

Pattern: hash-dedupe AT INGEST so we don’t import duplicates in the first place; perceptual-dedupe as a maintenance task once library is loaded; manual review of perceptual matches when in the mood.
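
The pass-3 EXIF heuristic (same camera, same dimensions, timestamps within 2 s) can be sketched as a grouping function. It assumes the metadata has already been extracted into records, for example with exiftool; the `Shot` record and function names are hypothetical. The output is a review list only, matching the "never auto-delete" rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Shot:
    path: str
    camera: str
    taken_at: datetime
    width: int
    height: int


def _matches(a: Shot, b: Shot, tolerance: timedelta) -> bool:
    """True when two shots share camera and dimensions and are close in time."""
    same_device = (a.camera, a.width, a.height) == (b.camera, b.width, b.height)
    return same_device and abs(b.taken_at - a.taken_at) <= tolerance


def likely_same_shot(shots: list[Shot], tolerance: timedelta = timedelta(seconds=2)) -> list[list[Shot]]:
    """Group shots that are probably the same exposure saved in different formats."""
    ordered = sorted(shots, key=lambda s: (s.camera, s.width, s.height, s.taken_at))
    groups: list[list[Shot]] = []
    current: list[Shot] = []
    for shot in ordered:
        # Each shot is compared with the previous one, so a burst can chain
        # into one group even if its first and last frames are more than 2 s apart.
        if current and _matches(current[-1], shot, tolerance):
            current.append(shot)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [shot]
    if len(current) > 1:
        groups.append(current)
    return groups
```

Burst sequences will show up as false positives here, which is one more reason this pass surfaces candidates rather than resolving them.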

Google Takeout — the five known sharp edges

Critical to know BEFORE running migration:

Metadata isn’t in the photos. Each photo exports with a *.json sidecar carrying real date, GPS, description, album memberships, favorites flag. The EXIF inside the image is often stripped or wrong because Google modified it server-side. Tools must reconcile sidecar → image and write metadata back. Naive ingestion loses dates entirely.
Filenames get truncated. Google chops filenames around 46 chars, and the JSON sidecar may use a different truncation. Pairing long_filename_2023.jpg to long_filename_20.json is non-trivial. Tools handle this with varying quality.
Album duplication. A photo in 3 albums → exported 3 times in 3 folders. Without smart ingestion you triple your library. You usually want to preserve album structure but as tags, not duplicates.
Live Photos split into .HEIC + .MOV (sometimes recombined, sometimes not). Edited versions exported alongside originals. Both inconsistent.
Downloads are chunked. A 2TB library is ~40 × 50GB zips. Scripting the assembly is required. Takeout itself can take days to generate for a large account (Audrey’s 4TB → several days minimum).
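
The filename-truncation edge can be illustrated with a conservative pairing heuristic. This sketch assumes the sidecar name, minus `.json`, is a possibly truncated prefix of the media filename, as in the `long_filename_20.json` example above. Real exports contain other naming variants that `immich-go` and `google-photos-takeout-helper` handle and this does not. `pair_sidecars` is a hypothetical name.

```python
def pair_sidecars(media_names: list[str], sidecar_names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (media -> sidecar pairs, sidecars that need manual review)."""
    pairs: dict[str, str] = {}
    review: list[str] = []
    for sidecar in sidecar_names:
        # Strip the .json suffix to get the (possibly truncated) media name prefix.
        stem = sidecar[: -len(".json")] if sidecar.endswith(".json") else sidecar
        candidates = [name for name in media_names if name.startswith(stem)]
        if len(candidates) == 1 and candidates[0] not in pairs:
            pairs[candidates[0]] = sidecar
        else:
            # Zero matches, several matches, or an already-claimed file: do not guess.
            review.append(sidecar)
    return pairs, review


pairs, review = pair_sidecars(
    ["long_filename_2023.jpg", "IMG_0001.jpg"],
    ["long_filename_20.json", "IMG_0001.jpg.json"],
)
print(pairs, review)
```

The design choice is to refuse ambiguous matches: if both `long_filename_2023.jpg` and `long_filename_2024.jpg` exist, the truncated sidecar goes to the review list rather than being attached to the wrong photo and stamping it with the wrong date.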

Migration tools — two clean paths

Tool	When to use
`immich-go`	Community-maintained importer (simulot / GitHub) built for Takeout. Sidecar reconciliation + album mapping + hash dedupe + edited-version pairing. Cleanest path if Immich is the destination.
`google-photos-takeout-helper` (TheLastGimbus / GitHub)	Flat-file path — reconciles JSON back into EXIF, organises by date, dedupes within albums, outputs “real” files. Use if going flat-file route OR as a pre-step before Immich import.

Both are well-maintained, both have edge-case bugs you’ll discover if the library has anything weird in it.

The 90-day guardrail

Don’t delete from Google for 90 days minimum after migration.

Once a week, spot-check date ranges (oldest month, newest, a known-volume holiday) and verify counts
If something’s missing, pull from Google before source goes away
Google won’t delete on you, just keep charging — the value of being able to query the old source during verification is enormous
People have stories about discovering missing batches three weeks in

Detecting what’s missing

Check	How
Total counts	Google Storage panel = total media count. Takeout file count should match minus JSON sidecars. Discrepancy = export failure
Sample by date	Pick months from 2018 / 2022 / last year, count + compare to Google’s date browser
Sample by album	“Italy 2019” had 312 photos in Google? Count yours
Exhaustive verification	NOT easily possible — Google’s API isn’t designed for it. 90-day rule is the practical mitigation
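
The sample-by-date check can be sketched as a count comparison. It assumes the migrated files sit in the `YYYY/MM/...` layout described for Option C and that the expected counts are typed in by hand from Google's date browser. The suffix list and function names are assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumed set of media extensions; JSON sidecars are deliberately excluded.
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".heic", ".png", ".mov", ".mp4"}


def counts_by_month(root: Path) -> Counter:
    """Count media files under root/YYYY/MM/..., keyed by 'YYYY-MM'."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in MEDIA_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) >= 3:  # year folder, month folder, file name
                counts[f"{parts[0]}-{parts[1]}"] += 1
    return counts


def discrepancies(expected: dict[str, int], actual: Counter) -> dict[str, int]:
    """Return month -> (actual - expected) for every sampled month that does not match."""
    return {month: actual[month] - want for month, want in expected.items() if actual[month] != want}
```

A negative number means files are missing locally for that month, which is the signal to pull from Google again while the 90-day window is still open.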

Source of truth — 3 states, not 1

The architectural clarification that matters most:

State	What	Where
T=0 Takeout dump (frozen)	Ground truth on the day of export. Never touched. The “if I screwed up the import, I can re-derive everything” backstop. Keep for at least a year, ideally forever.	R2 Archive tier (~$0.0036/GB/mo) + external SSD in a drawer (belt-and-braces)
T=0+1 working library (going forward)	Source of truth from here on. Where new photos land, where edits + tags accrue.	Immich instance (in option B); the Mac filesystem (in option C)
R2 durable mirror	NOT the truth itself — a copy of working state. Continuous sync.	R2 standard storage; rclone or Immich → S3 backend

Don’t conflate “I have a copy on R2” with “I have a working archive.” R2 backs up state; the working library is state.

For 4TB Audrey-scale:
- T=0 frozen in R2 Archive: ~$14/mo forever
- Working library + R2 mirror at standard: ~$60/mo

Classification — why Immich wins (the killer features)

Feature	Why it earns its keep
Face clustering	Tag a cluster ONCE (“Wife” / “Client X” / “Baby”) → 8,000 photos instantly searchable. For baby photos this is enormous — first two years = 10,000+ photos, manual tagging impossible
CLIP semantic search	“Whiteboards” finds every client meeting whiteboard → instant client-work segregation. “Beach” finds vacations. “Receipts” finds receipts. Free text query, no manual tagging required
Folder-based at ingest	Google albums “Client — Acme” / “Personal — 2024” become Immich albums automatically via `immich-go`
Geographic	Photos at client addresses during business hours = client. Photos at home = personal. Crude but useful for first-pass triage

Realistic workflow: face-cluster first (one evening → ~70% of human-photo classification). Album-preserve at ingest (another ~15%). Manual tagging for the rest, ambient as you encounter photos during normal browsing. Don’t try to do it all at once.

Can it run on RAID inside Docker? (Dan’s question)

Yes — standard pattern. Immich + Docker Compose + RAID is the dominant Immich-at-home shape.

Immich runs as a Docker Compose stack (web + ML + database + storage)
The UPLOAD_LOCATION env var points at any host path you mount in — a RAID array, NAS over NFS/SMB, or just plain disk
RAID gives you intra-tier-4 redundancy on the working set: if one drive in the mirror fails, no data loss + no downtime
Time Machine on the Mac hosting Docker gives a separate snapshot layer
R2 mirror gives off-site copy

Typical home setup:
- Mac mini (or Synology NAS, or Linux box) with 2× large drives in RAID 1 mirror, mounted as /srv/photos
- Docker Compose stack with Immich pointing UPLOAD_LOCATION=/srv/photos
- cloudflared tunneling immich.gf.cx → localhost:2283 (Immich’s port)
- CF Access policy on the hostname

If the Mac mini host dies: the RAID drives can be physically moved to a replacement host, Docker stack comes back up reading from same paths, no data loss. If a single drive fails: hot-swap, RAID rebuilds.

“Layer 0 could involve a lot of batching and building independent library that can live on photos.gf.cx — and we serve behind a CF Access”

Layer	What	Where	Auth
Layer 0 — infrastructure	RAID array + Docker host + cloudflared tunnel	Mac mini @ home	physical access
Layer 1 — application + library	Immich app + working library (the indexed substrate, where face clusters / CLIP / tags / albums live)	Container reaching the RAID + photos.gf.cx hostname served via tunnel	CF Access keyed to Dan + Audrey
Layer 2 — durable mirror	rclone sync to R2 standard bucket	R2 region us-east	bucket private; Worker-fronted if browseable
Layer 3 — immutable origin	T=0 Takeout dump frozen	R2 Archive tier	private bucket, manual retrieval

The CF Access boundary at Layer 1 means photos.gf.cx is the only public hostname — the Docker host’s internal network, the RAID mounts, the Postgres metadata DB are all unreachable except via the tunnel. Same architectural promise as claim.gf.cx (signed-content / PII-bearing surfaces) and ask-opus.gf.cx.

It IS a massive undertaking — Dan’s framing is correct. The smallest first useful step that doesn’t commit to the whole stack:

Today / this week — Trigger Google Takeout export (takes Google days to generate). No infrastructure decisions yet.
Within the 90-day window — Set up Mac mini + RAID + Docker + Immich locally (one weekend). Ingest just one year of photos via immich-go as a feel-test.
After feel-test passes — Bind photos.gf.cx via the same recipe we used for amazon-evidence.gf.cx + CF Access policy.
After photos.gf.cx is live + happy — Backfill remaining ~3.9 TB. Set up rclone → R2 sync (Layer 2).
After Layer 2 is verified — Begin deletion from Google (one folder at a time, never the whole library at once).

Each step is reversible. Each step delivers a working capability. The full migration spans probably 2-3 months of evenings + weekends, not a single sprint.

Multi-axis addressability — sub-subdomains as smart views (Dan 2026-05-24)

“This is where audrey.photos.gf.cx — and dan.photos.gf.cx comes into play, as you can say, work.photos.gf.cx/2010, or school.photos.gf.cx/1995, which are tailored and smart”

The photo library isn’t a single surface — it’s a substrate with N addressable views, each pre-filtered to a persona, category, or era:

URL shape	Filter applied at edge	Use
`photos.gf.cx/`	unfiltered (default Immich view)	full browse
`audrey.photos.gf.cx/`	face-cluster: Audrey	“everything of Audrey”
`dan.photos.gf.cx/`	face-cluster: Dan	“everything of Dan”
`baby.photos.gf.cx/`	face-cluster: baby	the 10K-photo cohort
`work.photos.gf.cx/2010`	album-tag: work + year: 2010	client-work photos from a specific year
`school.photos.gf.cx/1995`	album-tag: school + year: 1995	era-specific recall
`wedding.photos.gf.cx/`	album: wedding	guest-share-friendly
`claim-evidence.photos.gf.cx/`	tag: insurance-claim	tier-3-gated subset for claims work

Each URL is a shareable, bookmarkable, tailored view of the same substrate. Audrey doesn’t navigate to “the photos site, click filters” — she goes to audrey.photos.gf.cx and is already there. Dan texts a tenant “check bathroom.photos.gf.cx” for a specific damage view.

This tips the ACM ($10/mo) decision

Earlier in the session (feedback_cf_pages_subdomain_setup_recipe.md) we noted: ACM becomes worth it when you have 8+ sub-subdomain candidates. The sketch above lists 8 immediately and implies many more (bathroom, kitchen, 2008, vacation, food, etc.). Photos library is the use case that justifies enabling ACM for the gf.cx zone.

Two implementation paths

Path A — one Immich, filter at the Worker edge (RECOMMENDED):

One Immich instance running on the Mac mini + RAID + Docker
All photos in one library with face clusters + albums + tags
Each <persona>.photos.gf.cx is a tiny Cloudflare Worker that hits Immich’s REST API with a pre-baked filter (face cluster ID / album ID / tag)
Worker URL pattern: <persona>.photos.gf.cx/* → Worker translates request → fetches from Immich behind cloudflared tunnel → returns pre-filtered HTML/JSON
Cheap to spin up new views: deploy another Worker route in ~5 min
Cross-references work naturally: a photo of Audrey + Dan together shows on both audrey. AND dan. subdomains

Path B — multiple Immich instances:

One Immich per persona/category
Heavier (multiple Postgres instances, multiple ML model copies, multiple Docker stacks)
Watertight isolation if persona-level privacy ever mattered
Probably overkill for a married couple sharing a library; needed only if guest-share without leaking adjacent content becomes important

Recommendation: Path A. Same architectural promise as the existing tier-3 surfaces — one substrate, many views, Workers filter at the edge. Single source of truth, many smart entry points.
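
The routing half of Path A is a pure function from hostname and path to a pre-baked filter, sketched below. A real Worker would be written in JavaScript or TypeScript; Python is used here only to show the logic. The `VIEWS` table is copied from the URL table in this sketch, the names `ViewFilter` and `resolve_view` are hypothetical, and the call that would pass the filter to Immich is left out because its API shape is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

BASE_DOMAIN = "photos.gf.cx"

# Sub-subdomain label -> (filter kind, filter value), from the URL table in this sketch.
VIEWS = {
    "audrey": ("face-cluster", "Audrey"),
    "dan": ("face-cluster", "Dan"),
    "baby": ("face-cluster", "baby"),
    "work": ("album-tag", "work"),
    "school": ("album-tag", "school"),
    "wedding": ("album", "wedding"),
    "claim-evidence": ("tag", "insurance-claim"),
}


@dataclass(frozen=True)
class ViewFilter:
    kind: str
    value: str
    year: Optional[int] = None


def resolve_view(host: str, path: str = "/") -> Optional[ViewFilter]:
    """Map a request host and path to a filter; None means the host is not a known view."""
    host = host.lower().rstrip(".")
    if host == BASE_DOMAIN:
        return ViewFilter("unfiltered", "")
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    if label not in VIEWS:
        return None
    kind, value = VIEWS[label]
    # An optional leading four-digit path segment narrows the view to one year.
    first_segment = path.strip("/").split("/")[0]
    is_year = len(first_segment) == 4 and first_segment.isascii() and first_segment.isdigit()
    return ViewFilter(kind, value, int(first_segment) if is_year else None)


print(resolve_view("work.photos.gf.cx", "/2010"))
```

Adding a view is one more row in the table, which is what makes new sub-subdomains cheap under Path A. Unknown hosts return `None` so the caller can reject them instead of falling back to the unfiltered library.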

Public OR private per hostname — same architecture, different policy

Dan 2026-05-24: “I can managed these sets, at it will always load super-fast, edge-cache, protect behind CF access, or publically available for some images.”

The architecture supports both CF-Access-gated (private) AND fully-public hostnames, identically, with one config flag:

Hostname	Filter	Access policy	Use
`audrey.photos.gf.cx`	face: Audrey	CF Access (Dan + Audrey emails)	private daily view
`dan.photos.gf.cx`	face: Dan	CF Access (Dan + Audrey emails)	private daily view
`baby.photos.gf.cx`	face: baby	CF Access (Dan + Audrey + immediate family)	semi-private
`wedding.photos.gf.cx`	album: wedding	CF Access (guest list email allowlist)	broader semi-private
`claim-evidence.photos.gf.cx`	tag: insurance	CF Access (Dan + temp adjuster grant)	tightly-scoped private
`portfolio.photos.gf.cx`	tag: portfolio-public	NONE — fully public	audreyinc photography portfolio
`press.photos.gf.cx`	album: press-kit	NONE — fully public	brand press kit for Audrey’s work
`landscape-art.photos.gf.cx`	tag: showcase-landscape	NONE — fully public	exhibition showcase

Adding a public-facing photo surface is the same 5-minute deploy as adding a private one — just skip the CF Access policy add. The Worker reads the same R2 mirror; the filter logic is the same shape.

This means the architecture supports BOTH halves of a creative person’s photo life:
- Private: family, baby, household inventory, claim evidence
- Public: portfolio work, press, brand assets, exhibition showcases

…from ONE library, ONE source of truth, ONE substrate. The public/private cut is a deploy-time decision per hostname, not a storage-time decision per file.

CF Access policies can be per-subdomain (private side detail)

The ACM cert covers the whole second-level zone, but each sub-subdomain still gets its own CF Access policy:
- audrey.photos.gf.cx — gates to Audrey’s email + Dan’s
- dan.photos.gf.cx — gates to Dan’s email + Audrey’s
- baby.photos.gf.cx — gates to Dan + Audrey + immediate-family allowlist
- wedding.photos.gf.cx — gates to a broader guest list (or maybe a service token + share-link)
- claim-evidence.photos.gf.cx — gates to Dan + the insurance adjuster’s email (temp)

Adds to the layer model

Updated Layer 1 row in the table above:

Layer	What
Layer 1 — application + library	Immich (single instance) on Mac mini + RAID + Docker; many small Worker filters per sub-subdomain at `*.photos.gf.cx` (Path A); CF Access policy per hostname; ACM enabled for the gf.cx zone to cover the sub-subdomain SSL

The compromise architecture (consolidated)

STASH70

Daily reality: phone auto-uploads to Immich · iOS/Android apps for browsing · Immich CLI / API for bulk ops · Claude Code SSHs in via Access-for-SSH to do filesystem work directly · rclone keeps R2 in sync.

Cost at 10TB — honest scaling math (Dan 2026-05-24)

“On a more serious note, will a 10TB R2 cost $84 per month…”

No — the $74-84/mo numbers above were the 4TB scenario. R2 scales linearly:

Tier	$/TB/mo	10TB total
Standard (hot, working)	$15.00	$150/mo
Infrequent Access (warm, weekly)	$10.00	$100/mo
Archive (cold, rarely-touched)	$3.60	$36/mo

Realistic 10TB with tier optimization (the way Google Photos auto-archives transparently):
- Recent 2 years working in Standard (~2TB): $30/mo
- Years 3-7 in IA (~3TB): $30/mo
- Years 8+ + frozen Takeout in Archive (~5TB): $18/mo
- Total: ~$78/mo at 10TB with proper lifecycle policies
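
The tiered total is a weighted sum, sketched here so the split can be re-run as the library ages. The per-TB rates are the ones in the table above and are assumptions of this sketch, as are the tier keys and the function name.

```python
# USD per TB per month, copied from the tier table in this sketch.
USD_PER_TB_MONTH = {"standard": 15.00, "infrequent_access": 10.00, "archive": 3.60}


def tiered_monthly_cost(allocation_tb: dict[str, float]) -> dict[str, float]:
    """Return the cost per tier plus a 'total' entry, all in USD per month."""
    costs = {tier: round(tb * USD_PER_TB_MONTH[tier], 2) for tier, tb in allocation_tb.items()}
    costs["total"] = round(sum(costs.values()), 2)
    return costs


split = tiered_monthly_cost({"standard": 2, "infrequent_access": 3, "archive": 5})
flat = tiered_monthly_cost({"standard": 10})
print(split["total"], flat["total"])
```

Comparing the split allocation against the flat one shows where the saving comes from: the 8 TB moved out of Standard, not a cheaper headline rate.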

Comparison at 10TB (so trade-offs are visible)

Provider	10TB/mo	Trade-off
Google One 2TB × 5	~$50/mo	Capped at 2TB each, Google ToS lock-in (the problem we’re escaping)
iCloud+ 12TB	$59.99/mo	Apple-ecosystem only, no S3 API
Dropbox Business 9TB	$24/mo	Cheapest pure storage, no developer API
Backblaze B2	$50/mo	Cheap storage, but cross-network egress to CF stack
R2 with tiering	~$78/mo	$0 egress + same-network as Pages/Workers/Access

R2 wins on integration, not on raw storage price. Since photos.gf.cx will run through CF Workers + Access anyway, R2 is the same-network tier — no bandwidth fees when serving photos to browsers. Backblaze B2 would add cross-network egress costs that erode the storage savings at any meaningful viewing volume.

The R2 question isn’t “is it cheapest” — it’s “does it save me from the next vendor-lock problem.” And yes: S3-compatible API + $0 egress + no ToS-driven feature changes = files-portable-forever.

Daily-UX replacement — Immich as the new “catch-all” (Dan 2026-05-24)

“I assume immich app becomes our new google photos, we are so used to using Google as the catch-all”

The habit change is the slowest part of the migration, not the technical work. What changes for Dan + Audrey day-to-day:

What stays the same (Immich actually delivers Google-Photos-grade UX)

Daily habit	Google Photos	Immich (after setup)
Phone snap → auto-uploaded	✓	✓ (iOS + Android apps, Wi-Fi auto-backup)
Search “beach photos”	✓	✓ (CLIP semantic search)
Search “Audrey + baby”	✓	✓ (face cluster intersection)
Albums	✓	✓ (full albums + smart albums)
Share an album with grandma	✓	✓ (shared link or shared user)
“On this day” / memories	✓	✓ (Immich added this in 2024)
Map view of where photos were taken	✓	✓
Browse on iPad / web / phone	✓	✓ (responsive web + native apps)

What’s different (mostly better, one notably worse)

Aspect	Google Photos	Immich
Where photos go after upload	Google’s cloud, you-don’t-know-where	YOUR Mac mini’s RAID, then R2 mirror
Privacy of face data	Google trains on it	Stays on YOUR ML container, never leaves the home network
Pricing model	Tiered, gets expensive past 2TB	Linear R2 storage cost, $0 egress
Multi-axis URLs	“Click filters, scroll, find”	`audrey.photos.gf.cx`, `baby.photos.gf.cx` — URL IS the filter
Algorithmic “look at this!” surfacing	Aggressive — Google decides what’s important	Gentle — Immich shows memories but doesn’t push them
Mobile upload speed	Fast (Google’s CDN)	Depends on home upload bandwidth + mini being on
Mini going offline = uploads queue	N/A (always available)	Uploads queue locally on phone until mini reachable ← the one regression

The transition cost (in weeks of habit-change)

Phase	Calendar time	Notes
Install Immich apps on both phones	10 min	Side-by-side with Google Photos initially
Turn ON Immich auto-backup	1 click each	Both apps backup simultaneously, double-store
Use Immich app daily, leave Google running	2-4 weeks	Build trust + verify nothing’s missing
Turn OFF Google Photos auto-backup	1 click each	The flip moment
Keep Google Photos APP installed read-only	~3 months	Reference during the 90-day Google retention window
Delete Google Photos app + cancel Google One storage tier	once verified	The exit completion

The “where’s that photo of…” reflex shifts gradually. Probably 4-6 weeks before opening Immich becomes the unconscious default instead of Google Photos.

Sharing with relatives is often the #1 unstated requirement when leaving Google Photos. Three options inside Immich:

Shared albums (in-Immich) — grandma gets an Immich account, sees specific albums you grant her. Works if grandma’s tech-savvy enough to install another app.
Public share links — Immich generates a URL with optional password + expiry. Grandma clicks, sees photos in browser, no account needed. Best fit for casual relatives.
grandma.photos.gf.cx — Worker filters Immich to a face-cluster: grandma’s pre-selected favorites, behind a wider-allowlist CF Access policy (or a temp-grant token). Same architectural pattern as the persona subdomains.

Most likely production answer: shared public link for casual sharing, persona subdomain for repeat-viewer relatives.

Directional flow — RAID is the central management surface (Dan 2026-05-24)

“And we would be syncing always from a home RAID, that’s the non-cloud, private access, where the management and library central is doing its work.”

Critical architectural rule: the Mac mini + RAID is the SINGLE WRITE SURFACE. Everything else is either an input source (writes THROUGH the mini) or a read mirror (mini writes OUT to it).

STASH75

Why this directionality matters

Rule	Consequence
RAID is the single source of truth	Conflicts can’t arise (no merging from multiple write surfaces). The “where’s the latest version?” question has one answer.
R2 is read-only mirror	If R2 gets corrupted / hacked / deleted, rebuild from RAID — no data loss. The mirror failing is recoverable; the source failing is not.
Inputs (phones, Takeout) write THROUGH the mini	All ingestion passes Immich’s dedup + classify + EXIF-reconcile pipeline. No raw bytes land in R2 without going through the library’s brain first.
Worker-views read from R2, not from the mini	Mini doesn’t get hammered by serving photo browsing. ML training + ingest runs on mini; viewers run on R2’s edge cache. Separation of concerns.
Phones can be offline; mini can be offline	Different failure modes. Phone-offline: photos queue locally on phone. Mini-offline: photos queue on phone for whenever mini comes back. R2-offline: viewers can’t load (rare for CF). Each is recoverable independently.

The “layered access” framing (NOT “air-gap” — Dan’s terminology correction)

Important precision (Dan 2026-05-24): “air-gap is reserved more for off-site, non-connected, is that fair?” — Yes, that’s correct.

The Mac mini + RAID + cloudflared + CF Access architecture is layered access, not air-gap:

Term	Meaning	The mini-with-tunnel setup
Air-gapped	Physically disconnected. Data moves only via removable media.	✗ The mini IS reachable via the cloudflared tunnel
Layered access / conditionally accessible	Connected, but reach is mediated by multiple gates (network policy → tunnel auth → CF Access → app auth)	✓ This is what we have
Self-hosted private	On your own infra, no cloud vendor in path	✓ Also true (the data lives on YOUR disk)
Cloud-vendor-managed	Vendor controls infra + access	✗ Definitely not

What’s true:

The RAID is air-gappable on demand — pull the network cable and it becomes air-gapped instantly, library still fully usable from the Mac mini console
Day-to-day it operates layered-access: connected for the phone-auto-upload and remote browse use cases, gated by multiple auth layers
True air-gap belongs to tier 4 (the off-cloud row on payload.gf.cx): family video masters on a USB drive in a drawer, signed tax-return PDFs on encrypted external

This precision matters because “air-gap” implies stronger isolation than the architecture actually provides. Describing it accurately = no false security claims when someone (Audrey, a tenant, an auditor) asks “is this safe?”

Why this still beats the cloud-vendor model

Even though it’s not truly air-gapped, the layered-access shape gives:

Vendor independence: pull the plug, library still works locally
Defense in depth: a compromised CF account doesn’t get the data (still gated by app auth + filesystem ACLs)
Auditable access: every fetch passes the cloudflared tunnel; logs exist
Reversible: can go to fully-air-gapped mode in 3 seconds (yank the cable) without losing the library

Multi-region option — Mac minis in both NH + London (Dan 2026-05-24)

“Do you think I can run a mac-mini in london, and one in new hope, and have them stay in sync via rsync, so that two identical systems are online”

Yes — but raw rsync is the wrong tool. Three patterns, each with different trade-offs:

DECISION 2026-05-24: Pattern A chosen. Dan: “I like A. Log that to the report.”

Pattern A — Active/Passive · NH canonical, London read replica ⭐ CHOSEN

NH mini + RAID = only place writes land (preserves single-write-surface rule from Dan’s directional-flow diagram)
R2 = sync intermediary; NH pushes hourly via rclone
London mini + RAID = pulls hourly from R2 via rclone, runs Immich reading from local cache
Phones in London still upload via Immich app → London mini → R2 → NH eventually catches up (~hourly latency)
Disaster recovery: if NH burns down, manually promote London to canonical (~30 min: change rclone direction, point R2 sync the other way)
Cost: ~$5/mo extra R2 GET operations for the doubled-read pattern

Best fit for the architecture we just established.
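The direction rule in Pattern A can be sketched as a small command builder. This is a minimal illustration, not the deployed job: the remote name `r2:photos-mirror`, the library path, and the `Site` type are hypothetical placeholders. It relies only on `rclone sync SRC DST` making the destination match the source, so "push" versus "pull" is just argument order, and promoting London means flipping one flag.

```python
from dataclasses import dataclass

# Hypothetical placeholders, not values from the sketch.
R2_REMOTE = "r2:photos-mirror"
LOCAL_LIBRARY = "/Volumes/RAID/library"


@dataclass(frozen=True)
class Site:
    name: str
    canonical: bool  # True only for the single write surface


def hourly_sync_command(site: Site, dry_run: bool = False) -> list[str]:
    """Return the rclone command a site would run each hour.

    The canonical site pushes RAID -> R2; a replica pulls R2 -> RAID.
    Note that `rclone sync` deletes destination files that are missing
    from the source, so a --dry-run pass first is a sensible habit.
    """
    if site.canonical:
        src, dst = LOCAL_LIBRARY, R2_REMOTE
    else:
        src, dst = R2_REMOTE, LOCAL_LIBRARY
    cmd = ["rclone", "sync", src, dst, "--checksum"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def promote(replica: Site) -> Site:
    """Disaster recovery: the replica becomes the canonical writer."""
    return Site(name=replica.name, canonical=True)


if __name__ == "__main__":
    nh = Site("NH", canonical=True)
    london = Site("London", canonical=False)
    print(" ".join(hourly_sync_command(nh, dry_run=True)))
    print(" ".join(hourly_sync_command(london, dry_run=True)))
```

Keeping the role in one boolean means the "manually promote London" step is a config change rather than a rewritten script.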

Pattern B — Active/Active · R2 as shared metadata backend

Both minis treat R2 as Immich’s S3-compatible storage backend
Postgres metadata DB replicated between minis via cloudflared tunnel
Both sites can write directly; conflicts (rare for photos) handled via Immich’s UUID-keyed records
Heaviest setup: needs PG replication, careful Immich config
Cost: ~$50-100 extra/mo in R2 GET operations + cross-tunnel bandwidth

True HA, but probably overkill for a family library where conflict-risk is near-zero.

Pattern C — Syncthing peer-to-peer · NOT RECOMMENDED

Syncthing keeps photo directories in sync between minis
Each Immich has its OWN Postgres DB → face clusters and tags will diverge
Need periodic “re-index from sidecar JSON” to converge tag state
Simplest tooling but breaks the “one library” promise

The metadata divergence is the killer. Two libraries that disagree about who’s in a photo aren’t one library — they’re two libraries that share a file tree.

Decision matrix

Question	A	B	C
Preserves single-write-surface rule?	✓	✗ (R2 is the surface now)	✗ (two surfaces)
Phones work in both locations?	✓	✓	✓
Cross-site write visibility	hourly	near-real-time	as fast as Syncthing propagates
Face cluster / tag state consistency	✓ (single DB)	✓ (replicated DB)	✗ (diverges)
Setup complexity	low	high	medium
Disaster recovery	manual promote (~30min)	automatic	unclear
Extra R2 cost/mo at 4TB	~$5	~$50-100	~$0

A’s failure modes (so you know what you’re signing up for)

Failure	Impact
NH ISP outage	London can still read locally; new writes from London queue at London’s mini until R2 reachable. Eventual consistency within hours.
London ISP outage	NH unaffected. London queues for whenever it comes back.
NH mini hardware dies	London is a warm replica. Restore is “promote London + buy replacement NH mini + reverse rclone direction.” ~Days, not hours.
Both ISPs out simultaneously	Each mini browses its local RAID independently. Phones queue. World hasn’t ended.
R2 outage (rare)	Both minis still serve from local RAID. Sync resumes when R2 returns.

This is genuinely “the best of both worlds” — local-disk autonomy on TWO continents, with the R2 layer doing the eventual-consistency lift between them.

Pilot target — dare.co.uk Google Photos first (Dan 2026-05-24)

“dare.co.uk on google photos has to be the first using takeout.”

dare.co.uk = Daniel Dare Ltd, the UK design brand. Its Google Photos library is the right pilot before Audrey’s 4TB family library because:

Why dare.co.uk first	Why NOT Audrey’s library first
Smaller scope (probably 100-500 GB vs 4TB)	4TB initial Takeout = days to generate, weeks to ingest
Business/portfolio content = naturally public-shareable	Family photos = high “must not lose anything” risk
Existing dare.co.uk domain = `photos.dare.co.uk` natural fit	photos.gf.cx requires more architectural decisions upfront
Lower emotional risk = good test case for Takeout knives	Mistakes on family library are unrecoverable in feeling
Validates the Worker-filter persona pattern on a real use	Validates nothing until the architecture is already proven
Result: public portfolio surface that audreyinc/Dan can share	Result: private family archive (no immediate public value)

Sequence for the dare.co.uk pilot:

1. Trigger Google Takeout for dare.co.uk Google account (takes hours-to-days)
2. Set up Mac mini + RAID + Docker + Immich (one weekend) — this is the LAYER-0 infrastructure that BOTH dare.co.uk and audreyinc photos will use
3. Ingest dare.co.uk Takeout via immich-go — test the dedup + sidecar reconcile pipeline on real data
4. Stand up photos.dare.co.uk (cdn-router instance) — first deployed photo surface
5. Maybe spin up portfolio.dare.co.uk (public) as the proving ground for the public/private hostname pattern
6. Lessons learned from the pilot inform the audreyinc 4TB migration — knowns are knowns by then

Note: photos.dare.co.uk uses the dare.co.uk zone (not gf.cx), but the architecture is identical — different DNS zone, same cdn-router Worker + R2 bucket + CF Access policy + Immich filter. Pattern-portable across all Dan-owned domains.

Immich install — Docker is the right answer (Dan 2026-05-24)

“I’ll need to spin up the RAID and install docker, I assume that’s the best way to connect it, unless you have a preferred way of running immich?”

Docker Compose is the official + most-supported install path for Immich. Use it.

Option	Verdict
Docker Compose	✓ Official. The way Immich is built + tested + documented. `docker compose pull && docker compose up -d` for upgrades. Standard config; huge community support.
Unraid	Fine if the host OS is Unraid; uses Docker underneath anyway. Not a fit for Mac mini.
TrueNAS Scale	Same as Unraid — Linux-only NAS OS, not relevant for Mac mini.
Native macOS install	Possible but not supported; you’d be on your own for bugs. Skip.
Kubernetes	Overkill for home + family-library scope. Skip.
Proxmox LXC	Linux homelab pattern; not relevant for Mac mini.

On Mac mini specifically: use OrbStack, not Docker Desktop

OrbStack is the modern preferred Docker runtime on macOS:

	OrbStack	Docker Desktop
Native Apple Silicon	✓ optimized	✓ but heavier
Memory footprint	~1 GB	4-8 GB
Boot time	~1 sec	10-30 sec
License	Free for personal	Free for personal, $5/user/mo for business
Compatibility	Drop-in for docker CLI + Compose	Same
File-share performance	Significantly faster	Slower (especially on RAID-mounted volumes)

For Audrey’s full 4TB scale, OrbStack’s file-share performance is a real win — Immich indexing scans need to read every photo file, and Docker Desktop’s VirtIO-FS layer adds latency on every read.

RAID enclosure recommendations for Mac mini

Mac mini has no internal bays, so external Thunderbolt RAID is the answer:

Option	Notes
OWC ThunderBay 4 Mini	4-bay TB4 enclosure, software RAID via SoftRAID Pro. ~$300 + drives. Solid + widely-used.
OWC ThunderBlade	SSD-only, very fast but $$
Sabrent 4-bay TB4	Cheaper alternative; less polished software
Synology DS923+ (over network)	NAS instead of direct-attached; slower for Immich indexing scans but offers more services (Synology Photos as a fallback)
MacBook + external = NO	Don’t run a home-server on a portable laptop

For Audrey’s 4TB scale: 2× 8TB drives in RAID 1 mirror in an OWC ThunderBay = ~8TB usable, single-drive-failure tolerant, ~$600 all-in for enclosure + drives. Add Time Machine to a separate external for full belt-and-braces.

Build cost / sequence

Phase	Effort	Cost
Inventory current copy locations (Google Drive folders, iCloud, local)	1-2 hr	$0
Dedupe local pass (czkawka or rclone dedupe)	4-8 hr CPU + manual review	$0
First rclone sync to R2 (~4 TB if full library)	overnight	~$0 (within free PUT tier)
Tiny Pages viewer at photos.gf.cx	2-3 hr build	$0
CF Access policy on photos.gf.cx	10 min	$0
(Optional) hourly launchd sync	30 min	$0
Total to Option C live	~2-3 days incl. dedupe	~$60/mo R2 storage
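The dedupe pass in the table names czkawka or rclone dedupe as the tools. For intuition about what an exact-duplicate pass does, here is a minimal sketch under stated assumptions: it only finds byte-identical files (re-encoded or resized copies will not match), and the function names are illustrative rather than taken from either tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never load fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group byte-identical files under root by content hash.

    Files are bucketed by size first, so only files that share a size
    with at least one other file are ever hashed.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates(Path(".")).items():
        print(digest[:12], *paths, sep="\n  ")
```

The output is a report only; choosing which copy is canonical stays a manual review step, in line with the "manual review" effort noted in the table.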

Empirical throughput data (2026-05-25 disk-cleanup rehearsal)

While freeing the Mac’s internal SSD by archiving large media folders to the current 4TB external (the same drive that would host RAID for the build phase), sustained ditto throughput measured ~22-25 MB/s on the 52GB 4K Stogram folder.

At the current drive speed, a 4TB local-attach copy (mini ↔ RAID enclosure) would therefore take ~44-51 hours sustained (4 TB ÷ 25 MB/s ≈ 44 hr; 4 TB ÷ 22 MB/s ≈ 51 hr) — NOT “overnight.” Consequences for the build plan:

Pre-flight before the real Immich migration: verify the 4TB drive is on a USB-C / Thunderbolt port at native speed; current rate suggests USB 2.0 or a slower spinning enclosure. The recommended OWC ThunderBay 4 Mini over TB4 should hit 500-1000+ MB/s and collapse this to a single overnight.
Cable-swap is cheap when timed right: if the current enclosure has a TB port that’s running over USB-A/2.0 by accident (common with bundled cables), swapping to a TB4 cable BETWEEN copy operations (never during) can 10-50× the rate. Dan flagged this 2026-05-25 — worth a quick port + cable check before the next bulk move.
Throughput is a real planning variable, not a footnote. When the time comes, run a 100GB synthetic copy to the new enclosure first and measure — the build plan should size all phases against actual hardware, not assumed numbers.

Note: this only bounds local-attach copies. The R2-sync phase is bounded by home upload bandwidth instead (100 Mbps up = ~89 hrs for 4TB; gigabit fiber = ~9 hrs).
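The arithmetic behind these estimates can be sketched as a short calculation. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and ignores protocol overhead, so real transfers will run somewhat longer; the function names are illustrative.

```python
TB = 1e12  # bytes, decimal units (assumption)
MB = 1e6   # bytes, decimal units (assumption)


def transfer_hours(total_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / 3600


def mbps_to_bytes_per_second(megabits_per_second: float) -> float:
    """Convert a link speed in Mbps to bytes per second (8 bits per byte)."""
    return megabits_per_second * 1e6 / 8


if __name__ == "__main__":
    library = 4 * TB
    # Local-attach copy at the measured ditto rates.
    print(f"25 MB/s:   {transfer_hours(library, 25 * MB):.1f} hr")
    print(f"22 MB/s:   {transfer_hours(library, 22 * MB):.1f} hr")
    # R2 sync bounded by home upload bandwidth.
    print(f"100 Mbps:  {transfer_hours(library, mbps_to_bytes_per_second(100)):.1f} hr")
    print(f"1000 Mbps: {transfer_hours(library, mbps_to_bytes_per_second(1000)):.1f} hr")
```

Feeding the result of the 100GB synthetic copy into `transfer_hours` gives a per-phase estimate based on measured hardware rather than assumed numbers.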

SYNTHESIS — the four-layer storage model (Dan 2026-05-24, late-session compounding)

Four patterns compose into a single architecture that’s substantially cheaper and more honest than the original “mirror everything to R2” first-cut. Each refinement was a Dan-driven correction; together they collapse the monthly bill from ~$60/mo to ~$2-12/mo for the same 4TB library.

Layer	What it does	What it touches
L0 — process-at-source (the meta-rule)	Raw inputs never write directly to cloud — they pass through the brain (Immich) which dedupes, classifies, emits sidecar metadata	Everything below it depends on this gate
L1 — derivatives-not-masters (size tier)	R2 mirrors web-display derivatives (1500px JPG + 1080p H.264 + thumb + sidecar); RAID keeps RAW/HEIC/HEVC masters	Cuts R2 storage ~8x (4TB→500GB)
L2 — curation tier (judgment filter)	Only items tagged `publish` (or in an album, favorited, etc.) trigger the derivative pipeline; the other 70% live only as RAID masters	Cuts R2 storage another 3-5x (500GB→150GB)
L3 — video-codec normalization	Mac mini transcodes HEVC → H.264 baseline + faststart at ingest so derivatives play in every browser; raw HEVC stays untouched on RAID	Makes the derivatives actually usable across Safari/Chrome/Firefox

What R2 actually holds (per item)

For a surfaced item:

STASH84

For an archived item (curation tier filtered out):

STASH85

The view always knows the count: “showing 10 of 300 from camping trip — full archive on RAID.”
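A minimal sketch of how the curation filter and the footer count could fit together. The `Item` record, its `tags` and `album` fields, and both function names are hypothetical; the sketch only assumes that the `publish` tag is what surfaces an item, as described for L2.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical record shape; real sidecars may carry different fields.
    name: str
    album: str
    tags: set[str] = field(default_factory=set)


def surfaced(items: list[Item], album: str) -> list[Item]:
    """Items in the album that carry the `publish` tag (the curation tier)."""
    return [i for i in items if i.album == album and "publish" in i.tags]


def footer(items: list[Item], album: str) -> str:
    """Report what the view shows versus what stays on RAID only."""
    total = sum(1 for i in items if i.album == album)
    shown = len(surfaced(items, album))
    return f"showing {shown} of {total} from {album} - full archive on RAID"


if __name__ == "__main__":
    trip = [Item(f"IMG_{n:04d}", "camping trip") for n in range(300)]
    for item in trip[:10]:
        item.tags.add("publish")  # curation is a tag toggle, not a delete
    print(footer(trip, "camping trip"))
```

Because archived items are counted but never surfaced, removing a `publish` tag changes the footer without touching any file.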

Honest-by-construction property

The compound result is an architecture where no layer can lose track of what exists vs what’s a copy:

R2 always knows it’s the mirror (sidecar marks it as derivative)
The mirror always knows the way home (sidecar.original.on_raid_at + sha256 + access paths)
The view always knows what it’s hiding (stub sidecars + footer count)
Masters never leave the RAID (the tier that fails gracefully)
Curation never destroys anything (tag toggles, not file deletes)
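The "mirror always knows the way home" property can be checked mechanically. The sketch below assumes a sidecar shaped like `{"original": {"on_raid_at": ..., "sha256": ...}}`; that nesting is a guess based on the field names above, not a confirmed schema.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def master_matches_sidecar(sidecar: dict) -> bool:
    """True if the master named by the sidecar exists on RAID and its
    bytes hash to the recorded sha256.

    Assumed (hypothetical) sidecar shape:
        {"original": {"on_raid_at": "/path/to/master", "sha256": "..."}}
    """
    original = sidecar["original"]
    master = Path(original["on_raid_at"])
    return master.is_file() and file_sha256(master) == original["sha256"]
```

Run periodically on the mini, a check like this would flag any derivative whose master has moved or changed.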

Revised cost at 4TB

Tier	Storage	Monthly
RAID masters (sunk hardware)	4 TB	$0
R2 derivatives (curation tier ~30%)	~150 GB	~$2.25
R2 metadata mirror (DB dumps + sidecars)	~1 GB	~$0.02
ACM for sub-subdomain SSL	—	~$10
Total monthly bill		~$12

Down from the original ~$60-84/mo estimate. Cheaper than the Google One subscription it replaces.
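The figures in the table follow from a flat per-GB storage rate. The sketch below assumes $0.015 per GB-month, which is the rate implied by the table (150 GB → ~$2.25), and it ignores free-tier allowances and operation charges; check current R2 pricing before relying on it.

```python
R2_USD_PER_GB_MONTH = 0.015  # assumed rate, implied by the table above
ACM_USD_PER_MONTH = 10.0     # from the table's ACM row


def r2_storage_cost(gigabytes: float) -> float:
    """Monthly R2 storage cost in USD, storage only."""
    return gigabytes * R2_USD_PER_GB_MONTH


def monthly_bill(derivatives_gb: float, metadata_gb: float,
                 acm_usd: float = ACM_USD_PER_MONTH) -> float:
    """Derivatives + metadata mirror + certificate line item."""
    return r2_storage_cost(derivatives_gb) + r2_storage_cost(metadata_gb) + acm_usd


if __name__ == "__main__":
    print(f"mirror everything (4 TB): ${r2_storage_cost(4000):.2f}/mo")
    print(f"four-layer model:         ${monthly_bill(150, 1):.2f}/mo")
```

Most of the revised bill is the fixed ACM line; the storage itself is a few dollars.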

Critical durability note — back up Immich’s DB

The curation metadata (tags, albums, face IDs, favorites) lives ONLY in Immich’s PostgreSQL DB. If that DB dies, masters survive on RAID but curation is lost. Two defenses to wire in from day one:

Nightly pg_dump → R2 — small, cheap, restores Immich state
XMP sidecar export — write tags+ratings into XMP files next to each master so curation survives Immich entirely (Lightroom-standard, vendor-independent)
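The nightly dump can be sketched as two commands built in Python. Everything specific here is an assumption to adjust for the real install: the container name `immich_postgres`, the database name `immich`, the user `postgres`, and the rclone remote `r2:photos-meta/db` are placeholders. The `pg_dump` flags and `rclone copyto` are standard, but this is an outline rather than a tested backup job.

```python
import datetime as dt
import subprocess


def backup_commands(day: dt.date,
                    container: str = "immich_postgres",  # assumed name
                    db: str = "immich",                   # assumed name
                    user: str = "postgres",               # assumed name
                    remote: str = "r2:photos-meta/db"):   # hypothetical remote
    """Return (dump filename, pg_dump command, rclone upload command)."""
    dump_name = f"immich-{day.isoformat()}.dump"
    # No -t on docker exec: a TTY would corrupt the binary dump on stdout.
    dump_cmd = ["docker", "exec", container, "pg_dump",
                "--username", user, "--dbname", db, "--format=custom"]
    upload_cmd = ["rclone", "copyto", dump_name, f"{remote}/{dump_name}"]
    return dump_name, dump_cmd, upload_cmd


def run_backup(day: dt.date) -> None:
    """Dump the DB to a local file, then copy that file to R2."""
    dump_name, dump_cmd, upload_cmd = backup_commands(day)
    with open(dump_name, "wb") as out:
        subprocess.run(dump_cmd, stdout=out, check=True)
    subprocess.run(upload_cmd, check=True)


if __name__ == "__main__":
    name, dump, upload = backup_commands(dt.date.today())
    print(" ".join(dump), ">", name)
    print(" ".join(upload))
```

A dated filename keeps each night's dump separate on R2, so a bad dump never overwrites the last good one.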

Cross-references

feedback_process_at_source_sync_clean_outputs.md — L0 meta-rule (the brain pattern)
feedback_derivatives_in_r2_originals_on_raid.md — L1 size-tier (negatives-stay-home)
feedback_curation_tier_album_as_worker_query.md — L2 curation-tier (albums as Worker queries)
feedback_video_in_r2_photo_library_substrate.md — L3 video codec normalization
feedback_image_evidence_via_r2_pattern.md — sibling pattern (R2 + cdn-router for image substrate); same Worker shape
parked_sketch_sandbox_gfcx_subdomain_template.md — scaffold could grow a --photo-library mode
feedback_cf_pages_subdomain_setup_recipe.md — photos.gf.cx subdomain setup recipe
parked_sketch_audrey_eras_few_clicks_deep.md — sibling audreyinc archive work
feedback_api_burn_rate_throttle_chunks.md — dedupe + initial-sync should be chunked per this rule
feedback_pattern_library_composes_into_exits.md — meta-observation that this whole sketch is composition of named patterns, not invention

Build trigger

Build when ANY:

Audrey’s frustration with the duplicate problem hits a “fix this now” threshold
A specific photo is needed urgently + can’t be found across the 5 copy planes
Google or iCloud quota costs spike from the duplicates
Dan + Audrey have a clean weekend to do the inventory + dedupe pass together

The aphorism

Three plans, one auth layer. The network path is the policy. Pick the storage shape that matches how the library actually gets used today, graduate when the missing feature becomes the bottleneck.

Source: parked_sketch_audrey_4tb_photo_library_to_r2_2026-05-25.md · Rendered 2026-07-15 13:05 UTC

Built with — component scripts

seo_render_html.py — wraps the source .md in the dash.gf.cx design language (+ anchor_enricher.py for inline-link promotion & rollover thumbnails)
dare_dev_reports_publish.py — bundles the day’s reports into the catalog and ships to dash.gf.cx/reports