Local AI on a home NAS: Immich, Frigate, Ollama in 2026

Local AI on a home NAS in 2026 has settled into three distinct workloads: Immich for photo libraries (a Google Photos replacement with face, object, and CLIP-based semantic search), Frigate for camera analytics (object detection across home cameras), and Ollama for small-LLM inference (personal chat, summarization). Each has different hardware needs. Immich benefits from any supported GPU. Frigate's biggest 2025-2026 shift was that Coral USB stopped being the default and Hailo-8, Intel iGPU, and Arc dGPU took over. Ollama is bottlenecked by memory bandwidth, which is why a Mac mini M-series wins despite looking weaker on paper. The wrong accelerator wastes 50-70% of its capability; the right one runs all three workloads on a single $400 box. This page sorts out which is which.

Best for: Home operators with an existing NAS who want to add: (a) Immich photo library replacing Google Photos / Synology Photos, (b) Frigate object-detection on home cameras, or (c) Ollama LLM inference for personal use.

Accelerator decision matrix

ZFS pool hierarchy diagram used here to illustrate the storage layout for local AI workloads: postgres + thumbnails on NVMe, bulk media on HDD array, AI model cache on SSD.
Original concept diagram (not vendor copyright). Local AI on a NAS has the same storage-layering decision as any other home server: fast pool for working data (Postgres, model files, thumbnails) and bulk pool for media. Immich's `MLData` dataset is the AI-model cache — keep it on NVMe.

Immich state in 2026 (Google Photos replacement)

  • Feature parity at ~80-90% of Google Photos: auto mobile backup, timeline/albums, face detection, CLIP-powered semantic ('smart') search, maps, memories. 87k+ GitHub stars; FUTO-backed. Face detection notably better across diverse skin tones than several proprietary competitors.
  • ML acceleration backends (ranked by reliability): CUDA (NVIDIA, compute capability ≥ 5.2, driver 545+, CUDA 12.3) → OpenVINO (Intel iGPU Iris Xe / Arc / Intel NPUs) → ROCm (AMD) → RKNN (Rockchip NPUs). ARM NN (Mali) does NOT accelerate smart search.
  • Hardware sizing: <100k assets = N100 + 16 GB RAM + iGPU OpenVINO works fine. 100-500k = RTX 3050/3060 12 GB or Intel Arc A380 strongly recommended for face/object job throughput. 500k-1.5M = 8c/16t CPU, 32-64 GB RAM, dedicated GPU effectively required.
  • Storage layout: Library = ~75% of total disk; Postgres + thumbnails on NVMe SSD; bulk media on HDD array (mount as external library). TrueNAS Scale consolidates to 3 datasets: `Userdata`, `PGData`, `MLData`. `pgData` must be owned by `netdata` (UID 999) on TrueNAS; app user is `apps` (UID 568).
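
The sizing tiers above depend on knowing the asset count before buying anything. A minimal sketch, assuming the source library is a plain directory tree: it counts media files and maps the total to the tiers quoted in the list. The extension set and the function names are illustrative assumptions, not part of Immich.

```python
from pathlib import Path

# Assumption: a small illustrative extension set; extend it for your own library.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".dng", ".mp4", ".mov"}


def count_assets(root: str) -> int:
    """Count files under root whose extension looks like a photo or video."""
    return sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS
    )


def sizing_tier(asset_count: int) -> str:
    """Map an asset count to the hardware tiers quoted on this page."""
    if asset_count < 0:
        raise ValueError("asset_count must be non-negative")
    if asset_count < 100_000:
        return "N100 + 16 GB RAM, iGPU via OpenVINO"
    if asset_count <= 500_000:
        return "RTX 3050/3060 12 GB or Intel Arc A380 strongly recommended"
    if asset_count <= 1_500_000:
        return "8c/16t CPU, 32-64 GB RAM, dedicated GPU effectively required"
    return "beyond the tiers listed on this page"
```

Usage: `sizing_tier(count_assets("/path/to/photos"))`, where the path is whatever directory currently holds the source library.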

Frigate state in 2026 (camera analytics)

  • **Coral USB is no longer recommended** for new builds. Frigate docs explicitly say so. Active recommendations are Hailo-8/8L, Intel OpenVINO (iGPU/Arc/NPU), or NVIDIA TensorRT.
  • Inference speed comparison (YOLOv6n / YOLOv9 baselines): Hailo-8 (M.2, 26 TOPS) ~7ms; Hailo-8L (13 TOPS) ~11ms ($70); Intel iGPU (N100, OpenVINO) ~15-20ms (also handles enrichments); Intel Arc A380 ~4-8ms (needs ReBAR); Intel Core Ultra 125H iGPU ~10-15ms (best all-rounder); Coral USB ~5-10ms (aging, MobileDet only, 17 COCO classes); RTX 3060 12GB 8-28ms YOLOv9 (does Immich + Frigate + Plex on one card).
  • **Critical trade-off**: Hailo only runs object detection. Intel iGPU (12th-gen+) and dedicated GPUs run object detection + face recognition + license-plate recognition + other Frigate 'enrichments' on the same silicon. This is why Intel mini-PCs became the 2026 default over Hailo add-ons.
  • Real-world camera counts: Hailo-8 on Pi 5 PCIe Gen3 = 4 streams @ ~17 FPS, up to 8 supported; Hailo-8 + 3 cams (2x 1080p + 1x 4K) at <5% CPU, <10% Hailo util; N100 iGPU = 6-10 cameras with main+substream pattern; Intel Core Ultra 125H = significant number of 1080p high-activity cameras.
  • Frigate+ paid model ($50/yr): 12 custom fine-tunings/year, additional $5 each. Trained on global real-user camera footage. Higher accuracy + lower resource use vs default. Custom training needs ~100 verified images per camera, takes ~36 hours. Models you train are yours forever.
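
Inference time translates into detector capacity with simple arithmetic: a detector that takes 10ms per inference can run about 100 detections per second. The sketch below turns the latencies quoted above into a rough worst-case camera budget. The `detect_fps` and `regions_per_frame` defaults are assumptions for illustration; real cameras are idle most of the time, so practical counts are usually higher than this busy-everywhere estimate.

```python
def detections_per_second(inference_ms: float) -> float:
    """Detections a single detector can run per second at a given latency."""
    if inference_ms <= 0:
        raise ValueError("inference_ms must be positive")
    return 1000.0 / inference_ms


def worst_case_cameras(
    inference_ms: float,
    detect_fps: float = 5.0,         # assumption: detection frame rate per camera
    regions_per_frame: float = 2.0,  # assumption: motion regions sent per frame
) -> int:
    """Cameras the detector can serve if every camera sees motion at once."""
    per_camera = detect_fps * regions_per_frame
    return int(detections_per_second(inference_ms) // per_camera)
```

With these assumptions, ~7ms (Hailo-8) gives a worst-case budget of 14 cameras and ~20ms (N100 iGPU) gives 5.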

Hailo-8 module (the 2024-2025 mainstream accelerator)

  • Form factors: M.2 (Key B+M, 2280), USB via adapter, Pi 5 HAT integrated. Power: ~2.5W typical, 8.25W max at full util.
  • Capacity: 26 TOPS (Hailo-8) / 13 TOPS (Hailo-8L). Officially supported as a Frigate detector since v0.14.
  • Roadmap: active manufacturer development (vs Coral's stalled status). Hailo-10 announced.
  • Caveat: one r/Frigate-style report flagged worse night-time detection than Coral's MobileDet on early models. The newer YOLOv6n_0.2.1 model reportedly improved on this and cut inference time from ~13ms to ~9-10ms.

Local LLM on NAS (Ollama)

  • **Mac mini M-series ergonomics** (the recommended 'appliance' path): M2 16GB → Llama 3.1 8B Q4 at 15-25 tok/s, ~30W load. M3 24GB → Codestral 22B Q4 at 20-30 tok/s, ~35W load. M3 Max 48GB → Llama 3.1 70B Q4 at 8-15 tok/s, ~50W. M4 64GB → near-cloud speed.
  • **Memory bandwidth, not chip gen, drives LLM speed.** M3 Max (400 GB/s) > M4 Pro (273 GB/s) for LLM despite being older silicon.
  • **N100 reality**: Llama 3 8B Q4_K_M = 6-9 tok/s on CPU. OK for single-user chat / summarization. 13B+ crawls or OOMs.
  • **Plex-GPU doubles as LLM box**: if you already have an RTX 3060 12GB / RTX 4060 Ti 16GB for Plex transcoding + Immich, it also runs 7B-13B Ollama models, at 30-60 tok/s for an 8B Q4 model. Cheapest path when the GPU exists.
  • **Network exposure**: `OLLAMA_HOST=0.0.0.0` exposes the API on LAN. **No auth built in.** Never expose to public internet without a reverse proxy + auth. See /guides/home-reverse-proxy-nginx-caddy-traefik.
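
The memory-bandwidth point can be made concrete with a common rule of thumb: generating each token reads roughly the whole set of weights once, so decode speed is capped near bandwidth divided by model size. A rough sketch under that assumption; the 4.5 bits-per-weight figure for Q4 quantization is also an assumption, and KV cache and runtime overhead are ignored.

```python
def model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-memory weight size in GB (assumes ~4.5 bits/weight for Q4)."""
    return params_billion * bits_per_weight / 8.0


def decode_ceiling_tok_s(
    bandwidth_gb_s: float,
    params_billion: float,
    bits_per_weight: float = 4.5,
) -> float:
    """Rough upper bound on tokens/s when decoding is memory-bandwidth bound."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)
```

For a 70B Q4 model this gives a ceiling of about 10 tok/s at 400 GB/s (M3 Max) and about 7 tok/s at 273 GB/s (M4 Pro), which is the ordering described above.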

Power budget adders (idle deltas on top of base NAS ~30-40W)

  • Coral USB: ~0.3W idle / ~2W load. Lowest-power option.
  • Hailo-8 M.2: ~1W idle / ~2.5W typical / ~8W peak.
  • Intel iGPU (in CPU): 0W idle adder (included in CPU TDP).
  • Intel Arc A380: ~8-12W idle / 30-45W load.
  • RTX 3060 12GB: 8-20W idle (driver-dependent, spikes to 35-40W in P0); 120-170W load.
  • Mac mini M-series: ~6W idle / ~30W load.
  • **Power math**: at $0.18/kWh (US avg), an RTX 3060 idling at 15W costs ~$24/year just to stay installed. A Hailo-8 M.2 idling at 1W costs ~$1.60/year. Over several years, accelerator idle power can dominate the 'cost to run AI'.
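
The power math is watts times hours per year times the electricity rate. A small sketch for plugging in your own tariff and idle figures; the function name is illustrative and the $0.18/kWh default is the rate quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_cost_usd(watts: float, usd_per_kwh: float = 0.18) -> float:
    """Yearly cost of a constant electrical load at a flat tariff."""
    return watts / 1000.0 * HOURS_PER_YEAR * usd_per_kwh
```

At the default rate, 15W works out to about $23.65 per year and 1W to about $1.58.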

Per-platform install matrix

  • **Synology DSM 7 (Container Manager)**: Immich works via Container Manager + docker-compose (community contribution, not officially supported). GPU passthrough is **effectively unavailable** on most Synology models — DSM's hardware-accel APIs are reserved for Synology's own Photos / Video Station / Plex. Recommendation: Synology = software-only Immich ML; offload to a remote ML container on an N100/Mac if you need acceleration. Frigate on Synology: technically runs but no Coral M.2 slot, no GPU passthrough — unsuitable.
  • **Unraid** (most flexible): Immich Community App via `ghcr.io/imagegenius/immich` template; add `NVIDIA_VISIBLE_DEVICES` + `NVIDIA_DRIVER_CAPABILITIES` env vars for CUDA. Frigate Community App by blakeblackshear; `--rm --runtime=nvidia` for NVIDIA; Coral-Driver plugin available since 6.9.0rc2 for USB Coral passthrough. Hailo-8 M.2 passthrough works on motherboards with a spare M.2 slot.
  • **TrueNAS Scale (Apps catalog)**: Immich (official app, updated 2026-05-12), Frigate (official, updated 2026-05-12), Ollama (official, updated 2026-05-15), Open WebUI in catalog. GPU passthrough is manual (Apps > GPU Configuration); NVIDIA easiest.
  • **Mini-PC alternative (Intel N100 / N150)**: ~30-50W idle for the full stack (Immich + Frigate + Ollama on one box). QuickSync handles 3-4 1080p Plex transcodes almost free. 16 GB RAM minimum. Lone NVMe slot can hold a Hailo-8 M.2 for Frigate.
  • **Mac mini M2/M3/M4** (Ollama-focused appliance): Apple Neural Engine accelerates Immich ML (Metal/ANE support added). Lives alongside NAS, not on it. Pattern: Mac mini as remote ML endpoint pointed at by Immich's remote ML config, Ollama API consumed by Open WebUI on the NAS.

Decision matrix by operator profile

  • Synology user, photos only → Immich via Container Manager, software ML, accept slow first scan (days for 100k+ photos).
  • Synology user, wants Frigate too → Add Intel N100 mini-PC alongside NAS; keep media on Synology.
  • Unraid user with existing NVIDIA GPU → Run everything on Unraid: Immich + Frigate + Ollama share one GPU.
  • TrueNAS Scale user → Apps catalog, add Intel Arc A380 or used RTX 3060 for ML.
  • Photos library <50k, budget → N100 mini-PC, OpenVINO for both Immich + Frigate, no add-in card.
  • Existing Plex/Jellyfin RTX 3060 → everything on one box, GPU triple-duty.
  • Wants silent + low power → Mac mini M-series as ML appliance, NAS untouched.
  • Frigate-first, 8+ cameras → Intel Core Ultra 125H mini-PC with OpenVINO, no extra accelerator needed.

Operator snapshot: evidence first

First proof: Photo library size known.

Screen to open: `docker exec immich-machine-learning nvidia-smi` (NVIDIA) or `vainfo` (Intel)

Expected signal: Count of files in the source library.

Stop boundary: Stop if Ollama runs but no real use case emerges in 2 weeks; uninstall + reclaim resources.

Layer path

1. Three distinct AI workloads at home: Immich (photo library: CPU + ML model inference for face/object/CLIP search), Frigate (camera analytics: continuous object detection across multiple streams), Ollama (LLM inference: memory-bandwidth-bound, latency-sensitive).
2. Accelerator choice depends on the dominant workload: Hailo-8/8L for Frigate-only (cheapest 8+ camera path); Intel iGPU for Immich + Frigate combined (Intel Core Ultra 125H is the all-rounder); RTX 3060+ for all three on one card (when GPU already exists for Plex transcoding); Mac mini M-series for Ollama-first (memory bandwidth dominates LLM speed).
3. Storage layout matters: Postgres + thumbnails on NVMe, bulk media on HDD pool, ML model cache on SSD. Mixing these slows first-scan from hours to days.
4. Safety boundary: Ollama API has no auth by default. Never expose to public internet without a reverse proxy + auth.

Runbook

Step-by-step runbook

Start here. Do each check in order, compare it to the expected result, and stop when the evidence explains the failure or the safe stop point applies.

1

Decide which AI workload is primary

Check: List which of Immich / Frigate / Ollama you actually need; rank by importance.

Expected result: Clear primary workload; clear secondary; clear 'nice to have'.

If not: Without a clear primary, you'll over-buy an accelerator for a use case you don't actually have.

2

Inventory existing hardware

Check: Note: existing GPU? Intel iGPU gen? Coral / Hailo? Mac mini M-series? Plex Pass active?

Expected result: Existing capability documented; gap identified.

If not: Use what you have before buying. A Plex-server RTX 3060 doubles as Immich + Frigate + Ollama silicon.

3

Plan storage layout

Check: NVMe pool for PGData + MLData; HDD pool for Userdata + media. SSD scratch for transcode if shared with Plex.

Expected result: TrueNAS Datasets / Synology Volumes laid out; Postgres on fast media; bulk on cheap media.

If not: Stop if your only pool is HDD-only and you can't add NVMe — first scan will take days.

4

Install Immich with appropriate ML backend

Check: Unraid: ImageGenius template, plus the CUDA env vars for NVIDIA; on Intel iGPU, skip the CUDA vars and rely on OpenVINO acceleration in the container instead. TrueNAS Scale: official Immich app. Synology: Container Manager + docker-compose (software-only ML expected).

Expected result: Immich UI accessible; mobile app backup configured; first-scan started.

If not: Watch first-scan progress — if days remain after 24 hours, an accelerator is justified.
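
Whether "days remain" can be estimated from two numbers read off the Immich job queue: assets processed so far and assets in total. A minimal sketch, assuming throughput stays roughly constant; the function name and inputs are illustrative and are not an Immich API.

```python
def remaining_hours(total_assets: int, processed: int, elapsed_hours: float) -> float:
    """Estimate hours left in the first scan, assuming a constant processing rate."""
    if processed <= 0 or elapsed_hours <= 0:
        return float("inf")  # no progress yet, so no estimate is possible
    rate_per_hour = processed / elapsed_hours
    return max(total_assets - processed, 0) / rate_per_hour
```

For example, 10,000 of 100,000 assets done after 24 hours leaves roughly 216 hours (9 days), which is the situation where an accelerator is justified.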

5

Install Frigate with hardware detector

Check: Configure config.yml with a detector of type hailo8l (Pi 5 + Hailo AI kit), openvino (Intel iGPU, 12th-gen+), or tensorrt (NVIDIA GPU).

Expected result: Frigate stats show inference 10-20ms; recordings + alerts working.

If not: If inference > 100ms, the detector config is wrong — verify accelerator is visible to container.

6

Add Ollama only if LLM use case is real

Check: Test small model on CPU first (Phi-3 mini, Llama 3.2 3B) before buying hardware.

Expected result: Ollama responds at acceptable tok/s; Open WebUI configured if needed.

If not: Don't install Ollama 'because everyone is' — if you don't actually use the LLM weekly, it's idle power.

Safe stop: Stop if Ollama runs but no real use case emerges in 2 weeks — uninstall + reclaim resources.

Decision tree

If: Photos only, library under 50k, budget conscious.

Then: N100 mini-PC + Immich with OpenVINO iGPU acceleration.

Action: $200-300 total. ~15-20W idle. First scan finishes in hours.

If: Frigate-first, 4 cameras, want cheapest path.

Then: Raspberry Pi 5 + Hailo AI kit (Hailo-8L M.2).

Action: $200 total. 6-8W idle. Handles 4 streams at 1080p main+substream.

If: Frigate + Immich + Plex on one box.

Then: Intel Core Ultra 125H mini-PC with OpenVINO.

Action: Best all-rounder: object detection + face recognition + license plate + Plex transcode on same silicon.

If: Already have RTX 3060 12GB for Plex.

Then: Run Immich + Frigate + Ollama 7B on the same GPU.

Action: All three share one card. Idle power sits at 15-25W when the card is stuck in P0; fix with nvidia-persistenced.

If: Wants silent + low power + LLM-first.

Then: Mac mini M-series as ML appliance alongside the NAS.

Action: Mac mini runs Immich remote ML + Ollama. NAS stays untouched. 4-7W idle.

If: On Synology and want HW-accelerated ML.

Then: Effectively unavailable on most models. Offload to an external box.

Action: N100 mini-PC running Immich remote ML container, configured in the NAS's Immich instance.

Evidence

Evidence table

Symptom: Immich first-scan stuck at <10% after 24 hours.
Evidence to collect: Immich dashboard ML jobs queue showing thousands pending.
Likely layer: Software-only ML on weak CPU.
Next action: Add an accelerator (N100 iGPU + OpenVINO, or NVIDIA via remote ML), or accept the days-long first scan.

Symptom: Frigate logs show inference 200+ ms.
Evidence to collect: Frigate UI > stats > inference time.
Likely layer: Software-only detection, accelerator not wired up.
Next action: Configure Hailo / OpenVINO / TensorRT detector per Frigate docs; verify accelerator visible in container.

Symptom: Ollama crawls at <3 tok/s on a small model.
Evidence to collect: ollama logs showing CPU-only inference.
Likely layer: GPU not visible to Ollama, OR memory-bandwidth bound (N100 doing it the slow way).
Next action: Pass GPU device to Ollama container, OR move LLM workload to a Mac mini.

Symptom: Plex spins up CPU during transcode while Immich+Frigate run.
Evidence to collect: intel_gpu_top shows iGPU at 100%; CPU spiking.
Likely layer: Single iGPU is the bottleneck for combined workload.
Next action: Split workloads: Plex on iGPU; Immich + Frigate on a separate accelerator (Hailo or A380).

Reference

Commands and settings paths

Verify Immich ML container can see GPU

`docker exec immich-machine-learning nvidia-smi` (NVIDIA) or `vainfo` (Intel)

Where: On the host running Immich.

Expected: GPU details visible; driver loaded; supported codecs listed.

Failure means: If nvidia-smi errors, the container doesn't have the NVIDIA runtime; add `runtime: nvidia` (the compose equivalent of `--runtime=nvidia`) and `NVIDIA_VISIBLE_DEVICES=all` to the service in compose.

Safe next step: Restart container after fixing.

Verify Frigate detector configuration

Frigate config.yml > `detectors:` section

Where: On the host running Frigate.

Expected: Detector type (hailo8l / openvino / tensorrt) and device path correct.

Failure means: If detector unspecified, defaults to CPU which is slow (~200ms inference).

Safe next step: Set detector type + device, restart Frigate.

Check Ollama is using GPU

`ollama ps` (during a query)

Where: From the host running Ollama.

Expected: Output shows model loaded into GPU memory.

Failure means: If `ollama ps` shows only CPU, GPU isn't visible to Ollama.

Safe next step: Verify GPU pass-through; restart Ollama service.
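
The `ollama ps` check can be scripted by reading the PROCESSOR column. A small sketch, assuming the output is a table whose columns are separated by two or more spaces and whose header includes NAME and PROCESSOR; the sample text (including the IDs) is made up for illustration, so adjust the parsing if your Ollama version prints a different layout.

```python
import re

# Illustrative sample only; model IDs are invented and the layout is an assumption.
SAMPLE_PS_OUTPUT = """\
NAME         ID              SIZE      PROCESSOR    UNTIL
llama3:8b    365c0bd3c000    6.7 GB    100% GPU     4 minutes from now
phi3:mini    4f2222927938    3.5 GB    100% CPU     2 minutes from now
"""


def processor_by_model(ps_output: str) -> dict[str, str]:
    """Return {model name: PROCESSOR value} parsed from `ollama ps`-style text.

    Raises ValueError if the header has no NAME or PROCESSOR column.
    """
    lines = [ln for ln in ps_output.strip().splitlines() if ln.strip()]
    if not lines:
        return {}
    header = re.split(r"\s{2,}", lines[0].strip())
    name_idx = header.index("NAME")
    proc_idx = header.index("PROCESSOR")
    result = {}
    for line in lines[1:]:
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) > max(name_idx, proc_idx):
            result[cols[name_idx]] = cols[proc_idx]
    return result


def classify(processor: str) -> str:
    """Label a PROCESSOR value as 'gpu', 'cpu', or 'split' (partly offloaded)."""
    text = processor.upper()
    if "CPU/GPU" in text:
        return "split"
    if "GPU" in text:
        return "gpu"
    return "cpu"
```

In practice the text would come from running `ollama ps` while a query is in flight; a model classified as `cpu` or `split` points at GPU pass-through or insufficient VRAM.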

Verify storage layout for Immich

TrueNAS Scale: confirm Userdata, PGData, MLData datasets on the right pools

Where: TrueNAS web UI (Datasets) or `zfs list` from shell.

Expected: PGData + MLData on NVMe pool; Userdata can be on HDD pool.

Failure means: PGData on HDD = slow Immich UI; MLData on HDD = slow first scan.

Safe next step: Move datasets to correct pools via Storage > Datasets.

Hardware boundary

Hardware and platform boundary

Change only when

  • Upgrade accelerator when first-scan times exceed a weekend OR Frigate cameras grow past 8 streams OR Ollama use case stabilizes and you want faster tok/s.

Evidence that matters

  • Memory bandwidth (Mac mini M-series), AV1 encode (Arc A380 / RTX 4060+), simultaneous Frigate detector + Plex iGPU (Intel Core Ultra 125H), all-three-workloads on one card (RTX 3060 12GB+).

Evidence that does not matter

  • FP32 raw TOPS don't matter for these workloads — media engine + memory bandwidth do.

Avoid

  • Don't buy Coral USB in 2026 (Frigate docs explicitly recommend against it). Don't expose the Ollama API publicly. Don't run Immich ML CPU-only on a library above ~30k photos: the first scan will outlast your patience.

Last reviewed

2026-05-18 · Reviewed by HomeTechOps. Reviewed against the Frigate official hardware recommendations docs, Immich's ML hardware acceleration documentation, the FUTO-Immich GitHub discussions on large photo libraries, the TrueNAS Community Apps + Unraid Community Apps templates as of May 2026, and Hailo + Frigate + Home Assistant integration documentation.

Source-backed checks

HomeTechOps turns official docs and conservative safety rules into a shorter runbook. These links are the source trail for the page direction.