Safe troubleshooting rules
HomeTechOps is built around one habit: check first, change one thing, avoid unsafe steps, and stop when risk increases.
Who this is for
Home operators in 2026 about to make a change to a router/mesh, NAS pool, UPS battery, dock/cable, BitLocker-protected device, smart-home Thread border router, work-managed laptop, or cloud-backup config — anyone who has felt the post-CrowdStrike urge to 'just disable the EDR and try one thing' and needs the meta-rule that prevents the second-action disaster from following the first.
Outcome
A repeatable meta-framework: capture state with platform-native debug-bundle tools (Synology Support Center → Get Logs, TrueNAS Save Debug, Windows `ipconfig /all` + Event Viewer, BitLocker recovery key URL pre-flight), pick ONE reversible change at the smallest safe layer, compare to the runbook's expected state, respect the managed-device / physical-safety / data-sovereignty / exposure boundaries, and document the final state in the home tech inventory. Built around the 2026 reality that the original problem is usually not what causes loss — it's the rushed second action (factory-resetting the gateway Eero, replacing the wrong disk during a ZFS resilver, disabling EDR mid-meeting and triggering Entra CAE token revocation, force-pushing to 'unleak' a token that's already burned).
Required inputs
- Current symptom + affected devices + last known working time + last change (Windows Update, NAS firmware, router auto-firmware push, app update). The 'I didn't change anything' claim is almost never literally true in 2026: between Patch Tuesday, DSM/TrueNAS minor releases, and smart-home hub OTAs, something changes most weeks.
- Device classification: personal vs work-managed (Intune / Workspace ONE / Jamf / Mosyle / Entra-joined). Determines whether bypassing EDR/VPN/firewall is policy-safe (rarely yes) or a compliance violation that will lock the user out further via Microsoft Entra Continuous Access Evaluation re-auth.
- Backup state for any storage-touching change: when was the last restore drill, and is there an immutable copy (Object Lock on B2/Wasabi/S3, or an air-gapped USB drive)? If either answer is not 'recent' / 'yes', the change is not safe yet.
- BitLocker recovery key visibility: for ANY firmware/Windows-update/dock change on a BitLocker-protected device, confirm the key is visible at account.microsoft.com/devices/recoverykey (personal) or `aka.ms/aadrecoverykey` (work/school) FIRST. Microsoft Support cannot recreate a lost key.
- Physical safety checks for the change category: ventilation for hydrogen during lead-acid UPS battery swaps (1 CFM/sq-ft minimum; hydrogen's LEL is 4%), watches/rings/conductive jewelry removed before any battery work, and a USB-IF certified cable for any Thunderbolt host (a defective cable, for example one with a failed EMARK chip, can put 20V onto VCONN and silently kill the controller).
Step-by-step procedure
Capture current state with platform-native debug tools BEFORE any change
Do: The post-CrowdStrike rule: evidence first, change second. Use the right tool per platform:
- Windows 11: `ipconfig /all > network-state.txt`, `Get-ComputerInfo > sys.txt`, Reliability Monitor, Event Viewer → System + Application.
- macOS Tahoe: `system_profiler` output, Console.app filtered to the affected subsystem, Activity Monitor export.
- Synology DSM 7.3: Main Menu → Support Center → Support Services → Log Generation → select components → Generate Logs → wait → download `debug.dat`.
- TrueNAS Scale 25.10 Goldeye: System → Advanced Settings → Save Debug (captures system dataset / SMB metadata / encrypted-pool keys); ALSO export the config (System → General Settings → Manage Configuration → Download File). These are TWO files, not one.
- Router/mesh: most apps support some form of config backup (Asus AiMesh exports a Settings file, so use it; TP-Link Deco has partial config backup; Netgear Orbi exports; Eero does NOT, so capture it manually via screenshots).
CRITICAL Eero rule: a 15-second hard reset on the gateway Eero erases the entire mesh; never do it without a screenshot inventory of every reservation, port-forward, and SSID setting.
Expected result: Diagnostic files captured + stored OUTSIDE the affected system. State-before snapshot exists for comparison after the change.
If not: If the platform-native debug tool fails or you can't find it, that itself is a finding — record the path you tried, stop, and find the correct one before the change. Don't skip the evidence step.
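The capture step can be scripted so each command's output lands in one timestamped folder with a hash manifest you can compare against after the change. A minimal Python sketch, assuming you supply the read-only commands yourself and that `dest_root` points at storage outside the affected system; `capture_state`, the `state-before-<timestamp>` folder name, and the `manifest.json` layout are hypothetical, not part of any vendor tool.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def capture_state(commands: dict, dest_root: Path) -> Path:
    """Run read-only commands, save each output, and write a SHA-256 manifest.

    `commands` maps an output file name to an argv list. This only records
    state; it never changes it.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out_dir = dest_root / f"state-before-{stamp}"
    out_dir.mkdir(parents=True)
    manifest = {}
    for name, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
            body = (f"exit={result.returncode}\n{result.stdout}"
                    f"\n--- stderr ---\n{result.stderr}")
        except FileNotFoundError:
            # A missing tool is itself a finding: record it instead of skipping.
            body = "exit=not-found\n"
        except subprocess.TimeoutExpired:
            body = "exit=timeout\n"
        path = out_dir / name
        path.write_text(f"$ {' '.join(argv)}\n{body}", encoding="utf-8")
        manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_dir
```

On Windows 11, for example, `capture_state({"network-state.txt": ["ipconfig", "/all"]}, Path("E:/evidence"))` writes to an external drive (the `E:` path is an assumption). `Get-ComputerInfo` is a PowerShell cmdlet, so it needs an argv such as `["powershell", "-NoProfile", "-Command", "Get-ComputerInfo"]`.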
Apply the reversibility hierarchy — pick the smallest reversible change first
Do: Rank each candidate change by tier.
- Tier 1 (always reversible): cable reseat, single-device power cycle, DHCP renew, disable/enable Wi-Fi adapter on one client, browser/app setting revert.
- Tier 2 (reversible with effort): firmware update (if a config backup exists), Synology DSM 7.3 minor update, TrueNAS Scale minor update, router config restore from backup, Plex library replacement.
- Tier 3 (effectively irreversible without recovery effort): disk format, RAID/array 'Initialize' action, BitLocker enable without a recorded recovery key, encrypted snapshot deletion, signing out of iCloud during photo sync (Optimize Storage stubs become unrecoverable if iCloud held the only full-res copy), Microsoft account 2FA disable without recovery codes, factory-resetting the only working router without a config backup, Eero gateway hard reset (no cloud config restore; total wipe).
- Tier 4 (account-level irreversible): published social/forum post, sent email, GitHub force-pushed and pruned commit (force-push does NOT unleak a credential; assume any token that touched a public commit is burned, and rotate it).
Rule: never apply a Tier 3+ change while a Tier 1 or Tier 2 option exists for the same symptom.
Expected result: Chosen change is the smallest layer that could plausibly fix the symptom AND can be reversed within ~10 minutes if it doesn't help.
If not: If the only path is Tier 3+, stop. Capture a second debug bundle, drill a restore from the current backup, and reconfirm the change is the right one before proceeding.
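The tier rule is mechanical enough to write down, which keeps the decision honest when you are tempted to skip ahead. A Python sketch; the `Tier` names and the `next_change` helper are illustrative, and assigning a tier to each candidate is still your own judgment based on the tier list above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    ALWAYS_REVERSIBLE = 1         # cable reseat, single power cycle, DHCP renew
    REVERSIBLE_WITH_EFFORT = 2    # firmware update with config backup, config restore
    EFFECTIVELY_IRREVERSIBLE = 3  # format, 'Initialize', Eero gateway hard reset
    ACCOUNT_IRREVERSIBLE = 4      # sent email, public post, pushed secret


@dataclass(frozen=True)
class Candidate:
    description: str
    tier: Tier


def next_change(candidates: list) -> Candidate:
    """Return the lowest-tier candidate; refuse to auto-pick Tier 3+.

    Ties keep the order you listed them in (min() returns the first minimum).
    If only Tier 3+ options remain, raise so the operator stops, re-captures
    evidence, and drills a restore before going further.
    """
    if not candidates:
        raise ValueError("no candidate changes recorded")
    best = min(candidates, key=lambda c: c.tier)
    if best.tier >= Tier.EFFECTIVELY_IRREVERSIBLE:
        raise RuntimeError(f"only Tier {int(best.tier)}+ options left: stop and re-plan")
    return best
```

Raising instead of returning a Tier 3 option forces the "stop" path to be a deliberate human decision rather than the default.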
Identify physical components by serial + bay LED + position BEFORE any destructive action
Do: The ZFS-resilver-on-wrong-disk failure mode is the canonical 2026 storage disaster. Identify the failing drive THREE ways and match all three before pulling:
- Serial number: from Storage Manager (DSM) or System → Storage → Pool → Disks (TrueNAS), cross-referenced with the physical label on the drive.
- Bay LED: DSM has a 'Locate Drive' / 'Blink LED' action; TrueNAS Scale has 'Locate' in the Storage UI; on Unraid, running a S.M.A.R.T. test spins the drive up, which helps locate it.
- Physical position: the drive bay number on the enclosure.
BTRFS-RAID1 caveat: if SMART is clean but checksum errors are accumulating, suspect RAM (especially non-ECC) BEFORE replacing the disk. A documented Synology forum case traced 2 checksum + 2 uncorrectable errors to a single bad RAM stick. Backblaze's 2025 stats (1.36% AFR across the 344,196-drive analysis set; CMR-only) show that any given drive is unlikely to fail in a given year, so checksum errors on a disk with clean SMART should send you to non-ECC RAM first. Run a memtest86 pass before swapping the drive.
Expected result: Failing drive is identified by serial + LED + bay number, all matching, before the array is touched. Suspect RAM is ruled out via memtest if checksum errors don't match SMART.
If not: If you can't get all three identifiers to match, do NOT pull a drive. Stop, reboot, re-read pool status from a known-good admin path, and verify.
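The triple-match and the RAM-first caveat reduce to a few comparisons. A Python sketch with hypothetical field names; it assumes you have already read the serial from the admin UI and from the drive label, noted which bay blinked after 'Locate', and checked SMART and the pool's checksum counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveEvidence:
    serial_in_ui: str      # from Storage Manager / TrueNAS disk list
    serial_on_label: str   # read off the physical drive label
    blinking_bay: int      # bay whose LED blinked after 'Locate'
    ui_bay: int            # bay number the admin UI reports for that serial
    smart_clean: bool      # SMART shows no failing attributes
    checksum_errors: int   # BTRFS/ZFS checksum errors attributed to this disk


def safe_to_pull(ev: DriveEvidence) -> tuple:
    """Apply the serial + LED + bay triple-match, then the RAM-first caveat."""
    if ev.serial_in_ui.strip().upper() != ev.serial_on_label.strip().upper():
        return False, "serial mismatch: do not pull"
    if ev.blinking_bay != ev.ui_bay:
        return False, "LED bay does not match UI bay: do not pull"
    if ev.smart_clean and ev.checksum_errors > 0:
        return False, "SMART clean but checksum errors: run memtest86 before swapping"
    return True, "all three identifiers match"
```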
Respect the managed-device boundary — never bypass EDR / VPN / Conditional Access
Do: Work-managed devices (Intune / Workspace ONE / Jamf / Mosyle / Entra-joined) have policies that revert operator changes within minutes. EDR (CrowdStrike Falcon, SentinelOne, Defender for Endpoint, Sophos) will flag bypass attempts. Since July 2024 the industry has treated EDR content updates as canary-ringed, customer-controlled deployments, and IT teams in 2026 are slower to revert EDR changes than they were in 2023: the 2024 CrowdStrike incident (8.5M Windows devices BSOD'd by the Channel File 291 mismatch) is the institutional memory that makes EDR teams cautious.
Microsoft Entra Continuous Access Evaluation (CAE) revokes session tokens in near-real time on Exchange/SharePoint/Teams when device-compliance state changes, so disabling EDR mid-meeting signs the user out within seconds, not at next login. Since the June 15, 2026 enforcement change, CA policies targeting 'All resources' enforce sign-ins even when resource exclusions exist.
What to do instead: capture exact errors, screenshots, and Event Viewer entries, and escalate to IT. Pivot the work to phone audio or an alternate device. The Conditional Access gate is at the identity layer, not the network layer, so bypassing the network doesn't help.
Expected result: Managed-device boundary respected. Errors documented. Meeting/work continuity preserved through pivot, not bypass. IT engaged with evidence.
If not: If you find yourself typing the local admin password to disable Defender or Falcon, stop. That triggers a separate compliance alert. Reboot to the safe state and contact IT.
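Before anyone considers a bypass, it helps to confirm what kind of device this is. On Windows, `dsregcmd /status` prints join state as `Name : VALUE` lines (for example `AzureAdJoined : YES`). A Python sketch that classifies a device from that text; it treats Entra-registered (`WorkplaceJoined`) as managed to stay conservative, does not detect Intune enrollment or macOS MDM (Jamf, Mosyle) on its own, and the function names are hypothetical.

```python
import re

# dsregcmd fields that indicate organizational join/registration.
MANAGED_FIELDS = ("AzureAdJoined", "EnterpriseJoined", "DomainJoined", "WorkplaceJoined")


def parse_dsregcmd(text: str) -> dict:
    """Turn '   Name : VALUE' lines into a dict; banner and blank lines are ignored."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^\s*([A-Za-z]+)\s*:\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def is_work_managed(text: str) -> bool:
    """True if any join/registration field reads YES: treat as managed, don't bypass."""
    fields = parse_dsregcmd(text)
    return any(fields.get(name, "").upper() == "YES" for name in MANAGED_FIELDS)
```

Feed it the saved output of `dsregcmd /status` from the evidence step; a `True` result means the managed-device boundary applies.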
Apply physical safety boundaries for power, battery, and cable work
Do:
- Lead-acid UPS battery swap: remove all watches/rings/conductive jewelry (a wrench bridging the terminals of a 12V VRLA battery can dump 100+ amps instantly); ventilate the room (hydrogen's LEL is 4%, and the hydrogen charging batteries off-gas collects at ceiling level; 1 CFM/sq-ft minimum); insulate one terminal at a time with electrical tape before disconnecting; never wipe a battery with a solvent-dampened cloth.
- LiFePO4 portable power station used as a UPS (EcoFlow DELTA 2, Bluetti AC180): far lower thermal-runaway risk than NMC, but NOT zero. Don't puncture it, don't charge it outside its rated temperature range (0-45°C for most), and don't stack it under blankets.
- USB-C cable swap: swap ONE suspect cable at a time, never two. A defective cable (for example a failed EMARK chip or CC-pin misalignment) can put 20V onto VCONN instead of VBUS and silently destroy Thunderbolt host controllers; documented kills include the Dell XPS 13 (2022) and HP Spectre x360. Use USB-IF certified cables (240W rated for TB5/USB4 v2 hosts); avoid no-name 240W cables.
- Active-PFC PSU on a simulated-sine UPS: a silent failure. The PSU sees the zero-crossing notch as a brownout and drops out the instant the grid fails. Verify pure-sine output on the UPS spec sheet for any modern NAS, gaming desktop, or Synology Plus/XS+ model.
Expected result: Physical safety steps completed before electrical work. One-cable-at-a-time discipline for USB-C swaps. Pure-sine UPS verified for Active-PFC loads.
If not: If you can't ventilate, can't remove conductive jewelry, can't confirm USB-IF certification, can't verify pure-sine — stop. The cost of the wrong move (dead battery short, dead Thunderbolt controller, dead NAS at outage moment) is higher than the cost of delay.
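The numbers behind two of these rules are simple arithmetic, and the jewelry/ventilation checks work best as a hard gate. A Python sketch: the 1 CFM/sq-ft figure comes from the step above, while the ~20 mΩ battery internal resistance and ~10 mΩ for the bridging tool are illustrative assumptions, not measurements of any particular battery. The function names are hypothetical.

```python
def required_ventilation_cfm(floor_area_sqft: float, cfm_per_sqft: float = 1.0) -> float:
    """Minimum continuous airflow using the runbook's 1 CFM/sq-ft rule."""
    return floor_area_sqft * cfm_per_sqft


def short_circuit_amps(volts: float, internal_ohms: float, tool_ohms: float) -> float:
    """Ohm's law for a tool bridging both terminals: I = V / (R_internal + R_tool)."""
    return volts / (internal_ohms + tool_ohms)


def battery_work_blockers(ventilated: bool, jewelry_removed: bool,
                          terminals_taped_one_at_a_time: bool) -> list:
    """Return the unmet preconditions; an empty list means you may proceed."""
    checks = {
        "ventilation": ventilated,
        "conductive jewelry removed": jewelry_removed,
        "terminal insulation plan": terminals_taped_one_at_a_time,
    }
    return [name for name, ok in checks.items() if not ok]


# Assumed values: 10 ft x 12 ft room; 12 V VRLA, 20 mOhm internal, 10 mOhm tool.
print(required_ventilation_cfm(120))                   # 120.0
print(round(short_circuit_amps(12.0, 0.020, 0.010)))   # 400
```

Even with generous resistance assumptions, the short-circuit current lands well above the 100 A in the rule, which is why the jewelry step is not optional.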
Respect the exposure boundary — what leaves the network and what stays
Do: When posting config/logs to a forum, Discord, or vendor support ticket, redact account emails, MAC addresses, internal IPs, serial numbers, ISP account IDs, recovery questions, BitLocker recovery key IDs (even partial), Tailscale/Cloudflare/Plex auth tokens, and API keys.
GitHub secret scanning is detection, not prevention: as of 2026 it covers 28+ new detector providers (Airtable, DeepSeek, npm, Pinecone, Sentry) with validity checks and automatic partner-revocation paths, but force-pushing to remove a leaked credential does NOT unleak it. Once a token has touched a public commit (even briefly), assume it's burned and rotate it. 39M secrets leaked across the platform in 2024; the May 2026 Nightwing/CISA contractor incident is the latest reminder that a secret committed 6+ months ago with push protection disabled can be the breach.
Don't open ports without first checking for CGNAT and your IPv6 setup (see new-router-migration: a router WAN IP in 100.64.0.0/10 means no port-forward is possible without a tunnel). Don't disable the firewall or EDR to 'test'; see the managed-device boundary.
Expected result: External shares are redacted. Leaked credentials are rotated, not 'force-pushed away'. Exposure decisions are intentional, not accidental.
If not: If you find an old token in a private (or public) repo, rotate it FIRST, then clean up the history. The order matters because the token is already burned the moment it's committed.
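Regex redaction catches the mechanical identifiers before a paste goes out. A minimal Python sketch, assuming plain-text logs; the patterns cover emails, colon- or hyphen-separated MAC addresses, and RFC 1918 / CGNAT IPv4 addresses only, so serials, ISP account IDs, tokens, recovery hints, and your public WAN IP still need a manual read. The `redact` name and placeholder tokens are hypothetical.

```python
import ipaddress
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
MAC = re.compile(r"\b[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Internal ranges to hide: RFC 1918 private space plus CGNAT 100.64.0.0/10.
INTERNAL_NETS = [ipaddress.ip_network(n) for n in
                 ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10")]


def _mask_ip(match: re.Match) -> str:
    text = match.group(0)
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return text  # e.g. a version string like 999.1.2.3 that is not an IP
    return "[internal-ip]" if any(addr in net for net in INTERNAL_NETS) else text


def redact(text: str) -> str:
    """Mask emails, MACs, and internal IPv4 addresses; leave everything else."""
    text = EMAIL.sub("[email]", text)
    text = MAC.sub("[mac]", text)
    return IPV4.sub(_mask_ip, text)
```

Run it on the relevant excerpt, then read the result yourself before posting; the regexes reduce misses, they don't replace the review.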
Document final state in the home tech inventory + update the runbook handoff
Do: After the change: update the home tech inventory with the new firmware version, new IP reservation, new dock firmware date, new BitLocker recovery key ID, new cloud-backup state — same-day, not 'when I get around to it'. Update the runbook's `lastReviewed` date if the change uncovered missing context (an undocumented vendor bug, a new product version, a new failure mode). Capture: what fixed it, what didn't, what you'd do differently next time. Two-person rule for major operations: for NAS migrations, RAID expansions, BIOS/firmware flashes, BitLocker enables on devices holding the only copy of important data — have a second person available (in person or on a call) who can stop you from a mistake. Solo high-stakes work is when most disasters happen.
Expected result: Inventory updated same-day. Runbook handoff note exists. Two-person discipline applied to high-stakes operations.
If not: If you can't write down what changed within an hour of the change, you'll forget the key detail. Set a calendar reminder for the next day to do the post-mortem before context fades.
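Appending each change as one JSON line keeps a same-day, never-overwritten history that the next runbook review can read. A Python sketch; the field names, the `inventory.jsonl` file, and the 48-digit pattern guard are assumptions of this example, the last one enforcing the rule that only the BitLocker key ID, never the key itself, is recorded.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path

# BitLocker recovery passwords are 48 digits in 8 groups of 6.
RECOVERY_PASSWORD = re.compile(r"\b\d{6}(?:-\d{6}){7}\b")


@dataclass
class InventoryChange:
    device: str
    what_changed: str              # free text, e.g. old -> new firmware version
    fixed_it: bool
    notes: str = ""                # what didn't work, what you'd do differently
    bitlocker_key_id: str = ""     # key ID only, never the recovery key
    changed_on: str = field(default_factory=lambda: date.today().isoformat())

    def __post_init__(self) -> None:
        if RECOVERY_PASSWORD.search(self.bitlocker_key_id) or RECOVERY_PASSWORD.search(self.notes):
            raise ValueError("looks like a BitLocker recovery password; record the key ID only")


def record_change(inventory: Path, change: InventoryChange) -> None:
    """Append one entry as a JSON line so earlier history is never overwritten."""
    with inventory.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(change)) + "\n")
```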
Commands and settings paths
BitLocker recovery key pre-flight
Open https://account.microsoft.com/devices/recoverykey in a browser (or aka.ms/myrecoverykey for personal accounts; aka.ms/aadrecoverykey for work/school)
Where: Microsoft account web with 2FA, on a different device than the one about to be updated.
Expected: Recovery key entry visible for the target device, key ID matches what would display on the recovery screen (24H2+ shows a hint).
Failure means: If the key is missing or under a different family Microsoft account, the device is one update prompt from unrecoverable. Microsoft Support CANNOT recreate a lost key.
Safe next step: Resolve account mapping FIRST. Capture the recovery key + ID into 1Password/Bitwarden secure note. Then proceed with the update.
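To compare key IDs without copying the key itself around, you can read the protector IDs locally from an elevated prompt with `manage-bde -protectors -get C:`. A Python sketch that pulls the Numerical Password ID out of that text; the sample in the test is an abbreviated, illustrative layout with invented GUIDs, real output may differ in spacing or localization, and the 8-character prefix comparison assumes the account page shows the leading part of the key ID.

```python
import re

GUID_IN_BRACES = re.compile(r"ID:\s*\{([0-9A-Fa-f-]{36})\}")


def recovery_key_ids(manage_bde_output: str) -> list:
    """Return IDs of 'Numerical Password' protectors (the 48-digit recovery keys).

    Only IDs are extracted; password lines are never returned, so the result
    is safe to paste into the evidence log.
    """
    ids = []
    lines = manage_bde_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Numerical Password:":
            for nxt in lines[i + 1:i + 4]:  # the ID line follows the heading
                m = GUID_IN_BRACES.search(nxt)
                if m:
                    ids.append(m.group(1).upper())
                    break
    return ids


def matches_account_page(local_id: str, shown_on_account_page: str) -> bool:
    """Compare the first 8 hex digits of the local ID with the account page."""
    prefix = shown_on_account_page.strip().upper().replace("-", "")[:8]
    return len(prefix) == 8 and local_id.replace("-", "").upper().startswith(prefix)
```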
Restore-drill freshness check
Backup app → history → last restore-drill date; immutable-copy bucket → Object Lock state
Where: Backup admin UI (B2 / Wasabi / Synology Hyper Backup / TrueNAS replication).
Expected: Last restore drill completed within the last 30 days for source data being touched. Object Lock enforced on at least one cloud copy.
Failure means: If drill is stale or immutable copy isn't enforced, the change is exposed to LockBit-class ransomware AND to operator-error data loss. Don't proceed.
Safe next step: Run a fresh drill via backup-restore-check. Enable Object Lock on the cloud destination (free on B2/Wasabi/S3).
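The freshness gate is a date comparison plus a boolean. A Python sketch, assuming you read the last drill date and the Object Lock state off the backup admin UI yourself; the 30-day threshold is the one stated above, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def storage_change_blockers(last_drill: Optional[date], object_lock_on: bool,
                            today: date, max_age_days: int = 30) -> list:
    """Return reasons to stop; an empty list means the backup gate is passed."""
    blockers = []
    if last_drill is None:
        blockers.append("no restore drill on record")
    elif (today - last_drill).days > max_age_days:
        blockers.append(f"last drill {(today - last_drill).days} days ago (> {max_age_days})")
    if not object_lock_on:
        blockers.append("no Object Lock-enforced (immutable) copy")
    return blockers
```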
Drive identification triple-match
Storage Manager > Drive > Locate (LED blink) AND record drive serial AND note bay number
Where: Synology DSM / TrueNAS Scale / Unraid web UI.
Expected: Serial matches physical label; LED blinks on the bay you intend to pull from; bay number is recorded.
Failure means: If any of the three don't match, the wrong drive is about to be pulled. Pulling the wrong drive during a degraded array = pool loss.
Safe next step: Reboot to a known-good admin path. Verify pool status. Re-confirm all three identifiers before touching the array.
Synology debug bundle (DSM 7.3)
Main Menu → Support Center → Support Services → Log Generation → select components → Generate Logs
Where: DSM 7.3 admin UI.
Expected: Browser downloads `debug.dat`. Extract with 7-Zip. Contains kernel logs, package logs, system events.
Failure means: If generation fails, the diagnostic trail is missing — don't proceed with storage changes or DSM upgrades.
Safe next step: Retry from a different browser. Check Support Center service status. Capture screenshots of the failure for IT/Synology Support.
TrueNAS debug + config (Scale 25.10 Goldeye)
System → Advanced Settings → Save Debug; THEN System → General Settings → Manage Configuration → Download File
Where: TrueNAS Scale 25.10 Goldeye web UI.
Expected: Two files: debug bundle + encrypted .db config. Both stored alongside the inventory.
Failure means: An empty config download or a failed debug means you have no usable pre-change state. After a major upgrade, pool feature flags can also become incompatible with older releases, so rolling back may not be possible.
Safe next step: Restart middleware service (`systemctl restart middlewared`) if download fails. Verify file sizes are non-zero before any upgrade.
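Checking that both downloads are present and non-empty, and hashing them for the inventory, takes a few lines. A Python sketch; the function name is hypothetical, and the file names in the usage note are placeholders because the debug and config file names vary by release and hostname.

```python
import hashlib
from pathlib import Path


def verify_bundle(paths: list) -> dict:
    """Fail loudly if any expected file is missing or empty; return SHA-256 per file."""
    digests = {}
    for p in paths:
        if not p.is_file():
            raise FileNotFoundError(f"missing: {p}")
        if p.stat().st_size == 0:
            raise ValueError(f"empty file, do not upgrade: {p}")
        h = hashlib.sha256()
        with p.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
                h.update(chunk)
        digests[p.name] = h.hexdigest()
    return digests
```

For example, `verify_bundle([Path("debug-nas.tgz"), Path("nas-config.db")])` (hypothetical names) raises before you start the upgrade if either download came back empty.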
Eero pre-reset inventory capture
Eero app > Settings > Discovery / Reservations / Port Forwarding > screenshot each screen
Where: Eero app on the admin's iPhone/iPad.
Expected: Screenshots of: every SSID + password, every IP reservation, every port-forward rule, every guest network, every DHCP range setting. There is NO in-app cloud config restore — these screenshots ARE the backup.
Failure means: If any screen is incomplete, the post-reset rebuild will be missing reservations or port-forwards. Devices will lose stable IPs.
Safe next step: Continue screenshotting until every settings screen is captured. Store in 1Password secure-note attachment alongside inventory.
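Because these screenshots ARE the backup, it is worth checking the set is complete before any reset. A Python sketch that assumes a hypothetical naming convention (each file name starts with the screen it shows, such as `reservations-2.png`); the required list mirrors the screens named in the Expected line above.

```python
from pathlib import Path

# Hypothetical naming convention: one or more files per screen, prefixed by its key.
REQUIRED_SCREENS = {"ssids", "reservations", "port-forwarding", "guest-network", "dhcp"}


def missing_screens(folder: Path) -> set:
    """Return required screens that have no screenshot file with that prefix."""
    names = [p.name.lower() for p in folder.iterdir() if p.is_file()]
    return {s for s in REQUIRED_SCREENS if not any(n.startswith(s) for n in names)}
```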
Evidence to record
- Symptom + affected device + exact error text + last known working time + last change.
- Device classification (personal / work-managed) + EDR / Conditional Access state.
- Debug bundle path per platform (DSM debug.dat / TrueNAS Save Debug + config / Windows ipconfig + Event Viewer / Eero screenshot inventory).
- Restore-drill date for any storage-touching change; immutable-copy state per source.
- BitLocker recovery key visibility (account email + key ID, NOT the key itself).
- Drive serial + bay LED + bay number triple-match for any disk swap.
- Physical safety state: ventilation, conductive jewelry removed, USB-IF certified cable, pure-sine UPS verified.
- Post-change inventory update with new firmware / IP / dock / BitLocker / cloud-backup state.
Common mistakes
- Skipping the debug-bundle capture because 'it's just a small change' — the CrowdStrike 2024 incident memory exists because skipped-staging at industrial scale BSOD'd 8.5M machines. Apply the same staging discipline at home: capture state, change one thing, compare.
- Factory-resetting the gateway Eero to 'fix' a slow node — that's a Tier 3 irreversible change with no cloud restore. Try Tier 1 (reseat ethernet, single-node power cycle) first.
- Replacing the wrong drive during a ZFS / BTRFS resilver — the canonical 2026 storage disaster. Identify by serial + LED + bay before pulling. Run memtest first if BTRFS checksum errors don't match SMART.
- Disabling EDR / Defender / Falcon mid-meeting to 'test' — Microsoft Entra CAE revokes Teams/Exchange/SharePoint tokens within seconds. The June 15 2026 CA enforcement change made resource-exclusions stricter. Capture errors + pivot the meeting; don't bypass.
- Force-pushing to 'unleak' a credential: does NOT work. GitHub secret scanning runs validity checks and partner-revocation paths, and force-push only hides the commit; the secret is already in caches and partner databases. Rotate the token, THEN clean history if needed.
- Updating Windows / NAS firmware / dock firmware without first checking the BitLocker recovery key URL — KB5083769 (April 2026) triggered recovery prompts on first restart due to PCR7 / Boot Manager signature changes. KB5089549 fixed it but the principle stands: verify recovery-key visibility before firmware/dock/BIOS work.
- Bridging UPS battery terminals with a wrench / watch / ring: a 12V VRLA battery can dump 100+ amps instantly. Remove all conductive jewelry and insulate one terminal at a time with electrical tape. Ventilate (1 CFM/sq-ft minimum, hydrogen LEL 4%).
- Trusting a no-name USB-C cable on a Thunderbolt 5 host — EMARK chip failure can drive 20V onto VCONN and silently kill the TB controller. USB-IF certified cables only. Swap one suspect cable at a time.
- Simulated-sine UPS on Active-PFC PSU — silent failure mode. The PSU sees the zero-crossing notch as brownout and drops out the instant grid fails — opposite of what the UPS is meant to do. Pure-sine spec required for modern NAS/gaming desktop/Synology Plus or XS+.
- Sharing config/logs to a public forum without redacting — account emails, MAC addresses, internal IPs, ISP account IDs, recovery question answers are all in the threat model. Use a separate paste of the relevant section, not a full screenshot.
- Treating LiFePO4 portable power station as 'safe — can't possibly fail' — far lower thermal-runaway risk than NMC, but not zero. Don't puncture, don't charge above rated temp, don't stack under blankets, don't ignore the rated charge-temp range.
- Solo high-stakes operations — NAS migration, RAID expansion, BIOS flash, BitLocker enable on the only-copy device. Have a second person available who can stop you from a mistake. Most disasters are solo work.
Stop points
- Stop for any physical sign of heat / smoke / swelling / electrolyte leak / hydrogen smell / sparking / burning odor / clicking drives / corruption warning / 'Initialize disk' or 'Format' prompt — these are 'call vendor / electrician / data-recovery lab' conditions, not 'try one more thing' conditions.
- Stop before bypassing work-managed VPN / EDR / firewall / certificate / BitLocker recovery / conditional access — those are identity policy boundaries, not network knobs.
- Stop before any Tier 3 storage operation (format / Initialize / encrypted snapshot delete / pool destroy) until: (a) restore drill passed in last 30 days, (b) immutable copy exists, (c) drive identification triple-match completed.
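- Stop before opening home internet ports until you have run a CGNAT check: compare the router's WAN IP (from its admin page) with the public IP the internet sees (`curl -4 ifconfig.me`). If the WAN IP is in 100.64.0.0/10, or the two addresses differ, you are behind CGNAT or another upstream NAT and no port-forward is possible without a tunnel.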
- Stop before factory-resetting the gateway Eero / Asus / Deco router until Eero screenshot inventory + Asus AiMesh Settings export are saved to encrypted store.
- Stop before posting config, logs, or screenshots publicly until redacting account emails, MAC addresses, internal IPs, serials, ISP IDs, and recovery hints.
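The CGNAT check compares two addresses: the router's WAN IP from its admin page and the public address reported by `curl -4 ifconfig.me` (`-4` forces IPv4). A Python sketch using the standard `ipaddress` module; the function name and messages are hypothetical, and you still paste in both addresses by hand. Checking `ifconfig.me` alone is not enough, because the public address it reports is never inside 100.64.0.0/10.

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def port_forward_possible(router_wan_ip: str, public_ip: str) -> tuple:
    """Compare the router's WAN address with the address the internet sees."""
    wan = ipaddress.ip_address(router_wan_ip)
    pub = ipaddress.ip_address(public_ip)
    if wan in CGNAT:  # an IPv6 address is never "in" an IPv4 network
        return False, "WAN IP is in 100.64.0.0/10 (CGNAT): use a tunnel"
    if any(wan in net for net in RFC1918):
        return False, "WAN IP is RFC 1918 private: double NAT upstream"
    if wan != pub:
        return False, "WAN IP differs from public IP: another NAT sits upstream"
    return True, "WAN IP is the public IP"
```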
Last reviewed
2026-05-06
Source-backed checks
HomeTechOps turns official docs and conservative safety rules into a shorter runbook. These links are the source trail for the page direction.