You're the one who mentioned that managing 10 pools would be ridiculous. I thought it was set and forget.
10 pools is. One pool isn't.
I am more worried about things popping up out of the blue and kicking me in the shins. Like the scenario you mentioned, where losing a single disk in a single-disk pool would trip up FreeNAS to the point of having to break out the command line and start talking straight to FreeBSD. Irk!
Well, that won't happen in a properly-managed RAIDZ2 pool. By that I mean "regular SMART tests and scrubs are running, email alerts are set up, and drive failures get prompt action".
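In practice, "properly managed" boils down to a handful of recurring checks. A minimal sketch of what the GUI schedules for you, assuming a pool named `tank` and a disk at `/dev/ada0` (both names are placeholders for your own setup):

```shell
# Run a long SMART self-test on one disk (typically scheduled weekly per drive)
smartctl -t long /dev/ada0

# Review the test result and attributes like reallocated sector counts
smartctl -a /dev/ada0

# Scrub the pool so ZFS verifies every block against its checksum
zpool scrub tank

# Check pool health, scrub progress, and any read/write/checksum errors
zpool status tank
```

FreeNAS drives all of this from the web UI (Tasks → S.M.A.R.T. Tests / Scrub Tasks); the commands just show what's happening underneath.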
That's usually how it goes. Does everyone practice disaster scenarios before actually using FreeNAS? Practically none of the guides or videos I've seen have talked about this, so it doesn't seem to rank high on the collective hive mind's priority list.
That's standard engineering practice. We tend to practice what we preach (or at least try - sometimes stuff doesn't work out), and making sure your server correctly handles common scenarios is essential to trusting it.
As for guides and videos, much of what is out there is painfully wrong. If there's a contradiction between some outside guide and a forum sticky, trust the forum.
You're the expert here. From my perspective... Erase the pool. Remove the drive. Add a new drive. Create a new pool. Done.
You're missing crucial, non-trivial steps:
- Determine what data needs to be copied
- Determine where additional copies can be found
- Copy the data over to the new drive
This procedure is just begging to be done by an entity designed to handle massive amounts of data - namely the server itself. These steps are fully automated by ZFS, in a proper pool (RAIDZ vdevs or mirrors).
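Done by hand, those steps are a project; done by ZFS, they collapse into a single command. A sketch, assuming a RAIDZ2 pool named `tank` where the failed disk was `ada3` and its replacement shows up as `ada6` (hypothetical device names):

```shell
# Tell ZFS to rebuild the failed disk's data onto the new one;
# it figures out what needs copying and where the redundant copies live
zpool replace tank ada3 ada6

# Watch the resilver progress; the pool stays online the whole time
zpool status tank
```

The FreeNAS GUI wraps this same operation behind the disk-replacement button, so in practice you never need the shell for a routine drive swap.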
I do not plan on running any services, plugins or whatever on the NAS.
Here's a big problem: How on earth do you get data onto the server without a file sharing service? Sneakernet?
Services aren't a weird, niche thing, they're fundamental to the operation of the server.
At the very least, you need CIFS or NFS or something similar.
Well, it's mostly a case of not having to download 25+ TB of data if everything is lost, vs only having to download 4 TB for a single drive. For all intents and purposes, time is the key here. I'd rather not spend a month to recreate my system. A single drive takes a few days.
With a properly-maintained pool, you should never, ever lose data. Nothing is foolproof, obviously, but the expected downtime for a proper configuration is "the ten minutes it takes you to replace the drive and turn the server back on". Then you just click the button and ZFS handles the rest while the pool stays available.
You do not lose days, much less months.
Losing a pool is like a plane crash. People work damn hard to make sure it won't happen, and the hard work pays off by making flying the safest way to travel. The only difference is that digital data can be trivially backed up, whereas people can't just submit to quantum photocopying.
Losing a disk is like losing an engine on a plane. It happens and it's unavoidable. No big deal - there are other engines that can keep things going safely for a while. If planes crashed every time an engine failed, we'd all be screwed.
tl;dr - You're supposed to build a reliable system from unreliable parts by using redundancy, not a small mountain of unreliable systems cobbled together in an ad-hoc fashion.