Dell R630 ESXi + TrueNAS Pre-Deploy Sanity Check

SuperWhisk

Dabbler
Joined
Jan 14, 2022
Messages
19
This is my first TrueNAS setup, or indeed any kind of NAS or home server setup other than pfSense.
I have tried to be thorough and have spent a lot of time reading posts here on the forum as well as other places, and aside from a few hopefully minor points I think my plan follows the generally recommended guidelines.
I know it's a long post, so thank you in advance to those who take the time to read it and respond.

My Usecase and Goals
  • Home Use
  • "Better than my current data storage setup" - Surely even a small step in the right direction is better than no step at all?
  • Looking for bulk network storage to share across Linux, Windows, and MacOS.
  • High Availability and Performance are nice-to-haves
  • Data Integrity is Important/Critical (but refer back to point 2)
  • I want to run a number of virtual machines on the same hardware, but these do NOT need to have their virtual disks backed by the TrueNAS pool
My Current Data Setup:
  • iMac with 256GB internal SSD and three USB hard drives - 2TB and 3TB for bulk storage, 8TB for time machine backup of the other two plus the internal drive.
  • Linux and Windows machines are not backed up at all, but they don't currently contain anything that isn't replaceable. All the important data like pictures is on the iMac.
  • This really feels like a "better than nothing at all" setup, but not much better.

The New (to me) Hardware:
  • Dell PowerEdge R630 Server with 8 2.5" SAS/SATA bays connected to a Dell HBA330mini Controller (LSI 3008 chipset)
    • Originally had a PERC H730mini RAID card that includes an "HBA mode" (which I verified does pass through drive data like SMART, serial number, etc.) but I decided to bite the bullet and get the HBA330mini anyway just to be safe. Sadly they both use what appears to be a proprietary PCIe connector, so I can't use the RAID card in another machine or in a different slot on this one.
  • 8 new 1TB Seagate Constellation.2 2.5" SATA hard disks (plus two cold spares)
    • Older model enterprise drives purchased from a liquidator, but verified to the best of my ability to be completely unused. Clean SMART, and no scratches on the pins from having a SATA cable attached and removed (much harder to fake that than clean SMART).
    • I was considering retail WD Red Plus drives of the same capacity (nobody seems to make 2.5" drives over 1TB) but these came up and the price was right (1/4 of the cost of the reds) so I went for it. Happy so far following burn-in testing.
  • Dual Xeon E5-2620 v4 (8 cores each with HT - 32 Logical Cores)
  • 64GB of DDR4 ECC Memory (32GB per CPU)
  • 500GB Samsung 870 Evo SATA SSD connected to the on-board SATA controller (normally used only for the optical drive and possibly a tape backup).
  • 500GB Samsung 970 Evo Plus NVMe SSD in a PCIe slot using an M.2-to-full-size-slot adapter (just re-mapped pins; no smarts on the adapter).
  • Dual 750W hot-swap PSUs, but currently no UPS or any other power loss protection - this is something I want to remedy in the future, but not in the budget at present.
The Plan:
  • ESXi 7.0U3 on bare metal - this accomplishes the "run other VMs goal" as obviously this hardware would be gross overkill for just NAS use.
  • TrueNAS CORE 12.0-U8 in a VM with PCIe pass-through for the HBA controller.
  • TrueNAS will be exclusively focused on being a NAS. No VMs. Let each do what it does best.
  • Use the existing 8TB USB disk as a local backup of the zpool contents (not sure of the best way to do this; I know ZFS has some built-in replication features, but I need to research that more - a rough sketch of my current understanding follows this list).
  • Eventually have an off-site backup, possibly using two 3.5" 4TB WD Reds that I have here unopened. Maybe add a third for Z1 parity instead of just striping with no redundancy.
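For that local-backup item, my current understanding (still to be verified) is that I would create a single-disk pool on the 8TB USB drive and replicate into it with snapshots plus zfs send/receive. The pool, dataset, and device names below are placeholders for illustration, and TrueNAS also wraps this workflow in its Replication Tasks UI, which may be what I actually end up using instead of doing it by hand:

```
# Single-disk pool on the 8TB USB drive (device name is a guess; check with camcontrol devlist)
zpool create usbbackup da8

# First full replication of the main dataset
zfs snapshot -r tank/data@backup-2022-02-01
zfs send -R tank/data@backup-2022-02-01 | zfs receive -F usbbackup/data

# Subsequent runs only send what changed since the last common snapshot
zfs snapshot -r tank/data@backup-2022-02-08
zfs send -R -i tank/data@backup-2022-02-01 tank/data@backup-2022-02-08 | zfs receive -F usbbackup/data
```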
What I have done so far:
  • ESXi is installed on the NVMe SSD, with the SATA SSD simply there as an extra ESXi datastore (ok, ok, it also hosts an EFI Shell, startup.nsh, and Intel's EDKII NVMe EFI driver so that I can boot from the NVMe SSD, but once ESXi boots that memory gets dumped and ESXi has its own drivers - a sketch of the startup.nsh follows this list).
  • Obviously ESXi is NOT installed on redundant storage and I acknowledge and accept this single point of failure. I will make sure to back up all my configs (including TrueNAS) to my other computers regularly.
  • Using a plain FreeBSD 12 VM, I did burn-in testing on the hard disks + controller using a combination of SMART tests, badblocks, and the sol-net array script found here on the forums (the basic per-disk commands are also sketched after this list). I didn't do months-long testing as some here do, but after a solid week of testing I am reasonably confident that there are no bad apples in the bunch (the two spares got the same testing regimen). This is certainly more testing than any of my current drives ever got (which is absolutely none).
  • Using Dell's built-in diagnostics I ran "extended" memory and other hardware tests for a number of days with no errors. I have not used memtest86 or memtest86+ (since the original seems to have locked most of the tests behind a paywall)
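For anyone curious about the EFI Shell workaround mentioned above, the startup.nsh on the SATA SSD's EFI partition is essentially the following. The fs numbers and the driver path are specific to my layout, so treat them as placeholders rather than something to copy verbatim:

```
@echo -off
# Load Intel's EDK II NVMe driver so the firmware can see the 970 Evo Plus
load fs0:\EFI\drivers\NvmExpressDxe.efi
# Re-scan so the NVMe drive's EFI system partition gets a mapping
map -r
# Chain-load the ESXi boot loader from the NVMe drive (fs1 on my system)
fs1:
cd \EFI\BOOT
BOOTX64.EFI
```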
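And the per-disk burn-in boiled down to roughly this, repeated for each of the ten drives from the FreeBSD VM (device names obviously vary, and badblocks in write mode is destructive, so only run it on empty disks):

```
# Baseline SMART data plus a long self-test
smartctl -a /dev/da0
smartctl -t long /dev/da0

# Destructive four-pattern write/verify pass over the entire disk
badblocks -ws -b 4096 /dev/da0

# Re-check afterwards for reallocated or pending sectors
smartctl -A /dev/da0
```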
What I still need to do:
  • Create the TrueNAS VM and set up the zpool (still need to make a final decision on the ZFS layout, and how much to allocate as far as memory and CPUs, etc.).
    • Still waffling between Z1 and Z2, but striped mirrors are out as I want more than 4TB of usable space from my 8x 1TB drive array. I think I would be fine with the ~6TB that RAIDZ2 would give (rough pool-layout sketch at the end of this list)...
    • Would be willing to throw one of the two CPUs in its entirety at TrueNAS if it were worth doing (that CPU would come with 32GB of memory too).
  • Test pool? (on top of the testing I did on the drives individually). Not sure what this looks like.
  • Set up Samba and move data in. Once confident in stability, re-purpose the 8TB USB disk as a local backup/copy of the pool contents.
  • Schedule scrubs, SMART tests, config backups, and other monitoring.
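For the layout question, my current leaning is a single 8-wide RAIDZ2 vdev, which from 8x 1TB disks should give roughly 6TB usable before ZFS overhead. If I were doing it by hand instead of through the TrueNAS UI (which handles partitioning and gptids itself), I believe it would look something like this, with the device names as placeholders:

```
# 8-wide RAIDZ2: any two disks can fail, ~6TB usable from 8x 1TB before overhead
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7

# General-purpose defaults for an SMB bulk-storage pool
zfs set compression=lz4 tank
zfs set atime=off tank

# Sanity-check the layout
zpool status tank
```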
Questions:
  • What level of priority should a UPS purchase be? It sounds like zfs doesn't have the "write hole" problem of Raid5/6 but obviously sudden power loss is never a good thing.
  • Burn-in testing on the pool itself? What does that look like? Performance is a marginal concern as long as I can at least saturate my 1Gbps LAN on occasion, which shouldn't be a problem even for a single SATA spinning disk with a 6Gbps interface.
  • Is there any value at all to creating two virtual disks on different datastores (different physical drives) for the TrueNAS zfs boot pool, or is that pointless given the lack of redundancy in ESXi's boot drive?
  • Any recommendations on VM settings for TrueNAS Core in ESXi other than passthrough for the HBA? (eg, scsi or sata based virtual boot disk? Reserved memory and/or cpu cores?)
  • Any red flags jump out in any of this that I haven't acknowledged?
  • Should I consider dumping ESXi and running TrueNAS SCALE on bare metal instead? The Linux base would enable more widely compatible virtualization directly within TrueNAS, but it doesn't have the tried and true legacy of TrueNAS CORE.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
What level of priority should a UPS purchase be? It sounds like zfs doesn't have the "write hole" problem of Raid5/6 but obviously sudden power loss is never a good thing.
From a cost/benefit perspective, I think a good UPS is one of the best things you can buy for computer equipment. If you get a UPS that reconditions power, then you're protecting yourself from over/under voltage, surges, and power loss conditions.
Burn-in testing on the pool itself? What does that look like? Performance is a marginal concern as long as I can at least saturate my 1Gbps LAN on occasion, which shouldn't be a problem even for a single SATA spinning disk with a 6Gbps interface.
Not really needed. What you're trying to accomplish with burn-in is to get through the first part of the theoretical bathtub failure curve for disks. It doesn't matter whether you're testing the disks directly or as part of a pool for that. Based on the burn-in that you already have accomplished, I'd say that you're good.

You may, however, want to do testing on the pool for the purposes of checking performance. Testing when the pool is empty gives you the best sense of performance, and allows you to easily wipe and restart if you make a mistake.
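If you want something concrete for that, a quick run of a tool like fio against an empty dataset gives you repeatable numbers (rough sketch only; the dataset path and sizes are examples, and ARC caching will flatter the read results unless the file size comfortably exceeds your RAM). Saturating gigabit only takes about 110 MB/s, so the bar is low:

```
# Sequential write test against a scratch dataset (path is an example)
fio --name=seqwrite --directory=/mnt/tank/scratch --rw=write --bs=1M \
    --size=16G --numjobs=1 --ioengine=posixaio --end_fsync=1

# Sequential read test (fio lays out its own test file before reading it back)
fio --name=seqread --directory=/mnt/tank/scratch --rw=read --bs=1M \
    --size=16G --numjobs=1 --ioengine=posixaio
```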
Is there any value at all to creating two virtual disks on different datastores (different physical drives) for the TrueNAS zfs boot pool, or is that pointless given the lack of redundancy in ESXi's boot drive?
I'd say no.

The general benefit to redundant boot drives is higher availability. In a business case, if one of your boot devices goes offline (which is not uncommon if you use USB flash drives for boot devices), having a mirrored pair means that your NAS keeps running, and no services go offline. Furthermore, since your NAS is still operational, you don't have to futz with restoring the config from backup.

For a home user, assuming that you're making regular backups of your NAS configuration, losing your boot device means the minor inconvenience of reinstalling TrueNAS and restoring the config from backup. Sure, you might be down for a couple hours, but that's not a huge consequence.

However, in your case, you're limited by ESXi. If you can't keep ESXi up, then it doesn't matter if you have redundant boot devices.
Any recommendations on VM settings for TrueNAS Core in ESXi other than passthrough for the HBA? (eg, scsi or sata based virtual boot disk? Reserved memory and/or cpu cores?)
I'm not super familiar with ESXi, though in general with virtualization, I'd recommend whatever paravirtualized options are available. These are generally faster. Also, make sure you have enough memory for TrueNAS. It will use as much memory as it can for the purpose of caching data, so more memory is generally better. CPU needs are pretty minimal. As a default, I'd assign 2 CPU cores for TrueNAS, and monitor to see if that becomes a bottleneck.
Any red flags jump out in any of this that I haven't acknowledged?
Your research is pretty thorough. Nothing major jumps out at me. I'd definitely recommend RAIDZ2 over RAIDZ1, though only you know your own risk tolerance.

I would prioritize setting up scrubs, SMART tests, etc. before moving good data over. You want the system to be set up first before you throw data on it. That way, if you mess up on your settings, you can easily restart without impacting your data.
Should I consider dumping ESXi and running TrueNAS SCALE on bare metal instead? The Linux base would enable more widely compatible virtualization directly within TrueNAS, but it doesn't have the tried and true legacy of TrueNAS CORE.
I think your current strategy is sound. Virtualizing TrueNAS is a common deployment today, and as long as you pass the drives into TrueNAS correctly (which it sounds like you are), then there's nothing to worry about.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Be sure to follow the guidance at


Reserved memory and/or cpu cores?

Passthru requires reserved memory. Be generous with memory. Be stingy with CPU cores. Start out with two (or a reasonably small number) and only increase as needs demonstrate.

Originally had a PERC H730mini raid card that includes an "HBA mode" (which I verified does pass through drive data like SMART, Serial Number, etc) but I decided to bite the bullet and get the HBA330mini anyway just to be safe.

That's not "just to be safe".


The issue isn't really "it's a HBA" but rather "it's a controller and driver known to work 100% right". Which MFI definitely isn't, and MRSAS has so little mileage on it as to be hazardous.
 

SuperWhisk

Dabbler
Joined
Jan 14, 2022
Messages
19
Thank you both for the thorough responses! Glad to hear I am on the right track here.
I would prioritize setting up scrubs, SMART tests, etc before moving good data over. You want the system to be setup first before you throw data on it. That way, if you mess up on your settings, you can easily restart without impacting your data.
Yes, setting up scheduled scrubs, tests, etc. is out of order in that list. I will certainly do that before moving in any data (not that I won't always keep another copy anyway).
From a cost/benefit perspective, I think a good UPS is one of the best things you can buy for computer equipment. If you get a UPS that reconditions power, then you're protecting yourself from over/under voltage, surges, and power loss conditions.
I do have the server connected through a surge-protected power bar (and not just some junky one - a nice TrippLite unit with solid surge protection that stops functioning when it wears out, so I don't keep using it while assuming it is still providing protection), but no other conditioning.

Are there any risky moments with ZFS (e.g. perhaps during a scrub or resilver?) where a sudden power loss would be truly catastrophic compared to typical consumer desktop storage with NTFS or EXT4 on a single disk? I'm wondering about potential ZFS gotchas where you could lose more than just the one file that happened to be mid-write when the power went out.
The sense that I have gotten from reading these forums is that ZFS is really good at what it does, but given the right unfortunate conditions it can blow up in your face and you can lose more data than you would in the non-ZFS scenario described above (e.g. bad/non-ECC memory propagating errors to the disks in a way that ZFS can't detect). Some of those risks seem inherent to the concept of striping data across multiple drives though (regardless of whether it's ZFS, classic RAID, or something else).
Be sure to follow the guidance at

I think the only thing I haven't done from that post is validate a bare-metal install against a VM with the controller passed through. I may do that after I have the VM version installed and the pool set up (but before putting important data in). I did all my burn-in testing in a VM so that the environment was as close to the final hardware configuration (from FreeBSD's point of view) as it could be.
Passthru requires reserved memory. Be generous with memory. Be stingy with CPU cores. Start out with two (or a reasonably small number) and only increase as needs demonstrate.
You are of course right. I was thinking of just reserving the CPU cores too, and possibly locking them to one of the underlying processors, but maybe that's all premature optimization.
Need to balance this against the limitation on the amount of memory per CPU (I don't really want TrueNAS or some other VM running on CPU1 reaching over to CPU2's memory pool if I can help it, so I could probably give TrueNAS 16GB, but more than that would limit the usefulness of the other cores on the same physical CPU).
Maybe I'm over-thinking it and the ESXi scheduler will balance all this nicely without me sticking my nose in it - something I will need to research more.
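From what I've found so far, the pieces I'd be playing with are the "Reserve all guest memory (All locked)" checkbox (which I understand is required for PCIe passthrough anyway) and an optional NUMA node affinity. My understanding - and someone please correct me if this is wrong - is that these correspond to .vmx / advanced-configuration entries roughly like:

```
sched.mem.pin = "TRUE"
numa.nodeAffinity = "0"
```

The first keeps the guest's memory fully reserved and pinned for the passthrough device, and the second would keep the VM on one physical socket so it stays next to its own memory.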
That's not "just to be safe".


The issue isn't really "it's a HBA" but rather "it's a controller and driver known to work 100% right". Which MFI definitely isn't, and MRSAS has so little mileage on it as to be hazardous.
Indeed, it was that post and others like it that convinced me to get the HBA even if the other one would have "probably worked fine" in HBA-mode. I don't recall which driver the H730 showed up under when it was in "HBA Mode".
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Are there any risky moments with zfs
Not really. Since ZFS effectively patches the write hole with copy-on-write, ZFS is pretty robust in the face of power loss. You will obviously lose any data that's pending at the time of power loss, but your existing data should be fine.

The issue is more that HDDs aren't always very robust in the face of power loss. Head crashes are a real phenomenon, and since a power loss hits all disks at the same time, the odds that you'll lose multiple drives at once are much higher. Definitely not 100%, but even at 1-2%, that's much too high for my comfort.

but given the right unfortunate conditions it can blow up in your face and you can lose more data than you would in a non-zfs scenario as described above (eg bad/non-ecc memory propagating errors to the disks in a way that zfs can't detect). Some of those risks seem inherent to the concept of striping data across multiple drives though (regardless of whether it's zfs or classic raid, or some other thing).
ZFS is not as fragile as these forums would perhaps lead you to believe. What happens is that, in the event that you lose data above and beyond what your current pool can tolerate, you have very few tools with which to recover the rest of your data. ZFS is a complex beast under the hood (and for good reason), and that complexity makes recovery of partial data very difficult.

Now, take comfort in the fact that there's a really positive reason that these tools don't exist: there aren't many people who need these tools. ZFS is incredibly good at protecting your data, and combined with a good setup (proper redundancy), good data practices (backups), and good hardware, you shouldn't ever be in a place where you need to attempt a partial data recovery.

Maybe I'm over-thinking it and the ESXi scheduler will balance all this nicely without me sticking my nose in it - something I will need to research more.
ESXi is generally pretty good. My guess is that you're unnecessarily sticking your nose in it here. However, I'm also the kind of person that needs to tinker with everything, so it would be pretty hypocritical to tell you not to tinker.
 

SuperWhisk

Dabbler
Joined
Jan 14, 2022
Messages
19
ZFS is not as fragile as these forums would perhaps lead you to believe.
There needs to be a pinned post or resource that expands upon this, because I am probably not the only one with that impression, given the number of instances of "if you don't do it exactly like this you will lose all your data" found here.
A healthy amount of fear can prevent silly mistakes, but go too far and it all starts to sound like a house of cards no matter how you do it.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
because I am probably not the only one with that impression, given the number of instances of "if you don't do it exactly like this you will lose all your data" found here.

That's mostly because lots of people come in doing things that are in fact risky, and the community gets weary of explaining over and over stuff like why RAID controllers are bad. ZFS depends on being in charge of write ordering to the disks. If you have a RAID controller like an LSI SAS39xx with 8GB of cache, which can really soak up writes, and you didn't get it with the battery/supercap option, then a power loss means losing a bunch of pool writes. That can definitely rise to the level of a damaged pool, to the extent of needing to roll back transaction groups.

The number of people who think that they can do what they want and ZFS will protect them against their bad choices has been pretty high, which is part of why I've written lots of resources on the topics. You definitely can and will lose data if you make certain kinds of wrong choices.
 