(Used / self refurb) HP DL380e G8 SFF SSD iSCSI server (primarily) for ESXi

SamM

Dabbler
Joined
May 29, 2017
Messages
39
I'm building the following:
-Used HP ProLiant DL380e G8 SERVER 25SFF 2x 6 CORE E5-2430 2.2GHz 32GB P420 x1
--32GB RAM, 25 2.5" SFF bays in the front, twin PSUs.
--At first, I put the P420 RAID controller into HBA mode, but then I needed RAID elsewhere and replaced it with a spare HP H220 HBA I had lying around.

-Chelsio 110-1121-40 Quad Port 10GB PCIe Card HBA Full Profile x1
--The idea is to use 2 links, each going to a separate physical switch in the same switch stack, for the storage traffic. Stuff like management traffic can remain on the onboard 1GbE NICs.

-Crucial MX500 1TB SSD 3D NAND SATA 6.0Gb/s 2.5" Internal SSD CT1000MX500SSD1 x7
--A 3-way mirror 'striped' with another 3-way mirror (though I understand that FreeNAS/ZFS doesn't quite 'stripe' like a traditional RAID0 does), plus 1 online spare; that leaves room for 6 more 3-way mirrors for future expansion.
---If I understand this correctly, the net result will be just under 2TB (given overhead, formatting loss, and the plain old fact that when the box says 1TB, it generally formats to just under that...) with triple redundancy: the pool can survive losing a minimum of 2 drives and a maximum of 4 out of the 6 before the entire pool is lost, depending on which drives fail, plus a hot spare in case a drive in the live pool fails. I'm also of the belief that this pool will have the read IOs & 'bandwidth' of 2-to-6 drives (as any disk within a mirror can be read independently) and the write IOs & 'bandwidth' of 1-or-2 drives (not sure on that one since I'm not sure exactly how ZFS does writes across stripes, but a max of 2 since all disks in a mirror have to write equally in parallel), as opposed to, say, the read & write IOs of 1 disk and the 'bandwidth' of 5 disks in a RAIDZ1 pool configuration. (A rough sketch of this layout follows the parts list below.)

-Vaseky M.2 2280 SATA 60GB SSD MLC Internal Solid State Drive (SSD) for Desktop Notebook, Standard M.2 SATA 60GB MLC x2
--Relatively cheap, low-capacity mirrored SSDs for boot purposes.
---I'm using a StarTech.com PEXM2SAT32N1 M.2 adapter (3 port: 1x PCIe NVMe M.2 plus 2x SATA III M.2 on a single PCI Express card) to house both M.2 SATA SSDs in a single PCIe slot. These are SATA drives though, so they're connected to the motherboard's 2 available conventional SATA ports, configured for AHCI mode (as opposed to RAID or legacy).

-MyDigitalSSD 240GB (256GB) BPX 80mm (2280) M.2 PCI Express 3.0 x4 (PCIe Gen3 x4) NVMe MLC SSD x1
--This is an old, leftover part I have. My original intention was to use a WD Black NVMe M.2 2280 250GB PCI-Express 3.0 x4 3D NAND Internal Solid State Drive (SSD) WDS250G2X0C that I bought specifically for this build, but it caused a kernel panic and system reboot every time I tried to add that NVMe device to a pool. A few crashes later, I swapped the WD for the MyDigitalSSD, and now the pool builds successfully and doesn't crash the server.
--This NVMe sits in the M.2 PCIe socket of the above-mentioned StarTech adapter. That's how I got 3 M.2 devices into one PCIe slot, leaving as much room for disk pool drives as I could (w/o swapping out some of the server chassis' rear-facing bays/slots).
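
Rough sketch of the intended pool layout (the device names da0-da6 and the pool name 'tank' are just placeholders, and I'd actually build the pool through the FreeNAS GUI, which also partitions the drives and adds swap; this is only to illustrate the structure):

zpool create tank mirror da0 da1 da2 mirror da3 da4 da5 spare da6
zpool status tank (should show two 3-way mirror vdevs plus the hot spare)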

The overall objective is to build a highly fault-tolerant server (on the cheap; this is just under $2k so far) to power a couple of ESXi hosts, or at least to supplement our existing FreeNAS server (mechanical 3.5" drives) which is already powering those hosts. Right now, the ESXi hosts boot from small 10GB iSCSI LUNs housed on the existing FreeNAS server, 1 LUN dedicated to each host. All the hosts also share a (relatively) big (multi-terabyte) iSCSI LUN which houses the VMs and their data. This setup has worked well for me thus far, but maybe there's room for improvement on this go-around. One suggestion I recently got is to use NFS for the big shared LUN instead of iSCSI.

I'm on the fence about how I should use the NVMe drive, or whether I should use it at all. My initial thought was to use it as a dedicated SLOG device. I understand that SLOG devices only really help with synchronous (as opposed to asynchronous) write operations, and that NFS & VMware rely heavily (if not entirely) on such synchronous writes. What I'm not sure about is whether that applies to VMware over iSCSI. Regardless of whether a SLOG device will be helpful in this case or not, the current consensus seems to be that SLOG devices do not need to be mirrored and present very little risk should they fail.
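
For reference, this is how I've been checking what the pool and the existing iSCSI zvols are currently set to on the old box (the pool name 'tank' below is just a placeholder for my actual pool name):

zfs get -r sync tank (lists the sync property for the pool and every dataset/zvol under it)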

Ideas? Suggestions? Corrections? Advice? I'm going to assume the first one is "More RAM (when you can afford it)!"...

Thanks
-Sam M.
 

SamM

Dabbler
Joined
May 29, 2017
Messages
39
I also read something about partitioning SLOG devices, though the article is a bit aged. Is this still best practice?

(Quoting the article:) Also, I'd recommend you make your SLOGs using these commands over what HoneyBadger provided. His will work, but I will always recommend people format their disks identically to how FreeNAS' GUI does it. It ensures future compatibility (which is always a good thing!)

gpart create -s GPT daXX
gpart add -t freebsd -a 4k -s 8G daXX (for 8G SLOG)
zpool add pool log daXXp1

If you have two disks, then you gpart 2 disks and then do the mirror. I highly recommend mirrors.

Mirrored version:
gpart create -s GPT ada4
gpart create -s GPT ada5
gpart add -t freebsd -a 4k -s 8G ada4
gpart add -t freebsd -a 4k -s 8G ada5
zpool add zpool1 log mirror ada4p1 ada5p1
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Advice? I'm going to assume the first one is "More RAM( when you can afford it)!"...

With precognition like that, you're going to fit in just fine around here. :)

Interesting that the WD Black would cause a crash just from being added to a system pool. If you happen to have any kernel dumps/crashlogs from this I'm sure they'd be of interest to the FreeNAS dev team.

All the hosts also share a (relatively) big (multi Terabyte) iSCSI LUN which houses the VM's and respective data. This setup has worked well for me thus far but maybe there's room for improvement on this go-around. One suggestion I recently got is to use NFS for the big shared LUN instead of iSCSI.

There are pros and cons to each. iSCSI has the advantage of being able to use MPIO and leverage both of the 10Gbps links more fully (and I think better support for the VAAI primitives?), but NFS tends to require fewer resources on the ZFS side to get good performance, since you get a little more insight into the data in terms of "file handles" versus "retrieve this block, write that block."

I'm on the fence on how I should use the NVMe drive, or if I should use it at all. My initial thought was to use it as a dedicated SLOG device. I understand that SLOG devices only really help in synchronous (as opposed to asynchronous) write operations, and that NFS & VMWare relies heavily (if not entirely) on such synchronous writes. What's I'm not sure is if that applies or not to VMWare over iSCSI. Regardless of if a SLOG device will be helpful in this case or not, the current consensus seems to be that SLOG devices do not need to be mirrored and present very little risk should they fail.

Unfortunately, neither the WD nor the MyDigitalSSD is likely to be a good/viable SLOG device. SLOG writes have a very specific performance profile that most consumer-grade drives simply weren't engineered to deliver (low latency at single queue depth) - you'll want to check the thread in my signature for more details as well as suggested drives, or just take the shortcut of "buy an Optane 900p/P3700 and be happy."

Regarding NFS vs iSCSI and sync writes - VMware issues all NFS writes as synchronous by default, and all iSCSI writes as asynchronous by default. This is why people often say "wow iSCSI is so much faster than NFS!" - yes, because you're not comparing apples to apples. Enforce sync=always on your iSCSI ZVOL and performance will be very similar. If you aren't concerned with sync writes, then you probably don't need an SLOG at all. Just some very robust backups and a tolerance for data loss.
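
For example, on a hypothetical zvol path (substitute your actual pool/zvol name):

zfs set sync=always tank/esxi-vmstore
zfs get sync tank/esxi-vmstore (confirm the setting took effect)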

Mirroring SLOG is not required, but it's another level of risk reduction. With a single SLOG, you open yourself up to the very small but possible risk of data loss, if you have an unexpected shutdown/kernel panic/catastrophic SAS failure and at the exact same time have your single SLOG device fail. With a mirrored SLOG, you'd have to have that panic and two SLOG devices fail. We're talking about "struck by lightning" odds for most people, but for others (enterprise users, who might be better off with a TrueNAS box and a support contract) it is something worth the extra money to insure against.
 

SamM

Dabbler
Joined
May 29, 2017
Messages
39
Thanks for that HoneyBadger!

I'll try the WD Black again and try to snap a screenshot of the resulting crash when I get the chance (might be a few days...).

(From the article in your signature) ESXi is particularly rough on this because its NFS implementation defaults to all writes being sync writes, as it cannot differentiate between important file system writes inside a VM and some file update that may not be important. This conservatism is what will protect your VMs from corruption if something goes wrong. For this reason, it is typically not a good idea to try to make writes asynchronous.

This makes sense.

VMware issues all NFS writes as synchronous by default, and all iSCSI writes as asynchronous by default.

But this strikes me as odd when compared to the first quote. I don't doubt what was said, but why would VMWare treat NFS with kid-gloves yet not iSCSI when the concern for write integrity seems to apply to both? Does VMWare just assume iSCSI storage systems are significantly less likely to have (writing-related) problems than NFS?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
But this strikes me as odd when compared to the first quote. I don't doubt what was said, but why would VMWare treat NFS with kid-gloves yet not iSCSI when the concern for write integrity seems to apply to both? Does VMWare just assume iSCSI storage systems are significantly less likely to have (writing-related) problems than NFS?

The problem is people assume it's VMware; it's not. It's written into the NFS RFCs as a hard requirement, and has been all the way back to at least NFSv2.

NFS v2 - RFC 1094 section 2.2:
All of the procedures in the NFS protocol are assumed to be
synchronous. When a procedure returns to the client, the client can
assume that the operation has completed and any data associated with
the request is now on stable storage.

It's actually the reason for the original Sun Microsystems Prestoserve NFS accelerator product.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Why would VMWare treat NFS with kid-gloves yet not iSCSI when the concern for write integrity seems to apply to both? Does VMWare just assume iSCSI storage systems are significantly less likely to have (writing-related) problems than NFS?

NFS requires the sync write as part of the protocol (although I believe you can override it at the client-side, if that client isn't VMware) whereas iSCSI doesn't. There's probably also some assumption on VMware's part that an "iSCSI SAN solution" will implement a non-volatile write cache, or equivalent, which you configure properly on FreeNAS by setting sync=always on your ZVOLs.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
NFS requires the sync write as part of the protocol (although I believe you can override it at the client-side, if that client isn't VMware) whereas iSCSI doesn't. There's probably also some assumption on VMware's part that an "iSCSI SAN solution" will implement a non-volatile write cache, or equivalent, which you configure properly on FreeNAS by setting sync=always on your ZVOLs.

You can override it on the server side. The client can't fabricate the acknowledgement response packet from the server.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You can override it on the server side. The client can't fabricate the acknowledgement response packet from the server.
Right, client-side NFS sync setting is something different.

Either way, for VM storage, sync writes are the order of the day.
 

SamM

Dabbler
Joined
May 29, 2017
Messages
39
Thanks fellas,

So my current FreeNAS with the mechanical drives is pretty much default, and serving ESXi via iSCSI. That said, it sounds like it's safe to assume that it's not writing everything in synchronous mode. So, in my paranoia, I'll probably set the whole pool to sync=always on the new (SSD-based) FreeNAS box; and if I do the SLOG device, get that (eventually) mirrored as well. A second SLOG, along with more RAM, will come later down the road when I have a bit of spare cash/parts.

That said, what do y'all think about no SLOG (single or mirrored) with an all-SSD disk pool (as mentioned in the opening post) versus that same array with a SLOG, assuming I set sync=always? If the NVMe device(s) I have won't make for a good SLOG device, and since the pool is SSDs anyway (as opposed to mechanical), then adding this SLOG will give a fairly lackluster (if any) performance boost, right? If that's the case, then the only real reason to add the SLOG anyway would be to reduce (overall) writes to the SSD array (thus theoretically extending the life of said SSDs), since the ZIL writes would go to the SLOG device (and then the data to the SSD pool once) as opposed to hitting the SSD pool at least twice; is that right?

Thanks again!
-Sam
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
That said, what do y'all think about no SLOG (single or mirrored) with an all SSD disk pool (as mentioned in the opening post) versus that same array with a SLOG setup; assuming I set sync=always?

SSDs are a bit of a game changer. In order for the SLOG to make a difference, it has to be faster than the pool itself. You're going to be hard-pressed to pull that off with a single device.
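
If you want to sanity-check that, you can compare the sync-write latency of the NVMe against one of the pool SSDs with diskinfo's sync write test, assuming your FreeNAS build's diskinfo supports the -S flag (the device names below are placeholders, and -w writes to the device, so only run this against a blank/spare disk):

diskinfo -wS /dev/nvd0 (candidate SLOG NVMe)
diskinfo -wS /dev/da6 (one of the spare pool SSDs)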
 