ESXi Storage


justgosh

Dabbler
Joined
May 23, 2013
Messages
11
I started by building a system to test various applications. My current configuration has evolved over the past few years; a few applications have stuck around for daily use, but for the most part it's a test environment.
As I've been building, I've created some issues I didn't understand at the time. I'd like to work those out and avoid anything that would be obvious to someone with more experience. I'm hosting several VMs but have been running into throughput issues despite their low utilization. I had iSCSI performance issues, so I'm currently using NFS, plus CIFS for occasional file management.

I'm currently running 5 WD Reds in Z1 which I will add a drive to and migrate to Z2. I've added SSD log and cache drives; however, I didn't use a drive with power-loss protection (PLP), and the SSDs I selected don't have the write endurance required to sustain this configuration. Finally, I'm adding a 10GbE network to free up some switch ports.

ESXi VMs
AD/DHCP/DNS
Ubuntu 16.04 Gitlab
Plex/Sonarr/DVR1
Plex/DVR2
Ubuntu 16.04 Docker app server
MacOS test 1
Windows 10 test 1
vCenter appliance
Fonality appliance

Supermicro X8SIL - Intel Xeon X3450 - KVR1333D3Q8R9S/8G 32GB ECC RAM - M1015 - 6 x 3TB WD Red - X540-T2
I also have a few Samsung 850 Pro 250GB and Sandisk Ultra 240GB that I'm not sure how to use in this setup. I have room for 2 more SATA drives on the controller.

I'm looking at adding a PLP drive and want to find the best way to proceed. I'm considering a DC P3700 or similar, but thought I would ask for suggestions before I spend any money.

edit: typo on HBA model
 
Last edited:

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
5 WD Reds in Z1 which I will add a drive to and migrate to Z2.
Skip the RAIDZ2 and just do striped mirrors. VMs end up using a fair bit of random IO and you need more vdevs to better support that IO. The drive used as a SLOG is fine. The odds of needing a PLP drive when using a UPS configured to safely shut down your VMs are slim to none. As for wear on the SSD, keep in mind it's never using more than 1GB at a time and can therefore wear-level across the entire drive. Don't quote me, but you may want to give iSCSI a closer look down the road, as VAAI support may be limited in upcoming versions of FreeNAS.
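For anyone following along, this is roughly what the two layouts look like from the command line. A minimal sketch only: the pool name and the da0-da5 device names are placeholders, and on FreeNAS you would normally build the pool through the GUI volume manager rather than calling zpool directly. The point is that random IO spreads across three mirror vdevs instead of queuing behind a single RAIDZ2 vdev.

```python
# Sketch (not FreeNAS-specific): creating a 6-disk pool as striped mirrors
# versus a single RAIDZ2 vdev. Pool name and device names are placeholders.
import subprocess

POOL = "tank"                                        # placeholder pool name
DISKS = ["da0", "da1", "da2", "da3", "da4", "da5"]   # placeholder device names

def create_striped_mirrors(pool: str, disks: list[str]) -> None:
    """Three 2-way mirror vdevs -> roughly three vdevs' worth of random IOPS."""
    cmd = ["zpool", "create", pool]
    for i in range(0, len(disks), 2):
        cmd += ["mirror", disks[i], disks[i + 1]]
    subprocess.run(cmd, check=True)

def create_raidz2(pool: str, disks: list[str]) -> None:
    """Single RAIDZ2 vdev -> roughly one vdev's worth of random IOPS."""
    subprocess.run(["zpool", "create", pool, "raidz2", *disks], check=True)

if __name__ == "__main__":
    create_striped_mirrors(POOL, DISKS)
```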
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Also, with only 32GB of RAM you're likely better off without the L2ARC. You can use the SSD as vFlash read cache in your host and configure your VMs to use it.
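If the L2ARC does come out, the cache device can be detached from a live pool without touching the data vdevs. A hedged sketch below; the pool and device names are placeholders copied from whatever `zpool status` shows.

```python
# Sketch: detaching an L2ARC (cache) device from a live pool.
# Pool and device names are placeholders; check "zpool status" first to get
# the exact name ZFS reports for the cache device.
import subprocess

POOL = "tank"             # placeholder pool name
CACHE_DEV = "gpt/l2arc0"  # placeholder; copy the name from `zpool status`

# Cache (and log) devices can be removed without affecting pool data.
subprocess.run(["zpool", "remove", POOL, CACHE_DEV], check=True)
subprocess.run(["zpool", "status", POOL], check=True)
```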
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The drive used as a SLOG is fine. The odds of needing a PLP drive when using a UPS configured to safely shut down your VMs are slim to none.

If you're not going to do it right, then just skip the SLOG entirely. It just slows everything down without giving the guarantee that the SLOG exists to provide. Turn on async writes and call it a day, and if the NAS panics or loses power, just deal with the potential corruption. Your writes will be a lot faster just turning on async writes.
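For reference, "turn on async writes" here maps to setting sync=disabled on the dataset behind the NFS export. A minimal sketch, assuming a placeholder dataset name; the same property can be set from the FreeNAS dataset options.

```python
# Sketch: disabling synchronous write semantics on the dataset that backs the
# NFS export. "tank/vmstore" is a placeholder dataset name.
import subprocess

DATASET = "tank/vmstore"  # placeholder

# sync=disabled: ZFS acknowledges writes once they are in RAM; data reaches
# disk with the next transaction group (typically within a few seconds).
subprocess.run(["zfs", "set", "sync=disabled", DATASET], check=True)

# To go back to honoring client sync requests later:
# subprocess.run(["zfs", "set", "sync=standard", DATASET], check=True)
```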
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Also the M5014 is not usable as an HBA. Substitute in an actual HBA in IT mode.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
If you're not going to do it right, then just skip the SLOG entirely. It just slows everything down without giving the guarantee that the SLOG exists to provide. Turn on async writes and call it a day, and if the NAS panics or loses power, just deal with the potential corruption. Your writes will be a lot faster just turning on async writes.
Frequent snapshots can be a lifesaver here.
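If it helps, here's a rough sketch of what a frequent-snapshot job could look like outside of FreeNAS's periodic snapshot tasks (which do the same thing from the GUI). The dataset name, snapshot prefix, and retention count are all placeholders.

```python
# Sketch: take a recursive snapshot of the VM dataset and prune old ones.
# Dataset, prefix, and retention are placeholders; run from cron as often as
# you like.
import subprocess
from datetime import datetime

DATASET = "tank/vmstore"   # placeholder dataset
PREFIX = "auto"            # snapshot name prefix
KEEP = 48                  # e.g. 48 snapshots at 15-minute intervals = 12 hours

def take_snapshot() -> None:
    stamp = datetime.now().strftime("%Y%m%d-%H%M")
    subprocess.run(["zfs", "snapshot", "-r", f"{DATASET}@{PREFIX}-{stamp}"],
                   check=True)

def prune_snapshots() -> None:
    # List this dataset's snapshots, oldest first.
    out = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation",
         "-d", "1", DATASET],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    ours = [s for s in out if f"@{PREFIX}-" in s]
    for snap in ours[:-KEEP]:          # destroy everything but the newest KEEP
        subprocess.run(["zfs", "destroy", "-r", snap], check=True)

if __name__ == "__main__":
    take_snapshot()
    prune_snapshots()
```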
 

justgosh

Dabbler
Joined
May 23, 2013
Messages
11
You can use the SSD as vFlash read cache in your host and configure your VMs to use it.
Thanks, checking it out on Yellowbricks. What happens in a power failure?

VMs end up using a fair bit of random IO and you need more vdevs to better support that IO.
Because these VMs are mostly used for testing, the I/O is pretty low at idle (https://imgur.com/a/DIocbBn). When stuff reboots or updates, it tends to take more time than I would prefer. I think it's coming down to bandwidth, and I think the vFlash suggestion will reduce IO even further, if I'm reading that correctly.
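One cheap way to confirm whether those slow reboots/updates are bandwidth-bound or IOPS-bound is to watch the pool while it happens. A sketch, with a placeholder pool name:

```python
# Sketch: watch per-vdev throughput and operations while a VM reboots or
# updates, to see whether the pool is bandwidth-bound or IOPS-bound.
# "tank" is a placeholder pool name; Ctrl-C to stop.
import subprocess

POOL = "tank"

# -v shows per-vdev numbers; the trailing "5" prints a new sample every 5 s.
subprocess.run(["zpool", "iostat", "-v", POOL, "5"])
```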

Also the M5014 is not usable as an HBA. Substitute in an actual HBA in IT mode.
Nice catch on the typo. It's an M1015. I recall cross flashing to IT mode for use as an HBA.

If you're not going to do it right, then just skip the SLOG entirely. It just slows everything down without giving the guarantee that the SLOG exists to provide. Turn on async writes and call it a day, and if the NAS panics or loses power, just deal with the potential corruption. Your writes will be a lot faster just turning on async writes.
This sounds like more of a headache over the next 5 years than spending $300 for a SLOG device unless I'm missing something.
 
Last edited:

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
What happens in a power failure?
The read cache is lost and is refilled with normal use. The VM never writes data there.
Because these VMs are mostly used for testing, the I/O is pretty low at idle
If only we could design for idle spec and still have it perform.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Nice catch on the typo. It's an M1015. I recall cross flashing to IT mode for use as an HBA.

Fine then.

This sounds like more of a headache over the next 5 years than spending $300 for a SLOG device unless I'm missing something.

https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

The thing that the ZIL is trying to fix - the ONLY thing that the ZIL is trying to fix - is the situation where your NAS panics or loses power. If - and only if - the hypervisor continues to run, and the VM goes into a holding state waiting on some incomplete disk I/O, here's the problem:

1. VM writes data to its virtual disk.
2. NAS receives that request but doesn't actually get it committed to disk.
3. NAS crashes, panics, has some other data-tastrophe, and comes back online having lost those "in-RAM" changes.
4. VM now thinks some data has been written to disk, but if it were to read those disk blocks it would find "old" data there.

This could be a problem if the write was updating file data, filesystem metadata such as a directory, etc.

Without the ZIL, FreeNAS and ZFS will handle it just fine if you gracefully shut down the NAS, even with VMs running. iSCSI or NFS will flush the data to ZFS, which will flush the data to disk. That's all as it should be. The guest VM might freak out and might crash, but that's not a solvable problem within these subsystems.

The purpose of the ZIL is to make things also work right if there's some sort of non-graceful event (crash, power loss, etc). But if there's a power loss, you probably also lose your hypervisor, so not an issue. So the ZIL is really mostly protecting from the case where the NAS crashes and reboots, losing data in-RAM as it does so.

When the NAS crashes but the hypervisor keeps running and the VM survives the event, you have a problem. You could potentially lose some of the writes to the VM disk. In some cases, that's a big fat shrug - a reboot and fsck can take care of it. But if you're writing things like database files, it could really suck.

So how often does your NAS crash? And how important are the things your VMs are writing to disk? These questions are useful in guiding whether or not you really need to worry about it. If your VMs are not too important and you've never had your NAS crash, is it worth $300 and a significant drop in write performance to add the SLOG?

Obviously I cannot answer that question for you.
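To make the contract at stake concrete from the guest's point of view, here's a small, generic sketch (the path is a placeholder): a synchronous write, or a write followed by fsync, is only supposed to return once the data is on stable storage. That's the promise a proper SLOG preserves with sync=standard, and the promise that silently goes away with sync=disabled.

```python
# Sketch of the guarantee in question, from the writer's point of view.
# The path is a placeholder for a file on the NFS-backed virtual disk.
import os

PATH = "/mnt/vm-data/journal.log"  # placeholder path

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
os.write(fd, b"committed record\n")   # may still be sitting in RAM somewhere
os.fsync(fd)                          # "on stable storage" - honored with a
                                      # SLOG and sync=standard; silently
                                      # weakened if the NAS runs sync=disabled
                                      # and then crashes
os.close(fd)
```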
 