Yes Virginia, lots of mirrored vdevs do work

Joined Dec 29, 2014 · Messages: 1,135
I know I've seen it said multiple times by @jgreco and @HoneyBadger, among others, but I had never gotten around to testing it. Yes, you were all right that a bunch of mirrors does perform better. I went with a real-world test for me, which was migrating my 4 production VMs from local storage on the ESXi server to a FreeNAS NFS share and back.
[attachment: 1628961642646.png]

This is my external array with 16 3.5" drives. Before, it was 2 x 8-disk RAIDZ2 vdevs with an Optane SLOG. Now it is 8 x 2-disk mirrors and the same Optane SLOG. I could get close to the same numbers reading off the Z2 vdevs, but the writes were around 2G with peaks at around 3.8G. Now the writes are consistently around 4G with peaks around 5.2G. Now I have to debate whether I want to destroy and redo all the storage in my primary FreeNAS.
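For anyone curious, the new geometry amounts to roughly the following if it were built at the shell (pool name, disk names, and the SLOG device below are just placeholders, not my actual devices; on FreeNAS you would normally do this through the GUI):

# 8 two-way mirror vdevs plus an Optane SLOG - illustrative names only
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7 \
  mirror da8 da9 \
  mirror da10 da11 \
  mirror da12 da13 \
  mirror da14 da15 \
  log nvd0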

Towards that, would you expect to see similar numbers in the primary system? The drives in the external array are 6G SAS drives at 7200rpm. The ones in the primary are 2.5" 6G SATA 5400rpm drives. Just curious.
 

jgreco
Resident Grinch · Joined May 29, 2011 · Messages: 18,681
I know I've seen it said multiple times by @jgreco and @HoneyBadger, among others, but I had never gotten around to testing it. Yes, you were all right that a bunch of mirrors does perform better.

eyeroll :smile:

Towards that, would you expect to see similar numbers in the primary system? The drives in the external array are 6G SAS drives at 7200rpm. The ones in the primary are 2.5" 6G SATA 5400rpm drives. Just curious.

I always hesitate to predict performance because so much of it depends on complex factors. It absolutely should perform better. Depending on the RAIDZ2 config, performance may degrade less over time with the mirrors, especially if you take care optimizing record sizes, etc., on your new layout. Whether or not the numbers are "similar" (proportionally? etc.) is harder to say, but still worth doing IMHO.
 
Joined Dec 29, 2014 · Messages: 1,135
especially if you take care optimizing record sizes, etc., on your new layout.
That brings me to a question about record size. I was just searching the resources and forum posts for guidance on record size, but I didn't find any clear answers. This is what the pool looks like in my primary FN now.
[attachment: 1629039771193.png]

I do have the different kinds of shares broken up into different data sets. I don't know if I can (or should) use different record sizes in different data sets. The VMWare data set is exactly what it says, though I also keep my store of ISO images in there, which means the bulk of that data set is some pretty large files. The ISO data set is for cases where a BMC mounts an image over NFS. Not a whole lot in there as you can see, and the ftp data set is the same. The CIFS-I data set is more of a mixed bag; file sizes vary pretty wildly in that one. In particular, how would you determine the ideal record size for a pool/data set that is shared over NFS for VM storage from ESXi?
 

HoneyBadger
actually does care · Administrator · Moderator · iXsystems · Joined Feb 6, 2014 · Messages: 5,110
I know I've seen it said multiple times by @jgreco and @HoneyBadger, among others, but I had never gotten around to testing it. Yes, you were all right that a bunch of mirrors does perform better.

I wasn't saying it just for giggles. :grin:

Towards that, would you expect to see similar numbers in the primary system? The drives in the external array are 6G SAS drives at 7200rpm. The ones in the primary are 2.5" 6G SATA 5400rpm drives. Just curious.

Absolutely. It won't be as brisk as the 7200rpm drives on pure physics, but comparatively speaking, changing vdev geometry from 2 x 8-wide Z2 to 8 x 2-way mirrors should be a huge uplift in general VMware performance. The rule still applies even up at the SSD level: relative performance is better with mirrors, but objectively an SSD RAIDZ pool will likely still beat out mirrored spinning disks because its baseline per-disk and per-vdev IOPS/bandwidth is superior.
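As a rough back-of-the-envelope, using ~100 random IOPS per 7200rpm disk as a generic ballpark (not a measurement of your drives) and remembering that random write IOPS scale roughly with vdev count:

2 x 8-wide RAIDZ2: ~2 vdevs x ~100 IOPS ≈ ~200 random write IOPS for the pool
8 x 2-way mirrors: ~8 vdevs x ~100 IOPS ≈ ~800 random write IOPS, and reads can be served from either side of each mirror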

What you should also look at in a before-and-after scenario is not just the bulk svMotion copy (which is a sequential, larger-blocksize workload) but also your datastore and VM latencies, both average and peak. I'd predict a lower and more consistent result, with some of the nastier 99th-percentile-and-beyond spikes chopped way down from their heights. From a ZFS perspective, also compare the physically used space (actual size on disk) before and after. Since you were using NFS, the default recordsize of 128K might have allowed for "wider stripes", so to speak, on your vdevs, but also worse performance - I'd be curious to see where it lands.
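Something along these lines would capture the space side of the comparison (swap in your own pool and dataset names; "tank/VMWare" is just an example here):

# logical vs. physical space and compression ratio for the VM dataset
zfs get used,logicalused,compressratio,recordsize tank/VMWare
# per-vdev capacity and allocation for the whole pool
zpool list -v tank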

I've used Sexigraf as an easy-to-deploy Grafana instance in the past for this. Pretty sure it can hook into standalone ESXi hosts as well as vCenter.

In particular, how would you determine the ideal record size for a pool/data set that is shared over NFS for VM storage from ESXi?

Depends on the VMs themselves. For Windows and general-purpose use I've had a lot of success with a 32KB recordsize. It's the Goldilocks zone: records big enough to make gains from compression (LZ4 or ZSTD) and benefit from the sequential speed for svMotion/XCOPY, but not so big that they incur a huge penalty on read-modify-write. For transactional DBs, smaller records (like the default 16K volblocksize on ZVOLs) can be beneficial for latency, and if your DB/application already compresses its records, then there's nothing lost by having ZFS miss out on the inline compression.
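If you experiment with it, keep in mind that recordsize is a per-dataset property and only applies to files written after the change, so set it before migrating the VMs back in. A sketch with example names (your pool/dataset names will differ, and the "databases" dataset here is purely hypothetical):

# recordsize only affects newly written files; set it before the svMotion back
zfs set recordsize=32K tank/VMWare      # general-purpose VM datastore over NFS
zfs set recordsize=16K tank/databases   # hypothetical dataset for a transactional DB
zfs get recordsize tank/VMWare tank/databases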
 
Joined Dec 29, 2014 · Messages: 1,135
I wasn't saying it just for giggles.
It never occurred to me that you were. I had just never taken the time to try and set it up, and had the extra hardware to be able to tear a pool down and rebuild it from scratch.
Depends on the VMs themselves. For Windows and general-purpose use I've had a lot of success with a 32KB recordsize.
I have 4 VMs that run constantly: my FreeBSD mail server, vCenter, a PowerChute VM, and a virtual ASA firewall. The rest of the stuff is lab VMs. There are a few Windows VMs, but I don't run them frequently. I have a bunch of Cisco voice VMs that I use for a lab (CUCM, UCCX, IM&P, Unity Connection), but those also run infrequently, and performance isn't a big deal when I do. I am just experimenting with things in a non-production environment. The FreeBSD box does have MariaDB on it to support a WordPress site, but it's REALLY LOW volume.

The CIFS share is mostly archival stuff. At some point I will probably have my music on there too. Right now I have a Netgear ReadyNAS (really old) that runs a built-in instance of Logitech Media Server. I have several Logitech players that I like, so that is why I have stayed with that over Plex. At some point that will die, which is why I have already started building an Ubuntu server with LMS. At that point I would put the actual media files on the FreeNAS. I would use CIFS shares since I also access them from PCs on the internal network.

Overall, none of my needs are really high. The whole thing is mostly "because I can". I have also found it helpful to experiment with stuff at home prior to implementing it in a live customer network.
 

HoneyBadger
actually does care · Administrator · Moderator · iXsystems · Joined Feb 6, 2014 · Messages: 5,110
It never occurred to me that you were. I had just never taken the time to try and set it up, and had the extra hardware to be able to tear a pool down and rebuild it from scratch.

And I have to scold myself for my own laziness, since I have said hardware (though not the time) to build, tear down, and demonstrate this empirically for people here. It has been done with synthetics, though: https://arstechnica.com/gadgets/202...ht-ironwolf-disks-two-filesystems-one-winner/

I have 4 VMs that run constantly: my FreeBSD mail server, vCenter, a PowerChute VM, and a virtual ASA firewall. The rest of the stuff is lab VMs. There are a few Windows VMs, but I don't run them frequently. I have a bunch of Cisco voice VMs that I use for a lab (CUCM, UCCX, IM&P, Unity Connection), but those also run infrequently, and performance isn't a big deal when I do. I am just experimenting with things in a non-production environment. The FreeBSD box does have MariaDB on it to support a WordPress site, but it's REALLY LOW volume.

None of that calls for high performance, by your own admission, so you were likely able to "get away with" the Z2 setup easily. And honestly, you probably still could. But for higher-volume or production things, as you mention, it's a good opportunity to experiment and quantify the results before implementing them for someone else.

The CIFS share is mostly archival stuff. At some point I will probably have my music on there too. Right now I have a Netgear ReadyNAS (really old) that runs a built-in instance of Logitech Media Server. I have several Logitech players that I like, so that is why I have stayed with that over Plex. At some point that will die, which is why I have already started building an Ubuntu server with LMS. At that point I would put the actual media files on the FreeNAS. I would use CIFS shares since I also access them from PCs on the internal network.

Media files benefit from large recordsizes, beyond even the default 128K. Video files usually measure in the multi-GB range, so a 1M record is more than granular enough, and even photos and audio these days are multiple MBs. But then you get the situation of a 3.1MB file needing a fourth 1M record for its last ~100K (because a multi-record file gets all of its component records sized equally), and you lose some space to slack. Swings and roundabouts. If you can keep your video/large media files separate from your images/audio, then it's beneficial to adjust the settings there. General user files and documents do well with the 128K default.
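If you do split them out, it's just a per-dataset property change, and it only affects files copied in afterwards (dataset names below are examples, not a recommendation of a particular layout):

# large sequential media: bigger records, fewer of them per file
zfs set recordsize=1M tank/media/video
# mixed photos/audio and general documents: leave at the inherited 128K default
zfs inherit recordsize tank/media/music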

Overall, none of my needs are really high. The whole thing is mostly "because I can". I have also found it helpful to experiment with stuff at home prior to implementing it in a live customer network.
As Cave Johnson said, "Science isn't about WHY - it's about WHY NOT."
 
Joined Dec 29, 2014 · Messages: 1,135
The pool in my primary FN has now been rebuilt as 8 vdevs of 2-drive mirrors. Below are the results of several successive vMotion tasks.
[attachment: 1629115517692.png]

Before, the writes were under 3Gb with some peaks near 4Gb. Now the writes are right around 4Gb with peaks close to 5Gb. Reads peaked a little over 17Gb, which is slightly higher than before. I think I am happy with the results. The results were clearly better in the external array with higher-rpm SAS drives, but I think this is pretty good. Now I am chewing on the notion of replacing the drives in the primary with some SSDs. I am sure that would be a big jump. I did stay with the default record size because I never got a good feeling for where to set it. None of my needs are specialized enough for that to be a huge point of concern for me.
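If anyone wants to watch the same thing on their own system, something like this shows per-vdev bandwidth and IOPS while a migration is running (pool name is a placeholder):

# per-vdev activity, refreshed every 5 seconds
zpool iostat -v tank 5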
 