Is RAID10 / mirrored vdevs still the disk layout to choose for all-flash storage servers for ESXi?

Scampicfx

Contributor
Joined
Jul 4, 2016
Messages
125
Dear Community,

I learned that RAID10, or in other words "multiple mirrored vdevs", was the disk layout to choose for a zvol on a ZFS server deployed as an ESXi datastore.

Now the time has come to shift from hybrid pools to all-flash storage pools. While I was in contact with a hardware vendor, he told me that nowadays a lot of all-flash pools (NetApp, EMC, etc.) are based on RAID5 / RAID6 techniques.

I would like to ask if there are people here who operate an all-flash RAIDZ1 or RAIDZ2 zvol as an ESXi datastore. How good / bad was your experience with parity zvols on all-flash pools?

From a cost perspective, it would be great to use multiple 6-disk RAIDZ2 pools instead of mirrored vdevs. But if performance drops back to the level of hybrid pools, then this disk layout doesn't make any sense...

To be honest, there was one thing I didn't like about mirrored vdevs with 2 disks in them: in case of a disk failure there was no redundancy left! A RAIDZ2 would solve this issue (Yes I know there are people who operate 3 disks in each mirrored vdev ;)).
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Generally speaking, you're looking for IOPS (rather than throughput) when you're delivering block storage to ESXi, in part due to the small sizes of the blocks and in part due to the nature of the guest VM OS interactions with their OS disks or applications and their IOPS requirements.

Since a spinning HDD usually delivers something in the region of 100-200 IOPS and a RAIDZ vdev delivers roughly the random IOPS of only one of its member disks, the IOPS of an HDD-based RAIDZ pool (and hence your block storage) aren't going to cut it.

With SSDs, you can expect something from 10K to 500K IOPS (depending on what device you bought), so you can see how having 4 or more of them in RAIDZ would possibly be fine for a number of scenarios where your total IOPS requirement is served within that range.

If you were going to expect a huge number of IOPS based on your guest VM workload (like running a bunch of database servers), there may still be a lot of sense in running Mirrored VDEVs in order to boost the IOPS of the pool.
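To put rough numbers on that comparison, here is a minimal sketch of the usual rule of thumb. It assumes that each RAIDZ vdev delivers about the random IOPS of one member disk, that each mirror vdev writes at the speed of one disk, and that mirror reads can be spread over every disk in the mirror. The function name and the 50K per-disk figure are made up for illustration; real pools will deviate from this simple model.

```python
def pool_random_iops(layout, vdevs, disks_per_vdev, disk_iops):
    """Very rough random-IOPS estimate for a pool (rule of thumb only)."""
    if layout == "raidz":
        # Assumption: a RAIDZ vdev behaves like a single disk for random I/O.
        read = write = vdevs * disk_iops
    elif layout == "mirror":
        # Assumption: every write goes to all sides of a mirror,
        # while reads can be served by any side.
        write = vdevs * disk_iops
        read = vdevs * disks_per_vdev * disk_iops
    else:
        raise ValueError("layout must be 'raidz' or 'mirror'")
    return {"read": read, "write": write}


# Six hypothetical SSDs rated at 50K random IOPS each.
print("1 x 6-wide RAIDZ2:", pool_random_iops("raidz", 1, 6, 50_000))
print("3 x 2-way mirror: ", pool_random_iops("mirror", 3, 2, 50_000))
```

With the same six disks, the mirror layout comes out at three times the write IOPS and six times the read IOPS of the single RAIDZ2 vdev in this model, which is why mirrors remain attractive for IOPS-hungry guests.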

To be honest, there was one thing I didn't like about mirrored vdevs with 2 disks in them: in case of a disk failure there was no redundancy left! A RAIDZ2 would solve this issue (Yes I know there are people who operate 3 disks in each mirrored vdev ;)).
If that bothered you, maybe it would be helpful to know you can add the same disk as a spare to multiple pools or, when added to one pool, it can spare across all VDEVs (which in the case of mirrors, should normally be sufficient protection without requiring 3-way mirrors for all VDEVs).
 

Scampicfx

Contributor
Joined
Jul 4, 2016
Messages
125
Hey sretalla,
thanks for this well-written post and the comparison in it. This helped me quite a lot. So an all-flash RAIDZ2 would be - speaking in general terms - faster than a hybrid mirrored vdev pool - but of course, the best performance will only be possible with mirrored vdevs.

If that bothered you, maybe it would be helpful to know you can add the same disk as a spare to multiple pools or, when added to one pool, it can spare across all VDEVs (which in the case of mirrors, should normally be sufficient protection without requiring 3-way mirrors for all VDEVs).
That was helpful! I will think about this possible setup! :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
A lot of other vendors have gone with parity RAID levels to drive more "usable capacity" and are counting on the underlying drive performance to be "good enough" that it doesn't matter.

So an all-flash RAIDZ2 would be - speaking in general terms - faster than a hybrid mirrored vdev pool - but of course, the best performance will only be possible with mirrored vdevs.

Correct. To quote/edit an old post of mine on the subject, RAIDZ on SSD tends to be "fast enough" for general use cases.

VM storage tends to create a lot of 4K and 8K writes, which RAIDZ doesn't do the best job with from a space-efficiency perspective. The wider and more redundant your vdev is, the worse it gets - so your 3-wide Z1 is probably okay, but someone with a 6-wide Z2 will wonder why they're getting the space efficiency of a 3-way mirror.

For the most part, SSDs get to overcome the fragmentation-related performance penalty with raw speed and clear the "fast enough" hurdle for general users. Mirrors will still be faster overall.

I need to set up a test environment to compare and put some numbers against this (eg: 4 SSDs, compare their performance and space-efficiency in both 2x2-mirror and 4-wide Z1, or 6 and compare in 3x2-mirror, 2x3-wide Z1, 1x6-wide Z2) (Edit - I still haven't done this. Shame the Badger.)

But to answer your questions:

1) Is there a significant difference in CPU load between the above setups?
No, the impact of mirror vs RAIDZ and RAIDZ width in terms of CPU utilization is minor. Things like your compression algorithm or encryption settings will have a far bigger impact.

2) Granted all SSDs provide enough IOPS, does it matter as much to choose mirrors over RaidZ(x) for VM storage as the traditional HDD advice?
SSD means it doesn't matter "as much" but it still matters. Mirrors will still be faster than RAIDZ. If the IOPS of a single SSD is sufficient for your use case, and space-inefficiency of RAIDZ on small blocks is tolerable, then you might get some greater usable space from RAIDZ.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Be aware that parity allocation on RAIDZ may not end up working the way that you expect, and can consume excessive storage space in many cases.
 

Scampicfx

Contributor
Joined
Jul 4, 2016
Messages
125
jgreco, do you have an example of that? What exactly should I look out for?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hopefully not going to mangle this explanation too much, but here goes.

When you store data in a parity RAIDZ, you expect or hope that you receive storage efficiency in line with your vdev configuration - eg, if you've created a 6-wide RAIDZ2 (4+2) vdev, the hope is that you receive 4 drives of usable space out of those six. But for smaller blocks, this is less likely, because of how parity, record/volblocksize, ashift (sector size) and compression interact.

Let's say you create a 6-wide Z2, using recommended ashift=12 yielding 4K as a minimum allocation unit, and you create a ZVOL with a 16KB volblocksize to hold your VMs.

If you happen to write 16K, and it doesn't compress at all, then it will spread across all of the vdev members nicely - 16K/4K = 4 drives, and add two for parity. 66% space efficiency from raw, you're happy.

But now you write 16K, and it compresses small enough to fit into 12K. You end up writing 3 sectors of data and 2 sectors of parity - but hold on, RAIDZ also requires each allocation to be a multiple of P+1 sectors (P being the number of parity drives, in your case 2), so you also end up with one sector of padding to make the total number of sectors fit the allocation rule of "must be a multiple of 3." 50% space efficiency. So you've ended up with those six drives only yielding 3 drives of "usable space" - the same cost as a 3x2-way mirror layout, but with significantly worse random I/O.

If it compresses even better - say, 8K - you're going to write two sectors of data, and two of parity. But to meet the "must be a multiple of 3" rule - add two sectors of padding. 33% efficiency. Now you're using six drives and getting 2 drives of "usable space" - the same as a 3-way mirror but with worse IOPS and worse redundancy.
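The three cases above can be reproduced with a few lines of arithmetic. This is only a sketch of the allocation rule as described in this post (data sectors, plus parity sectors per stripe row, rounded up to a multiple of P+1); the function name is made up, and it ignores everything else that affects real-world space usage, such as metadata.

```python
import math


def raidz_sectors(psize_bytes, width, nparity, ashift=12):
    """Return (data, parity, padding) sector counts for one block.

    psize_bytes: size of the block after compression
    width:       number of disks in the RAIDZ vdev
    nparity:     1, 2 or 3 for RAIDZ1/2/3
    ashift:      log2 of the sector size (12 -> 4K sectors)
    """
    sector = 1 << ashift
    data = math.ceil(psize_bytes / sector)
    # Each row of up to (width - nparity) data sectors needs nparity parity sectors.
    parity = nparity * math.ceil(data / (width - nparity))
    total = data + parity
    # The allocation is rounded up to a multiple of (nparity + 1) sectors.
    padded = math.ceil(total / (nparity + 1)) * (nparity + 1)
    return data, parity, padded - total


for kib in (16, 12, 8):
    data, parity, pad = raidz_sectors(kib * 1024, width=6, nparity=2)
    efficiency = data / (data + parity + pad)
    print(f"{kib:>2}K block: {data} data + {parity} parity + {pad} padding "
          f"-> {efficiency:.0%} efficient")
```

For the 6-wide Z2 this gives 67%, 50% and 33% for the 16K, 12K and 8K cases, matching the numbers above. Changing `width` and `nparity` lets you check other layouts before you commit to one.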

Short version is that RAIDZ allocation rules and the resulting padding tend to eat up any space you thought you were going to save by going RAIDZ.

Compare the logical space used vs. the physical space allocated. You might find that it's consuming the same as a mirror, and mirrors will definitely outstrip it from a performance perspective.
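One way to make that comparison is to read the `logicalreferenced` and `referenced` properties of the zvol with `zfs get`. The sketch below assumes those two properties are the right pair for your setup, and the helper names and the dataset name `tank/vmstore` are made up. Keep in mind that compression pushes the ratio down while RAIDZ padding pushes it up, so treat the result as a hint rather than a precise measurement.

```python
import subprocess


def parse_zfs_get(output):
    """Parse tab-separated 'property<TAB>value' lines into a dict of ints."""
    props = {}
    for line in output.strip().splitlines():
        name, value = line.split("\t")
        props[name] = int(value)
    return props


def allocation_ratio(props):
    """Space referenced on disk per byte of logical data."""
    return props["referenced"] / props["logicalreferenced"]


def query_dataset(dataset):
    """Run 'zfs get' in scripted (-H), exact-bytes (-p) mode for one dataset."""
    result = subprocess.run(
        ["zfs", "get", "-Hp", "-o", "property,value",
         "logicalreferenced,referenced", dataset],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_get(result.stdout)


# Example on a machine with ZFS (dataset name is hypothetical):
#   props = query_dataset("tank/vmstore")
#   print(f"{allocation_ratio(props):.2f} bytes on disk per logical byte")
```

A ratio well above what the compression ratio alone would suggest is a sign that padding and parity overhead are eating the space you hoped to gain.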
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
jgreco, do you have an example of that? What exactly should I look out for?

@HoneyBadger did a great job of explaining the general problem, although I think that some even worse examples have been laid out in the past. If you don't actually do the work to figure out what is going on underneath everything with parity and space allocation, you can get into a situation that is incredibly suboptimal. And of course there's the performance angle: RAIDZ is generally going to suck compared to mirrors.
 

Scampicfx

Contributor
Joined
Jul 4, 2016
Messages
125
Dear Gents, thank you for your answers and all your helpful information! :)
 