Rebuilding due to poor performance, help me get it right this time?

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
My server is a Dell R720xd, dual E5-2637 v2 @ 3.50GHz with 256GB RAM. It has 12 x 4TB SAS drives and two further 2.5" bays in the rear, empty at the moment. I have 2 x 10GbE DACs to a Unifi 10GbE switch, and two ESXi 6.7 hosts are attached to the same switch in the same way.

My main goal is pure storage performance for iSCSI, with protection of data of course. I previously didn't use multipathing for iSCSI, but I plan to this time. For comparison, I have the ESXi hosts linked over 1GbE to an old QNAP NAS with a couple of iSCSI extents, and I get better speeds out of that than I do over 10GbE with Free/TrueNAS. I've read and watched loads of YouTube videos on Free/TrueNAS and threads on here, and sometimes I see or hear conflicting information, or recommendations that applied to older versions and aren't required or as valid these days due to improvements in ZFS.

I have the 2 x 2.5" bays I can use for SSDs for a SLOG/ZIL or cache, and can also add an NVMe drive for one of these functions, though I've read that more RAM is better than SSDs and 256GB should be way more than enough. I had my drives set up in one large Z2 pool, though I've read it might be better to split these into 6 x 2-drive mirrors. Is that best? Open to suggestions on disk layout, block sizes, multipathing, tuneables, etc.

Below are the autotune tuneables. Are these still valid, should they be changed or ditched, or should new ones be added?

tuneables.png
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Mirrors are generally recommended for block storage. In a large config like yours, striped/multi-vdev mirrors would be much faster. It really comes down to being able to do more than one I/O operation at a time. When you write to a RAIDZ vdev, that I/O has to be issued to every device in the vdev and complete before the write is acknowledged. When you read, you have to issue I/O to roughly two out of three devices and perform some math in order to provide the data to the requesting application. In effect, the large RAIDZ pool you built likely has the write performance of a single disk and the read performance of perhaps two or three disks.

In a mirror config, the write IOPS issue in parallel to each mirror half, and the read IOPS issue round-robin. Each disk has a complete copy of the data, so each can do some of the work and return data in parallel. Read performance in a simple 2-disk mirror will usually be about 2x the performance of each component disk. When you add multiple mirror vdevs, you start getting parallel write IOPS: a chunk gets issued to the first mirror pair, and ZFS then walks through the vdevs until each pair has some work to perform. You can vary the mirror config further to enhance read performance by adding additional mirror components. A 3-way mirror vdev will have 3x the read performance and 1x the write performance of a single disk; a pool of two 3-way mirror vdevs will have 6x read and 2x write, and so on.
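To make that concrete, here is a rough sketch of the striped-mirror layout from the command line (the pool name "tank" and device names da0 through da11 are placeholders; on FreeNAS/TrueNAS you would normally build this through the GUI instead):

```sh
# Hypothetical example: six striped 2-way mirror vdevs from 12 disks.
# Device names (da0..da11) and the pool name (tank) are placeholders.
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7 \
  mirror da8 da9 \
  mirror da10 da11

zpool status tank   # should show six mirror vdevs striped together
```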

If you can't afford the 50% space hit of a mirror config, then configure multi-vdev RAIDZ2: create a 4-disk RAIDZ2 vdev, then add two more 4-disk RAIDZ2 vdevs to it. This gives you a performance multiplier in that ZFS can round-robin between three vdevs, and you'll have the redundancy of being able to suffer two drive failures per vdev without losing data. You could configure multi-vdev RAIDZ1, but that isn't a recommended config with 4TB disks.
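And the equivalent sketch for the 3 x 4-disk RAIDZ2 layout, with the same placeholder names as above:

```sh
# Hypothetical example: three 4-disk RAIDZ2 vdevs striped into one pool.
zpool create tank \
  raidz2 da0 da1 da2 da3 \
  raidz2 da4 da5 da6 da7 \
  raidz2 da8 da9 da10 da11
```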
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Wouldn't three 4 disk RaidZ2 vdevs still give you a 50% storage hit? What about two 6 disk RaidZ2 as a better compromise between storage and resiliency?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Wouldn't three 4 disk RaidZ2 vdevs still give you a 50% storage hit? What about two 6 disk RaidZ2 as a better compromise between storage and resiliency?

The original post's stated purpose is high performance iSCSI, and three vdevs will perform better than two. I otherwise confess to not spending much time on the RAID calculations; you are indeed correct about the 50% penalty.

I recommend some kind of mirror config for iSCSI block storage.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks for the replies. I'll probably go with 6 x 2-drive mirror vdevs and see how I get on, as it seems whatever layout I pick will end up at roughly 50% usable space. I'll also try another 10G Intel NIC instead of the onboard one, though I didn't see any indication the current card was reporting issues.

edit: Looks like if I created 3 x [4 drive mirror] this would result in only 6TB usable space total, which is too much to lose out of ~48TB of raw disk capacity. 3 x [4 drive z2] would give 12TB usable, which still wastes a lot of space but should result in much better performance? If I did 4 x [3 drive z1] then I'd gain another 4TB usable but obviously carry more risk of losing multiple drives. A lot to think about.

What about SLOG, cache, etc. Is it worth adding something? I also run some jails (Syncthing and others), have CIFS shares, and run rsync from the QNAP NAS.
 
Last edited:

LeDiable

Dabbler
Joined
May 6, 2020
Messages
36
3 x [4 drive z2] would give 12TB usable
I'm the furthest thing from an expert, but you may have miscalculated the total capacity of the 3 x 4drive z2 setup, it should be 24TB.
One z2 vdev = (# drives -2) * (smallest drive's capacity). So, in your case, vdev = (4 drives -2) * 4TB = 2 * 4TB = 8TB. If you do 3 z2 vdevs, you'd have 3 * 8TB = 24TB.
But, from other threads, it always sounds like 2 drive mirrors are best for performance, and a 6 x 2drive mirror pool would also have about 24TB capacity, if you can accept the higher risk factor.
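A quick shell sanity check of that formula against the 12 x 4TB drives in this thread (ZFS metadata, swap partitions, and TB-vs-TiB overhead are ignored, so real-world numbers come out a little lower):

```sh
# Usable capacity per vdev = (drives - parity disks) * smallest drive size; sum over vdevs.
echo $(( 3 * (4 - 2) * 4 ))   # 3 x 4-disk RAIDZ2  -> 24 (TB)
echo $(( 6 * 1 * 4 ))         # 6 x 2-way mirrors  -> 24 (TB)
echo $(( 2 * (6 - 2) * 4 ))   # 2 x 6-disk RAIDZ2  -> 32 (TB)
```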

Again, not an expert here, just parroting things I've read in other threads.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Do a "data budget". 10GbE is going to top out around 700 MB/sec of real-world throughput, maybe a little faster. What's the read and write rate of each drive? Say 150 MB/sec sustained, just for discussion purposes... 700 / 150 = 4.6 devices working in concert to fill your 10GbE pipe. Hence six 2 x 4TB mirror vdevs, with 24TB capacity. If you go RAIDZ2, you may get close on read performance, but miss on write performance.
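The same data budget as a one-liner, using the assumed numbers above (~700 MB/s of usable 10GbE throughput, ~150 MB/s sustained per spindle; both are round figures, not measurements):

```sh
# How many disks need to stream in parallel to keep the 10GbE link busy?
echo "scale=1; 700 / 150" | bc   # -> 4.6
```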
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks, that is what I expected as well, but I did a test inside a VM and this is what it shows; I was surprised and can't understand why it's doing that. Obviously it's showing 4GB disks; that isn't an error, the drives are only 4GB thin-provisioned in the VM to imitate 4TB, so it would scale up. After creating the pool it did indeed show only 10.56 GB free (which would be TB at full scale).
 

Attachments

  • test_pool.png

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
I'm the furthest thing from an expert, but you may have miscalculated the total capacity of the 3 x 4drive z2 setup, it should be 24TB.
One z2 vdev = (# drives -2) * (smallest drive's capacity). So, in your case, vdev = (4 drives -2) * 4TB = 2 * 4TB = 8TB. If you do 3 z2 vdevs, you'd have 3 * 8TB = 24TB.
But, from other threads, it always sounds like 2 drive mirrors are best for performance, and a 6 x 2drive mirror pool would also have about 24TB capacity, if you can accept the higher risk factor.

Again, not an expert here, just parroting things I've read in other threads.
Assuming your 4TB drives...

Capacity:
If you were to set up all your vdevs as two-way mirrors, each mirror would have approximately 4TB usable capacity and six mirrors would yield 24TB usable. A 4-disk RaidZ2 vdev will have about 8TB usable, therefore three 4-disk RaidZ2 vdevs will also yield 24TB. A 6-disk RaidZ2 vdev will give 16TB usable, therefore two 6-disk RaidZ2 vdevs will yield 32TB. At the end of the day, of the options above you get the most capacity from a pool composed of two 6-disk RaidZ2 vdevs.

I/O:
Mirrors will give you the best write performance. Read performance will also be better with mirrors, but the difference in performance of reads is less pronounced than it is with writes. This is a generalization as file size and other factors will also play a role in performance.

Resilience:
Mirrors have one-disk resilience. If you lose one disk in a mirror, you lose your redundancy and data could be at risk. With RaidZ2, you can lose two disks before data is at risk. If you have a large pool made up of mirrors and two disks in the same mirror fail, then you lose the pool. Therefore, I would argue that a bunch of mirrors is not safer than a larger RaidZ2 pool. Yes, it is true that if you lose two disks from separate mirrors you can still recover, but with RaidZ2 you can lose any two disks.

There really is no single "best" way to configure a pool. At the end of the day, it will be a trade-off of capacity versus I/O versus resilience. Of course, you can always configure two pools - one oriented more to write performance and the other oriented more to resilience.
 

Zak95

Dabbler
Joined
Mar 4, 2018
Messages
20
Thanks guys, really appreciate all the replies, all very helpful.

Did you look at the latest picture attachment? I also thought a 4 x 4TB Z2 would mean 8TB usable, multiplied by 3; however, when I test in a VM just to try out various layouts it offers less than half that space, and I can't figure that out. To be clear, I am definitely not running FreeNAS in a VM; I am just using one to test configs without harming my existing setup.

**edit** Learned something about testing inside a VM: if it thinks the drives are really small, it clearly doesn't like that. Setting them to 4TB and trying the same layout does indeed give the expected size.

test2.png


Sorry to ask one more question though: what about the cache or SLOG? I can add SSDs and/or an NVMe drive, even two if required.

Thanks.
 
Last edited:

rvassar

Guru
Joined
May 2, 2018
Messages
972
Sorry to ask one more question though: what about the cache or SLOG? I can add SSDs and/or an NVMe drive, even two if required.

Cache, a.k.a. L2ARC... Always max out your system RAM first. RAM is the primary ARC, or "L1ARC" if you want to think of it that way, and it is much, much faster. Adding L2ARC actually adds some overhead too, since the headers for the cached blocks are tracked in RAM.

SLOG will only help with synchronous (O_SYNC) writes, and your sync write performance will never be faster than the SLOG device. I'm guessing since this is a Dell 12G server the NVMe device would be PCIe; that would probably work well for iSCSI once you tuned it and dialed it in.
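If you do go down that road, here is a minimal command-line sketch of adding the devices, assuming a pool called "tank", a FreeBSD NVMe device node nvd0, a spare SSD da12, and a zvol named tank/iscsi-zvol (all of these names are placeholders, and on FreeNAS/TrueNAS you would normally do this from the GUI):

```sh
# Hypothetical: attach an NVMe device as SLOG and an SSD as L2ARC.
zpool add tank log nvd0       # separate intent log (SLOG)
zpool add tank cache da12     # L2ARC read cache (only worth it once RAM is maxed out)

# Make the iSCSI zvol write synchronously so the SLOG is actually used:
zfs set sync=always tank/iscsi-zvol
```

The sync=always step is the commonly recommended trade-off for VM storage over iSCSI: without it many writes are treated as asynchronous and the SLOG sits idle, but forcing it does cost some performance in exchange for safety.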
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Would it be an option to run the VMs off of local storage on the ESXi hosts and do frequent backups to the NAS? This is what I came up with for myself, having done the capacity/performance/redundancy exercise.
 