New pool, 8 drives: 4 x mirrored-pair vdevs OR 1 raidz3 vdev with 8 drives?

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
I'm about to start my TrueNAS adventure, and apart from some hardware- and cache-related questions, my biggest headache right now is the layout of ZFS pools and vdevs. I have noticed in the signatures of several experienced (by tenure and number of posts) members of this forum that they use ZFS pools with multiple mirrored-pair vdevs. So I'm curious: is that really the best layout for my use case?

So basically I have a beefy server with 16 cores and 256 GB of registered ECC DDR4 on a 10 GbE network, and I want to store:
1) Personal + work/business document files (Frequently used on a daily basis, critical, I do not want to lose)
2) Personal photos/videos backed up from devices of all my family members (Weekly dumps, infrequent reads, critical, I don't want to lose)
3) Plex media server with 1080p movies and tv shows (sequential reads of big files, non-critical, I will be fine if I lose this data)
4) SMB and NFS shares for VMs I create in my hypervisor (random read/writes, non-critical, will be fine if I lose this data)

So I now have 8 new drives, 16 TB each, and am thinking about 3 layouts:
A) 1 x raidz3 vdev with all 8 drives (storage capacity of 5 drives, 3 for parity)
B) 2 x raidz2 vdevs with 4 drives each (storage capacity of 4 drives, 4 for parity)
C) 4 x mirrored pairs with 2 drives in each pair (storage capacity of 4 drives, 4 for redundancy)

Along with that I will add a special vdev for metadata, as a 2-drive mirror. I found some Intel Optane drives rated for 10 DWPD over 5 years for this purpose. I will not add any L2ARC or SLOG, since I feel I have enough RAM.
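For reference, this is how I understand the three candidate layouts would look as pool-creation commands (device names are placeholders, and I know TrueNAS would normally do this through the UI):

    # A) one 8-wide raidz3 vdev
    zpool create tank raidz3 sda sdb sdc sdd sde sdf sdg sdh
    # B) two 4-wide raidz2 vdevs
    zpool create tank raidz2 sda sdb sdc sdd raidz2 sde sdf sdg sdh
    # C) four 2-way mirrors
    zpool create tank mirror sda sdb mirror sdc sdd mirror sde sdf mirror sdg sdh
    # plus the special metadata vdev as a mirrored pair
    zpool add tank special mirror nvme0n1 nvme1n1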

The goal is to get the best read/write performance over 10 GbE now and 25 GbE in the future. I have found some online raidz calculator tools, but they all say that no matter which raidz layout I choose there will be no write speed gain; only read speed gains are possible. Is that true?

I also do not quite understand: if, say, 1 drive fails in one of the vdevs in each of the above A, B, or C layouts, which layout has the higher chance of a second drive failure during the rebuild? To me, mirrored pairs seem the scariest, since only the 1 surviving drive of the failed pair does all the heavy lifting to rebuild the data onto a newly added drive, whereas in the wider 8-drive or 4-drive vdevs the rebuild load is distributed among several drives.

The mirrored-pairs layout also sounds tempting because it is the easiest to scale: just add 2 drives every time the pool reaches a certain fill level. For example, if it got to about 50% of capacity I would add 2 drives. I do not want to wait until it gets 70-90% full, since then writes will not be distributed evenly among the mirrored vdevs but will favor the new, empty drives, affecting read/write performance.
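If I go with mirrors, my understanding is that expansion would be a one-liner (placeholder device names again):

    # grow the pool by adding another mirrored pair
    zpool add tank mirror sdi sdj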

Please correct me on my assumptions, since I feel I could be wrong.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
For your use cases 1, 2 & 3, some form of RAIDZ is appropriate. Low IOPS, but that doesn't matter as you are not doing anything that requires significant IOPS. Z2 or Z3 (Z1 is contra-indicated due to rebuild times).

It's your use case 4 that wants higher IOPS - however, the amount of ARC (memory) you have will mitigate that to some extent, depending on the size of the VMs. If you can run the VMs mostly out of ARC AND have a typical 80/20 read/write mix, then IOPS won't matter that much. You have a lot of ARC and could stick with Z2/Z3.
Your other point, that you don't much care about the VMs, also points to Z2/Z3 with sync=disabled on the VM datasets - but you run the risk that a sudden, unexpected/unplanned (by the hypervisor) power cycle of the NAS may corrupt the VMs due to loss of data in flight. This can be mitigated by a decent backup regime (and reliable hardware). If you leave sync enabled (sync=standard), the data will be safe but writes will be slow.
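If you go the sync=disabled route, it's one property per dataset - something like this, with a made-up dataset name:

    # trade write safety for speed on the VM dataset
    zfs set sync=disabled tank/vms
    # revert to the safe default later, e.g. once a SLOG is in place
    zfs set sync=standard tank/vms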

What would I do? Probably run Z2/Z3 and include a single Optane 900p or better as a SLOG, keeping sync enabled on the VM datasets. I probably wouldn't bother with the special vdev unless (as in my case) I had several hundred GB of small files that I wanted to put on it. If I had two suitable Optanes, I'd run one as a SLOG (assuming a home lab) and the other as an L2ARC (metadata only). If you set up the L2ARC (metadata only) from the beginning, all the metadata will end up cached on the device anyway, and it is not pool-critical.
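Roughly like this (placeholder device names, and 'tank' is just an example pool name):

    # one Optane as SLOG, the other as L2ARC
    zpool add tank log nvme0n1
    zpool add tank cache nvme1n1
    # restrict the L2ARC to metadata, so it caches roughly what a special vdev would hold
    zfs set secondarycache=metadata tank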

What Optanes do you have?

As for the number of vdevs - that's more difficult. More vdevs = more parity disks = less available space, but cheaper increments when you add more vdevs later. It's a balancing act, really, and partially comes down to how much space you think you will use in 5 years and, just as importantly, how good your backup is.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The following resource might interest you.

If you will be using a metadata vdev, you want its redundancy to match that of the pool (3 in the case of option A) so that it does not become the pool's weak point.
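For example, to match option A's raidz3 (which tolerates three failures), the special vdev would need to be a 4-way mirror - a sketch with placeholder device names:

    # raidz3 survives 3 drive failures, so the special mirror needs 4 devices to match
    zpool create tank raidz3 sda sdb sdc sdd sde sdf sdg sdh \
        special mirror nvme0n1 nvme1n1 nvme2n1 nvme3n1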
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
FWIW, the combination of mirroring and checksums makes for solid data integrity checking/validation and allows for correction.

To that, even RAIDZ1 would have this same capability, and you get more usable capacity and a bit less overhead than Z2 or Z3.

Another thing to consider: if you ever add more disks, you'll want to match the topology; so if you have an 8-disk vdev, you'd add another 8-disk vdev. With mirrored pairs it's a whole lot easier. :)
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
For your use cases 1, 2 & 3, some form of RAIDZ is appropriate. Low IOPS, but that doesn't matter as you are not doing anything that requires significant IOPS. Z2 or Z3 (Z1 is contra-indicated due to rebuild times).
...
Thank you so much for your answer! I think I will arrange the drives in 2 pools: one pool, mostly for reads, with regular spinning drives in RAIDZ2 or RAIDZ3, and another for the VMs that need high IOPS, built from mirrored-pair vdevs. I got 2 cheap 400 GB Intel S3710s with 10 DWPD; I'll probably buy 2 more so I have 2 mirrored-pair vdevs.
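Something like this, as I picture it (pool names are mine, device names are placeholders):

    # HDD pool for the bulk, read-mostly data
    zpool create bulk raidz2 sda sdb sdc sdd sde sdf sdg sdh
    # SSD pool for the VMs: two mirrored S3710 pairs
    zpool create fast mirror sdi sdj mirror sdk sdl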

SLOG and L2ARC are still a little controversial in my mind. I recently watched a video on the topic in which Chris Peredun from iXsystems said that SLOG and L2ARC compete with RAM and can make things even slower.

And yes, backup is a whole other plan in the making. I'm planning to buy a Synology NAS and run a Syncthing job to it.

For about 3 years I used only a Thunderbolt 3 DAS with 8 drives (an OWC ThunderBay 8) in RAID 1+0, attached directly to my Mac mini, which connects to the switch via 10 GbE. A very limited setup, and I was not happy with OWC's SoftRAID software.
 

rigel

Dabbler
Joined
Apr 5, 2023
Messages
19
The following resource might interest you.

If you will be using a metadata vdev, you want its redundancy to match that of the pool (3 in the case of option A) so that it does not become the pool's weak point.
Thank you so much for this document! I will try to read it soon and hopefully fully understand it. Thanks for the advice! I guess the more drives in the mirrored special vdev, the better.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Thank you so much for this document! I will try to read it soon and hopefully fully understand it. Thanks for the advice! I guess the more drives in the mirrored special vdev, the better.
It has to match the parity of the pool. I.e., if you have a pool of 2 disks in a mirror, you need 2 drives in the special mirror; if you have 5 drives in Z2, you need 3 drives in the special mirror. But you can likely do much the same with a metadata-only L2ARC without it being critical to the pool (and as such you only ever need a single NVMe device).
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Actually - you don't have to have the same parity. You can use different parity levels - but how sensible that is comes down to what you are trying to achieve.

Data vdevs and special vdevs (metadata & dedup) are pool-critical: if you lose one of these vdevs, you lose the pool.
L2ARC is non-critical - lose the vdev, ZFS doesn't care (performance may be affected, but data isn't).
Hot spares are obviously non-critical.
SLOGs are also non-critical UNLESS they fail during an unexpected reboot, in which case the pool should survive BUT you may lose some data in flight.
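That last risk is why some people mirror the SLOG - a sketch with placeholder device names:

    # a mirrored SLOG survives a log-device failure, even across an unexpected reboot
    zpool add tank log mirror nvme0n1 nvme1n1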
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Thank you so much for your answer! I think I will arrange the drives in 2 pools: one pool, mostly for reads, with regular spinning drives in RAIDZ2 or RAIDZ3, and another for the VMs that need high IOPS, built from mirrored-pair vdevs. I got 2 cheap 400 GB Intel S3710s with 10 DWPD; I'll probably buy 2 more so I have 2 mirrored-pair vdevs.

SLOG and L2ARC are still a little controversial in my mind. I recently watched a video on the topic in which Chris Peredun from iXsystems said that SLOG and L2ARC compete with RAM and can make things even slower.

And yes, backup is a whole other plan in the making. I'm planning to buy a Synology NAS and run a Syncthing job to it.

For about 3 years I used only a Thunderbolt 3 DAS with 8 drives (an OWC ThunderBay 8) in RAID 1+0, attached directly to my Mac mini, which connects to the switch via 10 GbE. A very limited setup, and I was not happy with OWC's SoftRAID software.

That works, and the S3710s are good drives with lots of endurance.
L2ARC will definitely compete with RAM.
Not so sure about SLOG - but it's doing something very important if you need one.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
L2ARC is non-critical - lose the vdev, ZFS doesn't care (performance may be affected, but data isn't).
Technically it's not a vdev :tongue:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
So basically I have a beefy server with 16 cores and 256 GB of registered ECC DDR4 on a 10 GbE network, and I want to store:
1) Personal + work/business document files (Frequently used on a daily basis, critical, I do not want to lose)
2) Personal photos/videos backed up from devices of all my family members (Weekly dumps, infrequent reads, critical, I don't want to lose)
3) Plex media server with 1080p movies and tv shows (sequential reads of big files, non-critical, I will be fine if I lose this data)
4) SMB and NFS shares for VMs I create in my hypervisor (random read/writes, non-critical, will be fine if I lose this data)
2) and 3) are clear use cases for raidz# (2 or 3, as 1 is advised against with large HDDs).

So I now have 8 new drives, 16 TB each, and am thinking about 3 layouts:
A) 1 x raidz3 vdev with all 8 drives (storage capacity of 5 drives, 3 for parity)
B) 2 x raidz2 vdevs with 4 drives each (storage capacity of 4 drives, 4 for parity)
C) 4 x mirrored pairs with 2 drives in each pair (storage capacity of 4 drives, 4 for redundancy)
A is fine; B seems space-inefficient (and a single 8-wide raidz2 would not be unreasonable); C raises the issue that, with 16 TB drives, a plain 2-way mirror may suffer the same problem as raidz1: too much non-redundant data in the event of the loss of one drive.

I'd say:
D) One 8-wide raidz2 or raidz3 for the large file stores 2) and 3), plus two S3710s in a mirror for the VMs in 4).
Depending on size and actual use, documents 1) and NFS shares in 4) may go either to the HDD raidz# or to the SSD mirror. (If the critical dataset 1) is on the small mirror, have frequent replication backups to the HDD pool for extra security.)
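A minimal sketch of such a replication backup (pool and dataset names are hypothetical; a TrueNAS replication task wraps the same mechanism):

    # snapshot the critical dataset on the SSD pool...
    zfs snapshot fast/documents@daily-2023-04-05
    # ...and replicate it to the HDD pool
    zfs send fast/documents@daily-2023-04-05 | zfs recv bulk/documents-backup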

I also do not quite understand: if, say, 1 drive fails in one of the vdevs in each of the above A, B, or C layouts, which layout has the higher chance of a second drive failure during the rebuild? To me, mirrored pairs seem the scariest, since only the 1 surviving drive of the failed pair does all the heavy lifting to rebuild the data onto a newly added drive, whereas in the wider 8-drive or 4-drive vdevs the rebuild load is distributed among several drives.
Correct, but resilvering a mirror is faster and puts less strain on the surviving disks than resilvering raidz#.
Another critical point: after losing a drive, a 2-way mirror or raidz1 has no redundancy left and will lose files if there is any further issue (drive failure, URE) during resilver, while raidz2/3 or a 3-way mirror still has redundancy and copes with further failures.
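For completeness, the replacement procedure is the same in either layout (placeholder names):

    # swap the failed drive for the new one, then watch the resilver
    zpool replace tank sdc sdi
    zpool status -v tank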
 