I have 6 8TB disks. RaidZ1 or Z2?

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
I have 6 8TB disks in my backup server, it backs up an 8 6TB RaidZ2 pool from another server via replication.

I know that more Z's is more better, but....

Since this is a backup, is it better/acceptable to run Z1 with that number of disks or is Z2 really the best option for anything more than 4?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
The bigger the drives, the better it is to have "more Z's". I'd go for Z2, because it makes little sense to have a backup that is less secure than the primary, and you can use the primary as the "backup of the backup" in case of a failure in the backup.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
One thing I forgot to mention: I will have a full 3-2-1(ish) backup. In addition to the 2 TrueNAS servers, I plan to back up all the data to an external 16TB USB device. Does that change your recommendation at all? I kinda think probably not, but wanted you to have all the info.

Also, with the Z2 I won't have enough space to back up the entire main server if it gets full. Z1 will give me enough.

Yeah, probably should just buy one more drive, but if I do, my wife will have my ass.
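A quick sanity check on the space argument, as a minimal Python sketch. It assumes usable space is simply (disks minus parity) times disk size, and ignores ZFS metadata, RAID-Z padding, slop space and TB vs TiB, so real figures will come out somewhat lower. The function name `raidz_usable_tb` is just for illustration.

```python
# Rough usable-capacity comparison for the pools in this thread.
# Assumption: usable ~= (disks - parity) * disk size. This ignores ZFS
# metadata, RAID-Z padding/allocation overhead, slop space and the
# decimal TB vs binary TiB difference, so real numbers will be lower.

def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Approximate usable capacity of a single RAID-Z vdev, in TB."""
    if not 1 <= parity <= 3:
        raise ValueError("RAID-Z parity must be 1, 2 or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity drives")
    return (disks - parity) * disk_tb

primary = raidz_usable_tb(8, 6, 2)    # main server: 8x 6TB RAID-Z2
backup_z1 = raidz_usable_tb(6, 8, 1)  # backup option: 6x 8TB RAID-Z1
backup_z2 = raidz_usable_tb(6, 8, 2)  # backup option: 6x 8TB RAID-Z2

print(f"primary Z2: {primary:.0f} TB")
print(f"backup  Z1: {backup_z1:.0f} TB (holds a full primary: {backup_z1 >= primary})")
print(f"backup  Z2: {backup_z2:.0f} TB (holds a full primary: {backup_z2 >= primary})")
```

Under the same rough assumption, the "one more drive" option (7x 8TB in RAID-Z2) gives `raidz_usable_tb(7, 8, 2)` = 40 TB, the same as the 6-disk RAID-Z1.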
 

somethingweird

Contributor
Joined
Jan 27, 2022
Messages
183
Question you should ask yourself: can this backup server go offline for service? If yes, then Z1; otherwise Z2. Resilvering does take some time, and things can happen while resilvering.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
This server CAN go offline for service if needed, but won't the resilvering time be the same? How would I replace a failed drive?

Maybe I'm not understanding what you mean by offline. Do you mean removing data connections, or taking the pool offline somehow? And if I could, how would that mitigate the risk of an additional drive failure that justifies Z2 if it can't go offline?

This is a homelab, so I can do whatever I want with it. The Main server too -- but I might get some guff from the wife if she can't watch a TV show or movie, lol.
 

somethingweird

Contributor
Joined
Jan 27, 2022
Messages
183
Correct, I should explain better. While resilvering (Z1 vs Z2), what happens if you lose another drive? It can happen, and on Z1 that would force you to restore from backup tape/drive/USB. But if this is a homelab, it's really up to you.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Is there any data out there about the rate at which a second disk fails during resilver? I understand the theory and the risk, but what are the actual statistical chances of that?

I understand that as disk size goes up, so does that number, since the system spends more time resilvering larger disks.
 
Joined
Jun 15, 2022
Messages
674
I've understood the theoretical number to be 10% during a hardware RAID-5 rebuild. However, that doesn't strictly apply to ZFS, because a.) ZFS only rebuilds data, not the entire disk, and b.) you have a home server that you can take off-line, which significantly reduces the stress load on the system during the rebuild.

With that said, I'd guess the chance of this happening is reduced by at least a factor of 10 simply by being off-line, and the rest depends on the amount of storage space used, so the chance is small but not insignificant. Doing regular scrubs reduces the chance of failure due to undetected bad sectors, again I'd guess by a factor of at least 10, though probably more.
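To put the "ZFS only rebuilds data" point in numbers, here is a rough Python sketch. The 150 MB/s rebuild rate is a hypothetical constant, not a measured TrueNAS figure; real resilver speed varies with pool layout, fragmentation, load and ZFS version, so only the ratio between the cases means much.

```python
# Back-of-envelope rebuild time: ZFS resilvers only allocated data,
# whereas a classic hardware RAID-5 rebuild rewrites the whole disk.
# Assumption: a constant, hypothetical rebuild rate (RATE_MB_S below).

def rebuild_hours(bytes_to_copy: float, rate_mb_s: float) -> float:
    """Hours needed to rewrite bytes_to_copy at rate_mb_s (decimal MB/s)."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return bytes_to_copy / (rate_mb_s * 1e6) / 3600

DISK_BYTES = 8e12  # one 8TB drive, as in the backup server
RATE_MB_S = 150    # hypothetical sustained rebuild rate

whole_disk = rebuild_hours(DISK_BYTES, RATE_MB_S)
for fill in (0.25, 0.50, 0.90):
    # Data-only rebuild time scales with how full the pool is.
    data_only = rebuild_hours(DISK_BYTES * fill, RATE_MB_S)
    print(f"{fill:.0%} full: ~{data_only:.1f} h data-only vs ~{whole_disk:.1f} h whole-disk")
```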

If we accept, for a moment, the 10% rebuild failure rate as a "worst-case scenario" for a RAID-Z1 array failing during rebuild, RAID-Z2 would apply that 10% a second time, meaning the expected failure rate would be 10% of 10%, or 1%. Factoring in things like rebuilding only the data, scrubs, and rebuilding off-line would lower the expected failure rate further. Most TrueNAS forum members seem to run RAID-Z2.

On that note, if a person did not yet have a 3-2-1 backup solution in place, running RAID-Z3 on enterprise-level drives would result in a worst-case expected failure rate of 10% * 10% * 10% = 0.1%. That said, the chance of losing data to user error when deleting files is near 100%, so turn on snapshot scheduling.
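The arithmetic in the two paragraphs above, as a small Python sketch. It takes the quoted 10% worst-case figure at face value and, like the estimate itself, assumes each further failure during the rebuild is independent. In practice drives from the same batch and age tend to fail together, so treat the results as rough illustrations rather than real odds. The function name is just for illustration.

```python
# Each extra parity drive means one more failure must happen during the
# rebuild before the pool is lost, so the per-event figure is multiplied in.
# Assumptions: the 10% worst-case figure quoted for enterprise drives, and
# independent failures (real failures are often correlated).

def rebuild_loss_probability(per_event: float, parity: int) -> float:
    """Chance that `parity` further failures all occur during a rebuild."""
    if not 0.0 <= per_event <= 1.0:
        raise ValueError("per_event must be a probability")
    if parity < 1:
        raise ValueError("parity must be at least 1")
    return per_event ** parity

for level in (1, 2, 3):
    p = rebuild_loss_probability(0.10, level)
    print(f"RAID-Z{level}: {p:.1%}")
```

Plugging in the roughly 50% consumer-drive figure from the next paragraph instead gives 0.5 ** 2 = 25% for RAID-Z2 under the same assumptions.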

Now, I should note that the 10% figure is for enterprise-level drives. For consumer-grade drives it is somewhere around 50%, depending on how the array is structured. You can, if you're a math person, read about it further:
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
I am using enterprise drives. So I'll go with the 10% number. I'll probably just go with Z1 for now for the extra space. Since this is a backup target only, if I want to move to Z2 later, I don't really mind killing the pool and reconfiguring. Since I will have that external USB backup as last resort, I would have to have a really epic disaster to lose everything.

I'd have to have all 3 sources fail completely, and that's extremely unlikely, although nonzero. I'm OK with that risk level for the data I have.

So, just to be clear, by "rebuild offline," do you mean to sever all external connections to the server so that users aren't making demands?

My servers are headless and in a location without a monitor, so I would need to leave the network connection alive. So I'd just stop any replication and turn off NFS, SMB, etc.?
 
Joined
Jun 15, 2022
Messages
674
Solid reasoning.

"Rebuild offline" means different things to different people. It could mean disabling the share that lives on the affected pool, or turning off the SMB service completely; it depends on your setup. Basically, don't let users access the share while the system is doing a rebuild.
 