OK, I will try one more time (7th try at least; how's the saying go? 7th time's the charm...) and ask in the simplest way I can think of, since I must have confused the hell out of everyone thus far. It's a simple question, and the first person to answer danced all around my original question, almost like they were deliberately trying to evade it. Don't ask who.
Here it goes:
I will ask it by way of an example.
e.g.: I have a 24-disk RAID 5 array, all 4TB drives. How many disks' worth of capacity do I lose to parity? 1 disk? And then I have 92TB of usable space, correct? Yes or no?
e.g.: I have a 24-disk RAID 6 array, all 4TB drives. How many disks' worth of capacity do I lose to parity? 2 disks? And then I have 88TB of usable space, correct? Yes or no?
...can tolerate the loss of one disk...
...can tolerate the loss of two disks...
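For anyone following along, here is the arithmetic those yes answers rest on, as a rough Python sketch. It assumes plain raw TB (no TiB conversion or filesystem overhead) and classic single/double-parity RAID, not ZFS specifics:

```python
# Rough sketch: usable capacity when parity costs a fixed number of disks' worth of space.
def usable_tb(total_disks, disk_tb, parity_disks):
    return (total_disks - parity_disks) * disk_tb

print(usable_tb(24, 4, 1))  # RAID 5: one disk of parity  -> 92 TB usable
print(usable_tb(24, 4, 2))  # RAID 6: two disks of parity -> 88 TB usable
```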
Now that I have confirmed what I thought, your comment does look obvious. Someone said something once that made me think that parity data grows with the number of disks in an array, but I must have misunderstood. I had thought that parity could be just so many bits or bytes per block or chunk of data; I just wasn't entirely sure. Thanks for that confirmation. Now I can go to sleep and not have nightmares. ;)

I thought this was obvious from my comment above...
My first build idea was a 6x 4TB.
I agree, compared with 10x 2TB it would fit a smaller case (especially that nice Node 304) and consume less energy.
However, it has a slightly higher initial cost and can't really be expanded much just by swapping out the disks.
In general, 'wasting' 1/3 of the disk space on parity... idk, it just seems too much.
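Roughly, in numbers (a quick sketch assuming double parity, i.e. RAIDZ2, for both layouts; raw TB, no TiB conversion or overhead):

```python
# Compare the two candidate builds: usable space and fraction of raw space spent on parity.
def layout(disks, disk_tb, parity_disks=2):
    usable = (disks - parity_disks) * disk_tb
    overhead = parity_disks / disks
    return usable, overhead

for disks, size in [(6, 4), (10, 2)]:
    usable, overhead = layout(disks, size)
    print(f"{disks} x {size}TB: {usable} TB usable, {overhead:.0%} of raw space on parity")
# 6 x 4TB : 16 TB usable, 33% on parity
# 10 x 2TB: 16 TB usable, 20% on parity
```

Both land at the same usable space; the 6-disk build just pays a larger fraction of it in parity.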
Pretty tough decision.
Thanks for the welcome btw :)
About the parity disk discussion, I'm not sure Dusan is completely right about the parity data being spread across all drives in the array.
Sure, on some systems/controllers that is true, but I'm not sure that is the general rule on hardware RAID controllers.
On NetApp the parity data is on specific drives. You can even see which ones are data drives and which are parity drives.
I assume this would be true on FreeNAS as well.
You realize that whole conversation confuses the heck out of me. You seemed to know what you were talking about, but apparently you didn't know what you knew. LOL.
I never fully understood that whole math thing with the n+ something, something, even though I read many things and looked at wikis and Wikipedia numerous times.

I have a couple of supermicro e1r36n systems at work, a primary and a replication target, and even there I went with 6 RAIDZ2 vdevs. It just didn't seem like there was a better way of carving up 36 drives, considering production best practice is 2n+2 for RAIDZ2 where n should not exceed 6, hot spares don't really work, etc.
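For what it's worth, here is the carve-up math as a small sketch. These are hypothetical numbers only; the widths below are simply the ones that divide 36 evenly, and whether 6-wide is the "right" width is exactly the rule-of-thumb question above:

```python
# How 36 drives can be split into equal-width RAIDZ2 vdevs, and what fraction stays usable.
total_drives = 36
for width in (4, 6, 9, 12):              # candidate vdev widths that divide 36 evenly
    vdevs = total_drives // width
    data_per_vdev = width - 2            # RAIDZ2: two drives' worth of parity per vdev
    usable_fraction = vdevs * data_per_vdev / total_drives
    print(f"{vdevs} x {width}-wide RAIDZ2 -> {usable_fraction:.0%} of raw capacity usable")
# The 6 x 6-wide layout keeps 2/3 of raw capacity, with 4 data + 2 parity drives per vdev.
```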
And finally forums.freenas.org turns into www.food.net

Seal Flipper Pie Recipe
The assumption is wrong. Maybe NetApp does it that way for some reason, but ZFS does distribute the parity. Spreading the parity across all devices also gives you better read performance. With ZFS' dynamic stripe size it would not even make sense to store the parity on a specific device. Here's a diagram that demonstrates how ZFS dynamic stripe size + parity works: https://pthree.org/2012/12/05/zfs-administration-part-ii-raidz/ (the p blocks are parity).
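To make the read-bandwidth point concrete, here is a toy model (not ZFS code, just made-up counters): if parity always sat on one dedicated device, that device would see no reads during normal operation, while rotating the parity position spreads reads evenly.

```python
# Toy model: count how many times each disk is read over a run of full-stripe reads,
# once with a fixed parity disk and once with the parity position rotating per stripe.
def read_counts(n_disks, n_stripes, rotate_parity):
    reads = [0] * n_disks
    for stripe in range(n_stripes):
        parity_disk = (stripe % n_disks) if rotate_parity else n_disks - 1
        for d in range(n_disks):
            if d != parity_disk:         # parity is not read during normal reads
                reads[d] += 1
    return reads

print(read_counts(5, 1000, rotate_parity=False))  # [1000, 1000, 1000, 1000, 0] : last disk idle
print(read_counts(5, 1000, rotate_parity=True))   # [800, 800, 800, 800, 800]   : evenly loaded
```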
OK, so my word and an independent blog are not enough :). I'm not familiar with NetApp, but I'm 100% sure about ZFS. The only definitive "documentation" is the source code itself. However, let's try an experiment you can repeat yourself. I hope you agree that when you write a file (let's assume a big file, not a few bytes) it makes sense, performance-wise, to spread the write across as many devices as possible. In a system that uses a dedicated parity drive you should see write activity on all drives (all data + parity). However, when reading there is no need to read the parity unless you are resilvering. So, now try to run "zpool iostat -v 1" and do some bigger reads and writes. If there is a dedicated parity drive you should see one drive idle when doing reads. In reality you will see almost equal activity across all the drives. If this is still not convincing proof, let's look at the source code. You do not need to understand C, as there is this interesting comment in vdev_raidz.c: https://github.com/trueos/trueos/bl...ensolaris/uts/common/fs/zfs/vdev_raidz.c#L560

Alright, maybe I am. I read the blog post you linked to, but are you sure this guy is correct?
I assume FreeNAS does the parity thing the same way as NetApp, because NetApp Data ONTAP is running FreeBSD. Also, the "structure" of their volumes/pools is a lot alike.
* If all data stored spans all columns, there's a danger that parity
* will always be on the same device and, since parity isn't read
* during normal operation, that that device's I/O bandwidth won't be
* used effectively. We therefore switch the parity every 1MB.
It's not just because of performance. It is also the result of the variable stripe size. Even if you did not try to move the parity around, writes that are not "aligned to the number of data drives" would move the parity to a different drive. I'll use the image from the first blog post:

If this is done because of performance, it is strange that NetApp has parity on dedicated drives.
I'll look more into that myself...
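To illustrate the variable-stripe-size point: even a naive allocator that just starts each write at the next free column ends up putting parity on different devices, simply because the writes have different widths. A toy sketch follows (this is not the real RAIDZ allocator, only an illustration of the idea):

```python
# Toy illustration: variable-width writes naturally scatter the parity column across disks.
n_disks = 5
cursor = 0                                   # next column to allocate, wrapping around the disks
for size in (4, 2, 3, 1, 4):                 # data sectors per write (variable stripe size)
    parity_disk = cursor % n_disks           # put parity in the first column of the stripe
    data_disks = [(cursor + 1 + i) % n_disks for i in range(size)]
    print(f"write of {size} sectors: parity on disk {parity_disk}, data on disks {data_disks}")
    cursor += 1 + size                       # parity column + data columns consumed
```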
Yes, the verification happens on all reads. However, you need to understand that checksum and parity are two different things. A checksum is just a hash (a single number) that tells you whether the block is consistent. The parity is a bigger chunk of data that allows you to actually reconstruct corrupted information. The checksum is stored in the block pointer (the parent block) and is used to verify that the block that was read is OK. Only when the checksum doesn't match does ZFS read the parity and try to reconstruct the data.

I read somewhere that ZFS always checksums what it reads... so that it is sure what it delivers is correct.
How can that be if parity is not read during read operations?
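In rough pseudocode form, the read path described above looks like this. This is a conceptual sketch, not ZFS source; SHA-256 here just stands in for whatever checksum the pool actually uses:

```python
import hashlib

# Conceptual sketch: the checksum (stored in the parent block pointer) is verified on
# every read; the parity is only read and used if that verification fails.
def read_block(read_data_columns, expected_checksum, reconstruct_from_parity):
    data = read_data_columns()                            # normal read: data columns only
    if hashlib.sha256(data).digest() == expected_checksum:
        return data                                       # checksum OK, parity never touched
    data = reconstruct_from_parity()                      # only now is the parity read
    if hashlib.sha256(data).digest() != expected_checksum:
        raise IOError("unrecoverable checksum error")     # reconstruction did not help either
    return data
```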