ZFS Volume / Dataset best practices


Zon

Cadet
Joined
Aug 2, 2014
Messages
8
I'm a FreeNAS noob who's been experimenting with it for about the last 6 months. I've decided to migrate off a dedicated hardware NAS (with four 2TB drives in RAID5) to an AMD A8-6600K quad-core 3.9GHz machine with 16GB RAM running FreeNAS-9.2.1.6-RELEASE-x64. The machine has four 2TB drives in a RAIDZ1 volume for testing, with about 4TB of the 6TB currently used by live data.

I am adding four 4TB drives, and my thought was to configure them as a separate RAIDZ2 array, copy the data off the test RAIDZ1 array, then copy the data from the hardware NAS, and finally extend the RAIDZ2 volume with the eight no-longer-used 2TB drives as a second RAIDZ2 vdev.
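
From the command line I'm picturing something like the following (made-up pool and device names; I realize the FreeNAS GUI would normally do all of this for me):

Code:
# 1. Build the new pool from the four 4TB drives as a single RAIDZ2 vdev
zpool create tank raidz2 da0 da1 da2 da3

# 2. Copy the data over from the test pool and then from the hardware NAS
#    (rsync, zfs send/receive, whatever works)

# 3. Re-use the eight 2TB drives as a second RAIDZ2 vdev in the same pool
zpool add tank raidz2 da4 da5 da6 da7 da8 da9 da10 da11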

Having a single ZFS volume makes management of the storage easier; however, I assume I am increasing my risk by having 12 drives in one volume, even though there would be multiple vdevs? If any drive fails, the entire volume is compromised, correct? And by inference, is there no advantage to configuring the 8 remaining 2TB drives as two 4-drive RAIDZ2 vdevs rather than a single 8-drive RAIDZ2 vdev? I assume risk, performance, and ease of administration are identical in either configuration, right?

During my experimenting, when I realized that different datasets are seen as different storage volumes by FreeNAS, I decided I would use a single main media dataset. I had originally thought I should use separate datasets for the different kinds of data (large video files, documents, pictures, music, home directories, upload/download directories, etc.) because that gave me granularity of compression, for example.

However, I very commonly move (cut/paste) large files across these domains. Having them in different datasets makes this a very slow byte-by-byte copy operation rather than the very fast "just update the pointers" move it would be within a single dataset. I realize this loses me the configuration granularity that multiple datasets would give, but it seems like the speed of many future operations and the simplicity of management make this a reasonable trade-off. Is there an obvious negative that I am missing?
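
For example, with separate datasets the layout would be something like this (made-up names), and because each dataset is its own filesystem, a mv between them has to copy the data:

Code:
# Each dataset is a separate filesystem under the pool's mountpoint
zfs create tank/media
zfs create tank/documents

# Within one dataset: mv is just a rename, effectively instant
mv /mnt/tank/media/movie.mkv /mnt/tank/media/renamed.mkv

# Across datasets: mv becomes a full copy followed by a delete,
# because source and destination are different filesystems
mv /mnt/tank/media/movie.mkv /mnt/tank/documents/movie.mkv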

Finally, percentage-wise, quite a bit of my data (h.264 video, JPEGs, MP3s, etc.) is incompressible. Given the CPU of my system, is it reasonable to leave the default compression of the single large dataset as lz4? The system will just be doing file server duty, so wasting cycles trying to compress/decompress incompressible data doesn't harm anything, presuming it can keep up with a 1G LAN.
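
In other words, my plan is to just leave something like this in place (dataset name made up) and keep an eye on the ratio:

Code:
# lz4 is cheap and gives up early on incompressible data
zfs set compression=lz4 tank/media

# Check what it's actually achieving
zfs get compression,compressratio tank/media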

Thanks for any suggestions!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You lose the pool if any one vdev fails.

A RAIDZ2 vdev can tolerate two drive failures before failing.

So, in a pool with two RAIDZ2 vdevs, you can always survive any two drives failing. With luck, up to four drives could fail. (Obviously, you want to replace any dead drives ASAP so it doesn't come to this)
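
As a concrete sketch (example pool and device names), a pool laid out like this:

Code:
# One pool, two RAIDZ2 vdevs of four disks each
zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7

# Any 2 failed disks: pool survives
# Up to 2 failed disks per vdev (4 total): pool survives
# 3 failed disks in the same vdev: that vdev fails and the whole pool is lost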


Compression will not bottleneck GbE. If you want, you can disable it, though.
 

Zon

Cadet
Joined
Aug 2, 2014
Messages
8
You lose the pool if any one vdev fails.
So, I guess this is an argument for configuring the 8 2TB drives as two separate 4-drive RAIDZ2 vdevs rather than a single 8-drive RAIDZ2 vdev, since in the two vdev case you could theoretically lose as many as 4 drives (2 in each vdev) without losing data?
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
So, I guess this is an argument for configuring the 8 2TB drives as two separate 4-drive RAIDZ2 vdevs rather than a single 8-drive RAIDZ2 vdev, since in the two vdev case you could theoretically lose as many as 4 drives (2 in each vdev) without losing data?
No, he's saying that you could lose your pool if just one of your vdevs fails.
This means that if, say, you have 3 failed drives and all of them are in one 4-drive vdev, your entire pool is gone, even though you're still under that theoretical "4 drive" maximum.

Two 4-drive RAIDZ2 vdevs use 4 drives for parity.
A single 8-drive RAIDZ3 vdev uses only 3 drives for parity and gives you true 3-drive fault tolerance: you don't have to worry about which drives the failures land on. (ZFS tops out at triple parity, so there's no RAIDZ4.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, but then you can use those calculators to calculate the chance of failure. I've got 10x6TB drives in a RAIDZ2 and I feel safe with my data.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Yes, but then you can use those calculators to calculate the chance of failure. I've got 10x6TB drives in a RAIDZ2 and I feel safe with my data.
Yeah, chance is pretty small. I was merely answering his question. Not really favoring one way over the other.
 


Zon

Cadet
Joined
Aug 2, 2014
Messages
8
Thank you for all your advice and feedback! I think I'll go with a RAIDZ2 vdev for the four 4TB drives combined with a RAIDZ3 vdev for the eight 2TB drives (all of them are older, and 4 of them are much older) in a single volume. I've read through cyberjock's guide suggesting that, for performance, RAIDZ3 vdevs should have a total number of drives equal to 2n + 3, and 8 drives doesn't fit that. My hope is that stripes not falling on 4k boundaries won't be the big performance impediment his presentation implies it can be.
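
If I were doing the second step from the shell rather than the GUI, I believe it would look roughly like this (made-up pool and device names; as far as I understand, zpool warns about mixing parity levels within a pool and wants -f):

Code:
# Add the eight 2TB drives as a RAIDZ3 vdev to the existing RAIDZ2 pool
# (-f because the replication levels of the two vdevs don't match)
zpool add -f tank raidz3 da4 da5 da6 da7 da8 da9 da10 da11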

I assume from the absence of comments that there's no real issue with using a single ZFS dataset for all data, other than losing the ability to tweak settings like compression on a more granular basis?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, the 2n + p rule goes out the window. I updated my presentation yesterday to reflect that.

The performance impact of not aligning to 4k sectors can range from negligible to enormous; for your standard home user it's negligible.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I assume from the absence of comments that there's no real issue with using a single ZFS dataset for all data, other than losing the ability to tweak settings like compression on a more granular basis?

Yeah, a single dataset is fine. You only lose flexibility.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I assume from the absence of comments that there's no real issue with using a single ZFS dataset for all data, other than losing the ability to tweak settings like compression on a more granular basis?
The other thing you have to keep in mind: ZFS datasets are treated as separate filesystems. This means that moves between different datasets are not simple pointer operations; the data actually gets copied.
 