Possibly some very stupid questions...

paulinventome

Explorer
Joined
May 18, 2015
Messages
62
So I am learning about ZFS and set ups as I start building some hardware. I just read a whole thread about the CoW and what it means to ZFS but I am just not quite 100% there yet with understanding.

I'm building two main storage volumes.

One will be 8x16TB drives, in a RAIDZ2 (or RAIDZ1 if I can get away with it). This will be more work-based storage, which is mainly video files; these could be 1 to 2TB for source footage, then intermediate vfx files, and also folders of .exr files. So these projects will build up. Then once a project is finished, or at various stages of a project, the data will be archived to LTO tape.

But do I understand correctly that if I then delete the project, freeing up say 10TB of storage, the way ZFS works means that 10TB might not be fully reusable, and/or the fragmentation would lead to performance issues? As I understand it, ZFS will just write into free space. I have read that in order to 'clean' the system, data is copied off, the drives are formatted, and it's all copied back on...

I am hoping that I have just misunderstood?

Then my other storage volume would be NVMe-based, and that would be hit all the time as online immediate storage: files read, written and deleted constantly. It's the working drive, so very quickly this would fragment.

So really it's a follow-on question: what kind of setup for NVMe would be most efficient here?

I haven't really come across any other software that would work either. I use Synology at the moment and that's been fine for 8+ years, but I need to migrate or sink a load of money into more Synology hardware. I like the idea of TrueNAS because I can build exactly what I want...

thanks
Paul

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
One will be 8x16TB drives, in a RAIDZ2 (or RAIDZ1 if I can get away with it)
Don't do RAIDZ1 with drives of that size.
I am hoping that I have just misunderstood?
I'm not quite sure what the concern is. If you delete files, assuming they aren't still referenced in a snapshot, the relevant blocks are marked as free. The system can then use them as it needs to. I'm not sure what you're referring to as "cleaning" the system, or why you'd want to copy the data off, format the drives, and copy the data back on.

paulinventome

Explorer
Joined
May 18, 2015
Messages
62
Don't do RAIDZ1 with drives of that size.

I'm not quite sure what the concern is. If you delete files, assuming they aren't still referenced in a snapshot, the relevant blocks are marked as free. The system can then use them as it needs to. I'm not sure what you're referring to as "cleaning" the system, or why you'd want to copy the data off, format the drives, and copy the data back on.
You mean don't just have one parity drive? Or do you mean something else when you said don't do RAIDZ1?

Well, as I understand CoW, as time goes on files are written into free space, blocks are freed, and files become fragmented. Beyond a certain level of usage the system slows noticeably, files become more fragmented, and since there is no defragmenting, the whole thing eventually becomes slower and slower.

The enterprise view is to just add more capacity--almost like treating drives as WORM (write once, read many). Or, as I've read on the forums here, copy everything off and then back on (which for 96TB isn't going to happen).

I presume resilvering is about adding capacity or is it a form of cleaning up when the system is so fragmented?

Kindest
Paul

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You mean don't just have one parity drive?
Pretty much. Though I think the "RAID5/RAIDZ1 is dead" argument exaggerates the danger, it's still a legitimate issue--if you Google "RAID 5 is dead" you'll find plenty written about it.
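For a rough sense of the numbers behind that argument, here's a back-of-envelope sketch. The URE rates are vendor spec-sheet assumptions; real drives often do better, and ZFS limits a read error during resilver to the affected records rather than necessarily losing the pool:

```python
import math

# "RAID5/RAIDZ1 is dead" math for rebuilding one failed drive in an
# 8 x 16 TB RAIDZ1: all 7 surviving drives must be read in full.
# URE rates below are spec-sheet assumptions ("<1 in 1e14 bits" is a
# common figure); in ZFS a hit costs the affected records, not the pool.
bits_read = 7 * 16e12 * 8  # bits read from the surviving drives

risk = {}
for ure_rate in (1e-14, 1e-15):
    # P(at least one unrecoverable read error over the whole rebuild)
    risk[ure_rate] = 1 - math.exp(bits_read * math.log1p(-ure_rate))

print(f"URE 1e-14: {risk[1e-14]:.1%}, URE 1e-15: {risk[1e-15]:.1%}")
```

With RAIDZ2 a single URE during rebuild is still recoverable from the second parity drive, which is the practical reason to prefer it at these drive sizes.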

Well, as I understand CoW, as time goes on files are written into free space,
This really is true of any filesystem--files will be written in available, hopefully-contiguous, free space. And yes, a filesystem (again, any filesystem) will generally become more fragmented over time, though keeping a decent amount of free space can reduce this.
I presume resilvering is about adding capacity
Resilvering is what happens any time a disk is replaced, whether to increase capacity or not.
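To give a feel for why rebuild time matters with big drives, a very rough estimate--the ~150 MB/s sustained rate is a hypothetical figure, and real resilvers vary with how full and fragmented the pool is, since ZFS only rewrites live data:

```python
# Back-of-envelope resilver time for one 16 TB drive at an assumed
# sustained 150 MB/s. Real resilvers depend on pool fullness and
# fragmentation, because ZFS only rebuilds allocated blocks.
capacity_bytes = 16e12
rate_bytes_per_s = 150e6
resilver_hours = capacity_bytes / rate_bytes_per_s / 3600
print(f"~{resilver_hours:.1f} hours")
```

That day-plus window of degraded redundancy is when a second failure or read error hurts, which feeds back into the RAIDZ1-vs-RAIDZ2 choice above.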

paulinventome

Explorer
Joined
May 18, 2015
Messages
62
Pretty much. Though I think the "RAID5/RAIDZ1 is dead" argument exaggerates the danger, it's still a legitimate issue--if you Google "RAID 5 is dead" you'll find plenty written about it.


This really is true of any filesystem--files will be written in available, hopefully-contiguous, free space. And yes, a filesystem (again, any filesystem) will generally become more fragmented over time, though keeping a decent amount of free space can reduce this.

Resilvering is what happens any time a disk is replaced, whether to increase capacity or not.
So I guess a follow-on question with ZFS is how this is managed longer term? There doesn't appear to be any defragment tool, and a filesystem becoming more fragmented will happen to everyone. Copying data off and back on isn't viable, I'd imagine, so after a few years of use what does everyone do?

Thanks
Paul

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
But do I understand correctly that if I then delete the project, freeing up say 10TB of storage, the way ZFS works means that 10TB might not be fully reusable, and/or the fragmentation would lead to performance issues? As I understand it, ZFS will just write into free space. I have read that in order to 'clean' the system, data is copied off, the drives are formatted, and it's all copied back on...

I am hoping that I have just misunderstood?
You've somewhat misunderstood here, both in terms of the cause and the scope of the impact.

Assuming you're working with large files (raw video), you'll be modifying large chunks (many ZFS records) at a time in a more sequential/contiguous manner (read the file, edit/apply effects, save a chunk of video as a RAW output, etc., and finally stitch it all together into a rendered/compressed end result). Once you delete the project, you'll be freeing up 10TB of space that ZFS will consider "free" and will happily overwrite again.

Fragmentation hits you much harder when you're doing lots of random writes to a file, such as a virtual disk for a hypervisor or a file container backing a database. The first writes to the disk or DB, such as the initial OS installation/DB ingest, go to sequential space, and it works nicely. But once you start deleting/overwriting parts of the file, that nice sequential space starts to checkerboard as the new writes land in new "empty space" - so when you're reading LBAs "1, 2, 3, 4" your physical disks might actually need to read "1, 2002, 3, 9001", which is where you get hurt by fragmentation.
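A toy model can show that checkerboarding--this is illustrative only, not actual ZFS allocation logic:

```python
# Toy copy-on-write allocator: every write, including an overwrite of
# an existing logical block, lands in the next free physical slot.
# (Real ZFS allocation is far more sophisticated; this just shows why
# overwrites scatter a once-sequential file.)
class ToyCow:
    def __init__(self):
        self.next_free = 0
        self.block_map = {}  # logical block -> physical block

    def write(self, logical):
        self.block_map[logical] = self.next_free
        self.next_free += 1

fs = ToyCow()
for lba in range(4):   # initial sequential write of logical blocks 0-3
    fs.write(lba)
fs.write(1)            # later overwrite of logical block 1
fs.write(3)            # later overwrite of logical block 3

physical_order = [fs.block_map[lba] for lba in range(4)]
print(physical_order)  # sequential logical read now hops: [0, 4, 2, 5]
```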

Basically, you're working in such huge "chunks of file" at a time that your "fragments" are still really big, and are capable of being handled by spinning disks. "Go here and read 1MB, then go here and read 1MB" is much more tolerable than "go here and read 4K, then go here and read 4K"
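To put rough numbers on that, a simple seek-plus-transfer model--the 8 ms seek and 200 MB/s transfer rate are illustrative assumptions, not measurements of any particular drive:

```python
# Rough read-time model for a fragmented file on spinning disk:
# each fragment costs one seek, plus the streaming transfer time.
def read_time_s(total_bytes, fragment_bytes, seek_s=0.008, rate_bps=200e6):
    seeks = total_bytes / fragment_bytes
    return seeks * seek_s + total_bytes / rate_bps

one_gb = 1e9
t_1m = read_time_s(one_gb, 1e6)   # 1 MB fragments
t_4k = read_time_s(one_gb, 4e3)   # 4 KB fragments
print(f"1 MB fragments: {t_1m:.0f} s, 4 KB fragments: {t_4k:.0f} s")
```

Same data, same disk--the 4 KB case is dominated entirely by seeks, which is why big-chunk video workloads tolerate fragmentation that would cripple a VM or database workload.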

Then my other storage volume would be NVMe-based, and that would be hit all the time as online immediate storage: files read, written and deleted constantly. It's the working drive, so very quickly this would fragment.

So really it's a follow-on question: what kind of setup for NVMe would be most efficient here?
Solid-state storage is able to mostly ignore the impact of fragmentation - reads can ignore it but writes do have some impact with regards to page programming and garbage collection. You'll probably be best served with just a set of mirrors in order to have the best performance. Make sure that you have a reliable drive model and proper cooling if they're known to get hot under sustained writes.
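For the capacity side of the mirror-vs-RAIDZ trade-off, a quick sketch--this ignores ZFS metadata and padding overhead, so real usable space comes in somewhat lower:

```python
# Rough usable-capacity comparison across pool layouts.
# Ignores ZFS metadata/padding overhead (real numbers run lower).
def usable_tb(n_drives, size_tb, layout):
    if layout == "mirror":   # striped two-way mirrors
        return (n_drives // 2) * size_tb
    if layout == "raidz1":   # one drive's worth of parity
        return (n_drives - 1) * size_tb
    if layout == "raidz2":   # two drives' worth of parity
        return (n_drives - 2) * size_tb
    raise ValueError(layout)

print(usable_tb(4, 2, "mirror"))   # 4 TB usable from 4 x 2 TB
print(usable_tb(4, 2, "raidz1"))   # 6 TB usable from the same drives
print(usable_tb(8, 16, "raidz2"))  # 96 TB for the spinning-disk pool
```

Mirrors give up capacity but deliver the best small-block and parallel performance, since every write only touches one vdev's pair rather than a full parity stripe.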

paulinventome

Explorer
Joined
May 18, 2015
Messages
62
You've somewhat misunderstood here, both in terms of the cause and the scope of the impact.
Basically, you're working in such huge "chunks of file" at a time that your "fragments" are still really big, and are capable of being handled by spinning disks. "Go here and read 1MB, then go here and read 1MB" is much more tolerable than "go here and read 4K, then go here and read 4K"

I'm happy to have misunderstood. I'm comparing to Btrfs on Synology, and to be honest it might be similar; I'm not aware of any defragmentation going on there either. There was a long thread on these forums about the nature of ZFS, which made me wonder.

Solid-state storage is able to mostly ignore the impact of fragmentation - reads can ignore it but writes do have some impact with regards to page programming and garbage collection. You'll probably be best served with just a set of mirrors in order to have the best performance. Make sure that you have a reliable drive model and proper cooling if they're known to get hot under sustained writes.

So why mirrors for best performance? This is something I can't decide about. A decent NVMe has onboard ECC and, in some cases, parity correction and internal RAID going on. In *theory* they should be way more robust than any spinning disk, so the idea of RAIDing them seems potentially pointless. But of course we're brought up worrying about failure.

So by mirror for best performance you mean that in any other configuration the NVMe performance would be compromised by the redundancy overhead--so a simple mirror is best. Ideally if I have 8TB of storage I'd rather it was *all* storage, of course. But losing one 2TB drive to parity is better than losing 4TB to mirroring... A lot of this comes down to performance--as long as I can saturate a 10GbE network I'm happy.

Thanks
Paul