ZFS, Copy-On-Write, and Torrents?


kayot

Dabbler
Joined
Nov 29, 2014
Messages
36
ZFS is a copy-on-write filesystem. Torrents write chunks of data to a file, and a file can be large, 4 GB or more. Does this mean that when downloading a torrent onto a ZFS pool, every chunk that is received will cause the whole file to be rewritten?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No. It means that the chunks that are laid down are probably not strictly sequential, which would likely be the case for other filesystems as well.
 

kayot

Dabbler
Joined
Nov 29, 2014
Messages
36
Wait, so does COW just make a copy of the parts of a file that change, based on cluster size?
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Wait, so does COW just make a copy of the parts of a file that change, based on cluster size?
No, it's more that COW never overwrites existing blocks of data in place -- it writes the changed data to a new block elsewhere, then atomically updates its notion of the file so it points at all the old blocks plus that new one. The size of a block is variable in ZFS (up to the dataset's recordsize).
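If it helps, here's a toy model of that idea in Python. It's purely illustrative -- the class and names are made up and this is not how ZFS is implemented internally -- it just shows "never overwrite in place: write a new block, then swap the pointer":

Code:
# Toy copy-on-write model. Purely illustrative; the class and names here
# are invented and are not ZFS internals.

class CowFile:
    def __init__(self, storage):
        self.storage = storage      # block_id -> bytes; blocks are never overwritten
        self.block_map = []         # logical block index -> block_id
        self.next_id = 0

    def _new_block(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.storage[block_id] = data   # always a fresh block, never in place
        return block_id

    def append(self, data):
        self.block_map = self.block_map + [self._new_block(data)]

    def overwrite(self, index, data):
        # Write the new data to a brand-new block, then atomically swap in a
        # new block map that points at the old blocks plus the new one.
        new_map = list(self.block_map)
        new_map[index] = self._new_block(data)
        self.block_map = new_map        # old blocks are untouched (and now unreferenced)


storage = {}
f = CowFile(storage)
f.append(b"piece 0")
f.append(b"piece 1")
f.overwrite(0, b"piece 0, version 2")
print(f.block_map)   # [2, 1]: only the changed block was rewritten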

Mainly it just means you should disable any torrent client feature that "preallocates" the whole file, because it's useless on a COW file system.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Oh, that's a good point. Depending on what "preallocate" means in this context, merely seeking out to a distant block and creating the file wouldn't be that bad, but in UNIX this would typically create a sparse file anyways. So more likely it would be an attempt to actually allocate zero-filled blocks, which under ZFS would compress anyways. You'd get some rather less-than-ideal behaviour as zero-filled 1MB blocks that compress down to 4K are incrementally replaced, probably not 1MB at a time.
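For illustration, a quick Python sketch that shows the difference between the two "preallocation" styles; the paths are hypothetical. On ZFS with compression enabled, even the zero-filled file tends to occupy very little space on disk:

Code:
import os

# Hypothetical scratch paths; adjust as needed.
SPARSE = "/tmp/sparse_demo"
ZEROED = "/tmp/zeroed_demo"
SIZE = 64 * 1024 * 1024  # 64 MiB

# "Preallocate" by just setting the length: on most UNIX filesystems this
# produces a sparse file with no data blocks behind it at all.
with open(SPARSE, "wb") as f:
    f.truncate(SIZE)

# "Preallocate" by actually writing zeros: traditional filesystems allocate
# real blocks, but on ZFS with compression those zero runs shrink to almost
# nothing, so no space has genuinely been reserved either way.
with open(ZEROED, "wb") as f:
    f.write(b"\0" * SIZE)

for path in (SPARSE, ZEROED):
    st = os.stat(path)
    # st_blocks is counted in 512-byte units.
    print(path, "apparent size:", st.st_size,
          "allocated:", st.st_blocks * 512)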
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Oh, that's a good point. Depending on what "preallocate" means in this context, merely seeking out to a distant block and creating the file wouldn't be that bad, but in UNIX this would typically create a sparse file anyways. So more likely it would be an attempt to actually allocate zero-filled blocks, which under ZFS would compress anyways. You'd get some rather less-than-ideal behaviour as zero-filled 1MB blocks that compress down to 4K are incrementally replaced, probably not 1MB at a time.

Yeah, that depends on the torrent client. Most of the sensible ones call a syscall like posix_fallocate() to pre-advise the filesystem of the total size of the file. On Linux, almost all of the traditional filesystems will respond to that call by doing something sensible that returns quickly and also prevents fragmentation and allocation overhead as the chunks come in. I'm not sure how ZFS on FreeBSD responds to that. I'd guess it's a no-op, since a COW filesystem really doesn't care about how a file grows in the future.
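For reference, a minimal sketch of that call (in Python, which wraps the same syscall). The path is hypothetical, and the EINVAL fallback is just my guess at how a client might cope if the filesystem declines the request:

Code:
import errno
import os

def preallocate(path, total_size):
    """Ask the filesystem to reserve space up front, roughly the way a
    torrent client might before any pieces arrive."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # posix_fallocate(2): advise the filesystem of the final size.
        # Extent-based filesystems can reserve the space cheaply; a COW
        # filesystem may decline the request or treat it as a no-op.
        os.posix_fallocate(fd, 0, total_size)
    except OSError as e:
        if e.errno == errno.EINVAL:
            # Hypothetical fallback: settle for a sparse file of the right length.
            os.ftruncate(fd, total_size)
        else:
            raise
    finally:
        os.close(fd)

preallocate("/tmp/fallocate_demo", 4 * 1024 * 1024 * 1024)  # 4 GiB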

It's been 5+ years since I've looked at the source code of a torrent client. Back then, posix_fallocate() was the new hotness, and there were still torrent clients that would literally write out a bunch of zeroes, or attempt to create a sparse file by writing some zeroes then seeking + writing some zeroes at the end. IIRC, each approach had some sort of nuanced gotcha depending on what filesystem you used.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Interesting to note how attempts to be clever can actually be detrimental, huh.
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Yep! There's a lesson to be learned here about jumping the gun on optimizing something a couple abstraction layers below you. Yeah you might even win in the short term... But over time, that might just as easily backfire.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
In my experience torrents can indeed be unpleasant for ZFS, causing heavy file fragmentation because file pieces are downloaded in random order and there is no pre-allocation on ZFS. The solution I've found for myself is to point the torrent client (Transmission) at a small, non-redundant SSD-based ZFS pool that I use as scratch space. As a result, downloaded files are first randomly written to the SSDs, where fragmentation is not a problem, and when the download completes, Transmission automatically moves them to the main, large HDD-based pool.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
No. It means that the chunks that are laid down are probably not strictly sequential, which would likely be the case for other filesystems as well.

Every torrent client I've seen has the option to pre-allocate all the files in the torrent first, so there will be no fragmentation of the files on a normal filesystem.

Of course this won't work for ZFS since it's copy-on-write. So you should definitely have the pre-allocation option turned off.

I simply recommend telling your torrent client to download your torrents to one dataset (like torrents/tmp), and then to move "completed" torrents to your main torrent folder (in a different dataset). Having the torrent client move the completed torrent out of the "tmp" dataset into a directory sitting on another dataset will tell ZFS to rewrite the entire file and it will move it mostly sequentially.
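The mechanism, for what it's worth: datasets are separate mountpoints, so a plain rename across them can't succeed and the move has to fall back to a real copy, which is exactly what rewrites the data into freshly allocated, mostly contiguous space. A small sketch (the paths are hypothetical):

Code:
import errno
import os
import shutil

# Hypothetical mountpoints for the two datasets.
SRC = "/mnt/tank/torrents/tmp/some-download.iso"
DST = "/mnt/tank/torrents/done/some-download.iso"

try:
    # rename(2) across separate mountpoints (i.e. across datasets) fails
    # with EXDEV: there is no cheap "move" between datasets.
    os.rename(SRC, DST)
except OSError as e:
    if e.errno == errno.EXDEV:
        # So the move becomes copy + delete (shutil.move does this fallback
        # itself), and the copy is what lays the data down again in fresh,
        # mostly sequential space.
        shutil.move(SRC, DST)
    else:
        raise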
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Having the torrent client move the completed torrent out of the "tmp" dataset into a directory sitting on another dataset will tell ZFS to rewrite the entire file and it will move it mostly sequentially.

Being a little loose with the realities there, huh. :smile: Having the torrent client move the completed torrent out of a "tmp" dataset causes it to rewrite the file in a different dataset, which is likely to cause it to be rewritten in a less-fragmented manner. That's a better description of what actually happens.

That, however, is not always going to be the case. If you are using dedup, copying a file from one dataset to another won't actually accomplish that goal; for that, you'd need to move the file between pools. You may well be better off with a tmp dataset on an SSD or something like that for this sort of abuse.
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Aha, interesting post; I'm having similar questions right now (link).

If you aren't using dedup in the pool, will copying a file from one dataset to another undo the fragmentation?

(Or in this case, from a ZVOL that's backing a Bhyve VM to a different dataset, but all on the same pool.)

Is it just the torrent downloads that get fragmented, or is it likely to cause other long-term issues for the pool?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Dedup is generally a bad combination with VM ZVOLs, since dedup works better with large blocks (it suffers from too much metadata overhead with small blocks), while a ZVOL works better with reasonably small blocks (it suffers from read-modify-write cycles on misaligned writes with large blocks). Also, dedup on a ZVOL may not work efficiently unless the filesystem placed on top of it is strictly aligned and uses a block size equal to the ZVOL block size; otherwise a copy of a file may get a different alignment and so not be identified as a copy.
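Rough numbers, just to illustrate the tension (the figures are ballpark illustrations, not recommendations):

Code:
# Back-of-the-envelope arithmetic; numbers are illustrative only.

guest_write = 4 * 1024                    # a 4 KiB write from the VM's filesystem

for volblocksize in (16 * 1024, 128 * 1024):
    # A small or misaligned write forces a read-modify-write of the whole
    # volblock, since ZFS writes whole blocks copy-on-write.
    amplification = volblocksize / guest_write
    print(f"volblocksize {volblocksize // 1024:>3} KiB -> "
          f"~{amplification:.0f}x write amplification for a 4 KiB write")

# Dedup pulls the other way: every unique block needs a dedup-table entry,
# so smaller blocks mean proportionally more metadata to keep in memory.
ddt_entry = 320                           # very rough bytes per DDT entry
unique_data = 1 * 1024**4                 # 1 TiB of unique data
for blocksize in (16 * 1024, 128 * 1024):
    entries = unique_data // blocksize
    print(f"blocksize {blocksize // 1024:>3} KiB -> "
          f"~{entries * ddt_entry / 1024**3:.1f} GiB of dedup table")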

Torrents are usually bad because by default the client downloads pieces in random order, which means they are written to disk in random order; combined with ZFS's inability to pre-allocate space, that means a sequential read of the file(s) afterwards will be practically random. There are also some other workloads that can have the same issue, for example actively modified databases.
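A toy way to picture it (this is a deliberately naive allocator, nothing like real ZFS allocation):

Code:
import random

# Pieces of one file, identified by their offset within the file.
pieces = list(range(16))
arrival = pieces[:]
random.shuffle(arrival)              # torrent pieces arrive in random order

# Naive allocator: each arriving piece gets the next free disk block.
disk_block_of = {}
next_free_block = 0
for piece in arrival:
    disk_block_of[piece] = next_free_block
    next_free_block += 1

# Reading the file front to back now touches disk blocks in scrambled order,
# i.e. a sequential read has turned into near-random I/O.
print([disk_block_of[p] for p in pieces])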
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Right, I don't plan to use dedup on my system.

I'm just trying to figure out the best way to run a torrent client (within a Docker/Bhyve VM) without causing long-term damage to the ZFS pool, while still getting reasonable performance.

The other thread suggested taking my 512GB SSD (currently used for cache/L2ARC) and re-partitioning it, with part of it becoming a new ZFS pool (to back a ZVOL for Bhyve) and part remaining as cache.

This thread seems to suggest not using pre-allocation, which I will also follow.

But then this thread also seemed to suggest copying from one dataset to another would fix the damage, which is confusing me.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But then this thread also seemed to suggest copying from one dataset to another would fix the damage, which is confusing me.

Copying data back and forth is a time-honored technique for reducing fragmentation on ZFS, as it forces allocation of new space, which ZFS will try to do contiguously where possible. This won't work particularly well on a relatively full pool, though. Consider the case where you had a highly fragmented pool that was 90% full and you copied an optimally-sequential file onto it: the copy of the file would actually become fragmented.
 