ZFS, Copy-On-Write, and Torrents?


kayot

Dabbler
Joined
Nov 29, 2014
Messages
36
ZFS is a copy-on-write filesystem. Torrents write chunks of data to a file, and a file can be large, 4 GB or more. Does this mean that when downloading a torrent onto a ZFS pool, every chunk that is received will cause the whole file to be rewritten?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No. It means that the chunks that are laid down are probably not strictly sequential, which would likely be the case for other filesystems as well.
 

kayot

Dabbler
Joined
Nov 29, 2014
Messages
36
Wait, so does COW just make a copy of the parts of a file that change, based on cluster size?
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Wait, so does COW just make a copy of the parts of a file that change, based on cluster size?
No, it's more that COW never overwrites existing blocks of data in place -- it writes the changed data to a new block elsewhere, then atomically updates its notion of the file so it points at all the old blocks plus that new one. The size of a block is variable in ZFS (up to the dataset's recordsize).
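If it helps, here's a toy model of that idea in Python. It's purely illustrative -- the class and names are made up and this is not how ZFS is implemented internally -- it just shows "never overwrite in place: write a new block, then swap the pointer":

Code:
# Toy copy-on-write model. Purely illustrative; the class and names here
# are invented and are not ZFS internals.

class CowFile:
    def __init__(self, storage):
        self.storage = storage      # block_id -> bytes; blocks are never overwritten
        self.block_map = []         # logical block index -> block_id
        self.next_id = 0

    def _new_block(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.storage[block_id] = data   # always a fresh block, never in place
        return block_id

    def append(self, data):
        self.block_map = self.block_map + [self._new_block(data)]

    def overwrite(self, index, data):
        # Write the new data to a brand-new block, then atomically swap in a
        # new block map that points at the old blocks plus the new one.
        new_map = list(self.block_map)
        new_map[index] = self._new_block(data)
        self.block_map = new_map        # old blocks are untouched (and now unreferenced)


storage = {}
f = CowFile(storage)
f.append(b"piece 0")
f.append(b"piece 1")
f.overwrite(0, b"piece 0, version 2")
print(f.block_map)   # [2, 1]: only the changed block was rewritten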

Mainly it just means you should disable any torrent client feature that "preallocates" the whole file, because it's useless on a COW file system.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Oh, that's a good point. Depending on what "preallocate" means in this context, merely seeking out to a distant block and creating the file wouldn't be that bad, but in UNIX this would typically create a sparse file anyways. So more likely it would be an attempt to actually allocate zero-filled blocks, which under ZFS would compress anyways. You'd get some rather less-than-ideal behaviour as zero-filled 1MB blocks that compress down to 4K are incrementally replaced, probably not 1MB at a time.
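For illustration, a quick Python sketch that shows the difference between the two "preallocation" styles; the paths are hypothetical. On ZFS with compression enabled, even the zero-filled file tends to occupy very little space on disk:

Code:
import os

# Hypothetical scratch paths; adjust as needed.
SPARSE = "/tmp/sparse_demo"
ZEROED = "/tmp/zeroed_demo"
SIZE = 64 * 1024 * 1024  # 64 MiB

# "Preallocate" by just setting the length: on most UNIX filesystems this
# produces a sparse file with no data blocks behind it at all.
with open(SPARSE, "wb") as f:
    f.truncate(SIZE)

# "Preallocate" by actually writing zeros: traditional filesystems allocate
# real blocks, but on ZFS with compression those zero runs shrink to almost
# nothing, so no space has genuinely been reserved either way.
with open(ZEROED, "wb") as f:
    f.write(b"\0" * SIZE)

for path in (SPARSE, ZEROED):
    st = os.stat(path)
    # st_blocks is counted in 512-byte units.
    print(path, "apparent size:", st.st_size,
          "allocated:", st.st_blocks * 512)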
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Oh, that's a good point. Depending on what "preallocate" means in this context, merely seeking out to a distant block and creating the file wouldn't be that bad, but in UNIX this would typically create a sparse file anyways. So more likely it would be an attempt to actually allocate zero-filled blocks, which under ZFS would compress anyways. You'd get some rather less-than-ideal behaviour as zero-filled 1MB blocks that compress down to 4K are incrementally replaced, probably not 1MB at a time.

Yeah, that depends on the torrent client. Most of the sensible ones call a syscall like posix_fallocate() to pre-advise the filesystem of the total size of the file. On Linux, almost all of the traditional filesystems will respond to that call by doing something sensible that returns quickly and also prevents fragmentation and allocation overhead as the chunks come in. I'm not sure how ZFS on FreeBSD responds to that. I'd guess it's a no-op, since a COW filesystem really doesn't care about how a file grows in the future.
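For reference, a minimal sketch of that call (in Python, which wraps the same syscall). The path is hypothetical, and the EINVAL fallback is just my guess at how a client might cope if the filesystem declines the request:

Code:
import errno
import os

def preallocate(path, total_size):
    """Ask the filesystem to reserve space up front, roughly the way a
    torrent client might before any pieces arrive."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # posix_fallocate(2): advise the filesystem of the final size.
        # Extent-based filesystems can reserve the space cheaply; a COW
        # filesystem may decline the request or treat it as a no-op.
        os.posix_fallocate(fd, 0, total_size)
    except OSError as e:
        if e.errno == errno.EINVAL:
            # Hypothetical fallback: settle for a sparse file of the right length.
            os.ftruncate(fd, total_size)
        else:
            raise
    finally:
        os.close(fd)

preallocate("/tmp/fallocate_demo", 4 * 1024 * 1024 * 1024)  # 4 GiB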

It's been 5+ years since I've looked at the source code of a torrent client. Back then, posix_fallocate() was the new hotness, and there were still torrent clients that would literally write out a bunch of zeroes, or attempt to create a sparse file by writing some zeroes then seeking + writing some zeroes at the end. IIRC, each approach had some sort of nuanced gotcha depending on what filesystem you used.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Interesting to note how attempts to be clever can actually be detrimental, huh.
 

jdong

Explorer
Joined
Mar 14, 2016
Messages
59
Yep! There's a lesson to be learned here about jumping the gun on optimizing something a couple abstraction layers below you. Yeah you might even win in the short term... But over time, that might just as easily backfire.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
In my experience torrents can indeed be unpleasant for ZFS, causing heavy file fragmentation because file pieces are downloaded in random order and there is no pre-allocation on ZFS. The solution I've found for myself is to point the torrent client (Transmission) at a small, non-redundant SSD-based ZFS pool that I use as scratch space. As a result, downloaded files are first randomly written to the SSDs, where fragmentation is not a problem, and when the download completes, Transmission automatically moves them to the main, large HDD-based pool.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
No. It means that the chunks that are laid down are probably not strictly sequential, which would likely be the case for other filesystems as well.

Every torrent client I've seen has the option to pre-allocate all the files in the torrent first, so there will be no fragmentation of the files on a normal filesystem.

Of course this won't work for ZFS since it's copy-on-write. So you should definitely have the pre-allocation option turned off.

I simply recommend telling your torrent client to download your torrents to one dataset (like torrents/tmp), and then to move "completed" torrents to your main torrent folder (in a different dataset). Having the torrent client move the completed torrent out of the "tmp" dataset into a directory sitting on another dataset will tell ZFS to rewrite the entire file and it will move it mostly sequentially.
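The mechanism, for what it's worth: datasets are separate mountpoints, so a plain rename across them can't succeed and the move has to fall back to a real copy, which is exactly what rewrites the data into freshly allocated, mostly contiguous space. A small sketch (the paths are hypothetical):

Code:
import errno
import os
import shutil

# Hypothetical mountpoints for the two datasets.
SRC = "/mnt/tank/torrents/tmp/some-download.iso"
DST = "/mnt/tank/torrents/done/some-download.iso"

try:
    # rename(2) across separate mountpoints (i.e. across datasets) fails
    # with EXDEV: there is no cheap "move" between datasets.
    os.rename(SRC, DST)
except OSError as e:
    if e.errno == errno.EXDEV:
        # So the move becomes copy + delete (shutil.move does this fallback
        # itself), and the copy is what lays the data down again in fresh,
        # mostly sequential space.
        shutil.move(SRC, DST)
    else:
        raise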
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Having the torrent client move the completed torrent out of the "tmp" dataset into a directory sitting on another dataset will tell ZFS to rewrite the entire file and it will move it mostly sequentially.

Being a little loose with the realities there, huh. :smile: Having the torrent client move the completed torrent out of a "tmp" dataset causes it to rewrite the file in a different dataset, which is likely to cause it to be rewritten in a less-fragmented manner. That's a better description of what actually happens.

That, however, is not always going to be the case. If you are using dedup, copying a file from one dataset to another won't actually accomplish that goal; for that, you'd need to move the file between pools. You may well be better off with a tmp dataset on an SSD or something like that for this sort of abuse.
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Aha, interesting post; I'm having similar questions right now (link).

If you aren't using dedup in the pool, will copying a file from one dataset to another undo the fragmentation?

(Or in this case, from a ZVOL that's backing a Bhyve VM to a different dataset, but all on the same pool.)

Is it just the torrent downloads that get fragmented, or is it likely to cause other long-term issues for the pool?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Dedup is generally a bad combination with VM ZVOLs, since dedup works better with large blocks (it suffers from too much metadata overhead with small blocks), while a ZVOL works better with reasonably small blocks (it suffers from read-modify-write cycles on misaligned writes with large blocks). Also, dedup on a ZVOL may not work efficiently unless the filesystem placed on top of it is strictly aligned and uses a block size equal to the ZVOL block size; otherwise a copy of a file may get a different alignment and so not be identified as a copy.
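Rough numbers, just to illustrate the tension (the figures are ballpark illustrations, not recommendations):

Code:
# Back-of-the-envelope arithmetic; numbers are illustrative only.

guest_write = 4 * 1024                    # a 4 KiB write from the VM's filesystem

for volblocksize in (16 * 1024, 128 * 1024):
    # A small or misaligned write forces a read-modify-write of the whole
    # volblock, since ZFS writes whole blocks copy-on-write.
    amplification = volblocksize / guest_write
    print(f"volblocksize {volblocksize // 1024:>3} KiB -> "
          f"~{amplification:.0f}x write amplification for a 4 KiB write")

# Dedup pulls the other way: every unique block needs a dedup-table entry,
# so smaller blocks mean proportionally more metadata to keep in memory.
ddt_entry = 320                           # very rough bytes per DDT entry
unique_data = 1 * 1024**4                 # 1 TiB of unique data
for blocksize in (16 * 1024, 128 * 1024):
    entries = unique_data // blocksize
    print(f"blocksize {blocksize // 1024:>3} KiB -> "
          f"~{entries * ddt_entry / 1024**3:.1f} GiB of dedup table")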

Torrents are usually bad because by default the client downloads pieces in random order, which means they are written to disk in random order; combined with ZFS's inability to pre-allocate space, that means a sequential read of the file(s) afterwards will be practically random. There are also some other workloads that can have the same issue, for example actively modified databases.
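A toy way to picture it (this is a deliberately naive allocator, nothing like real ZFS allocation):

Code:
import random

# Pieces of one file, identified by their offset within the file.
pieces = list(range(16))
arrival = pieces[:]
random.shuffle(arrival)              # torrent pieces arrive in random order

# Naive allocator: each arriving piece gets the next free disk block.
disk_block_of = {}
next_free_block = 0
for piece in arrival:
    disk_block_of[piece] = next_free_block
    next_free_block += 1

# Reading the file front to back now touches disk blocks in scrambled order,
# i.e. a sequential read has turned into near-random I/O.
print([disk_block_of[p] for p in pieces])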
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Right, I don't plan to use dedup on my system.

I'm just trying to figure out the best way to run a torrent client (within a Docker/Bhyve VM) without causing long-term damage to the ZFS pool, while still getting reasonable performance.

The other thread suggested taking my 512GB SSD (currently used for cache/L2ARC) and re-partitioning it, with part of it becoming a new ZFS pool (to back a ZVOL for Bhyve) and part remaining as cache.

This thread seems to suggest not using pre-allocation, which I will also follow.

But then this thread also seemed to suggest copying from one dataset to another would fix the damage, which is confusing me.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But then this thread also seemed to suggest copying from one dataset to another would fix the damage, which is confusing me.

Copying data back and forth is a time-honored technique for reducing fragmentation on ZFS, as it forces allocation of new space, which ZFS will try to do contiguously where possible. This won't work particularly well on a relatively full pool, though. Consider the case where you had a highly fragmented pool that was 90% full and you copied an optimally-sequential file onto it: the copy of the file would actually become fragmented.
 