ZFS Deduplication - Would it be worth it?

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Hi, I've got a use case where I'm seriously considering enabling ZFS deduplication.

I've got two separate datasets:
- Medias: for Plex movies, TV shows and other media
- Qbittorrent: where qbt watches, downloads and seeds files

Today, when I download a video file that I want to put in Plex, I need to copy it from the Qbittorrent dataset to the Medias dataset. However, this duplicates the file ...

My question is: would it be worth creating a brand-new dataset with ZFS deduplication enabled (only on that dataset) and then recreating my two datasets inside it (Qbittorrent and Medias)?

I'm not on dedicated server hardware; I'm running an i3 (8th gen) with 16 GB RAM, and I've got less than 2 TB of media files.

Thanks for your reply!
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Short answer: DON'T

Deduplication has been discussed a lot lately, and with the hardware you have, you're going to regret enabling it, big time.

Couldn't you just mount the Qbittorrent dataset into Plex? That way you have no duplication.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I think I'm the current "voice of deduplication mayhem" on the forum this month ;-)

Dedup is sensitive to, and stresses, 3 things:
  • RAM - you'll need more than you would without it. Dedup essentially indexes every block in the pool to match duplicates "on the fly", and holding those indexes in RAM takes quite a lot of memory - reckon on tens of GB more. You don't want your dedup data to have to be fetched at disk IO speed! As a rough guide, the dedup table for my 40 TB pool is about 200 *million* entries at a few hundred bytes each. Ideally you want enough RAM for all of that, in addition to the RAM you already had for other things.
    CAVEAT - with really good special vdevs on TrueNAS 12 Core, that might be less essential from now on; perhaps the dedup tables can tolerate being de-cached more.
    That's the "well known" one. There are 2 others:
  • DISK IO - Dedup will hammer - and I mean, jawdroppingly hammer - disk IO. For context, the demand on my pool is 1/2 *million* 4k random IOs per second, sustained over minutes at a time, just to look up dedup data, while the file system is doing a simple task like a large file write to the server. That's with a single file transfer and everything else quiet. And that's 100% dedup. Do *not* run dedup without special vdevs (in TrueNAS 12), which allow that data to be stored permanently on dedicated SSDs rather than on the general pool HDDs.
  • CPU - dedup has to hash everything, mathematically, to find duplicates. That puts workload on the CPU. Ensure you have a 4+ core CPU.
There are 2 other known issues with dedup:
  • ZFS throttles things based on disk activity and RAM use, NOT CPU use. But dedup is heavy on CPU because of the hash calculations. During a scrub, a pathological situation can develop where pool IOs are fine and RAM is fine, so everything is inside its throttle limits, but the resulting data flow is still enough to swallow 100% of the CPU hashing it, causing anything that's not the scrub to grind painfully slowly. Not a problem if it eventually clears, or if you run scrubs when you don't need the server. Scrubs WILL run a huge amount faster on 12 than on 11.x, but they'll also run more intensely.
  • If you dedup without special vdevs, be prepared for a world of baffling pain. See other recent forum posts.....
Dedup is really intended for when your disk costs are so huge that storage isn't feasible without it - when you'll reduce your data size by several times. If you're using it casually, or your data won't dedup much, just don't.
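To put rough numbers on the RAM point for the OP's case, here's a back-of-envelope estimate. Both figures are assumptions on my part, not measured values: ~320 bytes per dedup-table entry and a 128 KiB average block size (real media pools often average smaller blocks, which makes the table bigger).

```shell
pool_bytes=$((2 * 1024 * 1024 * 1024 * 1024))   # 2 TiB of media
avg_block=$((128 * 1024))                        # assumed average block size
entry_bytes=320                                  # assumed bytes per DDT entry
entries=$((pool_bytes / avg_block))
echo "DDT entries: $entries"
echo "DDT RAM: $((entries * entry_bytes / 1024 / 1024)) MiB"
```

Even at this small scale that's roughly 5 GiB of table on top of everything else the 16 GB box is doing, and the table grows proportionally as the pool fills or the average block shrinks.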
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Thanks to both of you for your replies!

Couldn't you just mount the Qbittorrent dataset into Plex? That way you have no duplication.

I thought about it, but I'm not always downloading media files, and I've never tested how Plex handles non-media files placed in a library's source folder. In the same way, I don't know how Plex handles video files that also need to be accessed by the Qbittorrent service. As I see it, Qbittorrent only reads the files, while Plex must have full rights on them. So I'm not sure Qbt can seed while Plex is playing a file. What do you think?

Another approach I tried was links. I wrote a little script that lists the media files QBT has downloaded and creates a link in the appropriate place for Plex. But it didn't work between two datasets with different permissions, and most of the time I couldn't access my files from the SMB share to relocate them.

This is why I came to dedup, but after Stilez's comment I'm not very confident in my hardware's capabilities ... I suppose the wiser course is to find another way to avoid duplicating files, since this isn't a professional server :confused:
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Today, when I download a video file that I want to put in Plex, I need to copy it from the Qbittorrent dataset to the Medias dataset. However, this duplicates the file.
To be honest, I don't see why this creates any need to duplicate files. Once something has finished downloading and you've copied it to the Medias dataset, why exactly do you still need another copy in Qbittorrent? Just to seed? Seeding only needs read access, and Qbittorrent can have that without upsetting Plex. Short version: I'm not convinced you need 2 copies of completed files.

I'm not on dedicated server hardware; I'm running an i3 (8th gen) with 16 GB RAM, and I've got less than 2 TB of media files.
Well, that settles that question. Don't dedup. You are nowhere near the hardware requirements, and I'm totally unconvinced of the need for multiple copies of files.

Instead, think hard. Why 2 datasets? Why 2 copies?

I'm sure what you want is easy and shouldn't need more than ACLs and symlinks. Let's assume Qbittorrent can be told to download into one directory and move completed files to another (I'd be amazed if it couldn't). Let's assume Plex can be given a list of directories and will serve from them all transparently (I'd be amazed if it couldn't). Then try this:
  1. Checkpoint your pool (zpool checkpoint POOLNAME)
  2. Create (at least) these 4 directories on your Medias dataset: Plex_Shared, Plex_Not_Shared, QBT_Completed, and QBT_Downloading.
  3. Configure QBT to download into QBT_Downloading and to move completed files to QBT_Completed. Also configure QBT to seed from both Plex_Shared and QBT_Completed.
  4. Split your current Plex/media files into those you want to share and those you don't, and move them into Plex_Shared and Plex_Not_Shared accordingly.
  5. Configure Plex to index both Plex_Shared and Plex_Not_Shared as media.
  6. Configure the jails/plugins/ACLs/whatever so that both Plex and QBT can see your dataset, or at least so that each can see the folders above that it needs.
This is how I *think* it'll work:

QBT will download files into QBT_Downloading while they're partial; Plex won't see them, which is correct. Once completed, QBT will move them to QBT_Completed, but Plex still won't see them. QBT will seed them, though. For each file, you decide whether it should keep seeding once it's in Plex, and move it to the appropriate Plex directory. Plex will see the media file whichever directory it's in, but QBT will only see and seed it if it's in Plex_Shared, not Plex_Not_Shared. QBT's seeding won't be an issue for Plex, because both programs only read the completed media files; they aren't competing to change them.

If it works, good. If you need to revert all the changes (other than those in the web UI config), roll back to the checkpoint. Otherwise, discard the checkpoint.
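In command form, the checkpoint and directory steps (1-2) above, plus the revert path, might look like this. The pool name POOL and the mount path are placeholders for your actual layout:

```shell
# step 1: take a pool checkpoint as a safety net before restructuring
zpool checkpoint POOL

# step 2: create the four working directories on the Medias dataset
mkdir -p /mnt/POOL/Medias/Plex_Shared \
         /mnt/POOL/Medias/Plex_Not_Shared \
         /mnt/POOL/Medias/QBT_Completed \
         /mnt/POOL/Medias/QBT_Downloading

# to revert everything later: export the pool, then re-import it with
#   zpool import --rewind-to-checkpoint POOL
# to keep the changes and release the space the checkpoint holds:
zpool checkpoint -d POOL
```

Note that a checkpoint pins every freed block on the pool until it's discarded, so don't leave it sitting around longer than the experiment needs.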
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Hi, I've got a use case where I'm seriously considering enabling ZFS deduplication.

I've got two separate datasets:
- Medias: for Plex movies, TV shows and other media
- Qbittorrent: where qbt watches, downloads and seeds files

Today, when I download a video file that I want to put in Plex, I need to copy it from the Qbittorrent dataset to the Medias dataset. However, this duplicates the file ...

My question is: would it be worth creating a brand-new dataset with ZFS deduplication enabled (only on that dataset) and then recreating my two datasets inside it (Qbittorrent and Medias)?

I'm not on dedicated server hardware; I'm running an i3 (8th gen) with 16 GB RAM, and I've got less than 2 TB of media files.

Thanks for your reply!
The way everyone does this is with a single dataset containing different folders. You mount your media dataset into both your qbittorrent and Plex jails. Make sure to mount it at the same path inside each jail if you plan on using something like Sonarr or Radarr. Then you download your data and create a hardlink from the download into your organized folder for the particular media type. Using hardlinks is why everything needs to be in the same dataset.

This is a super common setup and lets you seed the download while also organizing and renaming the media.
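A quick demonstration of the hardlink behaviour described above, done in a scratch directory (in the real setup both paths would live under the same Media dataset, since hardlinks can't cross datasets or filesystems):

```shell
demo=$(mktemp -d)
echo "movie bytes" > "$demo/downloaded.mkv"     # stand-in for a finished download
ln "$demo/downloaded.mkv" "$demo/organized.mkv" # hardlink into the "organized" location
# -ef tests that two paths refer to the same inode: one copy on disk, two names
[ "$demo/downloaded.mkv" -ef "$demo/organized.mkv" ] && echo "same file on disk"
rm -rf "$demo"
```

Because both names point at the same inode, deleting the torrent later (removing one name) leaves the organized copy intact, and no extra space is ever consumed.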
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Hi all, and sorry for the late reply! I tried to follow your advice, so I placed my Qbittorrent folder and my Plex folder on the same dataset. But I'm now hitting a very annoying permission issue ... Let me explain it here; I'll open a dedicated thread if no one can help.

As I said, I previously split Qbittorrent and Plex into separate datasets, each with its own permissions. To give each service read/write access to its data, I created service-account users on the host with the same UIDs as the users inside each jail (qbittorrent and plex). Then I made those users the owners of their respective datasets. So the Plex dataset was owned by sa-plex, with the same UID as the plex user inside the jail, and likewise for qbittorrent.

Following your advice, I set up the organization below:

Dataset - Media (configured as SMB share type)
  Folder - Plex
    Folder - Movies
    Folder - TVShows
    Folder - Music
  Folder - Qbittorrent
    Folder - downloaded
    Folder - temp
    Folder - torrents
    Folder - watcher

The Media dataset is now owned by my personal user (because I still need to access files through the SMB share), and I created a new group containing all the users that need read/write access to the data.
First issue: I'm totally unable to change either permissions or ACLs through the GUI. No changes are applied and I can't figure out why.
I tried updating the permissions through the command line, but even when the changes are set, I see no improvement. And sometimes I even get a permission-denied error as the root user ...

Best example: Plex has read permission:
The Plex user has UID 972 inside the jail
2020-09-21 08_34_12-VMware Horizon Client.png
I created a service-account user for Plex on the FreeNAS system in order to manage permissions in a more human-readable way. Please note that this was done before and worked well.
2020-09-21 08_34_58-FreeNAS - 192.168.1.150.png
This is where the new settings begin. I created a new group and added all users that need read/write access on the Media dataset.
2020-09-21 08_35_50-FreeNAS - 192.168.1.150.png
Then I set the permissions (through the command line) as below
2020-09-21 08_44_08-192.168.1.150 - PuTTY.png
But while Plex can successfully read data, it can't delete movies like it used to. Qbittorrent has the same issue and can't download anymore, since it can't write into the temp/downloaded folders even though the pastis-media group has full rights on the Media dataset.

I'll admit that permission issues easily get the better of me, but I'm fairly sure I did things the right way initially. Now I feel like my Media dataset's permissions are totally messed up, with no way to roll back. And yes, before you ask: I don't take snapshots of this dataset.

Hope you can help ...
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
My first reaction: are you using traditional permissions or ACLs on those directories? The output looks like you may be primarily using traditional permissions, so let's check whether ACLs exist and, if so, whether they're overriding them. Either way, let's see the full permissions setup for the problem directories and files.

The following output should get us all the permissions info needed to check that. For each directory or file with a "problem", pick a sample file that's not doing what it should (say, /mnt/POOL/X/Y/FILE that Plex can't delete, or whatever it is), and post:

  1. The path /mnt/POOL/X/Y/FILE and whatever isn't working for that directory or file
  2. Traditional permissions for each element of the path: the output of ls -l <PATH> (lowercase "l") on each element of the path (e.g. each of /mnt/POOL, /mnt/POOL/X, /mnt/POOL/X/Y and /mnt/POOL/X/Y/FILE)
  3. ACL permissions in both friendly and numeric form, for each element of the path: the output of getfacl -v <PATH> and getfacl -nv <PATH> on each of the same paths
  4. ZFS ACL settings on all datasets in the pool: the output of zfs get -r aclmode,aclinherit POOL (only needed once)
Then see what we see.....
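Collected as one small script, the diagnostics above might look like this (the paths are placeholders; I've used `ls -ld` so each directory shows its own permissions rather than listing its contents):

```shell
# gather permission info for one problem path (substitute your real paths)
P=/mnt/POOL/X/Y/FILE
for p in /mnt/POOL /mnt/POOL/X /mnt/POOL/X/Y "$P"; do
    ls -ld "$p"       # traditional (chmod-style) permissions for this element
    getfacl -v "$p"   # ACL entries, friendly form
    getfacl -nv "$p"  # ACL entries, numeric IDs
done

# dataset ACL behaviour, for the whole pool (only needed once)
zfs get -r aclmode,aclinherit POOL
```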
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Here's what you asked for ...

As far as I can tell, everything seems correct ... This is a file I couldn't delete yesterday.

2020-09-21 15_08_39-192.168.1.150 - PuTTY.png


2020-09-21 15_10_52-192.168.1.150 - PuTTY.png


2020-09-21 15_13_19-192.168.1.150 - PuTTY.png
 


SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You can't use the Windows share type and jails on the same dataset. Most jails will try to chmod things in that dataset at some point, and that isn't allowed with a Windows share type.

Create your dataset as a Unix type, then create your SMB share separately and don't let it set permissions.
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
It was working fine before: I could access my files over SMB and Plex had full permissions on them too. That's the better arrangement for me, because Plex isn't the only thing accessing my files. So maybe there's another solution better suited to my use case ...

In case this is the root cause of my issue, are you thinking of the "share type" attribute set at first creation? If so, I set it to SMB. And I've read around here that there's no way back :confused:
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
It was working fine before: I could access my files over SMB and Plex had full permissions on them too. That's the better arrangement for me, because Plex isn't the only thing accessing my files. So maybe there's another solution better suited to my use case ...

In case this is the root cause of my issue, are you thinking of the "share type" attribute set at first creation? If so, I set it to SMB. And I've read around here that there's no way back :confused:
You can set the aclmode on the dataset to passthrough, and then use setfacl on the CLI to remove all ACLs from the files. Just doing that will fix your problems, and you can still use SMB.
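In concrete commands, that might look like the following (dataset name taken from later in this thread; treat it as an example):

```shell
# let chmod-style permission changes pass through instead of being restricted
zfs set aclmode=passthrough MirrorStorage/Media

# strip all ACL entries from every file and directory under the share
# (FreeBSD setfacl reads path names from stdin when no file operands are given)
find /mnt/MirrorStorage/Media | setfacl -b
```

After this, only the traditional owner/group/other permission bits remain, which is usually what jail services expect.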
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
I'm not very familiar with zfs aclmode and setfacl. Could you explain in a few lines what they are exactly? And why am I not able to change permissions through the GUI directly?
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Okay then, I reset the ACLs using the command find /mnt/MirrorStorage/Media | setfacl -b and I set the zfs aclmode to passthrough.

I thought I would now be able to edit the ACLs, but it seems the GUI still can't modify anything ... So I modified every file's ACL under the Media dataset with full_set permissions:

find /mnt/MirrorStorage/Media | setfacl -m owner@:full_set::allow,group@:full_set::allow

But even though both the ACLs AND the permissions look good now, Plex is still unable to delete a file. That's awesome. It feels like it's telling me "nope, go fuck yourself" on every try o_O

2020-09-21 19_08_52-Window.png
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I'm not very familiar with zfs aclmode and setfacl. Could you explain in a few lines what they are exactly?
I don't know how much you know already, so sorry if this is basic stuff.

ACLs are a more advanced form of permissions compared to traditional Unix permissions (chmod). The permissions themselves are more granular: you can be much more specific about what a user can do, and about which users and groups can do which things, for (almost) arbitrary numbers of users and groups, via a list of Access Control Entries (ACEs). Permissions set this way can state whether they apply to files or directories, whether they inherit downwards, whether they allow or deny, and which rules take priority over which. The commands to manipulate them are getfacl (read ACL permissions) and setfacl (set/modify ACLs).

Because FreeBSD has both these and traditional permissions, and they aren't the same, it needs a way to know what to do when an object has, say, traditional permissions set on it and you try to set ACLs on it. If the ACLs can't be expressed as traditional permissions, should it reject the request? Should it delete the traditional permissions and create ACL permissions instead? Or what? And should ACLs, where they exist, inherit downwards by default? Equally, what if an object already has ACLs and you try to set a traditional chmod permission? Should it convert your request to an equivalent ACL, reject it, or delete the ACLs and recreate them as asked? That's broadly the "aclmode" function: it says what to do when an object has one kind of permission and you try to apply the other kind to it.

The other zfs property, aclinherit, governs which types of ACL entries automatically inherit downwards, which kinds don't, and which (if any) are forced to inherit. See "man zfsprops" for details of both.

Because of the potential conflict between the two, you need to manage permissions a bit carefully on your pool. For example, your Windows SMB shares may expect ACLs, because that's what Windows uses, but Plex and other plugins may expect traditional permissions. These conflicts can almost always be worked around, so don't be worried by them, but you should know they exist. That's why I raised the issue: when permission changes aren't working, the most common reason is that you're using both traditional and ACL permissions on the same items, one way or another, or an aclmode/aclinherit setting that locks down what happens when they conflict.
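As a small concrete illustration (hypothetical path and user), adding and inspecting a single NFSv4 ACL entry on FreeBSD looks like this. read_set is one of setfacl's built-in permission-set keywords (alongside write_set, modify_set and full_set):

```shell
# grant user 'plex' read access via an NFSv4 allow entry (hypothetical path)
setfacl -m user:plex:read_set::allow /mnt/POOL/Media/Movies

# inspect the resulting ACL in friendly form
getfacl /mnt/POOL/Media/Movies
```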

Why am I not able to change permissions through the GUI directly?
I'm not sure, but the GUI gives a limited window onto quite a rich system. Its capabilities are inevitably limited: it has to offer a fixed GUI layout, and things outside that it may simply not offer. If in doubt, fix it on the command line and don't sweat it?
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Thanks a lot for this very complete explanation. I'm familiar with ACLs at the network level, but this is the first time I've dealt with file-system ACLs. I think my previous actions actually did the trick!!

I'm able to delete Plex movies through the web interface (a note for anyone reading this post later: make sure the file you're trying to delete was already there at the application's last restart). And I'm also able to hand a torrent file to the Qbittorrent watcher, and it downloads (an application restart was necessary for the ACLs to be taken into account).

I'm almost done! My last (but not least) "problem": even though QBT downloads fine, the files it creates get the wrong mode (-rw-r--r-- instead of, at best, rwxrwxr-x). This makes the files inaccessible from my personal user (over the SMB share), because they are created with the qbittorrent user as owner. Is there a configuration (SMB maybe?) I've missed, so that files are created with the right umask?
 

alexisr73

Explorer
Joined
Oct 21, 2019
Messages
51
Regarding my previous message, I want to add that I did look for a solution before asking :)

Actually, I modified the /etc/login.conf file in the qbittorrent jail to set umask 002, so, if I've understood correctly, every user in this jail should create files with -rwxrwxr-x rights. But they don't. I suppose I'm hitting some limitation at this point.
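Two things worth checking here (both assumptions on my part about the setup). First, FreeBSD reads /etc/login.conf through its capability database, so edits don't take effect until the database is rebuilt and the service is restarted under its login class. Second, a 002 umask yields 664 (rw-rw-r--) for new files, not rwxrwxr-x: newly created files never get execute bits by default, regardless of umask, so rw-rw-r-- is actually the best you can expect.

```shell
# inside the qbittorrent jail, after editing /etc/login.conf:
cap_mkdb /etc/login.conf       # rebuild the login capability database
service qbittorrent restart    # hypothetical service name - use the jail's actual rc script
```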

Thanks in advance for your reply
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
FWIW, on TrueNAS Core 12, when using SSDs for a metadata vdev (or its subset, a dedupe vdev), the penalty for dedupe is drastically reduced.

The metadata vdev contains the dedupe tables, but you can also split the dedupe tables off to a dedicated vdev. In my topology, I use SATA SSD mirrors for metadata and NVMe M.2 mirrors for dedupe. I have also tested with just the SATA SSDs, and performance is still great.
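For reference, a vdev layout like the one described could be built with something like the following (pool and device names are made up; note that special and dedup vdevs can't easily be removed once added, so double-check first):

```shell
# SATA SSD mirror for metadata (the 'special' vdev allocation class)
zpool add tank special mirror ada1 ada2

# NVMe mirror dedicated to the dedup tables
zpool add tank dedup mirror nvd0 nvd1
```

If no dedup vdev exists, the dedup tables land on the special vdev; splitting them out keeps DDT lookups on the fastest devices.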

Metadata doesn't get hit all that hard, but as mentioned, dedupe gets POUNDED. Every block gets a check: do I already have a copy of THAT block? No? OK, write it and update the table. Yes? Update the table and skip the write.

The other thing is that the reporting won't show the dedupe after-effect, just as with compression. Like compression, dedupe only benefits your storage footprint; it isn't reflected anywhere else. E.g. an iSCSI LUN will not report less space consumed than what the guest system believes is in use: if you run out of space from the guest's point of view, you've run out of space, even if the LUN is only half used. TBH, I haven't checked how that lines up with quotas on shares, i.e. with a 2 TB quota and 1.75 TB of data at a 10% dedupe rate, whether it shows as 1.75 TB consumed or not ... lol

Basically, dedupe will only let you store more data; it won't help you in the performance department. It might help against the performance degradation you get when a pool fills up too far, but again, 12 is the first release where I've seen dedupe performance good enough to even try pushing that boundary.
 