Very slow SMB performance for file listing/permissions

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
I just recently upgraded my main server to TrueNAS SCALE and put in all new HDDs as well. I moved the old drives to another server that I had upgraded to SCALE a while ago, and I recently finished copying all the data back over. Both pools are set up the same way, except that the old disks are 14TB and the new ones are 20TB each; both are 7-disk RAIDZ2 pools. File R/W speed appears better on the new server, so it's working fine from that perspective: >800MB/s on my 10GbE network. The problem, though, is that listing folder properties to see total file count and size is very slow and sometimes even appears to pause/freeze. The other server handles this just fine and seems fast. The only difference I can think of is that the new server has a fresh TrueNAS SCALE install while the older server with the old drives was an upgrade. I just don't know why file listing and permission performance would be so painfully slow. Anyone run across an issue like this?
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
I noticed one difference between my two servers. The old server was set to use NFSv4 permissions while the new one was using the default POSIX permissions on the dataset. I am going to try switching from POSIX to NFSv4, since according to this documentation, https://www.truenas.com/docs/references/aclprimer/, POSIX ACLs aren't recommended if the dataset is primarily used for Windows-based file sharing. Fingers crossed it will fix the file/folder properties slowness. Resetting all those permissions is going to take a while, I feel.
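For anyone comparing their own setup, this is roughly how I'm checking which ACL type each dataset is using from the shell (tank/share is a placeholder; the actual switch and the recursive permission reset I'm doing through the dataset's ACL editor in the web UI):
Code:
# show the current ACL type for a dataset (placeholder name)
zfs get acltype tank/share

# on recent SCALE releases the type can also be set from the shell, e.g.:
zfs set acltype=nfsv4 tank/share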
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I just recently upgraded my main server to TrueNAS SCALE and put in all new HDDs as well. I moved the old drives to another server that I had upgraded to SCALE a while ago, and I recently finished copying all the data back over. Both pools are set up the same way, except that the old disks are 14TB and the new ones are 20TB each; both are 7-disk RAIDZ2 pools. File R/W speed appears better on the new server, so it's working fine from that perspective: >800MB/s on my 10GbE network. The problem, though, is that listing folder properties to see total file count and size is very slow and sometimes even appears to pause/freeze. The other server handles this just fine and seems fast. The only difference I can think of is that the new server has a fresh TrueNAS SCALE install while the older server with the old drives was an upgrade. I just don't know why file listing and permission performance would be so painfully slow. Anyone run across an issue like this?
How many files in the folders?

If the metadata is not in ARC, then there are many pool/disk reads to collect the metadata for a directory. The number of disk reads is proportional to the number of files.
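One way to check whether directory metadata is being served from ARC or from the disks (a rough sketch; the exact label text can vary between arc_summary versions):
Code:
# demand-metadata hits vs. misses; a high miss count while browsing
# folders means the listings are being read from the disks, not the ARC
arc_summary | grep -i 'demand metadata'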
 

Kuro Houou

Contributor
Joined
Jun 17, 2014
Messages
193
Folders obviously have varying numbers of files, but we're talking usually tens to hundreds of thousands of files, with folders ranging from 1TB to 10TB. So pretty big folders with lots of files and a wide assortment of file sizes inside as well. That said, the ACLs on this dataset just finished changing from POSIX to NFSv4 (it took a while, about 30-45 minutes to finish) and I already noticed a big improvement. I think the new server is actually a little faster now (not a whole lot, but just a bit), which really should be the case given the better hardware/specs in the new server.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Folders obviously have varying numbers of files, but we're talking usually tens to hundreds of thousands of files, with folders ranging from 1TB to 10TB. So pretty big folders with lots of files and a wide assortment of file sizes inside as well. That said, the ACLs on this dataset just finished changing from POSIX to NFSv4 (it took a while, about 30-45 minutes to finish) and I already noticed a big improvement. I think the new server is actually a little faster now (not a whole lot, but just a bit), which really should be the case given the better hardware/specs in the new server.
Additional improvements may be possible (if you need / want more) with:
  • L2ARC in metadata-only mode, or a regular L2ARC (not pool-critical, but it needs a bit of time and use to warm up)
  • Special vdev (pool-critical, and all data needs to be written or re-written to the pool after adding the vdev)
Along with appropriate hardware. Roughly, the commands look like the sketch below.
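(A rough sketch only; pool, dataset, and device names are placeholders, and a special vdev should be mirrored because losing it loses the pool.)
Code:
# L2ARC, restricted to caching metadata for one dataset
zpool add tank cache nvme0n1
zfs set secondarycache=metadata tank/share

# special (metadata) vdev, mirrored because the pool depends on it;
# only data written after this point lands on the special vdev
zpool add tank special mirror nvme1n1 nvme2n1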
 
Joined
Oct 22, 2019
Messages
3,641
See this thread:


Here is the specific post for the solution in the thread.


For reference, I am now happily using a value of 6442450944 (6 GiB) for my tuneable, on a system with 32 GiB of RAM. Your situation may well require a higher value, since you might be dealing with more files in total.

:smile:

Keep in mind that you'll have to apply this setting differently for a SCALE system.
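To make the difference concrete, this is roughly where the tuneable lived before OpenZFS 2.2 removed it (parameter names as I understand them; double-check on your version, and the 6442450944 is just my 6 GiB example):
Code:
# CORE (FreeBSD): a sysctl-type tuneable
sysctl vfs.zfs.arc.meta_min=6442450944

# SCALE (Linux): the equivalent ZFS module parameter
# (persisting this on SCALE would need a post-init command or similar)
echo 6442450944 > /sys/module/zfs/parameters/zfs_arc_meta_min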
 
Last edited:

Mugiwaraya

Cadet
Joined
Jan 2, 2017
Messages
9
See this thread:


Here is the specific post for the solution in the thread.


For reference, I am now happily using a value of 6442450944 (6 GiB) for my tuneable, on a system with 32 GiB of RAM. Your situation may well require a higher value, since you might be dealing with more files in total.

:smile:

Keep in mind that you'll have to apply this setting differently for a SCALE system.
Different for SCALE as in how? I am having problems with slow SMB browsing too on my TrueNAS SCALE 23.0.10.1. I've got 6x 14TB in RAIDZ2 with a Ryzen 5700X (8 cores) and 32 GB of RAM, and I also have an HBA, before people start asking. What would be the best approach to speed things up? I already ordered 32 more GB of RAM, so I'm going to get that pretty soon, but is there anything else I can do in the meantime?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
By default SCALE uses much less RAM for the ARC. This may well have an impact here. The forum search should give you plenty of details.

But I would also question the approach of keeping so many files in a single folder. It is known to cause performance issues. In the mid-2000s I was working on some pretty big document archiving systems with millions of files on high-end HP-UX servers (N-class and above). The simple yet effective way to deal with this was a sharding approach based on the creation date. So we had a structure like this:
/var/lib/archive/pdf/year/month/day/00001 : for the first x files of the day
/var/lib/archive/pdf/year/month/day/00002 : for the next x files of the day
No idea whether this exact approach fits your use case, but it is more about the general idea; see the sketch below.
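A minimal shell sketch of that layout, just to illustrate the idea; the archive root, bucket size, and helper name are all made up for the example:
Code:
#!/bin/bash
# Hypothetical helper: file each incoming document under
# <root>/YYYY/MM/DD/NNNNN, starting a new NNNNN bucket every
# BUCKET_SIZE files so no single directory grows too large.
ARCHIVE_ROOT=/var/lib/archive/pdf   # placeholder path
BUCKET_SIZE=10000

store_file() {
    src="$1"
    day_dir="$ARCHIVE_ROOT/$(date +%Y/%m/%d)"
    mkdir -p "$day_dir"
    # pick the highest existing bucket, or start at 00001
    bucket=$(ls "$day_dir" | sort | tail -n 1)
    [ -z "$bucket" ] && bucket=00001
    # roll over to the next bucket once the current one is full
    if [ "$(ls "$day_dir/$bucket" 2>/dev/null | wc -l)" -ge "$BUCKET_SIZE" ]; then
        bucket=$(printf '%05d' $((10#$bucket + 1)))
    fi
    mkdir -p "$day_dir/$bucket"
    mv "$src" "$day_dir/$bucket/"
}

store_file "$1"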
 
Joined
Oct 22, 2019
Messages
3,641
Different for SCALE as in how? I am having problems with slow SMB browsing too on my TrueNAS SCALE 23.0.10.1.

It is no longer relevant with the release of OpenZFS 2.2. See here:

The ARC eviction/pressure management has been re-written, and so you technically should see better metadata performance (by default). It also does away with certain tuneables, leaving you with a single universal tuneable that works like a "dial", named arc.meta.balance. The default value is 500. See the linked post for more details.


Check your ZFS version:
Code:
zfs --version


Check how much metadata currently lives in ARC after crawling through your directories:
Code:
arc_summary | grep 'ARC size (current)\|M.U metadata'


Check your metadata balance value (the default is 500):
Code:
arc_summary | grep zfs_arc_meta_balance



However, it may not be solely the metadata issue at fault. As seen above with other SCALE users, you might have to recursively change all ACLs to NFSv4 style.

So this could be an issue with ACLs, or metadata eviction, or a combination of both.
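If you do want to nudge the new "dial", this is roughly what it looks like at runtime (4000 is an arbitrary example value; it falls back to the default 500 at reboot unless you make it a persistent tuneable):
Code:
# current value (default is 500)
cat /sys/module/zfs/parameters/zfs_arc_meta_balance
# temporarily bias the ARC more heavily toward keeping metadata
echo 4000 > /sys/module/zfs/parameters/zfs_arc_meta_balance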
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
But I would also question the approach to keep so many files in a single folder.
I have a folder that contains 25,000 individual files, and it loads instantly over SMB and NFS. I can immediately start scrolling through the entire view, and there are no delays in showing me the names, timestamps, and file sizes. (Granted, I'm on Core and not using ACLs.) I'm also using a value of 6 GiB for my arc.meta_min tuneable. This one simple change made the biggest difference for my NAS performance. (Once OpenZFS 2.2 lands in Core, I'll see if using the defaults, without any tuneable overrides, still performs well.)

It's not just browsing folders that are affected: anything that "crawls" will be affected. What provoked me to find a solution was that my rsync jobs were atrociously slow. (The transfers were fast, but crawling the entire tree of a dataset to find the differences between source and destination took way too long.)

So even if I "sharded" files among different subdirectories, it wouldn't decrease the total number of files in the dataset.


EDIT: Of course, an underlying problem with SCALE is the ARC in general. But that's a whole different discussion. (iXsystems is working on giving the ARC more breathing room in a future SCALE release, from what I understand.)
 
Last edited:

Mugiwaraya

Cadet
Joined
Jan 2, 2017
Messages
9
It is no longer relevant with the release of OpenZFS 2.2. See here:

The ARC eviction/pressure management has been re-written, and so you technically should see better metadata performance (by default). It also does away with certain tuneables, leaving you with a single universal tuneable that works like a "dial", named arc.meta.balance. The default value is 500. See the linked post for more details.


Check your ZFS version:
Code:
zfs --version


Check how much metadata currently lives in ARC after crawling through your directories:
Code:
arc_summary | grep 'ARC size (current)\|M.U metadata'


Check your metadata balance value (the default is 500):
Code:
arc_summary | grep zfs_arc_meta_balance



However, it may not be solely the metadata issue at fault. As seen above with other SCALE users, you might have to recursively change all ACLs to NFSv4 style.

So this could be an issue with ACLs, or metadata eviction, or a combination of both.
So I did what you asked, and here are the outputs for the different commands. It appears I'm already on the latest ZFS version (I always keep it up to date when I get an out-of-date flag). I also have 64 GB of RAM instead of 32. But I noticed that even after accessing the slow-loading folders tens of times, with 1-2 minutes of loading time each (for example, a "Browser Downloads" folder that contains about 2k files), my ZFS RAM usage doesn't go higher than 900 MB and the browsing doesn't get any faster like it should. I find that quite weird, as it went up to 20-25 GB before. Here are the outputs of the commands I executed:

Code:
root@freenas[~]# zfs --version
zfs-2.2.0-rc4
zfs-kmod-2.2.0-rc4
root@freenas[~]# arc_summary | grep 'ARC size (current)\|M.U metadata'
ARC size (current):                                     2.0 %  651.8 MiB
        MFU metadata target:                           12.5 %   69.1 MiB
        MFU metadata size:                             15.9 %   88.1 MiB
        MRU metadata target:                           12.5 %   69.1 MiB
        MRU metadata size:                              8.2 %   45.5 MiB
root@freenas[~]# arc_summary | grep zfs_arc_meta_balance
        zfs_arc_meta_balance                                         500
root@freenas[~]#

 
Joined
Oct 22, 2019
Messages
3,641
What happens if you crawl the entire dataset tree and then check those values again?

You can use something like ls -laR /mnt/poolname and then for good measure also run find /mnt/poolname -exec stat "{}" \;

Does your metadata in ARC cap off? Do you notice it grow? Does it change anything about the "snappiness" of browsing a large directory?

If you see no difference, then perhaps you're facing an issue where ACLs over SMB are causing this sluggish performance?

EDIT: I'm assuming your client is Windows. Otherwise, I would test this out on a Linux client, and see if there's any difference between browsing NFS vs browsing SMB.
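For the crawl itself, a small sketch that discards the listing output (so the terminal isn't the bottleneck) and watches the ARC fill from a second shell; /mnt/poolname is a placeholder:
Code:
# crawl every directory and stat every file, discarding the output
ls -laR /mnt/poolname > /dev/null
find /mnt/poolname -exec stat {} \; > /dev/null

# in a second shell, watch the ARC grow while the crawl runs
watch -n 5 "arc_summary | grep 'ARC size (current)'"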
 

Mugiwaraya

Cadet
Joined
Jan 2, 2017
Messages
9
What happens if you crawl the entire dataset tree and then check those values again?

You can use something like ls -laR /mnt/poolname and then for good measure also run find /mnt/poolname -exec stat "{}" \;

Does your metadata in ARC cap off? Do you notice it grow? Does it change anything about the "snappiness" of browsing a large directory?

If you see no difference, then perhaps you're facing an issue where ACLs over SMB are causing this sluggish performance?

EDIT: I'm assuming your client is Windows. Otherwise, I would test this out on a Linux client, and see if there's any difference between browsing NFS vs browsing SMB.
I just executed the command and it seems to be filling up the ZFS cache. Before this command it was stuck at 900MB-1GB, and now, 5 minutes after running it, it's already at 4GB. I'll report back my findings as soon as my ZFS cache is full.
 

Mugiwaraya

Cadet
Joined
Jan 2, 2017
Messages
9
What happens if you crawl the entire dataset tree and then check those values again?

You can use something like ls -laR /mnt/poolname and then for good measure also run find /mnt/poolname -exec stat "{}" \;

Does your metadata in ARC cap off? Do you notice it grow? Does it change anything about the "snappiness" of browsing a large directory?

If you see no difference, then perhaps you're facing an issue where ACLs over SMB are causing this sluggish performance?

EDIT: I'm assuming your client is Windows. Otherwise, I would test this out on a Linux client, and see if there's any difference between browsing NFS vs browsing SMB.
So I have tried everything you suggested. The ZFS cache grew to about 38GB, and browsing performance unfortunately is still sluggish. When I copy a file over to my PC from the server, though, I get throughput of around 500-800MB/s. What is the best next step?
 
Joined
Oct 22, 2019
Messages
3,641
What is the best next step?
Test this out on another client (preferably not Windows). Or switch to NFSv4 ACLs, as noted in some earlier posts above.

If you see no difference, then perhaps you're facing an issue where ACLs over SMB are causing this sluggish performance?

EDIT: I'm assuming your client is Windows. Otherwise, I would test this out on a Linux client, and see if there's any difference between browsing NFS vs browsing SMB.
 

Mugiwaraya

Cadet
Joined
Jan 2, 2017
Messages
9
Test this out on another client (preferably not Windows). Or switch to NFSv4 ACLs, as noted in some earlier posts above.
Thanks for your reply. Is there a guide on how to switch to NFSv4 ACLs? I looked around but couldn't seem to find it. Thanks in advance.
 