Understanding Samba Read Performance Characteristics on TrueNAS SCALE

Dysonco

Dabbler
Joined
Jul 4, 2023
Messages
27
I haven't had a chance to test thoroughly, but having stripped the ACL from the dataset and just added @Everyone so I can access it, it's not better. Writes are potentially slightly quicker but still nowhere near the performance I had in Core.



This is just from stripping the ACL from the dataset; I'm not sure I can do that on my ZFS1 pool that contains the aforementioned dataset without recreating it?

Got to head out shortly, so not got much time. Will look again properly tomorrow.
 

rymandle05

Cadet
Joined
Jan 16, 2024
Messages
8
I was able to test using my Windows Gaming PC. It's hooked into the same ethernet switch as my M2 Mac mini with a 1G connection. The tl;dr is Windows 11 did not have the same performance hit as MacOS. Here's my testing.
  1. Set all TrueNAS SMB datasets back to use ACL Type: SMB/NFSv4 and ACL Mode: Restricted.
  2. Restarted SMB via sudo systemctl restart smbd.
  3. Started a transfer of a large mkv file and observed SMB transferring at between 50 and 70 MB/s on macOS.
  4. Switched over to my Windows 11 computer and connected to the same SMB share.
  5. Ran CrystalDiskMark on that same share and observed a saturated link on the SEQ1M read and write tests.
  6. Changed the ACL Mode back to Discard on a single SMB dataset (a CLI sketch of this change follows the list).
  7. Restarted SMB again.
  8. Re-ran my Mac test and found full 1G link speeds again at 117 MB/s.
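For anyone who prefers the shell, here is a minimal sketch of the same toggle done from the CLI; the dataset path is just a placeholder, and the aclmode values correspond to the ACL Mode options in the UI.

Code:
# Check the current ACL settings on the dataset (dataset path is illustrative)
zfs get acltype,aclmode pool/dataset

# Switch ACL Mode to Discard, then restart Samba so the change takes effect
sudo zfs set aclmode=discard pool/dataset
sudo systemctl restart smbd

# Revert to Restricted and restart again to reproduce the slow behaviour
sudo zfs set aclmode=restricted pool/dataset
sudo systemctl restart smbd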
 

alexmarkley

Dabbler
Joined
Jul 27, 2021
Messages
40
As indicated earlier, the problem seems to stem from SMB datasets set up with ACL Type: SMB/NFSv4 but also ACL Mode: Restricted. If all SMB shares are set up this way then I see reduced SMB speeds transferring files from my M2 Mac mini. However, if I change the ACL Mode to anything else (e.g. Discard) on one SMB dataset then all SMB shares are able to once again saturate the 1G connection at ~117MB/s. A key point is to make sure to restart smbd after doing this. The slowdown doesn't seem to occur until that restart.

@rymandle05 This is really interesting! I'm happy to test on my side to see if any ACL-related settings affect my issue. (I haven't been fiddling with ACLs much, since the standard unix/chmod bits have been sufficient for my use case.)

Just so there's no confusion, can you please post some screenshots showing the ACL config you're referring to? I don't want to mix anything up. I don't see anything in Share ACL settings that seems to fit your description. I'm probably just looking in the wrong place.

@alexmarkley Are you able to test with a non-MacOS client, just to see if the issue can be isolated further?

I can easily re-run my tests against a Linux cifs client. I might also be able to get access to a Windows 10 client too.

I'd like to see if I can match and/or reproduce @rymandle05's ACL scenario to confirm if I'm seeing the same behavior as he is, then I'll rerun the same full test suite with all the clients I can get my hands on.
 

alexmarkley

Dabbler
Joined
Jul 27, 2021
Messages
40
been wanting to try out Scale anyway so I could at least see if I get a significant difference from CORE
In my case, the performance was fine on CORE and very bad on SCALE. I'm also very focused on the performance of large sequential reads, whereas it sounds like you're dealing with poor write performance? I'm guessing this might be a separate issue...?

I was getting full 10Gb wire speed both read and write in Core, yet less than a quarter of that in Scale.
Have you used something like iperf3 to confirm that you're able to get full 10gbps throughput between your client and your SCALE box? It might be worth double-checking the performance of each component of your system (network, storage, L2ARC) before jumping right to SMB.
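For reference, a basic iperf3 sanity check looks something like this (the hostname is a placeholder):

Code:
# On the TrueNAS box: start an iperf3 server
iperf3 -s

# On the client: test throughput in both directions (-R reverses the direction)
iperf3 -c truenas.local -t 30
iperf3 -c truenas.local -t 30 -R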

Also, when it comes to the issue I've been facing, I don't know (yet) if Windows client performance is impacted. I'll have to see if I can get a Windows client on my network so I can test that...
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608

@rymandle05

But isn't the first test reading from disk, and tests 2+ most likely reading from the ARC? At least if using the same file; just wanting to make sure you are not doing so.
 

rymandle05

Cadet
Joined
Jan 16, 2024
Messages
8
@rymandle05 This is really interesting! I'm happy to test on my side to see if any ACL-related settings affect my issue. (I haven't been fiddling with ACLs much, since the standard unix/chmod bits have been sufficient for my use case.)

Just so there's no confusion, can you please post some screenshots showing the ACL config you're referring to? I don't want to mix anything up. I don't see anything in Share ACL settings that seems to fit your description. I'm probably just looking in the wrong place.



I can easily re-run my tests against a Linux cifs client. I might also be able to get access to a Windows 10 client too.

I'd like to see if I can match and/or reproduce @rymandle05's ACL scenario to confirm if I'm seeing the same behavior as he is, then I'll rerun the same full test suite with all the clients I can get my hands on.

Here's my dataset structure:
[screenshot: dataset structure]


To start with, all the datasets shared via SMB (Media, Private, Public, TimeMachine) were set up with the same ACL Type and ACL Mode (which I did based on my TrueNAS Core setup). All other settings are also the same except Record Size. Here's a screenshot of Vault15/Media/Movies for reference:
[screenshots: Vault15/Media/Movies dataset settings]


It's here at the dataset level that I'm changing the ACL Mode of one dataset (usually Vault15/Public) to a different value such as Discard, restarting smbd, and then seeing immediate performance improvements on my 1GbE connection when copying a large file from my M2 Mac to the SMB share.
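To double-check which value has actually taken effect after the UI change, something like this works (the pool name matches my layout above):

Code:
# Show the ACL-related properties for every dataset under the pool
zfs get -r aclmode,acltype Vault15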
 

rymandle05

Cadet
Joined
Jan 16, 2024
Messages
8

@rymandle05

But isn't the first test reading from disk, and tests 2+ most likely reading from the ARC? At least if using the same file; just wanting to make sure you are not doing so.
So, on my Mac test, I am using the same file, but I'm making a new copy of it each time (e.g. filename.mkv, SameFile-copy2.mkv, SameFile-copy3.mkv, etc.). I was thinking the ZFS ARC wouldn't be too much of a factor with write speeds, which is what I've been focusing on. I also have not seen any improvements on the second or third copy of the same file from macOS until I make the changes documented and restart SMB.

If you think there's a variable here I need to isolate, I'm open to suggestions. Would setting Sync to Always for the dataset help rule out the ARC? If needed, I can grab some new ISOs to do unique file transfers.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So, on my Mac test, I am using the same file, but I'm making a new copy of it each time (e.g. filename.mkv, SameFile-copy2.mkv, SameFile-copy3.mkv, etc.). I was thinking the ZFS ARC wouldn't be too much of a factor with write speeds, which is what I've been focusing on. I also have not seen any improvements on the second or third copy of the same file from macOS until I make the changes documented and restart SMB.

If you think there's a variable here I need to isolate, I'm open to suggestions. Would setting Sync to Always for the dataset help rule out the ARC? If needed, I can grab some new ISOs to do unique file transfers.
If the read performance drops again after switching your aclmode back to Restricted and restarting smbd then it's definitely independent of ARC.

Just to double-confirm - you're copying files to/from a completely different dataset than the one you're switching the aclmode on, but still seeing a performance change?
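(For completeness, if you ever want to take ARC out of the picture entirely for read tests, one rough option is to stop caching file data on the test dataset; the dataset path below is illustrative, and already-cached blocks may linger until they're evicted.)

Code:
# Cache only metadata (not file data) in ARC for the test dataset, then re-run the reads
sudo zfs set primarycache=metadata Vault15/Public

# Restore the default afterwards
sudo zfs set primarycache=all Vault15/Public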
 

rymandle05

Cadet
Joined
Jan 16, 2024
Messages
8
If the read performance drops again after switching your aclmode back to Restricted and restarting smbd then it's definitely independent of ARC.

Just to double-confirm - you're copying files to/from a completely different dataset than the one you're switching the aclmode on, but still seeing a performance change?
Yep - that's correct. In my case, I'm transferring files to Vault15/Media/Movies, but I made the change on Vault15/Public. Changing the ACL Mode on a dataset shared via SMB seems to be what allows 1G line-speed reads and writes again. Changing Vault15/Public back to match all the other datasets, with ACL Mode: Restricted, then reintroduces the slower writes from macOS.

Just a hunch, but I suspect the ACL Mode itself might be a red herring, given the situation I just described. I think it's more likely that the change to the dataset is adjusting something else that's actually providing the benefit.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
FYI, regarding setting the ACL Mode to Discard: this shouldn't be set on a filesystem where you want to preserve ACLs on any files.

Code:
           discard      default, deletes all ACLs except for those
                        representing the mode of the file or directory
                        requested by chmod(2).

So if you set the aclmode to discard and any files then get written/touched/updated (including by a system process), you'll likely lose any existing ACLs on them. Best to keep this setting isolated to a test dataset.
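If you do experiment with aclmode on a dataset that already holds data, a cheap safety net (just a sketch; the dataset name is illustrative) is to snapshot first so the file ACLs can be recovered by rolling back:

Code:
# Snapshot before changing aclmode so file ACLs can be restored via rollback
sudo zfs snapshot Vault15/Public@pre-aclmode-test

# ...run the tests, then either keep the changes or roll back:
sudo zfs rollback Vault15/Public@pre-aclmode-test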
 

rymandle05

Cadet
Joined
Jan 16, 2024
Messages
8
Understood! Since this is a new build, I haven't put anything in the Public folder yet. Also, it's not really public; I intend to use it as an access-for-all share within my home lab where I can throw stuff to move between machines.
 

alexmarkley

Dabbler
Joined
Jul 27, 2021
Messages
40
I know it's been a few days, but I'm just now getting a chance to test this scenario.

At present, my top-level pool settings look like this:

[screenshot: top-level pool settings]


From there, all of my individual datasets have ACL Type and ACL Mode set to Inherit. I'm guessing these are the defaults, because I did not adjust them manually.

This leads to a question: @rymandle05 did you ever test with ACL Mode set to Passthrough? If your throughput issues are only happening when ACL Mode is set to Restricted, and the throughput is normal otherwise, we might be looking at two different performance-impacting issues.

Because adjusting the ACL settings on the dataset level could be a destructive operation, I set up my testing like this:
  • I removed all of my SMB shares from the TrueNAS configuration.
  • I created a new dataset called videotest.
  • I copied about 200 GB of data from the videowork dataset to the videotest dataset.
  • I (locally) read the file via cat foo | pv >/dev/null a couple of times until I was satisfied that my L2ARC was populated. (Finished in 2m 51s, throughput 1.15GiB/s.) A client-side equivalent of this read test is sketched just after this list.
  • I set the ACL on videotest to Type: SMB/NFSv4 and Mode: Discard.
  • I created the new SMB share and proceeded with testing.
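For anyone reproducing this, a client-side read test along these lines reports elapsed time and throughput the same way; the mount points and file name are illustrative, and pv is available via Homebrew on macOS.

Code:
# Sequential read of the test file over the SMB mount, reporting elapsed time and throughput
pv /Volumes/videotest/bigfile.mkv > /dev/null      # macOS SMB mount
pv /mnt/videotest/bigfile.mkv > /dev/null          # Linux cifs mount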
Here are the results of my tests. After each change to any parameter on the server-side, I performed an SMB server restart via sudo systemctl restart smbd.

Purpose: Default share parameters; ACL Type: SMB/NFSv4; Test: Sequential read of 196GiB file
SMB Client   | ACL Mode Discard           | ACL Mode Passthrough       | ACL Mode Restricted
macOS 14.2.1 | Finished 4m 54s (685MiB/s) | Finished 5m 15s (638MiB/s) | Finished 4m 13s (795MiB/s)
Linux        | Finished 3m 3s (1.07GiB/s) | Finished 3m 1s (1.08GiB/s) | Finished 3m 1s (1.08GiB/s)

Interestingly, ACL Mode Restricted actually performed a bit better on macOS. To double-check this, I went back and re-ran the performance tests on macOS for ACL Mode Discard and ACL Mode Passthrough a second time. In both cases, when I re-ran the test, I picked the best (fastest) time to go into the table. (Just in case cache warming was affecting the results.)

For my next set of tests, I re-created the SMB share again, but this time with Purpose set to Multi-protocol (NFSv4/SMB) shares. (Again, restarting smbd after each configuration change.)

Purpose: Multi-protocol (NFSv4/SMB) shares; ACL Type: SMB/NFSv4; Test: Sequential read of same 196GiB file
SMB Client   | ACL Mode Discard               | ACL Mode Passthrough           | ACL Mode Restricted
macOS 14.2.1 | Finished 1h 2m 46s (53.6MiB/s) | Finished 1h 3m 38s (52.8MiB/s) | Finished 1h 4m 26s (52.2MiB/s)
Linux        | Finished 8m 7s (414MiB/s)      | Finished 8m 6s (414MiB/s)      | Finished 7m 59s (420MiB/s)

I'm not sure I see a strong correlation between ACL settings and the SMB share throughput on my system. I'm guessing there's some other difference in the hardware or software configuration that is affecting the issue.

I am shocked by the Linux client performance with Default share parameters. It's so fast I think it's actually limited by the read throughput of the array. This also means Linux read performance is impacted by Multi-protocol (NFSv4/SMB) shares as well, albeit to a much lesser extent.

I'm still not sure why this issue exists, but I'm satisfied that I know how to work around it.

(Last note: I was able to dig up an old Windows 10 machine and I tried to run tests using it as a client. However, I couldn't figure out how to get it connected to the SMB share. (Embarrassing!) So I gave up.)
 

Volts

Patron
Joined
May 3, 2021
Messages
210
There are a number of differences in the Samba configs between the SMB "Default share parameters" and "Multi-protocol (NFSv4/SMB) shares" presets.

# testparm -s
...
[testshare-default]

ea support = No
kernel share modes = No
path = /mnt/zpool1/testshare-default
posix locking = No
read only = No
smbd max xattr size = 2097152
vfs objects = fruit streams_xattr shadow_copy_zfs ixnas zfs_core aio_fbsd
fruit:resource = stream
fruit:metadata = stream
nfs4:chown = true
ixnas:dosattrib_xattr = false

[testshare-multi]
ea support = No
level2 oplocks = No
oplocks = No
path = /mnt/zpool1/testshare-multi
read only = No
strict locking = Yes
vfs objects = fruit streams_xattr shadow_copy_zfs noacl zfs_core aio_fbsd
fruit:resource = stream
fruit:metadata = stream
nfs4:chown = true
ixnas:dosattrib_xattr = false
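To reproduce this comparison on your own box, one rough approach (file paths are illustrative) is to dump the generated config before and after changing the share's Purpose and diff the two:

Code:
# Dump the effective Samba config before and after changing the share preset, then compare
sudo testparm -s 2>/dev/null > /tmp/smb-before.conf
# ...change the share's Purpose in the UI...
sudo testparm -s 2>/dev/null > /tmp/smb-after.conf
diff -u /tmp/smb-before.conf /tmp/smb-after.conf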
 

Dysonco

Dabbler
Joined
Jul 4, 2023
Messages
27
Have you used something like iperf3 to confirm that you're able to get full 10gbps throughput between your client and your SCALE box? It might be worth double-checking the performance of each component of your system (network, storage, L2ARC) before jumping right to SMB.

Also, when it comes to the issue I've been facing, I don't know (yet) if Windows client performance is impacted. I'll have to see if I can get a Windows client on my network so I can test that...

As I mentioned, my hardware and configuration were identical between my Core and SCALE installs. The ONLY difference was the OS. I easily managed full 10Gb transfers on Core but was getting a quarter of that with SCALE.

I did check iperf etc. and all was consistent with the previous install.

It wasn't until I upgraded to SCALE-23.10.2 that it suddenly all started working correctly.

Hopefully you'll see the same improvement.
 

bertomil

Cadet
Joined
Feb 22, 2024
Messages
4
Hello. I'm also having terrible performance and stability issues with the latest SCALE releases. The overall performance is 4Gbit/s at most, and after some time it rapidly drops to nonsensical speeds.

 

Dysonco

Dabbler
Joined
Jul 4, 2023
Messages
27
Hello. I'm also having terrible performance and stability issues with the latest SCALE releases. The overall performance is 4Gbit/s at most, and after some time it rapidly drops to nonsensical speeds.
Hi Bertomil,
What version are you on? As mentioned above, I tried everything to get mine fixed but couldn't work out what was going on.

However, the most recent SCALE update (23.10.2) fixed both my transfer speed issues (now back up to over 1100 MB/s read and write, which saturates the 10GbE link) and a very annoying Cloud Sync issue where, with Dropbox remote folders, it didn't give you the option to expand subfolders.

So it appears to have been a Samba bug in my case, as nothing else changed.
 

bertomil

Cadet
Joined
Feb 22, 2024
Messages
4
Hi Bertomil,
What version are you on? As mentioned above, I tried everything to get mine fixed but couldn't work out what was going on.

However, the most recent SCALE update (23.10.2) fixed both my transfer speed issues (now back up to over 1100 MB/s read and write, which saturates the 10GbE link) and a very annoying Cloud Sync issue where, with Dropbox remote folders, it didn't give you the option to expand subfolders.

So it appears to have been a Samba bug in my case, as nothing else changed.

I have TrueNAS-SCALE-23.10.2 and our transfer speeds are terrible. I have no clue what to do next. Maybe switch to Unraid. I don't know. So frustrating.

 

Dysonco

Dabbler
Joined
Jul 4, 2023
Messages
27
Hm... weird, especially the slowdown; it almost seems like it's running out of memory or cache somewhere? How much RAM do you have? You need quite a lot of (preferably ECC) RAM for ZFS, depending on how much storage you have.

Maybe share more details on what your hardware and setup are. It could easily be something else causing the slowdown. For instance, some LSI HBAs require specific firmware for certain SSDs; I think that was due to disconnects in the pool, so it's not really relevant in your case, but could something else be happening? Maybe try spinning up a live Linux image of some description and trying SMB speed with that? At least that would give you an idea of whether it's TrueNAS or something else going on. If it's the Dell machine in your sig, have you checked that the HBA is definitely running in IT mode? It might need a firmware flash.
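As a rough sketch of that live-image test (the server address, share name, and credentials are placeholders, and you'd need cifs-utils and pv installed in the live session):

Code:
# Mount the share from a live Linux session and time a large sequential read
sudo mkdir -p /mnt/smbtest
sudo mount -t cifs //192.168.1.100/Public /mnt/smbtest -o username=youruser
pv /mnt/smbtest/some-large-file.iso > /dev/null
sudo umount /mnt/smbtest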

Random question: do you actually need TrueNAS SCALE? You could try TrueNAS CORE instead? It depends on whether you need VMs or lots of apps. If you just want a great-performing NAS then CORE is probably a good option, and more stable than SCALE.
 