Can't find all-SSD RAID-Z2 small file SMB performance bottleneck - Please help!

Neville005

Dabbler
Joined
Sep 18, 2021
Messages
10
Thanks again for all of your replies! I truly appreciate your kindness and willingness to help me find a solution!

Generally speaking, auxiliary parameters are an unsupported configuration. Things may work, but results are not guaranteed and we will most likely not troubleshoot bugs resulting from using them.

Thanks! I wasn't aware of this and will be more careful when considering the use of auxiliary parameters. Do you happen to know if things like Receive Side Scaling / SMB Multichannel are planned to receive official support in future versions?

To try and (further) rule out some potential hardware bottlenecks, I decided to do the following additional tests:
  • I did a test where I used one of the Samsung 870 QVO SSDs as the boot drive and the Samsung 980 NVMe M.2 drive as a single-drive VDEV. On paper, the Samsung 980 should outperform a single QVO SSD, and it uses TLC instead of QLC, which should give it better sustained performance. If nothing else, if my QVO SSDs were currently a bottleneck, the 980 should at least produce different performance figures in my test scenario. Nevertheless, the results of my test scenario were (within a margin of error) unchanged, leading me to conclude that my drives are not currently a bottleneck.
  • As @Samuel Tai correctly pointed out, SMB is indeed limited to a single thread by default. However, as far as I'm aware, there isn't a limitation on the usage of that single thread. During my test scenario, the average CPU load hovers around 15%, with the highest single-thread usage peaking at around 45-50%. This is well below a fully saturated single thread: the 2200G has four threads, so one maxed-out thread would show up as an average CPU load of 25% (see the sketch after this list for a way to watch this). When I transfer a single large file (of around 15 GB) to the TrueNAS system with Receive Side Scaling enabled, the CPU load gets pushed to around 50%, with the highest single-thread usage peaking around 90%. As such, the CPU does seem to be able to handle higher (single-thread) loads and does not appear to be the bottleneck during my test scenario.
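For anyone who wants to watch the per-core behaviour themselves, a rough Python sketch along these lines (using the psutil library; not the exact tooling I used) makes a pegged core easy to spot during a transfer:

  import psutil

  # Print average and per-core CPU load once per second during a transfer.
  # A fully saturated single thread shows up as one core pinned near 100%.
  try:
      while True:
          per_core = psutil.cpu_percent(interval=1.0, percpu=True)
          avg = sum(per_core) / len(per_core)
          print(f"avg {avg:5.1f}% | " + " ".join(f"{c:5.1f}" for c in per_core))
  except KeyboardInterrupt:
      pass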

After all of these tests, I'm currently under the impression that I'm not dealing with a hardware bottleneck in my test scenario. The CPU has shown itself capable of higher loads, different VDEV configurations and the use of a completely different drive didn't change performance, doubling my RAM to 16 GB did not help, and different (Intel and Realtek) NICs didn't cause an appreciable performance difference either (with available bandwidth being underused on all of them). I've removed unnecessary hardware components (the RX570 GPU), switched the NICs to different PCIe slots, tried different boot drives and even changed my SATA cables, all without the performance improvements I was hoping for. I can't currently think of any other potential hardware bottlenecks that I haven't been able to rule out yet, but I remain open to suggestions.

The only potential bottleneck that appears to remain (to me) is the software(/operating system) itself. Perhaps a bottleneck in the SMB protocol, perhaps in TrueNAS itself, perhaps in my software configuration, or perhaps in something completely different. My knowledge is currently way too limited to even hazard a proper guess. I'm open to any and all suggestions!

During my test scenario, the data transfer rate during uploads and downloads varies greatly (from 100 KB/sec to 250 MB/sec), but the number of files handled each second remains fairly constant (only dipping slightly on the few larger files). I'm able to upload an average of 95 files per second, download an average of 63 files per second and delete an average of 220 files per second. As of yet, I haven't been able to surpass these figures with different datasets. If anyone else is able to surpass them, no matter the hardware or datasets used, please let me know!
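In case anyone wants to compare numbers: a minimal Python sketch of how a files-per-second figure like this could be measured (the paths are placeholders, and this isn't necessarily my exact method):

  import shutil
  import time
  from pathlib import Path

  SRC = Path(r"C:\projects\testset")      # local copy of the file set (placeholder)
  DST = Path(r"\\truenas\share\testset")  # destination on the SMB share (placeholder)

  # Count the files up front so the tally doesn't distort the timing.
  n_files = sum(1 for p in SRC.rglob("*") if p.is_file())

  start = time.perf_counter()
  shutil.copytree(SRC, DST)  # raises if DST already exists; clear it between runs
  elapsed = time.perf_counter() - start

  print(f"{n_files} files in {elapsed:.1f} s -> {n_files / elapsed:.1f} files/sec")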

I'd also be very interested to hear if anyone might have a good alternative for SMB when working with Windows systems!

Thanks again for all the help!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Daft question here
You have tried a single SMB connection between PC and NAS and are bouncing off limits there.
What happens if you try multiple clients pulling the same files?

I have a 10Gb setup - and a dataset (on mirrored HDDs) that's full of small files (with a tuned special VDEV on it). About 272k files, 83GB.
Trying a copy and paste to my desktop (x4 NVMe, also on 10Gb) takes quite a while to start up as Windows counts the files and folders (yawn). I didn't time that stage - but it took a while. Once the copy finally started, the sustained speed was 20-40 MB/s.

If I copy a large file I get 350 MB/s, and a second attempt hit 1 GB/s (served from ARC) while the other copy was still going on.

If I set off multiple (4) small-file copies at the same time, it again took Windows a hell of a long time to get sorted, but I got the following:
[attached screenshot: 1633199071576.png]

It looks to me like you are running into an SMB limitation - either on the server or on the client. Note that I have a 10-core CPU (E5-2660 v3) on the NAS and a Ryzen 9 5900X on the client. The NIC was running at a combined 250-330 Mbps.
 

Neville005

Dabbler
Joined
Sep 18, 2021
Messages
10
Daft question here
You have tried a single SMB connection between PC and NAS and are bouncing off limits there.
What happens if you try multiple clients pulling the same files?

Good question! I've now rerun my test scenario from two different Windows 10 client systems at the same time. Each client sees about the same performance as when the test scenario runs from a single system, meaning that the total activity on the TrueNAS system is actually around double. I've also tried running two instances of the test scenario from the same client system, but this caused some weird issues with Windows Explorer, where the (explorer.exe) process would suddenly quit after a few minutes. I don't yet know if that was an isolated incident.
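If anyone else runs into the same Explorer instability, parallel runs can also be driven without Explorer entirely; a rough Python sketch (placeholder paths, not what I actually ran):

  import shutil
  from pathlib import Path
  from concurrent.futures import ThreadPoolExecutor

  SRC = Path(r"C:\projects\testset")      # placeholder paths
  DST = Path(r"\\truenas\share\testset")

  # Copy each top-level folder as its own job, four at a time,
  # to approximate four simultaneous copy sessions over one SMB connection.
  folders = [p for p in SRC.iterdir() if p.is_dir()]
  with ThreadPoolExecutor(max_workers=4) as pool:
      jobs = [pool.submit(shutil.copytree, f, DST / f.name) for f in folders]
      for job in jobs:
          job.result()  # surface any copy errors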

This does show that the TrueNAS system should be able to handle more than it currently does over a single connection. The hardware in my main Windows system shouldn't be causing any bottlenecks either, as none of it appears to be pushed to its limits during the test scenario. All of this does indeed strengthen the case for a potential bottleneck in the SMB protocol. Furthermore, it raises the question: is there anything I can do to alleviate that SMB bottleneck?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I think you need to increase the number of clients to see how far this scales. But if the NAS can support two clients at the same speed, it looks like a performance limit on the client side to me.
Also - how are you copying the files? Copy and paste, RoboCopy, batch file / PowerShell?
Do you have a copy of your test script I could mod (for source files, as I don't have your data) and test over here? I have multiple virtual (and a few physical) clients available to test with.

Also - what are you trying to achieve? Your test, whilst showing up a possible issue with SMB (surprise), raises the question: are you likely to be copying these files every day?
 

piersdd

Cadet
Joined
Nov 20, 2021
Messages
8
Ok, let me be honest: I'm at a loss here and could really use some help. As such, I greatly appreciate any and all feedback and/or suggestions!

The problem
I'm looking to improve my small file SMB performance, but can't seem to find the current bottleneck. I'm pretty new to TrueNAS and ZFS in general, so I wouldn't be surprised if I'm just missing a basic setting somewhere or maybe just have wrong expectations of my hardware. I've been trying to rule things out, but haven't been able to find my exact bottleneck.

My hardware
Motherboard: ASUS A320M-K
CPU: AMD Ryzen 3 2200G
Memory: 1x 8GB DDR4-2666 Corsair Vengeance LPX
Boot Drive: 1x 250GB Samsung 980 (NVMe M.2)
Data Drives: 4x 2TB Samsung 870 QVO (SSD) in a RAID-Z2 configuration
Network Interface Controller: EDUP 2.5GBase-T PCIe Network Adapter with an RTL8125 controller running in a PCIe 2.0 x1 slot
GPU: Sapphire Pulse Radeon RX570 8GB
PSU: Corsair CX450M (450W)
Operating System: TrueNAS Core 12.0-U5.1

I realize that this isn't exactly server-grade hardware. Most of it is stuff that I already had lying around, and it seemed to be enough for an initial test. I would like to upgrade to a Supermicro A2SDi-H-TF motherboard (with the Intel Atom C3758 processor) and 16GB of ECC memory in the coming weeks, but I'd first like to make sure that such a move would actually help my performance. I'm also aware that the Samsung 870 QVO SSDs are somewhat controversial (mainly due to their QLC memory). However, they should be sufficient for my particular workload: I expect a maximum of 10-20 TB written per drive per year, so they should long outlast their warranty period, and performance should be decent enough with their TurboWrite cache. I'll be using a separate system for backup purposes.

My test scenario
To test the performance of my TrueNAS system, I'm using one of my previous projects. The project consists of a total of 68,249 files, spread over 1,342 folders, with a combined size of 11.5 GB. Most of the files are extremely small, with just a few files responsible for most of the combined size. I've chosen this project as it's a great reflection of my average use case for this system. I'm measuring performance using the following three metrics:
  1. Upload - The time it takes to copy the project from my main system to the TrueNAS system
  2. Download - The time it takes to copy the project back from the TrueNAS system to my main system
  3. Delete - The number of files per second that can be deleted from my TrueNAS system

My main system consists of the following (relevant) hardware:
Motherboard: ASUS ROG Strix B250F Gaming
CPU: Intel i7-7700
Memory: 2x 8GB DDR4-2400 Corsair Vengeance LPX
Boot Drive: 1x 500GB Samsung 960 EVO (NVMe M.2)
Additional Drives: 1x 4TB Samsung 870 EVO (SSD), 1x 2TB WD Blue (HDD)
Network Interface Controller: ASUS XG-C100C (with one 10GBase-T connection)
Operating System: Windows 10 Pro 21H1

All copies in the test scenario are done to or from the M.2 boot drive of my main system. My main system and TrueNAS system are connected via a QNAP QSW-1105-5T switch and Cat 6a cabling (no run longer than 20 meters).

My hope/expectations
As a point of reference, I ran my test scenario against the built-in (4TB Samsung 870 EVO) SSD in my main system. This resulted in the following performance figures:
Upload (write): 6 minutes (average of 32 MB/sec)
Download (read): 5.5 minutes (average of 35 MB/sec)
Delete: 1,200 files per second

My hope would be to get within reach of these performance figures with my TrueNAS system. I'm fully aware that small files have always been somewhat of a sore point for SMB file transfers, so I'm not expecting to exceed the local performance figures and/or to get anywhere close to the maximum speed of my 2.5Gbps NIC.

My current performance figures
My test scenario currently results in the following figures when using the onboard 1 gigabit connection of my TrueNAS motherboard:
Upload (write): 14 minutes (average of 14 MB/sec)
Download (read): 20 minutes (average of 10 MB/sec)
Delete: 200 files per second

When I use the 2.5 gigabit connection of my PCIe Network Adapter, I get the following figures:
Upload (write): 12 minutes (average of 16 MB/sec)
Download (read): 18 minutes (average of 11 MB/sec)
Delete: 220 files per second

Without the few large files in the sample project, the average upload and download speeds would probably be closer to 2-5 MB/sec.

As a reference, I also ran the test scenario over a gigabit connection against a WD My Cloud EX2 Ultra NAS with 2x 8TB WD Red HDDs (which was my previous setup), and this resulted in the following figures:
Upload (write): 29 minutes (average of 7 MB/sec)
Download (read): 29 minutes (average of 7 MB/sec)
Delete: 50 files per second

My current performance is definitely better than my previous setup was, but it's still not as close to the performance of a local drive as I'd like it to be.

Potential hardware bottlenecks
I've currently tried to look at the following potential hardware bottlenecks:
  • Network Interface Controller: Switching from a 1 gigabit connection to a 2.5 gigabit connection resulted in a mere 10% performance increase (mainly due to the few large files). I did a separate test with a single large file to confirm that my maximum transfer speed is indeed around 270 MB/sec, so the 2.5 gigabit connection appears to be working properly.
  • SSDs: While I don't have any replacement SSDs for the pool, I did try running the pool as a striped VDEV across all four SSDs instead. This did not lead to any appreciable difference in the performance figures. With the current RAID-Z2 pool, IOPS should theoretically be limited to that of a single SSD, while sequential throughput should actually benefit from the number of drives. These theoretical RAID-Z2 performance figures are nowhere close to being met.
  • Memory: According to the TrueNAS dashboard, most of the memory is either free or being used as ZFS cache. While 8GB may not be a lot, it appears to be sufficient for this rather small pool (with a total usable storage capacity of around 3.5TB).
  • CPU: I do not have any drop-in replacements for the CPU of my TrueNAS system or for the one in my main system. However, my TrueNAS system averages around 15% load during uploads and downloads, which is less than the 25% load I would expect from a continuous max load on a single thread (the 2200G has four threads). My main system averages around 8% load for Windows Explorer, which is also less than the 12.5% that a maxed-out single thread would average there (the i7-7700 has eight threads). Both systems should theoretically be able to use Receive Side Scaling (RSS) to utilize multiple threads, but I haven't seen any indication of it being used on this single connection; see the note after this list for a client-side check. The workload is already divided between multiple threads (even with the onboard 1 gigabit connection), but the CPU load doesn't rise above single-thread levels. I'm not sure if the "re" driver for the RTL8125 chip actually supports RSS under TrueNAS, but the chip itself is capable of the feature.
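For what it's worth: from what I've read, the client side of the RSS/multichannel question can be checked with the built-in Windows PowerShell cmdlets "Get-NetAdapterRss" (whether RSS is enabled on the client NIC) and "Get-SmbMultichannelConnection" (which interfaces an SMB session is actually using). I haven't dug deeper into this yet.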

Potential software bottlenecks
I'm currently using just three minor tweaks to the default TrueNAS settings:
  • I've turned off the compression of my SMB dataset. This seemed to result in a rather small performance improvement (less than 3%).
  • I've disabled the sync option of my SMB dataset. This seemed to result in a rather small performance improvement (less than 3%).
  • I've added the "server multi channel support = yes" auxiliary parameter to my SMB service config. This did not seem to impact performance.
I've tried a bunch of other settings from various forum posts and tutorials, but none resulted in a clear performance improvement. I'm a bit of a TrueNAS and SMB(/Samba) noob, so I'm not really sure where I would need to look for any true improvements.

Please help
As you can see, I've been unable to locate the bottleneck in my current setup. My current performance looks similar to what one might expect from an HDD. It's as if the TurboWrite cache of the SSDs isn't being used at all, even though this feature appears to be enabled on all four SSDs when checked from the CLI.

As I mentioned at the beginning of my post: I'm at a complete loss here and don't know where to look anymore. Your help is greatly appreciated!
My experience with SMB performance tuning, albeit with a slightly different client/server implementation (Debian/Proxmox Samba to macOS), had similar results, until...

I turned OFF SMB signing, a modern SMB security feature that cryptographically signs each message.

Performance went from similarly abysmal to approx. 80-90% of wire speed (for certain large files).

Good luck. I wish you the best, as I'm personally now all in on TrueNAS SCALE.

love it
 

Neville005

Dabbler
Joined
Sep 18, 2021
Messages
10
I turned OFF SMB signing, a modern SMB security feature that cryptographically signs each message.

Thanks for the tip! That sure sounds like a plausible contributor to the performance issues that I'm experiencing. However, from what I've been able to find online, it's no longer possible to disable SMB signing for SMB 2 and SMB 3 connections. Within Windows, it should be possible (according to this article) not to require SMB signing, but this only results in unsigned SMB messages when the server does not require it either. As of yet, I've been unable to figure out if it's possible to configure TrueNAS in a way that doesn't require SMB signing. Any further tips are therefore, as always, appreciated!
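One possible avenue I still want to look into: if I'm reading the Samba documentation correctly, the relevant smb.conf option is "server signing" (e.g. "server signing = disabled"), which could presumably be added as another auxiliary parameter, with the same "unsupported configuration" caveat mentioned earlier in this thread. I haven't verified whether that actually changes what TrueNAS negotiates.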
 