alexmarkley
Dabbler · Joined: Jul 27, 2021 · Messages: 40
Good morning! To start with, I do not want this thread to turn into some kind of hunt for a quick fix to my performance problems. What I am looking for here is a detailed and technical conversation about the read performance characteristics and design tradeoffs of Samba shares on TrueNAS SCALE.
First, some basic information about my use case and my setup:
- My use case is video project editing and rendering, where the primary bottlenecks are large file sequential reads and large file random reads.
- My machine is running TrueNAS-SCALE-22.12.3.2. (I have more details on the current hardware setup in my signature.)
Here's what my zpool looks like presently:
Code:
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 08:40:37 with 0 errors on Fri Aug 25 00:46:28 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            7e4aee97-4bfb-4a88-8cc0-60f89a88c17c  ONLINE       0     0     0
            88da4991-a66c-4b9e-a90c-60612294dc4a  ONLINE       0     0     0
            42eb1da6-da35-4abc-bd3d-9ab54f7d6382  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            95b760c4-fb07-4388-89d9-5dd04cd5f098  ONLINE       0     0     0
            ba771674-8f61-4412-8506-03ec36817bc3  ONLINE       0     0     0
            952e6fd9-948c-4ec0-9819-c8916e66e955  ONLINE       0     0     0
          raidz1-2                                ONLINE       0     0     0
            f9298206-7146-425c-b5e0-6ee9a045c23e  ONLINE       0     0     0
            cbf4c733-c10f-4e41-9c87-063c03b98fc9  ONLINE       0     0     0
            c62bf56c-ab78-4b3b-9b33-19e0ae67f3c6  ONLINE       0     0     0
          raidz1-3                                ONLINE       0     0     0
            b9140a74-2689-4a89-a558-59bc04efba8d  ONLINE       0     0     0
            38c8e6ab-8da5-4bd2-a67a-476bb8b31a3b  ONLINE       0     0     0
            52c840f8-46f6-41e7-995d-c2f62789cdaa  ONLINE       0     0     0
          raidz1-4                                ONLINE       0     0     0
            ac9511ab-efdb-47db-82de-459d9a0913c8  ONLINE       0     0     0
            20bcd52b-8d23-4a82-828b-d3990d937426  ONLINE       0     0     0
            41349c9a-059b-4cd3-af40-32758cd26dbd  ONLINE       0     0     0
        cache
          1f1b7e96-394f-45b6-83f9-7b0e0846e475    ONLINE       0     0     0

errors: No known data errors
I have an NVMe L2ARC in the box, but it should not be necessary to achieve pretty decent sequential read speeds on large files.
It's a Saturday and nobody else is using this system right now, so let's do some systematic testing.
Here's a snapshot of reading a large BRAW file (directly on the TrueNAS box, not over the network) which has not been touched in months and therefore cannot be in the cache:
Code:
root@veritas2[...es/202303/sources/session_one_a/bmpcc]# cat 1232_02230614_C008.braw | pv >/dev/null
 188GiB 0:03:37 [ 887MiB/s] [ ... ]
root@veritas2[...es/202303/sources/session_one_a/bmpcc]#
That works out to a practical read throughput of 7.4 gigabits per second from the disk array, which (in my opinion) is fantastic given it's a bunch of spinning rust on the other end of a couple of SAS-3 cables.
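For completeness, here's how I plan to double-check that a "cold" read really is cold, rather than just asserting it. arcstat ships with OpenZFS, so it should be available on SCALE; my assumption is that a genuinely cold pass shows a high miss percentage:
Code:
# Run in a second shell while the cat | pv read is in progress.
# A high miss% on the first pass means the file really is coming off the disks.
root@veritas2[~]# arcstat 1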
The second read of the same file is predictably much faster. My 128GB of RAM and 1TB L2ARC are doing their job:
Code:
root@veritas2[...es/202303/sources/session_one_a/bmpcc]# cat 1232_02230614_C008.braw | pv >/dev/null
 188GiB 0:02:32 [1.23GiB/s] [ ... ]
root@veritas2[...es/202303/sources/session_one_a/bmpcc]#
So, after slightly warming the cache, the read speed for this file increases to just over 10 gigabits per second. Great.
With regards to caching, we generally only have one or two video projects being "actively" worked on at a given time, so the majority of the data should usually be hanging around in L2ARC while the project is hot.
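That's an assumption I can verify rather than hope for: the L2ARC hit/miss counters are exposed by standard OpenZFS tooling, so (assuming SCALE hasn't moved these) either of the following should show whether the hot project data is actually being served from cache:
Code:
# Summary view:
root@veritas2[~]# arc_summary -s l2arc

# Or the raw counters:
root@veritas2[~]# grep -E '^l2_(hits|misses|size)' /proc/spl/kstat/zfs/arcstats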
Moving to the network, I'm using 10GbE interfaces and 10GbE switches, with the MTU set to 9000 everywhere.
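Before blaming SMB for anything, I wanted to confirm that jumbo frames actually survive end to end, since a hop that silently fragments or drops them could masquerade as protocol overhead. A quick check with don't-fragment pings (8972 bytes of payload = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):
Code:
# From the Linux client (-M do forbids fragmentation):
[alex@Opal ~]$ ping -c 3 -M do -s 8972 veritas2

# From the macOS client (-D sets the don't-fragment bit; macOS caps
# packet size for non-root users, hence sudo):
Lapis:~ alex$ sudo ping -c 3 -D -s 8972 veritas2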
Here's what iperf3 looks like between the TrueNAS box (veritas2) and one of my macOS clients (Lapis):
Code:
Lapis:~ alex$ iperf3 -c veritas2 -f g
Connecting to host veritas2, port 5201
[  7] local 10.77.148.76 port 61569 connected to 10.77.1.50 port 5201
[ ID] Interval           Transfer     Bitrate
[  7]   0.00-1.00   sec  1.15 GBytes  9.92 Gbits/sec
[  7]   1.00-2.00   sec  1.15 GBytes  9.88 Gbits/sec
[  7]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   7.00-8.00   sec  1.15 GBytes  9.84 Gbits/sec
[  7]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec
[  7]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  7]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec                  sender
[  7]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec                  receiver

iperf Done.
Lapis:~ alex$
Here's what it looks like between veritas2 and one of my Linux clients (Opal):
Code:
[alex@Opal ~]$ iperf3 -c veritas2 -f g
Connecting to host veritas2, port 5201
[  5] local 10.77.245.62 port 52042 connected to 10.77.1.50 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.14 GBytes  9.77 Gbits/sec   28   1.55 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.59 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.63 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.66 MBytes
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.76 MBytes
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.80 MBytes
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec   63   1.35 MBytes
[  5]   7.00-8.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   8.00-9.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.59 MBytes
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec    0   1.62 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec   91             sender
[  5]   0.00-10.04  sec  11.5 GBytes  9.85 Gbits/sec                  receiver

iperf Done.
[alex@Opal ~]$
On both machines I'm seeing about 9.9 Gbit/s, which is only about 1% below line rate once TCP/IP overhead is accounted for.
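One caveat about my own methodology: by default iperf3 sends from the client to the server, but an SMB read pushes data the other way, from veritas2 to the client. So before pinning everything on Samba, I should also confirm the server-to-client direction, and whether a single TCP stream is the limiting factor:
Code:
# -R reverses the test so veritas2 is the sender, matching an SMB read:
[alex@Opal ~]$ iperf3 -c veritas2 -f g -R

# -P 4 runs four parallel streams, to see whether one stream is the cap:
[alex@Opal ~]$ iperf3 -c veritas2 -f g -R -P 4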
So here's the real question. What does it look like when I read the same file over SMB?
From my Linux client, here's what it looks like:
Code:
[alex@Opal bmpcc]$ cat 1232_02230614_C008.braw | pv >/dev/null
 188GiB 0:07:14 [ 444MiB/s] [ ... ]
[alex@Opal bmpcc]$
This is a measured practical read throughput of 3.7 gigabits per second. That's not slow by any means, but it's puzzling from a relative performance perspective. (Again, nobody else is using this system today.)
Put another way: if I can read this file at just over 10 gigabits per second locally, and I can push bits over the network at 9.9 gigabits per second, then the SMB path itself is introducing roughly 63% overhead for this use case (1 − 3.7/9.9 ≈ 0.63).
I'm used to protocol overhead (SCP and the like) costing 10% or so relative to raw network throughput. But over 60% loss? Looked at from that angle, it makes me wonder whether something is wrong with the client or the server.
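One server-side theory I know how to test: smbd handles each client connection in a single process, so if signing, encryption, or just copy overhead is eating a whole core, throughput will cap out no matter how fast the disks and network are. Watching per-process CPU on the server during a transfer should rule that in or out:
Code:
# While a client read is running; one smbd pinned near 100% of a core
# would point at CPU-bound protocol work rather than disk or network.
root@veritas2[~]# top -b -d 1 -n 5 -p "$(pgrep -d, smbd)"
That covers the server side; the client deserves equal scrutiny.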
Speaking of the client, here is the client configuration for the above test:
Code:
//veritas2/videowork on /home/videowork type cifs (rw,nosuid,nodev,relatime,vers=3.1.1,cache=strict,username=alex,uid=7000,noforceuid,gid=7000,noforcegid,addr=10.77.1.50,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1,user=alex)
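The rsize is already at the 4 MiB maximum, but the cache behavior is one knob I haven't isolated yet. Here's the kind of remount experiment I have in mind, changing one option at a time (the other options are copied from the mount above; whether cache=loose is safe depends on whether anything else writes these files concurrently):
Code:
[alex@Opal ~]$ sudo umount /home/videowork
[alex@Opal ~]$ sudo mount -t cifs //veritas2/videowork /home/videowork \
    -o username=alex,uid=7000,gid=7000,vers=3.1.1,rsize=4194304,cache=loose

# Then re-run the same read test:
[alex@Opal bmpcc]$ cat 1232_02230614_C008.braw | pv >/dev/null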
From my macOS client (macOS Ventura 13.5.2), things get genuinely sad:
Code:
Lapis:bmpcc alex$ cat 1232_02230614_C008.braw | pv >/dev/null
 188GiB 1:02:02 [51.8MiB/s] [ ... ]
Lapis:bmpcc alex$
There's no sugarcoating this: the performance here is just bad. At a measured 0.4 gigabits per second, that's an eye-popping 96% loss compared to the TCP throughput iperf3 measured.
I'm pretty sure this is a performance regression, because I don't remember things ever being this bad before, but I don't know when it started. Over the past couple of months I've migrated my NAS hardware from a TrueNAS Mini to a generic SuperMicro machine, and my NAS software from TrueNAS CORE to TrueNAS SCALE. There have been so many changes in my environment that it's impossible to pin the regression on any specific one.
Besides which, I realize I don't understand enough about the performance characteristics and tradeoffs of running a Samba server. I want to be able to reason through the system instead of panicking and throwing configuration changes at the wall to see what sticks.
The macOS client configuration (as reported by smbutil):
Code:
Lapis:~ alex$ smbutil statshares -a

==================================================================================================
SHARE                         ATTRIBUTE TYPE                VALUE
==================================================================================================
videowork
                              SERVER_NAME                   veritas2._smb._tcp.local
                              USER_ID                       7000
                              SMB_NEGOTIATE                 SMBV_NEG_SMB1_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB2_ENABLED
                              SMB_NEGOTIATE                 SMBV_NEG_SMB3_ENABLED
                              SMB_VERSION                   SMB_3.1.1
                              SMB_ENCRYPT_ALGORITHMS        AES_128_CCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_128_GCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_256_CCM_ENABLED
                              SMB_ENCRYPT_ALGORITHMS        AES_256_GCM_ENABLED
                              SMB_CURR_ENCRYPT_ALGORITHM    OFF
                              SMB_SHARE_TYPE                DISK
                              SIGNING_SUPPORTED             TRUE
                              EXTENDED_SECURITY_SUPPORTED   TRUE
                              UNIX_SUPPORT                  TRUE
                              LARGE_FILE_SUPPORTED          TRUE
                              OS_X_SERVER                   TRUE
                              FILE_IDS_SUPPORTED            TRUE
                              DFS_SUPPORTED                 TRUE
                              FILE_LEASING_SUPPORTED        TRUE
                              MULTI_CREDIT_SUPPORTED        TRUE

--------------------------------------------------------------------------------------------------
Lapis:~ alex$
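Notably, statshares shows encryption is off for this session, but it doesn't tell me whether signing was actually negotiated on, and signing is the classic explanation for slow macOS SMB reads. The experiment at the top of my list is the Apple-documented client setting from the nsmb.conf man page (whether it still applies on Ventura 13.5.2 is exactly the kind of thing I'd want to verify by rerunning the pv test after a remount):
Code:
# /etc/nsmb.conf on the Mac -- create it if it doesn't exist,
# then unmount and remount the share before re-testing.
[default]
# Don't require SMB signing (only takes effect if the server allows it):
signing_required=no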
So what is going on here? What design parameters are impacting these protocol overheads? What tuning parameters are going to be most relevant for my use case?
I've googled around a ton, and mostly what I'm finding is half-baked guidance for "set this parameter" without a detailed explanation of why. And a lot of the guidance out there seems either outdated, irrelevant, or outright dangerous.
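Rather than cargo-culting those, I'd at least like to isolate client from server first. One test that seems sound: pull the file with smbclient, Samba's own userspace client, instead of the kernel mount; smbclient prints an average transfer rate when the get completes. If it's dramatically faster than the kernel mount, the problem is client-side; if it's just as slow, smbd is the place to dig. (The path below is a placeholder for wherever the file actually lives inside the share.)
Code:
[alex@Opal ~]$ smbclient //veritas2/videowork -U alex \
    -c 'get path/to/1232_02230614_C008.braw /dev/null'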
My experience with this community has been fantastic so far, so I'm hoping someone here will have the expertise and the time to help me reason through this.
Thanks for reading!