22.02 RELEASE Performance Regressions

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
Seeing a performance regression with this morning's release. Mostly in reads, but definitely seen in all workloads in this basic CrystalDiskMark test. I know it's a loaded and not super-scientific test, but the results are very repeatable and good enough for a quick glance. 22.02-RC2 on the left, 22.02-RELEASE on the right.

[Attachment: TrueNAS-SCALE-22.02-RELEASE-performance-issues.png]
 

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
Yeah, same conditions: basically an idle server, Xeon E-2278G CPU (5.0 GHz turbo boost), 128 GB of memory, 10 GbE networking, all on an otherwise idle homelab. CrystalDiskMark creates a new file that is re-used for the 5 iterations, so the ARC is hit basically 100% of the time, other than the initial file creation, which is not part of the test. Also, this is a small 256 MiB file. I verified via zpool iostat that while the test is running there is ZERO disk activity; this is essentially all from ARC, so it really helps point to kernel/smbd-type issues rather than ZFS.
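For anyone who wants to repeat the check, this is roughly how I watched for disk activity during the benchmark (pool name below is just an example):

Code:
# per-second pool I/O while CrystalDiskMark runs; reads should stay at ~0 if served from ARC
zpool iostat -v tank 1

# optional: ARC hit statistics, if arcstat is available on your install
arcstat 1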

I see similar results with a 1 GiB test file.

Sure, I can open a ticket.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Yeah, same conditions: basically an idle server, Xeon E-2278G CPU (5.0 GHz turbo boost), 128 GB of memory, 10 GbE networking, all on an otherwise idle homelab. CrystalDiskMark creates a new file that is re-used for the 5 iterations, so the ARC is hit basically 100% of the time, other than the initial file creation, which is not part of the test. Also, this is a small 256 MiB file. I verified via zpool iostat that while the test is running there is ZERO disk activity; this is essentially all from ARC, so it really helps point to kernel/smbd-type issues rather than ZFS.

I see similar results with a 1 GiB test file.

Sure, I can open a ticket.
Can you confirm... is this with NFS?
 

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
No, this is CIFS/Samba to a Windows 10 client, SFP+ 10 GbE over a UniFi XG-16 switch. Watching top while testing, I don't see the smbd process go over 55% CPU usage.
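In case it helps anyone reproduce this, here is roughly how I'm watching it (assuming a single client, so only a handful of smbd processes):

Code:
# follow CPU usage of all running smbd processes during the benchmark
top -p "$(pgrep -d, smbd)"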
 

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
Also, this is on a storage network (separate network/VLAN) with jumbo frames enabled on both TrueNAS and the client. And again, this is a before-and-after test, so everything is the same except the upgrade this morning.
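For anyone comparing setups, a quick way to confirm jumbo frames actually pass end-to-end (client IP below is just a placeholder) is a do-not-fragment ping from the SCALE box:

Code:
# 8972 = 9000 MTU minus 20-byte IP header and 8-byte ICMP header; fails if any hop is not jumbo-clean
ping -M do -s 8972 -c 4 192.168.10.20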
 

LawrenceSystems

Dabbler
Joined
Jun 14, 2017
Messages
14
I went from TrueNAS Core 12.0-U8 to TrueNAS SCALE RELEASE, and this was the performance difference, testing with a Windows 10 system connected at 10 GbE via iSCSI. The system is a TrueNAS-MINI-3.0-X+ and I did the upgrade install over Core, which went smoothly. I have since reverted and the performance is back. I have a few other systems I will be loading it on for testing, and I'm following this post.


[Attached screenshots: CrystalDiskMark results on Core vs. SCALE]
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Samba was bumped from 4.15.2 to 4.15.5, but I don't recall any upstream changes that would account for a performance change. Maybe test local ZFS performance and see if it's in line with the previous version. If you see any errors in /var/log/samba4/log.smbd, send me a debug via PM.
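A minimal sketch of what I mean by testing local ZFS performance, assuming fio is installed and /mnt/tank/test is a dataset on the affected pool (adjust the path and sizes to taste):

Code:
# sequential read of a local file on the dataset itself, no SMB in the path;
# repeat the run so the second pass is largely ARC-served, like the CrystalDiskMark case above
fio --name=local-seq-read --directory=/mnt/tank/test --size=1G \
    --rw=read --bs=1M --ioengine=psync --runtime=30 --time_based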
 

simonaaker

Cadet
Joined
Sep 4, 2020
Messages
3
I'm seeing some of the same speed issues over NFS after upgrading as well.

Speed on the Safe pool, locally on the server:
root@server[~]# dd if=/dev/zero of=/mnt/Safe/Temp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.54673 s, 694 MB/s


But from my VM (tested with both VirtIO and e1000, which gave the same result), it is now much slower; before the upgrade it was 300+ MB/s.
/1 is mounted to the same dir as the local test dir.

Speed from Safe on the VM:
root@radarr:~# dd if=/dev/zero of=/1/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 34.2198 s, 31.4 MB/s
 

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
Samba was bumped from 4.15.2 to 4.15.5, but I don't recall any upstream changes that would account for a performance change. Maybe test local ZFS performance and see if it's in line with the previous version. If you see any errors in /var/log/samba4/log.smbd, send me a debug via PM.
I am seeing a little bit of log spam from Samba. Doesn't seem like it would be an issue, though.
Code:
[2022/02/22 13:55:27.517807,  1] ../../lib/param/loadparm.c:1766(lpcfg_do_global_parameter)
  lpcfg_do_global_parameter: WARNING: The "syslog only" option is deprecated
[2022/02/22 13:55:27.519760,  1] ../../source3/smbd/service.c:355(create_connection_session_info)
  create_connection_session_info: guest user (from session setup) not permitted to access this share (anubis1a)
[2022/02/22 13:55:27.519803,  1] ../../source3/smbd/service.c:545(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2022/02/22 14:11:42.595694,  1] ../../lib/param/loadparm.c:1766(lpcfg_do_global_parameter)
  lpcfg_do_global_parameter: WARNING: The "syslog only" option is deprecated
[2022/02/22 14:11:42.601506,  1] ../../source3/smbd/service.c:355(create_connection_session_info)
  create_connection_session_info: guest user (from session setup) not permitted to access this share (scopuli)
[2022/02/22 14:11:42.601649,  1] ../../source3/smbd/service.c:545(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2022/02/22 14:11:46.017274,  1] ../../lib/param/loadparm.c:1766(lpcfg_do_global_parameter)
  lpcfg_do_global_parameter: WARNING: The "syslog only" option is deprecated
[2022/02/22 14:11:46.018642,  1] ../../source3/smbd/service.c:355(create_connection_session_info)
  create_connection_session_info: guest user (from session setup) not permitted to access this share (scopuli)
[2022/02/22 14:11:46.018670,  1] ../../source3/smbd/service.c:545(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2022/02/22 14:11:47.204078,  1] ../../lib/param/loadparm.c:1766(lpcfg_do_global_parameter)
  lpcfg_do_global_parameter: WARNING: The "syslog only" option is deprecated
[2022/02/22 14:11:47.209471,  1] ../../source3/smbd/service.c:355(create_connection_session_info)
  create_connection_session_info: guest user (from session setup) not permitted to access this share (scopuli)
[2022/02/22 14:11:47.209658,  1] ../../source3/smbd/service.c:545(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2022/02/22 14:11:47.213269,  1] ../../lib/param/loadparm.c:1766(lpcfg_do_global_parameter)
  lpcfg_do_global_parameter: WARNING: The "syslog only" option is deprecated
[2022/02/22 14:11:47.218555,  1] ../../source3/smbd/service.c:355(create_connection_session_info)
  create_connection_session_info: guest user (from session setup) not permitted to access this share (scopuli)
[2022/02/22 14:11:47.218683,  1] ../../source3/smbd/service.c:545(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED


The client is connected to 3 CIFS shares, all with the same username/password, so I'm not sure what is trying to access them as the "guest user".
This spam has been basically non-stop since I upgraded this morning. I don't really think it's related to the performance, as it seems like others are having similar issues with NFS. Possible kernel/IO regression? Did the kernel get updated between RC2 and RELEASE?
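For what it's worth, the versions are easy to compare by booting back into the RC2 boot environment and checking:

Code:
# kernel on the currently booted release
uname -r

# ZFS module version, in case that changed too
cat /sys/module/zfs/version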
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I'm seeing some of the same speed issues over NFS after upgrading as well.

Speed on the Safe pool, locally on the server:
root@server[~]# dd if=/dev/zero of=/mnt/Safe/Temp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.54673 s, 694 MB/s


But from my VM (tested with both VirtIO and e1000, which gave the same result), it is now much slower; before the upgrade it was 300+ MB/s.
/1 is mounted to the same dir as the local test dir.

Speed from Safe on the VM:
root@radarr:~# dd if=/dev/zero of=/1/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 34.2198 s, 31.4 MB/s
Try increasing the count of NFS servers:
midclt call nfs.update '{"servers": 16}'
is probably a good place to start.
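To sanity-check that the change took effect (assuming the Linux nfsd proc interface, which should be present on SCALE, and that the middleware field is the same one nfs.update takes):

Code:
# number of nfsd threads currently running
cat /proc/fs/nfsd/threads

# configured value in the middleware
midclt call nfs.config | jq .servers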
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
The client is connected to 3 CIFS shares, all with the same username/password, so I'm not sure what is trying to access them as the "guest user".
This spam has been basically non-stop since I upgraded this morning.
Maybe the client thinks it's a different server. Check the output of midclt call smb.status AUTH_LOG | jq.
 

janos66

Dabbler
Joined
Feb 18, 2022
Messages
21
Hi,
unfortunately I did not check the performance directly before switching to SCALE, but since my workloads are mainly sequential read/write, I remember getting about 2200 MB/s read and 3000 MB/s write, and iperf was about 33 Gbits/sec.

The CrystalDiskMark run was done in a Windows 10 VM on an AMD Threadripper 2950X host running Manjaro, and iperf is directly Manjaro to TrueNAS SCALE, hence the results are not comparable to your native tests.
TrueNAS SCALE is running on a Xeon E5-2660 v3 with a Mellanox ConnectX-3 Pro NIC.

------------------------------------------------------------------------------
CrystalDiskMark 8.0.4 x64 (C) 2007-2021 hiyohiyo
Crystal Dew World: https://crystalmark.info/
------------------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

[Read]
SEQ 1MiB (Q= 8, T= 1): 1291.865 MB/s [ 1232.0 IOPS] < 6485.29 us>
RND 4KiB (Q= 32, T= 1): 138.323 MB/s [ 33770.3 IOPS] < 944.66 us>

[Write]
SEQ 1MiB (Q= 8, T= 1): 2682.377 MB/s [ 2558.1 IOPS] < 3119.76 us>
RND 4KiB (Q= 32, T= 1): 129.912 MB/s [ 31716.8 IOPS] < 1003.45 us>

[Mix] Read 70%/Write 30%
SEQ 1MiB (Q= 8, T= 1): 2056.361 MB/s [ 1961.1 IOPS] < 4070.18 us>
RND 4KiB (Q= 32, T= 1): 136.835 MB/s [ 33407.0 IOPS] < 956.01 us>

Profile: Peak
Test: 256 MiB (x5) [E: 0% (0/1500GiB)]
Mode: [Admin]
Time: Measure 5 sec / Interval 5 sec
Date: 2022/02/24 8:42:35
OS: Windows 10 Professional [10.0 Build 18363] (x64)
-----------------------------------------------------------------


root@truenas[~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.0.90.55, port 48400
[ 5] local 10.0.90.6 port 5201 connected to 10.0.90.55 port 48402
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 3.09 GBytes 26.6 Gbits/sec
[ 5] 1.00-2.00 sec 3.25 GBytes 27.9 Gbits/sec
[ 5] 2.00-3.00 sec 3.24 GBytes 27.8 Gbits/sec
[ 5] 3.00-4.00 sec 3.25 GBytes 27.9 Gbits/sec
[ 5] 4.00-5.00 sec 2.51 GBytes 21.6 Gbits/sec
[ 5] 5.00-6.00 sec 2.41 GBytes 20.7 Gbits/sec
[ 5] 6.00-7.00 sec 2.42 GBytes 20.8 Gbits/sec
[ 5] 7.00-8.00 sec 2.40 GBytes 20.6 Gbits/sec
[ 5] 8.00-9.00 sec 2.42 GBytes 20.8 Gbits/sec
[ 5] 9.00-10.00 sec 3.03 GBytes 26.0 Gbits/sec
[ 5] 10.00-10.03 sec 86.9 MBytes 26.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.03 sec 28.1 GBytes 24.1 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
 

simonaaker

Cadet
Joined
Sep 4, 2020
Messages
3
I disabled sync on the pool and it solved my issue with crappy speed over NFS and when copying between different pools locally.
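For anyone trying the same from the shell, this is roughly the equivalent (using the Safe pool from my dd test above as the example; sync=disabled trades away write safety on power loss, so treat it as a diagnostic rather than a fix):

Code:
# check the current setting
zfs get sync Safe

# disable sync writes (diagnostic only; in-flight data is lost on a crash)
zfs set sync=disabled Safe

# revert when done testing
zfs set sync=standard Safe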
 

simonaaker

Cadet
Joined
Sep 4, 2020
Messages
3
Does your pool have a SLOG device? If not, I'd say this is a strong indication that you'll want one for your workload.
No. I have a cache drive, since this pool is for video media and reads are much heavier than writes.
And I did not have these performance issues on RC2 or on Core.
 

BitByteBit

Dabbler
Joined
Jul 22, 2021
Messages
17
@LawrenceSystems, were you using ZFS native encryption when you did the performance tests mentioned above?
I believe the TrueNAS-MINI-3.0-X+ may use an Intel Atom C3000 series processor, and if so it seems there could be some performance issues with OpenZFS on Linux (in general, not sure about TrueNAS specifically) on non-AVX processors like the Atom C3000 - see the issues here:


That was a while ago though, so I'm not sure whether patches have already been implemented; the last issue hints at a possible fix...
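If it helps narrow things down, OpenZFS on Linux exposes which AES/GCM implementations its crypto provider selected via module parameters, so a quick check on the SCALE box (parameter paths as I understand them for OpenZFS 2.x, worth double-checking) would be:

Code:
# CPU crypto features (look for aes / avx / avx2 among the flags)
grep -o -w -E 'aes|avx2?|pclmulqdq' /proc/cpuinfo | sort | uniq -c

# implementation chosen by the ZFS crypto provider (icp)
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl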

If you're not using ZFS encryption then feel free to ignore all of the above :)
 

LawrenceSystems

Dabbler
Joined
Jun 14, 2017
Messages
14
Yes, with encryption for the testing on both TrueNAS Core and TrueNAS SCALE. Also, the CPU in my system is the Intel Atom C3758 @ 2.20 GHz.
 

BitByteBit

Dabbler
Joined
Jul 22, 2021
Messages
17
@LawrenceSystems, it would be interesting, if you had the time of course, to see whether there is a similar performance difference between Core and SCALE on an unencrypted dataset on your Atom C3758 system, to determine whether ZFS encryption on that CPU is a contributor.
 