Very Slow Samba CIFS Read/Writes

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
Hello TrueNAS community.

New TrueNAS user here - I recently installed TrueNAS SCALE to evaluate, and I love the new product, the fantastic user interface and the built-in functionality. I'm assessing whether to move my main Proxmox converged VM host and file server over to TrueNAS SCALE. On that machine I share a ZFS dataset from the host over SMB/CIFS and can easily saturate a 1GbE connection with a 6-disk RAIDZ2 pool.

To test the system I created a TrueNAS SCALE VM on Proxmox. The host is a Dell R710 with dual Xeon L5640 (6C/12T) and 96GB RAM. The VM is running on a SATA SSD, and I have passed a PERC H200 in IT mode through to the VM; I can see all drive info as required. In TrueNAS I've created a 5-disk RAIDZ1 using 4 x HGST 3TB SAS 7200 rpm drives and 1 SATA drive. The VM is given 40GB of dedicated RAM and 6 CPU cores. So far so good.

I've created a dataset on the RAIDZ1 pool and shared it via SMB/CIFS, and I'm getting very poor write (10-13 MB/s) and read (20-30 MB/s) speeds. This is copying a 5GB ISO file. The pool is otherwise empty.

iperf indicates gigabit network speed to/from TrueNAS at 940 Mbit/s.
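For what it's worth, the test was roughly along these lines (the address is just a placeholder for the TrueNAS IP):

iperf3 -s                        # run on the TrueNAS box
iperf3 -c 192.168.1.50 -t 30     # run from the desktop; add -R to test the reverse direction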

I tried a pool of 2 mirrored vdevs using just the 4 SAS drives and it made no difference. I've also toggled the sync setting on the pool, again with no difference (I wasn't sure if this needed a reboot or a pool export?).
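For reference, the toggle I used was the pool's sync property - something like this from the shell ('tank' being a placeholder pool name; my understanding is the change applies to new writes immediately, but I may be wrong):

zfs set sync=disabled tank     # placeholder pool name; should affect new writes right away
zfs get sync tank              # confirm the current value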

I haven't set up jumbo frames yet, but even without them I'd expect much faster speeds. top doesn't show any process bogging down the CPU.

Any suggestions on possible areas to troubleshoot?

Thanks for your help
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Try it on the bare metal R710. You have multiple strikes against you here:

1) Proxmox is an immature hypervisor with immature PCIe passthru support. People have reported varying levels of success with it. If it works, great. If it doesn't, well, ...

2) The R710 with Westmere CPUs is ineligible for reliable PCIe passthru. Even with a competent hypervisor like ESXi, platforms prior to Sandy Bridge tended to be very unstable; I have ZERO reliable recommendations for pre-Sandy virtualization. For Sandy and Ivy, the Supermicro X9 boards generally work well; lots of other gear of that CPU generation still didn't get it right. It really wasn't until X10 and Haswell that HP and Dell seemed able to do a more reliable job. Please note that I've been advising on virtualizing FreeNAS for around a decade.


My suspicion is that when you put it on the bare R710 it will suddenly be very fast.

Virtualization on this early gear is "it is what it is; you get what you get".
 

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
jgreco, thanks for your reply. I decided to give that a try as suggested and reinstalled bare metal on the R710, same setup but this time with a RAIDZ2 pool of 5 disks, four of which are SAS2 7200 rpm drives. Still getting the same very slow writes to the pool.

Copying to the pool:
[screenshot: transfer speed copying to the pool]


Copying from the pool is a little quicker, but still very slow (this is writing to an NVMe SSD on my desktop machine):
[screenshot: transfer speed copying from the pool]


Results from iperf:
[screenshot: iperf results]


I'd expect a lot faster performance; not sure what is going on here.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well clearly something is wrong. What does gstat show while all this is going on? What does solnet-array-test show?


These are basically the HDD equivalents of iperf to see if we can find what might just be a very slow HDD in the mix. You should also be validating that you actually have IT firmware 20.00.07.00 on the HBA, because weird things happen when you don't.
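As a rough sketch (exact tool availability differs between CORE and SCALE), the sort of thing you'd run while the copy is in progress is:

gstat -p           # FreeBSD/CORE: live per-disk busy % and latency
sas2flash -list    # reports the firmware version on SAS2008-based HBAs like the H200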
 

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
Thanks again for the pointers @jgreco

I can confirm the HBA is running 20.00.07.00 IT firmware. I did try to download solnet-array-test but got a bit stuck pulling it from the FTP with FileZilla; I could see the files but couldn't figure out what to do with them or how to do a complete download.

I couldn't find a Debian equivalent to gstat, but I ran some I/O tests on the pools and drives, ran a range of hard drive SMART tests, watched htop etc., and nothing obvious came back to show a single slow or failing disk. CPU utilisation peaked at about 7% while transferring data.
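In case it helps anyone else, the checks were roughly of this sort (device names are examples only):

smartctl -a /dev/sda     # SMART health, error counters and self-test results for each drive
iostat -x 2              # per-disk utilisation and wait times every 2 seconds (sysstat package)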

To check it wasn't the HBA, I connected a 256GB SATA SSD to the onboard SATA controller, created a new single-drive stripe pool in TrueNAS, shared it over SMB, and copied the same file to the new pool.

Again horrible performance, about the same speed as with the HDD pool:
[screenshot: transfer speed to the SSD pool]


So that rules out the HBA and the SAS drives as the issue. iperf3 confirms I'm getting gigabit bandwidth. What else could be affecting SMB/CIFS?

Scratching my head with this...
 

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
Can sync writes be the culprit?
Thanks Forza, I did wonder the same thing and had turned sync off previously, which didn't make a difference. To double-check, I added a 16GB Intel Optane NVMe module as a SLOG to the pool and repeated the test. Again, almost identical speeds. I think that rules out sync writes.

Here's the speed I was getting with this test:

[screenshot: transfer speed with the SLOG added]
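For reference, the equivalent ZFS-side changes would look roughly like this ('tank' is a placeholder pool name and nvme0n1 an example device):

zpool add tank log nvme0n1     # attach the Optane module as a dedicated SLOG
zfs set sync=standard tank     # make sure sync writes actually go via the SLOG
zpool iostat -v tank 2         # watch whether the log device receives any writes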
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
find a debian equivalent to gstat

Ah yeah, both of those are FreeBSD-based tools anyway, sorry. It would be interesting to try TrueNAS CORE here if we can't figure out the problem any other way.

You should be able to run some parallel dd commands by hand to see if any of the drives are responding slowly. Try a single drive first, to establish what a single drive is capable of. Even old drives should be able to sustain 80 MBytes/sec; new ones will often do 200 MBytes/sec plus. I have to assume that your devices are showing up as the conventional Linux /dev/sda, sdb, etc., so you are looking to do something like

dd if=/dev/sda of=/dev/null bs=1048576 count=1000

Use this variant to establish a fast baseline. Omit the count=1000 to run a full-disk test.

We're looking for a few things here. One is if one of the disks is responding slowly, or presenting errors. Another is if ALL of the disks are crappy and slow. A third is if the overall I/O subsystem seems to be responding slowly. So you kind of work through those cases, which is what solnet-array-test is designed to do automatically on FreeBSD.

You can do the basic tests by just running dd's in the background.

dd if=/dev/sda of=/dev/null bs=1048576 &
dd if=/dev/sdb of=/dev/null bs=1048576 &
etc

Most of the time when symptoms like yours show up, there is something mucking up interrupts or causing constipation in the system. Sometimes this is the use of a Realtek ethernet, which can cause high CPU loads, or broken virtualization, which can cause MSI/MSI-X issues. If both the network and disk I/O subsystems work fine individually, then we start running both kinds of tests at the same time and seeing if there is some sort of contention or conflict in the system.
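As a rough sketch, the combined load test looks something like this (address and device names are placeholders):

iperf3 -c 192.168.1.50 -t 60 &                        # sustained network load from the client
dd if=/dev/sda of=/dev/null bs=1048576 count=10000 &  # sustained disk reads at the same time
dd if=/dev/sdb of=/dev/null bs=1048576 count=10000 &

Then compare the reported rates against the standalone runs.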

Weakness in any subsystem tends to present as poor performance overall, so you really just need to trawl around and find out what's being crappy, and then fix it.
 

Thebokke

Dabbler
Joined
Aug 3, 2018
Messages
10
@jgreco, I did some fault finding: tested the drives again, tried different network cards, and eventually tried TrueNAS CORE with the same issues; gstat showed a fair bit of wait time on the disks. That made me suspect a network issue, so I swapped the switch, since the server was in a temporary location while I rebuilt it. That seems to have solved it - I'm now getting >100 MB/s write speed to the pool. Odd, because iperf was showing gigabit speed, but perhaps real data transfers behave differently. I also changed compression from ZSTD back to LZ4, which gave a further speed boost.
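For the record, the compression change is just a dataset property, roughly ('tank/share' being a placeholder):

zfs set compression=lz4 tank/share     # applies to newly written data only; existing blocks keep their old compression
zfs get compression,compressratio tank/share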

Thanks for your help troubleshooting!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's always the thing you didn't expect. Except when it is. :smile: ;-)
 