Slow Transfer Speeds and Scrub Times

BryanT85

Cadet
Joined
Dec 30, 2022
Messages
1
I first want to thank everyone on this forum for the wealth of knowledge I have gained while setting up my TrueNAS server!

I have had my TrueNAS SCALE server set up for about three months now with scrubs scheduled to run once a month, but I recently noticed that they have never actually run. I started a manual scrub and let it run overnight; the next morning, it showed an estimated completion time of two months. Also, since setting up the server, I have had mediocre performance when accessing SMB shares, and movies occasionally buffer while streaming through Plex. I just haven't had the time to sit down and focus on troubleshooting until now.
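For reference, whether a scrub ever actually ran can be confirmed from a shell; the commands below assume a pool named "tank" as a placeholder:

Code:
zpool status tank                 # the "scan:" line shows the last/current scrub
zpool history tank | grep scrub   # lists every scrub command ever run against the pool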

Here is my current setup.
OS: TrueNAS-SCALE-22.12.1
Motherboard: Supermicro X11SSH-LN4F
CPU: Intel Xeon E3-1230 v6
RAM: 32GB (2 x 16GB) ECC RDIMM DDR4-2400
Vdev: 8 x 8 TB Dell EMC HGST HDDs in RAIDZ2
HBA: LSI SAS2308-8i IT

Problem 1: Slow access
I stopped all my apps except NetData and then ran Jgreco's solnet-array-test-v3. The drives were all performing close to each other at around 180 MB/s. I then tried downloading a folder with about 2 TB of data via WinSCP, and I was only getting around 2-10 MB/s. So something was wrong!

After a few days of researching and testing, I found that the issue was with ZFS compression. When I transfer files through SMB or WinSCP with compression enabled (LZ4, the default), I get transfer speeds of around 2-10 MB/s. When I disable compression, I get speeds of 113 MB/s (basically maxing out my 1 Gb/s connection).

The workaround for the slow network speeds was to turn off compression. However, I am still unsure why LZ4 compression would cause such a significant slowdown.
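For anyone else hitting this, the compression setting can be checked and toggled per dataset from the shell; "tank/share" below is just a placeholder dataset name:

Code:
zfs get compression,compressratio tank/share   # current algorithm and actual space savings
zfs set compression=off tank/share             # only affects newly written data
zfs set compression=lz4 tank/share             # re-enable later if compression turns out not to be the real cause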

Problem 2: Extended scrub and drive test times
After fixing the network slowdown by disabling compression, I reran the scrub, but the speed did not improve. The estimated completion time was still somewhere around two months.
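For reference, scrub progress and speed can be watched from the shell, and a runaway scrub can be stopped while troubleshooting (pool name is again a placeholder):

Code:
zpool status -v tank   # the "scan:" line shows bytes scanned, current speed, and ETA
zpool scrub -s tank    # stop the scrub so it doesn't compete with other tests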

I ran a SMART long test and found that 5 of my 8 drives finished within 10 hours, 2 of the 8 drives (sdd and sdf) finished in around 24 hours, and 1 drive (sdc) took around 30-36 hours. Although the run times were slower than in the past, none of the tests reported errors.
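In case it is useful to anyone else, long tests can be started and read back per drive with smartctl; /dev/sdc is just an example device:

Code:
smartctl -t long /dev/sdc                       # start a long (extended) self-test
smartctl -c /dev/sdc | grep -i "extended self"  # expected polling/run time for the test
smartctl -a /dev/sdc                            # results, error log, and health attributes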

During the parallel run test, 7 of the 8 drives were all very close to each other, but 1 drive (sda) took twice as long to finish.

Below is a summary of the drives and the tests I have run so far (SMART long and solnet array tests). Items marked with an asterisk (*) are slower than average. The parallel seek test is still running and probably has another 4+ hours left.

Code:
                             Solnet Array Tests
                      -------------------------------
Disk    SMART Long    Serial Parall % of   Parall Run
        Duration (h)  MB/sec MB/sec Serial Time (sec)
------- ------------- ------ ------ ------ ----------
sda               10    187     178    95    104,535*
sdb               10    189     188   100     53,761
sdc               36*   166*    176   106     54,436
sdd               24*   186     189   102     53,924
sde               10    190     192   101     52,987
sdf               24*   186     157    85*    55,030
sdg               10    188     182    96     53,763
sdh               10    188     185    99     53,879


Could the issue be with the HBA, and not the drives?

My thought would be to take out the Nvidia card and move the HBA into that slot to test whether it is the PCIe slot. Is that worthwhile?

I do not have another HBA, so testing it would probably require destroying my pool, plugging 6 of the 8 drives into the motherboard's 6 SATA ports, creating a new pool, and then testing performance.
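Before pulling cards or rebuilding the pool, I assume the HBA's negotiated PCIe link can at least be checked from the shell; the PCI address below is an example and would need to be replaced with whatever lspci reports for the SAS2308:

Code:
lspci | grep -i -E "LSI|SAS2308"                 # find the HBA's PCI address
lspci -vv -s 01:00.0 | grep -E "LnkCap|LnkSta"   # compare maximum vs. negotiated link width and speed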

I would like to hear y'all's advice on troubleshooting this problem!

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Normally, if there were a slowdown problem with the HBA, it would affect all drives equally. But I've seen stranger things.

Which PCIe slot is the HBA installed in?

The manual lists 2 physical x8 slots, but only 1 is x8 electrical. The other is x4 electrical (i.e. only 4 PCIe lanes).

You can also try shutting down and moving as many of the disks' SATA cables to the on-board SATA ports as you can. ZFS won't care; the pool as it exists now should be fine.
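One extra note if cables do get shuffled: the /dev/sdX letters can change between boots, so it is safer to track the suspect drives by serial number and then confirm the pool still shows healthy (pool name is a placeholder):

Code:
lsblk -o NAME,MODEL,SERIAL,SIZE   # map sdX names to physical drives by serial number
zpool status tank                 # pool should still show ONLINE; ZFS finds disks by their labels, not ports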
 