New server, slow read/write speeds

MarkH

Cadet
Joined
Jan 9, 2024
Messages
4
I'm having some issues with a new system I've set up and started transferring files to - it's very slow. My other TrueNAS Core server (built on consumer hardware) worked great, nice and fast, until the HBA started to fail and caused errors. I bought a refurbished SuperMicro SSG-6048R-E1CR36H as a replacement/upgrade for the old server, with the intention of making the old server the backup.

The SuperMicro has:
* 2x Xeon E5-2673 V4, 2.3GHz 20-Core CPUs
* 8x 32GB PC4-2400T DDR4 ECC (256GB total)
* 14x Toshiba MG03SCA400 4TB 7200RPM SAS 6G drives
* 2x 240GB SATA SSDs for the OS (cheap Crucial and Kioxia)
* 1x 2TB Intel SSDPEKNU020TZ NVMe drive for Log
* Supermicro AOM-SAS3-8i8e HBA in JBOD mode
* Dual 10Gbps onboard network ports

This gives me just enough storage to store all the critical files from the old server and stay under 80% pool usage.

From empty, the pool has struggled to write at more than 2-4 Gbit/s, whether via SMB or rsync. The old server can transfer files to my computer at 9.something Gbit/s, and I can write to it at similar speeds, so I'm fairly sure the problem is the new server - I get the same speeds when writing from three different computers with 10GbE network connections.

First check was iperf; the network seems fine:
Code:
C:\temp\iperf-3.1.3-win64> ./iperf3 -c 192.168.1.70 -bidir
Connecting to host 192.168.1.70, port 5201
[  4] local 192.168.1.88 port 61668 connected to 192.168.1.70 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   894 MBytes  7.50 Gbits/sec
[  4]   1.00-2.00   sec   898 MBytes  7.52 Gbits/sec
[  4]   2.00-3.00   sec   893 MBytes  7.50 Gbits/sec
[  4]   3.00-4.00   sec   885 MBytes  7.42 Gbits/sec
[  4]   4.00-5.00   sec   924 MBytes  7.76 Gbits/sec
[  4]   5.00-6.00   sec   922 MBytes  7.73 Gbits/sec
[  4]   6.00-7.00   sec   925 MBytes  7.76 Gbits/sec
[  4]   7.00-8.00   sec   909 MBytes  7.62 Gbits/sec
[  4]   8.00-9.00   sec   900 MBytes  7.55 Gbits/sec
[  4]   9.00-10.00  sec   904 MBytes  7.58 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  8.84 GBytes  7.59 Gbits/sec                  sender
[  4]   0.00-10.00  sec  8.84 GBytes  7.59 Gbits/sec                  receiver


With parallel connections it looks a little better:
Code:
[SUM]   0.00-10.00  sec  11.0 GBytes  9.42 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  11.0 GBytes  9.42 Gbits/sec                  receiver


[Attached screenshot: explorer_xI8c6SJL5d.png - Windows file transfer speed graph]

This is pretty typical for a file transfer from Windows - a good burst of high speed, then really dismal from then on. The source is NVMe on my computer, so the bottleneck shouldn't be either my computer or the network.

I've tried moving the HBA from the default CPU1/SLOT2 PCIe 3.0 x8 slot to the CPU2/SLOT4 PCIe 3.0 x16 slot (note: the card is only x8 length), but there was no performance change - no surprise there, though.
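For reference, the negotiated PCIe link of the HBA can be confirmed from a shell, which would rule out a link-training issue in either slot (a sketch: the 01:00.0 bus address is a placeholder - find the real one with the first command):
Code:
# Find the HBA's PCI bus address (it should show up as a SAS/SCSI controller)
lspci | grep -iE "sas|scsi"

# Check the negotiated link speed/width (LnkSta) for that address
sudo lspci -vv -s 01:00.0 | grep -i lnksta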

CPU usage during copying is "0% average" with highest usage typically around 2-4%.

The pool is:
Code:
Data VDEVs 1 x RAIDZ2 | 13 wide | 3.64 TiB
Metadata VDEVs VDEVs not assigned
Log VDEVs 1 x DISK | 1 wide | 1.86 TiB
Cache VDEVs VDEVs not assigned
Spare VDEVs 1 x 3.64 TiB
Dedup VDEVs VDEVs not assigned


ZFS Health:
Code:
ZFS Health check_circle
Pool Status: Online
Total ZFS Errors: 0
Scheduled Scrub Task: Set
Auto TRIM: Off
Last Scan: Finished Scrub on 2023-12-24 18:04:05
Last Scan Errors: 0
Last Scan Duration: 18 hours 2 minutes 48 seconds


Disk Health:
Code:
Disk Health check_circle 
Disks temperature related alerts: 0
Highest Temperature: 43 °C
Lowest Temperature: 19 °C
Average Disk Temperature: 33.056 °C
Failed S.M.A.R.T. Tests: 0


In the SMB service advanced settings I've tried multi-channel both on and off - it doesn't make a difference, but I'm not surprised, since rsync sees the same performance writing to the pool.
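For what it's worth, whether multi-channel is actually in effect can be checked from a shell on the server (a sketch - testparm and smbstatus ship with Samba, and -v makes testparm print defaults as well as explicit settings):
Code:
# Show the effective value of the multi-channel setting
testparm -sv 2>/dev/null | grep -i "multi channel"

# List current SMB sessions/connections from clients
smbstatus -b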


At this point I'm not sure what to do next. My understanding (which could be very wrong) is that the log drive would allow writes at network-saturation speed even if the HBA/drives couldn't keep up... but it doesn't see any use. The pool here has more drives (and SAS vs SATA) than the old server, so I figured it would be faster for read/write if anything, plus it has a heck of a lot more processing power.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Data VDEVs 1 x RAIDZ2 | 13 wide | 3.64 TiB
That's going to be pretty dodgy for performance.
* 1x 2tb INTEL SSDPEKNU020TZ NVMe drive for Log.
Not at all suitable for SLOG, though it's likely you don't need an SLOG at all if you're using SMB.

Now, your performance seems especially sucky. Unless one or more disks are dodgy, you should be seeing at least 4-5 Gb/s without much effort, even with such a wide vdev.
my understanding (which could be very wrong) is that the log drive would allow network saturation speed writing of data even if the HBA/drives couldn't keep up
Indeed it is incorrect: SLOG is only useful for sync writes, which most SMB clients do not issue at all. And it would need to be a fast, low-latency SSD - QLC is the antithesis of an SLOG device. Async writes end up in RAM only, and it doesn't get faster than that (you didn't set the dataset to sync=always, did you?).
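If it helps, the sync setting is quick to confirm from a shell - a sketch, with tank/dataset as a placeholder for the real pool/dataset name:
Code:
# Check whether the dataset forces sync writes; the default is "standard"
zfs get sync tank/dataset

# If it had been set to always, this reverts it to the default
zfs set sync=standard tank/dataset
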
SAS vs SATA
Not relevant in practical terms.
The pool here has more drives
Well, it's constrained to the IOPS of about a single disk, by virtue of having a single RAIDZ vdev. If you need to start seeking more than a little bit, performance will quickly drop to unimpressive levels.
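To illustrate the difference, splitting the same 14 disks into two 7-wide RAIDZ2 vdevs roughly doubles the pool's random-IOPS ceiling, at the cost of extra parity overhead (and, in this sketch, the hot spare). Illustration only - on TrueNAS you'd build the pool through the UI, and the da0...da13 device names are placeholders:
Code:
# Current shape: one 13-wide RAIDZ2 vdev plus a spare (~1 disk of random IOPS)
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 spare da13

# Alternative shape: two 7-wide RAIDZ2 vdevs (~2 disks of random IOPS, no spare)
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 raidz2 da7 da8 da9 da10 da11 da12 da13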
 

MarkH

Cadet
Joined
Jan 9, 2024
Messages
4
I've also used fio (without really knowing what I'm doing, to be clear), based on the commands here: https://docs.oracle.com/en-us/iaas/Content/Block/References/samplefiocommandslinux.htm, to test speeds:
Code:
fio --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --numjobs=1 --group_reporting --name=throughputjob --eta-newline=1 --size=5GB 
[sudo] password for admin: 
throughputjob: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
fio-3.33
Starting 1 process
throughputjob: Laying out IO file (1 file / 5120MiB)
Jobs: 1 (f=1)
throughputjob: (groupid=0, jobs=1): err= 0: pid=36542: Tue Jan  9 11:47:49 2024
  read: IOPS=5759, BW=1440MiB/s (1510MB/s)(2565MiB/1781msec)
    slat (usec): min=44, max=347, avg=81.01, stdev=45.54
    clat (usec): min=4, max=8395, avg=5475.90, stdev=782.37
     lat (usec): min=70, max=8522, avg=5556.91, stdev=792.15
    clat percentiles (usec):
     |  1.00th=[ 4228],  5.00th=[ 4424], 10.00th=[ 4555], 20.00th=[ 4752],
     | 30.00th=[ 4948], 40.00th=[ 5211], 50.00th=[ 5407], 60.00th=[ 5604],
     | 70.00th=[ 5800], 80.00th=[ 6063], 90.00th=[ 6521], 95.00th=[ 6915],
     | 99.00th=[ 7570], 99.50th=[ 7832], 99.90th=[ 8094], 99.95th=[ 8225],
     | 99.99th=[ 8356]
   bw (  MiB/s): min= 1388, max= 1549, per=100.00%, avg=1441.83, stdev=93.24, samples=3
   iops        : min= 5552, max= 6198, avg=5767.33, stdev=372.97, samples=3
  write: IOPS=5739, BW=1435MiB/s (1505MB/s)(2556MiB/1781msec); 0 zone resets
    slat (usec): min=65, max=359, avg=87.62, stdev=28.75
    clat (usec): min=73, max=8333, avg=5453.90, stdev=770.04
     lat (usec): min=270, max=8424, avg=5541.52, stdev=780.01
    clat percentiles (usec):
     |  1.00th=[ 4228],  5.00th=[ 4424], 10.00th=[ 4555], 20.00th=[ 4752],
     | 30.00th=[ 4948], 40.00th=[ 5145], 50.00th=[ 5407], 60.00th=[ 5604],
     | 70.00th=[ 5800], 80.00th=[ 6063], 90.00th=[ 6456], 95.00th=[ 6849],
     | 99.00th=[ 7504], 99.50th=[ 7767], 99.90th=[ 8029], 99.95th=[ 8160],
     | 99.99th=[ 8160]
   bw (  MiB/s): min= 1343, max= 1622, per=100.00%, avg=1442.00, stdev=156.56, samples=3
   iops        : min= 5372, max= 6490, avg=5768.00, stdev=626.25, samples=3
  lat (usec)   : 10=0.01%, 100=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=0.08%, 10=99.84%
  cpu          : usr=6.24%, sys=88.54%, ctx=1063, majf=4, minf=20
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.7%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=10258,10222,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1440MiB/s (1510MB/s), 1440MiB/s-1440MiB/s (1510MB/s-1510MB/s), io=2565MiB (2689MB), run=1781-1781msec
  WRITE: bw=1435MiB/s (1505MB/s), 1435MiB/s-1435MiB/s (1505MB/s-1505MB/s), io=2556MiB (2680MB), run=1781-1781msec

These numbers seem optimistic?
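One caveat (my assumption): with 256GB of RAM and only a 5GB test file, a run like this can be answered largely from ARC/RAM, so it may say more about memory than about the disks. A sequential-write run sized well past RAM, with a final fsync, would be a closer match for the big SMB copies - a sketch:
Code:
# Sketch: large sequential 1MiB writes, flushed at the end so the result reflects the disks
fio --name=seqwrite --rw=write --bs=1M --ioengine=libaio --iodepth=32 \
    --numjobs=1 --size=300G --end_fsync=1 --group_reporting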

Then with 10 jobs:
Code:
fio --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --numjobs=10 --group_reporting --name=throughputjob --eta-newline=1 --size=5GB
throughputjob: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 10 processes
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
throughputjob: Laying out IO file (1 file / 5120MiB)
Jobs: 10 (f=10): [m(10)][4.8%][r=227MiB/s,w=237MiB/s][r=906,w=946 IOPS][eta 01m:20s]  
Jobs: 10 (f=10): [m(10)][5.8%][r=163MiB/s,w=171MiB/s][r=653,w=684 IOPS][eta 01m:37s] 
Jobs: 10 (f=10): [m(10)][7.0%][r=170MiB/s,w=181MiB/s][r=681,w=725 IOPS][eta 01m:46s] 
Jobs: 10 (f=10): [m(10)][8.1%][r=136MiB/s,w=139MiB/s][r=545,w=555 IOPS][eta 01m:54s] 
Jobs: 10 (f=10): [m(10)][9.5%][r=218MiB/s,w=219MiB/s][r=870,w=876 IOPS][eta 01m:54s] 
Jobs: 10 (f=10): [m(10)][11.8%][r=316MiB/s,w=290MiB/s][r=1263,w=1159 IOPS][eta 01m:45s] 
Jobs: 10 (f=10): [m(10)][14.0%][r=256MiB/s,w=253MiB/s][r=1022,w=1010 IOPS][eta 01m:38s] 
Jobs: 10 (f=10): [m(10)][15.9%][r=249MiB/s,w=263MiB/s][r=997,w=1053 IOPS][eta 01m:35s]
Jobs: 10 (f=10): [m(10)][18.0%][r=300MiB/s,w=272MiB/s][r=1199,w=1087 IOPS][eta 01m:31s] 
Jobs: 10 (f=10): [m(10)][20.2%][r=242MiB/s,w=247MiB/s][r=967,w=987 IOPS][eta 01m:27s]   
Jobs: 10 (f=10): [m(10)][21.4%][r=156MiB/s,w=158MiB/s][r=623,w=631 IOPS][eta 01m:28s] 
Jobs: 10 (f=10): [m(10)][23.2%][r=293MiB/s,w=286MiB/s][r=1171,w=1143 IOPS][eta 01m:26s]
Jobs: 10 (f=10): [m(10)][25.9%][r=326MiB/s,w=351MiB/s][r=1305,w=1402 IOPS][eta 01m:20s] 
Jobs: 10 (f=10): [m(10)][28.3%][r=342MiB/s,w=325MiB/s][r=1368,w=1298 IOPS][eta 01m:16s] 
Jobs: 10 (f=10): [m(10)][30.8%][r=311MiB/s,w=320MiB/s][r=1245,w=1280 IOPS][eta 01m:12s] 
Jobs: 10 (f=10): [m(10)][33.3%][r=306MiB/s,w=310MiB/s][r=1222,w=1238 IOPS][eta 01m:08s] 
Jobs: 10 (f=10): [m(10)][35.3%][r=216MiB/s,w=215MiB/s][r=864,w=858 IOPS][eta 01m:06s]   
Jobs: 10 (f=10): [m(10)][36.9%][r=204MiB/s,w=213MiB/s][r=814,w=851 IOPS][eta 01m:05s] 
Jobs: 10 (f=10): [m(10)][38.8%][r=288MiB/s,w=304MiB/s][r=1151,w=1217 IOPS][eta 01m:03s] 
Jobs: 10 (f=10): [m(10)][41.2%][r=330MiB/s,w=354MiB/s][r=1320,w=1415 IOPS][eta 01m:00s] 
Jobs: 10 (f=10): [m(10)][44.0%][r=324MiB/s,w=324MiB/s][r=1295,w=1294 IOPS][eta 00m:56s] 
Jobs: 10 (f=10): [m(10)][46.5%][r=313MiB/s,w=287MiB/s][r=1250,w=1146 IOPS][eta 00m:53s] 
Jobs: 10 (f=10): [m(10)][48.5%][r=239MiB/s,w=246MiB/s][r=954,w=985 IOPS][eta 00m:51s]   
Jobs: 10 (f=10): [m(10)][49.5%][r=149MiB/s,w=151MiB/s][r=597,w=605 IOPS][eta 00m:51s] 
Jobs: 10 (f=10): [m(10)][51.5%][r=303MiB/s,w=303MiB/s][r=1213,w=1210 IOPS][eta 00m:49s]
Jobs: 10 (f=10): [m(10)][54.5%][r=367MiB/s,w=357MiB/s][r=1466,w=1427 IOPS][eta 00m:45s] 
Jobs: 10 (f=10): [m(10)][57.1%][r=323MiB/s,w=342MiB/s][r=1291,w=1368 IOPS][eta 00m:42s] 
Jobs: 10 (f=10): [m(10)][59.2%][r=299MiB/s,w=295MiB/s][r=1197,w=1178 IOPS][eta 00m:40s] 
Jobs: 10 (f=10): [m(10)][61.2%][r=266MiB/s,w=247MiB/s][r=1063,w=989 IOPS][eta 00m:38s]  
Jobs: 10 (f=10): [m(10)][63.3%][r=208MiB/s,w=201MiB/s][r=831,w=805 IOPS][eta 00m:36s] 
Jobs: 10 (f=10): [m(10)][64.6%][r=203MiB/s,w=193MiB/s][r=810,w=773 IOPS][eta 00m:35s] 
Jobs: 10 (f=10): [m(10)][66.7%][r=310MiB/s,w=307MiB/s][r=1238,w=1226 IOPS][eta 00m:33s]
Jobs: 10 (f=10): [m(10)][69.4%][r=362MiB/s,w=362MiB/s][r=1449,w=1448 IOPS][eta 00m:30s] 
Jobs: 10 (f=10): [m(10)][71.4%][r=322MiB/s,w=323MiB/s][r=1287,w=1293 IOPS][eta 00m:28s] 
Jobs: 10 (f=10): [m(10)][73.5%][r=243MiB/s,w=248MiB/s][r=972,w=991 IOPS][eta 00m:26s]   
Jobs: 10 (f=10): [m(10)][75.5%][r=246MiB/s,w=261MiB/s][r=983,w=1043 IOPS][eta 00m:24s] 
Jobs: 10 (f=10): [m(10)][76.8%][r=187MiB/s,w=178MiB/s][r=748,w=711 IOPS][eta 00m:23s] 
Jobs: 10 (f=10): [m(10)][78.0%][r=209MiB/s,w=204MiB/s][r=836,w=817 IOPS][eta 00m:22s] 
Jobs: 10 (f=10): [m(10)][80.8%][r=337MiB/s,w=334MiB/s][r=1348,w=1336 IOPS][eta 00m:19s] 
Jobs: 10 (f=10): [m(10)][82.8%][r=340MiB/s,w=314MiB/s][r=1358,w=1255 IOPS][eta 00m:17s] 
Jobs: 10 (f=10): [m(10)][85.7%][r=278MiB/s,w=275MiB/s][r=1113,w=1100 IOPS][eta 00m:14s] 
Jobs: 10 (f=10): [m(10)][87.8%][r=265MiB/s,w=261MiB/s][r=1058,w=1043 IOPS][eta 00m:12s] 
Jobs: 10 (f=10): [m(10)][89.8%][r=277MiB/s,w=262MiB/s][r=1107,w=1047 IOPS][eta 00m:10s] 
Jobs: 10 (f=10): [m(10)][90.9%][r=161MiB/s,w=170MiB/s][r=643,w=679 IOPS][eta 00m:09s] 
Jobs: 10 (f=10): [m(10)][93.9%][r=329MiB/s,w=318MiB/s][r=1315,w=1273 IOPS][eta 00m:06s]
Jobs: 10 (f=10): [m(10)][95.9%][r=332MiB/s,w=340MiB/s][r=1328,w=1358 IOPS][eta 00m:04s] 
Jobs: 8 (f=8): [m(5),_(1),m(3),_(1)][99.0%][r=297MiB/s,w=310MiB/s][r=1188,w=1239 IOPS][eta 00m:01s]
throughputjob: (groupid=0, jobs=10): err= 0: pid=38864: Tue Jan  9 11:55:18 2024
  read: IOPS=1070, BW=268MiB/s (281MB/s)(25.0GiB/95677msec)
    slat (usec): min=65, max=2303, avg=263.72, stdev=42.84
    clat (usec): min=7, max=771718, avg=292584.68, stdev=90152.34
     lat (usec): min=267, max=771990, avg=292848.40, stdev=90152.33
    clat percentiles (msec):
     |  1.00th=[  131],  5.00th=[  190], 10.00th=[  207], 20.00th=[  226],
     | 30.00th=[  241], 40.00th=[  257], 50.00th=[  271], 60.00th=[  288],
     | 70.00th=[  313], 80.00th=[  351], 90.00th=[  418], 95.00th=[  485],
     | 99.00th=[  584], 99.50th=[  617], 99.90th=[  693], 99.95th=[  718],
     | 99.99th=[  751]
   bw (  KiB/s): min=98816, max=536195, per=100.00%, avg=274185.37, stdev=8073.65, samples=1895
   iops        : min=  386, max= 2093, avg=1070.92, stdev=31.53, samples=1895
  write: IOPS=1070, BW=268MiB/s (281MB/s)(25.0GiB/95677msec); 0 zone resets
    slat (usec): min=528, max=59985, avg=9016.25, stdev=2667.89
    clat (usec): min=12, max=770592, avg=292996.16, stdev=90491.77
     lat (msec): min=2, max=787, avg=302.01, stdev=92.81
    clat percentiles (msec):
     |  1.00th=[  134],  5.00th=[  190], 10.00th=[  207], 20.00th=[  226],
     | 30.00th=[  241], 40.00th=[  257], 50.00th=[  271], 60.00th=[  292],
     | 70.00th=[  313], 80.00th=[  351], 90.00th=[  418], 95.00th=[  485],
     | 99.00th=[  584], 99.50th=[  617], 99.90th=[  701], 99.95th=[  726],
     | 99.99th=[  743]
   bw (  KiB/s): min=108544, max=509847, per=100.00%, avg=274215.26, stdev=7352.00, samples=1895
   iops        : min=  424, max= 1989, avg=1071.05, stdev=28.71, samples=1895
  lat (usec)   : 10=0.01%, 20=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%
  lat (msec)   : 100=0.27%, 250=35.65%, 500=59.96%, 750=4.03%, 1000=0.01%
  cpu          : usr=0.47%, sys=5.17%, ctx=103185, majf=0, minf=3050
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.7%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=102395,102405,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=25.0GiB (26.8GB), run=95677-95677msec
  WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s), io=25.0GiB (26.8GB), run=95677-95677msec


While running this, the disk reports were showing a max of 41.1 MiB/s per disk on the graphs, but averaging around 15-20 MiB/s.
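In case it helps line up with those graphs, per-vdev and per-disk throughput can also be watched live from a shell while a test or copy runs (a sketch; tank is a placeholder for the pool name):
Code:
# Print pool, vdev, and per-disk bandwidth/IOPS every 5 seconds
zpool iostat -v tank 5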
 

MarkH

Cadet
Joined
Jan 9, 2024
Messages
4
Thank you very much for your input!

At this point the old server is still running, and the new HBA installed in it seems to have resolved the issues with writing new files, even though some of the older files were corrupted when they were written - so I have no problem with deleting this whole dataset and starting from scratch.

We bought 16x refurbished drives, two of which had substantially more SMART errors (all recovered) than the others, so I've put those two aside for now. That leaves 14x 4TB drives, with the goal of storing about 32TB of data for now - we'll replace them with new 16TB x [many] drives in a month or two, once a client catches up on their outstanding invoices. So the drives are sort of an interim low-cost solution, but we expect some performance out of them.
 

MarkH

Cadet
Joined
Jan 9, 2024
Messages
4
What would be the optimal structure for read/write speed with the drives that I currently have?
 