Write Performance Issues with Scale 22.02 on HP Microserver Gen 8

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
Hi Folks,

Okay, here is the situation:

Hardware:
HP Microserver Gen 8
4x WD white-label 8TB disks in 2x 8TB mirrors (each mirror uses matched drives so the RPM is consistent within the mirror)
16GB ECC UDIMM DDR3
Xeon E3-1260L (until today it was an E3-1220L v2)
Onboard storage controller in AHCI or Legacy IDE modes (on this server this is equivalent to an HBA mode; there are no RAID shenanigans)

TrueNAS version: Scale 22.02 Release

Configuration:
No running VMs
No running Apps
Bonded onboard NICs into a LAG using LACP, MTU set to 9000 (LAGG was used previously on TrueNAS Core)

Issue:
Writes to an SMB share on either mirror initially perform as expected at around 115 MB/s, but after approximately 30 seconds the speed tanks to between 10 and 15 MB/s and stays there until the transfer completes. This did not occur on TrueNAS Core, where a consistent 115 MB/s was maintained for the whole transfer. The same thing happened the last time I tried to upgrade, with the SCALE release candidate. When installing SCALE I use a SATA SSD for the TrueNAS boot volume, enable creation of the swap partition during install (could this be the cause of my issues?), and import my configuration from Core. Disabling sync on the ZFS pool that the shares reside on has no effect.
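
For reference, a minimal sketch of the CLI equivalent of disabling sync, assuming it is set on the pool's root dataset (<poolname> is a placeholder):
Code:
# check the current setting, then disable sync for everything under the pool
zfs get sync <poolname>
zfs set sync=disabled <poolname>
# revert later with: zfs set sync=standard <poolname>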

I did encounter additional issues that forced me to delete my SMB shares and re-create them to have them work, as
 
Last edited:

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
It would be useful to see if the performance issue exists for ZFS writes internally or locally, using fio.

What is the SATA SSD being used for??

You might want to look for high latencies on any drives... though I can't imagine what would be under stress.
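
A rough sketch of how you could watch per-disk latency while a transfer runs (OpenZFS's iostat latency columns; the pool name is a placeholder):
Code:
# per-vdev bandwidth plus average wait/latency columns, refreshed every 5 seconds
zpool iostat -v -l <poolname> 5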
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
It would be useful to see if the performance issue exists for ZFS writes internally or locally, using fio.

What is the SATA SSD being used for??

You might want to look for high latencies on any drives... though I can't imagine what would be under stress.
The SATA SSD is used for the OS; on Core I am still using USB thumb drives (SanDisk Flair drives are surprisingly durable).

I'll try and do some testing using fio and post back.
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
Ran fio directly on the SCALE host (I switched my working directory to /mnt/<poolname>/home/<username>), using the command `fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=8g --iodepth=1 --runtime=300 --time_based --end_fsync=1`, which I hope is sufficient for diagnosing performance (I'm a n00b with fio).

Code:
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=24334: Thu Mar 17 22:50:48 2022
  write: IOPS=181, BW=727KiB/s (745kB/s)(221MiB/311759msec); 0 zone resets
    slat (nsec): min=1243, max=253891, avg=6803.16, stdev=4315.01
    clat (usec): min=11, max=24034, avg=5281.46, stdev=3104.14
     lat (usec): min=13, max=24038, avg=5288.26, stdev=3105.71
    clat percentiles (usec):
     |  1.00th=[   47],  5.00th=[   52], 10.00th=[   60], 20.00th=[  685],
     | 30.00th=[ 4621], 40.00th=[ 5407], 50.00th=[ 6456], 60.00th=[ 6783],
     | 70.00th=[ 7046], 80.00th=[ 7504], 90.00th=[ 8291], 95.00th=[ 9110],
     | 99.00th=[11338], 99.50th=[12911], 99.90th=[16319], 99.95th=[17433],
     | 99.99th=[20579]
   bw (  KiB/s): min=  240, max=50824, per=100.00%, avg=756.35, stdev=2372.31, samples=599
   iops        : min=   60, max=12706, avg=189.09, stdev=593.08, samples=599
  lat (usec)   : 20=0.50%, 50=2.07%, 100=11.21%, 250=2.22%, 500=2.63%
  lat (usec)   : 750=1.63%, 1000=0.85%
  lat (msec)   : 2=0.94%, 4=4.11%, 10=71.32%, 20=2.51%, 50=0.02%
  cpu          : usr=0.47%, sys=0.43%, ctx=56708, majf=0, minf=43
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,56683,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=727KiB/s (745kB/s), 727KiB/s-727KiB/s (745kB/s-745kB/s), io=221MiB (232MB), run=311759-311759msec


Running it again directly on one of the filesystems used for SMB shares, I get the following results:
Code:
fio-3.25
Starting 1 process
random-write: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=560KiB/s][w=140 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=90209: Thu Mar 17 23:50:30 2022
  write: IOPS=193, BW=773KiB/s (791kB/s)(226MiB/300001msec); 0 zone resets
    slat (nsec): min=1434, max=218172, avg=6140.48, stdev=3539.30
    clat (usec): min=11, max=31998, avg=5166.91, stdev=3081.88
     lat (usec): min=13, max=32009, avg=5173.05, stdev=3083.19
    clat percentiles (usec):
     |  1.00th=[   28],  5.00th=[   33], 10.00th=[   40], 20.00th=[  938],
     | 30.00th=[ 4621], 40.00th=[ 5407], 50.00th=[ 5866], 60.00th=[ 6390],
     | 70.00th=[ 6849], 80.00th=[ 7308], 90.00th=[ 8160], 95.00th=[ 8979],
     | 99.00th=[12518], 99.50th=[13698], 99.90th=[17957], 99.95th=[19530],
     | 99.99th=[23725]
   bw (  KiB/s): min=  224, max=65208, per=100.00%, avg=773.17, stdev=2724.80, samples=599
   iops        : min=   56, max=16302, avg=193.29, stdev=681.20, samples=599
  lat (usec)   : 20=0.40%, 50=11.62%, 100=0.54%, 250=2.50%, 500=2.70%
  lat (usec)   : 750=1.45%, 1000=1.00%
  lat (msec)   : 2=2.20%, 4=3.27%, 10=71.26%, 20=3.00%, 50=0.04%
  cpu          : usr=0.42%, sys=0.51%, ctx=57973, majf=0, minf=44
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,57948,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=773KiB/s (791kB/s), 773KiB/s-773KiB/s (791kB/s-791kB/s), io=226MiB (237MB), run=300001-300001msec


Running on another filesystem set up to be a share on the other mirror, I get the following results:

Code:
fio-3.25
Starting 1 process
random-write: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=99164: Thu Mar 17 23:59:18 2022
  write: IOPS=150, BW=601KiB/s (616kB/s)(187MiB/317887msec); 0 zone resets
    slat (nsec): min=1295, max=245961, avg=6767.08, stdev=4705.68
    clat (usec): min=12, max=24888, avg=6265.85, stdev=3893.73
     lat (usec): min=14, max=24899, avg=6272.62, stdev=3895.54
    clat percentiles (usec):
     |  1.00th=[   30],  5.00th=[   36], 10.00th=[   41], 20.00th=[  347],
     | 30.00th=[ 5604], 40.00th=[ 7635], 50.00th=[ 8029], 60.00th=[ 8455],
     | 70.00th=[ 8717], 80.00th=[ 8848], 90.00th=[ 9372], 95.00th=[ 9896],
     | 99.00th=[14091], 99.50th=[15926], 99.90th=[19006], 99.95th=[20055],
     | 99.99th=[22152]
   bw (  KiB/s): min=  192, max=59544, per=100.00%, avg=637.77, stdev=2556.35, samples=599
   iops        : min=   48, max=14886, avg=159.45, stdev=639.09, samples=599
  lat (usec)   : 20=0.39%, 50=12.90%, 100=1.53%, 250=3.42%, 500=3.08%
  lat (usec)   : 750=1.89%, 1000=0.95%
  lat (msec)   : 2=2.66%, 4=1.98%, 10=67.00%, 20=4.15%, 50=0.05%
  cpu          : usr=0.36%, sys=0.33%, ctx=47812, majf=0, minf=46
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,47796,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=601KiB/s (616kB/s), 601KiB/s-601KiB/s (616kB/s-616kB/s), io=187MiB (196MB), run=317887-317887msec
 
Last edited:

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
iodepth needs to be higher than 1... otherwise it's just a latency test. I suggest 32.

I think your performance problem was with sequential writes... so I would recommend testing that.

We're looking to see whether ZFS shows similarly low performance to your original test... if not, we'd assume it's a network/SMB issue.
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
@morganL Okay, so I tried using the following:
Code:
fio --name=random-write --eta-newline=5s --filename=temp.file --ioengine=libaio --rw=write --numjobs=1 --size=2g --io_size=10g --iodepth=32 --runtime=60 --direct=1 --fsync=10000 --group_reporting


Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.25
Starting 1 process
random-write: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][11.7%][w=24.6MiB/s][w=6292 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][21.7%][w=20.2MiB/s][w=5161 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][w=15.0MiB/s][w=4089 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.7%][w=16.2MiB/s][w=4149 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=16.5MiB/s][w=4229 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=16.6MiB/s][w=4256 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=16.7MiB/s][w=4264 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=16.8MiB/s][w=4302 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=17.4MiB/s][w=4457 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=15.9MiB/s][w=4077 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1321889: Fri Mar 18 17:22:29 2022
  write: IOPS=8929, BW=34.9MiB/s (36.6MB/s)(2093MiB/60001msec); 0 zone resets
    slat (usec): min=4, max=1714, avg=109.07, stdev=106.44
    clat (usec): min=3, max=11825, avg=3472.61, stdev=3305.94
     lat (usec): min=176, max=12190, avg=3582.02, stdev=3411.51
    clat percentiles (usec):
     |  1.00th=[  227],  5.00th=[  229], 10.00th=[  231], 20.00th=[  237],
     | 30.00th=[  245], 40.00th=[  326], 50.00th=[  979], 60.00th=[ 5669],
     | 70.00th=[ 7046], 80.00th=[ 7308], 90.00th=[ 7439], 95.00th=[ 7570],
     | 99.00th=[ 7767], 99.50th=[ 8029], 99.90th=[10421], 99.95th=[10552],
     | 99.99th=[11338]
   bw (  KiB/s): min=14032, max=479040, per=100.00%, avg=35886.79, stdev=83260.68, samples=119
   iops        : min= 3508, max=119760, avg=8971.70, stdev=20815.17, samples=119
  lat (usec)   : 4=0.01%, 10=0.01%, 250=30.78%, 500=17.81%, 750=1.33%
  lat (usec)   : 1000=0.08%
  lat (msec)   : 2=0.43%, 4=1.34%, 10=47.93%, 20=0.30%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=80, max=544, avg=190.47, stdev=105.90
    sync percentiles (nsec):
     |  1.00th=[   80],  5.00th=[   82], 10.00th=[   82], 20.00th=[   84],
     | 30.00th=[   90], 40.00th=[  101], 50.00th=[  171], 60.00th=[  251],
     | 70.00th=[  274], 80.00th=[  286], 90.00th=[  302], 95.00th=[  326],
     | 99.00th=[  548], 99.50th=[  548], 99.90th=[  548], 99.95th=[  548],
     | 99.99th=[  548]
  cpu          : usr=3.21%, sys=11.76%, ctx=266464, majf=0, minf=14
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,535790,0,53 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=34.9MiB/s (36.6MB/s), 34.9MiB/s-34.9MiB/s (36.6MB/s-36.6MB/s), io=2093MiB (2195MB), run=60001-60001msec


Running it a second time without --group_reporting gets me this:
Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][12.3%][w=41.9MiB/s][w=10.7k IOPS][eta 00m:50s]
Jobs: 1 (f=1): [W(1)][21.7%][w=18.7MiB/s][w=4782 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][w=17.8MiB/s][w=4549 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.7%][w=17.1MiB/s][w=4365 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=17.5MiB/s][w=4481 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=18.7MiB/s][w=4784 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=18.2MiB/s][w=4649 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=17.5MiB/s][w=4488 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=16.9MiB/s][w=4327 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=16.5MiB/s][w=4218 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1328732: Fri Mar 18 17:26:50 2022
  write: IOPS=9465, BW=36.0MiB/s (38.8MB/s)(2219MiB/60001msec); 0 zone resets
    slat (usec): min=5, max=2382, avg=102.55, stdev=100.28
    clat (usec): min=3, max=9944, avg=3276.14, stdev=3088.33
     lat (usec): min=228, max=10262, avg=3379.02, stdev=3186.92
    clat percentiles (usec):
     |  1.00th=[  239],  5.00th=[  245], 10.00th=[  247], 20.00th=[  251],
     | 30.00th=[  258], 40.00th=[  379], 50.00th=[ 2114], 60.00th=[ 5407],
     | 70.00th=[ 6587], 80.00th=[ 6915], 90.00th=[ 7046], 95.00th=[ 7177],
     | 99.00th=[ 7373], 99.50th=[ 7439], 99.90th=[ 8455], 99.95th=[ 9241],
     | 99.99th=[ 9765]
   bw (  KiB/s): min=16696, max=476648, per=100.00%, avg=38044.03, stdev=81580.79, samples=119
   iops        : min= 4174, max=119162, avg=9511.01, stdev=20395.18, samples=119
  lat (usec)   : 4=0.01%, 10=0.01%, 250=19.67%, 500=25.36%, 750=2.58%
  lat (usec)   : 1000=0.21%
  lat (msec)   : 2=1.80%, 4=6.74%, 10=43.63%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=46, max=349, avg=166.95, stdev=85.47
    sync percentiles (nsec):
     |  1.00th=[   46],  5.00th=[   65], 10.00th=[   73], 20.00th=[   87],
     | 30.00th=[   90], 40.00th=[   99], 50.00th=[  147], 60.00th=[  201],
     | 70.00th=[  233], 80.00th=[  258], 90.00th=[  290], 95.00th=[  306],
     | 99.00th=[  350], 99.50th=[  350], 99.90th=[  350], 99.95th=[  350],
     | 99.99th=[  350]
  cpu          : usr=3.71%, sys=12.68%, ctx=280288, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,567940,0,56 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=36.0MiB/s (38.8MB/s), 36.0MiB/s-36.0MiB/s (38.8MB/s-38.8MB/s), io=2219MiB (2326MB), run=60001-60001msec


Should I boot back into TrueNAS Core and try benchmarking it there now?
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
I re-ran the test on TrueNAS Core and the results are dramatically better:

Code:
fio --name=random-write --eta-newline=5s --filename=temp.file --ioengine=posixaio --rw=write --numjobs=1 --size=2g --io_size=10g --iodepth=32 --runtime=60 --direct=1 --fsync=10000


Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.27
Starting 1 process
random-write: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][19.4%][w=182MiB/s][w=46.7k IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][25.0%][w=172MiB/s][w=43.9k IOPS][eta 00m:36s]
Jobs: 1 (f=1): [W(1)][32.7%][w=161MiB/s][w=41.2k IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][42.6%][w=127MiB/s][w=32.6k IOPS][eta 00m:31s]
Jobs: 1 (f=1): [W(1)][51.8%][w=124MiB/s][w=31.7k IOPS][eta 00m:27s]
Jobs: 1 (f=1): [W(1)][60.7%][w=177MiB/s][w=45.3k IOPS][eta 00m:22s]
Jobs: 1 (f=1): [W(1)][70.2%][w=178MiB/s][w=45.6k IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][79.3%][w=150MiB/s][w=38.5k IOPS][eta 00m:12s]
Jobs: 1 (f=1): [W(1)][88.1%][w=185MiB/s][w=47.3k IOPS][eta 00m:07s]
Jobs: 1 (f=1): [W(1)][98.3%][w=177MiB/s][w=45.2k IOPS][eta 00m:01s]
Jobs: 1 (f=1): [W(1)][100.0%][w=174MiB/s][w=44.6k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1696: Fri Mar 18 20:33:27 2022
  write: IOPS=44.7k, BW=175MiB/s (183MB/s)(10.0GiB/58680msec); 0 zone resets
    slat (nsec): min=1248, max=48665k, avg=6192.59, stdev=43464.73
    clat (usec): min=22, max=450050, avg=669.19, stdev=2116.12
     lat (usec): min=30, max=450067, avg=675.38, stdev=2116.74
    clat percentiles (usec):
     |  1.00th=[   72],  5.00th=[  109], 10.00th=[  147], 20.00th=[  289],
     | 30.00th=[  537], 40.00th=[  660], 50.00th=[  709], 60.00th=[  742],
     | 70.00th=[  775], 80.00th=[  816], 90.00th=[  881], 95.00th=[  988],
     | 99.00th=[ 1401], 99.50th=[ 1663], 99.90th=[10814], 99.95th=[32637],
     | 99.99th=[83362]
   bw (  KiB/s): min=44776, max=477011, per=100.00%, avg=178846.33, stdev=57449.87, samples=116
   iops        : min=11194, max=119252, avg=44711.30, stdev=14362.40, samples=116
  lat (usec)   : 50=0.14%, 100=3.70%, 250=13.83%, 500=10.66%, 750=33.91%
  lat (usec)   : 1000=33.09%
  lat (msec)   : 2=4.38%, 4=0.13%, 10=0.06%, 20=0.03%, 50=0.04%
  lat (msec)   : 100=0.02%, 250=0.01%, 500=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=38, max=6428, avg=506.85, stdev=513.63
    sync percentiles (usec):
     |  1.00th=[   55],  5.00th=[   72], 10.00th=[   89], 20.00th=[  145],
     | 30.00th=[  251], 40.00th=[  343], 50.00th=[  441], 60.00th=[  553],
     | 70.00th=[  652], 80.00th=[  750], 90.00th=[  857], 95.00th=[  971],
     | 99.00th=[ 2737], 99.50th=[ 4621], 99.90th=[ 6128], 99.95th=[ 6325],
     | 99.99th=[ 6456]
  cpu          : usr=21.87%, sys=51.07%, ctx=733769, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.5%, 4=1.8%, 8=4.0%, 16=66.3%, 32=27.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.7%, 8=2.9%, 16=1.0%, 32=0.5%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2621440,0,8581 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=175MiB/s (183MB/s), 175MiB/s-175MiB/s (183MB/s-183MB/s), io=10.0GiB (10.7GB), run=58680-58680msec

Re-running using the posixaio engine on SCALE:
Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][11.7%][w=13.5MiB/s][w=3456 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][21.7%][w=29.9MiB/s][w=7655 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][w=17.9MiB/s][w=4576 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.7%][w=11.5MiB/s][w=2946 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=17.1MiB/s][w=4384 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=22.6MiB/s][w=5792 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=15.4MiB/s][w=3939 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=29.9MiB/s][w=7660 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=11.5MiB/s][w=2944 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=14.4MiB/s][w=3680 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=9875: Fri Mar 18 20:43:18 2022
  write: IOPS=7222, BW=28.2MiB/s (29.6MB/s)(1693MiB/60004msec); 0 zone resets
    slat (nsec): min=122, max=219996, avg=376.58, stdev=1030.53
    clat (usec): min=9, max=168436, avg=4412.21, stdev=9600.21
     lat (usec): min=10, max=168436, avg=4412.59, stdev=9600.25
    clat percentiles (usec):
     |  1.00th=[   562],  5.00th=[   594], 10.00th=[   619], 20.00th=[   644],
     | 30.00th=[   676], 40.00th=[   717], 50.00th=[   758], 60.00th=[   816],
     | 70.00th=[   914], 80.00th=[  3490], 90.00th=[ 13566], 95.00th=[ 24773],
     | 99.00th=[ 44827], 99.50th=[ 57410], 99.90th=[ 83362], 99.95th=[ 99091],
     | 99.99th=[120062]
   bw (  KiB/s): min= 3840, max=145152, per=100.00%, avg=28992.54, stdev=29636.75, samples=119
   iops        : min=  960, max=36288, avg=7248.13, stdev=7409.19, samples=119
  lat (usec)   : 10=0.01%, 20=0.05%, 50=0.04%, 100=0.03%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=47.55%, 1000=25.85%
  lat (msec)   : 2=3.33%, 4=3.61%, 10=4.71%, 20=8.14%, 50=5.91%
  lat (msec)   : 100=0.72%, 250=0.04%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=8, max=35020, avg=967.96, stdev=4139.02
    sync percentiles (usec):
     |  1.00th=[   12],  5.00th=[   18], 10.00th=[   21], 20.00th=[   26],
     | 30.00th=[   31], 40.00th=[   38], 50.00th=[   48], 60.00th=[   61],
     | 70.00th=[   76], 80.00th=[  578], 90.00th=[  717], 95.00th=[ 3032],
     | 99.00th=[15795], 99.50th=[34866], 99.90th=[34866], 99.95th=[34866],
     | 99.99th=[34866]
  cpu          : usr=1.25%, sys=0.21%, ctx=22896, majf=6, minf=48
  IO depths    : 1=0.1%, 2=1.0%, 4=7.0%, 8=23.8%, 16=57.6%, 32=11.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.9%, 8=0.2%, 16=0.1%, 32=2.8%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,433377,0,1666 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=28.2MiB/s (29.6MB/s), 28.2MiB/s-28.2MiB/s (29.6MB/s-29.6MB/s), io=1693MiB (1775MB), run=60004-60004msec

EDIT: After switching the SATA controller in the BIOS from "Legacy" (which I had been using so that I could boot from the optical SATA port) to AHCI, and installing SCALE on USB thumb drives, performance saw an increase:
Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
random-write: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][17.9%][w=151MiB/s][w=38.7k IOPS][eta 00m:32s]
Jobs: 1 (f=1): [W(1)][25.5%][w=143MiB/s][w=36.7k IOPS][eta 00m:38s]
Jobs: 1 (f=1): [W(1)][32.8%][w=72.4MiB/s][w=18.5k IOPS][eta 00m:39s]
Jobs: 1 (f=1): [W(1)][41.7%][w=134MiB/s][w=34.4k IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=148MiB/s][w=37.8k IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=143MiB/s][w=36.7k IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=149MiB/s][w=38.1k IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=132MiB/s][w=33.8k IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=120MiB/s][w=30.7k IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=150MiB/s][w=38.4k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=9716: Fri Mar 18 23:19:16 2022
  write: IOPS=37.7k, BW=147MiB/s (155MB/s)(8846MiB/60001msec); 0 zone resets
    slat (nsec): min=123, max=245296, avg=434.23, stdev=925.35
    clat (usec): min=10, max=14211, avg=838.19, stdev=673.83
     lat (usec): min=10, max=14211, avg=838.62, stdev=673.85
    clat percentiles (usec):
     |  1.00th=[  155],  5.00th=[  180], 10.00th=[  202], 20.00th=[  273],
     | 30.00th=[  461], 40.00th=[  644], 50.00th=[  783], 60.00th=[  889],
     | 70.00th=[  979], 80.00th=[ 1090], 90.00th=[ 1418], 95.00th=[ 1844],
     | 99.00th=[ 3982], 99.50th=[ 4146], 99.90th=[ 4359], 99.95th=[ 4490],
     | 99.99th=[ 5276]
   bw (  KiB/s): min=29208, max=644352, per=100.00%, avg=150996.71, stdev=86341.93, samples=119
   iops        : min= 7302, max=161088, avg=37749.21, stdev=21585.48, samples=119
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.06%, 250=18.20%, 500=13.09%
  lat (usec)   : 750=14.92%, 1000=25.27%
  lat (msec)   : 2=24.32%, 4=3.21%, 10=0.92%, 20=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=4, max=3950, avg=292.27, stdev=389.20
    sync percentiles (usec):
     |  1.00th=[   12],  5.00th=[   19], 10.00th=[   24], 20.00th=[   37],
     | 30.00th=[   63], 40.00th=[  101], 50.00th=[  163], 60.00th=[  217],
     | 70.00th=[  314], 80.00th=[  474], 90.00th=[  717], 95.00th=[  947],
     | 99.00th=[ 2057], 99.50th=[ 2671], 99.90th=[ 3261], 99.95th=[ 3359],
     | 99.99th=[ 3949]
  cpu          : usr=10.58%, sys=2.24%, ctx=379917, majf=6, minf=46
  IO depths    : 1=0.1%, 2=0.1%, 4=0.6%, 8=9.6%, 16=67.2%, 32=22.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.4%, 8=1.6%, 16=1.5%, 32=1.5%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2264641,0,8014 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=147MiB/s (155MB/s), 147MiB/s-147MiB/s (155MB/s-155MB/s), io=8846MiB (9276MB), run=60001-60001msec

Enabling the setting "Drive Write Cache" in the BIOS seemed to boost performance as well:
Code:
random-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][11.7%][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][21.7%][w=177MiB/s][w=45.3k IOPS][eta 00m:47s]
Jobs: 1 (f=1): [W(1)][31.7%][eta 00m:41s]
Jobs: 1 (f=1): [W(1)][41.0%][w=301MiB/s][w=77.1k IOPS][eta 00m:36s]
Jobs: 1 (f=1): [W(1)][50.8%][w=173MiB/s][w=44.3k IOPS][eta 00m:30s]
Jobs: 1 (f=1): [W(1)][60.0%][w=183MiB/s][w=46.7k IOPS][eta 00m:24s]
Jobs: 1 (f=1): [W(1)][70.5%][w=179MiB/s][w=45.9k IOPS][eta 00m:18s]
Jobs: 1 (f=1): [W(1)][80.3%][w=175MiB/s][w=44.9k IOPS][eta 00m:12s]
Jobs: 1 (f=1): [W(1)][90.2%][w=174MiB/s][w=44.5k IOPS][eta 00m:06s]
Jobs: 1 (f=1): [W(1)][100.0%][w=171MiB/s][w=43.9k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11691: Fri Mar 18 22:37:21 2022
  write: IOPS=37.0k, BW=148MiB/s (155MB/s)(8897MiB/60001msec); 0 zone resets
    slat (nsec): min=120, max=230996, avg=394.22, stdev=772.39
    clat (usec): min=9, max=1900.0k, avg=833.37, stdev=13955.21
     lat (usec): min=9, max=1900.0k, avg=833.77, stdev=13955.21
    clat percentiles (usec):
     |  1.00th=[   167],  5.00th=[   194], 10.00th=[   212], 20.00th=[   265],
     | 30.00th=[   404], 40.00th=[   529], 50.00th=[   570], 60.00th=[   627],
     | 70.00th=[   758], 80.00th=[   906], 90.00th=[  1106], 95.00th=[  1385],
     | 99.00th=[  3720], 99.50th=[  4047], 99.90th=[  9241], 99.95th=[ 19530],
     | 99.99th=[566232]
   bw (  KiB/s): min=  256, max=563200, per=100.00%, avg=167112.44, stdev=90222.92, samples=108
   iops        : min=   64, max=140800, avg=41778.11, stdev=22555.73, samples=108
  lat (usec)   : 10=0.01%, 20=0.03%, 50=0.01%, 100=0.05%, 250=18.38%
  lat (usec)   : 500=15.99%, 750=35.21%, 1000=15.70%
  lat (msec)   : 2=12.72%, 4=1.37%, 10=0.44%, 20=0.04%, 50=0.03%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 2000=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=5, max=3665, avg=217.48, stdev=267.10
    sync percentiles (usec):
     |  1.00th=[   12],  5.00th=[   18], 10.00th=[   21], 20.00th=[   28],
     | 30.00th=[   40], 40.00th=[   64], 50.00th=[   92], 60.00th=[  169],
     | 70.00th=[  229], 80.00th=[  408], 90.00th=[  594], 95.00th=[  717],
     | 99.00th=[ 1237], 99.50th=[ 1385], 99.90th=[ 1614], 99.95th=[ 1647],
     | 99.99th=[ 3654]
  cpu          : usr=8.59%, sys=1.72%, ctx=348556, majf=5, minf=49
  IO depths    : 1=0.1%, 2=0.2%, 4=2.1%, 8=13.2%, 16=64.6%, 32=20.3%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.9%, 8=1.0%, 16=1.3%, 32=1.8%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2277556,0,8243 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s), io=8897MiB (9329MB), run=60001-60001msec

So there must be a dramatic difference in the Linux driver for the B120i storage controller vs the one used in FreeBSD.
 
Last edited:

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I re-ran the test on TrueNAS Core and the results are dramatically better:


So there must be a dramatic difference in the Linux driver for the B120i storage controller vs the one used in FreeBSD.

You could be correct... feel free to report a bug with just the differences between CORE and SCALE on the same tests.

I'm not sure we yet understand your original problem of the performance dropping dramatically after 30 seconds.
It could be that data is cached in DRAM -- 115 MB/s for 30 seconds is roughly 3.5 GB... which is about typical for the DRAM allocated to write caching.
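
If you want to check, a sketch of how to read the ZFS dirty-data (write-cache) limit on SCALE -- these are standard OpenZFS module parameters, values in bytes:
Code:
# maximum amount of dirty (not yet written) data ZFS will buffer in RAM
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# hard upper cap applied when the default is computed
cat /sys/module/zfs/parameters/zfs_dirty_data_max_max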

Is this on a Gbit/s link... i.e. 115 MB/s is about the max possible?
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
You could be correct... feel free to report a bug with just the differences between CORE and SCALE on the same tests.

I'm not sure we yet understand your original problem of the performance dropping dramatically after 30 seconds.
It could be that data is cached in DRAM -- 115 MB/s for 30 seconds is roughly 3.5 GB... which is about typical for the DRAM allocated to write caching.

Is this on a Gbit/s link... i.e. 115 MB/s is about the max possible?
It fluctuates between 115 MB/s and 120 MB/s (the theoretical max is 125 MB/s), and yes, it is an LACP Gigabit connection on the HP, with the other host only having Gigabit and using jumbo frames.

Based on the behavior, it looks like the data is hitting RAM as you suggest, but once the system flushes that initial burst to disk, ZFS stops caching and just writes straight out to disk at the crippled speed. So it could be that something is wrong with the interaction between ZFS and the storage driver as well.
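
To try to see that in action, a sketch of what could be watched during a transfer (OpenZFS exposes per-pool transaction-group stats on SCALE; <poolname> is a placeholder):
Code:
# TXG history: how much dirty data each transaction group carried and how long it took to sync
cat /proc/spl/kstat/zfs/<poolname>/txgs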

Ticket created: 115388
 
Last edited:

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
It fluctuates between 115 MB/s and 120 MB/s (the theoretical max is 125 MB/s), and yes, it is an LACP Gigabit connection on the HP, with the other host only having Gigabit and using jumbo frames.

Based on the behavior, it looks like the data is hitting RAM as you suggest, but once the system flushes that initial burst to disk, ZFS stops caching and just writes straight out to disk at the crippled speed. So it could be that something is wrong with the interaction between ZFS and the storage driver as well.

Ticket created: 115388

Thanks... performance issues like these are possible at the RELEASE stage. More extensive performance testing is being done with the goal of fixing most of these by U2. Thanks for reporting the issue... hopefully there is a fix already in process.
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
So there must be a dramatic difference in the Linux driver for the B120i storage controller vs the one used in FreeBSD.

@stitch I don't know what's causing your issue with SCALE, but if you've set the embedded SATA controller to AHCI as recommended, then CORE/SCALE will just see the normal AHCI SATA ports from the Intel C204 chipset. For example, in TrueNAS CORE, taken from "pciconf -lv" output:

ahci0@pci0:0:31:2:    class=0x010601 card=0x330d103c chip=0x1c028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA

No RAID storage driver to support the microserver's "fakeraid" will, or should, be in use. In this mode you can boot from an SSD in the optical drive bay by installing a GRUB boot loader on a USB stick, as explained in several posts elsewhere in this forum.

Differences in performance could be down to anything from the network stack, Samba version, and iXsystems' implementation of NFSv4 ACLs on Linux, to the Linux memory management of ZFS and the ZFS version itself.
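
The Linux-side equivalent under SCALE would be something along these lines (a sketch, not output from your box), which also shows which kernel driver has bound to the controller:
Code:
lspci -nnk | grep -iA3 'sata'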
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
@stitch I don't know what's causing your issue with SCALE, but if you've set the embedded SATA controller to AHCI as recommended, then CORE/SCALE will just see the normal AHCI SATA ports from the Intel C204 chipset. For example, in TrueNAS CORE, taken from "pciconf -lv" output:

ahci0@pci0:0:31:2:    class=0x010601 card=0x330d103c chip=0x1c028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA

No RAID storage driver to support the microserver's "fakeraid" will, or should, be in use. In this mode you can boot from an SSD in the optical drive bay by installing a GRUB boot loader on a USB stick, as explained in several posts elsewhere in this forum.

Differences in performance could be down to anything from the network stack, Samba version, and iXsystems' implementation of NFSv4 ACLs on Linux, to the Linux memory management of ZFS and the ZFS version itself.
I believe the benchmarks demonstrate that, at the very least, something is misbehaving with whichever driver is used when the controller is in Legacy mode under SCALE. The benchmarks were run on the TrueNAS host from its own terminal, so the network stack was no longer a factor.

Meanwhile, Core had no issues whatsoever with the controller operating in either Legacy or AHCI mode. It also didn't experience the timeouts seen in the benchmarks on SCALE, and it was more consistent without having to toggle on the "Drive Write Cache" BIOS setting.

I am aware of the workarounds for booting off a drive using GRUB, but without a confirmed "we will not fix" from the devs, this is still a possible issue, as the numbers from the test results would seem to indicate.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
whichever driver is used when the controller is in legacy mode

This has never been expected to work in the past; the usual advice is to run a vaguely sane, modern AHCI mode. Pre-AHCI, SATA was a disaster, with weird drivers and tweaks and all sorts of garbage.

I have absolutely no doubt that it's possible to make Legacy mode on an HP Microserver work, but it may be a bit of a slog to get there.
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
This has never been expected to work in the past; the usual advice is to run a vaguely sane, modern AHCI mode. Pre-AHCI, SATA was a disaster, with weird drivers and tweaks and all sorts of garbage.

I have absolutely no doubt that it's possible to make Legacy mode on an HP Microserver work, but it may be a bit of a slog to get there.
Even in AHCI mode, it doesn't seem as performant on SCALE as on Core. The devs have asked if I can test on a raw disk without ZFS, so I have to play musical disks to shuffle data around before I can perform that testing. If it is the driver specifically, then it is an upstream issue.
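
The raw-disk test I have in mind is roughly the following (a sketch only, and destructive -- it writes directly to the block device, so the disk must hold nothing I care about; /dev/sdX is a placeholder):
Code:
# WARNING: overwrites the target disk
fio --name=raw-seq-write --filename=/dev/sdX --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=32 --runtime=60 --time_based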

My backup plan is an LSI HBA that I've ordered to use instead of the onboard controller.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I suspect the driver, middleware, or even the ZFSonLinux module itself isn't toggling the drive write cache. ZFS on FreeBSD (at least the legacy one on 11.3) was intelligent enough to manipulate and enable the volatile cache on underlying vdev members, but I haven't tried SCALE.
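
A quick way to check what the drives themselves report, assuming hdparm is available on SCALE (device names are placeholders):
Code:
# SCALE (Linux): query, then enable, the drive's volatile write cache
hdparm -W /dev/sdX
hdparm -W1 /dev/sdX
# CORE (FreeBSD): show whether the write cache is enabled
camcontrol identify ada0 | grep -i 'write cache'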
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
The FreeBSD ATA disk driver in Core historically enables write cache and read-ahead on disk probe, so Core should not care about the write cache setting in the BIOS. I don't see such logic in the Linux kernel, which may explain why enabling write cache helps SCALE. ZFS is designed and tuned to work with write caching enabled for best performance, so please leave it enabled.

Legacy ATA is indeed a horror. Please do not use it unless you really have to. It may more or less work, but even in the best case it can execute only one command at a time per disk, or even one at a time per two disks if they happen to be mapped onto the same legacy controller channel, unlike 32 commands per disk for AHCI via NCQ. An enabled write cache may partially compensate for that on write workloads.
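
As a quick check of how many commands the kernel thinks each disk can queue (sdX is a placeholder; with AHCI and NCQ you would expect a value around 32, with legacy IDE emulation 1):
Code:
cat /sys/block/sdX/device/queue_depth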
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Looking at the tests, I see the first ones were done with `--rw=randwrite --bs=4k`, which has barely anything in common, performance-wise, with typical SMB operation. In the later tests I see `--rw=write`, but the block size still seems to be 4KB, which is at best a waste of CPU resources. I am not sure what block size your SMB client uses, but for sequential workloads I'd test with `--bs=128k` or even more.
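
For example, something closer to a large SMB file copy would be a sequential write with a big block size -- a sketch, with the dataset path, size, and runtime as placeholders to adjust:
Code:
fio --name=seq-write --directory=/mnt/<poolname>/<dataset> --ioengine=libaio --rw=write --bs=1M --numjobs=1 --iodepth=32 --size=8g --runtime=60 --time_based --end_fsync=1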
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
HP docs show that when the HP MicroServer's embedded SATA controller is set to AHCI mode, "Drive Write Cache" is enabled by default. Perhaps this is why other folks using this hardware have not noticed this issue: they have never turned it off.

[Attached image: hpg8writecache.jpg]
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
I'm now just using the onboard controller for the 2x 2.5" SATA SSDs (via an SFF-8087 to 4x SATA data cable) and have tossed in a SAS2008 card hooked up to the backplane, and the write performance is much more consistent. Writing a large file I'm seeing a solid 112 MB/s, so as far as I'm concerned, at this point this is a solid workaround.
 

stitch

Dabbler
Joined
Jan 7, 2017
Messages
24
Looking at the tests, I see the first ones were done with `--rw=randwrite --bs=4k`, which has barely anything in common, performance-wise, with typical SMB operation. In the later tests I see `--rw=write`, but the block size still seems to be 4KB, which is at best a waste of CPU resources. I am not sure what block size your SMB client uses, but for sequential workloads I'd test with `--bs=128k` or even more.
I was going by the examples I could find online; I'm not very experienced with fio, so I was literally shooting in the dark.
 