Confusingly low per-disk sequential performance

Littlejd97

Cadet
Joined
Dec 10, 2021
Messages
6
I was wondering if someone could help me understand whether this is just expected performance, or if something is misconfigured.

I have two 4TB WD Red CMR (WD40EFRX) drives in a mirror, and I would expect 100 MB/s+ sequential per drive, but I'm getting far less than that when running tests.

I disabled all services (NFS, iSCSI, etc.), then ran these tests on the machine itself. I also set `primarycache` to `metadata`.
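
For reference, the cache change was roughly the following (NAS is my pool/root dataset; `zfs inherit` puts the default back afterwards):
Code:
# limit ARC caching to metadata only so reads actually hit the disks during the benchmark
zfs set primarycache=metadata NAS
# ... run the fio tests below ...
# restore the inherited default (primarycache=all) when done
zfs inherit primarycache NAS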

Code:
root@truenas[/mnt/NAS]# fio --filename=/mnt/NAS/test.file --name=sync_seqread --rw=read --bs=4M --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --loops=10 --group_reporting
sync_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
sync_seqread: Laying out IO file (1 file / 1024MiB)

sync_seqread: (groupid=0, jobs=1): err= 0: pid=8979: Wed Nov  2 20:22:40 2022
  read: IOPS=22, BW=88.2MiB/s (92.5MB/s)(10.0GiB/116044msec)
    clat (usec): min=243, max=841757, avg=45324.58, stdev=33266.73
     lat (usec): min=243, max=841758, avg=45325.38, stdev=33266.80
    clat percentiles (usec):
     |  1.00th=[   260],  5.00th=[   277], 10.00th=[   441], 20.00th=[ 41681],
     | 30.00th=[ 43779], 40.00th=[ 45351], 50.00th=[ 46924], 60.00th=[ 47973],
     | 70.00th=[ 49546], 80.00th=[ 51643], 90.00th=[ 54789], 95.00th=[ 65274],
     | 99.00th=[175113], 99.50th=[270533], 99.90th=[304088], 99.95th=[438305],
     | 99.99th=[843056]
   bw (  KiB/s): min=15937, max=2096956, per=100.00%, avg=91020.76, stdev=141078.11, samples=226
   iops        : min=    3, max=  511, avg=21.61, stdev=34.43, samples=226
  lat (usec)   : 250=0.23%, 500=12.58%, 750=0.39%
  lat (msec)   : 50=60.59%, 100=24.41%, 250=1.21%, 500=0.55%, 1000=0.04%
  cpu          : usr=0.01%, sys=0.96%, ctx=8935, majf=0, minf=1024
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2560,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=88.2MiB/s (92.5MB/s), 88.2MiB/s-88.2MiB/s (92.5MB/s-92.5MB/s), io=10.0GiB (10.7GB), run=116044-116044msec
root@truenas[/mnt/NAS]#



Code:
root@truenas[/mnt/NAS]# fio --filename=/mnt/NAS/test.file --name=sync_seqwrite --rw=write --bs=4M --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=5G --loops=1 --group_reporting | tee -a "${LOGFILE}"

sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
sync_seqwrite: Laying out IO file (1 file / 5120MiB)

sync_seqwrite: (groupid=0, jobs=1): err= 0: pid=8655: Wed Nov  2 20:19:21 2022
  write: IOPS=8, BW=34.8MiB/s (36.5MB/s)(5120MiB/147112msec); 0 zone resets
    clat (msec): min=21, max=281, avg=114.41, stdev=35.53
     lat (msec): min=21, max=281, avg=114.41, stdev=35.53
    clat percentiles (msec):
     |  1.00th=[   78],  5.00th=[   88], 10.00th=[   89], 20.00th=[   92],
     | 30.00th=[   99], 40.00th=[  100], 50.00th=[  102], 60.00th=[  107],
     | 70.00th=[  111], 80.00th=[  126], 90.00th=[  169], 95.00th=[  203],
     | 99.00th=[  241], 99.50th=[  253], 99.90th=[  275], 99.95th=[  284],
     | 99.99th=[  284]
   bw (  KiB/s): min=15603, max=49053, per=100.00%, avg=35672.08, stdev=8180.57, samples=285
   iops        : min=    3, max=   11, avg= 7.97, stdev= 2.05, samples=285
  lat (msec)   : 50=0.39%, 100=46.72%, 250=52.27%, 500=0.62%
  cpu          : usr=0.47%, sys=0.45%, ctx=2671, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1280,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=34.8MiB/s (36.5MB/s), 34.8MiB/s-34.8MiB/s (36.5MB/s-36.5MB/s), io=5120MiB (5369MB), run=147112-147112msec
root@truenas[/mnt/NAS]


While running these tests, I checked `zpool iostat`

During the read test
Code:
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
NAS                                             2.68T   972G    655      0  81.9M      0
  mirror-0                                      2.68T   972G    655      0  81.9M      0
    gptid/afc545f7-75ad-11eb-9241-f9ace32b146a      -      -    329      0  41.2M      0
    gptid/aff57292-75ad-11eb-9241-f9ace32b146a      -      -    325      0  40.7M      0
----------------------------------------------  -----  -----  -----  -----  -----  -----


And write test
Code:
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
NAS                                             2.68T   969G      0    364      0  71.4M
  mirror-0                                      2.68T   969G      0    364      0  71.4M
    gptid/afc545f7-75ad-11eb-9241-f9ace32b146a      -      -      0    182      0  35.7M
    gptid/aff57292-75ad-11eb-9241-f9ace32b146a      -      -      0    182      0  35.7M
----------------------------------------------  -----  -----  -----  -----  -----  -----
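
(For context, those per-vdev numbers came from something like the command below; the 5-second refresh interval is just what I happened to use.)
Code:
# verbose per-vdev stats for the NAS pool, refreshed every 5 seconds
zpool iostat -v NAS 5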



These numbers just seem way too low.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Your fio command is structured to run only a single job, with single-depth I/O, synchronously; this prevents any parallelism from kicking in for reads.

Change your fio parameters to use `numjobs=2` and it will be able to issue reads to each disk in the mirror:

Reads with numjobs=1
Code:
root@badger:/mnt/MirrorVolume/testdata # zfs set primarycache=metadata MirrorVolume/testdata
root@badger:/mnt/MirrorVolume/testdata # fio --filename=test.file --name=sync_seqread --rw=read --bs=4M --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --loops=10 --group_reporting
sync_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.0
Starting 1 process
sync_seqread: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=91.9MiB/s,w=0KiB/s][r=22,w=0 IOPS][eta 00m:00s]
sync_seqread: (groupid=0, jobs=1): err= 0: pid=14886: Wed Nov  2 23:56:56 2022
   read: IOPS=22, BW=90.7MiB/s (95.1MB/s)(10.0GiB/112941msec)
    clat (msec): min=26, max=218, avg=44.11, stdev=10.31
     lat (msec): min=26, max=218, avg=44.11, stdev=10.31
    clat percentiles (msec):
     |  1.00th=[   30],  5.00th=[   35], 10.00th=[   36], 20.00th=[   40],
     | 30.00th=[   41], 40.00th=[   42], 50.00th=[   43], 60.00th=[   45],
     | 70.00th=[   46], 80.00th=[   50], 90.00th=[   51], 95.00th=[   55],
     | 99.00th=[   91], 99.50th=[  122], 99.90th=[  155], 99.95th=[  213],
     | 99.99th=[  220]
   bw (  KiB/s): min=57115, max=114459, per=99.65%, avg=92512.85, stdev=9996.78, samples=225
   iops        : min=   13, max=   27, avg=21.74, stdev= 2.48, samples=225
  lat (msec)   : 50=87.50%, 100=11.84%, 250=0.66%
  cpu          : usr=0.03%, sys=1.96%, ctx=10307, majf=5, minf=1025
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=2560,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=90.7MiB/s (95.1MB/s), 90.7MiB/s-90.7MiB/s (95.1MB/s-95.1MB/s), io=10.0GiB (10.7GB), run=112941-112941msec


Reads with numjobs=2
Code:
root@badger:/mnt/MirrorVolume/testdata # fio --filename=test.file --name=sync_seqread --rw=read --bs=4M --direct=1 --sync=1 --numjobs=2 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --loops=10 --group_reporting
sync_seqread: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
...
fio-3.0
Starting 2 processes
Jobs: 1 (f=1): [R(1),_(1)][100.0%][r=113MiB/s,w=0KiB/s][r=28,w=0 IOPS][eta 00m:00s]
sync_seqread: (groupid=0, jobs=2): err= 0: pid=15594: Thu Nov  3 00:03:41 2022
   read: IOPS=37, BW=148MiB/s (156MB/s)(20.0GiB/137916msec)
    clat (msec): min=8, max=247, avg=53.79, stdev=17.66
     lat (msec): min=8, max=247, avg=53.79, stdev=17.66
    clat percentiles (msec):
     |  1.00th=[   30],  5.00th=[   35], 10.00th=[   38], 20.00th=[   42],
     | 30.00th=[   43], 40.00th=[   45], 50.00th=[   48], 60.00th=[   53],
     | 70.00th=[   58], 80.00th=[   68], 90.00th=[   81], 95.00th=[   89],
     | 99.00th=[  106], 99.50th=[  111], 99.90th=[  148], 99.95th=[  218],
     | 99.99th=[  249]
   bw (  KiB/s): min=40554, max=106283, per=49.80%, avg=75730.32, stdev=18773.34, samples=550
   iops        : min=    9, max=   25, avg=17.66, stdev= 4.59, samples=550
  lat (msec)   : 10=0.08%, 20=0.04%, 50=56.99%, 100=41.07%, 250=1.82%
  cpu          : usr=0.01%, sys=2.00%, ctx=28179, majf=0, minf=2050
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=5120,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=148MiB/s (156MB/s), 148MiB/s-148MiB/s (156MB/s-156MB/s), io=20.0GiB (21.5GB), run=137916-137916msec


Writes have to go to both disks, so you won't extract any more performance from a single vdev by cranking up the job count there. However, unless you're actually going to demand sync writes from your dataset, drop the `--sync=1` flag; you'll see a healthy jump from not having to wait for each write to be confirmed before issuing the next.

Writes with sync=1:
Code:
root@badger:/mnt/MirrorVolume/testdata # fio --filename=test.file --name=sync_seqwrite --rw=write --bs=4M --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=5G --loops=1 --group_reporting
sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.0
Starting 1 process
sync_seqwrite: Laying out IO file (1 file / 5120MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=44.4MiB/s][r=0,w=11 IOPS][eta 00m:00s]
sync_seqwrite: (groupid=0, jobs=1): err= 0: pid=15138: Thu Nov  3 00:00:50 2022
  write: IOPS=8, BW=35.0MiB/s (37.7MB/s)(5120MiB/142373msec)
    clat (msec): min=69, max=307, avg=110.36, stdev=40.23
     lat (msec): min=69, max=307, avg=110.36, stdev=40.23
    clat percentiles (msec):
     |  1.00th=[   72],  5.00th=[   78], 10.00th=[   81], 20.00th=[   82],
     | 30.00th=[   85], 40.00th=[   89], 50.00th=[   91], 60.00th=[   99],
     | 70.00th=[  116], 80.00th=[  148], 90.00th=[  165], 95.00th=[  182],
     | 99.00th=[  262], 99.50th=[  271], 99.90th=[  296], 99.95th=[  309],
     | 99.99th=[  309]
   bw (  KiB/s): min=16094, max=57344, per=99.08%, avg=36485.75, stdev=9362.43, samples=284
   iops        : min=    3, max=   14, avg= 8.07, stdev= 2.33, samples=284
  lat (msec)   : 100=61.33%, 250=37.19%, 500=1.48%
  cpu          : usr=0.74%, sys=0.97%, ctx=2925, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,1280,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=35.0MiB/s (37.7MB/s), 35.0MiB/s-35.0MiB/s (37.7MB/s-37.7MB/s), io=5120MiB (5369MB), run=142373-142373msec


Writing asynchronously:
Code:
root@badger:/mnt/MirrorVolume/testdata # fio --filename=test.file --name=sync_seqwrite --rw=write --bs=4M --direct=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=5G --loops=1 --group_reporting
sync_seqwrite: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.0
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=76.2MiB/s][r=0,w=19 IOPS][eta 00m:00s]
sync_seqwrite: (groupid=0, jobs=1): err= 0: pid=16725: Thu Nov  3 00:14:49 2022
  write: IOPS=24, BW=98.7MiB/s (104MB/s)(5120MiB/51861msec)
    clat (usec): min=790, max=450195, avg=39659.56, stdev=33464.23
     lat (usec): min=790, max=450195, avg=39660.54, stdev=33464.23
    clat percentiles (usec):
     |  1.00th=[   807],  5.00th=[   824], 10.00th=[   857], 20.00th=[  3589],
     | 30.00th=[  4293], 40.00th=[ 44303], 50.00th=[ 50070], 60.00th=[ 52691],
     | 70.00th=[ 55837], 80.00th=[ 58983], 90.00th=[ 66847], 95.00th=[ 82314],
     | 99.00th=[120062], 99.50th=[141558], 99.90th=[421528], 99.95th=[450888],
     | 99.99th=[450888]
   bw (  KiB/s): min=24236, max=2096956, per=99.75%, avg=100842.71, stdev=217577.85, samples=103
   iops        : min=    5, max=  511, avg=23.80, stdev=53.12, samples=103
  lat (usec)   : 1000=13.98%
  lat (msec)   : 2=5.23%, 4=2.50%, 10=11.64%, 20=0.31%, 50=15.94%
  lat (msec)   : 100=47.97%, 250=2.27%, 500=0.16%
  cpu          : usr=2.22%, sys=4.70%, ctx=44202, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,1280,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=98.7MiB/s (104MB/s), 98.7MiB/s-98.7MiB/s (104MB/s-104MB/s), io=5120MiB (5369MB), run=51861-51861msec
 

Littlejd97

Cadet
Joined
Dec 10, 2021
Messages
6
Wonderful explanation! I'm now getting speeds much closer to what I was expecting. Thank you.

I think my clients are all connected async, but if I wanted to improve sync write performance, I assume adding a fast SLOG would be the trick?
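
If I do go that route, I'm guessing it's roughly the following (the device name is just a placeholder for whatever the SSD shows up as, and the `sync=always` line only if I actually want to force sync writes):
Code:
# check how the dataset currently handles sync requests (standard/always/disabled)
zfs get sync NAS
# attach a fast, power-loss-protected SSD as a separate log (SLOG) device
# "nvd0" is a placeholder for the real device name
zpool add NAS log nvd0
# optionally force every write through the ZIL/SLOG (only if the workload needs it)
zfs set sync=always NAS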
 
