Performance Testing Questions

NickF · Oct 13, 2021

Hi All,

I recently started foolin around with an old server in my lab. I'm looking to see if my performance numbers are out-of-whack because they seem low to me. I've built a poor-man's all flash array using the following

A Supermicro 216 Chassis
A Supermicro x8dth-6 motherboard (SAS2008-IT onboard)
2 additional LSI 9211-8is
2x Xeon X5680s
96GB of RAM
22 Crucial BX500 120GB SSDs (11 mirrored VDEVs) ashift=12
Intel X520-DA2
Running TrueNAS Scale latest beta

fio results:

Sooo 16,000 IOPS and 2000 MiB/s read and write...

A ZVOL Mounted via ISCSI on Windows over a 10gigabit network:

As a point of reference I compared these results to my production server. The production server is faster than the all flash server? This is NOT apples-to-apples but I'm rather confused??

Dell R720
H310 IT Mode
LSI 9205-8i IT Mode
2x Xeon E5-2920s
192GB RAM
Fusion Pool with 12x4TB WD RED (CMR) drives and 2x Samsung SM953 480GB NVME SSDs
TrueNAS CORE 12.0U6

Production server FIO results

Sooo 17,000 IOPS and 2100 MiB/s read and write..

A SMB share over the same 10gigabit network to the same desktop

NugentS · Oct 14, 2021

Whilst I have no specific comments, other than to say that seems slow I would say are you nor asking a bit much of the 9200 LSI cards which are designed for HDD, not masses of SSD.
Also, check the client you are testing from - a recent similar thread turned out to be client issues - not server - but this is probably not your issue

Lastly, Laurence Systems did a recent test on the same hardware. He benched Core, then rebuilt Scale and benched that, keeping everything as identical as possible, Scale is currently performing a lot slower than core - he has put this down to Beta software

NickF · Oct 14, 2021

I can try and re-do the testing with some 9205-IT mode cards, but since my system is PCI-E 2.0 I'm not sure if it will have an appreciable benefit. SAS2308 is generally better than SAS2008 chips in any case. But still, I was expecting closer to 3500 MB/s...at least inline with modern M.2 performance...with 3 separate 2008 controllers and 24 drives with no SAS expander I'm not sure why that's not possible??

I can also try and swap the drives for some old 180GB Intel SSD 520s I have to see if that changes anything. The CT120BX500SSD1 SSDs I have are DRAM-less and any efficiency increase with more modern NAND vs the old 520s is probably lost to that. Worth experimenting, I wish I had a 3rd set of drives to compare against but that's all I have to play with.

I've also bid on an X9DRi-LN4F so I can get some faster Ivy Bridge Xeons (E5-2667 v2?) and see if that helps after I do the above testing...

Any other thoughts are appreciated, I know this is all older hardware but I'm not sure what the biggest bottleneck is....This should be significantly faster than 12 spinning disks in a Raid-Z2 by the nature of the fact that they in a mirrors pool and they are SSDs....?

NugentS · Oct 14, 2021

It should be waaaaaay faster

NickF · Oct 14, 2021

So I did the SSD swapout to the SSD Intel 520s

Now we are at 12K IOPS and 1500 MB/s read and write.

Crystal Disk Mark over ISCSI performance

I know these are really old Sandforce drives, so I wasn't sure what to expect here. Still, performance seems rather low....

I will try and swap in the LSI 9205s tomorrow. Any other ideas are appreciated.

NugentS · Oct 15, 2021

What are you testing from?
A recent thread on performance turned out to be a client issue - a firewall / AV product was causing major performance issues

ChrisRJ · Oct 15, 2021

How is the CPU load during the test? In general you need to identify the bottleneck. And that requires changing settings, components, etc. I would recommend making a list of potential suspects and then checking them one after the other.

NickF · Oct 16, 2021

Additional testing has ensued...

Replaced 9211s with 9205s

13k IOPS and 1700MB/s

Less than with the Crucial drives, faster than 9211s.

Won the action for the LGA-2011 board. Will test with the E5-2620s that it came with (the same CPUs that are in my production server).

RAM is the same exact DIMMS as are in my production server. Experiment is on hold until that comes. If it's not significantly faster with the new board I'm sorta at a loss??

Is my testing methodology not correct?

Spearfoot · Oct 16, 2021

My guess is that you're bumping up against the limits of the SAS 2008/2308chipset and/or the 6Gb/s SATA/SAS2 I/O system.

TL/DR: SAS3 gives 56k IOPS and 7450MB/s, while SAS2/SATA gives 13.6k IOPS and 1786MB/s

My conclusion? The SATA/SAS2 I/O system's performance is constrained, even when using SSDs.

Background: I ran your fio benchmark on 2 of my systems. I should say 'nearly the same as yours', because both servers run FreeNAS 11.2-U8 and their version of fio doesn't support the --gtod_reduce option you used. The pools on both systems are made of spinning rust, not SSDs.

I get results very similar to yours on the server running 6Gb/s LSI SAS9207-8i HBAs (LSI SAS2308 chipset) and spectacularly better results from the system equipped with a 12Gb/s LSI SAS9300-8i HBA (LSI SAS3008).

The SATA/SAS2 system ('BANDIT') is a Supermicro X9DRi-LN4F with 3 x LSI SAS9207-8i, a direct-attached SAS2 backplane, and 16 x 4TB SATA disks configured as mirrors. Results are 13.6k IOPS and ~1786MB/s, much the same as yours:

Code:

randrw: (groupid=0, jobs=12): err= 0: pid=82115: Sat Oct 16 23:33:04 2021
   read: IOPS=13.6k, BW=1703MiB/s (1786MB/s)(99.8GiB/60015msec)
    slat (nsec): min=524, max=1157.3M, avg=420280.39, stdev=7927301.30
    clat (usec): min=5, max=1940.9k, avg=7019.84, stdev=36687.99
     lat (usec): min=66, max=1941.2k, avg=7440.54, stdev=37766.61
    clat percentiles (usec):
     |  1.00th=[     55],  5.00th=[    355], 10.00th=[    709],
     | 20.00th=[   1450], 30.00th=[   2212], 40.00th=[   3032],
     | 50.00th=[   3884], 60.00th=[   4752], 70.00th=[   5735],
     | 80.00th=[   6783], 90.00th=[   8455], 95.00th=[  10290],
     | 99.00th=[  34341], 99.50th=[ 206570], 99.90th=[ 608175],
     | 99.95th=[ 767558], 99.99th=[1069548]
   bw (  KiB/s): min= 1303, max=498671, per=8.50%, avg=148189.67, stdev=90632.25, samples=1351
   iops        : min=   10, max= 3895, avg=1157.24, stdev=708.06, samples=1351
  write: IOPS=13.6k, BW=1704MiB/s (1787MB/s)(99.9GiB/60015msec)
    slat (usec): min=2, max=1301.2k, avg=410.30, stdev=7885.67
    clat (usec): min=30, max=1941.2k, avg=7442.47, stdev=39351.72
     lat (usec): min=103, max=1941.4k, avg=7853.22, stdev=40445.87
    clat percentiles (usec):
     |  1.00th=[     57],  5.00th=[    392], 10.00th=[    758],
     | 20.00th=[   1500], 30.00th=[   2278], 40.00th=[   3097],
     | 50.00th=[   3916], 60.00th=[   4817], 70.00th=[   5735],
     | 80.00th=[   6849], 90.00th=[   8455], 95.00th=[  10421],
     | 99.00th=[  42730], 99.50th=[ 235930], 99.90th=[ 650118],
     | 99.95th=[ 801113], 99.99th=[1069548]
   bw (  KiB/s): min=  751, max=486098, per=8.50%, avg=148295.44, stdev=90556.52, samples=1351
   iops        : min=    5, max= 3797, avg=1158.05, stdev=707.47, samples=1351
  lat (usec)   : 10=0.01%, 50=0.25%, 100=1.52%, 250=1.55%, 500=3.43%
  lat (usec)   : 750=3.46%, 1000=3.44%
  lat (msec)   : 2=13.29%, 4=24.23%, 10=43.32%, 20=4.05%, 50=0.56%
  lat (msec)   : 100=0.18%, 250=0.28%, 500=0.28%, 750=0.11%, 1000=0.04%
  cpu          : usr=2.27%, sys=2.02%, ctx=2400591, majf=0, minf=0
  IO depths    : 1=2.8%, 2=7.1%, 4=14.8%, 8=29.8%, 16=59.7%, 32=3.7%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.9%, 8=0.1%, 16=0.1%, 32=3.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=817461,818140,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=1703MiB/s (1786MB/s), 1703MiB/s-1703MiB/s (1786MB/s-1786MB/s), io=99.8GiB (107GB), run=60015-60015msec
  WRITE: bw=1704MiB/s (1787MB/s), 1704MiB/s-1704MiB/s (1787MB/s-1787MB/s), io=99.9GiB (107GB), run=60015-60015msec

The SAS3 system ('BACON') is a Supermicro X10SRL-F with an LSI SAS9300-8i, a SAS3 expander backplane, and 10 x 4TB SAS3 disks configured as 2 x 5-disk RAIDZ2 vdevs. Results are ~56.8k IOPS and ~7450MB/s (details below):

Code:

randrw: (groupid=0, jobs=12): err= 0: pid=1378: Sat Oct 16 23:38:34 2021
   read: IOPS=56.8k, BW=7102MiB/s (7447MB/s)(416GiB/60001msec)
    slat (nsec): min=478, max=605061k, avg=66475.01, stdev=1108715.40
    clat (usec): min=7, max=824734, avg=2043.60, stdev=6062.93
     lat (usec): min=39, max=825033, avg=2110.07, stdev=6183.62
    clat percentiles (usec):
     |  1.00th=[    75],  5.00th=[   192], 10.00th=[   310], 20.00th=[   537],
     | 30.00th=[   766], 40.00th=[  1004], 50.00th=[  1237], 60.00th=[  1483],
     | 70.00th=[  1745], 80.00th=[  2089], 90.00th=[  3523], 95.00th=[  7111],
     | 99.00th=[ 14353], 99.50th=[ 17433], 99.90th=[ 61080], 99.95th=[113771],
     | 99.99th=[231736]
   bw (  KiB/s): min=12288, max=1024000, per=8.33%, avg=605694.98, stdev=314143.10, samples=1436
   iops        : min=   96, max= 8000, avg=4731.67, stdev=2454.25, samples=1436
  write: IOPS=56.9k, BW=7108MiB/s (7454MB/s)(417GiB/60001msec)
    slat (nsec): min=1376, max=613328k, avg=79069.72, stdev=1357027.67
    clat (usec): min=17, max=842909, avg=2283.93, stdev=6745.07
     lat (usec): min=55, max=842934, avg=2363.00, stdev=6897.94
    clat percentiles (usec):
     |  1.00th=[   113],  5.00th=[   233], 10.00th=[   351], 20.00th=[   586],
     | 30.00th=[   816], 40.00th=[  1057], 50.00th=[  1287], 60.00th=[  1532],
     | 70.00th=[  1795], 80.00th=[  2180], 90.00th=[  3949], 95.00th=[  8291],
     | 99.00th=[ 17433], 99.50th=[ 22414], 99.90th=[ 77071], 99.95th=[137364],
     | 99.99th=[246416]
   bw (  KiB/s): min=10496, max=1024512, per=8.33%, avg=606213.19, stdev=314317.66, samples=1436
   iops        : min=   82, max= 8004, avg=4735.72, stdev=2455.62, samples=1436
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.12%, 100=1.10%, 250=5.37%
  lat (usec)   : 500=10.79%, 750=10.76%, 1000=10.68%
  lat (msec)   : 2=38.12%, 4=13.58%, 10=6.23%, 20=2.74%, 50=0.37%
  lat (msec)   : 100=0.08%, 250=0.06%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=8.43%, sys=9.43%, ctx=10067248, majf=0, minf=0
  IO depths    : 1=1.1%, 2=5.5%, 4=13.5%, 8=28.0%, 16=63.6%, 32=5.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.6%, 8=0.3%, 16=0.2%, 32=2.9%, 64=0.0%, >=64=0.0%
     issued rwts: total=3408848,3411977,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=7102MiB/s (7447MB/s), 7102MiB/s-7102MiB/s (7447MB/s-7447MB/s), io=416GiB (447GB), run=60001-60001msec
  WRITE: bw=7108MiB/s (7454MB/s), 7108MiB/s-7108MiB/s (7454MB/s-7454MB/s), io=417GiB (447GB), run=60001-60001msec

My fio script:

Code:

fio --name=randrw \
 --bs=128k \
 --direct=1 \
 --directory=/mnt/tank/systems \
 --ioengine=posixaio \
 --iodepth=32 \
 --group_reporting \
 --numjobs=12 \
 --ramp_time=10 \
 --runtime=60 \
 --rw=randrw \
 --size=256MGB \
 --time_based

NickF · Oct 16, 2021

Spearfoot said:

My guess is that you're bumping up against the limits of the SAS 2008/2308chipset and/or the 6Gb/s SATA/SAS2 I/O system.

TL/DR: SAS3 gives 56k IOPS and 7450MB/s, while SAS2/SATA gives 13.6k IOPS and 1786MB/s

My conclusion? The SATA/SAS2 I/O system's performance is constrained, even when using SSDs.

Background: I ran your fio benchmark on 2 of my systems. I should say 'nearly the same as yours', because both servers run FreeNAS 11.2-U8 and their version of fio doesn't support the --gtod_reduce option you used. The pools on both systems are made of spinning rust, not SSDs.

I get results very similar to yours on the server running 6Gb/s LSI SAS9207-8i HBAs (LSI SAS2308 chipset) and spectacularly better results from the system equipped with a 12Gb/s LSI SAS9300-8i HBA (LSI SAS3008).

The SATA/SAS2 system ('BANDIT') is a Supermicro X9DRi-LN4F with 3 x LSI SAS9207-8i, a direct-attached SAS2 backplane, and 16 x 4TB SATA disks configured as mirrors. Results are 13.6k IOPS and ~1786MB/s, much the same as yours:

Code:

randrw: (groupid=0, jobs=12): err= 0: pid=82115: Sat Oct 16 23:33:04 2021
   read: IOPS=13.6k, BW=1703MiB/s (1786MB/s)(99.8GiB/60015msec)
    slat (nsec): min=524, max=1157.3M, avg=420280.39, stdev=7927301.30
    clat (usec): min=5, max=1940.9k, avg=7019.84, stdev=36687.99
     lat (usec): min=66, max=1941.2k, avg=7440.54, stdev=37766.61
    clat percentiles (usec):
     |  1.00th=[     55],  5.00th=[    355], 10.00th=[    709],
     | 20.00th=[   1450], 30.00th=[   2212], 40.00th=[   3032],
     | 50.00th=[   3884], 60.00th=[   4752], 70.00th=[   5735],
     | 80.00th=[   6783], 90.00th=[   8455], 95.00th=[  10290],
     | 99.00th=[  34341], 99.50th=[ 206570], 99.90th=[ 608175],
     | 99.95th=[ 767558], 99.99th=[1069548]
   bw (  KiB/s): min= 1303, max=498671, per=8.50%, avg=148189.67, stdev=90632.25, samples=1351
   iops        : min=   10, max= 3895, avg=1157.24, stdev=708.06, samples=1351
  write: IOPS=13.6k, BW=1704MiB/s (1787MB/s)(99.9GiB/60015msec)
    slat (usec): min=2, max=1301.2k, avg=410.30, stdev=7885.67
    clat (usec): min=30, max=1941.2k, avg=7442.47, stdev=39351.72
     lat (usec): min=103, max=1941.4k, avg=7853.22, stdev=40445.87
    clat percentiles (usec):
     |  1.00th=[     57],  5.00th=[    392], 10.00th=[    758],
     | 20.00th=[   1500], 30.00th=[   2278], 40.00th=[   3097],
     | 50.00th=[   3916], 60.00th=[   4817], 70.00th=[   5735],
     | 80.00th=[   6849], 90.00th=[   8455], 95.00th=[  10421],
     | 99.00th=[  42730], 99.50th=[ 235930], 99.90th=[ 650118],
     | 99.95th=[ 801113], 99.99th=[1069548]
   bw (  KiB/s): min=  751, max=486098, per=8.50%, avg=148295.44, stdev=90556.52, samples=1351
   iops        : min=    5, max= 3797, avg=1158.05, stdev=707.47, samples=1351
  lat (usec)   : 10=0.01%, 50=0.25%, 100=1.52%, 250=1.55%, 500=3.43%
  lat (usec)   : 750=3.46%, 1000=3.44%
  lat (msec)   : 2=13.29%, 4=24.23%, 10=43.32%, 20=4.05%, 50=0.56%
  lat (msec)   : 100=0.18%, 250=0.28%, 500=0.28%, 750=0.11%, 1000=0.04%
  cpu          : usr=2.27%, sys=2.02%, ctx=2400591, majf=0, minf=0
  IO depths    : 1=2.8%, 2=7.1%, 4=14.8%, 8=29.8%, 16=59.7%, 32=3.7%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.9%, 8=0.1%, 16=0.1%, 32=3.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=817461,818140,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=1703MiB/s (1786MB/s), 1703MiB/s-1703MiB/s (1786MB/s-1786MB/s), io=99.8GiB (107GB), run=60015-60015msec
  WRITE: bw=1704MiB/s (1787MB/s), 1704MiB/s-1704MiB/s (1787MB/s-1787MB/s), io=99.9GiB (107GB), run=60015-60015msec

The SAS3 system ('BACON') is a Supermicro X10SRL-F with an LSI SAS9300-8i, a SAS3 expander backplane, and 10 x 4TB SAS3 disks configured as 2 x 5-disk RAIDZ2 vdevs. Results are ~56.8k IOPS and ~7450MB/s (details below):

Code:

randrw: (groupid=0, jobs=12): err= 0: pid=1378: Sat Oct 16 23:38:34 2021
   read: IOPS=56.8k, BW=7102MiB/s (7447MB/s)(416GiB/60001msec)
    slat (nsec): min=478, max=605061k, avg=66475.01, stdev=1108715.40
    clat (usec): min=7, max=824734, avg=2043.60, stdev=6062.93
     lat (usec): min=39, max=825033, avg=2110.07, stdev=6183.62
    clat percentiles (usec):
     |  1.00th=[    75],  5.00th=[   192], 10.00th=[   310], 20.00th=[   537],
     | 30.00th=[   766], 40.00th=[  1004], 50.00th=[  1237], 60.00th=[  1483],
     | 70.00th=[  1745], 80.00th=[  2089], 90.00th=[  3523], 95.00th=[  7111],
     | 99.00th=[ 14353], 99.50th=[ 17433], 99.90th=[ 61080], 99.95th=[113771],
     | 99.99th=[231736]
   bw (  KiB/s): min=12288, max=1024000, per=8.33%, avg=605694.98, stdev=314143.10, samples=1436
   iops        : min=   96, max= 8000, avg=4731.67, stdev=2454.25, samples=1436
  write: IOPS=56.9k, BW=7108MiB/s (7454MB/s)(417GiB/60001msec)
    slat (nsec): min=1376, max=613328k, avg=79069.72, stdev=1357027.67
    clat (usec): min=17, max=842909, avg=2283.93, stdev=6745.07
     lat (usec): min=55, max=842934, avg=2363.00, stdev=6897.94
    clat percentiles (usec):
     |  1.00th=[   113],  5.00th=[   233], 10.00th=[   351], 20.00th=[   586],
     | 30.00th=[   816], 40.00th=[  1057], 50.00th=[  1287], 60.00th=[  1532],
     | 70.00th=[  1795], 80.00th=[  2180], 90.00th=[  3949], 95.00th=[  8291],
     | 99.00th=[ 17433], 99.50th=[ 22414], 99.90th=[ 77071], 99.95th=[137364],
     | 99.99th=[246416]
   bw (  KiB/s): min=10496, max=1024512, per=8.33%, avg=606213.19, stdev=314317.66, samples=1436
   iops        : min=   82, max= 8004, avg=4735.72, stdev=2455.62, samples=1436
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.12%, 100=1.10%, 250=5.37%
  lat (usec)   : 500=10.79%, 750=10.76%, 1000=10.68%
  lat (msec)   : 2=38.12%, 4=13.58%, 10=6.23%, 20=2.74%, 50=0.37%
  lat (msec)   : 100=0.08%, 250=0.06%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=8.43%, sys=9.43%, ctx=10067248, majf=0, minf=0
  IO depths    : 1=1.1%, 2=5.5%, 4=13.5%, 8=28.0%, 16=63.6%, 32=5.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.6%, 8=0.3%, 16=0.2%, 32=2.9%, 64=0.0%, >=64=0.0%
     issued rwts: total=3408848,3411977,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=7102MiB/s (7447MB/s), 7102MiB/s-7102MiB/s (7447MB/s-7447MB/s), io=416GiB (447GB), run=60001-60001msec
  WRITE: bw=7108MiB/s (7454MB/s), 7108MiB/s-7108MiB/s (7454MB/s-7454MB/s), io=417GiB (447GB), run=60001-60001msec

My fio script:

Code:

fio --name=randrw \
--bs=128k \
--direct=1 \
--directory=/mnt/tank/systems \
--ioengine=posixaio \
--iodepth=32 \
--group_reporting \
--numjobs=12 \
--ramp_time=10 \
--runtime=60 \
--rw=randrw \
--size=256MGB \
--time_based

That's rather interesting... I really appreciate the input. My assumption thusfar has been that my largest problem is that my motherboard is operating at PCIE-2.0 not 3.0 which is why I went for an X9 generation board to try and solve the GT/s bottleneck issue. What I still don't understand is how 3 cards are presenting such a large bottleneck. 1 or 2 cards being limited to this level of performance would not have surprised me, but with 3 cards? I was really hoping to see at least 300MB/s per mirror on Westmere, which is close to the 3000+ or so MB/s of NVME. I'm guessing it's more than just the PCIE generation, and has to do with the bus differences between Westmere and Sandy Bridge-E

Your evidence contradicts my assumption as your numbers correspond with equivalent performance to mine. However, I also have evidence that proves your assertion may not be wholly true, but still close. My production server's pool I have been testing is also using a single LSI 2308 generation card, and it only has 8 SAS Lanes for all 12 drives...Why is my production spinning rust system faster than yours which has 16 drives, and you have 3x 2308 cards with no SAS expander? My system is RAIDZ-2 to boot. Is the SPECIAL VDEV somehow accelerating FIO, even though the file size is far too large?

That's faster than the flash server I'm testing and faster than yours with similar generation hardware to that you are running and to that I have just purchased. Theoretically, at the very least, I would imagine that my flash array should be faster than both my array on my production machine as well as your system with the introduction of the Sandybridge-E motherboard? How much faster, I'm not sure. I would settle for the 3,000 MB/s I was hoping for on Westmere...

On a separate note, I'm confused by your benchmarking even more. Your numbers indicating the stark contrast between 2308 and 3008 series chipsets is not in alignment with other posts I've seen here, which is intriguing. Heres a quote from jgreco in another thread from last year:

jgreco said:
....
The LSI 2308 is PCIe 3.0 and 50% faster.
....
The main problem with using SSD behind an HBA is that you do have the potential to swamp out the CPU on the HBA's; see above for speeds. LSI released the 2308 in large part as a response to the capping effect on the 2008 when you either have a huge number of HDD's or fast SSD's. The 2308 is generally expected to be "as fast as 6Gbps can go" though if you showed me a graph showing a 3008 as a handful of percent faster, I wouldn't be shocked.

This is not the only mention I've found on the forums stating that 2308-series cards shouldn't provide much of a bottleneck for SSDs. I guess we'll all learn together in a week or so.

NickF · Oct 21, 2021

So my new motherboard arrived today...and as I had hoped... It is significantly faster...

Code:

fio --bs=128k --direct=1 --directory=/mnt/lol/fio --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based

Sitting at 25.k IOPS and 3215MB/s

I wish I had some SAS3 cards/SAS3 backplane to see if performance scales JUST by moving to SAS3 hardware...but they are still pretty pricey

I think I will pickup some faster single threaded CPUs next and see what benefit that gives me. 2620s don't exactly scream, and the X5680s I had in the other board are probably faster. It's kinda silly, because over ISCSI it's still only a little faster than a single SATA SSD...

For comparison, this is my (very) full M.2 SSD on my workstation

NickF · Oct 25, 2021

Received new CPUs already... Performance hasn't changed?

2xE5-2667s...

26K IOPS 3272MB/s

NickF · Oct 26, 2021

To be clear, it does seem to be scaling linearly, so maybe this is as fast as these drives go?

This is a single drive:

With 11 VDEVs of 2-drive mirrors I am getting 10x the performance

I've also tested with all 22 drives in a single RAIDZ1

23k IOPS and 2800MB/s
Performance is pretty close to mirrors, which goes against conventional wisdom I've seen here. It was always my understanding that mirrors had a high cost in storage efficiency but were always faster than RAIDZ. Although, we are talking about 22 drives and only a single drive worth of parity...not exactly production ready.

Two Raid-Z1 VDEVs of 11 drives each yields the best performance/storage efficiency so far

28k IOPS and 3500MB/s.

Still, not sure if I would put into production 1 drive of parity with 11 disks.

So next I made a 20-drive ZPool with 4 5-drive vdevs in a raid-z1

Losing 2 drives as "hot spares" plus 3 for parity for a total of 5 drives is still better storage efficiency than the mirrors(which lost 11), and still faster.
26K IOPS and 3200MB/s

Finally, I made an 11-drive-per-vdev RAIDZ-2 of 2 VDEVs. This offers better storage efficiency than the 3 RAIDZ-1 VDEV arrangement.

26k IOPS and 3200MB/s still....which is still on par with the mirror array *and from my test appears to be more consistent*

So some takeaways...
So some takeaways...

I'm not going to break 30k IOPS anytime soon
SAS3 cards may be an answer, but I don't honestly think they will be
I have 5 PCIE 16x slots I can put NVME drives and I'm fairly certain that I can do PCI-E bifurcation in this board so I may be able to do 10xNVME drives on this platform to see what that does.
If anyone has sone 256gb nvme drives from old laptops or something and you want to donate to the cause for science I'm here xD

I hope this helps someone later...

NickF · Oct 26, 2021

For giggles, I wanted to get every drop of performance out of this. I put all 22 drives in a single vdev striped

38K IOPS 4700MB/s

The bottleneck for sure does not appear to be the HBAs...

Important Announcement for the TrueNAS Community.

Performance Testing Questions

NickF

Guru

NugentS

MVP

NickF

Guru

NugentS

MVP

NickF

Guru

NugentS

MVP

ChrisRJ

Wizard

NickF

Guru

Spearfoot

He of the long foot

NickF

Guru

NickF

Guru

NickF

Guru

NickF

Guru

NickF

Guru

Similar threads

Important Announcement for the TrueNAS Community.

Performance Testing Questions

Guru

MVP

Guru

MVP

Guru

MVP

Wizard

Guru

He of the long foot

Guru

Guru

Guru

Guru

Guru

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Performance Testing Questions"

Similar threads