My guess is that you're bumping up against the limits of the SAS 2008/2308chipset and/or the 6Gb/s SATA/SAS2 I/O system.
TL/DR: SAS3 gives 56k IOPS and 7450MB/s, while SAS2/SATA gives 13.6k IOPS and 1786MB/s
My conclusion? The SATA/SAS2 I/O system's performance is constrained, even when using SSDs.
Background: I ran your 
 fio benchmark on 2 of my systems. I should say 'nearly the same as yours', because both servers run FreeNAS 11.2-U8 and their version of 
 fio doesn't support the 
 --gtod_reduce option you used. The pools on both systems are made of spinning rust, not SSDs.
I get results very similar to yours on the server running 6Gb/s LSI SAS9207-8i HBAs (LSI SAS2308 chipset) and spectacularly better results from the system equipped with a 12Gb/s LSI SAS9300-8i HBA (LSI SAS3008).
The SATA/SAS2 system ('BANDIT') is a Supermicro X9DRi-LN4F with 3 x LSI SAS9207-8i, a direct-attached SAS2 backplane, and 16 x 4TB SATA disks configured as mirrors. Results are 
13.6k IOPS and 
~1786MB/s, much the same as yours:
Code:
randrw: (groupid=0, jobs=12): err= 0: pid=82115: Sat Oct 16 23:33:04 2021
   read: IOPS=13.6k, BW=1703MiB/s (1786MB/s)(99.8GiB/60015msec)
    slat (nsec): min=524, max=1157.3M, avg=420280.39, stdev=7927301.30
    clat (usec): min=5, max=1940.9k, avg=7019.84, stdev=36687.99
     lat (usec): min=66, max=1941.2k, avg=7440.54, stdev=37766.61
    clat percentiles (usec):
     |  1.00th=[     55],  5.00th=[    355], 10.00th=[    709],
     | 20.00th=[   1450], 30.00th=[   2212], 40.00th=[   3032],
     | 50.00th=[   3884], 60.00th=[   4752], 70.00th=[   5735],
     | 80.00th=[   6783], 90.00th=[   8455], 95.00th=[  10290],
     | 99.00th=[  34341], 99.50th=[ 206570], 99.90th=[ 608175],
     | 99.95th=[ 767558], 99.99th=[1069548]
   bw (  KiB/s): min= 1303, max=498671, per=8.50%, avg=148189.67, stdev=90632.25, samples=1351
   iops        : min=   10, max= 3895, avg=1157.24, stdev=708.06, samples=1351
  write: IOPS=13.6k, BW=1704MiB/s (1787MB/s)(99.9GiB/60015msec)
    slat (usec): min=2, max=1301.2k, avg=410.30, stdev=7885.67
    clat (usec): min=30, max=1941.2k, avg=7442.47, stdev=39351.72
     lat (usec): min=103, max=1941.4k, avg=7853.22, stdev=40445.87
    clat percentiles (usec):
     |  1.00th=[     57],  5.00th=[    392], 10.00th=[    758],
     | 20.00th=[   1500], 30.00th=[   2278], 40.00th=[   3097],
     | 50.00th=[   3916], 60.00th=[   4817], 70.00th=[   5735],
     | 80.00th=[   6849], 90.00th=[   8455], 95.00th=[  10421],
     | 99.00th=[  42730], 99.50th=[ 235930], 99.90th=[ 650118],
     | 99.95th=[ 801113], 99.99th=[1069548]
   bw (  KiB/s): min=  751, max=486098, per=8.50%, avg=148295.44, stdev=90556.52, samples=1351
   iops        : min=    5, max= 3797, avg=1158.05, stdev=707.47, samples=1351
  lat (usec)   : 10=0.01%, 50=0.25%, 100=1.52%, 250=1.55%, 500=3.43%
  lat (usec)   : 750=3.46%, 1000=3.44%
  lat (msec)   : 2=13.29%, 4=24.23%, 10=43.32%, 20=4.05%, 50=0.56%
  lat (msec)   : 100=0.18%, 250=0.28%, 500=0.28%, 750=0.11%, 1000=0.04%
  cpu          : usr=2.27%, sys=2.02%, ctx=2400591, majf=0, minf=0
  IO depths    : 1=2.8%, 2=7.1%, 4=14.8%, 8=29.8%, 16=59.7%, 32=3.7%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.9%, 8=0.1%, 16=0.1%, 32=3.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=817461,818140,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
   READ: bw=1703MiB/s (1786MB/s), 1703MiB/s-1703MiB/s (1786MB/s-1786MB/s), io=99.8GiB (107GB), run=60015-60015msec
  WRITE: bw=1704MiB/s (1787MB/s), 1704MiB/s-1704MiB/s (1787MB/s-1787MB/s), io=99.9GiB (107GB), run=60015-60015msec
The SAS3 system ('BACON') is a Supermicro X10SRL-F with an LSI SAS9300-8i, a SAS3 expander backplane, and 10 x 4TB SAS3 disks configured as 2 x 5-disk RAIDZ2 vdevs. Results are 
~56.8k IOPS and 
~7450MB/s (details below):
Code:
randrw: (groupid=0, jobs=12): err= 0: pid=1378: Sat Oct 16 23:38:34 2021
   read: IOPS=56.8k, BW=7102MiB/s (7447MB/s)(416GiB/60001msec)
    slat (nsec): min=478, max=605061k, avg=66475.01, stdev=1108715.40
    clat (usec): min=7, max=824734, avg=2043.60, stdev=6062.93
     lat (usec): min=39, max=825033, avg=2110.07, stdev=6183.62
    clat percentiles (usec):
     |  1.00th=[    75],  5.00th=[   192], 10.00th=[   310], 20.00th=[   537],
     | 30.00th=[   766], 40.00th=[  1004], 50.00th=[  1237], 60.00th=[  1483],
     | 70.00th=[  1745], 80.00th=[  2089], 90.00th=[  3523], 95.00th=[  7111],
     | 99.00th=[ 14353], 99.50th=[ 17433], 99.90th=[ 61080], 99.95th=[113771],
     | 99.99th=[231736]
   bw (  KiB/s): min=12288, max=1024000, per=8.33%, avg=605694.98, stdev=314143.10, samples=1436
   iops        : min=   96, max= 8000, avg=4731.67, stdev=2454.25, samples=1436
  write: IOPS=56.9k, BW=7108MiB/s (7454MB/s)(417GiB/60001msec)
    slat (nsec): min=1376, max=613328k, avg=79069.72, stdev=1357027.67
    clat (usec): min=17, max=842909, avg=2283.93, stdev=6745.07
     lat (usec): min=55, max=842934, avg=2363.00, stdev=6897.94
    clat percentiles (usec):
     |  1.00th=[   113],  5.00th=[   233], 10.00th=[   351], 20.00th=[   586],
     | 30.00th=[   816], 40.00th=[  1057], 50.00th=[  1287], 60.00th=[  1532],
     | 70.00th=[  1795], 80.00th=[  2180], 90.00th=[  3949], 95.00th=[  8291],
     | 99.00th=[ 17433], 99.50th=[ 22414], 99.90th=[ 77071], 99.95th=[137364],
     | 99.99th=[246416]
   bw (  KiB/s): min=10496, max=1024512, per=8.33%, avg=606213.19, stdev=314317.66, samples=1436
   iops        : min=   82, max= 8004, avg=4735.72, stdev=2455.62, samples=1436
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.12%, 100=1.10%, 250=5.37%
  lat (usec)   : 500=10.79%, 750=10.76%, 1000=10.68%
  lat (msec)   : 2=38.12%, 4=13.58%, 10=6.23%, 20=2.74%, 50=0.37%
  lat (msec)   : 100=0.08%, 250=0.06%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=8.43%, sys=9.43%, ctx=10067248, majf=0, minf=0
  IO depths    : 1=1.1%, 2=5.5%, 4=13.5%, 8=28.0%, 16=63.6%, 32=5.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.6%, 8=0.3%, 16=0.2%, 32=2.9%, 64=0.0%, >=64=0.0%
     issued rwts: total=3408848,3411977,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
   READ: bw=7102MiB/s (7447MB/s), 7102MiB/s-7102MiB/s (7447MB/s-7447MB/s), io=416GiB (447GB), run=60001-60001msec
  WRITE: bw=7108MiB/s (7454MB/s), 7108MiB/s-7108MiB/s (7454MB/s-7454MB/s), io=417GiB (447GB), run=60001-60001msec
My 
 fio script:
Code:
fio --name=randrw \
--bs=128k \
--direct=1 \
--directory=/mnt/tank/systems \
--ioengine=posixaio \
--iodepth=32 \
--group_reporting \
--numjobs=12 \
--ramp_time=10 \
--runtime=60 \
--rw=randrw \
--size=256MGB \
--time_based