Dell R730xd slow IOPS/transfer rate with any drive

eclipse5302 · Apr 10, 2023

Then maybe this should be re-worded

Ericloewe · Apr 10, 2023

Well, what is the pool topology you're using? A fairly wide RAIDZ?

fredbourdelier · Apr 10, 2023

Ericloewe said:
Not recommended, a default that doesn't suck too much for most cases. This is something that should absolutely be tuned according to the workload.

Can you please elaborate on the relationship of the record size to the fio test results vs. real world workloads?

I can understand if the pool is being used for media serving or databases or some other generally large file formats, but what about standard everyday data mix? Would 128k really outshine 16k in those conditions?

What about going the other way and setting it to 8k so it fits within the controller chip buffer per block? would that allow the driver to desynchronize load/offload to bus vs to drives? I don't have the LSI architecture manual for the 3008 (I'll have to go hunt for it) so I'm guessing how I would build the bus I/O buffers if I were designing it.

fredbourdelier · Apr 10, 2023

eclipse5302 said:
What is your record size set for? With 128k being the recommended size, performance seemed to be capped as you found. I ended up going to 16k record size to get decent performance.

With 16k record size:
fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k

4ktest: Laying out IO file (1 file / 4096MiB)
Jobs: 8 (f=8): [r(8)][100.0%][r=2822MiB/s][r=722k IOPS][eta 00m:00s]
4ktest: (groupid=0, jobs=8): err= 0: pid=10750: Mon Apr 10 07:23:27 2023
read: IOPS=719k, BW=2810MiB/s (2947MB/s)(32.0GiB/11660msec)
clat (usec): min=2, max=2670, avg=10.65, stdev=21.41
lat (usec): min=2, max=2670, avg=10.68, stdev=21.41
clat percentiles (usec):
| 1.00th=[ 6], 5.00th=[ 6], 10.00th=[ 6], 20.00th=[ 7],
| 30.00th=[ 7], 40.00th=[ 7], 50.00th=[ 8], 60.00th=[ 8],
| 70.00th=[ 9], 80.00th=[ 10], 90.00th=[ 12], 95.00th=[ 18],
| 99.00th=[ 118], 99.50th=[ 157], 99.90th=[ 306], 99.95th=[ 375],
| 99.99th=[ 494]
bw ( MiB/s): min= 2749, max= 2902, per=100.00%, avg=2814.20, stdev= 5.04, samples=176
iops : min=703811, max=743039, avg=720430.77, stdev=1291.25, samples=176
lat (usec) : 4=0.13%, 10=84.69%, 20=11.20%, 50=2.38%, 100=0.59%
lat (usec) : 250=0.83%, 500=0.17%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=5.81%, sys=94.14%, ctx=4195, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: bw=2810MiB/s (2947MB/s), 2810MiB/s-2810MiB/s (2947MB/s-2947MB/s), io=32.0GiB (34.4GB), run=11660-11660msec

After that, NFS performance was still terrible (compared to identical settings on the R720xd). iSCSI performance is excellent. So now we're running iSCSI.

Thanks! Glad to hear that iSCSI is still fast, that will definitely be the go-to for VM workloads. Now I just need to create a VMware 7 initiator for TrueNAS....

Ericloewe · Apr 10, 2023

Well, we could go down a long discussion, but I'm out of time, so here's the tl;dr:

Most of the time, you want the largest block size that will not cause excessive write amplification. Smaller blocks means more blocks, which means more IOPS for the same bandwidth, more metadata to store/read; in addition worse compression, incompatibilities with wide RAIDZ vdevs, etc.

fredbourdelier said:
What about going the other way and setting it to 8k so it fits within the controller chip buffer per block? would that allow the driver to desynchronize load/offload to bus vs to drives? I don't have the LSI architecture manual for the 3008 (I'll have to go hunt for it) so I'm guessing how I would build the bus I/O buffers if I were designing it.

Any marginal gains would be quickly offset by greater protocol overheads and fewer available IOPS - again, more blocks, more trouble.

fredbourdelier · Apr 18, 2023

Ericloewe said:
Well, we could go down a long discussion, but I'm out of time, so here's the tl;dr:

Most of the time, you want the largest block size that will not cause excessive write amplification. Smaller blocks means more blocks, which means more IOPS for the same bandwidth, more metadata to store/read; in addition worse compression, incompatibilities with wide RAIDZ vdevs, etc.

Any marginal gains would be quickly offset by greater protocol overheads and fewer available IOPS - again, more blocks, more trouble.

Thanks for the insight.

dashoe · Jan 11, 2024

eclipse5302 said:
Hello all,

I've looked around and tried everything I can think of, but I can't figure this out. I have an R720 with a 4-vdev mirror flash pool (8 SATA drives 400GB), and running fio on the hoist itself shows expected results of 500k IOPS and 1300MB/s. This host only has 64GB of memory, 2 Xeon 2690s, and I believe a Dell H310 flashed to IT mode. Not 100% on that last part, but it definitely has a regular HBA and not a RAID card pretending to be a HBA.

fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k 4ktest: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=128

I'm upgrading this to a Dell R730xd with 384GB of memory, 2 Xeon 2667s, a Dell HBA330, and (24) 400GB SAS flash drives. Running the same fio command, I can't get this system to exceed 150k IOPS no matter what. I started by creating a 12-vdev flash pool, and when that performance was poor, I re-created the same 4-vdev flash pool as the old server. Even a single vdev flash pool can only hit 150k IOPS. All BIOS/FW is up to date, and I tried different versions of TrueNAS without success. I tried other drives, and they too are limited by this 150k IOPS limit. I even took 2 of these drives out and put them in the R720, created a single mirror pool, and that could easily hit 220k IOPS.

Any idea what is going on here?

I've been reading through your thread as I'm having the same issue. I just upgraded to the HBA330 hoping for amazing performance but not getting it.

Dell R730xd with 256GB of memory, 2 Xeon 2670s, a Dell HBA330, and (24) 600GB SAS 10k drives

Like you, I've tried various drives and array configs but just can't get good performance.

Did you ever find a solution?

Davvo · Jan 11, 2024

dashoe said:
I've been reading through your thread as I'm having the same issue. I just upgraded to the HBA330 hoping for amazing performance but not getting it.

What is your desired performance and use-case? I also suggest opening your own thread.

dashoe · Jan 11, 2024

Trying to optimize performance for VM storage. I'll open a new thread after I do more research.

Thank you

Davvo · Jan 11, 2024

dashoe said:
Trying to optimize performance for VM storage. I'll open a new thread after I do more research.

Thank you

ZFS Storage Pool Layout

This resource was originally created by user: @Davvo on the TrueNAS Community Forums Archive. https://www.truenas.com/community/resources/zfs-storage-pool-layout.201/download [1] This amazing document, created by iXsystems in February 2022 as a “White Paper”, cleanly explains how to qualify...

www.truenas.com

Resource - The path to success for block storage

ZFS does two different things very well. One is storage of large sequentially-written files, such as archives, logs, or data files, where the file does not have the middle bits modified after creation. This is optimal for RAIDZ. It's what most...

www.truenas.com

Some differences between RAIDZ and mirrors, and why we use mirrors for block storage

ZFS is a complicated, powerful system. Unfortunately, it isn't actually magic, and there's a lot of opportunity for disappointment if you don't understand what's going on. RAIDZ (including Z2, Z3) is good for storing large sequential files...

www.truenas.com

dashoe · Jan 11, 2024

Thanks for those references. I'm planning to put up an mirror with multiple vdevs once I can confirm the new HBA330 is performing correctly.
I hitched on to this thread because I was trying to compare my results with the fio test being run here. When I run

Code:

fio --filename=test --direct=1 --rw=randrw  --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k

I get 8 warnings, one for each job

Code:

Note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1

Is that because I'm on Scale instead of Core? Anyway the test completes with the following results:

Code:

Jobs: 4 (f=4): [_(1),r(1),_(1),r(3),_(2)][95.5%][r=1581MiB/s][r=405k IOPS][eta 00m:01s]
4ktest: (groupid=0, jobs=8): err= 0: pid=86019: Thu Jan 11 18:04:13 2024
  read: IOPS=412k, BW=1608MiB/s (1686MB/s)(32.0GiB/20384msec)
    clat (usec): min=2, max=773, avg=18.30, stdev=18.25
     lat (usec): min=2, max=773, avg=18.33, stdev=18.25
    clat percentiles (nsec):
     |  1.00th=[ 5152],  5.00th=[ 5792], 10.00th=[ 6176], 20.00th=[ 6688],
     | 30.00th=[ 7072], 40.00th=[ 7520], 50.00th=[ 8032], 60.00th=[ 8896],
     | 70.00th=[10816], 80.00th=[42752], 90.00th=[48896], 95.00th=[53504],
     | 99.00th=[62208], 99.50th=[65280], 99.90th=[72192], 99.95th=[75264],
     | 99.99th=[86528]
   bw (  MiB/s): min= 1264, max= 2761, per=100.00%, avg=1637.72, stdev=27.87, samples=312
   iops        : min=323780, max=706900, avg=419256.14, stdev=7134.94, samples=312
  lat (usec)   : 4=0.10%, 10=67.34%, 20=6.48%, 50=17.22%, 100=8.85%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=5.98%, sys=93.99%, ctx=1102, majf=0, minf=317
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=1608MiB/s (1686MB/s), 1608MiB/s-1608MiB/s (1686MB/s-1686MB/s), io=32.0GiB (34.4GB), run=20384-20384msec

If I run the test with the lioaio engine I don't the warnings

Code:

fio --filename=test --direct=1 --rw=randrw  --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=libaio --size=4G --bs=4k
4ktest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.33
Starting 8 processes
Jobs: 5 (f=5): [r(1),_(1),r(1),_(1),r(1),_(1),r(2)][100.0%][r=1549MiB/s][r=396k IOPS][eta 00m:00s]
4ktest: (groupid=0, jobs=8): err= 0: pid=86462: Thu Jan 11 18:06:15 2024
  read: IOPS=381k, BW=1487MiB/s (1559MB/s)(32.0GiB/22041msec)
    slat (usec): min=3, max=872, avg=18.85, stdev=18.73
    clat (usec): min=2, max=5196, avg=2581.88, stdev=438.97
     lat (usec): min=7, max=5235, avg=2600.74, stdev=441.96
    clat percentiles (usec):
     |  1.00th=[ 1188],  5.00th=[ 1647], 10.00th=[ 2057], 20.00th=[ 2311],
     | 30.00th=[ 2442], 40.00th=[ 2540], 50.00th=[ 2638], 60.00th=[ 2704],
     | 70.00th=[ 2802], 80.00th=[ 2933], 90.00th=[ 3064], 95.00th=[ 3195],
     | 99.00th=[ 3425], 99.50th=[ 3490], 99.90th=[ 3654], 99.95th=[ 3720],
     | 99.99th=[ 3851]
   bw (  MiB/s): min= 1428, max= 2541, per=100.00%, avg=1524.20, stdev=25.36, samples=337
   iops        : min=365744, max=650692, avg=390194.50, stdev=6491.19, samples=337
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.24%
  lat (msec)   : 2=8.91%, 4=90.84%, 10=0.01%
  cpu          : usr=8.60%, sys=91.37%, ctx=1173, majf=0, minf=340
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=1487MiB/s (1559MB/s), 1487MiB/s-1487MiB/s (1559MB/s-1559MB/s), io=32.0GiB (34.4GB), run=22041-22041msec

Are these results in line with whats expected? I appreciate your time.

Dell R730xd with 256GB of memory, 2 Xeon 2670s, a Dell HBA330, and (8) 600GB SAS 10k drives in a stripe for testing

Davvo · Jan 12, 2024

dashoe said:
Is that because I'm on Scale instead of Core?

I suppose so, but psync caps your IO depth to 1 in both.

fredbourdelier · Jan 15, 2024

dashoe said:
I've been reading through your thread as I'm having the same issue. I just upgraded to the HBA330 hoping for amazing performance but not getting it.

Dell R730xd with 256GB of memory, 2 Xeon 2670s, a Dell HBA330, and (24) 600GB SAS 10k drives

Like you, I've tried various drives and array configs but just can't get good performance.

Did you ever find a solution?

Your question reminded me that I hadn't performed any benchmarks in quite a while, so I ran some again just now.

First, before any benchmarks, I set cache off: zfs set primarycache=metadata [MY_POOL_NAME]

Reminder of R730XD config: 2 Xeon 2690v4 14 core@2.9, 768GB DDR4@2400, 2 Seagate Nytro 960GB SAS boot drives in rear bays, 8 Seagate EXOS x18 / 18TB SAS 12G drives in front bays, 1 pool RAID Z2/no hot spares

Then, I tried three separate benchmarks:
(1) 32 parallel 4k files: fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=0--iodepth=16 --numjobs=32 --runtime=60 --group_reporting --name=4ktest --size=4G --bs=4k

(2) The same 32 parallel I/O but with full cache on:
zfd set primarycache=all MAIN_POOL
fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=0--iodepth=16 --numjobs=32 --runtime=60 --group_reporting --name=4ktest --size=4G --bs=4k

(3) 8 parallel with limited R/W operations: fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k

Results:
32 proc/no cache: iops : min= 9760, max=46968, avg=13313.97, stdev=128.06, samples=3808
32 proc/full cache: iops : min= 9935, max=66387, avg=13552.69, stdev=180.30, samples=3808
8 proc/no cache: iops : min=658482, max=842085, avg=726907.90, stdev=5823.50, samples=174

Also, as a comparison, I ran the same benchmarks on a DELL R750 Gen2 (8 NVME WD220 SSD on "Front PERC" HBA330, 2 XEON Silver 4309Y 8 core@2.8, 128GB 2400MT/S RAM)
32 proc/no cache: iops : min= 7630, max=35410, avg=10900.03, stdev=101.80, samples=3808
32 proc/full cache: iops : min= 7789, max=62926, avg=11113.09, stdev=174.98, samples=3808
8 proc/no cache: iops : min=328439, max=630595, avg=496384.28, stdev=5869.84, samples=256

The R730XD with spinning SAS outperformed the latest generation DELL NVME by up to 26% on all workloads. Its performance also matches TrueNAS "community standards" for IOPS on other systems.

To note, I did swap the motherboard on the R730XD, the previous one had a problem with the iDRAC, and those are soldered on the board in that generation. The MB was "new in bag" GEN3. All other components were kept. I found the mobo on EBAY for $120, so it's not an unreasonable upgrade. It is really difficult to swap though.

Now I'm concerned about why the R750 has such poor performance with NVME drives...

buffer_overrun · Mar 13, 2024

I have an R730xd:
Dual 2699V4s
1024GB DDR4 (16*64gb)
20 samsung 1.92tb Pm883 sata SSDs in two 10 drive raid Z2 (128k block size)
4 samsung PM1733 1.92tb U.2 in quad mirror for metadata ( special small blocks 64k)
2 Intel DC P3700 striped for testing (64k block size)
2 Intel XXV710 25GBE ( one for each CPU)

P3700 test BS=4k

root@truenas[/home/admin]# fio --directory=/mnt/P3700s/speed --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k
4ktest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=128
...
fio-3.33
Starting 8 processes
4ktest: Laying out IO file (1 file / 4096MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 3 (f=3): [_(2),r(1),_(1),r(1),_(2),r(1)][95.8%][r=1593MiB/s][r=408k IOPS][eta 00m:01s]
4ktest: (groupid=0, jobs=8): err= 0: pid=10672: Wed Mar 13 10:31:54 2024
read: IOPS=373k, BW=1455MiB/s (1526MB/s)(32.0GiB/22514msec)
clat (usec): min=2, max=1623, avg=20.24, stdev=11.07
lat (usec): min=3, max=1623, avg=20.30, stdev=11.09
clat percentiles (nsec):
| 1.00th=[ 6752], 5.00th=[ 7904], 10.00th=[ 8640], 20.00th=[10176],
| 30.00th=[16768], 40.00th=[19584], 50.00th=[21120], 60.00th=[22912],
| 70.00th=[24704], 80.00th=[27264], 90.00th=[30080], 95.00th=[32384],
| 99.00th=[36608], 99.50th=[38656], 99.90th=[42752], 99.95th=[44800],
| 99.99th=[62208]
bw ( MiB/s): min= 1294, max= 2032, per=100.00%, avg=1486.29, stdev=15.09, samples=347
iops : min=331390, max=520222, avg=380491.31, stdev=3862.71, samples=347
lat (usec) : 4=0.01%, 10=19.18%, 20=23.64%, 50=57.15%, 100=0.01%
lat (usec) : 250=0.01%, 500=0.01%
lat (msec) : 2=0.01%
cpu : usr=6.66%, sys=93.33%, ctx=483, majf=12, minf=474
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: bw=1455MiB/s (1526MB/s), 1455MiB/s-1455MiB/s (1526MB/s-1526MB/s), io=32.0GiB (34.4GB), run=22514-22514msec

Raid Z2 test BS=4k

root@truenas[/home/admin]# fio --directory=/mnt/lambo/NAS --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k

4ktest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=128
...
fio-3.33
Starting 8 processes
4ktest: Laying out IO file (1 file / 4096MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 2 (f=2): [_(3),r(1),E(1),r(1),_(2)][96.4%][r=460MiB/s][r=118k IOPS][eta 00m:02s]
4ktest: (groupid=0, jobs=8): err= 0: pid=11451: Wed Mar 13 10:42:40 2024
read: IOPS=157k, BW=612MiB/s (642MB/s)(32.0GiB/53521msec)
clat (usec): min=3, max=1909, avg=47.15, stdev=29.16
lat (usec): min=3, max=1909, avg=47.32, stdev=29.17
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 52], 40.00th=[ 55], 50.00th=[ 56], 60.00th=[ 58],
| 70.00th=[ 62], 80.00th=[ 64], 90.00th=[ 67], 95.00th=[ 70],
| 99.00th=[ 78], 99.50th=[ 92], 99.90th=[ 302], 99.95th=[ 326],
| 99.99th=[ 627]
bw ( KiB/s): min=171440, max=1320280, per=100.00%, avg=653719.04, stdev=12850.57, samples=812
iops : min=42860, max=330070, avg=163429.73, stdev=3212.63, samples=812
lat (usec) : 4=0.01%, 10=20.83%, 20=5.50%, 50=1.04%, 100=72.21%
lat (usec) : 250=0.26%, 500=0.15%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%
cpu : usr=4.99%, sys=95.00%, ctx=1606, majf=0, minf=419
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: bw=612MiB/s (642MB/s), 612MiB/s-612MiB/s (642MB/s-642MB/s), io=32.0GiB (34.4GB), run=53521-53521msec

Raid Z2 test BS=128k (to match array size)

READ: bw=612MiB/s (642MB/s), 612MiB/s-612MiB/s (642MB/s-642MB/s), io=32.0GiB (34.4GB), run=53521-53521msec
root@truenas[/home/admin]# fio --directory=/mnt/lambo/NAS --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=128k

4ktest: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=128
...
fio-3.33
Starting 8 processes
4ktest: Laying out IO file (1 file / 4096MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [r(8)][-.-%][r=15.4GiB/s][r=126k IOPS][eta 00m:00s]
4ktest: (groupid=0, jobs=8): err= 0: pid=11812: Wed Mar 13 10:45:01 2024
read: IOPS=85.2k, BW=10.4GiB/s (11.2GB/s)(32.0GiB/3076msec)
clat (usec): min=18, max=1993, avg=86.68, stdev=73.14
lat (usec): min=18, max=1993, avg=86.82, stdev=73.16
clat percentiles (usec):
| 1.00th=[ 30], 5.00th=[ 32], 10.00th=[ 34], 20.00th=[ 44],
| 30.00th=[ 63], 40.00th=[ 65], 50.00th=[ 68], 60.00th=[ 72],
| 70.00th=[ 76], 80.00th=[ 81], 90.00th=[ 210], 95.00th=[ 253],
| 99.00th=[ 330], 99.50th=[ 338], 99.90th=[ 644], 99.95th=[ 717],
| 99.99th=[ 1352]
bw ( MiB/s): min= 4405, max=15856, per=100.00%, avg=10705.10, stdev=578.88, samples=40
iops : min=35246, max=126854, avg=85640.80, stdev=4631.01, samples=40
lat (usec) : 20=0.01%, 50=20.90%, 100=66.13%, 250=7.82%, 500=4.98%
lat (usec) : 750=0.15%, 1000=0.01%
lat (msec) : 2=0.02%
cpu : usr=2.78%, sys=97.26%, ctx=82, majf=0, minf=154
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: bw=10.4GiB/s (11.2GB/s), 10.4GiB/s-10.4GiB/s (11.2GB/s-11.2GB/s), io=32.0GiB (34.4GB), run=3076-3076msec

I don't see the dataset block sizes of the bad performing arrays, but if you have NVME with 1M block size datasets it probably won't perform well. going from 1M block size on the P3700s to 16k only dropped SMB file transfers for me from ~3600 to ~3200 in crystaldisk mark reads from truenas-> client.

homer27081990 · Mar 16, 2024

I know I most certainly am wrong, but, I felt it could be useful, because sometimes these things happen and are not apparent (Also, forgive me if I lost something from the middle of the thread, I TL;DR ed it).

Could it be a DELL firmware, power-management or controller temp thing?

I had a similar thing happen with my HP DL380p gen 8 server and the power management for the HP smart P420i 1GB (passthrough, HBA mode, enabled) controller. I needed to specifically disable firmware power management.

Also, can it be the case that some firmware or BIOS options for the PCIe lane management and topology are not as they should be?

Just a thought.

fredbourdelier · Mar 18, 2024

homer27081990 said:
I know I most certainly am wrong, but, I felt it could be useful, because sometimes these things happen and are not apparent (Also, forgive me if I lost something from the middle of the thread, I TL;DR ed it).

Could it be a DELL firmware, power-management or controller temp thing?

I had a similar thing happen with my HP DL380p gen 8 server and the power management for the HP smart P420i 1GB (passthrough, HBA mode, enabled) controller. I needed to specifically disable firmware power management.

Also, can it be the case that some firmware or BIOS options for the PCIe lane management and topology are not as they should be?

Just a thought.

I believe so @homer27081990 . The specific hardware I tried originally had a gen1 mainboard (with, as it turns out, a bad iDRAC), and performance with the same disks and controller was significantly lower than with the gen3 board. We never figured out if that was the iDRAC interfering with BIOS settings, the main chipset I/O to memory or cache input buffer design, the interface between the MIMO and the disk backplane, or something related to Xeon dual-CPU chipset to bus I/O. Nonetheless, the newer Dell mobo improved performance noticeably.

homer27081990 · Mar 18, 2024

fredbourdelier said:
I believe so @homer27081990 . The specific hardware I tried originally had a gen1 mainboard (with, as it turns out, a bad iDRAC), and performance with the same disks and controller was significantly lower than with the gen3 board. We never figured out if that was the iDRAC interfering with BIOS settings, the main chipset I/O to memory or cache input buffer design, the interface between the MIMO and the disk backplane, or something related to Xeon dual-CPU chipset to bus I/O. Nonetheless, the newer Dell mobo improved performance noticeably.

I hazard a guess that if you have a bad iDRAC, it is not because someone specifically hit the iDRAC chip, or something. More things must be wrong. I can easily imagine a chipset with a dried-out thermal pad that starts to overheat-burn lanes.
Something else must be wrong. Corroded CPU contacts? Partial de-liding? Fan failure (dropped static pressure)? PSU spikes? Busted filter capacitors on the backplane? I like that last one.

Important Announcement for the TrueNAS Community.

Dell R730xd slow IOPS/transfer rate with any drive

Dabbler

Server Wrangler

Dabbler

Dabbler

Server Wrangler

Dabbler

Dabbler

MVP

Dabbler

MVP

Dabbler

MVP

Dabbler

Attachments

Cadet

Patron

Dabbler

Patron

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Dell R730xd slow IOPS/transfer rate with any drive"

Similar threads