Definitive, detailed troubleshooting of poor R/W throughput

CDRG

I'm at the limit of what I believe I can see here, and while there are a couple of additional things I need to do to rule out/isolate hardware, my intent here is to gain insight into things I either didn't know I can see or at least get additional sets of eyes on.

I'd posted a few things in the past, but I want to get to the detail here as something doesn't quite line up.

With that said, here's the HW I'm using.

-ESXi 8
-TrueNAS Core 13.0-U6 (previous versions showed no difference)
-EPYC 7543. 16 vCPUs to VM
-256 GB RAM
-E810-XXV via SR-IOV at 25Gb
-2x 9305-24i passed through. FW 16.00.12.00. Driver 23.00.00.00-fbsd.
-22x Seagate EXOS X12 12TB SAS as 11x mirror VDEVs split across the two HBAs (Model ST12000NM0037)
-3x mirror NVMe as special/metadata
-1M recordsize for media dataset (relevant dataset for this discussion)
-2x HDD and 1x NVMe as hot spare
Client->Server
-10Gb copper to Unifi USW-Enterprise-8-PoE
-10Gb copper from above to USW Pro Aggregation
-25Gb MMF from above direct to NIC on server
-All on same L2
-No jumbo frames
-No flow control

This is the testing I've done thus far. ARC and compression disabled for this testing.
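For anyone wanting to reproduce the setup, this is roughly what I mean by ARC and compression disabled; a minimal sketch, assuming the test file lives on the pool's root dataset (as in the dd paths below):

Code:
# sketch: disable ARC caching and compression on the test dataset for the duration
zfs set primarycache=none TheEndHouse
zfs set compression=off TheEndHouse

# and restore afterwards
zfs set primarycache=all TheEndHouse
zfs inherit compression TheEndHouse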


And some additional, initial testing of the pool itself:

Write performance (works out to ~2.3 GB/s)
Code:
root@truenas[/mnt/TheEndHouse]# dd if=/dev/zero of=/mnt/TheEndHouse/ddfile bs=2048k count=100000
100000+0 records in
100000+0 records out
209715200000 bytes transferred in 91.586038 secs (2289816280 bytes/sec)


Read performance with the cache disabled (cancelled as it was taking forever; works out to ~84 MB/s)
Code:
root@truenas[/mnt/TheEndHouse]# dd of=/dev/null if=/mnt/TheEndHouse/ddfile bs=2048k count=100000
30813+0 records in
30813+0 records out
64619544576 bytes transferred in 771.877023 secs (83717409 bytes/sec)
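dd is pretty blunt, so for a more controlled repro I'd compare against a sequential fio run along these lines (a sketch only; the sizes and runtimes here are illustrative placeholders, not what I actually ran):

Code:
# sequential 1M write test against the same dataset (illustrative numbers)
fio --name=seqwrite --directory=/mnt/TheEndHouse --rw=write --bs=1M \
    --size=50G --ioengine=posixaio --runtime=120 --time_based

# and the matching sequential read
fio --name=seqread --directory=/mnt/TheEndHouse --rw=read --bs=1M \
    --size=50G --ioengine=posixaio --runtime=120 --time_based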

Now, I'm cognizant of the disks and their history/known issues. One mention is that they slow down as they get full. They're only about 55% full, so I don't expect that to be an issue here. Given the metrics in the link above, my assumption is that, individually, they perform as expected.
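For what it's worth, the fill level and fragmentation are easy to sanity-check from the CLI; standard zpool output, nothing exotic:

Code:
# pool-level capacity and fragmentation, to back up the ~55% full figure
zpool list TheEndHouse

# per-vdev breakdown, in case one mirror is disproportionately full
zpool list -v TheEndHouse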

Additionally, this...

Code:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/      errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2136549412        0         0  2136549412            0      26358.286         0
write:           0        0         0           0            0        517.500         0

Non-medium error count:        0

...is supposedly also known and normal for Seagate. I neither expected that nor entirely believe it, so I'm taking it with a large grain of anecdotal salt.
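If anyone wants to compare these counters across the whole pool, this is roughly how I'd pull them per disk (a sketch; the da* filter assumes the usual FreeBSD device naming):

Code:
# dump the SCSI error counter log for every SAS disk in the system
for d in $(sysctl -n kern.disks); do
    case "$d" in
        da*) echo "=== /dev/$d ==="; smartctl -l error "/dev/$d" ;;
    esac
done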

That said, the underlying issue here is the general, real-world R/W performance. From my Win11 client I get ~100-150 MB/s writes and ~350 MB/s reads; a local VM gets about 450 MB/s in both directions. The previous setup, whose main differences were one fewer HBA, no special vdev, and a 7x 3-disk RAIDZ1 layout of 6TB WD Red Plus drives, performed significantly better, which is to say I was hitting the 10Gb ceiling of the client's link.
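To be clear, I don't think the network path itself is the limit, but ruling it out is cheap. An iperf3 pass between the Win11 client and the server would look roughly like this (assuming iperf3 is available on both ends; the server IP is a placeholder):

Code:
# on the TrueNAS side
iperf3 -s

# on the Win11 client: 4 parallel streams for 30 seconds to fill the 10Gb link
iperf3 -c 192.168.1.10 -P 4 -t 30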

I'd split the disks across two HBAs because, in a previous iteration with these disks on a single HBA, I was getting R/W/V failures all over the place. In some additional testing with a mass zero-write across multiple disks at once, I also got some odd errors; removing some disks and retesting suggested the HBA didn't like doing that many things simultaneously, so I split the load. With all the WD disks in the previous setup, things were fine. This is the one additional thing I still need to test: remove one HBA (splitting the pool) to ensure that one of them isn't bad for some reason and causing this issue.
While the pool will be degraded, my expectation is that if there is an HBA issue I will see a performance difference between the two configurations, even though I'd only be leveraging 50% of the disks. Eleven of these should still give great read performance, and if there is an issue I'd expect to see it on the writes.
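One thing I can watch while that test runs is per-vdev/per-disk throughput and latency, e.g.:

Code:
# per-vdev and per-disk throughput, refreshed every 5 seconds during a test run
zpool iostat -v TheEndHouse 5

# same view with latency columns, to spot a single slow disk dragging a mirror down
zpool iostat -vl TheEndHouse 5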

So, other than that to-do, I'm very open to any suggestions on what other testing could shed some light on the issue. I don't think there are any metrics to glean from the HBAs themselves; it would be great to see their load, but I don't believe anything like that is exposed.
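The closest proxy I know of is per-disk busy%/latency from the OS side, which at least shows whether everything behind one HBA is uniformly slow (a sketch, to be run while a dd/fio test is going):

Code:
# FreeBSD per-disk I/O stats: %busy, ms/read, ms/write per da device
# -p hides partitions, -I sets the refresh interval
gstat -p -I 5s

# confirm which disk sits behind which HBA (scbus numbering maps to the two HBAs)
camcontrol devlist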

TIA
 

CDRG

Better luck in the new year? Anyone?

An interesting, if not odd or even concerning, observation...

I've just changed the host/VM settings to set Numa.PreferHT to 1. I hadn't realized PreferHT was disabled by default, so, needless to say, two NUMA nodes were in play for the 16 vCPUs. Both HBAs sit within the same NUMA node, and my hope (not yet confirmed) is that the vCPUs allocated are now also within that same node. In any case, after rebooting the VM my reads are about 450 MB/s and writes are over 900 MB/s. Even if the hardware had been split across NUMA nodes, I'd never have expected this level of performance difference for this workload.
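For anyone else chasing this: as I understand it (corrections welcome), PreferHT can be set host-wide or per-VM, and the NUMA node of each PCI device can be checked from the ESXi shell. Roughly, and treat the exact option paths as my assumption rather than gospel:

Code:
# on the ESXi host: check / set the host-wide option
esxcli system settings advanced list -o /Numa/PreferHT
esxcli system settings advanced set -o /Numa/PreferHT -i 1

# which NUMA node each PCI device (HBAs, NIC VF) lives on
esxcli hardware pci list | grep -E "Address|NUMA Node"

# or per-VM, as a line in the VM's advanced configuration (.vmx)
numa.vcpu.preferHT = "TRUE"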
 