24 NVMe SSDs - Slow Performance

VincentV

Cadet
Joined
Mar 14, 2016
Messages
6
Just got a SuperMicro AS-2125HS-C TNR system here at work with dual AMD Epyc 9224 (24c/48t) processors and 24 Samsung PM9A3 7.68TB NVMe SSDs.

I'm seeing the oddest thing when I try to measure the performance of the disk array.

If I do a disk IO test with one drive, you can see in the first blip around 14:15 that I'm getting roughly 3.7GB/s. The line around 14:45 is the same test run against a 24-disk striped array, and the per-drive throughput drops to about 1/24 of what one drive does alone. When I tried a striped array of two disks, the speed from each drive dropped to half; a 4-disk striped array dropped the per-drive speeds to 1/4, and so on.

TrueNAS_Core_SSDs.png
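For reference, the sort of commands I mean (device and pool names here are just placeholders; on TrueNAS Core the raw NVMe disks show up as nvd0/nvd1/... or nda0/nda1/... depending on the driver):

# sequential read from a single raw drive
fio --name=onedisk --filename=/dev/nvd0 --rw=read --bs=1M --direct=1 --ioengine=posixaio --iodepth=32 --runtime=60 --time_based

# same test against a dataset on the striped pool
fio --name=pool --directory=/mnt/newprod --rw=read --bs=1M --size=8G --direct=1 --ioengine=posixaio --iodepth=32 --runtime=60 --time_based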


This is with a stock, out-of-the-box TrueNAS Core install. I'm not sure whether I need to make any particular modifications. Is there a good primer for SSD arrays with TrueNAS? Anyone have any idea what I'm looking at? (I sure don't!)

Any help, even WAGs, would be appreciated!

Thanks!
 

VincentV

Cadet
Joined
Mar 14, 2016
Messages
6
To provide a set of hard numbers, here's fio running against the 24-disk stripe:

fio --bs=128k --direct=1 --directory=/mnt/newprod/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based

Run status group 0 (all jobs):
   READ: bw=8146MiB/s (8542MB/s), 8146MiB/s-8146MiB/s (8542MB/s-8542MB/s), io=477GiB (513GB), run=60012-60012msec
  WRITE: bw=8154MiB/s (8550MB/s), 8154MiB/s-8154MiB/s (8550MB/s-8550MB/s), io=478GiB (513GB), run=60012-60012msec
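If it helps to see where the ceiling sits, per-disk throughput during a run can be watched with the stock FreeBSD tools (pool name as above, nothing TrueNAS-specific):

# live per-physical-disk throughput, refreshed every second
gstat -p -I 1s

# per-disk view from ZFS itself
zpool iostat -v newprod 1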
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That's 8 GB/s, or 64 Gb/s... That is suspiciously close to a PCIe 4.0 x4 link. And you say it's approximately constant as you reduce the number of disks involved in the test?
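Back-of-the-envelope: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, so an x4 link tops out around

16 GT/s x 4 lanes x 128/130 / 8 bits per byte ≈ 7.9 GB/s

before protocol overhead, which is about where that figure lands.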
 

VincentV

Cadet
Joined
Mar 14, 2016
Messages
6
It is, yes.

The drives themselves are PCIe 4.0, but the processors each have 128 PCIe 5.0 lanes. Unless SuperMicro really bungled the board design (or could there be some weird BIOS setting?), each drive should have its full bandwidth available, no?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Depending on the exact cabling, you'll have something like 128-192 lanes available in total, since some lanes are used to link the two CPUs. But that's just a side note, and yes, all disks should be connected at x4 directly to one of the CPUs...

What does CPU load look like during the test? Perhaps something is getting thrown onto a single thread.
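It would also be worth confirming what the links actually trained at. Something like this (nvme0 here is just an example selector; repeat for each controller) shows the negotiated width and speed in the PCI-Express capability line:

pciconf -lc nvme0
# look for something like: link x4(x4) speed 16.0(16.0)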
 

VincentV

Cadet
Joined
Mar 14, 2016
Messages
6
I reran the test above and all the threads are in use. I admit that's the highest I've seen the CPU spike. Earlier, basic tests with dd (dd if=/dev/zero of=temp.dat bs=2048k count=50k)
didn't move the needle much on CPU (single-threaded, one thread at roughly 4%), but produced similar disk speeds (~7GB/s).

disk_test_cpu.png
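To take the single-thread question out of the dd picture, I could run several writers in parallel, something like the sketch below (note that /dev/zero input mostly compresses away if lz4 is enabled on the dataset, which is the default, so the fio numbers are probably the more trustworthy ones):

# 8 parallel dd streams into the pool instead of one
for i in $(seq 1 8); do
  dd if=/dev/zero of=/mnt/newprod/temp$i.dat bs=2048k count=10k &
done
wait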
 
Joined
Jan 5, 2017
Messages
6
It is, yes.

The drives themselves are PCIe 4.0, but the processors each have 128 PCIe 5.0 lanes. Unless SuperMicro really bungled the board design (or could there be some weird BIOS setting?), each drive should have its full bandwidth available, no?
The common misconception about Epyc and PCIe lanes comes from the fact that, on a dual-socket motherboard, half of each CPU's lanes go to the Infinity Fabric for cross-CPU communication. The system will only have 128 lanes total regardless.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Not strictly. Gen 4 systems in particular often use fewer than the maximum number of interconnect links, leaving a few extra PCIe lanes available - up to 192 in total. Gen 2/3 systems were often configured like this as well.
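For reference, the arithmetic behind that range: each xGMI link takes 16 SerDes lanes on each socket, so a 2P system starts from 2 x 128 = 256 lanes and gives up 32 per link:

4 links: 256 - 128 = 128 PCIe lanes
3 links: 256 - 96 = 160 PCIe lanes
2 links: 256 - 64 = 192 PCIe lanes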
 

firesyde424

Contributor
Joined
Mar 5, 2019
Messages
155
The common misconception about Epyc and PCIe lanes comes from the fact that, on a dual-socket motherboard, half of each CPU's lanes go to the Infinity Fabric for cross-CPU communication. The system will only have 128 lanes total regardless.

This isn't our experience. We run a dual-socket Dell R7525 with 2 x 7H12 64-core CPUs and 24 x 30.72TB Micron 9400 Pro NVMe SSDs. If this were the case, our 24 drives would consume 96 of the 128 available PCIe lanes, leaving very little for anything else. You can see in this screenshot from the server's iDRAC that the drives are assigned to different CPUs.
1696356606236.png


While the screenshot doesn't show it, the drives are evenly divided, 12 per CPU. There is no evidence of PCIe switches, and given that this server is also home to a pair of dual-port 100GbE NICs, a quad-port 25GbE NIC, a BOSS-S1 boot drive, and 1TB of RAM, I seriously doubt it only has 128 lanes available when the drives alone would take up 96.
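For anyone who wants to check the mapping on their own box: on a Linux-based system (TrueNAS SCALE or similar), the NUMA node each NVMe controller hangs off of is exposed in sysfs, e.g.:

# print each NVMe controller and the NUMA node (socket) it is attached to
for d in /sys/class/nvme/nvme*; do
  echo "$d -> $(cat $d/device/numa_node)"
done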
 
Joined
Jan 5, 2017
Messages
6
This isn't our experience. We run a dual-socket Dell R7525 with 2 x 7H12 64-core CPUs and 24 x 30.72TB Micron 9400 Pro NVMe SSDs. If this were the case, our 24 drives would consume 96 of the 128 available PCIe lanes, leaving very little for anything else. You can see in this screenshot from the server's iDRAC that the drives are assigned to different CPUs.
View attachment 70841

While the screenshot doesn't show it, the drives are evenly divided, 12 per CPU. There is no evidence of PCIe switches, and given that this server is also home to a pair of dual-port 100GbE NICs, a quad-port 25GbE NIC, a BOSS-S1 boot drive, and 1TB of RAM, I seriously doubt it only has 128 lanes available when the drives alone would take up 96.
Page 5, under GMI "AMD Interchip global memory interconnect (xGMI) up to 64 lanes"
So that is 192 available lanes. I didn't know that on newer processors OEMs pared down the interconnect lanes, but it was ALL of them back in the 7001 days.
 