Dell R730xd slow IOPS/transfer rate with any drive

eclipse5302

Dabbler
Joined
Nov 14, 2022
Messages
11
Hello all,

I've looked around and tried everything I can think of, but I can't figure this out. I have an R720 with a 4-vdev mirror flash pool (8 × 400GB SATA SSDs), and running fio on the host itself shows the expected results of 500k IOPS and 1300MB/s. This host has only 64GB of memory, 2 Xeon E5-2690s, and, I believe, a Dell H310 flashed to IT mode. I'm not 100% sure on that last part, but it definitely has a regular HBA and not a RAID card pretending to be an HBA.

fio --filename=test --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=8 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=4k
4ktest: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=128

I'm upgrading this to a Dell R730xd with 384GB of memory, 2 Xeon E5-2667s, a Dell HBA330, and 24 × 400GB SAS SSDs. Running the same fio command, I can't get this system past 150k IOPS no matter what. I started by creating a 12-vdev flash pool, and when its performance was poor, I re-created the same 4-vdev pool layout as the old server. Even a single-vdev flash pool only hits 150k IOPS. All BIOS and firmware are up to date, and I've tried different versions of TrueNAS without success. I tried other drives, and they hit the same 150k IOPS ceiling. I even moved 2 of these drives to the R720, created a single mirror pool, and it easily hit 220k IOPS.

Any idea what is going on here?
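Since every run in this thread uses a fixed 4 KiB block (--bs=4k), an IOPS figure and a throughput figure are tied together by simple multiplication. The sketch below converts the IOPS numbers quoted above into decimal MB/s. It assumes every I/O is exactly 4096 bytes and uses 1 MB = 10^6 bytes (fio prints both MiB/s and MB/s, so check which one a given number came from). By this arithmetic, 500k IOPS is roughly 2,048 MB/s rather than 1,300 MB/s, so those two R720 numbers may come from different runs.

```python
# IOPS -> throughput at the 4 KiB block size used in the fio runs (--bs=4k).
BLOCK_SIZE = 4096  # bytes


def iops_to_mb_per_s(iops: int, block_size: int = BLOCK_SIZE) -> float:
    """Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes)."""
    return iops * block_size / 1_000_000


# IOPS figures as reported in this thread.
RESULTS = [
    ("R720, 4-vdev SATA mirror pool", 500_000),
    ("R730xd, any pool layout", 150_000),
    ("R720, single mirror of two R730xd SAS drives", 220_000),
]

for label, iops in RESULTS:
    print(f"{label}: {iops:,} IOPS ~ {iops_to_mb_per_s(iops):.1f} MB/s")
```

At 4 KiB, the R730xd's 150k IOPS ceiling works out to only about 614 MB/s.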
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

eclipse5302

From what I understand, this is "the" controller to use for this application, and I'm not the first to use an R730xd. I'm certain nobody would use an R730xd if this were as fast as it went. If others were seeing performance this poor, I would expect to have found more about it. The only reports I can find involve RAID controllers in HBA mode, which is obviously a no-no.

I can run the identical command on an older R720 with an older-generation controller of the same type, using SATA SSDs, and get much better performance. I must be missing something; there appears to be some strange throughput/IOPS limit on this system.

The only thing I haven't tried is moving the HBA to another system, but oddly enough, the original H730P controller in this system showed the same speed "limits" (when tested in HBA mode), so I don't believe the new HBA is the culprit. I also don't see performance increase at all as I add vdevs, which it does on the R720.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
eclipse5302 said:
From what I understand, this is "the" controller to use for this application, and I'm not the first to be using an R730xd. I am certain that nobody would use an R730xd if this was as fast as it went.

Well, you can be certain of what you like, but some people use machines like the R730xd because they're easy to find, especially on the used market.

I've been skeptical of the HBA330 for a while. People keep telling me that it is just fine, except that it uses the mrsas driver. As @Davvo noted, lots of the Dell cards have low I/O depth. I'm not quite sure what the thinking is there, but I believe the HBA330 is not one of them; the H740p is something like 64, IIRC. It would be interesting to drop in one of the LSI HBAs and see what happens.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
jgreco said:
I've been skeptical of the HBA330 for a while. People keep telling me that it is just fine, except that it uses the mrsas driver.

That's the H730p and other Dell RAID cards when shifted to their "HBA Mode" in firmware. The HBA330 is (or should be) using the mpr driver like any other 12G LSI HBA. (Perfectly reasonable to get confused; there are more than enough PERCs out there to muddy the waters.)

However, it would behoove the OP here to check and confirm that they do indeed have the HBA330 and not the H330 as the latter is definitely going to hamstring things.
 

eclipse5302

The card I bought was listed as Dell P/N P2R3R, HBA330, which is equivalent to an LSI 9300-8i. PCIe 3.0, 8 lanes, 12Gbps.
 

jgreco

eclipse5302 said:
The card I bought was listed as Dell P/N P2R3R, HBA330, which is equivalent to an LSI 9300-8i. PCIe 3.0, 8 lanes, 12Gbps.

Then I suggest you crossflash LSI IT firmware onto it. It's well known that LSI/Avago was trying to create a unified driver for HBAs and MegaRAID cards. I keep having people tell me that cards driven by the mrsas (unified) driver are the equal of cards driven by the mpr driver, but this is not the first time I've heard complaints. If it is truly equivalent to an LSI 9300-8i, then crossflashing to IT firmware should be possible.
 

eclipse5302

I believe it's already flashed to IT mode. With TrueNAS Scale currently loaded, the card comes up as "mpt3sas_cm0: LSISAS3008: FWVersion(16.00.11.00)", and the next line mentions "Protocol=(Initiator,Target)".
 

eclipse5302

I reloaded the system to TrueNAS 13.0-U3. lspci for the HBA follows. Anything jumping out here?

02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
    Subsystem: Dell HBA330 Mini
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 26
    Region 0: I/O ports at 2000
    Region 1: Memory at 96200000 (64-bit, non-prefetchable)
    Region 3: Memory at 96100000 (64-bit, non-prefetchable)
    Expansion ROM at fff00000 [disabled]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [a8] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Address: 00000000fee1a000  Data: 0030
        Masking: 00000000  Pending: 00000000
    Capabilities: [c0] MSI-X: Enable- Count=96 Masked-
        Vector table: BAR=1 offset=0000e000
        PBA: BAR=1 offset=0000f000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 05000001 00082001 3f5800b0 adcd4beb
    Capabilities: [1e0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [1c0 v1] Power Budgeting <?>
    Capabilities: [190 v1] Dynamic Power Allocation <?>
    Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
        IOVSta: Migration-
        Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 1, stride: 1, Device ID: 0097
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 0000000000000000 (64-bit, non-prefetchable)
        Region 2: Memory at 0000000000000000 (64-bit, non-prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
 

HoneyBadger

No, it's correctly tagging it as a SAS3008 which is what the HBA330 sports under the hood, it's got x8 connectivity, ASPM is off, and it's a "Mini" which means it should be wired to CPU0 and not suffering excessive NUMA shuffle.
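As a rough check on the "x8 connectivity" point: the lspci output reports LnkSta: Speed 8GT/s, Width x8, and the card is described as 12Gbps SAS. The sketch below computes raw link ceilings from the line encodings (128b/130b for PCIe 3.0, 8b/10b for SAS-3) and compares them with the R730xd's 150k × 4 KiB ceiling. It ignores protocol overhead such as PCIe TLP headers and SAS framing, and it assumes all 8 SAS lanes are in use, so treat the results as upper bounds rather than achievable numbers.

```python
# Back-of-the-envelope link ceilings for the HBA330 (SAS3008), ignoring
# protocol overhead such as PCIe TLP headers and SAS framing.


def pcie3_bytes_per_s(lanes: int) -> float:
    """PCIe 3.0: 8 GT/s per lane with 128b/130b encoding."""
    return 8e9 * (128 / 130) / 8 * lanes


def sas3_bytes_per_s(lanes: int) -> float:
    """SAS-3: 12 Gb/s per lane with 8b/10b encoding."""
    return 12e9 * (8 / 10) / 8 * lanes


observed = 150_000 * 4096  # bytes/s at the R730xd's 150k IOPS ceiling, 4 KiB reads

host_link = pcie3_bytes_per_s(8)   # LnkSta: Speed 8GT/s, Width x8
drive_side = sas3_bytes_per_s(8)   # assumption: all 8 SAS lanes of the HBA are in use

print(f"PCIe 3.0 x8 ceiling: {host_link / 1e6:,.0f} MB/s")
print(f"SAS-3 x8 ceiling:    {drive_side / 1e6:,.0f} MB/s")
print(f"150k x 4 KiB uses about {observed / host_link:.1%} of the PCIe link")
```

Under these assumptions, the observed ceiling uses well under a tenth of the host link. That is consistent with the view that the bus width isn't what's holding the box back.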

The small (4G) size of your fio test file means that it should be coming 100% out of ARC as well, and the disk/HBA subsystem shouldn't really affect it at all, so we're down to a CPU/memory thing.

E5-2667 v3/v4 are both 8-core processors with a relatively similar turbo speed to your R720's E5-2690 so it shouldn't be choking on that, unless you're throttling the processor somehow. Is there anything being logged through iDRAC for over-temp alerts? You might be able to check in-band with ipmitool sel list as well as the dedicated interface.

Try resetting the BIOS back to default settings, changing System Profile to Performance, and disabling HyperThreading (System BIOS -> Processor Settings -> Logical Processor -> Disabled) to see if the scheduler is somehow getting confused and smashing all eight fio threads onto four physical+four logical cores rather than fanning out properly.
 

eclipse5302

I appreciate your time in looking at this. I'm well past resetting the BIOS to defaults and adjusting things like turning off HT. I really wish one of those had worked, but nothing I've done in that realm has made a difference. In fact, HT is still enabled on the 720.

I ordered a genuine LSI 9300-8i and cables to connect it to the Dell backplane, hoping that might help. Maybe there's some issue with the backplane(?), but it's fully updated (the whole system is current on firmware). This system has the flex bays in the back, but I've disconnected those and it made no difference. The only other thing I can think of is your mention of CPU/memory. This thing has 384GB of memory in 16GB DIMMs. Maybe I'm running into some odd memory-configuration issue? At this point I'm obviously grasping at anything.
 

HoneyBadger

eclipse5302 said:
The only other thing I can think of is your mention of CPU/memory. This thing has 384 GB of memory, arranged in 16GB sticks. Maybe I'm running into some odd thing with memory config? At this point I'm obviously grasping at anything here.

I'm puzzled here as well. It's not as if you switched from a single- to a dual-socket system, or lost a bunch of performance by going from a high-clocked E5-2690 to something slower and wider like a 2650L. But that workload is going to be bottlenecked almost 100% by CPU: if you spin up another SSH/shell session and watch htop/top while it's running, it should basically be pinning a few threads at maximum.

You can also check sysctl dev.cpu.X.freq, where X is a core number, and compare it against sysctl dev.cpu.X.freq_levels. What you should see is the cores running at or near the top advertised speed; if they aren't, something may be artificially limiting the CPU.
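The frequency comparison can be sketched in Python on systems where these OIDs exist. The sketch assumes FreeBSD's cpufreq format for dev.cpu.N.freq_levels, which is a space-separated list of "MHz/milliwatts" pairs where the power value can be -1. It reports the current clock as a fraction of the highest advertised level. The helper names read_sysctl, parse_freq_levels and freq_fraction are illustrative, not part of TrueNAS.

```python
import subprocess


def read_sysctl(name: str) -> str:
    """Return the value of a sysctl OID via `sysctl -n` (FreeBSD / TrueNAS CORE)."""
    return subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    ).stdout.strip()


def parse_freq_levels(text: str) -> list[int]:
    """Parse dev.cpu.N.freq_levels into MHz values, highest first.

    Each entry is "MHz/milliwatts"; the power figure may be -1 when the
    driver doesn't know it, so only the MHz part is kept.
    """
    return sorted((int(pair.split("/")[0]) for pair in text.split()), reverse=True)


def freq_fraction(current_mhz: int, levels_text: str) -> float:
    """Current clock as a fraction of the highest advertised level."""
    levels = parse_freq_levels(levels_text)
    if not levels:
        raise ValueError("no frequency levels reported")
    return current_mhz / levels[0]
```

While fio is running, freq_fraction(int(read_sysctl("dev.cpu.0.freq")), read_sysctl("dev.cpu.0.freq_levels")) would give a value near 1.0 on a core running at full speed. A value well below that under load would fit the "stuck parking brake" theory. If the OIDs are missing, sysctl fails and check=True raises an exception instead of returning a number.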

Nothing in iDRAC declaring a throttle based on power, temperature, or firmware?

This feels like something in the firmware level is limiting your CPU frequency, memory speed, or other piece along those lines. The system's got a metaphorical parking brake stuck on - we need to find what's binding it up.
 

eclipse5302

Nothing in the iDRAC logs is complaining about anything. I do see 8 cores go to 100% during the test. Unfortunately, neither dev.cpu.X.freq nor dev.cpu.X.freq_levels is present; they aren't present on the 720 either. Here is what I have for one of the cores:

dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc/bma
dev.cpu.0.cx_usage_counters: 351 27881
dev.cpu.0.cx_usage: 1.24% 98.75% last 66082us
dev.cpu.0.cx_lowest: C8
dev.cpu.0.cx_supported: C1/1/1 C2/2/41
dev.cpu.0.temperature: 25.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.coretemp.tjmax: 96.0C
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.delta: 71
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.SCK0.CP00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU
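One way to quantify the "8 cores at 100%" observation is Little's law (outstanding I/Os = IOPS × mean time per I/O). The fio command in the first post uses --ioengine=psync. fio's documentation notes that raising iodepth above 1 does not affect synchronous engines, so --iodepth=128 there effectively means one I/O in flight per job, or 8 in total with --numjobs=8. The sketch below assumes all eight jobs stay busy for the whole run. Under that assumption it derives the mean time each job spends per 4 KiB read, which approximates CPU time per I/O when those cores are pegged.

```python
# Little's law: L = lambda * W  =>  W = L / lambda.
NUMJOBS = 8          # --numjobs=8
EFFECTIVE_DEPTH = 1  # psync is synchronous: one I/O in flight per job regardless of --iodepth


def mean_latency_us(outstanding_ios: int, iops: float) -> float:
    """Mean time per I/O in microseconds for a given concurrency and IOPS rate."""
    return outstanding_ios / iops * 1e6


outstanding = NUMJOBS * EFFECTIVE_DEPTH
for label, iops in [("R720", 500_000), ("R730xd", 150_000)]:
    print(f"{label}: ~{mean_latency_us(outstanding, iops):.1f} us per 4 KiB read per job")
```

Under these assumptions, the R730xd spends roughly 3.3 times as long per read (about 53 µs versus 16 µs) while its cores are fully busy. That points at per-I/O CPU or memory cost rather than at the disks or the HBA.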
 

Davvo

Can you make sure your power cables are properly connected and that the frequencies of your cores are at target levels?
 

HoneyBadger

eclipse5302 said:
Unfortunately, neither dev.cpu.X.freq nor dev.cpu.X.freq_levels are present. These aren't present on the 720 either.

That would suggest it's not attempting to boost, and/or the clocks are locked. But you'd think that would impact the R720's performance. What do you get from grep -i speedstep /var/run/dmesg.boot?

I know there have historically been bugs in firmware on some Dell servers that caused processors to lock at the lowest possible clock (e.g., 800MHz on some machines). When you reset the BIOS to defaults, did you do it from within the BIOS, or was it a full hard reset including setting NVRAM_CLR with the jumper on the motherboard? It might be worth going to the "nuclear option" of a full iDRAC reset plus the NVRAM_CLR jumper.
 

eclipse5302

As for the reset question, I've fully reset this server, including the NVRAM_CLR jumper. None of the settings or hardware combinations I've tried have shown any improvement. I even bought a regular off-the-shelf Broadcom 9300-8i HBA and ran it in one of the slots, and the same speed issue was there. Even bypassing the backplane and connecting the drives directly to the controller gave the same result.

At this point I'm out of ideas on this R730 server. I've got a couple of R630s, so I used one of them for testing: I put the HBA330 in it, added 8 of these drives, and installed TrueNAS 13. It produced the same performance numbers as the R730.

How is it possible that multiple 13th gen Dell servers have this problem?
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
What about TrueNAS Core? I know we're in the Core sub-forum, but you mentioned Scale above... have you tried both with the same results?
 