Broadcom 9600W-16e not supported (yet)?

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
Hello,
I just ordered a new server (Supermicro SYS-110D-8C-FRAN8TP) to run TrueNAS SCALE and installed a Broadcom 9600W-16e in it. Besides struggling a bit with the install (I had to install TrueNAS CORE first and then change the train), it seems TrueNAS SCALE does not recognize my HBA controller. Is there any way I could fix this?

Thanks and nice to meet you all,

Zormik

P.S. I have some experience running TrueNAS, as I've been running CORE for about 7 years.
 
Joined
Jun 15, 2022
Messages
674
Is the card cooled properly? There's a minimum airflow specification (200 LFM @ 55°C). If you touch the heatsink and it's hot, you have a problem. (200 linear feet per minute works out to about 3.3 feet per second, or roughly 2.3 miles per hour of air moving across the heatsink; that's quite realistic depending on the box setup, but there must be actual airflow over the heatsink.)
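For reference, the conversion behind those numbers (just back-of-the-envelope arithmetic, using the 200 LFM figure quoted above):

# echo "scale=2; 200/60" | bc        # 200 ft/min ≈ 3.33 ft/s
# echo "scale=2; 200*60/5280" | bc   # ≈ 2.27 mph over the heatsink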

Can you boot into the card's BIOS during system boot (before the OS loads) (pg 25)?

Does the card's BIOS report finding drives?

There is software for running further diagnostics and extracting/injecting BIOS and other information.
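For example, assuming you've installed Broadcom's StorCLI utility (their own download, not part of TrueNAS; the binary is storcli64 for older generations and, as far as I know, storcli2 for the 9600 series, with broadly the same syntax), something along these lines should list the controller and its firmware:

# storcli2 show            # list the controllers the utility can see (storcli64 on older cards)
# storcli2 /c0 show all    # firmware/BIOS/device details for controller 0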
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
Hey, thank you for your reply.
Yes, the card is cooled properly and doesn't get hot (the server has a crazy amount of airflow). I can see the card in the BIOS/UEFI and I can see how many disks are attached to it.
 
Joined
Jun 15, 2022
Messages
674
What BIOS is the Broadcom/LSI card running? If it's MegaRAID or Enhanced Host Bus Adapter (eHBA), that's not going to go well (eHBA is simple mirroring RAID). You want an IT BIOS. IT is an LSI firmware that has no RAID functionality in it; it simply passes the disks through to the host by default.


 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
The 9600W-16e is a pure HBA card and doesn't have RAID functionality. It's connected to a JBOD.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Can you check to see if the card is probed correctly?

Go to console and type

# dmesg | grep ^mp | less

Somewhere in the output you hopefully see something like

mpr0: <Avago Technologies (LSI) SAS3008> port 0x5000-0x50ff mem 0xfd3f0000-0xfd3fffff,0xfd380000-0xfd3bffff irq 19 at device 0.0 on pci4
mpr0: Firmware: 16.00.10.00, Driver: 23.00.00.00-fbsd

which is output from one of the SAS3008 cards here; your output would be somewhat different. The important bits are that this has attached to the "mpr" driver and that the firmware is 16.00.10.00 (should really be .12.00 but oh well).

You can also do a

# dmesg | grep Avago

or for Broadcom I suppose; I don't know what manufacturer will be reported for the card. Failing that, "lspci -v" results for the card would be interesting.
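SCALE is Linux rather than FreeBSD, so the rough equivalents there (a sketch, not tested on my end) would be:

# lspci -k | grep -B1 -A3 -i -e sas -e raid    # the "Kernel driver in use:" line shows whether anything claimed the card
# dmesg | grep -i -e mpt3sas -e mpt2sas        # the usual Linux drivers for the 6/12G LSI HBAs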
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
Hello jgreco, thank you for helping out. Only lspci -v gave me something back:

RAID bus controller: Broadcom / LSI Device 00a5 (rev 01)
Subsystem: Broadcom / LSI Device 4650
Physical Slot: 0
Flags: bus master, fast devsel, latency 0, NUMA node 0, IOMMU group 40
Memory at 27fffe00000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [48] MSI: Enable- Count=1/32 Maskable+ 64bit+
Capabilities: [68] Express Endpoint, MSI 00
Capabilities: [a4] MSI-X: Enable- Count=128 Masked-
Capabilities: [b0] Vital Product Data
Capabilities: [100] Device Serial Number 00-80-5e-b1-a5-c9-ed-18
Capabilities: [fb4] Advanced Error Reporting
Capabilities: [138] Power Budgeting <?>
Capabilities: [db4] Secondary PCI Express
Capabilities: [af4] Data Link Feature <?>
Capabilities: [d00] Physical Layer 16.0 GT/s <?>
Capabilities: [d40] Lane Margining at the Receiver <?>
Capabilities: [160] Dynamic Power Allocation <?>
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
When I typed dmesg | grep mp | less (I skipped the ^) I get this back:

[ 0.000000] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
[ 0.022813] Device empty
[ 0.027594] smpboot: Allowing 16 CPUs, 0 hotplug CPUs
[ 0.125922] rcu: Hierarchical RCU implementation.
[ 0.159912] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.179726] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1267
[ 0.179737] smpboot: CPU0: Intel(R) Xeon(R) D-2733NT CPU @ 2.10GHz (family: 0x6, model: 0x6c, stepping: 0x1)
[ 0.180002] rcu: Hierarchical SRCU implementation.
[ 0.180220] smp: Bringing up secondary CPUs ...
[ 0.237710] smp: Brought up 1 node, 16 CPUs
[ 0.237710] smpboot: Max logical packages: 1
[ 0.237710] smpboot: Total of 16 processors activated (67200.00 BogoMIPS)
[ 0.319706] devtmpfs: initialized
[ 0.326428] dump_stack_lvl+0x46/0x5e
[ 0.326441] ? __alloc_pages_direct_compact+0xa9/0x200
[ 0.326515] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2140kB pagetables:12kB all_unreclaimable? no
[ 0.729426] hpet0: 8 comparators, 64-bit 24.000000 MHz counter
[ 0.754440] workingset: timestamp_bits=36 max_order=25 bucket_order=0
[ 0.757641] pcieport 0000:90:02.0: pciehp: Slot #0 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- IbPresDis- LLActRep+ (with Cmd Compl erratum)
[ 2.796403] pcieport 0000:90:03.0: pciehp: Slot #0 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- IbPresDis- LLActRep+ (with Cmd Compl erratum)
[ 4.836386] pcieport 0000:90:04.0: pciehp: Slot #0 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- IbPresDis- LLActRep+ (with Cmd Compl erratum)
[ 4.836555] pcieport 0000:90:05.0: pciehp: Slot #0 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- IbPresDis- LLActRep+ (with Cmd Compl erratum)
[ 6.877962] node node0: attempt to add duplicate cache level:0
[ 6.877965] node node0: attempt to add duplicate cache level:0
[ 6.877968] node node0: attempt to add duplicate cache level:0
[ 6.877970] node node0: attempt to add duplicate cache level:0
[ 6.877972] node node0: attempt to add duplicate cache level:0
[ 6.877975] node node0: attempt to add duplicate cache level:0
[ 6.877977] node node0: attempt to add duplicate cache level:0
[ 6.877979] node node0: attempt to add duplicate cache level:0
[ 6.877982] node node0: attempt to add duplicate cache level:0
[ 6.877984] node node0: attempt to add duplicate cache level:0
[ 6.877986] node node0: attempt to add duplicate cache level:0
[ 6.877989] node node0: attempt to add duplicate cache level:0
[ 6.877991] node node0: attempt to add duplicate cache level:0
[ 6.877993] node node0: attempt to add duplicate cache level:0
[ 6.877996] node node0: attempt to add duplicate cache level:0
[ 6.992279] mpls_gso: MPLS GSO support
[ 6.994871] Loading compiled-in X.509 certificates
[ 7.027948] pstore: Using crash dump compression: deflate
[ 7.602941] ahci 0000:00:08.0: AHCI 0001.0301 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[ 13.212570] systemd[1]: Listening on Process Core Dump Socket.
[ 13.221761] systemd[1]: Listening on initctl Compatibility Named Pipe.
[ 13.323434] systemd[1]: Mounting Temporary Directory (/tmp)...
[ 13.524044] systemd[1]: Mounted Temporary Directory (/tmp).
[ 14.188329] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
which is output from one of the SAS3008 cards here; your output would be somewhat different. The important bits are that this has attached to the "mpr" driver and that the firmware is 16.00.10.00 (should really be .12.00 but oh well).
Wouldn't it be different on 94xx/95xx/96xx cards? I don't have any of those to play around with (yet...), but I would expect something like either newer versions or a reset of the version number for the newer generations (or, at the latest, for 96xx cards).

Fake edit: yeah, they hit the reset button for SAS4: https://www.broadcom.com/products/storage/host-bus-adapters/sas-nvme-9600-24i
Meanwhile "SAS3.5" cards like the 9500 are up to P25 firmware/driver... And the version numbers are back in sync! Go figure.

Back in SAS4-land, you look like you've hit the absolute cutting edge:

Interesting to read, not fun to own such a controller.

Some highlights:
  • They've merged the MegaRAID and HBA software stacks
  • Completely new interfaces
  • The new driver is called mpi3mr
So, first of all, I want to be absolutely clear: You're better off getting rid of the card (selling it, returning it, offering it to the data gods, ...) and getting a SAS3 controller; any HBA from the 9300 to the 9500 should work.

Now, if you're a glutton for punishment, let's dig into it:

There's no 30-second-research option I can come up with to get a useful grep, so run lspci -v | less and search for "mpi3mr" (type "/mpi3mr" and press enter). Get us the associated block.
If you get nothing, that would suggest that either the driver is not present or the card isn't recognized by it.
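If it does come up empty, a couple of extra checks can tell those two cases apart (a sketch, assuming a root shell on SCALE):

# modinfo mpi3mr          # does the kernel even ship the module?
# lsmod | grep mpi3mr     # is it loaded?
# dmesg | grep -i mpi3mr  # did it try to attach and complain about anything?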
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
Yikes, this is not good news. I thought getting the latest gen to be future-proof and have the bandwidth for NVMe down the line would be the way to go :s
I get nothing back when I follow your instructions.

Returning the card is not really an option because I'd have to pay a surcharge to return it, and these cards are kinda 'expensive'.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Wouldn't it be different on 94xx/95xx/96xx cards? I don't have any of those to play around with (yet...), but I would expect something like either newer versions or a reset of the version number for the newer generations (or, at the latest, for 96xx cards).

Well, I didn't sit there and try to goof back and forth all the stupid chipset->model mappings. The FreeBSD manpage for mpr seems to suggest that all the following chipsets are supported under mpr:

• Broadcom Ltd./Avago Tech (LSI) SAS 3004 (4 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3008 (8 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3108 (8 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3216 (16 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3224 (24 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3316 (16 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3324 (24 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3408 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3416 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3508 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3516 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3616 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3708 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3716 (16 Port SAS/PCIe)

I'd expect whatever Linux driver is supporting the 12G SAS HBAs to support about the same set of stuff. Unfortunately, if this is changing in the next-gen chipsets, we may have the same sort of problems we've had with the FreeBSD mrsas driver, which attempts to unify the MegaRAID and HBA driver code with questionable results.
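If you're curious exactly which PCI device IDs the Linux drivers claim, something like this should show it (from memory, not verified here):

# modinfo -F alias mpt3sas | sort                # device IDs the 12G driver binds to
# modinfo -F alias mpi3mr 2>/dev/null | sort     # same for the new SAS4 driver, if the kernel even has it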
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yikes, this is not good news. I thought getting the latest gen to be future-proof and have the bandwidth for NVMe down the line would be the way to go :s

I don't know exactly what your background is, but server stuff can be really weird. It's rarely like a gaming PC, where having the absolute latest, hottest graphics card and CPU is likely to be beneficial in many or all cases. Latest-generation stuff is what us server geeks refer to as the bleeding edge. All the people foolishly buying Intel latest-gen stuff often run across stumbles; look at 13th gen, where neither FreeBSD nor Linux offered a scheduler that could handle P-cores and E-cores when it came out. Drivers for FreeBSD and Linux typically lag new hardware availability by some months, maybe even a year or two in some cases.

It's usually much better in serverworld to buy hardware that is known to work well for your application. For example, we avoided buying Intel X710 cards here for years, despite desperately wanting to make use of ESXi hardware PCIe passthru/virtual function support on those cards, because there was a PSOD issue that would kill your hypervisors. It was semi-mysterious for a number of years and eventually got characterized down to an issue with TSO/LRO and use of the VMKLinux API i40e driver. It was finally resolved around 2020, which seems horrifying for a card released in 2014, but there you have it. It is what it is. Today there's a native driver and it's no longer a problem. As a bonus, the quad cards are only $200 on the used market. Even with the expertise of VMware working on this, it took years to become usable.

This is only meant to reinforce the point that buying the latest gen stuff isn't always awesome.
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
In all honesty I should have known better. So should I take this as "forget about getting this to work for the coming months/years"?

I used to be a CISO but moved towards management. Playing with this stuff at home is my idea of hopelessly trying to stay in touch.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, I didn't sit there and try to goof back and forth all the stupid chipset->model mappings. The FreeBSD manpage for mpr seems to suggest that all the following chipsets are supported under mpr:

• Broadcom Ltd./Avago Tech (LSI) SAS 3004 (4 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3008 (8 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3108 (8 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3216 (16 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3224 (24 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3316 (16 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3324 (24 Port SAS)
• Broadcom Ltd./Avago Tech (LSI) SAS 3408 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3416 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3508 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3516 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3616 (16 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3708 (8 Port SAS/PCIe)
• Broadcom Ltd./Avago Tech (LSI) SAS 3716 (16 Port SAS/PCIe)

I'd expect whatever Linux driver is supporting the 12G SAS HBAs to support about the same set of stuff. Unfortunately, if this is changing in the next-gen chipsets, we may have the same sort of problems we've had with the FreeBSD mrsas driver, which attempts to unify the MegaRAID and HBA driver code with questionable results.
The new stuff's all "SAS 4xxx". I sort of get it, but I'm very reticent. IT firmware and the mpt/mps/mpr and mpt_sas/mpt2sas/mpt3sas drivers have decades of experience under their belts, and we still get occasional surprises.

same sort of problems we've had with the FreeBSD mrsas driver which attempts to unify the MegaRAID and HBA driver code with questionable results.
Well, it's more like "the RAID code grew HBA capabilities, finally". I haven't really heard of any issues, outside of weird-ass vendor decisions like Dell handicapping HBA mode's queue depth because reasons - more of a general "it's more expensive, less tested, why would I not just get an SAS 9300?" vibe.

having the bandwith for Nvme
Tri-mode cards are a ridiculous farce pushed by Broadcom and Microchip to try and keep the juicy, high-margin SAS market from dying. The problem is that NVMe sidesteps them substantially. Many/most servers suddenly need only SATA and NVMe from the PCH/SoC, and all workstations easily get by with SATA and NVMe. This undercuts Broadcom and Microchip in two key areas: the controllers and the expanders.
NVMe requires no controller - that's half the point, to reduce latency, and also why Broadcom is touting a slimmer software stack for their SAS4 stuff. PCIe switches are still likely to be necessary, and these two guys are big there, but it's a far cry from every server with more than eight drive bays having an expander in it - many servers get by today with mountains of NVMe SSDs and no PCIe switches, with some compromises. Improvements to performance and capacity allow for higher density to mitigate this further (why buy 16x 1.92 TB SSDs when you can buy two 15.36 TB SSDs for the same price and still saturate most realistic network connections out there?).

Okay, so I just told you why they're a cool toy, not why they should really be avoided - well, they should be avoided because tri-mode 100% requires a tri-mode expander to work as advertised. The expander needs to pull triple-duty as a SAS switch, PCIe switch and very-high-speed mux, because NVMe uses up to four lanes, instead of one (two for multipath, but that would be a second expander) for SAS. Sure, you get to save a little bit of cash on your backplane PCB by virtue of using U.3 and cutting a handful of differential pairs, but I'd bet a half-working A1SRi-2558 that the extra cost of the expander IC more than eats into this. This is especially the case as servers evolve towards having the expander or PCIe switch on a daughterboard attached to the backplane proper, which makes routing significantly easier and massively cuts down on the cost of the backplane PCB, for the cost of an additional couple of high-speed connectors.

"Ok, so who even supports that?" - As far as I can tell, HPe and maybe some recent Dell systems. Everyone else just sticks with U.2, since U.3 SSDs are backwards compatible (U.3 backplanes are not). U.2 doesn't need any of this nonsense, just separate SATA/SAS and PCIe signals - and bam, you have a SATA/SAS/NVMe drive bay.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah, that thing won't even do NVMe, much less U.3
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Tri-mode cards are a ridiculous farce pushed by Broadcom and Microchip to try and keep the juicy, high-margin SAS market from dying. The problem is that NVMe sidesteps them substantially. Many/most servers suddenly need only SATA and NVMe from the PCH/SoC, and all workstations easily get by with SATA and NVMe. This undercuts Broadcom and Microchip in two key areas: the controllers and the expanders

Well, PCIe switches will really decimate the controller market where people are not reliant on things like RAID5 or RAID1, but some of us build infrastructure on hypervisors, and this means either having extremely failure-resistant SSDs or extremely failure-resistant configurations (I prefer RAID1 with a warm spare). It's clear to me from the crap that has come into the workshop here from the "big guys" that Dell in particular pushes a RAID controller of some sort into every machine they can; I once saw four dozen R210 IIs configured with PERC H310s and only a single SSD in each server. Reasoning? Eff only knows. So I'm still quite happy with -8i controllers and cheap SSDs. Ends up being more reliable and less expensive than buying high-end enterprise-grade SSDs.
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
I decided to buy myself the 9400-8e since I could get that one cheap. Broadcom's support isn't really great for the time being: they claim they have the driver, but I can't find the Debian driver anywhere. On top of that, their support started talking about FreeBSD while I specifically mentioned it was TrueNAS SCALE running on Debian ...
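For what it's worth, from what I've read the 9400 series should be covered by the in-kernel mpt3sas driver on Linux, so there shouldn't be a separate Debian driver to download anyway. Once the card is in I plan to check it with something like this (commands I haven't run yet, so take them as a sketch):

# modinfo mpt3sas | head -n 5    # is the module shipped with the SCALE kernel?
# dmesg | grep -i mpt3sas        # did it actually attach to the card?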
 
Joined
Jun 15, 2022
Messages
674
You know what, you got a great card for a budget price. The whole product line is super-solid. Members here will support you in getting a datacenter-type system running smoothly. It's all good. :D
 

zormik

Dabbler
Joined
Mar 6, 2023
Messages
20
You know what, you got a great card for a budget price. The whole product line is super-solid. Members here will support you in getting a datacenter-type system running smoothly. It's all good. :D
I've still got the 9600W-16e, but at least I'll have a future-proof backup :D
 