But not in slot 7? What kind of horrors are going on in the system firmware on those motherboards?
Correct. When it's in slot 7, it just works. With it in slot 4 and nothing in slot 7, it spits out what I gave above. That's actually not the full output, just the lines with "mpt" in them. I think there were also some additional hardware or memory address dumps showing, or maybe those only appeared when I turned on more detailed logging for the module.
Anyways, I really only stumbled onto the idea that maybe SLOT7 needed to be populated first. I saw the below and noticed that 0000:01* and 0000:02*, the two addresses the card would show up at depending on whether it was in SLOT7 or SLOT4, were really just links under 0000:00:01.0 (or .1). I'm not a hardware programmer, and I don't have a clue how the PCIe bus works or is enumerated, but it made me think along the lines of "hey, maybe the x16 slot gets treated as two 'sub-buses,' and if there's nothing in the first, the second one can't be communicated with."
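In case anyone wants to poke at the same thing, this is roughly the check I did. The 0000:01*/0000:02* addresses are from my board, so yours will differ:

```shell
# Tree view of the PCIe topology: shows which devices hang off which
# root port (needs pciutils; skipped if lspci isn't installed).
if command -v lspci >/dev/null; then
    lspci -tv
fi

# Every entry under /sys/bus/pci/devices is a symlink; resolving it
# shows which root port the device sits under. On my board the card
# resolved under 0000:00:01.0 (or .1) depending on the slot.
for dev in /sys/bus/pci/devices/*; do
    printf '%s -> %s\n' "${dev##*/}" "$(readlink -f "$dev")"
done
```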
I have so many questions now:
- What's with the mpt2sas messages? Is this a benign quirk of the Linux driver? Or is something terrible happening behind the scenes?
I presume you're referring specifically to the 2 instead of the 3? Looking at the kernel module code, it's all under the mpt3sas module, but that covers quite a range of actual cards used over the years. The mpt2sas code, I believe, is for the more "legacy" cards like my 9211-8i that are SAS2008-based. It may also differentiate SAS2 vs. SAS3; that's not an area I'm particularly fluent in.
- What change introduced this issue?
You tell me.

I really don't know. It may not even be a "driver" problem; it just kind of felt that way, as everything appeared to work with Ubuntu 18.04, though I didn't really test beyond seeing whether the drives were visible.
As the last line mentions mpt3sas_scsih.c and scsih_probe(), I thought that file might be a good place to start investigating the history. The line about changing the power state from D3cold to D0 also got power management on my mind. I saw a commit from November 2020 regarding generic power management. I don't believe that made it into the initial 5.10 kernel release in December 2020, but it is in the kernel version used for Bluefin according to the GitHub repo.
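For anyone who wants to retrace that, this is roughly how I walked the file's history. It assumes you're inside a local clone of the kernel source, and the grep pattern is just my guess at how such commits are worded:

```shell
# Run from inside a kernel source checkout (e.g. a clone of torvalds/linux).
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Recent history of the file the error message points at:
    git log --oneline --since=2020-01-01 -- drivers/scsi/mpt3sas/mpt3sas_scsih.c

    # Commits mentioning power management anywhere in the driver:
    git log --oneline -i --grep='power management' -- drivers/scsi/mpt3sas/
fi
```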
There were several other threads in various forums that I came across that led down various rabbit holes.
https://forums.unraid.net/bug-repor...c2-fails-to-load-mpt3sas-kernel-module-r1670/ (Same error about d3cold to d0 and config space unavailable)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942624 (Same error about d3cold to d0 and config space unavailable, NVME related)
https://forums.developer.nvidia.com...onfig-space-inaccessible-stuck-at-boot/112912 (Same error about D3cold to D0, GPU related)
There was also a "fix" for a very similar problem: setting the mpt3sas module's max queue depth kernel parameter to 10,000, IIRC. I tried that, but it didn't have an effect.
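For completeness, the attempted workaround looked like this. max_queue_depth is a real mpt3sas module parameter, but the exact value suggested is from memory:

```
# /etc/modprobe.d/mpt3sas.conf
options mpt3sas max_queue_depth=10000
```

The same option can also be passed on the kernel command line as `mpt3sas.max_queue_depth=10000`.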
- Are FreeBSD and Windows also affected?
No idea. I didn't test those. I was most interested in Scale, as it best fit what I was looking to do. At the time, FreeBSD also didn't yet support the efficiency cores used in Alder Lake and Raptor Lake; I don't know if that's still the case or not.
- Is this solved with a different boot method (e.g. ZFSBootMenu) that doesn't involve GRUB?
No idea. And TBH, based on the other quirks this board has thrown at me, as well as my general inexperience with ZFS, I'd be terrified to try. There's a well-above-zero chance it'd result in the entire universe imploding into a singularity.
- Are there any system firmware settings (e.g. related to extension ROMs) that affect this behavior?
I currently have the EFI ROM installed and accessible from within my BIOS. In IT mode there aren't any settings that are particularly useful for the adapter itself; it's mostly just read-only information. I think the only thing that could be changed was write caching(?).
As for other BIOS settings related to the PCIe slots (loading or disabling ROMs, etc.), I couldn't find any combination that made a difference. With just the one card in SLOT4, it usually appeared that it couldn't initialize, but it was at least detected. Setting ASPM for SLOT4 and/or SLOT7 to Disabled, L1, or Auto didn't improve things, nor did changing the L1 Substates between Disabled, L1.1, and L1.1 & L1.2. There were various other settings I tried in different combinations, though it wasn't exactly scientific or methodical. At best, it didn't work. At worst, the card didn't show up at all, or the system wouldn't boot, or in one case it didn't even make it out of POST. That last one was when I realized there's no option in the IPMI to "clear CMOS because you borked the settings."
- Do the extension ROMs make a difference either way?
Not that I could tell. When I first started this journey, I didn't even have the extension ROM installed on the card. But early on I wanted to reflash it just to make sure everything was up to date, and I decided to add the EFI ROM back on at the same time.