Alder Lake / Raptor Lake Build Advice | P Core only CPU vs P+E Core CPU | TrueNAS Scale Angelfish vs Bluefin | MW34-SP0 W680 Motherboard

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
However, I did boot the server and look at the bios in a bit more detail. I missed this previously but I was able to see the LSI card in the BIOS and select it.
Yup, that's the UEFI extension ROM at work.
I shut it down, connected up 2 drives, and only saw one drive.
Which favors the usual suspects of cabling or power.
 

saf

Cadet
Joined
Jan 3, 2023
Messages
6
Yup, that's the UEFI extension ROM at work.
Thank you. I missed this previously but see it now, which is good.

Which favors the usual suspects of cabling or power.

I replaced the power supply but am seeing the same issue. Navigating through the BIOS to the LSI card and viewing disks, it always shows the one disk from the previous photo I attached (0:1:5). Work was busy last week, so I didn't get a chance to move the card and cables to my desktop PC, which should be done before I go further. That would probably rule out the Supermicro motherboard and BIOS. If anyone has an LSI 9300-8i card, would you mind sharing what cables you are using to SATA endpoints?

I will try today to move the card to my desktop. If I reboot, go into the LSI card, and see the drives, we know it is the Supermicro motherboard. On the other hand, if I don't, then it's probably the card or cables.

Edit: moved the LSI card to my Windows 10 desktop. Connected both power and SATA cables. Powered up, hit Ctrl-C to get into the Avago config, and scrolled through only to see 1 drive.

Supermicro X13SAE-F:
BIOS sees LSI card, 1 drive
Linux (Ubuntu) sees LSI card
Windows 2019 sees LSI card

Windows 10 Desktop:
Sees LSI card, 1 drive

Replaced power supply:
Same symptoms - only sees 1 drive
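
For anyone who wants to cross-check from the OS side rather than the card BIOS: a rough sketch like the one below (run from any Linux live environment; nothing in it is specific to the 9300-8i) just walks sysfs and lists whatever disks the kernel has enumerated.

[CODE]
# Rough sketch: list the disks Linux actually sees, with vendor/model and
# size, so the OS-side view can be compared against what the card BIOS shows.
import os

SYS_BLOCK = "/sys/block"

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "?"

for dev in sorted(os.listdir(SYS_BLOCK)):
    if not dev.startswith(("sd", "nvme")):
        continue  # skip loop/md/dm and other virtual devices
    base = os.path.join(SYS_BLOCK, dev)
    vendor = read(os.path.join(base, "device", "vendor"))
    model = read(os.path.join(base, "device", "model"))
    sectors = read(os.path.join(base, "size"))  # size in 512-byte sectors
    size_tb = int(sectors) * 512 / 1e12 if sectors.isdigit() else 0
    print(f"{dev}: {vendor} {model} ({size_tb:.1f} TB)")
[/CODE]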

Cable issue? If so, any recommendations for an LSI 9300-8i / SAS3008 to SATA endpoint cable that I could try?
Card issue? Should I return the LSI card? Probably cables first, I guess.

Side note: the only concern I have is that the X13SAE-F only saw 1 drive when I used the SATA ports. It could be a different issue, since the LSI card is having the same issue on my Windows desktop.

[attached screenshot]
 

Lipsum Ipsum

Dabbler
Joined
Aug 31, 2022
Messages
22
Cable issue? If so, any recommendations for an LSI 9300-8i / SAS3008 to SATA endpoint cable that I could try?
Card issue? Should I return the LSI card? Probably cables first, I guess.

Side note: the only concern I have is that the X13SAE-F only saw 1 drive when I used the SATA ports. It could be a different issue, since the LSI card is having the same issue on my Windows desktop.
Did you buy your breakout cable new and from a reputable source? I'd try the cable first if you're still having problems (I know it's been a bit since you originally posted). It's possible you have a reverse breakout cable and not a forward breakout. Unless it's written on a label or similar, you can't tell the difference just by looking at them.

For the record, I had no problems with 5 Seagate 18TB Exos X18 drives being detected in any of the SATA ports before moving over to my LSI 9211-8i HBA. My drives are on the approved hardware list, but it's 2023. SM using it not being on the list as an excuse is B.S.
 

Lipsum Ipsum

Dabbler
Joined
Aug 31, 2022
Messages
22
Unless Intel is going out of their way, there's no way for the PCIe slot to not support a specific class of devices. And since the controller shows up in the OS, it's cleared any meaningful hurdles that might have been in its way.
The fact that the SATA ports are also not working correctly suggests that something is up with either power, disks or cables/backplanes.

So let's try to examine this systematically:
  1. Do the problems exist in all OSes? In particular, Windows 11 and Ubuntu 22.04 should both work without hiccups.
The current mpt3sas kernel driver seems to have an issue initializing the card when it's in Slot 4. Both Bluefin as well as Ubuntu Server 22.04 exhibit the same error:
[attached screenshot: kernel log showing the mpt2sas/mpt3sas initialization failure (unable to change power state from D3cold to D0, config space unavailable), ending at mpt3sas_scsih.c / scsih_probe()]


Ubuntu Server 18.04 doesn't have a problem with the HBA in that slot, but it has issues elsewhere just due to its age and not knowing about hardware 4+ years newer.

I posted about it over at STH where saf was also looking for assistance, but I'll mention the fix here too since I figured it out. Just put a card in slot 7 if you're going to have a HBA in slot 4. If you don't have a need for any other cards, just put the HBA in slot 7 and it should work without issues.

It'd be less than optimal, but my H310 flashed to IT mode also worked in the x4 slots on my X13SAE-F board. It only gets half the lanes, but that's still likely fast enough for 8 spinning-rust drives unless they're being hammered. If there are also SSDs in the mix, going with a PCIe 3.0 card even in an x4 slot would also likely be sufficient for up to 8 drives, or 16 spinners.
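
For rough numbers behind that claim: assuming roughly 985 MB/s of usable bandwidth per PCIe 3.0 lane and ballpark sequential rates for spinners and SATA SSDs (assumptions, not measurements), the back-of-the-envelope math works out like this:

[CODE]
# Rough bandwidth math behind the "x4 is probably enough" claim.
# ~985 MB/s usable per PCIe 3.0 lane and ~270 MB/s per spinner are
# assumptions, not measurements; adjust to taste.
PCIE3_LANE_MBPS = 985
HDD_SEQ_MBPS = 270
SATA_SSD_MBPS = 550

for lanes in (4, 8):
    slot_mbps = lanes * PCIE3_LANE_MBPS
    print(f"PCIe 3.0 x{lanes}: ~{slot_mbps} MB/s total, enough for about "
          f"{slot_mbps // HDD_SEQ_MBPS} spinners or "
          f"{slot_mbps // SATA_SSD_MBPS} SATA SSDs running flat out")
[/CODE]

Even the x4 case has plenty of headroom over eight spinners doing sequential work at the same time.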
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I posted about it over at STH where saf was also looking for assistance
For reference, that's here: https://forums.servethehome.com/index.php?threads/lga-1700-alder-lake-servers.35719/post-363116
The current mpt3sas kernel driver seems to have an issue initializing the card when it's in Slot 4.
But not in slot 7? What kind of horrors are going on in the system firmware on those motherboards?

I have so many questions now:
  • What's with the mpt2sas messages? Is this a benign quirk of the Linux driver? Or is something terrible happening behind the scenes?
  • What change introduced this issue?
  • Are FreeBSD and Windows also affected?
  • Is this solved with a different boot method (e.g. ZFSBootMenu) that doesn't involve GRUB?
  • Are there any system firmware settings (e.g. related to extension ROMs) that affect this behavior?
  • Do the extension ROMs make a difference either way?
 

Lipsum Ipsum

Dabbler
Joined
Aug 31, 2022
Messages
22
But not in slot 7? What kind of horrors are going on in the system firmware on those motherboards?
Correct. When in slot 7, it just works. In slot 4 and nothing in slot 7, it spits out what I gave above. That's actually not the full output, just the lines with "mpt" in them. I think there were also some additional hardware or memory address dumps showing. Or maybe that was only when I turned on more detailed logging for the module.

Anyway, I really only stumbled onto the idea that maybe slot 7 needed to be populated first. I saw the output below and noticed that 0000:01* and 0000:02*, the two addresses the card would show up at depending on whether it was in SLOT7 or SLOT4, were really just links under 0000:00:01.0 (or .1). I'm not a hardware programmer and I don't have a clue how the PCIe bus works and is enumerated, but it made me think along the lines of "hey, maybe the one x16 slot gets treated as two 'sub-buses', and if there's nothing in the first, the second one can't be communicated with."

[attached screenshot: sysfs PCI tree showing the 0000:01:* and 0000:02:* entries as links under 0000:00:01.0 / 0000:00:01.1]
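
A rough sketch like this shows the same nesting without any special tools; it just resolves the /sys/bus/pci/devices symlinks, and the HBA's address should show up hanging off the 0000:00:01.x root port:

[CODE]
# Rough sketch: show each PCI function's physical path through its bridges
# by resolving the /sys/bus/pci/devices symlinks. The HBA (0000:01:00.0 or
# 0000:02:00.0 here, depending on the slot) should appear nested under the
# 0000:00:01.x root port.
import os

PCI_DEVS = "/sys/bus/pci/devices"

for bdf in sorted(os.listdir(PCI_DEVS)):
    real = os.path.realpath(os.path.join(PCI_DEVS, bdf))
    # real looks like /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
    chain = [part for part in real.split("/")
             if ":" in part and not part.startswith("pci")]
    print(" -> ".join(chain))
[/CODE]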


I have so many questions now:
  • What's with the mpt2sas messages? Is this a benign quirk of the Linux driver? Or is something terrible happening behind the scenes?
I presume you're referring specifically to the 2 instead of the 3? Looking at the kernel module code, it's all under the mpt3sas module, but it covers quite a range of actual cards used over the years. mpt2sas, I believe, is the code for the more "legacy" cards like my 9211-8i that are SAS2008-based. It may also differentiate SAS2 vs SAS3. That's not an area I'm particularly fluent in.

  • What change introduced this issue?
You tell me. :smile: I really don't know. It may not even be a "driver" problem; it just kind of felt that way since it appeared to work with Ubuntu 18.04, though I didn't really test beyond seeing whether the drives were visible.

As the last line mentions mpt3sas_scsih.c and scsih_probe(), I thought that file might be a good place to start investigating the history. The line about changing the power state from D3cold to D0 also got power management on my mind. I saw a commit from November 2020 regarding generic power management. I don't believe that made it into the initial 5.10 kernel release in December 2020, but it is in the kernel version used for Bluefin, according to the GitHub repo.
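
Not a fix, but for poking at the power-management angle, something along these lines reads back the runtime-PM attributes sysfs exposes for the card. The PCI address is a placeholder I made up for the example, and attribute availability can vary by kernel:

[CODE]
# Rough sketch: read the runtime power-management attributes for the HBA.
# The PCI address is a placeholder; substitute whatever lspci reports for
# the SAS3008. Some attributes may not exist on older kernels.
import os

HBA_BDF = "0000:01:00.0"  # hypothetical address, adjust for your system
base = f"/sys/bus/pci/devices/{HBA_BDF}"

for attr in ("power/runtime_status", "power/control", "d3cold_allowed"):
    path = os.path.join(base, attr)
    try:
        with open(path) as f:
            print(f"{attr}: {f.read().strip()}")
    except OSError as err:
        print(f"{attr}: <unavailable: {err}>")
[/CODE]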

There were several other threads on various other forums that I came across that led down various rabbit holes.

https://forums.unraid.net/bug-repor...c2-fails-to-load-mpt3sas-kernel-module-r1670/ (Same error about d3cold to d0 and config space unavailable)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942624 (Same error about d3cold to d0 and config space unavailable, NVME related)
https://forums.developer.nvidia.com...onfig-space-inaccessible-stuck-at-boot/112912 (Same error about D3cold to D0, GPU related)

There was also a "fix" for a very similar problem that was for the mpt3sas module to set a kernel parameter for max queue depth to 10,000 IIRC. I tried that, but it didn't have an effect.

  • Are FreeBSD and Windows also affected?
No idea; I didn't test those. I was most interested in Scale, as it best fit what I was looking to do. At the time, FreeBSD also didn't yet support the efficiency cores used in Alder Lake and Raptor Lake. I don't know if that's still the case or not.

  • Is this solved with a different boot method (e.g. ZFSBootMenu) that doesn't involve GRUB?
No idea. And TBH, based on the other quirks this board has thrown at me, as well as my general inexperience with ZFS, I'd be terrified to try. There's a well-above-zero chance it'd result in the entire universe imploding into a singularity.

  • Are there any system firmware settings (e.g. related to extension ROMs) that affect this behavior?
I currently have the EFI ROM installed and accessible from within my BIOS. In IT mode there aren't any settings that are particularly useful for the adapter itself; it's mostly just read-only information. I think the only thing that could be changed was write caching(?).

As for other BIOS settings for the PCIe slots, loading or disabling ROMs, etc., I could not find any combination that made a difference. With just the one card in SLOT4, it usually appeared like it couldn't initialize, but it was at least detected. Setting ASPM for SLOT4 and/or SLOT7 to Disabled, L1, or Auto didn't improve things, nor did changing the L1 Substates between Disabled, L1.1, and L1.1 & L1.2. There were various other settings where I know I tried different combinations; it wasn't necessarily scientific or methodical, though. At best, it didn't work. At worst, the card didn't even show up, or the system wouldn't boot, or once it didn't even make it out of POST. That last one was when I realized there's no option in the IPMI to "clear CMOS because you borked the settings".
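
For what it's worth, the ASPM policy the kernel actually ends up using can be read back at runtime (and lspci -vv shows the per-device LnkCtl state), so BIOS changes can at least be verified from the OS side. A minimal sketch:

[CODE]
# Rough sketch: confirm which ASPM policy the kernel is actually using,
# regardless of what the BIOS menus were set to. The active policy is the
# one shown in [brackets].
try:
    with open("/sys/module/pcie_aspm/parameters/policy") as f:
        print("ASPM policy:", f.read().strip())
except OSError:
    print("pcie_aspm policy not exposed on this kernel")
[/CODE]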

  • Do the extension ROMs make a difference either way?
Not that I could tell. When I first started this journey, I didn't even have the extension ROM installed on the card. But early on, I wanted to reflash it just to make sure everything was up to date, and I decided to add the EFI ROM back on.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Correct. When in slot 7, it just works. In slot 4 and nothing in slot 7, it spits out what I gave above.
Slots 7 and 4 share 16 CPU lanes as x16/x0 or x8/x8, so it looks like the board has trouble bifurcating to x8/x8 if slot 7 is not populated. Maybe there's a BIOS setting to force x8/x8.
Anyway that behaviour is below what we expect from a Supermicro board.
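
If it is a bifurcation problem, the negotiated link width should give it away. A rough sketch along these lines (the PCI address is a placeholder; use whatever lspci reports for the SAS3008) reads what the slot actually trained to:

[CODE]
# Rough sketch: read the negotiated vs. maximum PCIe link speed/width for
# the HBA. The address is a placeholder; use whatever lspci shows for
# the SAS3008 on your system.
HBA_BDF = "0000:01:00.0"  # hypothetical address, adjust for your system
base = f"/sys/bus/pci/devices/{HBA_BDF}"

for attr in ("current_link_speed", "current_link_width",
             "max_link_speed", "max_link_width"):
    try:
        with open(f"{base}/{attr}") as f:
            print(f"{attr}: {f.read().strip()}")
    except OSError:
        print(f"{attr}: <unavailable>")
[/CODE]

An x8 card reporting x8 in slot 7 but a missing or degraded link in slot 4 would fit that theory.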
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Slots 7 and 4 share 16 CPU lanes as x16/x0 or x8/x8, so it looks like the board has trouble bifurcating to x8/x8 if slot 7 is not populated. Maybe there's a BIOS setting to force x8/x8.
Anyway that behaviour is below what we expect from a Supermicro board.
Thing is... The card is detected, so the hard part is done. It could be that the firmware is incorrectly reporting the configuration to the OS/bootloader, but that sounds like a major regression to me.
 

saf

Cadet
Joined
Jan 3, 2023
Messages
6
Did you buy your breakout cable new and from a reputable source? I'd try the cable first if you're still having problems (I know it's been a bit since you originally posted). It's possible you have a reverse breakout cable and not a forward breakout. Unless it's written on a label or similar, you can't tell the difference just by looking at them.

For the record, I had no problems with 5 Seagate 18TB Exos X18 drives being detected in any of the SATA ports before moving over to my LSI 9211-8i HBA. My drives are on the approved hardware list, but it's 2023. SM using it not being on the list as an excuse is B.S.

Hi, sorry for the delay; I was out of town, so I need to catch up. I was trying to RMA the board, but the vendor I bought it from said no. They do, however, want to try and troubleshoot a few things.

The brand I bought was Cable Creations: internal HD Mini SAS (SFF-8643 host) to 4x SATA target cables. I am returning the card this week and will look into obtaining another. I've seen a few posts of people using an LSI 9300-16i on this board, which was good to hear. The part that is a bit weird is when I tried to use the SATA ports with the drives and got random problems. I have an 8 TB WD Red that works; it shows up on SATA port 0. If I shut the system down and move the 16 TB drive to that port, using the same cable, I don't see the drive. I also tried a 10 TB WD drive and it doesn't see that either. Only the 8 TB drive.

Let me catch up since there is new information here. I also need to reload Ubuntu as I tried Windows 2019. I was running 20.04 if I recall correctly. Thank you again all for the information.
 

saf

Cadet
Joined
Jan 3, 2023
Messages
6
The current mpt3sas kernel driver seems to have an issue initializing the card when it's in Slot 4. Both Bluefin as well as Ubuntu Server 22.04 exhibit the same error:
[attached screenshot]

Ubuntu Server 18.04 doesn't have a problem with the HBA in that slot, but it has issues elsewhere just due to its age and not knowing about hardware 4+ years newer.

I posted about it over at STH where saf was also looking for assistance, but I'll mention the fix here too since I figured it out. Just put a card in slot 7 if you're going to have a HBA in slot 4. If you don't have a need for any other cards, just put the HBA in slot 7 and it should work without issues.

It'd be less than optimal, but my H310 flashed to IT mode also worked in the x4 slots on my X13SAE-F board. It only gets half the lanes, but that's still likely fast enough for 8 spinning-rust drives unless they're being hammered. If there are also SSDs in the mix, going with a PCIe 3.0 card even in an x4 slot would also likely be sufficient for up to 8 drives, or 16 spinners.

Just a quick note: I do have the LSI card in slot 7, which is next to the CPU. I have an Nvidia Quadro in slot 4. It isn't needed, but I was going to use it for Plex transcoding. I wonder if I should take it out.
 