SOLVED System stuck at boot with connected MX500 SSDs on SAS2008

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
Hi all,

I am currently trying out a new setup and I am facing a weird issue: TrueNAS gets stuck at boot when I connect MX500 consumer SSDs. Let me start with my setup:

Hardware:
  • Supermicro X10SRL-F
    • 2x Micron 5300 1TB (ZFS Proxmox Pool - Mirror)
  • Xeon E5-1650 v4
  • 128GB ECC Memory
  • Supermicro SC846 Enclosure
  • SAS2008 in IT mode
    • 6x 12TB WD Red (Pool 1 - RaidZ2)
    • 3x 2TB WD Red (Pool 2 - Mirror)
    • 2x MX500 1TB (Pool 3 - Mirror)

Software:
The System is running the latest version of Proxmox (v7.1-7).
TrueNAS (v.12.0-U6.1) is running as VM with passthrough of the SAS2008, with 32GB of memory.

The Problem:
TrueNAS seems to run just fine if I boot the machine while only the HDDs are connected to the HBA. Whenever I attach one of the SSDs before booting the VM, TrueNAS gets stuck in the boot process:

boot.png



I let the system run for several minutes but nothing seems to happen. When I check the HBA setup from within the VM it seems to detect the SSD just fine:

controller_disks.png


If I boot the system first and then hot-attach the SSDs, everything seems to work fine. I can open the pool that is located on the SSDs. After a reboot, the same issue occurs again.

I tried different bays for the SSDs and different combinations of connected HDDs, but the issue seems to occur only when one or both of the SSDs are connected at boot time.

I could not find any errors in any log so far, but I am not really sure where or for what I might be looking. Does anyone have an idea what might be the issue with my setup or what I could do to further debug the problem?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Does it work on bare metal? Proxmox has immature PCI passthru support, and isn't really expected to work. Flaky behaviour like this can sometimes be the result. I'd sort of expect it to just totally fail, but stranger things have happened.

Also conspicuously missing is whether you are running the correct LSI firmware. You need 20.00.07.00.
 

QonoS

Explorer
Joined
Apr 1, 2021
Messages
87
Try a firmware update of these SSDs first. The MX500 is known to have had many issues, some of them related to HBAs. Just google "mx500 issues".
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
Does it work on bare metal? Proxmox has immature PCI passthru support, and isn't really expected to work. Flaky behaviour like this can sometimes be the result. I'd sort of expect it to just totally fail, but stranger things have happened.

Also conspicuously missing is whether you are running the correct LSI firmware. You need 20.00.07.00.
Proxmox boots fine when the SSDs are connected, but I did not yet try it with TrueNAS baremetal. I might do this if the firmware update of the SSDs isn't doing the trick.

Seems like my HBA is on a slightly older firmware version (20.00.04.00-IT):

controller.png


Are there known issues with the older firmware versions? Might have to look into updating it, I guess.

Try a firmware update of these SSDs first. The MX500 is known to have had many issues, some of them related to HBAs. Just google "mx500 issues".

How could I not check for that? That actually sounds like it could really be the issue here as the changelog states:

  • Fixed SATA protocol error that causes start-up failure on certain data center RAID systems
  • Improved boot time after unexpected power loss
  • Fixed Read DMA command abort after an interrupted Secure Erase
My SSDs are not on the latest version, so this will be the first thing I will try out.
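
For anyone who wants to check the installed SSD firmware from the TrueNAS shell first, here is a minimal sketch. It assumes smartmontools is available and that `smartctl -i` prints a "Firmware Version:" line for SATA drives; the helper name `fw_version` and the device name /dev/da6 in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: fw_version is a hypothetical helper name.

# fw_version: read `smartctl -i` output on stdin and print only the
# firmware revision string from the "Firmware Version:" line.
fw_version() {
    sed -n 's/^Firmware Version:[[:space:]]*//p'
}
```

Usage would look like `smartctl -i /dev/da6 | fw_version`, with the device name replaced by whatever the SSD shows up as on your system. Compare the result with the latest version listed in the manufacturer's changelog.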
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There have been some minor tweaks to the P20 firmware; details are available if you Google for a while, but it is recommended to be on 20.00.07.00.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
The SSD Firmware update did the trick! The system is now booting just fine with both SSDs attached. Thank you both for that quick help!

There have been some minor tweaks to the P20 firmware; details are available if you Google for a while, but it is recommended to be on 20.00.07.00.

I would still like to update the HBA firmware. I searched a bit for how to do the update and found this post from the forum, referring to this blog post. Since the post is about cross-flashing the controller, I am not 100% sure I can follow these steps. Does it hurt to follow all these steps even for a simple firmware update? What steps are required for just updating from FW 20.00.04.00 to 20.00.07.00?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You can update from 20.00.04.00 to 20.00.07.00 using the sas2flash binary included in the FreeNAS base system. You just need the firmware file for 20.00.07.00. I don't have a handy link to those right now. These are not usually distributed individually, but rather included in ZIP files along with the BIOS and DOS or EFI flashing tools, so if you see a ZIP file, the firmware file is likely inside it, in a subdirectory.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
I think I already downloaded the correct file from the Broadcom website. The file is called "2118it.bin" and it was contained in a file called "9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip". So it should be as easy as running the following?

Code:
sas2flash.efi -o -f 2118it.bin
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Also, in regards to this --

Fixed SATA protocol error that causes start-up failure on certain data center RAID systems

I would like to thank you for persevering on this and taking the time to QUOTE it from release notes.

This has next to nothing to do with FreeNAS. I do this stuff professionally, and I collect data points and like to be able to quote sources.

Some people around here know that I've long advocated the use of less expensive "consumer" grade SSD's in servers; I have literally hundreds deployed in servers located in data centers hundreds or even thousands of miles away. Enterprise SSD's absolutely have uses, but quite often I find it cheaper to be able to place three consumer SSD's, two in RAID1, one as a spare, in situations where massive endurance is not a concern. I have been doing this for a decade, now, and results are generally favorable.

It's long been known that some SSD's have firmware that is "Windows" compatible but maybe not RAID compatible. The BX500 was one that I had had some mixed experiences with, and the WD Blues are another. Intel consumer SSD's are workhorses until they expire (story available elsewhere on these forums), while the Samsung Evo's are just amazing overall.

I have occasionally had reason to discuss SSD's with other infrastructure and server architects, and it's very interesting to me that there are basically two large pools of them, and then a smaller pool --

One set of them tend to deploy servers with enterprise SSD's with no consideration to RAID.

Another set deploy enterprise SSD's with RAID1 or RAID5.

A third, apparently smaller, set seem to quietly deploy servers using consumer SSD's where that makes sense.

I have always found it very difficult to talk to the people in the first two sets, because they are absolutely convinced that you could never ever EVER use a consumer SSD for ANYTHING in a server, and that such things have never been done by anyone ANYWHERE and certainly are not tested by the manufacturer for any such uses.

So, in summary, I thank you for taking the time to provide me with a useful data point.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think I already downloaded the correct file from the Broadcom website. The file is called "2118it.bin" and it was contained in a file called "9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip". So it should be as easy as running the following?

Code:
sas2flash.efi -o -f 2118it.bin

Yes, approximately, but just "sas2flash" -- the UNIX executable doesn't have an .efi extension.

If you can avoid having the pool imported when you're doing it, I recommend that. The firmware update causes the controller to reset (this may be obvious), and that may not make ZFS too happy if there's a live pool there.
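
The sequence discussed above can be sketched roughly as follows. This is only an illustration: the helper names `fw_older` and `flash_hba` are hypothetical, and `sas2flash -list` is assumed to be available in the bundled binary to show the controller's current firmware version. Nothing runs when the file is sourced; the flash function has to be called by hand.

```shell
#!/bin/sh
# Sketch only: fw_older and flash_hba are hypothetical helper names.

# fw_older A B: succeed (exit 0) if dotted version A is strictly older
# than B, e.g. fw_older 20.00.04.00 20.00.07.00. Fields are compared
# numerically from left to right; missing fields count as 0.
fw_older() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}          # leading field of each version
        [ "${x:-0}" -lt "${y:-0}" ] && return 0
        [ "${x:-0}" -gt "${y:-0}" ] && return 1
        # Drop the leading field; an exhausted version becomes empty.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1                            # versions are equal
}

# flash_hba IMAGE: flash IMAGE (e.g. 2118it.bin) onto the HBA.
# Export any pool on the controller first (zpool export <pool>) or boot
# without the drives attached, because the update resets the controller.
flash_hba() {
    sas2flash -list || return 1         # show the current firmware version
    sas2flash -o -f "$1" || return 1    # command form from this thread
    sas2flash -list                     # confirm the new version
}
```

A possible use: `fw_older 20.00.04.00 20.00.07.00 && flash_hba 2118it.bin`, with the first version taken from what the controller currently reports.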
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I think I already downloaded the correct file from the Broadcom website. The file is called "2118it.bin" and it was contained in a file called "9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip". So it should be as easy as running the following?

Code:
sas2flash.efi -o -f 2118it.bin

That's for doing it from the EFI shell. @jgreco is referring to /usr/local/sbin/sas2flash.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
@jgreco: I am happy to help. I always prefer to preserve the important data in a thread like this rather than just pointing to external sources that might not be available later. I find it very interesting that you are advocating for consumer hardware in this regard. I am using these drives in a pool that is not that important to me, and before these issues everything worked fine. I have been using them for quite some time now, and they have lost about 25% of their lifetime, which seems fine to me considering the use case and their price.

That's for doing it from the EFI shell. @jgreco is referring to /usr/local/sbin/sas2flash.
You are completely right, I just copied the command from the mentioned blog post without thinking much. I had a long day and will do the firmware update tomorrow, with a "fresh brain" :D.

Yes, approximately, but just "sas2flash" -- the UNIX executable doesn't have an .efi extension.

If you can avoid having the pool imported when you're doing it, I recommend that. The firmware update causes the controller to reset (this may be obvious), and that may not make ZFS too happy if there's a live pool there.
As mentioned, I will try to do it tomorrow. I think I will boot the machine without drives and reconnect them after the update, just to be safe!
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
I just did the firmware update as suggested, using the built-in sas2flash, and it worked flawlessly. Thanks to both of you!
 