Boot Stuck "mps0: Reinitializing controller" after 13.0-U3 upgrade

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
System & History:
FreeNAS/TrueNAS virtualized on ESXi
Dell H310 (SAS2008) flashed to IT mode, FW ver 20.00.07.00, passed through to the FreeNAS VM
System has been running stable for 4 years across the following versions:
11.1u4 -> 11.1u6 -> 11.1u7 -> 11.3u5 -> 12.2u8.1

Today I went to upgrade to 13.0u3 and it hangs on boot with the following error:
Code:
mps0: Calling Reinit from mps_wait_command, timeout=60, elapsed=60
mps0: Reinitializing controller


I see there have been quite a few posts about this hang here, but none recently on TrueNAS 13. I reverted to 12.2u8 and it boots fine. I then deleted the v13 boot environment and applied the upgrade a second time, with the same results. One post reported that it would boot successfully after a power cycle and then fail to boot, but that's not my situation.
Has anyone made any progress on determining the cause of this issue?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What version ESXi?

There is some evidence that the SAS2008 (though the H310 is a SAS2308) has some weird problems with ESXi 7 and beyond, possibly due to the deprecation of the LSI MPS driver in ESXi 7. That is definitely related to the "boot successfully after power cycle, then not boot" behaviour you're describing, but may not be the problem you're experiencing.

I've had good experience just replacing afflicted controllers with SAS3008 based controllers, although that implies recabling as well.
 

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
Sorry, I should have included that. This host is still on ESXi 6.0.
I am not able to get a successful boot at all on 13.0u3, regardless of host boot state, so that behavior (especially if it only occurs on ESXi 7+) is not likely related to my scenario. I find it odd that a few others reported this issue on 11 and 12, and yet I cruised through those versions with no issues... Lucky me :)

Jgreco, I'm not one to question your knowledge but are you sure the H310 was a SAS2308 or maybe it could be either? Because my memory says it was a SAS2008 controller.

edit:
confirmed mine is a SAS2008 as per sas2flash.
Code:
root@freenas:~ # sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2008(B2)  

        Controller Number              : 0
        Controller                     : SAS2008(B2)  
        PCI Address                    : 00:03:00:00
        SAS Address                    : 5d4ae52-0-9a5b-4700
        NVDATA Version (Default)       : 14.01.00.08
        NVDATA Version (Persistent)    : 14.01.00.08
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9211-8i
        BIOS Version                   : N/A
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9211-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
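
For anyone comparing controllers across several hosts, here is a minimal sketch of pulling the chip and firmware fields out of `sas2flash -list` output like the above. The function name `parse_sas2flash_list` and the `SAMPLE` string are hypothetical, and it assumes the "Key : Value" row layout shown in this post; other sas2flash versions may print things differently.

```python
import re


def parse_sas2flash_list(text: str) -> dict[str, str]:
    """Collect the 'Key : Value' rows printed by `sas2flash -list`."""
    fields = {}
    for line in text.splitlines():
        # Rows look like "Firmware Version               : 20.00.07.00".
        # The separator is a colon with whitespace on both sides, so the
        # banner line "Adapter Selected is a LSI SAS: SAS2008(B2)" is skipped.
        match = re.match(r"^\s*([^:]+?)\s+:\s+(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Sample rows copied from the output above (assumed layout).
SAMPLE = """
        Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 0
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:03:00:00
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
"""

info = parse_sas2flash_list(SAMPLE)
print(info["Controller"], info["Firmware Version"])
```

The same dictionary also carries `Firmware Product ID`, which is where the "(IT)" marker shows up in this listing.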
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Jgreco, I'm not one to question your knowledge but are you sure the H310 was a SAS2308 or maybe it could be either? Because my memory says it was a SAS2008 controller.

Feel free to question away. I was under the impression that it was a 2308. Maybe not. But then I'm not sure what the difference between the H200 and H310 is, except for connector location. Either way, I've slowly been replacing all the 6Gbps SAS controllers with 12Gbps, and some of that has to do with the weird hangups ESXi 7 seems to have with LSI 6Gbps stuff.
 

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
Haha, I will always question myself standing next to you :tongue: - No Pressure :wink:
But my google-fu confirms that the H200 and H310 are both 2008 chips, just with different connector locations. The 2308 is the Dell H22* and the 3008 is the Dell HBA35*.

That's good to know about the issues ESXi 7 has with the older LSI 6Gbps cards. I wasn't aware, but would likely have run into it sooner or later. So for the moment, still being on ESXi 6.0, as far as you know there shouldn't be any compatibility issues between my setup and 13.0u3? The fact that it works perfectly on 12.2u8 makes me lean towards an issue in the 13.0 code.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Virtualization has always been a bit dicey; over the years, I've found sensitivities to MSI/MSI-X issues, number of assigned CPU cores, and various other crap. If you are thinking that it is just expected to work, that's just not the way it is. It does seem to have gotten somewhat better as the years have rolled on and ESXi has supported it better. I do know there was a bunch of stuff that kinda changed in FreeBSD 13 that somewhat affected VMs and necessitated some tweaks such as deprecating older ESXi VM hardware versions, so moving forward with FreeBSD on an older ESXi version might very well have some weird problems too.

Is there some particular reason you're still on ESXi 6.0? Even 6.7 is deprecated at this point.
 

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
Oh, I remember. In fact, I remember feeling like I was one of the first to buck the anti-VM trend back in 2018 when I was a FreeNAS greenhorn. You and CyberJock made sure I knew all too well the risks, and most of the documentation was far bolder about discouraging virtualized instances. I've been very happy that my carefully planned implementation has been so stable over the years. That is one of the reasons ESXi is still on 6.0: I was reluctant to make big changes until I had a new TrueNAS platform deployed. That, and my attention around the home and my lab has been focused on other areas the last few years.

I now have TrueNAS SCALE running on hardware as my primary NAS, and I was relegating this virtual instance to be an isolated standby NAS that I would periodically sync.

Based on a few other forums I've read, I suspect you are correct that it's changes in FreeBSD 13 that are incompatible with either the passthrough on ESXi 6.0 or VM hardware version 11.

If anyone is running the same setup, I'd be curious whether you are experiencing the same behaviour.
 

socra

Dabbler
Joined
Nov 3, 2018
Messages
34
Definitely some weird stuff going on with FreeBSD 13 and ESXi/Passthrough : (threads I found)
 

SGT_GUO

Dabbler
Joined
Sep 6, 2019
Messages
11
I was wondering if there's a solution for TrueNAS Core? I have recently upgraded to TrueNAS Scale, but it has been acting weird lately: some drives show as part of a pool on the pool status page, but as not in a pool on the disks page. Mine hangs after detecting the NVMe adapters in the boot sequence, so it's not the SAS adapters. I am using a Dell R750, ESXi 8.0, and an HBA355e connected to a Dell ME484, with two Optane drives as log devices.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I think you need to open a new thread and explain your issue in detail, because you're saying you think it has nothing to do with what's reported here.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Definitely some weird stuff going on with FreeBSD 13 and ESXi/Passthrough : (threads I found)

Yeah, I found it easier to just drop using the 2008 and 2308 controllers in favor of the 3008. Some people suggested that the 2308 solved their issues, and as far as I know, the 3008 has solved it for everyone.
 