IOC Fault 0x40000d04, Resetting

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
My old LSI SAS 2008 HBA started causing occasional pool corruption, so I replaced it with an LSI SAS 2308 HBA. Since then I've had no further pool corruption issues. I've also added a fan to provide extra cooling for the 2308, as I suspect overheating is why the 2008 started having issues after many years of perfect service.

But now I'm running into a different issue: sometimes when I restart my NAS, I get the following error shortly after it finishes booting. The controller reset kicks several drives from my pool, which locks up TrueNAS.

Code:
mps0: IOC Fault 0x40000d04, Resetting
mps0: Reinitializing controller
mps0: Firmware: 20.00.07.0, Driver: 21.02.00.00-fbsd


Because it's locked up, I have to force power it off. However, if the box boots and I don't receive the error, everything works perfectly. It will run fine until I need to reboot. AFAICT, there's no pattern to the IOC Fault. Sometimes it happens, sometimes it doesn't. It hasn't caused any pool corruption, and it only seems to happen within a couple of minutes of finishing booting.
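
One way to look for a pattern is to collect the mps lines from saved console output or the system log across several boots and tally them. Below is a minimal Python sketch, assuming the log lines have the same shape as the three shown above. The function name summarize_mps_log and both regular expressions are illustrative only and are not part of any TrueNAS or FreeBSD tool.

```python
import re

# Patterns modelled on the three log lines quoted in this thread (an assumption:
# other driver versions may word these messages differently).
FAULT_RE = re.compile(r"(?P<dev>mps\d+): IOC Fault (?P<code>0x[0-9a-fA-F]+), Resetting")
VERSION_RE = re.compile(r"(?P<dev>mps\d+): Firmware: (?P<fw>[\d.]+), Driver: (?P<drv>\S+)")


def summarize_mps_log(lines):
    """Return (faults, versions) extracted from an iterable of log lines.

    faults   -- list of (device, fault_code_as_int) in the order seen
    versions -- dict mapping device -> (firmware, driver), last value seen
    """
    faults = []
    versions = {}
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            faults.append((fault.group("dev"), int(fault.group("code"), 16)))
            continue
        version = VERSION_RE.search(line)
        if version:
            versions[version.group("dev")] = (version.group("fw"), version.group("drv"))
    return faults, versions


if __name__ == "__main__":
    sample = [
        "mps0: IOC Fault 0x40000d04, Resetting",
        "mps0: Reinitializing controller",
        "mps0: Firmware: 20.00.07.0, Driver: 21.02.00.00-fbsd",
    ]
    faults, versions = summarize_mps_log(sample)
    for dev, code in faults:
        print(f"{dev}: fault code {code:#010x}")
    for dev, (fw, drv) in versions.items():
        print(f"{dev}: firmware {fw}, driver {drv}")
```

Because the patterns use search rather than match, lines that carry a syslog timestamp and hostname prefix should be picked up as well. Comparing the fault counts against boot times or temperature readings is left to the reader.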

Any ideas what the issue could be? AFAIK, 20.00.07.0 is the latest firmware. I'm not sure why the Driver is 21.02.00.00-fbsd.

Thanks.
 

zbrown90

Dabbler
Joined
Feb 13, 2016
Messages
10
I don't have anything to contribute other than that I've experienced this issue with my HP Z840 while using the built-in LSI 2308 controller. My controller is flashed to IT mode and running the same firmware and driver as yours.

I chalked it up to the SMR drives that I have, but I'm not sure that is the issue.

I ended up moving the drives from the HBA to the SATA controller. Eventually I'll buy more drives (SAS this time around) and see if the errors come back.

I'm wondering if it is a problem between the HBA and FreeBSD. I might also migrate from Core to Scale and see if anything changes with that.
 

nomad-fr

Cadet
Joined
Jul 5, 2018
Messages
7
Hi, same trouble here after a reboot on FreeBSD 13.2-RELEASE-p2.
The trouble started on 2023-09-11, but the upgrade was done on 2023-08-03.

Earlier the same day, before these log entries, I had a "critical temperature detected" message, maybe because of a fan issue...

Maybe I should check my fan!
 

zbrown90

Dabbler
Joined
Feb 13, 2016
Messages
10
About a month ago I removed the heatsink from the HBA chip and re-applied thermal paste. I also installed a spare SATA drive (CMR) connected to the HBA and created a dataset and so on in TrueNAS. I haven't had the error come back yet.
 