SOLVED: IOC Fault after upgrade to 12 U2

Elo

Contributor
Joined
Mar 11, 2012
Messages
122
I have two TrueNAS servers running 12-U2 (40 HDDs). They are in a domain controlled by a Windows 2016 Server acting as ADC. One server is running OK after some trouble with time sync to the domain. The other is giving me some trouble.

I am in the process of tidying up and rearranging my data and need to copy large amounts of data between the two servers. When I copy to one of them, the server dies or behaves erratically. After some searching I found that one of the disk controllers is resetting again and again. (Info on the device and the error message follow.)

mps0: <Avago Technologies (LSI) SAS2008> port 0xe000-0xe0ff mem
0xdf4c0000-0xdf4c3fff,0xdf480000-0xdf4bffff irq 17 at device 0.0 on pci2
mps0: Firmware: 20.00.04.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc

Note: The controller is several years old but has been running without any problems for many years, and my systems have 4 identical ones.

Feb 18 19:07:41 OB-NAS-2 mps0: IOC Fault 0x40001500, Resetting
Feb 18 19:07:41 OB-NAS-2 mps0: Reinitializing controller
Feb 18 19:07:41 OB-NAS-2 mps0: Firmware: 20.00.04.00, Driver: 21.02.00.00-fbsd
Feb 18 19:07:41 OB-NAS-2 mps0: IOCCapabilities:
1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

I am of course afraid of destroying data as I do not know the nature of the IOC Fault.

I had no problems with this controller before upgrading from FreeNAS to TrueNAS, but that could be a coincidence. I need to determine the cause, but I have no idea how to debug this further. Any advice from you superusers?
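One low-risk way to start debugging is to tally how often each controller faults and with which code, so you can see whether the resets follow one card. Below is a minimal Python sketch, assuming the log lines look like the ones quoted above; the function name `count_ioc_faults` and the regular expression are illustrative choices, not part of TrueNAS or the mps driver.

```python
import re
from collections import Counter

# Matches lines such as:
#   Feb 18 19:07:41 OB-NAS-2 mps0: IOC Fault 0x40001500, Resetting
# The pattern is an assumption based on the log excerpts in this thread.
FAULT_RE = re.compile(r"\b(mps\d+): IOC Fault (0x[0-9a-fA-F]+)")


def count_ioc_faults(lines):
    """Return a Counter keyed by (controller, fault_code)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            controller = match.group(1)
            fault_code = match.group(2).lower()
            counts[(controller, fault_code)] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle on the system log (on FreeBSD-based systems that is usually /var/log/messages, but treat the path as an assumption for your install). If all faults land on one `mpsN` device, that points at the card or its slot rather than the driver.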
 

Elo

Contributor
Joined
Mar 11, 2012
Messages
122
Seems to be a HW problem or some incompatibility: I switched the controllers and the problem disappeared. Will replace the controller.
 

atlantic

Explorer
Joined
Jan 9, 2020
Messages
52
Hi, I'm having this issue too. Similarly, it has only occurred since upgrading to TrueNAS 12 U2 from a FreeNAS 11.3 install that had run flawlessly for about a year. Same LSI chip (Dell H200e).

I thought the controller was overheating, so I replaced the thermal paste. It was OK for 2 days, but now the resets are back.

freenas mps1: IOC Fault 0x40007e23, Resetting
freenas mps1: Reinitializing controller
freenas mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
freenas mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

I even flashed the card to 20.00.07.00 as you can see, still no luck.

Is yours still running well after replacing the card? (My web search comes up with threads suggesting it could be a FreeBSD driver bug.)
 

Elo

Contributor
Joined
Mar 11, 2012
Messages
122
Hi... I have not got the new card yet as I ordered it from China. I do have two servers with several cards similar to the one I have trouble with, and they are working. I have had a lot of problems since the upgrade and my pain is not over: I have lost the connection to my AD, which is a big issue for me since I cannot rebuild it YET... When I get the new card I will post an update.

I do find it strange, though, that all these problems occurred after the update on systems that have run without issues for years.

Please share the info on the driver bug!
 

atlantic

Explorer
Joined
Jan 9, 2020
Messages
52
Thanks for the reply, really sorry to hear you're not out of the woods yet. I can answer my own question and maybe offer some hope. I bought a replacement card and it has so far cured all the SCSI errors and mps resets; the pool scrubbed with no problems. I will need to run it for a week or so to be certain, though. I very much hope the timing with TN12 is just a coincidence. I'll have to dig out and post the links to some of the driver bug material I found, even if it turns out to be a faulty card in the end.
 

Elo

Contributor
Joined
Mar 11, 2012
Messages
122
Thanks. I'm waiting for my replacement card, due here in a week.
 