Help Diagnosing Hardware Errors

lostinak

Cadet
Joined
Oct 21, 2015
Messages
6
I am randomly getting the following output on the console and could use some help diagnosing the issue. Is my M1015 HBA card (in IT mode) slowly dying, or is the PCI slot itself going out?

Dec 26 08:52:53 odin mps0: IOC Fault 0x40007e23, Resetting
Dec 26 08:52:53 odin mps0: Reinitializing controller
Dec 26 08:52:53 odin mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Dec 26 08:52:53 odin mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Dec 26 08:52:54 odin mps0: Error reading device 0xf SATA PASSTHRU; iocstatus= 0x47
Dec 26 08:52:54 odin mps0: Error reading device 0xf SATA PASSTHRU; iocstatus= 0x804b
Dec 26 08:52:54 odin mps0: Sleeping 3 seconds after SATA ID error to wait for spinup

Please let me know if anyone needs any additional information - I am not very well versed in diagnosing hardware issues. This is on TrueNAS Core, 12.0 U8.1, if that makes a difference.

Thanks in advance for any help!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
How's the cooling for the card? They do need a decent amount of airflow. New thermal paste might also be in order if it's never been changed.

If it turns out not to be a cooling issue... Well, things get harder to figure out, so let's start with the simple stuff first.
 

lostinak

Cadet
Joined
Oct 21, 2015
Messages
6
Cooling is decent - not great but not terrible. This is a 4U server chassis with 24 hot swap bays up front - the hard drives in the bays sit around 36-38 usually and I leave the fans on high as noise is not an issue.

I don't have access to thermal paste immediately unfortunately.

Edit:
Thinking about the cooling - I think you are on to something. I am currently badblocking 3 drives and transferring some data around so the cards are working pretty hard. Might be good to wait and see before tearing things apart.
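
For the "wait and see" part, a rough way to check whether the resets line up with the heavy load is to count the "IOC Fault" lines per hour in the system log. The following is only a minimal sketch, assuming the log lines look exactly like the ones pasted above; the function names `count_ioc_faults` and `format_report` are made up for illustration and are not part of any TrueNAS tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   Dec 26 08:52:53 odin mps0: IOC Fault 0x40007e23, Resetting
# Assumption: the standard syslog prefix "Mon DD HH:MM:SS host" seen in the post.
FAULT_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"mps\d+: IOC Fault (?P<code>0x[0-9a-fA-F]+)"
)


def count_ioc_faults(lines):
    """Count IOC Fault events per (month, day, hour), in log order."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.match(line)
        if match:
            key = (match["month"], int(match["day"]), int(match["hour"]))
            counts[key] += 1
    return counts


def format_report(counts):
    """Turn the per-hour counts into printable text lines."""
    return [
        f"{month} {day:2d} {hour:02d}:00  {n} fault(s)"
        for (month, day, hour), n in counts.items()
    ]
```

To use it, pass any iterable of log lines, for example an open file handle on the system log (assumed here to be `/var/log/messages`), and print the result of `format_report`. If the faults cluster in the hours when badblocks and the transfers were running and stop afterwards, that would point toward load or heat rather than a dying card, though it would not prove it.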
 