
Doorbell handshake failed


jkingaround

Neophyte
Joined
Sep 12, 2016
Messages
9
Hi all. I woke up to an unresponsive FreeNAS, and upon reboot I was greeted by a slew of "mps0: mps_wait_db_ack: failed due to timeout count(10000)" and "mps0: Doorbell handshake failed" messages. I thought this might be due to the upgrade to U7, so I booted to U6, but I still get the same error. It just loops continuously. I can't find anything about this anywhere online. Any ideas?
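For anyone comparing notes on how often these two messages show up, here is a minimal sketch that counts them in saved console or boot log text. The message patterns come straight from the lines quoted above; the function name `count_mps_errors` and the two labels are made up for illustration, and the script only counts occurrences, it does not diagnose anything.

```python
import re
from collections import Counter

# Patterns based on the two messages quoted in the post.
# The controller name (mps0, mps1, ...) is captured so that
# several HBAs can be told apart.
PATTERNS = {
    "db_ack_timeout": re.compile(r"(mps\d+): mps_wait_db_ack: failed due to timeout"),
    "handshake_failed": re.compile(r"(mps\d+): Doorbell handshake failed"),
}


def count_mps_errors(log_text):
    """Count matching lines, keyed by (controller, error label)."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), label)] += 1
    return counts
```

To use it, paste the captured log into a string (or read it from a file) and pass it to `count_mps_errors`; the returned counter shows which controller produced each message and how many times.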
 


Meyers

Member
Joined
Nov 16, 2016
Messages
210
Please list full hardware details. See my signature for an example.
 

jkingaround

Neophyte
Joined
Sep 12, 2016
Messages
9
Case: SUPERMICRO 4U 846E16-R1200B
Mobo: X8DTE-F
RAM: 32 GB ECC
CPU: Dual Intel XEON L5520
Storage: 16 x 4TB WD Red RAID Z2 (2 pools of 8x each), 1 x 256GB Samsung 840 EVO SSD for boot
PSU: Corsair TX750
 

Elliot Dierksen

Neophyte Sage
Joined
Dec 29, 2014
Messages
967
Google finds several references to that message, and most of them are tied to hardware faults. The most likely suspects are the card or the PCIe slot in the motherboard, with the former being the more probable, IMHO.
 

jkingaround

Neophyte
Joined
Sep 12, 2016
Messages
9
How can I check the health of the hardware?
 

Elliot Dierksen

Neophyte Sage
Joined
Dec 29, 2014
Messages
967
Not to do too much invoking of Capt. Obvious, but it was working and now it isn't. If it happens on both the new and old versions, that pretty clearly points to hardware. There might be some diagnostics you could do on the board, but I would start planning for replacing the HBA. If the HBA is built into the motherboard, perhaps you need to replace that. Sorry to be the bearer of bad news...
 

Meyers

Member
Joined
Nov 16, 2016
Messages
210
You didn't list it, but you're using an LSI HBA? If so, maybe start by taking it out and reseating it. If that doesn't work you'll probably need a new one.
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,705
As an interesting note, I'm seeing this on a virtualized FreeNAS after having increased VM memory from 64GB to 128GB. It works for a few hours under heavy load and then dies. Lowering the RAM back down and restarting doesn't fix it; the hypervisor actually needs to be rebooted. Then it's all fine again, so it doesn't appear to be hardware.
 

pmccabe

Junior Member
Joined
Feb 18, 2013
Messages
18
This is now happening to me after I upgraded to the latest version, TrueNAS 12.0U1. I have been running just fine in a virtualized environment with 128GB of RAM for many years. After the upgrade, it crashes after a few hours. As jgreco mentions, the only way to fix this is to reboot the hypervisor.

I initially tried ESXi 6, then updated to the latest 7.0.1, and still have the same issue. Sucks that I now have to limit myself to 64GB of RAM :(
 

starnes892

Newbie
Joined
Jan 2, 2021
Messages
1
I have been running TrueNAS 12 in ESXi 6.7 for 40 days and have been testing to confirm my setup is stable. I moved to TrueNAS 12.0U1 last night, and this morning TrueNAS is throwing this error on the HBA card. If I remove the HBA from the passthrough to the VM, it fires right up. Smells like an issue with the new update.
 

pmccabe

Junior Member
Joined
Feb 18, 2013
Messages
18
Do you have more than 64GB of RAM assigned to your VM? If so, can you try reducing it to 64GB and see if the issue persists?
 