Doorbell handshake failed

jkingaround

Dabbler
Joined
Sep 12, 2016
Messages
13
Hi all. Woke up to an unresponsive FreeNAS and upon reboot I am greeted by a slew of "mps0: mps_wait_db_ack: failed due to timeout count(10000)" and "mps0: Doorbell handshake failed". I thought this might be due to the upgrade to U7 so I booted to U6 but still have the same error. It just loops continuously. I can't find anything about this anywhere online. Any ideas?
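
For anyone wanting to confirm how often the controller is logging these errors, here is a minimal sketch that counts the two messages quoted above in saved console or dmesg text. The function name `count_doorbell_errors` and the sample text are hypothetical, made up for illustration; only the two message strings come from this post.

```python
import re

# Patterns built from the two messages quoted in the post above.
DOORBELL_PATTERNS = (
    re.compile(r"(?P<dev>mps\d+): mps_wait_db_ack: failed due to timeout count\(\d+\)"),
    re.compile(r"(?P<dev>mps\d+): Doorbell handshake failed"),
)


def count_doorbell_errors(log_text):
    """Return {device: number of matching lines} for the given log text."""
    counts = {}
    for line in log_text.splitlines():
        for pattern in DOORBELL_PATTERNS:
            match = pattern.search(line)
            if match:
                dev = match.group("dev")
                counts[dev] = counts.get(dev, 0) + 1
                break  # count each line at most once
    return counts


# Illustrative sample only, assembled from the quoted messages.
sample = """\
mps0: mps_wait_db_ack: failed due to timeout count(10000)
mps0: Doorbell handshake failed
mps0: mps_wait_db_ack: failed due to timeout count(10000)
mps0: Doorbell handshake failed
"""

print(count_doorbell_errors(sample))
```

A non-zero count for a device only confirms that the driver is timing out while talking to that controller; it does not by itself say whether the cause is the card, the slot, or something else.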
 

Attachments

  • Screen Shot 2019-03-17 at 11.41.24 AM.png

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
Please list full hardware details. See my signature for an example.
 

jkingaround

Dabbler
Joined
Sep 12, 2016
Messages
13
Case: SUPERMICRO 4U 846E16-R1200B
Mobo: X8DTE-F
RAM: 32 GB ECC
CPU: Dual Intel XEON L5520
Storage: 16 x 4TB WD Red RAID Z2 (2 pools of 8x each), 1 x 256GB Samsung 840 EVO SSD for boot
PSU: Corsair TX750
 
Joined
Dec 29, 2014
Messages
1,135
Google finds several references to that message, and most of them are tied to hardware faults. The most likely suspects are the card or the PCIe slot in the motherboard, with the card being the more probable of the two, IMHO.
 

jkingaround

Dabbler
Joined
Sep 12, 2016
Messages
13
How can I check the health of the hardware?
 
Joined
Dec 29, 2014
Messages
1,135
Not to do too much invoking of Capt. Obvious, but it was working and now it isn't. If it happens on both the new and old version, that pretty clearly points to hardware. There might be some diagnostics you could run on the board, but I would start planning to replace the HBA. If the HBA is built into the motherboard, you may need to replace the motherboard instead. Sorry to be the bearer of bad news...
 

Meyers

Patron
Joined
Nov 16, 2016
Messages
211
You didn't list it, but you're using an LSI HBA? If so, maybe start by taking it out and reseating it. If that doesn't work you'll probably need a new one.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
As an interesting note, I'm seeing this on a virtualized FreeNAS after having increased VM memory from 64GB to 128GB. It works for a few hours under heavy load and then dies. Lowering the RAM back down and restarting doesn't fix it; the hypervisor actually needs to be rebooted. Then it's all fine again, so it doesn't appear to be hardware.
 

pmccabe

Dabbler
Joined
Feb 18, 2013
Messages
18
This is now happening to me after I upgraded to the latest version, TrueNAS 12.0-U1. I had been running just fine in a virtualized environment with 128GB of RAM for many years. After the upgrade, it crashes after a few hours. As jgreco mentions, the only way to fix this is to reboot the hypervisor.

I initially tried ESXi 6, then updated to the latest 7.0.1, and still have the same issue. It sucks that I now have to limit myself to 64GB of RAM :(
 

starnes892

Cadet
Joined
Jan 2, 2021
Messages
1
I have been running TrueNAS 12 in ESXi 6.7 for 40 days, testing to confirm my setup is stable. I moved to TrueNAS 12.0-U1 last night, and this morning TrueNAS is throwing this error on the HBA card. If I remove the HBA passthrough from the VM, it fires right up. Smells like an issue with the new update.
 

pmccabe

Dabbler
Joined
Feb 18, 2013
Messages
18
Do you have more than 64GB of RAM assigned to your VM? If so, can you try reducing it to 64GB and see if the issue persists?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I know this thread is a little old... but it's the first thing that popped up when I ran into this problem tonight and searched the Interwebs.

I've been running FreeNAS virtualized for years. My main server since 2017 (see 'Bandit' in my systems below) started out with 128GB of RAM, of which I allocated 64GB to the FreeNAS VM. Years of uneventful service go by... Last September I added another 128GB and increased the FreeNAS VM's allocation to 128GB. Months of uneventful service go by... Then a few days ago I bumped it up to 192GB and tonight I got the 'Doorbell Handshake'. Yikes!

Ah, well. Guess I was reaching too far. I've set it back to 128GB and rebooted the server. We'll see what happens.
 

hernanbozzano

Dabbler
Joined
Aug 3, 2018
Messages
15
Hi there! I am having the same problem. My theory is that it is related to the latest TrueNAS releases. I am currently running TrueNAS-12.0-U6 as a VM on ESXi 6.7. My virtual machine works great, but if I edit it (to add more RAM), it crashes after a few hours. When I power it on again, it says something like "doorbell handshake failed" when booting with the disks connected to the LSI 9211-8i via passthrough. The only workaround is to reboot the whole system, but after a few hours it crashes again. I tried creating a new VM from scratch (with more RAM) and again, after a few hours, it crashes. For now I am stuck with the old VM I created a few months ago, which was 12.0-U2 or something like that and which I have been updating up to U6.

My setup: SUPERMICRO X11SSM-F, INTEL XEON E3-1230v6, KINGSTON KVR24E17S8 8GB x4, LSI 9211-8i. I don't believe it's a hardware issue, because it works correctly with my old VM. The problem comes up when I edit it, or create a new one and pass the PCI device through to it. Let me say it again: this is my THEORY, based on my experience and what I have been seeing so far.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I have only experienced this problem with FreeNAS 11.2-U8, so I'm not convinced it's tied to the FreeNAS/TrueNAS version.

My theory is that the trigger seems to be allocating more than half the ESXi server's RAM to the FreeNAS/TrueNAS VM. I've been running my 11.2-U8 VM with 128GB of RAM on a 256GB server for nearly a year now with no problems, and for years before that with 64GB of 128GB (I added RAM last year). And that's with 3 x LSI HBAs passed through to the VM -- LSI SAS-9210's at first, now LSI SAS-9217's. (See 'BANDIT' in 'my systems' below).

I only saw this error when I tried to bump the VM's RAM up to 192GB after adding RAM last year.
 

hernanbozzano

Dabbler
Joined
Aug 3, 2018
Messages
15
I don't think it is related to allocating more than half of the entire RAM. I have 32GB of RAM (about 28GB usable) and my TrueNAS VM is using 10GB right now. When I edit the TrueNAS VM to 12GB, it crashes, and that is still well below 50% (which would be about 14GB). Perhaps it is some kind of bug...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Spearfoot said: "My theory is that the trigger seems to be allocating more than half the ESXi server's RAM to the FreeNAS/TrueNAS VM."

I've got some hosts with 256GB on which I have to keep the FreeNAS VMs at 32GB instead of 64GB, so this seems to be invalid.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
So it's not the version of FreeNAS/TrueNAS, and it's not using more than 50% of system RAM... so we don't know what it is! Drat!
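
The data points reported so far can be tabulated to check the "more than half of host RAM" theory. This is only a sketch: the numbers are taken from the posts above, the names `REPORTS` and `contradicts_half_ram_theory` are hypothetical, and the jgreco 64GB failure is inferred from his statement that he has to stay at 32GB rather than explicitly reported.

```python
# (reporter, host RAM in GB, VM RAM in GB, doorbell failure seen?)
# Values as reported in the posts above; reports without a stated
# host RAM size are left out.
REPORTS = [
    ("Spearfoot", 128, 64, False),
    ("Spearfoot", 256, 128, False),
    ("Spearfoot", 256, 192, True),
    ("hernanbozzano", 32, 10, False),  # 32GB installed, about 28GB usable
    ("hernanbozzano", 32, 12, True),
    ("jgreco", 256, 32, False),
    ("jgreco", 256, 64, True),  # inferred: "have to keep them at 32GB"
]


def contradicts_half_ram_theory(host_gb, vm_gb, failed):
    """True if the report disagrees with 'fails only above half of host RAM'."""
    predicted_failure = vm_gb > host_gb / 2
    return predicted_failure != failed


counterexamples = [r for r in REPORTS if contradicts_half_ram_theory(*r[1:])]

for name, host_gb, vm_gb, failed in counterexamples:
    share = 100 * vm_gb / host_gb
    print(f"{name}: {vm_gb}GB of {host_gb}GB ({share:.0f}%) failed={failed}")
```

Both counterexamples are failures well below the 50% mark, which is why the theory does not hold up against the reports in this thread.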
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I've been slowly updating hypervisors with FreeNAS HBA's to LSI 9300 based stuff, in preparation for an eventual migration to ESXi 7, where the 9211's don't fare well. I'm kinda hoping that fixes it, but so far I don't think I've tried large RAM against any of them...
 

hernanbozzano

Dabbler
Joined
Aug 3, 2018
Messages
15
Hi there! I have been testing for 48 hours, running a new VM as a workaround: I installed TrueNAS 12.0-U1, updated to U6, and then added the PCI device (LSI 2008) with the drives. The VM is working fine. (If I install U6 directly, after a few hours the VM suddenly shuts down, and it says "doorbell handshake failed" when I boot.) I am still testing it, but I believe this might be a workaround, and that there is some kind of bug in the TrueNAS ISO when running on ESXi.

English is not my native language, and this is very technical... If I don't explain myself clearly, please let me know!
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
That's an interesting data point, but 48 hours may not be a long enough test to prove reliability.
 