Pool Offline. Keep crashing/rebooting. FreeNAS using HBA

ShoFly

Cadet
Joined
Oct 25, 2020
Messages
1
Version: FreeNAS-11.3-U5
Dell T620. Xeon E5-2670 0 @ 2.60GHz x2. 32GB memory
Dell H310 PCIe - 2x SSDs in RAID1. Running ESXi and FreeNAS.
Dell H310 PCIe flashed to IT mode - connected to 8-bay backplane.

It has been running stable for about a year, but now it has started randomly crashing or rebooting. Most of the time FreeNAS comes back online, but the pool is offline. When FreeNAS doesn't restart, it is usually showing the error 'Doorbell handshake failed'. Sometimes it runs fine for 3-4 days, sometimes only 24 hours. I have to reboot ESXi to get the FreeNAS pool working again. I'm wondering if my HBA (the Dell H310 PCIe flashed to IT mode) needs to be replaced. I've tried reseating the card in another slot. Or could this be a memory issue? I set up Graylog to record what is going on, but I'm not sure if the logs point to anything specific. I turned off the scrub task in case it was related. No errors are showing in iDRAC. The other 2 VMs don't seem to have any issues. Nothing critical is on the disks; this is more for homelab tinkering that I haven't gotten very deep into yet.

Thanks!
 

Attachments

  • FreeNAS dashboard.JPG (88.4 KB)
  • FreeNAS pool offline - VMware logs.txt (233.6 KB)
  • FreeNAS pool offline - Freenas logs 10-24.xlsx (48.2 KB)
  • FreeNAS pool offline - Freenas logs 10-25.xlsx (31.4 KB)

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
First off, FreeNAS crashing is a big deal. It should normally be extremely stable, so I would work on this problem first.

Let's walk this back and confirm how you've got everything set up. Based on your post, it sounds like you have ESXi running on your T620. FreeNAS is a VM in ESXi, and you have the second H310 PCIe card passed through to the FreeNAS VM.

Based on a quick Google search of your "handshake" error, it appears that this may be a hardware error. My first recommended troubleshooting step is reseating the card, and it sounds like you've done that.

I would not think this is a memory issue, though you can easily rule that out with a proper memory test of your system.

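Without more info, based only on what you've provided here, I would guess that your HBA is bad and needs to be replaced. The good news is that, thanks to ZFS, it really doesn't matter which HBA you replace it with, as long as the replacement presents the disks directly to FreeNAS (a plain HBA or IT-mode card, not a RAID controller).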
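
If you want to check whether the handshake error lines up with the crashes, here is a rough sketch that counts how often the message shows up per day in a log export. It assumes you have saved the Graylog or VMware logs as plain text and that every line starts with a date, so the first 10 characters work as a day bucket. The function names and the `prefix_len` parameter are made up for illustration, and the only search string is the 'Doorbell handshake failed' text from your post.

```python
import re
from collections import Counter

# Error text quoted in the original post; matched case-insensitively.
PATTERN = re.compile(r"doorbell handshake failed", re.IGNORECASE)


def count_matches_by_prefix(lines, prefix_len=10):
    """Count lines containing the error, grouped by a fixed-length line prefix.

    Assumption: each line begins with a timestamp, so the first
    `prefix_len` characters identify the day. Adjust for your export format.
    """
    counts = Counter()
    for line in lines:
        if PATTERN.search(line):
            counts[line[:prefix_len]] += 1
    return counts


def scan_file(path, prefix_len=10):
    """Read a plain-text log export and return the per-prefix counts."""
    # errors="replace" keeps the scan going if the export has odd bytes.
    with open(path, errors="replace") as fh:
        return count_matches_by_prefix(fh, prefix_len)
```

Calling `scan_file()` on the exported text file and printing the sorted items gives one count per day. If the error clusters right before each crash, that supports the HBA theory. If it never appears before a crash, I would look harder at memory or the passthrough setup.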
 

badincite

Dabbler
Joined
Aug 10, 2022
Messages
20
Did you ever figure out this issue? I have a similar problem, also with an H310 in IT mode. I covered pins 5 and 6 as well, just in case that was causing it.
 