FreeNAS 11.2-U4.1 & watchdog hard resets => unexpected reboots

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10
Hello to all,

I need some hints to find the root cause of unexpected system reboots due to watchdog hard resets.

The system is FreeNAS 11.2-U4.1 running on a 3U Supermicro SuperServer with an X10DRi motherboard, 2x Intel Xeon E5-2640 v4, 256GB ECC RAM, LSI SAS 9305-16i, 16x HGST SAS 10TB + 2x Intel SSDSC2KB960G8, 2x Intel Optane P4800X SSDPED1K375GA01, 1x Intel X710 Dual SFP+, 1x Mellanox Dual Infiniband ConnectX-2.
FreeNAS serves as an iSCSI multipath target (2 x Infiniband subnet, IPoIB) and as an NFS server (over an LACP lagg, SFP+) for a Proxmox cluster. The system runs on the Intel SSD mirror, the HGSTs form a pool of 8 mirrored VDEVs, and the Optanes are the mirrored SLOG device for the pool.
Whenever a reset/reboot happens, the VMs run into trouble and have to be rebooted.

At the moment there is at least one reset a day.
After a reboot there is absolutely nothing interesting in the FreeNAS system logs.
The IPMI event log contains entries like the following, which is all I have at the moment:

2019/05/27 23:58:13 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/27 23:58:14 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/29 17:53:58 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/29 17:53:59 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/31 22:12:05 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/31 22:12:06 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/31 22:40:02 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/31 22:40:03 #0xca Watchdog 2 Hard Reset - Assertion
...
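To see whether the resets follow any pattern, the event log entries above can be reduced to the gaps between consecutive hard resets. The following is only a minimal sketch: it assumes the log lines have the exact layout shown above (date, time, sensor ID, text), and the names `SEL_LINES`, `parse_resets` and `gaps` are made up for illustration.

```python
from datetime import datetime

# The IPMI event log lines quoted above, pasted in as plain text.
SEL_LINES = """\
2019/05/27 23:58:13 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/27 23:58:14 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/29 17:53:58 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/29 17:53:59 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/31 22:12:05 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/31 22:12:06 #0xca Watchdog 2 Hard Reset - Assertion
2019/05/31 22:40:02 #0xca Watchdog 2 Timer Interrupt - Assertion
2019/05/31 22:40:03 #0xca Watchdog 2 Hard Reset - Assertion
""".splitlines()


def parse_resets(lines):
    """Return the timestamps of all 'Hard Reset' entries, in log order."""
    resets = []
    for line in lines:
        if "Hard Reset" not in line:
            continue
        # The first two whitespace-separated fields are the date and the time.
        date_part, time_part = line.split()[:2]
        stamp = datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
        resets.append(stamp)
    return resets


def gaps(resets):
    """Return the time elapsed between each pair of consecutive resets."""
    return [later - earlier for earlier, later in zip(resets, resets[1:])]


if __name__ == "__main__":
    resets = parse_resets(SEL_LINES)
    for stamp, gap in zip(resets[1:], gaps(resets)):
        print(f"{stamp}  reset after {gap}")
```

Comparing these gaps with cron jobs, scrubs, backup windows or load peaks on the Proxmox side might show whether the resets correlate with a particular workload.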

Watchdog is disabled in BIOS.
I created an entry in System / Tunables: "watchdogd_enable" with the value "NO" and type rc.conf. "/etc/rc.d/watchdogd onestatus" reports that watchdogd is not running, "ipmitool sdr" reports that everything is OK, and "ipmitool chassis status" says all is fine.

# ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x04)
Watchdog Timer Is: Stopped
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 274 sec
Present Countdown: 0 sec
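The output above can be turned into key/value pairs to make the individual fields easier to compare between runs. This is a rough sketch, not a reference decoder: the function names are hypothetical, and the bit-to-name mapping for "Timer Expiration Flags" is my reading of the Get Watchdog Timer command in the IPMI 2.0 specification, so it should be checked against the specification before relying on it.

```python
# Output of "ipmitool mc watchdog get" as quoted above.
WATCHDOG_OUTPUT = """\
Watchdog Timer Use: SMS/OS (0x04)
Watchdog Timer Is: Stopped
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 274 sec
Present Countdown: 0 sec
"""

# ASSUMPTION: bit positions of the timer-use expiration flags as I understand
# them from the IPMI 2.0 specification; verify before drawing conclusions.
EXPIRATION_BITS = {
    1: "BIOS FRB2",
    2: "BIOS/POST",
    3: "OS Load",
    4: "SMS/OS",
    5: "OEM",
}


def parse_watchdog(text):
    """Split each 'key: value' line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def decode_expiration_flags(value):
    """Return the names of all timer uses whose expiration bit is set."""
    flags = int(value, 16)
    return [name for bit, name in EXPIRATION_BITS.items() if flags & (1 << bit)]


if __name__ == "__main__":
    fields = parse_watchdog(WATCHDOG_OUTPUT)
    print("Timer state:", fields["Watchdog Timer Is"])
    print("Action:     ", fields["Watchdog Timer Actions"])
    print("Expired use:", decode_expiration_flags(fields["Timer Expiration Flags"]))
```

If the mapping holds, a flag value of 0x10 would point at the SMS/OS timer use, i.e. a timer armed from the running operating system side rather than by the BIOS.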

All server parts are new except the Mellanox card. The hardware has been tested and works, yet the system keeps resetting and I have no clue why. The ipmi.ko kernel module is still loaded.
I assume the cause is one of the following: an incompatibility with FreeBSD/FreeNAS (maybe the Mellanox driver), a hardware fault (but where, if everything is green?), a bug somewhere in the FreeBSD IPMI code, or something completely unknown.
The question for me is: what is triggering the reset and how could I find it?

Does anybody have an idea how to narrow this down? Should I disable the IPMI kernel module and check whether the server freezes?

Thank you in advance.

Ralf
 

dlavigne

Guest
Go ahead and create a report at bugs.ixsystems.com and post the issue number here.
 