X9 Watchdog Issues

Status
Not open for further replies.

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
I just had another shutdown (not a restart), and the only IPMI messages are about the watchdog. Has the X9 watchdog issue been addressed?

Code:
Event ID	Time Stamp	Sensor Name	Sensor Type	Description
1	2016/03/02 05:15:40	Watchdog 2 #0xca	Watchdog 2	Timer Interrupt - Asserted
2	2016/03/02 05:15:41	Watchdog 2 #0xca	Watchdog 2	Hard Reset - Asserted
3	2016/03/02 05:18:51	OEM	AC Power On	AC Power On - Asserted (I reset the power by hand here; the PSU has to be cycled before the box will fire up again)


Any help, info, or things to check would be much appreciated!

Thanks!
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
@cyberjock did you ever get a resolution? I think you had issues with the watchdog on the X9 series?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Honestly, I can't remember if *I* had issues with the watchdog timer. But last time I checked (and this has been confirmed by at least 2 or 3 individuals in the forums over the last few years), the watchdog timers on at least some X9s/Supermicro boards are NOT being reset regularly by the software that is supposed to service them, which the watchdog function needs in order to work properly. As a result, reboots should be expected, and the only real solution is to disable the watchdog timer and, optionally, put in a bug ticket asking for it to be supported. If it isn't supported, it's very likely because there is no FreeBSD-compatible code for the particular watchdog timer on that board, in which case there is no 'easy fix' and you simply should not attempt to use the watchdog function at all.

In your case, the only solution I can recommend is to disable the watchdog timer. My own server is busy right now so I can't reboot it, but I always disable the watchdog timers on FreeNAS systems I build for friends, because enabling the watchdog seems more likely to create problems than to help in the rare scenario where the system freezes completely (that's the scenario the watchdog timer is supposed to recover from automatically).
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Exactly. Thank you for the confirmation. I double-checked the other day and it's disabled in the BIOS, but then I got the reboot again (at random), so I think it's related to the bug you're talking about. Maybe the software side is still working somehow, lol. Is there a way to disable it on the FreeNAS side as well? Would watchdogd_enable="no" as a tunable work?
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Mm.. I'm at a loss as to what's causing it. What's odd is that it's a shutdown and not a reboot. IPMI logs it:

Code:
ipmitool sel elist | more
   1 | 03/02/2016 | 05:15:40 | Watchdog 2 #0xca | Timer interrupt () | Asserted
   2 | 03/02/2016 | 05:15:41 | Watchdog 2 #0xca | Hard reset () | Asserted
   3 | 03/02/2016 | 05:18:51 | Unknown #0xff |  | Asserted
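
In case it helps anyone comparing logs, here is a rough Python sketch that pulls the watchdog entries out of a saved `ipmitool sel elist` dump like the one above. It assumes the six pipe-separated columns shown here, a MM/DD/YYYY date, and a hexadecimal record ID; the names `parse_sel_elist`, `watchdog_events`, and `SAMPLE_SEL` are just made up for illustration.

```python
from datetime import datetime

# Sample copied from the SEL dump in this post.
SAMPLE_SEL = """\
   1 | 03/02/2016 | 05:15:40 | Watchdog 2 #0xca | Timer interrupt () | Asserted
   2 | 03/02/2016 | 05:15:41 | Watchdog 2 #0xca | Hard reset () | Asserted
   3 | 03/02/2016 | 05:18:51 | Unknown #0xff |  | Asserted
"""


def parse_sel_elist(text):
    """Turn `ipmitool sel elist` lines into dicts; lines that don't fit are skipped."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue
        rec_id, date, time, sensor, event, direction = fields
        try:
            # Assumption: the record ID column is hexadecimal.
            record = int(rec_id, 16)
            # Assumption: dates are MM/DD/YYYY, as in the dump above.
            when = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue
        events.append({
            "record": record,
            "when": when,
            "sensor": sensor,
            "event": event,
            "direction": direction,
        })
    return events


def watchdog_events(events):
    """Keep only the entries whose sensor name starts with 'Watchdog'."""
    return [e for e in events if e["sensor"].startswith("Watchdog")]


if __name__ == "__main__":
    for e in watchdog_events(parse_sel_elist(SAMPLE_SEL)):
        print(e["when"].isoformat(), e["sensor"], "-", e["event"])
```

Feeding it the output of `ipmitool sel elist` saved to a file makes it easy to see how often the watchdog entries show up and how close together the interrupt and reset are.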


I can confirm the timer looks dead from the IPMI/BIOS side:
Code:
~# ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x04)
Watchdog Timer Is:      Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      0 sec
Present Countdown:      0 sec
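
For checking this repeatedly (say, from a cron job), a minimal Python sketch along these lines could turn the `ipmitool mc watchdog get` output into a dict and flag whether the BMC reports the timer as stopped. It only assumes the "Key: value" layout shown above; `parse_watchdog_get`, `is_stopped`, and `SAMPLE_WATCHDOG` are hypothetical names, and it does nothing about whatever is actually tripping 0xca.

```python
# Sample copied from the `ipmitool mc watchdog get` output in this post.
SAMPLE_WATCHDOG = """\
Watchdog Timer Use:     SMS/OS (0x04)
Watchdog Timer Is:      Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      0 sec
Present Countdown:      0 sec
"""


def parse_watchdog_get(text):
    """Split each 'Key: value' line on the first colon; other lines are ignored."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_stopped(info):
    """True only if the BMC explicitly reports the timer as 'Stopped'."""
    return info.get("Watchdog Timer Is") == "Stopped"


if __name__ == "__main__":
    info = parse_watchdog_get(SAMPLE_WATCHDOG)
    print("stopped" if is_stopped(info) else "not reported as stopped")
    print("action:", info.get("Watchdog Timer Actions"))
```

Anything other than an explicit "Stopped" is treated as "not stopped" on purpose, so a missing or oddly formatted line doesn't get mistaken for a disabled timer.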


Any idea how to query sensor 0xca? This seems to be the timer that's active, but I can't see any details for it.
 