Weekly crashes

infraerik

Dabbler
Joined
Oct 12, 2017
Messages
24
I have a bit of an odd one here: a couple of lab machines that both crash every Sunday. The hardware is totally different. One is an HP Microserver with a USB-connected SSD for the boot volume plus internal storage disks, and the other is a Mac Mini with an internal SSD boot volume and Thunderbolt-connected storage disks. They are both on a UPS, and nothing else connected to the UPS is affected, so I don't think it's a power issue.

The only thing that I can think of is the scrub of the boot-pool, but I can't find where to disable this in the current version. I've tested disabling the other scrubs without changing the behaviour. That seems to be the only thing that would be specific to Sundays.

Ideas?
 

infraerik

And it continues:

TrueNAS @ bezons-s3.infrageeks.com

New alerts:
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Jan 23 21:35:28 2022.

Current alerts:
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Dec 26 20:54:20 2021.
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Jan 2 20:33:16 2022.
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Jan 9 20:54:01 2022.
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Jan 16 21:14:42 2022.
* bezons-s3.infrageeks.com had an unscheduled system reboot. The operating system successfully came back online at Sun Jan 23 21:35:28 2022.
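
One way to read those alerts is to check how far each reboot lands from an exact weekly interval. The following is only a rough sketch in Python, not something from the TrueNAS tooling: the timestamps are copied from the alert email, the helper name `drift_from_weekly` is made up for illustration, and parsing the day and month names assumes an English locale.

```python
from datetime import datetime, timedelta

# Boot times copied verbatim from the alert email.
BOOT_TIMES = [
    "Sun Dec 26 20:54:20 2021",
    "Sun Jan 2 20:33:16 2022",
    "Sun Jan 9 20:54:01 2022",
    "Sun Jan 16 21:14:42 2022",
    "Sun Jan 23 21:35:28 2022",
]


def parse(ts):
    """Parse a timestamp in the format used by the alert email."""
    return datetime.strptime(ts, "%a %b %d %H:%M:%S %Y")


def drift_from_weekly(times):
    """Return, for each consecutive pair of boots, how far the gap
    deviates from exactly seven days (hypothetical helper)."""
    parsed = [parse(t) for t in times]
    return [(later - earlier) - timedelta(days=7)
            for earlier, later in zip(parsed, parsed[1:])]


for drift in drift_from_weekly(BOOT_TIMES):
    # Positive means the reboot came later in the evening than the week before.
    print(f"{drift.total_seconds() / 60:+.1f} minutes vs. exactly 7 days")
```

For these five boots, the last three gaps each come out to a week plus roughly 20 minutes, while the first is a week minus about 21 minutes. A job scheduled at a fixed wall-clock time, such as a scrub, would normally be expected to fire at about the same time each week, so the drift may hint at something running on its own timer. Note that these are the times the system came back online, not the times it went down.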
 

infraerik

And it turns out that there was a misbehaving UPS that was "protecting" both of those servers. No idea why it was resetting on such a regular schedule, but I swapped it out for another one and there have been no more unexpected resets.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Many UPSes can schedule a regular battery test, in which they go off A.C. power for a few minutes. If the batteries or the UPS itself can't handle the load, it's possible that the client servers (2, you said) will overload it and thus lose power. The UPS probably gives up on the test and goes back to A.C. power, at which point, if the client servers auto-boot on restoration of A.C. power, they boot back up.

Convoluted reasoning. But I remember hearing my old APC UPS go into test mode often enough. It was used for my desktop environment, so I would hear it, unlike the other UPS in my "home data center" (aka unused bedroom).
 

infraerik

I'm guessing it's something like that, but it's frustrating. That setup had been running happily for a couple of years, the UPS isn't showing any alerts regarding battery health or capacity, and it just started doing this spontaneously.

Ah well - as long as they are stable again and I don't have to go reboot a bunch of my lab VMs every Monday morning. Old UPS is heading off to the recycler.
 