Two FreeNAS systems crashed and rebooted within 10 minutes of each other!?!?!??!

zimmy6996

Explorer
Joined
Mar 7, 2016
Messages
50
Hi all, I have a weird one, with no rhyme or reason to it. I have two FreeNAS systems running on Supermicro hardware. Both boxes had been running since October 12, 2018. They are iSCSI hosts for my ESX environment. For some reason, one of them tipped over and rebooted yesterday, and within 10 minutes the second one rebooted too. The box that rebooted second was originally booted in October 2018 about 10 minutes after the first, so it absolutely looks like an "uptime" issue.



FreeNAS-9.10.2-U6 (561f0d7a1)


That is the version I am running on both nodes. Is there any known bug that matches behavior like this? It's just odd that two different pieces of hardware rebooted. They have nothing in common, except that they both serve iSCSI to an ESX farm. I have verified that no power issues happened. The ESX farm all stayed live, though now I've got some data corruption on a copy of VMs.
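To make the "uptime" reasoning concrete, here is a minimal sketch that compares how long each box had been up when it crashed. The timestamps are hypothetical placeholders (only the October 12, 2018 boot date and the roughly 10-minute offsets come from the post), and the function names and the 2-minute tolerance are assumptions for illustration.

```python
from datetime import datetime, timedelta


def uptime_at_crash(boot: datetime, crash: datetime) -> timedelta:
    """Return how long a host had been up at the moment it crashed."""
    return crash - boot


def same_uptime(boot_a, crash_a, boot_b, crash_b,
                tolerance=timedelta(minutes=2)):
    """True if both hosts crashed at nearly the same uptime."""
    diff = abs(uptime_at_crash(boot_a, crash_a)
               - uptime_at_crash(boot_b, crash_b))
    return diff <= tolerance


# Hypothetical timestamps: substitute the real boot and crash times.
boot_a = datetime(2018, 10, 12, 9, 0)
boot_b = boot_a + timedelta(minutes=10)    # second box booted ~10 min later
crash_a = datetime(2020, 2, 1, 14, 30)     # placeholder crash time
crash_b = crash_a + timedelta(minutes=10)  # second box crashed ~10 min later

print(same_uptime(boot_a, crash_a, boot_b, crash_b))
```

If the two uptimes match to within a minute or two, that points toward something tied to time since boot rather than an external event hitting both boxes at once.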
 

zimmy6996

Explorer
Joined
Mar 7, 2016
Messages
50
Might anyone have any insight into where to look for clues about why? I've gone through

/var/log/messages
debug
daemon


And there is nothing that stands out. I just see normal log messages, and then all the messages from the boot-up. But no real insight into why it crashed.
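One way to narrow the search is to pull out only the lines logged just before each boot, since those are the last things the box wrote before going down. The sketch below is an assumption-laden helper, not a FreeNAS tool: it treats the FreeBSD kernel copyright banner as the start-of-boot marker, and the function name, the `context` parameter, and the sample lines are all made up for illustration.

```python
import re

# The FreeBSD kernel prints this copyright banner early in every boot.
BOOT_BANNER = re.compile(r"Copyright \(c\) 1992-\d{4} The FreeBSD Project")


def lines_before_boot(log_lines, context=5):
    """For each boot banner found, return the `context` lines preceding it."""
    results = []
    for i, line in enumerate(log_lines):
        if BOOT_BANNER.search(line):
            results.append(log_lines[max(0, i - context):i])
    return results


# Placeholder lines for illustration only, not real log output.
sample = [
    "placeholder: normal message 1",
    "placeholder: normal message 2",
    "placeholder: normal message 3",
    "kernel: Copyright (c) 1992-2016 The FreeBSD Project.",
    "placeholder: boot message",
]

for block in lines_before_boot(sample, context=2):
    print("\n".join(block))
```

To run it against a real log, read the file into a list first, for example `open("/var/log/messages", errors="replace").read().splitlines()`, and pass that in place of `sample`. If the lines before the banner are completely ordinary, the box most likely went down too abruptly to log anything.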
 

zimmy6996

Explorer
Joined
Mar 7, 2016
Messages
50
I have literally combed through everything in the logs for both boxes, and there is nothing ... On top of that, I confirmed 100% that this wasn't a power issue: both boxes have redundant PSUs, and the two PSUs are on different legs of power. So there is absolutely no explanation here. Anyone?????
 

dashtesla

Explorer
Joined
Mar 8, 2019
Messages
75
I would consider upgrading to 11.3. Since it has crashed anyway, you might as well, and maybe even start over fresh, copy the data, then switch over with minimum downtime. I've seen quite a lot of issues, but never an uptime one. Unlikely as it may be, if you had hardware from the same batch running under similar conditions, it's "possible" (not probable, but possible) that it would fail 10 minutes apart. I would check the hardware/RAM and also dust buildup. Something causing a memory leak over a very, very long period of time, which would naturally go unnoticed, could also be playing a part here (though I don't know how the FreeNAS logging works or what actually ends up in the logs, so..)
 