Two Servers on same network restarted at same time, why?

Status
Not open for further replies.

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
We have two FreeNAS servers on our network, one for backups, the other as an iSCSI target. Both servers restarted this morning at the same time for some reason.

This has never happened before. There was not a power issue that I can see, as other equipment on the same PDU did not restart.

Auto updates are not on.

Can someone give me some pointers as to where to look for the cause?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If it's not a power issue then it probably wasn't at the same time. Updates don't get installed automatically; you have to click the update button. Look in /var/log/messages for the exact restart timestamp on both. Also, every post needs hardware specs and the FreeNAS version; your question is worthless without that info.

 

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
I apologize for the sloppy quick post. I'm running damage control as several of our customer VMs went down (due to the iSCSI box restart), and wanted to get this question posted right away.

Box 1:
FreeNAS-9.10-STABLE-201605021851 (35c85f7)
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
16GB RAM

Box 2:
FreeNAS-9.3-STABLE-201506042008
Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
32GB RAM

Going by uptime, box 1 came back up about 2 minutes before box 2 (4:21 and 4:23 hours ago).

Box 1 messages log (from 9th till this morning's startup):
Nov 9 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 10 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 11 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 12 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 13 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 13 00:07:10 strg01 proftpd[96152]: 127.0.0.1 (192.168.90.40[192.168.90.40]) - ROOT FTP login successful
Nov 13 00:16:06 strg01 proftpd[96376]: 127.0.0.1 (192.168.90.50[192.168.90.50]) - ROOT FTP login successful
Nov 14 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 15 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 16 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;
Nov 16 05:26:37 strg01 syslog-ng[2324]: syslog-ng starting up; version='3.6.4'
Nov 16 05:26:38 strg01 Copyright (c) 1992-2016 The FreeBSD Project.
Nov 16 05:26:38 strg01 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Nov 16 05:26:38 strg01 The Regents of the University of California. All rights reserved.
Nov 16 05:26:38 strg01 FreeBSD is a registered trademark of The FreeBSD Foundation.
Nov 16 05:26:38 strg01 FreeBSD 10.3-RELEASE #0 91d0e8e(freebsd10): Mon May 2 10:32:27 PDT 2016
...

Box 2 messages log:
Nov 9 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 10 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 11 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 11 09:21:57 strg02 alert.py: [freenasOS.Configuration:567] Unable to load https://web.ixsystems.com/updates/ix_crl.pem: <urlopen error [Errno 32] Broken pipe>
Nov 11 09:21:57 strg02 alert.py: [freenasOS.Manifest:387] Could not get CRL file https://web.ixsystems.com/updates/ix_crl.pem
Nov 11 09:22:00 strg02 alert.py: [freenasOS.Configuration:567] Unable to load https://web.ixsystems.com/updates/ix_crl.pem: <urlopen error [Errno 32] Broken pipe>
Nov 11 09:22:00 strg02 alert.py: [freenasOS.Manifest:387] Could not get CRL file https://web.ixsystems.com/updates/ix_crl.pem
Nov 11 10:22:05 strg02 alert.py: [freenasOS.Configuration:567] Unable to load https://web.ixsystems.com/updates/ix_crl.pem: <urlopen error [Errno 32] Broken pipe>
Nov 11 10:22:05 strg02 alert.py: [freenasOS.Manifest:387] Could not get CRL file https://web.ixsystems.com/updates/ix_crl.pem
Nov 11 10:22:08 strg02 alert.py: [freenasOS.Configuration:567] Unable to load https://web.ixsystems.com/updates/ix_crl.pem: <urlopen error [Errno 32] Broken pipe>
Nov 11 10:22:08 strg02 alert.py: [freenasOS.Manifest:387] Could not get CRL file https://web.ixsystems.com/updates/ix_crl.pem
Nov 12 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 13 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 14 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 15 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 16 00:00:00 strg02 syslog-ng[2631]: Configuration reload request received, reloading configuration;
Nov 16 05:29:20 strg02 syslog-ng[2631]: syslog-ng starting up; version='3.5.6'
Nov 16 05:29:20 strg02 Copyright (c) 1992-2014 The FreeBSD Project.
Nov 16 05:29:20 strg02 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Nov 16 05:29:20 strg02 The Regents of the University of California. All rights reserved.
Nov 16 05:29:20 strg02 FreeBSD is a registered trademark of The FreeBSD Foundation.
Nov 16 05:29:20 strg02 FreeBSD 9.3-RELEASE-p13 #0 r281084+3df1120: Thu Jun 4 01:00:51 PDT 2015
...
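
A minimal sketch of how the restart timestamps could be pulled out of logs like the two above. It assumes the "syslog-ng starting up" line marks each boot, as it does in both excerpts. The function name find_boots and the hard-coded year 2016 are assumptions for illustration only, since syslog lines carry no year.

```python
from datetime import datetime

BOOT_MARKER = "syslog-ng starting up"  # first line logged after each boot in the excerpts


def find_boots(lines, year=2016):
    """Return (boot_time, previous_entry_time) for every boot marker found.

    The year is an assumption: classic syslog timestamps do not include one.
    """
    boots = []
    previous = None
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skips blank lines and the "..." truncation marker
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped log line
        if BOOT_MARKER in line:
            boots.append((stamp, previous))
        previous = stamp
    return boots


# Three lines copied from the strg01 excerpt above.
sample = [
    "Nov 16 00:00:00 strg01 syslog-ng[2324]: Configuration reload request received, reloading configuration;",
    "Nov 16 05:26:37 strg01 syslog-ng[2324]: syslog-ng starting up; version='3.6.4'",
    "Nov 16 05:26:38 strg01 Copyright (c) 1992-2016 The FreeBSD Project.",
]

for boot_time, last_seen in find_boots(sample):
    print(f"boot at {boot_time}, last entry before it at {last_seen}")
```

The gap between the last entry and the boot line only bounds the window in which the box went down. On a quiet log like this one the last entry is the routine midnight reload, so the gap alone does not say when the machine actually stopped.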

Thoughts?
 

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
Also, wouldn't this imply that there was no proper reboot, but rather a crash, a power interruption, or a reset button press? The latest boot has no shutdown record before it:

[root@strg01] ~# last reboot | less
boot time Wed Nov 16 05:26
boot time Thu Jun 9 04:20
shutdown time Thu Jun 9 02:34
boot time Thu May 12 09:31
shutdown time Thu May 12 09:26
boot time Fri May 6 14:26
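
A rough sketch of that check: walk the listing above and flag any boot that has no shutdown record right before it. It assumes the listing is newest-first and has exactly the "boot time" / "shutdown time" shape shown here. The function name classify_boots and the status labels are made up for illustration.

```python
def classify_boots(last_output):
    """Label each boot by whether a shutdown record directly precedes it.

    Assumes newest-first ordering, as in the listing above.
    """
    entries = []
    for line in last_output.splitlines():
        parts = line.split()
        # Expected shape: "<boot|shutdown> time <Day> <Mon> <DD> <HH:MM>"
        if len(parts) == 6 and parts[0] in ("boot", "shutdown") and parts[1] == "time":
            entries.append((parts[0], " ".join(parts[2:])))

    results = []
    for i, (kind, when) in enumerate(entries):
        if kind != "boot":
            continue
        if i + 1 == len(entries):
            status = "unknown"  # oldest record, nothing earlier to compare with
        elif entries[i + 1][0] == "shutdown":
            status = "clean"  # the next-older record is a shutdown
        else:
            status = "no shutdown record"  # two boots back to back
        results.append((when, status))
    return results


# The listing from the post, copied verbatim.
listing = """\
boot time Wed Nov 16 05:26
boot time Thu Jun 9 04:20
shutdown time Thu Jun 9 02:34
boot time Thu May 12 09:31
shutdown time Thu May 12 09:26
boot time Fri May 6 14:26
"""

for when, status in classify_boots(listing):
    print(f"{when}: {status}")
```

On this listing only the Nov 16 boot comes out as "no shutdown record", which fits a crash, a reset, or a power loss rather than an orderly reboot.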
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
It kind of looks like power was lost at midnight on both systems. Then powered back up at 5:26 and 5:29 manually.

 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Do you have NUT (UPS service) configured?

These are the options/categories that I see which could cause a reboot.
1. personnel related - accidental or deliberate (malicious)
2. system crash
3. power related - flicker, NUT auto shutdown
 

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
SweetAndLow said:
It kind of looks like power was lost at midnight on both systems. Then powered back up at 5:26 and 5:29 manually.

I don't think the midnight part is accurate, as you'll notice the midnight entry in the log the days prior as well.
 

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
depasseg said:
Do you have NUT (UPS service) configured?

These are the options/categories that I see which could cause a reboot.
1. personnel related - accidental or deliberate (malicious)
2. system crash
3. power related - flicker, NUT auto shutdown

I reached out to our data center peeps. They say no one was in there at the time.

I doubt it was a crash, as both went down at the same time and they are mostly unrelated servers.

UPS service is off on both.

Now, both servers are Supermicro, while almost everything else in our racks is Dell. I wonder if there was some kind of flicker that caused only these two to reboot. What do you think?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Really looks like a power issue. It could've lasted long enough for the Supermicro PSUs to deplete their capacitor reserves.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Almost assuredly a power issue, especially as they both came back up of their own volition. Usually, the BIOS has the board set to "return to last state" after a power outage, which means your boxes would have powered up after power was restored.

The "config reload" stuff at midnight is normal. All of our boxes do that.

I believe Supermicro boards monitor their bus voltages and turn themselves off if a dangerous condition appears (another reason to use a UPS).

I will officially bet one reproductive gland that a power line issue (quality, brownout, something) caused this.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Does the IPMI event log have anything interesting in it?
 

Steven Sedory

Explorer
Joined
Apr 7, 2014
Messages
96
Stux said:
Does the IPMI event log have anything interesting in it?

Woah...these are pretty old servers, so I forgot they even had it. Was able to get on one of them...says:

"SEL record 02 11/16/2016 05:04:22 watchdog Hard reset Assertion Event" for the last event

Did a quick search to see what "Hard reset" means in this context, but it's hard to find an answer.
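
For what it's worth, a small sketch of how the SEL timestamp lines up against the OS boot times in the messages logs posted earlier. The post doesn't say which box this SEL came from, so both are compared. The helper name gap_minutes is made up, and whether the BMC clock agrees with the OS clock is an open assumption: the two are set independently and can differ, so the gap may be partly or wholly clock offset.

```python
from datetime import datetime

SEL_FORMAT = "%m/%d/%Y %H:%M:%S"

# Timestamp of the "watchdog Hard reset" SEL record quoted above.
sel_event = datetime.strptime("11/16/2016 05:04:22", SEL_FORMAT)

# First syslog-ng line after boot on each box, from the messages logs.
# The year 2016 is taken from the SEL record; syslog lines carry no year.
os_boots = {
    "strg01": datetime(2016, 11, 16, 5, 26, 37),
    "strg02": datetime(2016, 11, 16, 5, 29, 20),
}


def gap_minutes(sel_time, boot_time):
    """Minutes between the SEL event and the first OS log line after boot."""
    return (boot_time - sel_time).total_seconds() / 60


for host, boot in os_boots.items():
    print(f"{host}: OS log resumes {gap_minutes(sel_event, boot):.1f} min after the SEL event")
```

Either way the watchdog event lands within roughly 22 to 25 minutes of the logged boots, which is close enough to suggest they are the same incident once clock differences are accounted for.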
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
The watchdog handles hardware lockups. Strange that two systems would hit the same thing, but not impossible. I think I might look at burning in the systems again, or if they are all old, maybe it's replacement time.

 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
A power glitch could cause a lockup.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'm not. If the power rails dip to the CPU it will often hang.

I'd expect the BMC to be much more tolerant and thus pick it up under the watchdog.

You can test this out by playing chicken with the power switch ;)

Get it 'right' and you'll hang.
 