FreeNAS powered off - need to find cause

Status
Not open for further replies.

bradley4681

Cadet
Joined
Apr 3, 2012
Messages
5
I have an 8.0.4-release-x64 box running here in my office. Nothing too critical, but people complain when it's not up. For the most part I've had no problems with it at all. Today I found that nothing was accessible, and when I went to look the box was off: no lights, nothing. I hit the power button and it came right back up without any issues. The VMs running from the iSCSI share even picked up where they left off.

I'd really like to find out what happened. No one was in the building last night, it's on a huge UPS, and all the other devices would have been hosed if the power had gone out for more than 3 hours last night. There is no way anyone would have shut it down. It had been running for weeks and the hardware all seems good. There is no automatic shutdown configured and there have been no updates; it's the original install, untouched since it was put together.

Not knowing why it powered off is troubling, because without knowing the cause I can't prevent it from happening again. The UPS logs show no power outage.

Help?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
If WOL works, it would be nice if you could just schedule another machine on the network to send a WOL magic packet each morning at, say, 6 a.m. to make sure it's awake. Sure, it doesn't figure out why it's shutting down, but if it's only asleep and not really "off", that might be a workaround in the meantime.
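
For reference, here is a minimal sketch of the kind of magic-packet sender that could be scheduled on another machine. The MAC address, the broadcast address, and UDP port 9 are placeholder assumptions, not values from this thread. The packet itself is the standard WOL layout: 6 bytes of 0xFF followed by the target MAC repeated 16 times.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 x 0xFF synchronisation bytes, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` (with the real MAC of the NAS) from a scheduled job at 6 a.m. would do what is described above, provided WOL is enabled in the BIOS and on the NIC.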
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Good idea, Stephens. But my concern (and, from what it sounds like, the OP's concern) is that we don't want to wait for data corruption to show up because the system didn't shut down cleanly. I was more concerned with that than anything else. Having a server shut down once a month or less isn't a big deal for my use: walk over and hit the power button. But if I'm going to boot up and find data is missing or corrupted because the server just turned off, that's a big problem.
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
I've not had to deal with this, but are the logs that are cleared on boot kept on the USB stick? I assume so, in which case pulling the stick and checking it out in some other OS that understands UFS would be feasible, right?

Of course, thinking to do that before hitting the power button and losing the logs is unlikely...

Or are the logs in a RAMfs? Anyway, I really should set up a syslog server.

For the OP, Bradley: If the hardware is a few years old it could easily be bad capacitors on the mobo or PSU drying out and triggering a power down. Worth an eyeball if you have physical access easily - the caps will often be bulging/leaking if so (google "bad caps" for examples).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I believe the logs are saved to the ramdisk, not to the USB stick, and they are not copied to the USB stick on shutdown. At least, that was my understanding of it, and the reason why a syslog server is required if you want to know what happened.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You don't want to hammer your USB device with log writes. A typical USB flash device has a very limited number of write cycles and probably doesn't have wear leveling. As a result, FreeNAS only writes stuff like configuration changes to it, and occasionally caches stuff like the rrd data (which happens more often than I care for).

However, you could set up logging to your pool, assuming that hardware troubles with your storage aren't the problem, or just force it to log to your USB key anyways, by adding a file like "/data/messages.log" and then modifying /etc/syslog.conf to contain "*.err;kern.warning;auth.notice;mail.crit<tab>/data/messages.log" (in addition to existing contents) and then sending syslogd a HUP.
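
As a rough illustration of the syslog.conf change described above, the sketch below appends the rule to the existing contents only if it is not already there. The selector and the log path are the ones quoted in the post; the helper name `ensure_rule` is made up for this example, and the separator is written as a real tab character.

```python
# Selector and file path are quoted from the post above; a real tab separates them.
RULE = "*.err;kern.warning;auth.notice;mail.crit\t/data/messages.log"


def ensure_rule(conf_text: str, rule: str = RULE) -> str:
    """Return conf_text with `rule` appended on its own line unless already present."""
    if rule in conf_text.splitlines():
        return conf_text  # nothing to do, keep existing contents untouched
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"  # make sure the new rule starts on a fresh line
    return conf_text + rule + "\n"
```

The returned text would be written back to /etc/syslog.conf, after which /data/messages.log has to exist and syslogd has to be sent a HUP, as described above.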

The win for network syslogging is that if the system is in a state of crashing, the I/O subsystem is sometimes the culprit. It is also somewhat more complicated than the network stack: caching, filesystem abstractions, and other stuff can interfere with the system being able to actually write and flush a log message to disk. With the network stack, basically a userland process (syslogd) merely has to be running and the network functional, and the message quickly gets packed off and sent in something very close to real time. Better yet is a serial console, but that assumes a serial port and a second machine available to act as the console.
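
To make the network syslog idea concrete, here is a bare-bones sketch of a UDP receiver that could run on a second machine. It is an illustration only, not a replacement for a real syslogd: the log path is whatever you pass in, binding to port 514 normally needs root, and the function names are invented for this example. The FreeNAS side would still need a forwarding rule in syslog.conf of the form `*.*<tab>@loghost`, where `loghost` is a placeholder for the receiver's name or address.

```python
import socket
from datetime import datetime


def decode_priority(message: str):
    """Split a leading '<PRI>' tag into (facility, severity, rest of message)."""
    if message.startswith("<") and ">" in message[:5]:
        end = message.index(">")
        pri_text = message[1:end]
        if pri_text.isdigit():
            pri = int(pri_text)
            # Syslog encodes priority as facility * 8 + severity.
            return pri // 8, pri % 8, message[end + 1:]
    return None, None, message


def run_receiver(log_path: str, host: str = "0.0.0.0", port: int = 514) -> None:
    """Append every UDP syslog datagram received to log_path, one line each."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (sender, _) = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace").rstrip()
            facility, severity, body = decode_priority(text)
            stamp = datetime.now().isoformat(timespec="seconds")
            # Reopen per message so each line is closed out to disk promptly.
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"{stamp} {sender} fac={facility} sev={severity} {body}\n")
```

The receiver stamps each line with its own clock, so the last entry before a lockup gives an approximate time of failure even if the NAS never gets to write anything locally.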
 

bradley4681

Cadet
Joined
Apr 3, 2012
Messages
5
Hey everyone! Just realized my reply alerts were going to my spam folder. I came to check this post this morning because I had another failure at about 6am.

WOL wouldn't work because while the machine looks like it's on, I can't ping it, ssh, or access the webgui. I have to hard reset it and turn it back on. Luckily I picked up an IP Power switch a few months ago for another issue. I can remotely power cycle it, so at least I don't have to go into the office.

I'm in the process of building a new 8.3 machine on new hardware, but I wanted to use this one as a backup. I'll work on setting up a syslog server and post logs after the next issue, if there is one. I'm just lucky that there haven't been any corruption issues and the VMs pick up right where they left off. I have SSD caching enabled on the hosts, so that may be what is saving me.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, wait, are you saying that it "looks like it's on" (msg #8) or "off, no lights, nothing" (msg #1)? These are entirely different things.
 

bradley4681

Cadet
Joined
Apr 3, 2012
Messages
5
I can't tell for today's occurrence, but the first time it happened I was in the office and the screen saver was still bouncing around on the screen, yet I couldn't do anything: couldn't get it to respond at the console, couldn't ping, couldn't access the webgui.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, now, that's very different than powering off. Powering off essentially means that either the hardware shut down on its own, or the OS called for a hardware shutdown, or possibly the OS crashed in the right way to make the hardware shut down.

What you're describing is a lockup. That's very different from a debugging perspective. It puts us more in the realm of likely-to-be-software.

Please describe your FreeNAS system. In particular, are you using ZFS? How much memory do you have? How many disks? How busy is the system normally? What controllers and other hardware? That kind of info can lead to a rapid conclusion, because if you told us you were using ZFS with a lot of storage and only had 2GB of RAM, we'd all say "you need more RAM."
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So what you are saying is it was not powered down for either occurrence because an active screensaver means something was running.

Just a few questions: How long had this box been running before you started seeing it lock up? A few months, weeks, just days?
If it had been running for a week or longer without rebooting or cycling power, I'd start looking at a hardware issue. Look for failed fans, run a RAM and CPU stress test, replace the boot flash drive.
 