How to determine cause of reboot

Status: Not open for further replies.
Joined: Dec 6, 2014 | Messages: 2
Hi, I'm currently running a test box, and it rebooted yesterday for no apparent reason. What steps should I take to diagnose this? There's nothing in /data/crash.

I had been tracking the ARC usage, and it had recently hit the max. Perhaps an overload of the ARC? I'm using autotune.

Build: FreeNAS 9.3-BETA 2014-11-12 14:34:09 GMT
Platform: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
Memory: 24542MB
System Time: Sat Dec 06 21:48:43 EST 2014
Uptime: 9:48PM up 22:25, 0 users
Load Average: 0.09, 0.08, 0.07
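The status panel above is enough to estimate when the reboot happened, which narrows down which stretch of the logs is worth reading. A minimal Python sketch, assuming the uptime is under one day and shown as hours:minutes (so the result is only accurate to the minute) and ignoring the EST time zone; estimate_boot_time is a hypothetical helper, not a FreeNAS tool:

```python
from datetime import datetime, timedelta


def estimate_boot_time(system_time: str, uptime_hhmm: str) -> datetime:
    """Estimate when the box last booted from its clock and its uptime.

    Assumes the uptime is shorter than one day and given as "HH:MM"
    (the "up 22:25" part of the uptime line), and that the time-zone
    abbreviation has been removed from the system time string.
    """
    now = datetime.strptime(system_time, "%a %b %d %H:%M:%S %Y")
    hours, minutes = (int(part) for part in uptime_hhmm.split(":"))
    return now - timedelta(hours=hours, minutes=minutes)


# Values copied from the status panel above, with "EST" dropped.
boot = estimate_boot_time("Sat Dec 06 21:48:43 2014", "22:25")
print(boot.strftime("%a %b %d %H:%M"))  # Fri Dec 05 23:23
```

With the numbers posted here, the box came back up around 23:23 on Friday, Dec 5, so the log entries leading up to that time are the ones to look at.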

Thanks

Ericloewe (Server Wrangler, Moderator)
Joined: Feb 15, 2014 | Messages: 20,194
a) You shouldn't be using autotune.
b) ARC is supposed to be full. Why else would you want it, if not to cache as much as possible? ZFS will take care of things.
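To put point b) in numbers: an ARC sitting at or near 100% of its maximum is the expected steady state on a box that has been up for a while, not a sign of overload. A minimal sketch for watching the fill level, assuming a FreeBSD-based system that exposes the ZFS counters kstat.zfs.misc.arcstats.size and kstat.zfs.misc.arcstats.c_max (check with `sysctl kstat.zfs.misc.arcstats` on your build); the helper names are hypothetical:

```python
import subprocess
import sys

GIB = 1024 ** 3


def arc_fill_percent(size_bytes: int, max_bytes: int) -> float:
    """Return the current ARC size as a percentage of its configured maximum."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return 100.0 * size_bytes / max_bytes


def read_sysctl_int(name: str) -> int:
    """Read an integer sysctl value via `sysctl -n` (FreeBSD)."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


# Only query the live system when actually running on FreeBSD.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    # Assumed OIDs: the arcstats counters exported by FreeBSD's ZFS port.
    size = read_sysctl_int("kstat.zfs.misc.arcstats.size")
    c_max = read_sysctl_int("kstat.zfs.misc.arcstats.c_max")
    print(f"ARC {size / GIB:.1f} GiB of {c_max / GIB:.1f} GiB "
          f"({arc_fill_percent(size, c_max):.0f}% full)")
```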

joeschmuck (Old Man, Moderator)
Joined: May 28, 2011 | Messages: 10,994
Look at the logs to see if there was something recorded and pay attention to your box for future occurrences.
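On FreeBSD, and therefore on FreeNAS, kernel messages normally end up in /var/log/messages. A minimal sketch that pulls out lines worth a closer look; the pattern list is a hypothetical starting point, not an authoritative list of FreeBSD crash messages, and an abrupt hardware reset may leave nothing in the log at all:

```python
import re
from typing import Iterable, List

# Heuristic patterns (an assumption, not an exhaustive list) for kernel
# messages that often precede or explain an unexpected reboot.
SUSPICIOUS = re.compile(
    r"panic|fatal trap|MCA:|\bECC\b|watchdog|out of swap",
    re.IGNORECASE,
)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the patterns, newline stripped."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan one plain-text log file; rotated, compressed logs are not handled."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

Combined with an estimated boot time, this narrows the search to the few minutes before the box went down.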

cyberjock (Inactive Account)
Joined: Mar 25, 2012 | Messages: 19,526
If there is nothing in /data/crash, it almost always means there's a hardware problem. What are your system specs, including part numbers? I don't see a motherboard listed, yet I'm sure you are using one. ;)
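This rule of thumb can be written down as a small triage procedure. A minimal sketch, assuming dumps are saved by savecore under names starting with vmcore, info. or textdump, and that a clean shutdown leaves syslogd's "exiting on signal" message in the log (both assumptions to verify on your build). log_before_reboot should contain only the lines logged shortly before the reboot, since an older clean shutdown elsewhere in the file would otherwise match. A missing dump can also simply mean no dump device was configured, so the "abrupt" answer is a hint, not proof. The function names are hypothetical:

```python
import os
from typing import Iterable, List

# Assumed name prefixes for files savecore(8) writes after a kernel panic.
DUMP_PREFIXES = ("vmcore", "textdump", "info.")


def list_crash_files(crash_dir: str = "/data/crash") -> List[str]:
    """Return dump-looking files in crash_dir, or [] if the directory is missing."""
    if not os.path.isdir(crash_dir):
        return []
    return [name for name in os.listdir(crash_dir) if name.startswith(DUMP_PREFIXES)]


def classify_reboot(crash_files: List[str], log_before_reboot: Iterable[str]) -> str:
    """Rough triage of an unexpected reboot (a heuristic, not a diagnosis)."""
    if crash_files:
        return "panic: a crash dump was saved, so start by reading it"
    # Assumption: an orderly shutdown leaves syslogd's
    # "exiting on signal" line in the log just before the reboot.
    if any("exiting on signal" in line for line in log_before_reboot):
        return "orderly: a clean shutdown was logged, so something requested it"
    return "abrupt: no dump and no shutdown logged, so suspect hardware or power"
```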

You also might want to update your OS. Your build is almost a month old, so there have been a dozen or more BETA updates since then.

jkh (Guest)
> a) You shouldn't be using autotune
Actually, autotune can prevent zfs from getting into some fairly pathological performance scenarios, which is why it's there in the first place. There's no reason to *avoid* autotune unless it is simply picking the wrong defaults for your workload (it's not perfect), but we certainly enable it on all the TrueNAS systems we ship because it prevents Bad Things from happening under production workloads.
Joined: Dec 6, 2014 | Messages: 2
Yes, I do have a motherboard :)

It's a Supermicro twin box, 6026TT-HTRF. The motherboard is an X8DTT-HF+.

I'm not one for updating software unless there's a problem. While there is no mission-critical data on the server at the moment, rebooting the box does cause a bit of disruption.

The box is a bit older, so it could very well be a hardware problem. It did crash hard on me once with an ECC error, but I've replaced the RAM since then. That crash was also a hard stop with no automatic reboot, so this feels different. I'm just trying to figure out where to start diagnosing on BSD. All of my general-purpose servers run Linux these days, and I've forgotten most of my BSD tricks from years gone by (most of which would likely be 10 years out of date by now anyway).

Interesting that one of you says not to use autotune and another says to use it.

Attached is a screenshot of the variables it set; any comments (especially on obvious problems) are most appreciated. My main concern is to max out the ARC, as this system has a limited number of drives available to it. Since it's not yet heavily used, it took about a week to fill up the ARC. That's why I noticed it was approaching full utilization the day before it died.
[Attachment: autotune.png]



After it rebooted, I got the following email:

freenas.xxx kernel log messages:
> SMP: AP CPU #12 Launched!
> SMP: AP CPU #10 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #15 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #13 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #14 Launched!
> SMP: AP CPU #11 Launched!
> Timecounter "TSC-low" frequency 1400076204Hz quality 1000
> uhub1: 2 ports with 2 removable, self powered
> uhub2: 2 ports with 2 removable, self powered
> uhub3: 6 ports with 6 removable, self powered
> ukbd0: <Keyboard Interface> on usbus1
> kbd2 at ukbd0
> vboxdrv: fAsync=0 offMin=0x2eb offMax=0x3301

-- End of security output --

cyberjock (Inactive Account)
Joined: Mar 25, 2012 | Messages: 19,526
It depends on the philosophy you go by.

Autotune was originally intended to prevent ZFS memory runaway, where ZFS would slowly but surely consume more and more RAM until there was no more. Then it would try to consume even more RAM, resulting in "bad things".

It does that pretty well on TrueNAS servers. The thing is that TrueNAS servers are often storing data that keeps a business operating. Downtime means money lost and it's better to limit your RAM for ZFS to an arbitrarily lower number than 'all of your RAM' to ensure nearly 100% uptime.

On the flipside FreeNAS servers are rarely in production environments where they would be susceptible to such problems. TrueNAS servers already have lots of RAM (48GB is the smallest I've seen on any TrueNAS server), yet we rarely see 48GB of RAM in the forums.

Different user-base, different hardware-base, different intent, and different consequences of downtime.

There is nothing stopping you from setting it, but for FreeNAS users autotune is often overly cautious, to the point of hurting performance and making the server unstable. If you look at the autotune code, it does something like "take the X GB of RAM in the system, subtract some percentage, and make that vfs.zfs.arc_max". It might instead subtract a raw number like 10GB (I forget which). Either way, a percentage holds back an insignificant amount at small RAM sizes (like 16-32GB), while a raw number would be hugely overkill at those sizes.
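To illustrate the difference, here are two hypothetical ways of deriving an ARC cap from installed RAM. This is not the actual autotune code (cyberjock doesn't remember which form it uses); the 10% reserve is an arbitrary example, and the 10 GiB reserve echoes the "like 10GB" above:

```python
GIB = 1024 ** 3


def arc_max_percent(ram_bytes: int, reserve_fraction: float) -> int:
    """Hypothetical policy: hold back a fixed fraction of RAM from the ARC."""
    return int(ram_bytes * (1.0 - reserve_fraction))


def arc_max_fixed(ram_bytes: int, reserve_bytes: int) -> int:
    """Hypothetical policy: hold back a fixed amount of RAM from the ARC."""
    return max(ram_bytes - reserve_bytes, 0)


# Compare both policies across a few RAM sizes.
for ram_gib in (16, 24, 32, 128):
    ram = ram_gib * GIB
    pct = arc_max_percent(ram, 0.10) / GIB
    fixed = arc_max_fixed(ram, 10 * GIB) / GIB
    print(f"{ram_gib:>4} GiB RAM: 10% reserve -> {pct:5.1f} GiB ARC, "
          f"10 GiB reserve -> {fixed:5.1f} GiB ARC")
```

For the 24542MB (roughly 24 GiB) box in this thread, the fixed reserve would cut the ARC to about 14 GiB, while the 10% reserve would leave about 21.6 GiB.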

So use the right tool for the right job.

To add to this conversation, autotune sets some other values that seem to be "better" for today's hardware but aren't the default in FreeBSD. Note that the settings may not be ideal for all situations, and in some cases hurt performance. If you look at the code and read up on the parameters and do your own tests you can decide for yourself what you want to use.

There is no secret magic sauce that Autotune does that a sufficiently educated ZFS guru can't prove will help or hurt performance/stability. It's just a matter of whether it applies to you. Autotune is useful for TrueNAS servers which are often underpowered and overworked because businesses almost always buy less than they need and then try to make the hardware do what they want. At home, spending just $300-400 on RAM will give you an overpowered home system that can't possibly be "overworked" without you taking significant effort to "overwork" it. For businesses the cost is quite a bit higher. DIMMs that are >8GB in density can get exponentially more expensive.

As for your exact case, I can provide zero evidence as to whether any of those settings are good or bad. It's very much dependent on your server, its workload, its strengths and weaknesses, etc.

If it's any consolation I don't enable autotune and I don't generally recommend others use it either.