FreeNAS reboots on stress...

Status
Not open for further replies.
Joined
Jun 24, 2017
Messages
338
Hey guys... not really sure where to even begin looking into this...

I have an HP SFF 8300 i7 w/ 12GB RAM and 4 WD Black 5TB drives... FreeNAS loads from an 8GB thumb drive (SanDisk).

A couple days ago this began...

Whenever there's a large amount of traffic to or from the FreeNAS, it will reboot. It doesn't give any errors in the little Red Light/Green Light indicator... it just drops off, and comes back up after a minute or two. Usually, this is when SabNZB is reading or writing to it, usually under conditions where it's verifying or unpacking... JUST playing a video on Kodi from FreeNAS doesn't usually cause it to reboot, but on occasion, it is stressful enough to cause FreeNAS to do the random reboot...

Are there any log files I can look at to track down what might be causing this? I've tried cloning the USB stick, but I get an error about the boot file being corrupted (I think the sizes are off ever so slightly... I think the second boot stick I'm trying to use is like 1MB smaller or something)...

My next step is to make a clean USB and try that to see if the USB might be a problem (I have had one die on me before... but it just stopped working, it didn't do this reboot thing)...

Any help/advice would be appreciated!!
 

amiskell

Patron
Joined
Jun 25, 2015
Messages
266
It would help if you mentioned the version and build of FreeNAS you are running.
 

amiskell

Patron
Joined
Jun 25, 2015
Messages
266
Yeah, but that would ruin the surprise... :)

Sorry about that... FreeNAS 11.0 Release. (a2dc21583)

I'd check the console when it crashes to see if it's a kernel panic, etc. You may be able to find some information in the system logs directory (typically /var/log). If it's a kernel panic, open a bug with FreeNAS and provide the kernel dump that was created when the system crashed (I believe there are instructions on the wiki on how to provide a dump with debugging symbols attached).

I had to do it several times for crashing/kernel dump issues with FreeNAS 10.
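For anyone following along, here is a minimal sketch of how the log check might be done from a shell. The search patterns are an assumption about what a panic usually leaves behind, not an exhaustive list, and `crash_hints` is just a hypothetical helper name. If the machine simply loses power, there may be nothing to find at all.

```shell
# crash_hints: print log lines that often accompany a FreeBSD kernel crash.
# With no arguments it reads stdin; otherwise it searches the named files.
# The patterns below are assumptions, adjust them to taste.
crash_hints() {
    grep -E -i 'panic|fatal trap|savecore' "$@"
}

# Typical use on the NAS (path taken from the post above):
#   crash_hints /var/log/messages
```

An empty result does not prove the software is fine; it only means nothing matching those patterns was logged before the reboot.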
 
Joined
Jun 24, 2017
Messages
338
Will do and report back :)

... some things of interest in the logs... I've got 8 unreadable (pending) sectors in 2 hard drives (they are only about 6 months old... )
Jul 17 02:32:13 freenas smartd[2531]: Device: /dev/ada1, 8 Currently unreadable (pending) sectors
Jul 17 02:32:13 freenas smartd[2531]: Device: /dev/ada1, 8 Offline uncorrectable sectors
Jul 17 02:32:13 freenas smartd[2531]: Device: /dev/ada3, 8 Currently unreadable (pending) sectors
Jul 17 02:32:13 freenas smartd[2531]: Device: /dev/ada3, 8 Offline uncorrectable sectors

Though... I believe that these have been here for quite some time... IIRC, they appeared shortly after setting up FreeNAS... they've never changed, improved, shrank or grown...


and there are some CPU-related messages from collectd:
Jul 17 01:44:39 freenas collectd[2989]: utils_vl_lookup: The user object callback failed with status 2.
Jul 17 01:44:39 freenas collectd[2989]: aggregation plugin: Unable to read the current rate of "freenas.local/cpu-4/cpu-user".

I'll keep digging and see if I can catch a KP on screen....
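A rough sketch of how the smartd lines above could be pulled out of a log and reduced to "device, count" pairs. It assumes the log lines look exactly like the ones pasted in this post; `smartd_pending` is a hypothetical helper name.

```shell
# smartd_pending: print "<device> <count>" for every smartd line that
# reports currently unreadable (pending) sectors.
# Reads stdin, or the files given as arguments.
smartd_pending() {
    awk '/smartd\[[0-9]+\]: Device:/ && /Currently unreadable \(pending\) sectors/ {
        dev = $7              # e.g. "/dev/ada1,"
        sub(/,$/, "", dev)    # drop the trailing comma
        print dev, $8         # $8 is the sector count
    }' "$@"
}

# Typical use on the NAS:
#   smartd_pending /var/log/messages | sort -u
```

Piping through `sort -u` collapses the repeated reports, so a count that changes over time shows up as more than one line per device.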
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Does the system actually reboot, or does it simply lose connectivity under load?
Looks like you have a couple of 'problem' disks. Are you running regularly-scheduled SMART tests on your disks? How often do you schedule scrubs of your volumes?
The collectd messages about the CPU are inconsequential, and a fix is scheduled for release soon.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Full hardware list please.
 
Joined
Jun 24, 2017
Messages
338
I have not run SMART tests... honestly didn't know it was possible from within FreeNAS... Same with scrubs... Is there literature on how to do these things? or if it's easy enough to describe, just point me in the right direction...

This began with nothing changing except I had attempted to run a couple virtual machines on the NAS... they didn't run well enough for me, so I scrapped them and moved back to a separate media scrubber (sonarr/radarr/transmission/SabNzb/MySQL)... honestly, it was MySQL that I couldn't deal with having in a VM... So, I disabled then deleted the virtual machines... after that, got the reboots... It may be coincidence.

Additionally, I get no KP... the machine literally turns off without warning and then immediately back on... I'm actually starting to think it's a MoBo issue... possibly power supply.... Regardless, I'm less inclined to believe it's software, more inclined to believe it's hardware...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
It's possible the CPU is overheating. You could confirm this quite rapidly by running mprime in CPU stress test mode.

You basically just download this to your root login:

http://www.mersenne.org/ftp_root/gimps/p95v287.FreeBSD10-64.tar.gz

Unpack it, and then run the mprime executable. Small FFTs is the most intense across all cores.

If your system reboots... your cooling is not sufficient.

Verifying/unpacking, depending on how optimized the implementation is, certainly has the capability to stress a CPU, and if the CPU isn't cooled sufficiently and heats up fast enough, the system will reboot.
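The download and unpack steps could be scripted roughly like this. It is only a sketch: the URL is the one given above, but the destination directory, the `fetch_mprime` and `run` helper names, and the `DRY_RUN` switch are all assumptions added for illustration.

```shell
# URL as quoted in the post; everything else here is illustrative.
MPRIME_URL="http://www.mersenne.org/ftp_root/gimps/p95v287.FreeBSD10-64.tar.gz"

# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# fetch_mprime [DEST]: download and unpack mprime into DEST
# (default: $HOME/mprime, a hypothetical location).
fetch_mprime() {
    dest="${1:-$HOME/mprime}"
    run mkdir -p "$dest" &&
    run fetch -o "$dest/mprime.tar.gz" "$MPRIME_URL" &&
    run tar -xzf "$dest/mprime.tar.gz" -C "$dest"
}

# Preview the commands first:  DRY_RUN=1 fetch_mprime
# Then run for real:           fetch_mprime && cd "$HOME/mprime" && ./mprime
```

`fetch` is the stock FreeBSD downloader; on another system you would swap in whatever download tool is available.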

Code:
# cd mprime
root@titan:/mnt/tank/server/bin/mprime # ls
license.txt			p95v287.FreeBSD10-64.tar	results.txt			whatsnew.txt
local.txt			prime.txt			stress.txt
mprime				readme.txt			undoc.txt
root@titan:/mnt/tank/server/bin/mprime # ./mprime
		 Main Menu

	 1.  Test/Primenet
	 2.  Test/Worker threads
	 3.  Test/Status
	 4.  Test/Continue
	 5.  Test/Exit
	 6.  Advanced/Test
	 7.  Advanced/Time
	 8.  Advanced/P-1
	 9.  Advanced/ECM
	10.  Advanced/Manual Communication
	11.  Advanced/Unreserve Exponent
	12.  Advanced/Quit Gimps
	13.  Options/CPU
	14.  Options/Preferences
	15.  Options/Torture Test
	16.  Options/Benchmark
	17.  Help/About
	18.  Help/About PrimeNet Server

Your choice: 15

Number of torture test threads to run (12): 
Choose a type of torture test to run.
  1 = Small FFTs (maximum heat and FPU stress, data fits in L2 cache, RAM
not tested much).
  2 = In-place large FFTs (maximum power consumption, some RAM tested).
  3 = Blend (tests some of everything, lots of RAM tested).
  11,12,13 = Allows you to fine tune the above three selections.
Blend is the default.  NOTE: if you fail the blend test, but can pass the
small FFT test then your problem is likely bad memory or a bad memory
controller.
Type of torture test to run (3): 1

Accept the answers above? (Y): Y
[Main thread Jul 18 00:08] Starting workers.
...


and within seconds the heat starts...

Code:
root@titan:~ # ./show_cpu_temps.sh
dev.cpu.11.temperature: 67.0C
dev.cpu.10.temperature: 67.0C
dev.cpu.9.temperature: 73.0C
dev.cpu.8.temperature: 73.0C
dev.cpu.7.temperature: 70.0C
dev.cpu.6.temperature: 70.0C
dev.cpu.5.temperature: 75.0C
dev.cpu.4.temperature: 75.0C
dev.cpu.3.temperature: 69.0C
dev.cpu.2.temperature: 69.0C
dev.cpu.1.temperature: 72.0C
dev.cpu.0.temperature: 72.0C


BTW,

Code:
root@titan:~ # cat ./show_cpu_temps.sh 
#! /bin/sh
sysctl -a | grep -E "cpu\.[0-9]+\.temp"
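If you only care about the hottest core while the torture test runs, something like the following might help. It is a sketch that assumes the sysctl output looks like the listing above; `max_cpu_temp` is a hypothetical helper name.

```shell
# max_cpu_temp: read "dev.cpu.N.temperature: 67.0C" lines on stdin and
# print the highest reading with one decimal place.
max_cpu_temp() {
    awk -F': ' '/temperature/ {
        t = $2
        sub(/C$/, "", t)                      # strip the trailing "C"
        if (!seen || t + 0 > max) { max = t + 0; seen = 1 }
    } END { if (seen) printf "%.1f\n", max }'
}

# Typical use on the NAS:
#   sysctl -a | grep -E "cpu\.[0-9]+\.temp" | max_cpu_temp
```

Running it every few seconds during the stress test gives a quick feel for how fast the temperature climbs.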
 
Joined
Jun 24, 2017
Messages
338
Full hardware list please.
Do you want a breakdown of the HP SFF 8300?
otherwise, the pertinents are listed in the OP...
4x Seagate ST500s (5TB each) - I believe I said WD in the OP... I just checked the machine itself, I misspoke
12GB Hynix RAM... 4GB, 4GB, 2GB, 2GB
i7 (I can't give the exact version as the machine now won't even boot the whole way before turning off...)

Strike that - i7 2600 3.4GHz - no overclocking...

The system is pretty bare-bones. There is no additional hardware, BIOS has anything not related to storage or networking turned off...

I'm definitely leaning towards bad hardware (mobo or power supply)... it gives REALLY inconsistent reboots; they are abrupt, and can occur mid-boot or during normal use.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Could be bad RAM too.

i.e., RAM goes bad sometimes :(
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
It's 'Best Practice' to schedule regular SMART tests and volume scrubs. Documentation here:

http://doc.freenas.org/11/tasks.html#s-m-a-r-t-tests
http://doc.freenas.org/11/storage.html#scrubs

But as others have pointed out... I think you may have hardware problems.
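For a one-off manual check from the shell, in addition to the scheduled tasks the docs describe, a sketch along these lines may be useful. The device name comes from the log lines earlier in the thread, the pool name `tank` is purely a placeholder, and `pending_sectors` is a hypothetical helper; it assumes the usual ten-column `smartctl -A` attribute table.

```shell
# pending_sectors: read `smartctl -A` output on stdin and print the raw
# value of the Current_Pending_Sector attribute (last column of that row).
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Typical use on the NAS (device and pool names are examples only):
#   smartctl -t long /dev/ada1              # start a long self-test
#   smartctl -A /dev/ada1 | pending_sectors # check the pending count
#   zpool scrub tank                        # start a scrub of pool "tank"
#   zpool status tank                       # watch scrub progress and errors
```

Scheduling these through the FreeNAS GUI, as the linked documentation describes, is still the better long-term setup; the commands above are just for a quick look.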
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Looks like those things come with a 240 watt power supply of likely dubious quality. That's awful small for 4 hard drives. I'm betting your power supply is choking out/overheating under load and rebooting.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
How are the drives connected to your machine? Unless I am looking at the wrong machine, it doesn't appear that you could put 4x3.5" drives inside.
 
Joined
Jun 24, 2017
Messages
338
Jailer wins the prize!!! Or, so it would seem. Replaced PS, mprime running consistently for 15 minutes... (before swap, crashed immediately).

@gpsguy ...oh, ye of little faith :) One below the PS, one in the HDD caddy, one in the CD-ROM caddy and one directly behind the CPU fan... heat was a bit of an issue, so I keep it in the basement, on the bottom of a set of shelves with a HEPA filter on one side and an exhaust fan on the other... it's first in line to be cooled, followed by server, then networking EQ... stays a stable 68-72 degrees in there with "relatively" low humidity for the east coast...

HOWEVER, I am taking this as a learning experience, and am switching out to a tower... Precision T3500 as a base, with a significantly beefed up power supply (a Corsair 650 modular that I had laying around)...

Thank you guys for all the help on this...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Yeah, that i7 is going to be pulling at least 77W under heavy load.
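A back-of-the-envelope power budget can be sketched like this. Only the 240 W supply rating and the 77 W CPU figure come from this thread; the per-drive and "everything else" wattages are not stated anywhere here and have to come from the datasheets, so they are left as arguments. `psu_headroom` is a hypothetical helper name.

```shell
# psu_headroom PSU_W CPU_W N_DRIVES PER_DRIVE_W OTHER_W
# Prints the watts left over after subtracting the CPU, the drives, and
# everything else (board, RAM, fans) from the PSU rating.
# All values are whole watts supplied by the caller.
psu_headroom() {
    echo $(( $1 - $2 - $3 * $4 - $5 ))
}

# Typical use, with the two figures from the thread and your own numbers
# for the rest:
#   psu_headroom 240 77 4 <per-drive W> <other W>
```

Use the drives' spin-up draw rather than the idle figure, since all four spin up at once at boot, and remember that an aging supply may not deliver its full rating anyway.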
 