Crashing every ~10-20 minutes

Status
Not open for further replies.

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
I honestly have no idea how to troubleshoot this. I've had FreeNAS for years now, since 9.2. I've had occasional issues, but nothing like this. Since yesterday it has been crashing every 10-20 minutes or so. I spent yesterday at school, so I didn't make any configuration or hardware changes. Over the past couple of weeks I have made hardware changes: I swapped some 3TB drives for 6TB ones and replaced some old 250GB drives with 3TB ones. I've also doubled the RAM from 16 to 32GB (using the same make/model as the previous two sticks). Apparently logs don't survive a reboot, and when I hooked up a monitor, no error messages were displayed on the screen. It just appears to hang: it is completely inaccessible, but still powered on and displaying whatever it last displayed. I reverted to a boot environment from before the upgrade to U4, but even that upgrade was done on the 16th (not recent), and I still have the same issue. Just while typing this it has gone down again. I'm at a total loss and don't even know where to start; any help would be appreciated. Maybe just reinstall? I had enough time between crashes to download my config.
 
dlavigne

Guest
What is the build version (from System -> Information)?

Anything related in /var/log/messages around the time of the crash?
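One way to narrow down "around the time of the crash" is to keep only the lines stamped in the last few minutes before the hang. This is a minimal sketch, assuming the usual syslog layout (`Mon DD HH:MM:SS host ...`) and an English locale for month names; the function name, the 15-minute window, and the sample lines are all made up for illustration.

```python
from datetime import datetime, timedelta


def lines_before(log_lines, crash_time, minutes=15):
    """Return syslog-style lines stamped within `minutes` before crash_time."""
    window_start = crash_time - timedelta(minutes=minutes)
    selected = []
    for line in log_lines:
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            # Syslog stamps carry no year, so borrow it from crash_time.
            stamp = datetime.strptime(
                f"{crash_time.year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # continuation line or unexpected format
        if window_start <= stamp <= crash_time:
            selected.append(line.rstrip("\n"))
    return selected


# Made-up sample lines, for illustration only (not real FreeNAS output).
SAMPLE = [
    "Apr 26 21:40:02 freenas kernel: example message A",
    "Apr 26 22:05:10 freenas kernel: example message B",
    "Apr 26 22:11:45 freenas smartd[1234]: example message C",
]

if __name__ == "__main__":
    crash = datetime(2018, 4, 26, 22, 15, 0)
    for line in lines_before(SAMPLE, crash):
        print(line)
```

On the NAS itself the same function could be fed an open file handle for /var/log/messages instead of the sample list.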
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Sorry about the problem you are having. To be able to help you, it would help us to know everything you can tell us about the hardware you are using. Details matter.
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
From my understanding, /var/log doesn't survive a reboot, so I don't know how I could tell you what happened before it crashed. I don't know the exact build version, but I know it was last upgraded to U4 on the 16th of this month.

For hardware I have:
ASUS M5A97 LE R2.0 AM3+
AMD FX8350
4x 8GB DDR3 1600MHz Unbuffered UDIMM ECC
IBM MegaRAID controller flashed to IT firmware
4x 3TB WD Reds + 2x 6TB WD Reds (in the process of upgrading) in a RAIDZ2
2x 3TB WD Reds in a mirror
Asus GeForce 210 for the rare times I need a display
2x 16GB SanDisk Cruzer Fit for boot (these were upgraded from 8GB sticks within the past year, I think)

I've been running this hardware for years now. Two of the RAM sticks are new and the hard drives have been swapped, but nothing out of the ordinary.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
FlexibleToast said: "I've been running this hardware for years now. Two of the RAM sticks are new and the hard drives have been swapped, but nothing out of the ordinary."
There are a lot of things that could cause spontaneous reboots. If you have not changed any software settings, it is likely a hardware fault of some kind.
Did you run a memory test on the new memory before you installed it? I would suggest re-seating any connectors that may have been affected when you upgraded the memory as well as re-seating the memory.
It is also possible that a component failure could be causing the reboots. What kind of power supply are you using and how old is it?
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
It isn't rebooting, though. The hardware is up but hung (I can't access anything, and the console is still displayed on the monitor but unresponsive). No, I didn't run a memtest. I could do that, but that RAM has been in place since the 16th as well. I did replace the power supply in March of last year; I forgot about that. It is a SeaSonic G Series 550-Watt.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
FlexibleToast said: "I've also doubled the RAM from 16 to 32GB"
Have you run a full memtest on all the RAM as it's installed? A bad stick (even ECC) can cause everything from slowness to kernel panics to dead hangs.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Difficult to say without more information. It could be RAM problems, as already suspected, data disk problems (how is the swap behavior of the system?), or boot disk problems. I don't think any of these three sources can be excluded right now.
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
I'm running memtest right now. Took me a little while to figure out how to do it with a modern UEFI system...

As for data disk problems, what swap behavior do you mean? What should I be looking for?

For the boot disk, one of them did have issues and faulted. I detached it from the mirror. Unless they both went bad at the same time, that would be strange. Maybe the one going down caused issues with the other? It's cheap to replace those USB drives, so I might just go ahead and do that. I'll have to look for recommended drives.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
FlexibleToast said: "As for data disk problems, what swap behavior do you mean? What should I be looking for?"

Once FreeNAS is in operation again, watch for swap usage (for example using Reporting -> Memory in the FreeNAS GUI or using top in a Shell). Using swap can be dangerous on a possibly faulty or non-burn-in-tested HDD (are SMART tests configured, and are the results OK?).
https://forums.freenas.org/index.php?resources/hard-drive-burn-in-testing.92/
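Besides the GUI graph and top, swap usage can also be checked from a script. The sketch below assumes the column layout printed by FreeBSD's `swapinfo -k` (Device, 1K-blocks, Used, Avail, Capacity); the function name and the sample text are hypothetical, and the device names are invented for illustration.

```python
def swap_used_kib(swapinfo_text):
    """Map each swap device to its used KiB, from `swapinfo -k` style text."""
    used = {}
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Skip the header, the "Total" summary row, and malformed lines.
        if len(fields) < 5 or fields[0] in ("Device", "Total"):
            continue
        if not fields[2].isdigit():
            continue
        used[fields[0]] = int(fields[2])
    return used


# Made-up sample in the assumed `swapinfo -k` layout (not real output).
SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p1.eli   2097152        0  2097152     0%
/dev/ada1p1.eli   2097152     5120  2092032     0%
Total             4194304     5120  4189184     0%
"""

if __name__ == "__main__":
    for device, kib in swap_used_kib(SAMPLE).items():
        if kib > 0:
            print(f"{device} has {kib} KiB of swap in use")
```

On a live system the text could come from `subprocess.run(["swapinfo", "-k"], capture_output=True, text=True).stdout`; any device reporting non-zero usage is one whose disk health matters for stability.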

Edit: To avoid mistakes: You are running 11.1-U4 right now, apart from doing the memtest run?

FlexibleToast said: "For the boot disk, one of them did have issues and faulted. I detached it from the mirror. Unless they both went bad at the same time, that would be strange. Maybe the one going down caused issues with the other? It's cheap to replace those USB drives, so I might just go ahead and do that. I'll have to look for recommended drives."

First, do a System -> Boot -> Scrub Boot and a System -> Update -> Verify Install (again).
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
Well, I ran memtest and didn't have time to look at it yesterday. Today I stopped it at 50 hours with no errors, which tells me the RAM is fine. I shut it down planning to install new USB drives and a fresh FreeNAS install... Now it won't boot and gives no beep codes at all. I removed my GPU, RAM, and hard drives: still no POST and no beep codes. Now I wonder if it's the just-over-one-year-old PSU or even the CPU.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Did you replace any parts before you started testing?
What kind of power supply?

 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
I put in a working USB drive, which I wouldn't really consider a different part. The PSU, as I already mentioned, is a year-old SeaSonic G Series 550-Watt.
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
Okay, I reseated the main motherboard power cable (which is right next to the RAM slots) and it started POSTing again. I re-added each part and checked for POST each time. Now all parts are back in and it is booting. Maybe inserting the RAM caused a bad connection with the power? We'll see if it stays up.

Edit: It has been an hour, fingers still crossed but it's looking better.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
First thing you should’ve checked is the BIOS/system logs for ECC errors.
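For the system-log half of that check, a keyword scan is a quick first pass. This is only a sketch: the patterns are assumptions (FreeBSD kernels generally prefix machine-check reports with `MCA:`, and the other two are guesses), the function name is hypothetical, and the sample lines are invented rather than taken from any real log.

```python
import re

# Assumed patterns; adjust to whatever the system actually logs.
SUSPECT = re.compile(r"\bMCA:|\bECC\b|machine check", re.IGNORECASE)


def suspect_lines(log_lines):
    """Return the log lines that mention memory or machine-check errors."""
    return [line.rstrip("\n") for line in log_lines if SUSPECT.search(line)]


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Apr 27 05:40:01 freenas kernel: MCA: example machine-check record",
    "Apr 27 05:41:10 freenas kernel: example unrelated message",
    "Apr 27 05:42:30 freenas kernel: example corrected ECC event",
]

if __name__ == "__main__":
    for line in suspect_lines(SAMPLE):
        print(line)
```

An empty result does not prove the memory is healthy, since a hard hang may leave no time for anything to be written.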
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
Stux said: "First thing you should’ve checked is the BIOS/system logs for ECC errors."
This is the point of the post. I didn't know what to check. I didn't even know the BIOS does any logging, never heard of it. At least on consumer hardware. System logs were the first thing I thought of, but I guess FreeNAS keeps logs in memory so they get lost on reboot.

I'm not totally confident the problem was a loose cable. If I start having issues again I'll look for these logs. Thanks
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
FlexibleToast said: "System logs were the first thing I thought of, but I guess FreeNAS keeps logs in memory so they get lost on reboot."

While it is certainly possible that the system logs in /var/log/ contain nothing useful after a spontaneous reboot or crash (maybe because a message had not yet been written, or because a hardware error is involved that can't be logged at all), the logs in this folder generally do survive a reboot, of course.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
FlexibleToast said: "I didn't even know the BIOS does any logging, never heard of it. At least on consumer hardware."
That is one of the reasons that Supermicro server boards are so highly recommended on this site. They do logging, and you don't even need to go into the BIOS to pull the log. The IPMI web interface allows remote management of almost everything, including viewing the log like this:

upload_2018-4-27_5-47-59.png


FreeNAS® Quick Hardware Guide
https://forums.freenas.org/index.php?resources/freenas®-quick-hardware-guide.7/

Hardware Recommendations Guide (Rev 1e) 2017-05-06
https://forums.freenas.org/index.php?resources/hardware-recommendations-guide.12/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
FlexibleToast said: "Well, as I expected it went down again sometime last night. Guess I'll look for those ECC logs."
I don't hold out much hope for this, but if you have another power supply, you might change that, just to test if it is the source of any problems. I am leaning toward it being a bad system board.
What kind of temperatures are you getting?
 