FreeNAS-11.1-U5 Random reboots

Status
Not open for further replies.

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
Hi,

I have two FreeNAS servers, both running FreeNAS-11.1-U5.
Two different Supermicro servers, in different racks (on different power phases), and each server has a redundant power supply.

We have recently (two weeks ago) ramped up usage of these machines as part of our production setup and I think this is when problems started.

The servers randomly reboot within 48h, even though overall load is no more than about 20% on each server.

Initially it was just one, so I thought it could be a hardware issue, but now both of them are doing it.
Please note that other machines in each rack have uptimes of hundreds of days, while these two keep crashing.

On both servers I now see this error multiple times:

nfsrv_cache_session: no session

Any ideas?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
It's the HBA you're using. It's not compatible with that particular Supermicro board.
 

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
It's the HBA you're using. It's not compatible with that particular Supermicro board.
Can you elaborate on this? How come it's not compatible? It's been working.

Also, I noticed that both servers report a watchdog expiration as the restart cause:

# ipmitool chassis restart_cause
System restart cause: watchdog expired
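
To line the reported cause up with the reboot times, it may help to record it after every boot. A minimal sketch, assuming `ipmitool` is installed and can reach the local BMC; the log path and the helper name `parse_restart_cause` are made up for illustration:

```shell
#!/bin/sh
# Sketch: record the BMC-reported restart cause with a UTC timestamp.
# LOGFILE is a hypothetical path chosen for illustration.
LOGFILE="${LOGFILE:-/tmp/restart_cause.log}"

# Extract the cause text from a line such as
# "System restart cause: watchdog expired".
parse_restart_cause() {
    sed -n 's/^System restart cause:[[:space:]]*//p'
}

# Only query the BMC when ipmitool is actually available.
if command -v ipmitool >/dev/null 2>&1; then
    cause=$(ipmitool chassis restart_cause 2>/dev/null | parse_restart_cause)
    printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "${cause:-unknown}" >> "$LOGFILE"
fi
```

Run once after each boot, this builds a small history that can be compared against the client-side logs.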
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's the HBA you're using.
How do you even know what HBA it is?

Also, I noticed that both servers report a watchdog expiration as the restart cause:
Could be the cause or it could just be the watchdog doing its job. You can try disabling the watchdogs and seeing if the system locks up.
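
For the IPMI side, something along these lines should do as a rough sketch. It assumes `ipmitool` can talk to the local BMC. The `run` wrapper and the `DRY_RUN` variable are only illustrative, so the commands are printed rather than executed until `DRY_RUN=0` is set. If an OS-side daemon keeps re-arming the timer, that has to be dealt with separately, which this sketch does not cover.

```shell
#!/bin/sh
# Sketch: inspect and stop the BMC (IPMI) watchdog timer with ipmitool.
# DRY_RUN defaults to 1, so the commands are only printed until it is set to 0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

run ipmitool mc watchdog get   # show the current timer state and action
run ipmitool mc watchdog off   # stop the running timer
run ipmitool mc watchdog get   # confirm that it is no longer counting down
```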
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
How do you even know what HBA it is?
Exactly. I didn't see any hardware information at all.
Two different Supermicro servers
We need to look at your hardware and software details to see if we can find a configuration issue and it would probably help to know how you are using the systems.
 

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
We need to look at your hardware and software details to see if we can find a configuration issue and it would probably help to know how you are using the systems.

Do you have some standard output/commands that you request from people? Exactly what do you need?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For the hardware, no. Any command output would be far more difficult for a human to parse than a simple list of the hardware.

As for software, a tail of the logs would be nice.
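
Something like the following would be enough. It is only a sketch: `/var/log/messages` is assumed to be the relevant log, 200 lines is an arbitrary amount, and `tail_logs` is a helper name made up here.

```shell
#!/bin/sh
# Sketch: print the last N lines of each readable log file under a header,
# ready to paste into a post. The default path below is an assumption.
tail_logs() {
    lines="$1"
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue      # skip files that are missing or unreadable
        printf '==> %s <==\n' "$f"
        tail -n "$lines" "$f"
    done
}

tail_logs 200 /var/log/messages
```

More files can be passed as extra arguments if the interesting messages end up elsewhere.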
 

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
Nas1

Hardware:
Manufacturer: Supermicro
Product Name: X10DRi
Processors: 2x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
RAM: 4x32GB RDIMM ECC 2400MHz
SAS: (LSI) SAS3004
Chassis: https://www.supermicro.com/products/chassis/3U/836/SC836BE2C-R1K03B

Disks:
2x Intel SSD 128GB (system),
Pool 16x Seagate 10TB,
L2ARC 1x Intel s4500 NVMe INTEL SSDPE2KX010T7



Nas2
Hardware:
Manufacturer: Supermicro
Product Name: X11DPi-N(T)
Processors: 2x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
RAM: 4x32GB RDIMM ECC 2400MHz
SAS: (LSI) SAS3004
Chassis: https://www.supermicro.com/products/chassis/4U/847/SC847BE1C-R1K28LPB

Disks:
2x 1TB Hitachi disks (system)
1x 200GB Lenovo SSD - ZIL log
12x Hitachi 10TB disks (pool)
1x INTEL SSDPEDMX020T7 (cache)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
I believe I have the watchdog disabled in the BIOS. I'm currently testing with the IPMI watchdog disabled via ipmitool.
Unfortunately, anything else needs a scheduled maintenance window, as the hardware is in a colo.

The hardware was bought from a vendor who put it together, so I'm hoping they used qualified components.
 

woyteck

Dabbler
Joined
Jul 10, 2018
Messages
13
So, this has been an interesting development.
One of our NFS clients kept hitting a "Too many levels of symlinks" error. During that period the FreeNAS boxes rebooted multiple times.
After several tests I decided to first disable and then reinstall that client.
Since then no reboots. Uptimes going up nicely, on both FreeNAS servers.

Very odd indeed.
 