random crashes, even after a complete system (hardware) swap

s0mm3r

Dabbler
Joined
Jul 25, 2018
Messages
31
hi, hopefully someone can help me here

my FreeNAS (FN) box was built from really old hardware (a Q6600 and a motherboard that had caused trouble since the beginning)
and it experienced random hangs and reboots: ping would still work, but everything else was completely unresponsive.
when connecting through SSH it would take minutes until the shell appeared, and the box would reboot shortly after.

since the HW was old and known to cause issues, I've finally replaced all the hardware (except the drives) with a Xeon and an LSI HBA flashed to IT mode (details in signature), hoping that this would eradicate the issues
EDIT: this is the first time this HBA has been used; it was not part of the old system

I've reinstalled FN 11.1-U7 to a new drive and imported the config
everything seemed fine until yesterday, when the first crash happened, I suspect during heavy disk I/O
SABnzbd was downloading a series totalling 95GB, as single-episode NZBs, when the system suddenly crashed (this causes a lot of r/w, since finished files are unrar'd while new ones are still being written)

after the reboot, one disk reported one offline uncorrectable sector, but that's the first time, and I'd guess that should not cause the whole system to reboot; that disk is not even a member of SAB's download destination pool
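For anyone checking the same thing: the "offline uncorrectable" count usually corresponds to SMART attribute 198 (Offline_Uncorrectable) in the table printed by `smartctl -A /dev/<disk>`. The sketch below is a minimal illustration, assuming that standard column layout (attribute naming is vendor-specific, so your drive may differ); the sample table is made up, not taken from this system.

```python
from typing import Optional


def offline_uncorrectable(smart_attrs: str) -> Optional[int]:
    """Return the raw value of SMART attribute 198 (Offline_Uncorrectable)
    from `smartctl -A` output, or None if the attribute is not listed."""
    for line in smart_attrs.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the header row starts with "ID#".
        if len(fields) >= 10 and fields[0] == "198":
            # RAW_VALUE is the 10th column (index 9) in the standard layout.
            return int(fields[9])
    return None


# Made-up sample in the usual smartctl column layout (not from this system).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

print(offline_uncorrectable(sample))  # prints 1
```

Running this against the real `smartctl -A` output now and then shows whether the count stays at one or keeps growing.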

I enabled persistent syslogging after the first few crashes; the log is attached. maybe someone can help me or point me in the right direction to investigate further
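With persistent syslog enabled, the useful part is usually the last few lines written before each reboot. A minimal sketch, assuming you can identify some line that is logged at the start of every boot; the marker string and the log lines below are hypothetical placeholders, so use whatever your log actually shows at startup. If the box hangs hard, the last messages may never reach the disk, so an unremarkable tail is not conclusive.

```python
from typing import List


def lines_before_boots(log_lines: List[str], boot_marker: str,
                       context: int = 5) -> List[List[str]]:
    """For every line containing boot_marker, return up to `context`
    lines logged immediately before it (the last output before that boot)."""
    chunks = []
    for i, line in enumerate(log_lines):
        if boot_marker in line:
            chunks.append(log_lines[max(0, i - context):i])
    return chunks


# Hypothetical lines; in practice read them from the persistent log file,
# e.g. open(path).read().splitlines() with your own log path.
log = ["msg 1", "msg 2", "msg 3", "BOOT", "msg 4", "msg 5", "BOOT"]
for chunk in lines_before_boots(log, "BOOT", context=2):
    print(chunk)
```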

note: since yesterday, one pool has been at over 80% usage

EDIT:
PSU:
currently two, because the Dell board (despite using a 24-pin ATX plug) has a proprietary pin header
so a standard ATX PSU is powering the drives and a Dell PSU the motherboard

I'm planning to use the 825W Dell PSU (which came with the T5610 workstation) with a 24-pin ATX extender cable, if I can fit it in the case
otherwise I will use a standard ATX PSU and modify the extender cable to match the Dell pin header

thank you
regards
 

Attachments

  • syslog.rar
    146.8 KB

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
note: since yesterday the media pool is over 80% usage
That's likely to be some of the problem... either now or later. ZFS wants 20% free space at minimum, at all times.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
That's likely to be some of the problem... either now or later. ZFS wants 20% free space at minimum, at all times.
The closer the pool gets to full, the slower write performance gets. It is a good idea to keep it under 80%, but it should not cause a crash, just slow performance.
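To keep an eye on the 80% guideline, here is a minimal sketch that parses the output of `zpool list -H -o name,capacity` (`-H` prints tab-separated rows without a header) and flags pools over a threshold. The 80% default only mirrors the rule of thumb in this thread, and the pool names in the sample are placeholders.

```python
from typing import Dict, List


def parse_capacity(zpool_output: str) -> Dict[str, int]:
    """Parse `zpool list -H -o name,capacity` output (tab-separated,
    no header) into {pool name: percent used}."""
    usage = {}
    for line in zpool_output.strip().splitlines():
        name, cap = line.split("\t")
        usage[name] = int(cap.rstrip("%"))  # "81%" -> 81
    return usage


def pools_over(usage: Dict[str, int], threshold: int = 80) -> List[str]:
    """Names of pools whose usage is strictly above the threshold, sorted."""
    return sorted(name for name, pct in usage.items() if pct > threshold)


# Hypothetical output; the pool names are placeholders.
sample = "boot\t1%\nmedia\t81%\ntank\t62%\n"
print(pools_over(parse_capacity(sample)))  # prints ['media']
```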
I've finally replaced all the hardware (except the drives)
What about the boot drive? Your hardware description includes most things, but no mention of boot media.
I've reinstalled FN 11.1-U7 to a new drive and imported the config
What kind of drive? How is it connected to the system?
(this causes a lot of r/w since the files will be written and unrar'd while files are written)
I see you are using an LSI 9217-8i HBA in IT mode. How is the airflow over that heatsink? I have seen SAS controllers overheat when the airflow wasn't good.
 

s0mm3r

That's likely to be some of the problem... either now or later. ZFS wants 20% free space at minimum, at all times.

Yeah, but actually no
just ... no

pool is currently at 81% and will be back at 75% as soon as some snaps are deleted
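The arithmetic behind getting "back at 75%" is simple: the space to free is roughly (current % minus target %) of the pool size. A sketch with a made-up 10 TB pool, since the actual size isn't given here. Deleting snapshots only frees blocks no other snapshot or dataset still references, so the real gain may differ from what the per-snapshot "used" figures suggest.

```python
def bytes_to_free(pool_size: int, used_pct: float, target_pct: float) -> int:
    """Approximate bytes to free so usage drops from used_pct to target_pct."""
    return max(0, round(pool_size * (used_pct - target_pct) / 100))


TB = 10**12
# Hypothetical 10 TB pool going from 81% to 75% usage.
print(bytes_to_free(10 * TB, 81, 75) / TB)  # prints 0.6
```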

pool usage was under 80% when the first crashes appeared (even on the new hardware)
according to multiple sources, ZFS performance can suffer when a pool gets this full (e.g. r/w speed or resilvering), but it won't cause the system to crash
https://docs.oracle.com/cd/E23823_01/html/819-5461/zfspools-4.html

If you can prove to me that this causes the crashes, then I'll switch to a Windows file server tomorrow
 

s0mm3r

What about the boot drive? Your hardware description includes most things, but no mention of boot media.

What kind of drive? How is it connected to the system?
the boot drives in the old system were 2x HGST 500GB 2.5" drives. I haven't been able to fit them into the new system yet, so I installed FN to a Samsung 250GB 3.5" drive I had lying around, which had only a few hours of use and showed no errors in CrystalDiskInfo
It is connected to the first onboard SATA port, because I couldn't boot from it via the HBA
I see you are using a LSI 9217-8i HBA in IT mode. How is the airflow on that heatsink? I have seen SAS controllers overheat if the airflow wasn't good.
I'd say it's good; the CPU fan blows over the HBA
I don't have a better pic atm
I'm waiting for a SAS expander, because some drives are still connected to the onboard SATA controller
so cable management is still a todo ;)
[attached picture: AJhghCQ.png]
 

s0mm3r

I see you are using an LSI 9217-8i HBA in IT mode.
I should mention that the HBA is new; it was not used in the previous build (where the first crashes appeared)
that build had a bunch of different SATA PCIe cards

all of the hardware in this build is "new"
except for the drives, none of it was used in my old FreeNAS box
 

s0mm3r

I've uploaded some screenshots of the reporting tab right before the crash
https://imgur.com/a/qsWKOJM

There is an increase in CPU load, Disk I/O and ARC requests about an hour before the crash (2:58 am)

but this is just the Plex jail doing "maintenance", whatever that means

I didn't change the SMART settings
[screenshot: smart.PNG]

does not really explain anything
btw, I can confirm that there was no power outage
 

s0mm3r

idk why, but it hasn't crashed in the past 2 weeks
ZFS pool usage is now at 84%, no problems at all
running perfectly stable

no idea what caused the crashes
 