New FreeNAS system with Dell R720xd crashes every one or two days

rafima

Dabbler
Joined
Mar 19, 2020
Messages
10
Hi, I finally switched from a Windows file server, which worked very smoothly, to a FreeNAS system. I am not a total noob, but I am rather inexperienced with FreeNAS. I built a system on a Dell R720xd and swapped the controller for a Dell PERC H710 Mini Mono running LSI P20 firmware in IT mode.

Everything worked fine for a while. After about 3 days of running and copying files onto the Dell machine, it was no longer reachable through the web GUI or on the machine itself. After rebooting, it told me that the ZFS pool had failed.

Today, one day later, the ZFS pool crashed again. The web GUI still worked, but the pool was gone. Shutdown didn't work. Again, after restarting, everything was fine and worked normally. Now, after rebooting, there is suddenly an iocage dataset. No idea where it came from.

I already ran SMART tests, but every hard disk (6 Seagate Exos 16TB in RAIDZ2) seems to be fine and no errors are recorded.

I now have the following alerts:
INFO
Scrub of pool 'Storage' finished.
Mon, 16 Mar 2020 09:59:17 AM (America/Los_Angeles)
WARNING
freenas.local had an unscheduled system reboot. The operating system successfully came back online at Wed Mar 18 05:21:18 2020.
Wed, 18 Mar 2020 05:21:18 AM (America/Los_Angeles)
WARNING
freenas.local had an unscheduled system reboot. The operating system successfully came back online at Thu Mar 19 15:11:32 2020.
Thu, 19 Mar 2020 03:11:32 PM (America/Los_Angeles)


The "zpool status" command shows no errors?

Any ideas? I have no clue what could be wrong or what is causing the fault. Please help!
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Can you open the shell, give the output of "zpool status", and check /var/log/messages for any errors? It could be the boot pool as well...
 

rafima

Thanks for your fast reply. When I type /var/log/messages, I get "permission denied".

Here is a screencap from zpool status:
zpoolstatus20200319.jpg
 

Kris Moore

Sorry, I should have been more specific: plain "zpool status", since we want to see the freenas-boot pool as well. Also run "cat /var/log/messages | more" and look for any obvious errors.
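
When the log is too long to page through by eye, a small filter can narrow it down. The following is only a minimal sketch, not a FreeNAS tool: the function name find_suspect_lines and the keyword list are assumptions made up for illustration, so adjust both to whatever looks suspicious on the system in question.

```python
from typing import Iterable, List

# Hypothetical keyword list, chosen for illustration only.
DEFAULT_KEYWORDS = ("error", "timeout", "fail", "retry")


def find_suspect_lines(lines: Iterable[str],
                       keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, compared case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [line.rstrip("\n") for line in lines
            if any(k in line.lower() for k in lowered)]


# Example use (reading the log may require root):
#   with open("/var/log/messages", errors="replace") as log:
#       for hit in find_suspect_lines(log):
#           print(hit)
```

Because the function accepts any iterable of lines, it works the same on an open file or on text pasted into a list. The matches can also be written to a plain text file instead of being printed.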
 

rafima

Hi, thanks! Attached are screenshots of the log. Can I export it into a text file?
At boot it seems that da6 has some problems!? Could the HBA controller be causing the faults? It would be highly unusual for every disk to be bad.

Untitled-1_0030_Layer 2.jpg

Untitled-1_0029_Layer 3.jpg

A selection of log screenshots is in the zip.
 

Attachments

  • Untitled-1_0020_Layer 12.zip
    2.3 MB · Views: 231

Kris Moore

I took a look, and the logs you sent show a ton of SCSI / CAM errors across multiple devices. That to me indicates some sort of controller or other hardware failure, not the disks themselves.
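
To make the "one disk or many?" question concrete, the CAM status lines can be tallied per device. This is a sketch under assumptions: the regular expression targets the "(device:controller:bus:target:lun): CAM status: ..." shape that FreeBSD kernel messages generally use, and the name cam_errors_by_device is hypothetical. If the lines in a given log look different, the pattern needs adjusting.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: "(da6:mps0:0:14:0): CAM status: <text>".
# The device name is the part before the first colon inside the parentheses.
CAM_STATUS = re.compile(r"\((?P<dev>[a-z]+\d+):[^)]*\): CAM status: (?P<status>.+)")


def cam_errors_by_device(lines: Iterable[str]) -> Counter:
    """Count lines carrying a CAM status report, keyed by device name."""
    counts: Counter = Counter()
    for line in lines:
        match = CAM_STATUS.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Example use (reading the log may require root):
#   with open("/var/log/messages", errors="replace") as log:
#       for dev, n in cam_errors_by_device(log).most_common():
#           print(dev, n)
```

If the counts are spread over several devices rather than concentrated on one, that fits the pattern of a shared component (controller, cabling, backplane) being at fault rather than a single disk.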
 

rafima

Thanks. So I disconnected and reconnected all SAS cables and swapped the controller; I had a spare one. Let's see how it behaves now. Can I run any tests?
 

Kris Moore

rafima said:
Thanks. So I disconnected and reconnected all SAS cables and swapped the controller; I had a spare one. Let's see how it behaves now. Can I run any tests?

Throw some load on it and monitor it over the next couple of days. That's really the only way to tell at this point whether it's stable in the long run. If it still craps out, you may want to check for thermal issues as well.
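
One simple way to generate such load is to write a large file of random data to the pool, read it back, and compare checksums. The sketch below assumes a hypothetical helper write_and_verify and a target path of your choosing; it is not a substitute for a scrub or a proper burn-in tool. Note also that a read issued shortly after a write may be served from memory cache rather than from the disks, so small files mostly exercise the write path.

```python
import hashlib
import os


def write_and_verify(path: str, size_mib: int = 64, chunk_mib: int = 1) -> bool:
    """Write random data to `path`, read it back, and compare SHA-256 digests.

    The size is rounded down to a whole number of chunks.
    """
    chunk_size = chunk_mib * 1024 * 1024
    written = hashlib.sha256()
    with open(path, "wb") as out:
        for _ in range(size_mib // chunk_mib):
            block = os.urandom(chunk_size)
            written.update(block)
            out.write(block)
        out.flush()
        os.fsync(out.fileno())  # ask the OS to push the data to stable storage

    read_back = hashlib.sha256()
    with open(path, "rb") as src:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            read_back.update(block)
    return written.digest() == read_back.digest()
```

Calling it in a loop with a large size_mib on a dataset in the pool keeps the controller and disks busy; a return value of False, or new errors in /var/log/messages during the run, would both be worth following up.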
 

rafima

Okay, thanks again. After installing the new controller and rebooting, the following showed up within seconds :(
 

Attachments

  • Freenasboot20200320.jpg
    361.3 KB · Views: 250

Kris Moore

RAM or the motherboard could also be on the table. Again, check thermals though. I did have a similar issue with an overheating HBA at one point...
 

rafima

Now I have pulled every RAM module except the ones in slots A1 and B1 (they are required), so I have only 64 GB of RAM instead of 768 GB. At the moment, no errors so far. Let's see what happens overnight. I guess there is no upper limit on RAM for FreeNAS, is there?
 