FreeNAS with a big dose of headache - what could be wrong

mrpetey

Cadet
Joined
Oct 19, 2020
Messages
3
Hello! I have searched, researched, tried, tinkered, and experimented, but I cannot get my FreeNAS server to stay up for much more than a few hours, let alone handle file transfers (over SMB or AFP). I am looking for suggestions or another direction to check. Below is a quick-and-dirty summary of what I have.

My server
SuperMicro X9SCL-F E3-1230 Server motherboard - in a Fractal node 804 case.
On-board video - connected to VGA monitor to login with.
16GB ECC RAM
Onboard SATA
1 x SSD 119GB PNY - Boot disk
* 1 x HGST 7.2k 3TB (later moved to the HBA card)
* 2 x Seagate IronWolf 5.9k 3TB (later moved to the HBA card)
HP H200 HBA SAS-SATA card - IT Mode
4 x HGST 7.2k
2 x Seagate IronWolf 5.9k 3TB
Gigabyte 650 Bronze PS
FreeNAS 11.3-U5

Client and server are connected via 1Gb Ethernet on an unmanaged Netgear switch. The client is a Mac Pro5,1 running Mac OS X 10.12.

Short version:
During a transfer of file(s) the server will hang and need to be rebooted. Transfer never completes.
SMB/AFP both running, different datasets - hangs on smbd
SMB alone - usually hangs on smbd, sometimes nansl
AFP alone - hangs on afpd
Longest uptime with it doing nothing is a little over 3.5 hours.

Tried:
Power supply was upgraded to the 650.
Memtest86 - passed
3 fans were added, which brought CPU temps down.
Drives marked * were removed from motherboard SATA and moved to the second port on the HBA card.
SMB tweaks for Mac OS, fruit setting, max threshold upped.
Downloaded U5, fresh installation.

How best to log to troubleshoot this issue?
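One low-tech option for a box that hangs without leaving anything in the logs is a heartbeat file: a small script appends a timestamp and the load averages every few seconds and forces each line to disk, so after a hard hang the last line shows roughly when the system stopped responding. The following is only a sketch under assumptions: the function names, the 10-second interval, and the log path in the usage note are hypothetical, and it is not a FreeNAS feature.

```python
import os
import time
from datetime import datetime


def heartbeat_line(now=None):
    """Build one log line: ISO timestamp plus the 1/5/15-minute load averages."""
    now = now or datetime.now()
    load1, load5, load15 = os.getloadavg()
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f}"


def write_heartbeat(path, line):
    """Append one line and fsync it so it is on disk before the next hang."""
    with open(path, "a") as fh:
        fh.write(line + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def run(path, interval_s=10, count=None):
    """Write a heartbeat every interval_s seconds; count=None means run until killed."""
    written = 0
    while count is None or written < count:
        write_heartbeat(path, heartbeat_line())
        written += 1
        if count is None or written < count:
            time.sleep(interval_s)
```

Calling `run("/var/tmp/heartbeat.log")` (a hypothetical path) from a shell session would keep writing until the machine stops. The file is best kept off the pool being tested, since a stalled pool could block the heartbeat itself.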

I have been working at this for a while and am not about to give up. It worked fine on my test rig, but I am stumped as to what I might be missing.

My FreeNAS install is for home. It is in a Fractal case with a total of six fans, a couple of them high-static-pressure models. There are six 3TB drives in one pool using RAIDZ2. The first time I installed this box I used FreeNAS-11.3-U4 and it didn't work then either; the motherboard just powered off. At first I thought it could be a power issue. When I sat down and refined my math, I was pushing the 450 W supply close to its limit. OK, a 650 it will be. No, this didn't help either. The motherboard would just die as if it were powered off.

A little digging and I see that the CPU temps are running high, and when they hit 60 the system goes down. OK, let's cool down the board a bit. In my reading I did see that the case could be a little light on cooling. I ordered and installed 3 more fans. The motherboard side has 4, the drive side 2. Well, hey, the temps are down and it seems to be a bit more stable. Let's copy something. Now it doesn't turn off, it just hangs. The system is there, but there is no response. I can't get to it through the GUI or the local monitor either. I keep top running on the local monitor to try to catch what is hanging - smbd, python3.7, afpd, nmfd. Temps are fine, running in the mid 40s with spikes. The logs I have looked through don't show any obvious signs of something crashing or throwing errors.

The drives were split between the motherboard SATA and the card, 3 each; now all 6 of the drives are moved to the HBA card. Same problems.

Fresh download of FreeNAS-11.3-U5, fresh dd to a flash drive, boot up, and a fresh installation.
Create one user
Create one pool
Create one dataset
Create one AFP share

Log in, start to copy, and it hangs on afpd.

And now I am here. Ideas please.

Motherboard? CPU?

This is the light version of the issue. For the complete log/saga I have all of my notes posted on my site https://www.wrightmac.net/archives/1500 . They are written for myself so far. If I get this up and running I may clean them up a bit. For now I need to get this up and running.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Anything in IPMI event logs?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Welcome. Sorry to hear you're having trouble.

As suggested by @Redcoat a crash/poweroff event is usually related to hardware failures. I would suggest downloading a copy of the memtest86 ISO and burning it to USB/DVD. Run through several rounds of memory testing.

Silly question: this system was built from scratch. Did you confirm that the thermal contact between CPU and heatsink is good? Ensure no bubbles or dust got into the thermal interface material (and make sure all the plastic is off the base, if the heatsink came with it preapplied).

I'd also be curious to see if a simple DD test or local copy causes the lockup as well.
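If dd feels too blunt, a local write test along these lines would do the same job of taking the network, SMB, and AFP out of the picture. This is only a sketch: the function name, default sizes, and the path in the usage note are assumptions. It writes random rather than zero data so that dataset compression, if enabled, cannot shrink the writes to almost nothing.

```python
import os
import time


def local_write_test(path, total_mib=1024, chunk_mib=1):
    """Write total_mib MiB of incompressible data to path, chunk_mib MiB at a time.

    Returns (bytes_written, seconds_elapsed).
    """
    # One random chunk, reused for every write, keeps CPU cost low.
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    written = 0
    start = time.monotonic()
    with open(path, "wb") as fh:
        for _ in range(total_mib // chunk_mib):
            written += fh.write(chunk)
        # Make sure the data has actually been pushed to the pool before timing stops.
        fh.flush()
        os.fsync(fh.fileno())
    return written, time.monotonic() - start
```

Run from a shell on the server itself, something like `local_write_test("/mnt/tank/test/ddtest.bin", total_mib=20480)` would write about 20 GiB; `tank` and the dataset name are hypothetical. If the box locks up here too, the sharing services are probably not the cause.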
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
A couple more thoughts - disable the watchdog in the BIOS and at jumper JWD (open) to see if the behavior changes. My X9SCM-Fs need this for stability on FreeNAS.
 

mrpetey

Cadet
Joined
Oct 19, 2020
Messages
3
Wow, thank you everyone. I’m at work currently but will be home in a couple of hours. I will collect the info requested and post back ASAP.
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
I put my bet on CPU overheat and either insufficient CPU cooler or bad thermal paste installation.
 

mrpetey

Cadet
Joined
Oct 19, 2020
Messages
3
Ok, back. It is hard to believe work is my quiet time. Thank you to everyone who responded, it is very much appreciated.

I need to fetch the photos, but the IPMI event log doesn't show any showstoppers. It complains about a fan spinning slowly, and that was flagged yellow. I spent most of my up time last night reviving my RPi syslog server. First thing this morning I tackled it again over my cup of coffee.

The big news: just before I came in today, I took the advice from Redcoat and removed the jumper from JWD, the watchdog timer, and left the jumper open.
Shazam!
With the one user and the one AFP share, I was able to transfer a 101.4GB directory from the Mac to the FreeNAS. I had to come into work after that, but the box was left up and running. If it is still up when I get home, I will be a little more confident and can settle down to real work.

And to jgreco, to date what I have been most concerned about is keeping the server up. ;-) Thanks for the link. I am going through the thread, and now that the server is working I will set out a new plan.


Thanks again!!
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Well, I hope that turns out to be good in the end. You do need to disable in the BIOS, too (if indeed it's enabled), AFAIK.

I have two X9SCM-F FreeNAS boxes, ostensibly identical except that one has 4 HDDs and the other has 6. One spontaneously rebooted occasionally (maybe once per month). The other rebooted maybe once per week immediately after I built it, which led me to investigate everything during a ~1-month burn-in (including changing out the mobo and the CPU). Both finally stabilized, as best I can tell, with this watchdog change: no problems since on either box over the last 10 months.
 