Weird issue with my new build - freezing/unfreezing

Status
Not open for further replies.

Charles Frank

Dabbler
Joined
Dec 24, 2014
Messages
13
Hi folks,

Well, following advice on the forums about whether to go QNAP or FreeNAS, I bit the bullet and ordered components for a FreeNAS server - my first. The build went OK, and after a few reinstalls as I got to play with the software I felt comfortable enough to start copying my data across to get more of a feel for a 'proper FreeNAS setup' in a testing environment.

Now, up to this point everything SEEMED to be OK. But after several hours of copying data across (at a rate of 20-30MB/s off my old Netgear ReadyNAS Duo...groan) I ran into a problem. Until then everything had been going fine - datasets were created, permissions set, CIFS working fine with drive shares mapped on my main computer - but once I had a couple of terabytes of data on the new NAS things went rapidly downhill. There were times when the web console responded very sluggishly and it would take a reboot to sort it - but the main issue was that the NAS itself seemed to be having issues with the RAID array. Opening up My Computer could leave me with a blank, unresponsive window for 10-20 seconds before the drive icons (local and mapped) would appear. Double-clicking on one of the mapped network icons could work instantly but could also take several seconds before showing me the contents of the drive (the same issue with directories too). If I play media, like a movie, sometimes I can skip through it fine, with the film moving instantly to the point I've selected - other times it freezes for several seconds before 'catching up'. These issues aren't constant - it's like everything works fine for 30 seconds, then it'll glitch, then back to normal for 20 seconds, then glitch again, and so on.

I'm using a Supermicro X10SL7-F (BIOS 2.00, IPMI 1.42) with the LSI controller flashed to IT firmware 16, along with an Intel Xeon 1240V3 processor, 16GB of Crucial ECC RAM, 6x3TB HGST Deskstar NAS drives and a Seasonic G Series 550W PSU, all in a Fractal Design Node 804 case. I'm using the latest (Dec 31) 9.3 STABLE build, patched with the latest updates, installed and mirrored on 2 Sandisk Cruzer 16GB USB drives (mounted on an internal dual drive riser on the internal USB header). I have link aggregation set up for the dual onboard gigabit NICs (plugged into a Netgear GS108Tv2 switch). The NAS reports that everything (including the HDDs) is fine, but I have started doing some more detailed testing - I retested the CPU and RAM last night with Memtest86 and they passed, and I need to retest the HDDs now.

If anyone has any ideas I'd really appreciate the help - I'm completely new to *ix/FreeNAS so I'm struggling a bit now if I'm honest as I've pretty much exhausted my own knowledge here...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Did you burn in your hardware? You should do extensive testing to weed out bad disks and other problems prior to starting to load it up with data. In particular, all the disk testing and SMART scans are very important. Most drive errors exhibit within about the first thousand hours, especially if the system is burn-in tested. There's a sticky in the hardware forum to get you started.
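As a rough illustration of the SMART checks mentioned above, here is a minimal Python sketch that scans text in the layout printed by `smartctl -A` and reports a few attributes whose raw value is not zero. The function name `suspicious_attributes`, the `WATCHED` set and the sample text are all made up for this example, and which attributes matter most varies by drive vendor, so treat it as a starting point rather than a complete burn-in procedure.

```python
import re

# Attributes often watched as early warning signs; this selection is an
# assumption for the example, not an exhaustive or vendor-specific list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        match = re.match(r"\d+", raw)
        if name in WATCHED and match and int(match.group()) > 0:
            flagged[name] = int(match.group())
    return flagged


# Illustrative, made-up sample in the attribute-table layout of `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       212
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious_attributes(SAMPLE))
```

A drive can still misbehave with every one of these counters at zero, so a clean result here does not replace the longer read/write tests described in the sticky.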
 

Charles Frank

Dabbler
Joined
Dec 24, 2014
Messages
13

I did a basic check, but certainly nothing like what's described in the sticky. I've downloaded the solnet-array-test-v2.sh file and I'll give that a go - I also have a Windows 8.1 Enterprise Win2Go setup, and if I don't have any joy with the script I'll break the array, boot with Win2Go and run the HGST Windows drive test app for a few hours to see if it can find anything. I was thinking that the most likely candidate was a bad drive - your feedback seems to make that very likely...
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I would also not configure lagg right now. Get everything working in the simplest way possible, then add features. That way you know which component broke something.
 

Charles Frank

Dabbler
Joined
Dec 24, 2014
Messages
13
I think I've found the issue. In the end I stripped the system and tested each HDD individually - I ran several different utilities against them, and the very first drive I checked, while reported by all the programs as healthy, wasn't (at least I don't think so). It would be very active for a second, then settle down for a few seconds, then very active again, then quiet, and so on - none of the other drives behaved this way (it also got very warm - hot, even). I've replaced it with a new drive and I'm going to send the faulty one back to Amazon. I'll report back once I've rebuilt the NAS and loaded it with a couple of terabytes of data for testing. Thanks for your help folks :)
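For anyone wanting to compare drives in a similar way without relying on activity lights, one option is to time sequential reads and see how long each chunk takes. The sketch below is only an assumption about how that could be done in Python; `time_reads` and its parameters are hypothetical names, the path passed in is whatever large file or device node you choose, and the operating system's cache can hide slow reads on data that has been read recently.

```python
import time


def time_reads(path, chunk_size=1024 * 1024, max_chunks=256):
    """Read `path` sequentially and return the seconds taken by each chunk read."""
    latencies = []
    # buffering=0 avoids Python-level buffering; the OS may still cache data.
    with open(path, "rb", buffering=0) as handle:
        for _ in range(max_chunks):
            start = time.perf_counter()
            data = handle.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not data:
                break  # end of file or device reached
            latencies.append(elapsed)
    return latencies
```

Running this against each drive in turn and comparing the slowest chunk times gives a crude picture of which disk pauses and which does not.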
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That sounds very much like recoverable errors being resolved through retries...
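A drive that quietly retries reads tends to show up as occasional very long operations among otherwise normal ones, which matches the "fine, glitch, fine" pattern reported in this thread. As a minimal sketch, assuming you already have a list of per-read latencies in seconds from some timing tool, the hypothetical `find_stalls` helper below flags samples far above the median. The factor of 10 and the sample numbers are invented for illustration.

```python
from statistics import median


def find_stalls(latencies, factor=10.0):
    """Return indices of samples slower than `factor` times the median latency."""
    if not latencies:
        return []
    typical = median(latencies)
    return [i for i, value in enumerate(latencies) if value > factor * typical]


# Made-up latencies in seconds: mostly around 10 ms, with two multi-second pauses.
SAMPLE = [0.010, 0.011, 0.009, 2.400, 0.010, 0.012, 0.010, 3.100, 0.011]

if __name__ == "__main__":
    print(find_stalls(SAMPLE))  # indices of the two long pauses: [3, 7]
```

The median is used instead of the mean so that a handful of long stalls does not raise the baseline they are measured against.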
 