Hard drives keep failing, is this really a hard drive issue?

Status
Not open for further replies.

Wynn

Cadet
Joined
Mar 23, 2013
Messages
6
Built my first FreeNAS box a few weeks back, but I have yet to get it to 100% working order. Here are some details of the machine:

Ver: 8.3.1-RELEASE (64bit)
CPU: Core 2 Duo 6400
Mobo: Gigabyte GA-965P-DS3 (latest bios is flashed)
RAM: 6GB (DDR2 6400)
HD: 3 x 1TB Seagate Barracuda drives in a RAIDZ1 configuration

I have already replaced 2 brand new hard drives and 1 old one (a Western Digital green that was somewhat iffy going in, so I'm not counting that one), after getting this error on each one of them. Now I just got it again on a third brand new drive.
Code:
Apr  8 18:19:23 freenas smartd[2532]: Device: /dev/ada1, Failed SMART usage Attribute: 184 End-to-End_Error.


After that error pops, then I get a system warning:
"WARNING: The volume WynnMedia (ZFS) status is UNKNOWN: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."

Replacing the hard drive with a brand new one fixes up the problem for a short while. Then when I start to copy a large amount of data over (roughly 20,000 files that take up 250GB), I get the above SMART error (drive number 2 simply disappeared, but also gave me the same error before disappearing from the pool). My file transfer hangs, and I have to figure out what happened and fix it.

Could I really be getting this many drive failures, or is this masking some other problem? I just want to figure out for sure what I need to replace so that I don't end up back at the store every day to return a bad hard drive. As I was typing this, this error just popped up in my log, not sure if it is causing this issue or not:
Code:
Apr  8 18:48:38 freenas kernel: MCA: CPU 0 COR L2 memory error
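
For anyone who would rather tally these messages than eyeball them, here is a minimal Python sketch that counts SMART attribute failures per device and MCA reports per CPU. The two regular expressions are modeled only on the two log lines quoted in this post, so other smartd or kernel message formats will not match. The names `summarize`, `SMART_RE`, and `MCA_RE` are made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the two log lines quoted in this post (an assumption:
# other smartd or kernel message formats are not covered).
SMART_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), "
    r"Failed SMART usage Attribute: (?P<id>\d+) (?P<name>[\w-]+)"
)
MCA_RE = re.compile(r"kernel: MCA: CPU (?P<cpu>\d+) (?P<detail>.+)")


def summarize(lines):
    """Count SMART attribute failures per device and MCA reports per CPU."""
    smart = Counter()
    mca = Counter()
    for line in lines:
        m = SMART_RE.search(line)
        if m:
            smart[(m["dev"], int(m["id"]), m["name"])] += 1
            continue
        m = MCA_RE.search(line)
        if m:
            mca[(int(m["cpu"]), m["detail"].strip())] += 1
    return smart, mca


# The two lines from this post, used as sample input.
sample = [
    "Apr  8 18:19:23 freenas smartd[2532]: Device: /dev/ada1, "
    "Failed SMART usage Attribute: 184 End-to-End_Error.",
    "Apr  8 18:48:38 freenas kernel: MCA: CPU 0 COR L2 memory error",
]

smart, mca = summarize(sample)
for (dev, attr_id, name), count in smart.items():
    print(f"{dev}: attribute {attr_id} ({name}) failed {count} time(s)")
for (cpu, detail), count in mca.items():
    print(f"CPU {cpu}: {detail} reported {count} time(s)")
```

To run it against a real log, pass an open file to `summarize` instead of `sample`. If the same attribute shows up on several different devices, or the MCA count keeps growing, that would point away from the individual drives and toward something they share.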


Thanks in advance for any help, I'm sure that posting some additional info would be helpful, just not sure exactly what would be useful to help out.
 

andoy31

Explorer
Joined
Apr 29, 2012
Messages
65
Not sure, but I think Seagate desktop drives are quite well known for RAID problems: http://forums.freenas.org/archive/index.php/t-7179.html

Given the recent error you got, maybe your CPU is failing on you. If you think about it, though, your HDD SMART error and the CPU error may be related.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Do you have good ventilation? What do the results from SMART show for your max drive temperature?

It could also be your power supply. A bad power supply can cause all of your hardware to mysteriously start failing.
 

Wynn

Cadet
Joined
Mar 23, 2013
Messages
6
The temp reported 39 (34 min / 41 max), so that seems ok.

The power supply is something that I haven't tackled yet. It is a Thermaltake TR2 600W.

Should have also noted that I have 2 other drives (2 Seagate 500GB), that are not part of any volume. I was going to buy a 3rd and then add them to the pool, but wanted to solve the other issues first. I mention this because 1 of those 2 drives disappeared, but I have spent zero time trying to figure that one out.

I honestly didn't give my power supply too much thought, wondering if it is not up to the task here. Also still have a Blu-Ray drive consuming power (have not got around to removing it) and a NX7 600GT video card (was kind of handy during installation to just go and directly use the box as opposed to SSH). Will do some tests to see if I can narrow it down.
 

Wynn

Cadet
Joined
Mar 23, 2013
Messages
6
Well, I removed all of the unnecessary hardware and managed to copy over that 250GB of data with no errors. It is still periodically popping that SMART end-to-end error, but the volume is reported as healthy and everything is running smoothly. Hopefully it is just the power supply. I am going to eventually replace everything with better components (this thread is very helpful: http://forums.freenas.org/showthread.php?12276-So-you-want-some-hardware-suggestions), but if I can keep it stable for a month or two that would be awesome.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
And what if it goes belly up the week before you decide to buy new hardware and you lose all your data? As long as you're OK with that, fine.
 