More hard disk problems

Status
Not open for further replies.

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
I keep getting

Code:
zpool status <pool name>


reports that one or more drives have faulted.

Scrubbing sometimes finds errors but often reports that there were no errors to fix.

Then, not long after, the same or other drives are again reported to have faulted.
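
To see at a glance which devices are picking up errors between scrubs, the status output can be filtered. This is only a rough sketch: it assumes the usual NAME / STATE / READ / WRITE / CKSUM table that zpool status prints, and the function name zpool_problem_rows is made up for illustration.

```shell
#!/bin/sh
# Sketch only: zpool_problem_rows is a hypothetical helper, not part of ZFS or FreeNAS.
# Reads `zpool status` output on stdin and prints each pool/vdev/device row
# whose state is not ONLINE or whose READ/WRITE/CKSUM counters are non-zero.
zpool_problem_rows() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # header row starts the device table
        in_cfg && NF == 0             { in_cfg = 0; next }   # blank line ends it
        in_cfg && NF >= 5 && ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $2, $3, $4, $5
        }
    '
}

# Usage on the NAS (pool name is a placeholder, as in the post above):
#   zpool status <pool name> | zpool_problem_rows
```

Saving that output after each scrub would show whether the errors follow one drive or move around between drives, which is the distinction the rest of the thread ends up hinging on.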

smartctl reports that the drive in question has passed, and when I hook the drive up to a Win7 machine in an eSATA docking station and run SeaTools for Windows it reports that the drive is fine. But HD Sentinel on the Windows machine sometimes first shows that the drive is bad but later reports that it is fine.

One drive was shown as UNAVAIL. I replaced it, but the old one still shows as UNAVAIL.

I have replaced the drive cables. The PSU is only a few months old.

This is driving me crazy. Help, please.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Checksum errors? Did you replace the drive according to the documentation? It's only in one pool, right?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
This is odd. Let me repeat back what you said.

You say: The drives in the FreeNAS box report faults. On occasion, scrubbing will fix it, but on other occasions, scrubbing finds no problem. Then, the same or OTHER drives will show faulted, shortly thereafter.

When you take any of the suspect drives out and put them on another known-good system, everything is fine.

Sir, there's only one diagnosis for this that I can think of if you can rule out a motherboard or controller failure:

You say the PSU is "new". But PSUs are by far the most likely component to fail, or to be crap out of the box. This is where I'd place my suspicions. I suspect a bad +5V rail, because that'll affect primarily your hard drives, and maybe USB ports.

I found a link that would seem to suggest I might be on to something
http://www.tomshardware.com/forum/187006-28-what-rail
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The question of why the hell you haven't burned in the system in question is ... the burning question on my mind, ahahaha.

Power supply, cabling, controller, system board, memory ... your possible list of suspects.
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
The question of why the hell you haven't burned in the system in question is ... the burning question on my mind, ahahaha.

Power supply, cabling, controller, system board, memory ... your possible list of suspects.
Everything was burned in. This problem has suddenly appeared.
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
Checksum errors? Did you replace the drive according to the documentation? It's only in one pool, right?
Yes, checksum errors: sometimes in one pool, sometimes in the other, sometimes in both at the same time.
 

Sir.Robin

Guru
Joined
Apr 14, 2012
Messages
554
That sucks... but I would do as DrKK says: try a second PSU. And memtest.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Yep bad capacitors in the PSU? More burn in time..
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You can also try dusting the PCI-e slot and the M1015 with compressed air (don't blow into it!). At this point, I'd say that general advice of "reseat everything, clean everything, replace dubious (or even all) SATA cables" might be in order. Worst case you get a (physically) cleaner system, best case you solve your problem.

There's nothing quite as hateful as an error that happens seemingly randomly and leaves behind little to trace it.

What PSU are you using, by the way?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And where'd you get the M1015 from? There were Chinese knockoffs running around for a while.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
And where'd you get the M1015 from? There were Chinese knockoffs running around for a while.

Well, at least one person found cards being sold as M1015s that, judging by the picture, clearly weren't. Even the firmware menus were totally different. I don't think we ever had solid evidence they were Chinese knockoffs though...
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
And where'd you get the M1015 from? There were Chinese knockoffs running around for a while.
The M1015 was bought on eBay, but I deliberately did not get one that was being shipped from HK or China -- yet it still could be a knockoff, I suppose; any way to tell? It did work for several months plugged into a different motherboard.

Anyway, I've unplugged and reseated everything. I checked the PSU and saw no "sagging" of any voltages, even at drive spin-up time. The drive spin-up delay was at the 2sec. default; I've increased it to 3sec. anyway.

If I were to increase the spin-up delay even further, is there any way to ensure that FreeNAS doesn't start until all the drives are ready?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
If I were to increase the spin-up delay even further, is there any way to ensure that FreeNAS doesn't start until all the drives are ready?

Yes, the system completes the POST before bootup begins. In other words, this isn't a problem that you should be experiencing unless you have failing hardware.

Can you post the output of smartctl -a -q noserial /dev/xxx for the drives in question? My first guess is that you aren't running SMART tests, so your "PASSED" status is a lie: the drive isn't running diagnostics, so it hasn't had a chance to fail. ;)
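
To check whether a drive has ever actually run a self-test, one option is to count the entries in its self-test log. This is a rough sketch, not a definitive check: selftest_count is a made-up helper name, /dev/xxx is the same placeholder as above, and it assumes the ATA-style log layout in which each entry line starts with "# " followed by a number.

```shell
#!/bin/sh
# Sketch only: selftest_count is a hypothetical helper, not part of smartmontools.
# Counts the self-test entries in `smartctl -l selftest` output read from stdin.
# Zero means the drive has never been asked to test itself.
selftest_count() {
    # Entry lines look like "# 1  Extended offline  Completed without error ..."
    grep -c '^# *[0-9]' || true   # grep exits non-zero on zero matches; keep going
}

# Typical use on the NAS (commented out so nothing runs by accident):
#   smartctl -t long /dev/xxx                        # start an extended self-test
#   smartctl -l selftest /dev/xxx | selftest_count   # later: how many tests are logged
```

If the count is 0, it makes sense to run a long test on each suspect drive and read the full smartctl -a output again once the test has finished.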
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Heard about it elsewhere too, but ... unclear what the reality is.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I think running the MemTest and possibly a CPU Stress test would be very helpful to rule out the hardware. Even if it isn't the PS it could be the MB or other component. Electrolytic capacitors do fail but hopefully you have solid capacitors at least on the MB. I think these tests need to be done simply because you have stated that you have had issues on two different pools on this machine.
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
The M1015 was bought on eBay, but I deliberately did not get one that was being shipped from HK or China -- yet it still could be a knockoff, I suppose; any way to tell? It did work for several months plugged into a different motherboard.

Anyway, I've unplugged and reseated everything. I checked the PSU and saw no "sagging" of any voltages, even at drive spin-up time. The drive spin-up delay was at the 2sec. default; I've increased it to 3sec. anyway.

If I were to increase the spin-up delay even further, is there any way to ensure that FreeNAS doesn't start until all the drives are ready?

It's been more than 24hrs since I unplugged and reconnected everything, and there have been no further error reports, so the problem seems to have been solved.

I had already replaced the drive data cables, so they shouldn't have been the problem. I therefore assume that it was either the M1015's connection to the motherboard or -- more likely -- the power connections to the iStarUSA drive cages.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's been more than 24hrs since I unplugged and reconnected everything, and there have been no further error reports, so the problem seems to have been solved.

I had already replaced the drive data cables, so they shouldn't have been the problem. I therefore assume that it was either the M1015's connection to the motherboard or -- more likely -- the power connections to the iStarUSA drive cages.

Keep a close eye. I'd say a week without problems is enough to attribute this to gremlins, assuming your problems manifested themselves at least daily.
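
One way to keep that eye on it for the week is a timestamped log of pool health. This is a minimal sketch under stated assumptions: the log path and the log_pool_health name are invented for illustration, and the check relies on zpool status -x printing "all pools are healthy" when nothing is wrong.

```shell
#!/bin/sh
# Sketch only: the log path is a made-up example, not a FreeNAS default.
LOG=${LOG:-/tmp/zpool-health.log}

# Reads `zpool status -x` output on stdin and appends one timestamped line to $LOG:
# "OK" when ZFS reports every pool healthy, otherwise "PROBLEM" plus the first line.
log_pool_health() {
    out=$(cat)
    stamp=$(date '+%Y-%m-%d %H:%M')
    if [ "$out" = "all pools are healthy" ]; then
        echo "$stamp OK" >> "$LOG"
    else
        first=$(printf '%s\n' "$out" | head -n 1)
        echo "$stamp PROBLEM: $first" >> "$LOG"
    fi
}

# From cron, for example once an hour:
#   zpool status -x | log_pool_health
```

A week of nothing but OK lines would back up the loose-connection theory. A PROBLEM line gives a time to compare against whatever else the box was doing.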
 