SOLVED Hard drive issue, possibly drive going bad?

Status
Not open for further replies.

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
I wanted to ask you guys here and see what y'all think. I've got a relatively new NAS server (about one month old). It's got 6 x 4TB WD Red drives in it configured in RAIDZ2. I have Plex Media Server running on the box as well. Today I started having a problem where clicking Play on something would take 10 to 15 seconds to actually change screens and start playing the video. I also noticed that browsing the CIFS shares from my Windows desktop was slow.

So at first I thought something must be up with the network connection (I'm using a 4-port LACP link on this box btw). So I rebooted my networking equipment, and that didn't do any good. So I then rebooted the NAS box. No flags came up on anything I saw during reboot, so once rebooted I went to unlock my storage array (it's encrypted) and it seemed to hang there after giving the little created geli decryption messages on the console. After 10 minutes (that is not an exaggeration, it was really 10 minutes according to the timestamps in the log) it finally unlocked the drive and proceeded through restarting my services and jails and such.

So.. I figured this obviously isn't a networking issue and started poking around more. I checked and all the SMART tests have passed (I have a short self-test set to run daily and a long test once a week). So I just hit the drives with a smartctl -H, and ada4 is consistently taking 10 to 15 seconds to respond (all the other drives take < 1 second).
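A minimal sketch of the timing check described above, for anyone who wants to repeat it: it runs `smartctl -H` against each drive and reports how long each call took. The helper names (`time_health_checks`, `slow_devices`) and the 5 second threshold are assumptions made for illustration, not anything FreeNAS or smartmontools provides.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List, Optional


def time_health_checks(
    devices: Iterable[str],
    run: Optional[Callable[[List[str]], object]] = None,
) -> Dict[str, float]:
    """Return how many seconds `smartctl -H` took for each device."""
    runner = run
    if runner is None:
        # Default runner: really invoke smartctl (usually needs root).
        def runner(cmd: List[str]) -> object:
            return subprocess.run(cmd, capture_output=True, text=True)

    timings: Dict[str, float] = {}
    for dev in devices:
        start = time.monotonic()
        runner(["smartctl", "-H", dev])
        timings[dev] = time.monotonic() - start
    return timings


def slow_devices(timings: Dict[str, float], threshold_s: float = 5.0) -> List[str]:
    """List devices whose health check took longer than threshold_s.

    The 5 second default is an arbitrary cutoff chosen for this sketch.
    """
    return sorted(dev for dev, secs in timings.items() if secs > threshold_s)


# Example, run on the NAS itself:
# timings = time_health_checks([f"/dev/ada{i}" for i in range(6)])
# print(slow_devices(timings))
```

A slow response only says the drive or its link is struggling; it does not by itself tell you which of the two is at fault.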

So I think I found the culprit to the problems... I'm just not sure if this is a sign of a drive failing or if some other weirdness is going on. I ran a short test on the drive and it passed, and I've got a long test running now. I'm not fluent enough with the SMART utility to know if there is any other kind of test I should run, or if there is anything in particular to look for besides the line that says test result: PASSED.

Looking for any advice. Also, if it is likely to be a drive going bad, is there any particular guide I should follow for replacing it? I'm sure there is something in the manual but I haven't checked yet, I wanted to post this first. I just don't want to replace it and screw up and wipe my array lol. Most of my data is still on my old NAS that I haven't reformatted yet, but I do have about 400 GB of downloaded data that isn't on the old one atm =p

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
First I'd post the full output of the disk in question. The disk may not be bad, but there may be some kind of communication problem.

Second, FreeNAS version would be helpful.

Third, your hardware would be helpful.

Fourth, a copy of your debug file would be helpful. ;)
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
The full output of what? The smart test? Which one, long or short or some other variation? I'm attaching a copy of the current output of smartctl -a /dev/ada4, if something different is needed let me know please.

FreeNAS build is: FreeNAS-9.2.1.6-RELEASE-x64 (ddd1e39)

Hardware:
A1SAM-2550F motherboard
16GB ECC Unbuffered Memory
6 x 4TB WD Red drives in RAIDZ2
4GB USB flash drive for the FreeNAS system

I've attached the debug file as well.
 

Attachments

  • debug-freenas-20140902115455.tgz
    439.2 KB · Views: 283
  • ada4_smart.txt
    6.1 KB · Views: 303

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Your UDMA count is very high for a drive with just over 1000 hours on it. That's usually an indicator of a bad SATA cable.

A quick glance shows everything else as okay. So try swapping out the SATA cable for that disk and see if that fixes it. ;)
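For comparing drives on this point, here is a rough sketch that pulls raw attribute values out of the attribute table printed by `smartctl -A` (or `smartctl -a`). It assumes the "UDMA count" above refers to SMART attribute 199, `UDMA_CRC_Error_Count`, and that the table uses the usual ten-column layout. The sample text and its numbers are invented for illustration and are not taken from the attached report.

```python
from typing import Dict

# Made-up sample in the usual smartctl attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1012
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       57
"""


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name to raw value for rows with a plain integer raw value."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


attrs = parse_smart_attributes(SAMPLE)
print(attrs.get("Power_On_Hours"), attrs.get("UDMA_CRC_Error_Count"))
```

Running the same parse on every drive makes it easy to see whether only one disk has a non-zero CRC count.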
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
While you're swapping out the SATA cable for a different one: did you recently make any changes to your network/FreeNAS system? "Recently" could be as far back as a week ago. I too think the SATA cable is the first place to check, but it could be something else too.

Alternatively, if changing the cable doesn't fix it, try moving the drive to a different SATA port. And lastly, when you ran "smartctl -a /dev/adaX" on your other drives, did you find UDMA values other than zero?
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
I'll try the SATA cable tonight when I get home and report back if it makes a difference or not. I'm assuming ada4 should be SATA4 on my motherboard? I labeled the cables to match their port numbers, so hopefully that is the case lol. If it isn't, I'm going to have to figure out which drive it is. The SMART report gives the serial numbers so I can figure it out based on that.. just a bit of a pain since it's not a hot swap system and I can't see the drive serial numbers without removing them from their trays hehe.

I haven't made any changes recently to any hardware / network gear. I added a couple of ZFS datasets a few days ago to FreeNAS, if that counts as a relevant change hehe.

I don't have another SATA port to move to atm hehe. If I had to try that I would have to get a SATA controller card; the motherboard only has 6 ports on it and I've got 6 drives.

I did just recheck the other 5 drives and none of them show any UDMA errors.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
steven6282 said: I'm assuming ada4 should be SATA4 on my motherboard?

No, you can't assume that they are the same - many users have been burnt by trying that when a drive has failed, only to find they removed the wrong hard disk.

Serial number is the best way. Many of us label the visible end of the hard disk, so we can easily identify a drive.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Ok, yeah it actually ended up being on SATA0 lol. But I tried 3 different SATA cables and the same problem was on all of them.

Since I don't have another SATA port and no SATA controller card atm, would it be safe for me to simply swap the SATA ports of two drives around to see if the problem remains on this drive or moves to another?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sure, FreeNAS is insensitive to the location of the drives.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
As Ericloewe said, you can move the ports around. What I don't recall is whether the physical drive will keep the ada4 label; I believe so, but someone can back me up here. Either way, use the drive serial numbers to track the problem.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Yeah, I just tested that. The ada labels did change, ada4 moved to ada5 (tracked by serial as you said). But the problem remained with the same drive (I'll refer to it as 700014, which is the end of its serial number hehe).

So, is there anything else I should try or does this sum up that it's likely just a problem with drive 700014? I've already ordered another drive, I like to have a spare on hand anyway. Even if there ends up being something I can do to fix the problems with this drive, I'd like to have one here with easy access to make replacing one in the array quicker while waiting on the RMA hehe.
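Since the adaX names moved when the ports were swapped, here is a small sketch of tracking a drive by serial number instead. It parses the `Serial Number:` line from `smartctl -i` output that has already been collected per device. The function names are hypothetical, and the serial prefixes in the example are invented; only the 700014 ending comes from this thread.

```python
import re
from typing import Dict, List, Optional


def parse_serial(info_text: str) -> Optional[str]:
    """Extract the serial number from `smartctl -i` output, or None if absent."""
    match = re.search(r"^Serial Number:\s*(\S+)", info_text, re.MULTILINE)
    return match.group(1) if match else None


def serials_by_device(outputs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each device name to the serial found in its smartctl -i output."""
    return {dev: parse_serial(text) for dev, text in outputs.items()}


def find_by_serial_suffix(
    serials: Dict[str, Optional[str]], suffix: str
) -> List[str]:
    """Return the device names whose serial number ends with suffix."""
    return sorted(dev for dev, sn in serials.items() if sn and sn.endswith(suffix))


# Invented example outputs; the serial prefixes are placeholders.
outputs = {
    "/dev/ada4": "Serial Number:    WD-EXAMPLE000001\n",
    "/dev/ada5": "Serial Number:    WD-EXAMPLE700014\n",
}
print(find_by_serial_suffix(serials_by_device(outputs), "700014"))
```

Looking the drive up this way right before pulling it avoids relying on a device name that may have changed since the last reboot.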
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
If it moves with the drive then I'd say you have a drive issue. You have a RAIDZ2, so just follow the instructions in the manual to properly replace the drive. Do not try to hotswap the drive even if your system supports it; too many folks end up with a disaster.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Got the new hard drive swapped in and resilvered and all is well and back to snappy response times :)

Thanks for the help everyone.
 