smartd indicates failure not reflected in attributes

Status
Not open for further replies.

Keno5net

Cadet
Joined
Feb 24, 2013
Messages
6
I am getting the following error after expanding my mirrored volume by replacing two 1 TB disks with two 2 TB disks.

Oct 31 17:14:22 freenas smartd[1905]: Device: /dev/ada1, Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.

This is only showing up on one disk. Here are the reallocated sector count attributes read from the disks.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

This is from ada0, which is not reporting errors:
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

This is from ada1, which is reporting the error:
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
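
One way to cross-check a smartd log message against the attribute table is sketched below. It assumes the usual smartmontools rule that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH. The function names are hypothetical helpers written for this illustration, not part of smartmontools, and the parsing only covers the line formats quoted above.

```python
import re

# Matches the smartd syslog message format quoted in this thread.
LOG_RE = re.compile(
    r"Device: (?P<device>\S+), Failed SMART usage Attribute: "
    r"(?P<id>\d+) (?P<name>\w+)"
)


def parse_smartd_failure(line):
    """Return (device, attribute_id, attribute_name), or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return m.group("device"), int(m.group("id")), m.group("name")


def parse_attribute_row(row):
    """Parse one row of the smartctl attribute table into a dict."""
    # Split into at most 10 fields so a RAW_VALUE containing spaces stays whole.
    fields = row.split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "when_failed": fields[8],
        "raw": fields[9],
    }


def is_failing(attr):
    """Assumed rule: failing when normalized VALUE <= THRESH."""
    return attr["value"] <= attr["thresh"]


if __name__ == "__main__":
    log = ("Oct 31 17:14:22 freenas smartd[1905]: Device: /dev/ada1, "
           "Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.")
    row = "5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0"

    device, attr_id, name = parse_smartd_failure(log)
    attr = parse_attribute_row(row)
    print(device, attr_id, name)
    print("table agrees with log:", attr["id"] == attr_id and is_failing(attr))
```

With the rows posted here, VALUE is 100 and THRESH is 10, so the table does not agree with the log message, which is the disparity being asked about.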

I have run the short test on ada1 and it passed; the long test is still running. I think this may be a false positive, or it could be a problem with the new disk. I have a replacement coming tomorrow from a different vendor and manufacturer. Is it common for smartd to report errors that aren't reflected in smartctl results? I have seen at least one other post with a similar disparity.


The disks in question are both Seagate NAS HDDs, model ST2000VN000.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Usually it's user error in understanding what smartctl -a gives you for output. Can you post the entire output of smartctl -a -q noserial /dev/ada1?

And for some reason the font keeps changing and it won't let me change it back. So sorry for the weird fonts. WTF.....
 

Keno5net

Cadet
Joined
Feb 24, 2013
Messages
6
Here is the smartctl -a -q noserial output in a file. As you can see, I am still waiting on the long test to complete.
 

Attachments

  • smartaq.txt
    4.8 KB · Views: 366

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I'm at a loss to explain your problem. The disk appears to be in perfect health. Drive is nice and cool, low load cycle count, etc. I can't explain it.
 

Keno5net

Cadet
Joined
Feb 24, 2013
Messages
6
It appears that there may not be a problem with the drive. I decided to try the old Dogbert tech support maxim, "shut up and reboot," and there have been no failures for over an hour. Now that I think about it, when I expanded the volume I hot-swapped the second disk, which may have left the error stuck in the system somehow. /shrug

One last question. I have a WD Red disk arriving tomorrow, and I have read in several places that there is a good chance of two disks from the same manufacturer and batch failing within hours of one another. Would I be better off replacing one of my Seagate disks with the new WD disk to protect against that possibility? Thanks for your time.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I don't worry about such things, to be honest. The concern about failures within hours of each other is that, in theory, all of the disks in a pool "wear out" at a similar rate, so disks manufactured at the same time will likely have a similar "time to failure." Judging by how many people lose their pools, you're far more likely to lose yours to improper design, administration, and maintenance of your server.

Don't take this the wrong way, but money says that if we had a conversation over Skype I'd find at least two things you've done with your server that put you at higher risk of data loss than necessary. It's just a fact that a lot of people don't follow the FreeNAS manual's recommendations, don't read and follow the stickies, and don't take the advice from the forums to heart. It's one thing to read it; it's another to follow it.

Also keep in mind that RAID is not a substitute for backups. If you really want to worry about two disks from the same batch failing within hours of each other, you should be running a small backup server to hold your most important data. Even a cheaper, slower system can make an excellent backup server. You don't need 100 MB/sec transfer rates to and from your backup server, right?
 

Keno5net

Cadet
Joined
Feb 24, 2013
Messages
6
I am already running a smaller backup server, which I run once a week to back up the main system on Sunday night. Maybe I will use the new disk in that so the sizes match. I decided on a cold backup system after hearing about the latest ransomware that has been going around. Not that I plan to practice unsafe computing, but better safe than sorry. The main NAS is used for backups of two systems and as a media server for a home theater. Now I can back it up once a week with rsync to the second server. The only thing I still need to work on is off-site backup, but for home use that probably won't happen unless I haul a disk to and from work once a month.
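
For anyone wanting to script a weekly rsync run like the one described, a minimal sketch follows. The paths and host name are placeholders made up for the example, not the poster's actual setup. It uses only the standard rsync options -a, --delete, and --dry-run, and leaves --delete off by default so that files removed or damaged on the main NAS are not automatically removed from the cold backup.

```python
import subprocess


def build_rsync_command(source, destination, delete=False, dry_run=False):
    """Build an rsync argument list for copying source to destination."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, preserves metadata)
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    if delete:
        cmd.append("--delete")  # also remove files missing from the source
    cmd += [source, destination]
    return cmd


def run_backup(source, destination, dry_run=True):
    """Run rsync; raises CalledProcessError if rsync exits non-zero."""
    cmd = build_rsync_command(source, destination, dry_run=dry_run)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical dataset path and backup host, for illustration only.
    print(build_rsync_command("/mnt/tank/media/", "backuphost:/mnt/backup/media/"))
```

Starting with dry_run=True is a cheap way to confirm what a run would transfer before letting it touch the backup server.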

Thanks again.
 