SMART results - problems?

Status
Not open for further replies.

Sonia Hamilton

Dabbler
Joined
Mar 17, 2014
Messages
10
My first post to the FreeNAS forums. My FreeNAS box has been running fine for about 6 months now, with raidz1-0 on 5 x 1 TB drives.

I noticed this come up in a SMART test, and I'm wondering whether it is a problem and whether I should look at replacing the drive:
nas# smartctl -a /dev/ada2
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  70%        2847             650258392
# 2  Extended offline    Completed: read failure  40%        2835             1305121080
# 3  Extended offline    Completed without error  00%        2812             -

Drive info:

Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-1CH162

Full test results attached. Thanks for any help.
 

Attachments

  • smart.txt
    11.4 KB · Views: 326

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Yes, if it's failing SMART I would replace it ASAP. Is the drive under warranty?
 

panz

Guru
Joined
May 24, 2013
Messages
556
Moreover, you should consider moving to RAIDZ2. RAIDZ1 puts your data at risk (read cyberjock's guide in the stickies).
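
For anyone weighing that trade-off, here is a rough sketch of the arithmetic for the 5 x 1 TB pool described in the first post. The function name `raidz_summary` is made up for illustration, and the capacity figure is only an approximation: it assumes usable space is simply the non-parity disks times the disk size, and ignores ZFS metadata, padding, and reserved space.

```python
def raidz_summary(disks, disk_tb, parity):
    """Approximate usable space and fault tolerance for one RAIDZ vdev.

    Assumption: usable space = (disks - parity) * disk size, ignoring
    ZFS overhead. parity is 1 for RAIDZ1 and 2 for RAIDZ2.
    """
    if parity >= disks:
        raise ValueError("a vdev needs more disks than parity devices")
    return {
        "usable_tb": (disks - parity) * disk_tb,
        "failures_tolerated": parity,
    }


if __name__ == "__main__":
    for name, parity in (("RAIDZ1", 1), ("RAIDZ2", 2)):
        print(name, raidz_summary(disks=5, disk_tb=1.0, parity=parity))
```

Under those assumptions, RAIDZ2 on the same five drives gives up roughly one drive's worth of space in exchange for surviving a second drive failure during a rebuild.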
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Definitely failing. Reallocated sectors are at 80, reported uncorrectable at 90. Temps may be slightly high.
 

Sonia Hamilton

Dabbler
Joined
Mar 17, 2014
Messages
10
Thanks ppl for all your replies :) Yes it's under warranty, so I'll rip out the drive and replace it. @Panz I'll check out RAIDZ2 too.

@Yatti420, I can see the 'reallocated sectors' and 'reported_uncorrect' errors, but I didn't know to pick these out. I notice that there are a few VALUE/WORST over THRESHOLD. Any rules of thumb for which ones to pay attention to (in general)?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Sonia:

The Value/Worst/Threshold things are really a pain in the ass, and very counterintuitive. Don't worry about those.

Pay attention to the raw counts on the right. In particular, #197, the "current pending sectors", is often the first one to go south. If the raw number is anything but zero, start worrying, and if it's more than a handful, start panicking. You can see yours is 2192 or something!!

You also have 80 reallocated sectors, and 90 "uncorrectable" sectors. Again, these are numbers that should be 0 until a drive starts dying. 90 uncorrectable sectors is an emergency.

In short, your drive is in "code red--emergency" status. It will fail soon. Accordingly, it should be replaced.
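
The rule of thumb above (look at the raw counts, and treat anything non-zero on these counters as a warning) could be sketched in Python roughly like this. It assumes the usual ten-column attribute table that `smartctl -A` prints, with RAW_VALUE as the last column. The helper names and the sample text are made up for illustration: the three non-zero raw counts are the ones quoted in this thread, while every other number in the sample (including the 198 row) is invented and is not taken from the attached smart.txt.

```python
# Counters discussed in this thread. 187 and 198 are both
# "uncorrectable"-style counters; which ones a drive reports varies.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_raw_values(smartctl_text):
    """Return {attribute_id: raw_value} from `smartctl -A` style text."""
    raw = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Only the first token of RAW_VALUE is used; non-numeric forms are skipped.
        if fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def flag_nonzero(raw):
    """Return {name: raw_value} for watched counters that are above zero."""
    return {WATCHED[i]: raw[i] for i in WATCHED if raw.get(i, 0) > 0}


# Hypothetical sample, not the poster's real output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       80
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2847
187 Reported_Uncorrect      0x0032   010   010   000    Old_age   Always       -       90
197 Current_Pending_Sector  0x0012   087   087   000    Old_age   Always       -       2192
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for name, count in flag_nonzero(parse_raw_values(SAMPLE)).items():
        print(f"WARNING: {name} raw value is {count}")
```

In practice the text would come from running `smartctl -A /dev/ada2` and saving or capturing the output; the sketch only reads the raw column and deliberately ignores VALUE/WORST/THRESH.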
 

panz

Guru
Joined
May 24, 2013
Messages
556
In my pfSense box I put an old 2.5" drive that shows 1 Current Pending Sector and 1 Offline Uncorrectable. Its Reallocated Sector Count is 0, but it has over 81 million Hardware ECC Recovered.
 

Sonia Hamilton

Dabbler
Joined
Mar 17, 2014
Messages
10
Thanks @DrKK. I found an article that explains these and other metrics - useful for others in future. Here's the relevant part:

Reallocated Sectors Count. This parameter gives a good hint on the overall health of the drive. It represents the number of sectors that were found bad and were remapped to a special zone (reserved area) of the hard disk. Normally, new hard drives should have zero reallocated sectors. With use, you may get an occasional instance or two; this usually does not represent a serious problem. What does represent a problem is a situation where the number of reallocated sectors is steadily increasing with time. This means the disk is slowly failing; get a replacement ASAP before you start getting uncorrectable read errors (see below).
Current Pending Sector Count. The meaning of this value is highly dependent on disk manufacturers. A rise in Current Pending Sector Count may mean there are unstable (but not necessarily outright bad) sectors on the drive. If the count of pending sectors increases with time, it’s time to replace the disk.
Uncorrectable Sector Count. When a sector is so bad it can’t be read for remapping, the Uncorrectable Sector Count variable increases. The variable represents the count of uncorrectable errors when reading/writing a sector from the disk surface. If the value of this attribute increases, this indicates mechanical problems or defects of the disk surface. A replacement disk should be used as soon as possible.
Read Error Rate. This parameter stores data about the rate of hardware read errors that occurred when reading data from the disk. The raw value is manufacturer dependent, so it’s difficult to interpret correctly. This parameter may not mean much to generic SMART analysis tools, but is often used by disk diagnostic tools supplied by hard drive manufacturers.
Write Error Rate. Indicates errors while recording data into disk. Vendor-specific value, mostly used by manufacturer supplied HDD diagnostic tools.
Reallocation Event Count. The value stores the count of sector reallocation operations. Both successful and unsuccessful attempts are counted. This value supplements the reading of Reallocated Sectors Count, but is sometimes omitted (not recorded) by some models/manufacturers. A rise in Reallocation Event Count means the hard drive is deteriorating.
Spin Retry Count. This value stores the number of retries during disk spin-up. A growing value may be a sign of an upcoming mechanical failure.
From http://hetmanrecovery.com/recovery_news/smart-parameters-and-early-signs-of-a-failing-hard-disk.htm
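
Since the article's main point is that a counter rising over time matters more than a single reading, here is a minimal Python sketch of comparing two snapshots of raw values. The function name `rising_counters` and both snapshots are hypothetical; the attribute IDs are the commonly used ones (5 Reallocated Sectors, 10 Spin Retry Count, 196 Reallocation Event Count, 197 Current Pending Sector, 198 Offline Uncorrectable), and a given drive may not report all of them.

```python
def rising_counters(previous, current, watched=(5, 10, 196, 197, 198)):
    """Return {attribute_id: (old, new)} for watched counters whose raw value grew.

    previous and current map attribute ID to raw value, for example
    collected from `smartctl -A` output on two different days.
    """
    rising = {}
    for attr_id in watched:
        old = previous.get(attr_id)
        new = current.get(attr_id)
        # Attributes missing from either snapshot are skipped, not guessed.
        if old is not None and new is not None and new > old:
            rising[attr_id] = (old, new)
    return rising


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    last_week = {5: 8, 10: 0, 197: 16, 198: 0}
    today = {5: 80, 10: 0, 197: 2192, 198: 0}
    for attr_id, (old, new) in rising_counters(last_week, today).items():
        print(f"attribute {attr_id}: {old} -> {new}")
```

Keeping dated copies of the raw values is enough for this kind of check; the sketch makes no attempt to interpret vendor-specific values such as Read Error Rate.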
 