Drive showing a few read errors

Status
Not open for further replies.

philhu

Patron
Joined
May 17, 2016
Messages
258
Should I replace it? My question is why SAFE shows OK and why the report doesn't give me a warning about it. I just happened to notice it after replacing another failed disk. The disk in question is da17; see the picture below.
freenas-disk.png
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Please post output of smartctl -a /dev/da17 using the [ code ] tags.
 

philhu

Patron
Joined
May 17, 2016
Messages
258
Thanks for looking, here you go:
Code:
[philhu@freenas] /mnt/volume1/sharecifs/homes/philhu# smartctl -a /dev/da17
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:  WL6000GSA6472E
Serial Number:  WOL240331217
LU WWN Device Id: 0 000000 000000000
Firmware Version: 01.01B.1
User Capacity:  6,201,213,935,616 bytes [6.20 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Device is:  Not in smartctl database [for details use: -P showall]
ATA Version is:  ATA8-ACS (minor revision not indicated)
Local Time is:  Tue Aug  1 16:24:13 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  ( 7484) seconds.
Offline data collection
capabilities:  (0x5b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  No Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 728) minutes.
SCT capabilities:  (0x3035) SCT Status supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  253  253  021  Pre-fail  Always  -  5933
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  13
  5 Reallocated_Sector_Ct  0x0033  199  199  140  Pre-fail  Always  -  40
  7 Seek_Error_Rate  0x002f  100  253  051  Pre-fail  Always  -  0
  9 Power_On_Hours  0x0032  088  088  000  Old_age  Always  -  9470
 10 Spin_Retry_Count  0x0033  100  253  051  Pre-fail  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  051  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  13
184 End-to-End_Error  0x0033  100  100  097  Pre-fail  Always  -  0
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0
188 Command_Timeout  0x0032  100  100  000  Old_age  Always  -  0
190 Airflow_Temperature_Cel 0x0022  058  049  000  Old_age  Always  -  42
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  12
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  0
194 Temperature_Celsius  0x0022  110  101  000  Old_age  Always  -  42
195 Hardware_ECC_Recovered  0x0036  200  200  000  Old_age  Always  -  0
196 Reallocated_Event_Count 0x0032  177  177  000  Old_age  Always  -  23
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0009  200  200  051  Pre-fail  Offline  -  59

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  9456  -
# 2  Extended offline  Completed without error  00%  9375  -
# 3  Short offline  Completed without error  00%  9072  -
# 4  Short offline  Completed without error  00%  8713  -
# 5  Short offline  Completed without error  00%  8353  -
# 6  Short offline  Completed without error  00%  7993  -
# 7  Short offline  Completed without error  00%  7610  -
# 8  Short offline  Completed without error  00%  7261  -
# 9  Short offline  Completed without error  00%  6891  -
#10  Short offline  Completed without error  00%  6531  -
#11  Short offline  Completed without error  00%  6147  -
#12  Short offline  Completed without error  00%  5788  -
#13  Short offline  Completed without error  00%  5478  -
#14  Short offline  Completed without error  00%  5117  -
#15  Short offline  Completed without error  00%  4734  -
#16  Short offline  Completed without error  00%  4374  -
#17  Short offline  Completed without error  00%  3990  -
#18  Short offline  Completed without error  00%  3631  -
#19  Short offline  Completed without error  00%  3271  -
#20  Short offline  Completed without error  00%  2910  -
#21  Short offline  Completed without error  00%  2527  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[philhu@freenas] /mnt/volume1/sharecifs/homes/philhu#
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
You have 23 reallocated sectors. It's not much to worry about. Should that number continue to increase then it's time to replace it.

I would also check the cooling of your drives. While 42C is within tolerance, you should aim for a lower temperature around the 35C mark.


Oops. I got numbers mixed up. See @joeschmuck's response below.
 

philhu

Patron
Joined
May 17, 2016
Messages
258
Thanks...it has been at 23 errors forever.

The 42C reading is due to a very warm day in Massachusetts, with my bulkhead open for a plumber to do work, so the basement heated up.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Not so fast!

5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 40
That is 40 reallocated sectors.

196 Reallocated_Event_Count 0x0032 177 177 000 Old_age Always - 23
This value is actually a count of data transfer events: for example, the drive was reading some data, noticed the sector was questionable, and then tried to transfer the data to another sector (whether that attempt passed or failed).

200 Multi_Zone_Error_Rate 0x0009 200 200 051 Pre-fail Offline - 59
And you have this smoking gun as well. While this on its own may not mean anything, it likely factors in given the other errors you have.

9 Power_On_Hours 0x0032 088 088 000 Old_age Always - 9470
You only have 9470 hours (barely over 1 year) on this drive and it's failing.

But you likely cannot RMA this drive because it's a White Label drive and I suspect the warranty is only 1 year.
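
For anyone who wants to check these attribute rows mechanically rather than by eye, here is a minimal Python sketch. It assumes the column layout shown in the smartctl output above; the `WATCHED_IDS` set and the function names are my own choices for illustration, not anything smartctl or FreeNAS provides. It also hints at why the overall result can still say PASSED: as far as I know, that verdict generally depends on a normalized VALUE dropping to its THRESH, not on the raw counts.

```python
import re

# Attribute IDs treated as media-health indicators in this sketch (an assumption).
WATCHED_IDS = {5, 196, 197, 198, 200}

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(text):
    """Return {id: fields} for every attribute row found in smartctl output."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs

def review(attrs):
    """Return (attributes at or below threshold, watched attributes with nonzero raw)."""
    failing = [a["name"] for _, a in sorted(attrs.items())
               if a["thresh"] > 0 and a["value"] <= a["thresh"]]
    suspicious = [a["name"] for i, a in sorted(attrs.items())
                  if i in WATCHED_IDS and a["raw"] > 0]
    return failing, suspicious

# Rows copied from the da17 output posted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct  0x0033  199  199  140  Pre-fail  Always  -  40
196 Reallocated_Event_Count 0x0032  177  177  000  Old_age  Always  -  23
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0009  200  200  051  Pre-fail  Offline  -  59
"""

failing, suspicious = review(parse_attributes(SAMPLE))
print("at or below threshold:", failing)
print("nonzero raw counts:", suspicious)
```

On the rows above, nothing is at or below its threshold (Reallocated_Sector_Ct sits at 199 against a threshold of 140), while the raw counts for attributes 5, 196 and 200 are all nonzero. Those raw counts are what to watch for growth.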

My advice, and this is based on the fact that you have a RAIDZ3...
1) If this is a production machine, replace the drive. There is a reason you have RAIDZ3, because you care a lot about your data.
2) If you plan to retain this drive, run a SMART Extended (Long) test on it at least once a week and a SMART Short test daily. You do not want to be caught off guard when the drive starts to fail more. If you see no further failures in your SMART data for this drive after a few months, you may be fine for a while.
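
If you go with option 2, the schedule can be sketched roughly like this in Python. Treat it as an illustration under assumptions: the function names, the choice of Sunday for the long test, and the `dry_run` switch are all hypothetical. The only real thing it relies on is `smartctl -t short` and `smartctl -t long`, and actually starting a test needs root.

```python
import datetime
import subprocess

def pick_test(day, long_weekday=6):
    """Return 'long' on one weekday (6 = Sunday) and 'short' on every other day."""
    return "long" if day.weekday() == long_weekday else "short"

def smartctl_command(device, test):
    """Build the smartctl invocation that starts the given self-test."""
    return ["smartctl", "-t", test, device]

def start_test(device, day=None, dry_run=True):
    """Pick today's test for the device; only run it when dry_run is False."""
    day = day or datetime.date.today()
    cmd = smartctl_command(device, pick_test(day))
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires root privileges
    return cmd

# Dry run for a Sunday: shows the command without touching the drive.
print(start_test("/dev/da17", datetime.date(2017, 8, 6)))
```

Run once a day from a scheduler, this would start a long test on Sundays and a short test on the other six days. Skipping the short test on the long-test day is a simplification in this sketch, since the extended test covers the same ground and more.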

Hope this helps.
 

philhu

Patron
Joined
May 17, 2016
Messages
258
I have replacements, at temperature, in the box. I WILL replace the disk.

Even though I have a full tape backup, I do not want to spend the 11 days to recover it!!!!!
 