Is this a SMART fail?

Status
Not open for further replies.

Bhoot

Patron
Joined
Mar 28, 2015
Messages
241
I just noticed an alert, "CRITICAL: Device: /dev/ada3, Self-Test Log error count increased from 0 to 1", which I thought was pretty strange.
Nevertheless, I SSHed in and pulled the smartctl output:

Code:
[root@freenas] ~# smartctl -a /dev/ada3 | more
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD40EFRX-68WT0N0
Serial Number:  WD-WCC4E0ESU744
LU WWN Device Id: 5 0014ee 2623ee4e2
Firmware Version: 82.00A82
User Capacity:  4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Thu Feb  9 22:03:10 2017 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  ( 120) The previous self-test completed having
  the read element of the test failed.
Total time to complete Offline
data collection:  (52560) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 526) minutes.
Conveyance self-test routine
recommended polling time:  (  5) minutes.
SCT capabilities:  (0x703d) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  187  178  021  Pre-fail  Always  -  7633
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  35
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  089  089  000  Old_age  Always  -  8393
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  34
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  11
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  841
194 Temperature_Celsius  0x0022  113  107  000  Old_age  Always  -  39
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  200  200  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure  80%  8354  880016000
# 2  Short offline  Completed without error  00%  8297  -
# 3  Short offline  Completed without error  00%  8038  -
# 4  Extended offline  Completed without error  00%  7962  -
# 5  Short offline  Completed without error  00%  7876  -
# 6  Short offline  Completed without error  00%  7707  -
# 7  Extended offline  Completed without error  00%  7628  -
# 8  Short offline  Completed without error  00%  7576  -
# 9  Short offline  Completed without error  00%  7301  -
#10  Extended offline  Completed without error  00%  7219  -
#11  Short offline  Completed without error  00%  7137  -
#12  Short offline  Completed without error  00%  6968  -
#13  Extended offline  Completed without error  00%  6880  -
#14  Short offline  Completed without error  00%  6803  -
#15  Short offline  Completed without error  00%  6581  -
#16  Extended offline  Completed without error  00%  6494  -
#17  Short offline  Completed without error  00%  6413  -
#18  Short offline  Completed without error  00%  6247  -
#19  Extended offline  Completed without error  00%  6164  -
#20  Short offline  Completed without error  00%  6085  -
#21  Short offline  Completed without error  00%  5836  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



I am guessing that this entry in the self-test log
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 80% 8354 880016000

indicates that I have an issue. Is this the only parameter that shows the disk has a SMART error? Please help me decipher this report. Thank you. :)
I also see that the Remaining column shows 80% for this test, whereas it was 00% for all the previous ones. Could someone help me with that as well?
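
For anyone wanting to pick the failing entries out of a long self-test log automatically, here is a minimal Python sketch. It assumes the rows look like the ones pasted above, with columns separated by two or more spaces; the names `ROW` and `parse_selftest_log` are made up for this example and are not part of smartmontools.

```python
import re

# Two rows copied from the self-test log in the post above.
LOG = """\
# 1  Extended offline  Completed: read failure  80%  8354  880016000
# 2  Short offline  Completed without error  00%  8297  -
"""

# Assumption: columns are separated by at least two spaces, as in the paste.
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and anything that is not a log row
        num, test, status, remaining, hours, lba = m.groups()
        rows.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            # "-" means the test did not record a failing LBA
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return rows


rows = parse_selftest_log(LOG)
failed = [r for r in rows if r["status"] != "Completed without error"]
for r in failed:
    print(
        f"test #{r['num']} ({r['test']}): {r['status']}, "
        f"stopped {100 - r['remaining_pct']}% in, "
        f"first bad LBA {r['first_error_lba']}"
    )
```

Remaining is the share of the test that was still left when it stopped, so 00% means the test ran to the end and 80% means it gave up 20% of the way in.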
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
This is an early indication of a failing drive. You can re-run the SMART test by running smartctl -t long /dev/ada3 and see if you get the same.
 

Bhoot

Patron
Joined
Mar 28, 2015
Messages
241
This is an early indication of a failing drive. You can re-run the SMART test by running smartctl -t long /dev/ada3 and see if you get the same.
I guess I read your mind. Thank you nevertheless.

Code:
[root@freenas] ~# smartctl -t long /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 526 minutes for test to complete.
Test will complete after Fri Feb 10 07:09:05 2017

Use smartctl -X to abort test.



Since the read error comes from an extended offline test, I am guessing it is probably accurate. I shall re-run smartctl in about 10 hours. :)
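
As a quick sanity check on the timing, the 526-minute polling time and the completion timestamp in the output above can be worked backwards to the moment the test was started. This is only a small arithmetic sketch in Python; the variable names are made up for the example.

```python
from datetime import datetime, timedelta

# Values taken from the smartctl -t long output above.
POLLING_MINUTES = 526
completes = datetime(2017, 2, 10, 7, 9, 5)  # "Fri Feb 10 07:09:05 2017"

# smartctl adds the recommended polling time to the current time,
# so subtracting it again gives the start of the test.
started = completes - timedelta(minutes=POLLING_MINUTES)
hours_needed = POLLING_MINUTES / 60

print("test started: ", started)
print("test finishes:", completes)
print(f"expected duration: {hours_needed:.1f} hours")
```

So a full, uninterrupted run should take a little under 9 hours. A run that hits the same bad sector again will stop much earlier.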
 

Bhoot

Patron
Joined
Mar 28, 2015
Messages
241
Previous tests passed, showing 00% remaining. The test in question failed 20% in, with 80% remaining.
Thanks. Since you pointed out that it failed with 80% remaining, I guessed the re-run wouldn't take very long to hit the error again if it was a genuine one.
I took a fresh reading:
Code:
[root@freenas] ~# smartctl -a /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD40EFRX-68WT0N0
Serial Number:  WD-WCC4E0ESU744
LU WWN Device Id: 5 0014ee 2623ee4e2
Firmware Version: 82.00A82
User Capacity:  4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Fri Feb 10 08:23:30 2017 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  ( 121) The previous self-test completed having
  the read element of the test failed.
Total time to complete Offline
data collection:  (52560) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 526) minutes.
Conveyance self-test routine
recommended polling time:  (  5) minutes.
SCT capabilities:  (0x703d) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  187  178  021  Pre-fail  Always  -  7633
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  35
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  100  253  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  089  089  000  Old_age  Always  -  8403
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  34
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  11
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  843
194 Temperature_Celsius  0x0022  113  107  000  Old_age  Always  -  39
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  200  200  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure  90%  8394  880016000
# 2  Extended offline  Completed: read failure  80%  8354  880016000
# 3  Short offline  Completed without error  00%  8297  -
# 4  Short offline  Completed without error  00%  8038  -
# 5  Extended offline  Completed without error  00%  7962  -
# 6  Short offline  Completed without error  00%  7876  -
# 7  Short offline  Completed without error  00%  7707  -
# 8  Extended offline  Completed without error  00%  7628  -
# 9  Short offline  Completed without error  00%  7576  -
#10  Short offline  Completed without error  00%  7301  -
#11  Extended offline  Completed without error  00%  7219  -
#12  Short offline  Completed without error  00%  7137  -
#13  Short offline  Completed without error  00%  6968  -
#14  Extended offline  Completed without error  00%  6880  -
#15  Short offline  Completed without error  00%  6803  -
#16  Short offline  Completed without error  00%  6581  -
#17  Extended offline  Completed without error  00%  6494  -
#18  Short offline  Completed without error  00%  6413  -
#19  Short offline  Completed without error  00%  6247  -
#20  Extended offline  Completed without error  00%  6164  -
#21  Short offline  Completed without error  00%  6085  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

From the result, I guess it's pretty evident that there is a genuine read error.
I have a spare disk, but no spare SATA cable or port. Should I keep running the hard disk until it dies, or start resilvering ASAP?
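
To put LBA_of_first_error into perspective, here is a small Python sketch that converts it into a byte offset and a position on the disk, using the sector sizes and capacity from the information section above. It assumes the LBA is counted in 512-byte logical sectors; the variable names are made up for the example.

```python
# Values copied from the smartctl output above.
LOGICAL_SECTOR_BYTES = 512
PHYSICAL_SECTOR_BYTES = 4096
CAPACITY_BYTES = 4_000_787_030_016
FIRST_ERROR_LBA = 880_016_000

# Assumption: the LBA in the self-test log counts logical (512-byte) sectors.
offset_bytes = FIRST_ERROR_LBA * LOGICAL_SECTOR_BYTES
fraction = offset_bytes / CAPACITY_BYTES

# Eight logical sectors share one 4096-byte physical sector.
per_physical = PHYSICAL_SECTOR_BYTES // LOGICAL_SECTOR_BYTES
physical_index, within = divmod(FIRST_ERROR_LBA, per_physical)

print(f"byte offset of first error: {offset_bytes:,}")
print(f"position on disk: {fraction:.1%} of capacity")
print(f"physical sector {physical_index:,}, logical sector {within} within it")
```

The bad sector sits roughly 11% of the way into the disk. If the extended test reads the disk front to back and reports Remaining in 10% steps, that would fit both failed runs stopping early, at 90% and 80% remaining, on the same LBA.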
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Yup, definitely time to replace that disk.

Also, just to note, your drive temp is at 39C. While drives will normally operate at this temperature, see if you can squeak out some more cooling and bring it down to 35C or below.
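
For reference, the temperature and the sector counters can be read out of the attribute table with a few lines of Python. This is only a sketch: it assumes the ten-column layout shown in the output above, `parse_attributes` is a made-up helper name, and the 35C target is simply the figure suggested in this post, not a vendor limit.

```python
# Four rows copied from the attribute table in the smartctl output above.
ATTRS = """\
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
194 Temperature_Celsius  0x0022  113  107  000  Old_age  Always  -  39
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
"""

TARGET_C = 35  # target suggested in this thread, not a drive specification


def parse_attributes(text):
    """Map attribute ID to its name, normalized value, threshold and raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),
        }
    return attrs


attrs = parse_attributes(ATTRS)
temp_c = attrs[194]["raw"]
print(f"temperature: {temp_c}C ({temp_c - TARGET_C}C above the {TARGET_C}C target)")
for attr_id in (5, 197, 198):
    print(f"{attrs[attr_id]['name']}: raw {attrs[attr_id]['raw']}")
```

On this drive the reallocated, pending and offline-uncorrectable counters are all still 0, so the self-test log is the only place where the read failure shows up.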
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Do I let the hard disk run till it has life or should I start resilvering ASAP?
Is your spare burned in? If not, I would start with that.

Once the spare is burned in, I'd replace the suspect drive. After resilvering is complete, you have a choice of immediate RMA, or further testing to see if the suspect drive might have a correctable flaw. One way is to burn it in again, to see if any of the write passes will cause a sector reallocation. If one does, the subsequent read passes should come back clean. If not, you will see errors on each subsequent read pass, and should return it.
 