Drive Error

NasKar

Guru
Joined
Jan 8, 2016
Messages
739
I just noticed an alert: "Device: /dev/da2 [SAT], Self-Test Log error count increased from 0 to 1". Is my drive failing?

Code:
########## SMART status report for da2 drive (Western Digital Red: WD-WCC*******) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   167   021    Pre-fail  Always       -       8125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       192
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23182
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       156
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       98
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       771
194 Temperature_Celsius     0x0022   124   110   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed: read failure       10%     23151         3261243016
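The alert wording ("Self-Test Log error count increased from 0 to 1") suggests that something is counting failed entries in the drive's self-test log. Below is a minimal Python sketch of such a count. It assumes the column layout of the excerpt above (no leading "# N" column, fields separated by two or more spaces). The class and function names are hypothetical, and the failure rule is an approximation based on smartctl's status strings, not the exact logic smartd or the FreeNAS GUI uses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelfTestEntry:
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


def parse_selftest_row(line: str) -> SelfTestEntry:
    # Columns in the excerpt are separated by runs of two or more spaces.
    desc, status, remaining, hours, lba = re.split(r"\s{2,}", line.strip())
    return SelfTestEntry(
        description=desc,
        status=status,
        remaining_pct=int(remaining.rstrip("%")),
        lifetime_hours=int(hours),
        first_error_lba=None if lba == "-" else int(lba),
    )


def is_failure(status: str) -> bool:
    # Approximation: smartctl prints failures as "Completed: <kind> failure"
    # or "Fatal or unknown error"; aborted/interrupted tests are not counted.
    return status.startswith("Completed:") or status == "Fatal or unknown error"


def count_failed(entries: List[SelfTestEntry]) -> int:
    return sum(1 for e in entries if is_failure(e.status))


rows = [
    "Extended offline    Completed: read failure       10%     23151         3261243016",
]
entries = [parse_selftest_row(r) for r in rows]
print(entries[0].status, entries[0].first_error_lba)  # Completed: read failure 3261243016
print(count_failed(entries))                           # 1
```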

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Code:
Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed: read failure       10%     23151         3261243016
Yep, that drive is done. Check whether it is still under warranty and prepare to replace it.
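For context on the quoted log line: the extended (full-surface) test stopped with 10% remaining, and LBA_of_first_error is the first logical block it could not read. This quick Python sketch converts that LBA into an approximate byte position on the disk. It assumes 512-byte logical sectors, which is common for WD Red drives but should be confirmed with the sector-size line in `smartctl -i /dev/da2`. The helper name is hypothetical.

```python
def lba_to_byte_offset(lba: int, logical_sector_size: int = 512) -> int:
    """Byte offset of the start of a logical block (sector size is an assumption)."""
    return lba * logical_sector_size


first_error_lba = 3261243016  # LBA_of_first_error from the extended test above
offset = lba_to_byte_offset(first_error_lba)
print(offset)                     # 1669756424192
print(f"{offset / 1e12:.2f} TB")  # 1.67 TB (decimal terabytes)
```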

Chris Moore


NasKar

Thanks for confirming my suspicions, Chris. It's very annoying that the warranty (3 years) ran out in May. I'll have to order a new drive. I have the smart.sh script running as a cron task set up from the GUI. It emails me every day, and my original post is an excerpt of that email. On my old system, when a drive went bad I got an email saying the drive was failing. This time I only noticed the alert in the GUI and then looked more closely at that drive's section of the email. BTW, I can receive a test email (system/email/sendmail).
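Because smart.sh mails a full report every day, one way to catch changes like the alert's "increased from 0 to 1" is to compare raw attribute values between two reports. Here is a minimal Python sketch, assuming the smartctl attribute-table layout shown in this thread (ID first, attribute name second, raw value in the tenth column). The watched-attribute list is a judgment call, not something smart.sh or FreeNAS defines, and the function names are hypothetical. The sample values are da2's Multi_Zone_Error_Rate, which is 2 in one report in this thread and 3 in a later one.

```python
from typing import Dict, Tuple

# Assumption: counters worth flagging whenever their raw value goes up.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
    "Multi_Zone_Error_Rate",
)


def parse_raw_values(table: str) -> Dict[str, int]:
    """Map attribute name -> raw value from smartctl -A style rows."""
    raw = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; headers and blank lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[fields[1]] = int(fields[9])
    return raw


def increased(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    return {
        name: (old[name], new[name])
        for name in WATCHED
        if name in old and name in new and new[name] > old[name]
    }


earlier = """
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2
"""
later = """
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       3
"""
print(increased(parse_raw_values(earlier), parse_raw_values(later)))
# {'Multi_Zone_Error_Rate': (2, 3)}
```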

NasKar

Interesting, but the most recent results from the smart.sh script don't show any errors, and the alert doesn't show up in the GUI anymore. Do you still think the drive needs to be replaced?

Code:
########## SMART status report for da2 drive (Western Digital Red: WD-WCC*********) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   172   167   021    Pre-fail  Always       -       8358
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       193
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23241
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       157
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       99
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       772
194 Temperature_Celsius     0x0022   124   110   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%     23239         -
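A clean short test probably doesn't settle the question: a short test typically reads only a small part of the surface, while the extended test that failed is a full read scan, and the older failed entry stays in the drive's self-test log. One conservative rule of thumb, offered as an assumption rather than an official criterion, is to treat the failure as unresolved until an extended test has completed without error after it. Below is a Python sketch of that rule, using the two log entries posted in this thread as (description, status, lifetime hours) tuples. The function names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, str, int]  # (Test_Description, Status, LifeTime(hours))


def is_failure(status: str) -> bool:
    # Approximation based on smartctl status strings such as
    # "Completed: read failure" and "Fatal or unknown error".
    return status.startswith("Completed:") or status == "Fatal or unknown error"


def clean_extended_since_last_failure(log: List[Entry]) -> bool:
    """True if no failure is logged, or an extended test passed after the latest one."""
    failure_hours = [hours for _, status, hours in log if is_failure(status)]
    if not failure_hours:
        return True
    last_failure = max(failure_hours)
    return any(
        desc == "Extended offline"
        and status == "Completed without error"
        and hours > last_failure
        for desc, status, hours in log
    )


log = [
    ("Extended offline", "Completed: read failure", 23151),
    ("Short offline", "Completed without error", 23239),
]
print(clean_extended_since_last_failure(log))  # False
```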

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hey NasKar,

Trust is easy to lose, very hard to gain. So from the moment a hard drive starts throwing errors, I don't wait for more of them before replacing it. Considering the time it takes to order, receive, test, install, resilver, ..., it's better to start ASAP.

Good luck,

NasKar

Heracles, good advice. I ordered a new drive this morning.

NasKar

After reading

I ran smartctl -A /dev/daX on each drive and noticed that another drive, da0, has a Raw_Read_Error_Rate of 249 and a Multi_Zone_Error_Rate of 11. I'm not getting any alerts in the GUI for da0, only the original one for da2.

Do I need to replace both? Why am I not getting an alert for da0? Is an email only sent automatically once things get critical?

Code:
root@freenasSuper:/git/freenas-iocage-nextcloud # smartctl -A /dev/da2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   172   167   021    Pre-fail  Always       -       8358
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       193
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23300
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       157
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       99
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       774
194 Temperature_Celsius     0x0022   124   110   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       3

root@freenasSuper:/git/freenas-iocage-nextcloud # smartctl -A /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       249
  3 Spin_Up_Time            0x0027   181   177   021    Pre-fail  Always       -       7941
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       96
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18357
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       96
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       55
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1551
194 Temperature_Celsius     0x0022   125   115   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       11
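On why da0 raises no alert: smartctl only flags an attribute when its normalized VALUE falls to or below THRESH (WHEN_FAILED then reads FAILING_NOW, or In_the_past if only WORST crossed it), and a THRESH of 000 never trips. da0's Raw_Read_Error_Rate sits at VALUE 200 against THRESH 051, so by that rule nothing is failing even though the raw count is 249. The Python sketch below applies a simplified version of that rule to a few of da0's rows and, separately, lists nonzero raw values of error counters, which is the stricter kind of check you would have to script yourself. The counter list and function names are assumptions for illustration.

```python
from typing import List, NamedTuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


# Assumption: raw counters where any nonzero value deserves a closer look.
COUNTERS = {
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
    "Multi_Zone_Error_Rate",
}


def parse_rows(table: str) -> List[Attr]:
    """Parse smartctl -A style rows whose raw value is a plain integer."""
    rows = []
    for line in table.splitlines():
        f = line.split()
        if len(f) >= 10 and f[0].isdigit() and f[9].isdigit():
            rows.append(Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[9])))
    return rows


def when_failed(a: Attr) -> str:
    # Simplified version of the normalized-value rule; THRESH 0 never fails.
    if a.thresh == 0:
        return "-"
    if a.value <= a.thresh:
        return "FAILING_NOW"
    if a.worst <= a.thresh:
        return "In_the_past"
    return "-"


da0 = """
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       249
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       11
"""
rows = parse_rows(da0)
print([a.name for a in rows if when_failed(a) != "-"])  # []
print({a.name: a.raw for a in rows if a.name in COUNTERS and a.raw > 0})
# {'Raw_Read_Error_Rate': 249, 'Multi_Zone_Error_Rate': 11}
```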