SMART Errors occurring on one drive. Need help understanding them

momobozo

Cadet
Joined
Aug 16, 2017
Messages
8
Hello,

I have been receiving some SMART errors on one of my six drives. I have the drives in a RAID-Z2 configuration (4 data + 2 redundancy). These SMART errors seem to only show up on one of the drives, but I am unfamiliar with what to make of them. Can someone help me understand them or point me in the right direction?

Thank you

Below is the output. Notice the errors only show for ada0.

Code:
########## ZPool status report for storage ##########


  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:20:40 with 0 errors on Mon Jul  1 02:20:41 2019
config:

    NAME                                            STATE     READ WRITE CKSUM
    storage                                         ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/deb50bc2-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0
        gptid/df39ddfb-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0
        gptid/dfbf39bd-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0
        gptid/e04ea62a-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0
        gptid/e0d14e92-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0
        gptid/e155e419-b2fd-11e7-94b1-ac1f6b1956d2  ONLINE       0     0     0

errors: No known data errors





########## SMART status report for ada0 drive (Western Digital Red: 7SGEEVGC) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   128   128   054    Old_age   Offline      -       116
  3 Spin_Up_Time            0x0007   199   199   024    Pre-fail  Always       -       267 (Average 394)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       403
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       403
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       39 (Min/Max 24/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

ATA Error Count: 31 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 31 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 80 c8 ae 80 40 08   2d+14:00:42.940  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:40.215  READ LOG EXT
  60 08 70 c8 ae 80 40 08   2d+14:00:37.469  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:37.465  READ LOG EXT
  60 08 60 c8 ae 80 40 08   2d+14:00:34.719  READ FPDMA QUEUED

Error 30 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 70 c8 ae 80 40 08   2d+14:00:40.215  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:37.465  READ LOG EXT
  60 08 60 c8 ae 80 40 08   2d+14:00:34.719  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:34.715  READ LOG EXT
  60 08 50 c8 ae 80 40 08   2d+14:00:31.860  READ FPDMA QUEUED

Error 29 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 60 c8 ae 80 40 08   2d+14:00:37.465  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:34.715  READ LOG EXT
  60 08 50 c8 ae 80 40 08   2d+14:00:31.860  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:31.857  READ LOG EXT
  60 08 40 c8 ae 80 40 08   2d+14:00:29.158  READ FPDMA QUEUED

Error 28 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 50 c8 ae 80 40 08   2d+14:00:34.715  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08   2d+14:00:31.857  READ LOG EXT
  60 08 40 c8 ae 80 40 08   2d+14:00:29.158  READ FPDMA QUEUED
  60 08 38 c0 ae 80 40 08   2d+14:00:29.158  READ FPDMA QUEUED
  60 08 30 b8 ae 80 40 08   2d+14:00:29.157  READ FPDMA QUEUED

Error 27 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 40 c8 ae 80 40 08   2d+14:00:31.856  READ FPDMA QUEUED
  60 08 38 c0 ae 80 40 08   2d+14:00:29.158  READ FPDMA QUEUED
  60 08 30 b8 ae 80 40 08   2d+14:00:29.157  READ FPDMA QUEUED
  60 08 28 b0 ae 80 40 08   2d+14:00:29.157  READ FPDMA QUEUED
  60 08 20 a8 ae 80 40 08   2d+14:00:29.157  READ FPDMA QUEUED

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8653         -
Short offline       Completed without error       00%      9096         -





########## SMART status report for ada1 drive (Western Digital Red: 7SGEGHRC) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   128   128   054    Old_age   Offline      -       117
  3 Spin_Up_Time            0x0007   196   196   024    Pre-fail  Always       -       270 (Average 400)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       404
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       404
194 Temperature_Celsius     0x0002   158   158   000    Old_age   Always       -       41 (Min/Max 25/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8651         -
Short offline       Completed without error       00%      9096         -





########## SMART status report for ada2 drive (Western Digital Red: 7SGB3U0C) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   189   189   024    Pre-fail  Always       -       266 (Average 429)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       33
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       406
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       406
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       39 (Min/Max 25/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8651         -
Short offline       Completed without error       00%      9096         -





########## SMART status report for ada3 drive (Western Digital Red: 7SGERNKC) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   128   128   054    Old_age   Offline      -       116
  3 Spin_Up_Time            0x0007   185   185   024    Pre-fail  Always       -       273 (Average 437)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       405
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       405
194 Temperature_Celsius     0x0002   158   158   000    Old_age   Always       -       41 (Min/Max 25/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8653         -
Short offline       Completed without error       00%      9096         -





########## SMART status report for ada4 drive (Western Digital Red: 7SGD9UZC) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   198   198   024    Pre-fail  Always       -       271 (Average 394)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       404
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       404
194 Temperature_Celsius     0x0002   158   158   000    Old_age   Always       -       41 (Min/Max 25/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8651         -
Short offline       Completed without error       00%      9096         -





########## SMART status report for ada5 drive (Western Digital Red: 7SGEG9GC) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   195   195   024    Pre-fail  Always       -       275 (Average 399)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9141
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       404
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       404
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       39 (Min/Max 25/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      8651         -
Short offline       Completed without error       00%      9096         -
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
That drive has been running for 9141 hours (about 381 days). The last logged error reads:
Error 31 occurred at disk power-on lifetime: 62 hours (2 days + 14 hours)
so it happened about one year ago.

I didn't see any flags on the test results. Considering the time frame above, when did you notice it for the first time?
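To make the timeline explicit, here is a small sketch of the arithmetic using the two numbers from the ada0 report. It assumes the drive has been powered on more or less continuously, so power-on hours roughly track calendar time; the variable names are only illustrative.

```python
# Values copied from the ada0 report in the first post.
power_on_hours_now = 9141  # attribute 9 (Power_On_Hours), RAW_VALUE
error_at_hours = 62        # "Error 31 occurred at disk power-on lifetime: 62 hours"

# Power-on time that has passed since the error was logged.
hours_since_error = power_on_hours_now - error_at_hours
days_since_error = hours_since_error / 24

print(f"Drive age: about {power_on_hours_now / 24:.0f} power-on days")
print(f"Last logged error: about {days_since_error:.0f} power-on days ago")
```

Since the two figures are almost the same, the error dates from the drive's first few days of use rather than from anything recent.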
 

momobozo

Cadet
Joined
Aug 16, 2017
Messages
8
That’s what I thought. I didn’t notice it until a few days ago, but when I looked through my old logs I found it there as well. I guess I just never paid close attention before. Is the error of any significance, or should I ignore it?
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
momobozo said:
I did all of this when I first got the drives. I ran the tests in the same order that they’re shown in that thread. Could that error be from the burn-in test?

EDIT: when I say burn-in test, I mean the badblocks portion.

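Your last "error" is entry number 31 in the log, and it was recorded at 62 power-on hours, so within the drive's first three days. If you ran the above-mentioned tests when the drives were new, the two could overlap. The resource I've linked lists, at the very end, the main fields you should be looking at to identify failing disks. Keep your eyes on them: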

Code:
Some of the more important fields right now include the Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable lines. All of these should have a RAW_VALUE of 0. I'm not sure why the VALUE field is listed as 200, but as long as the RAW_VALUE for each of these fields is 0, that means there are currently no bad sectors. Any result greater than 0 on a new drive should be cause for an immediate RMA.
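If you want to check those three fields without reading the whole table each time, something like the following rough sketch could work. It assumes the input is the attribute table text as printed by `smartctl -A` (the same layout as in the reports above); the function name `watched_raw_values` and the sample text are made up for illustration.

```python
# Attributes called out above; each should report a RAW_VALUE of 0.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def watched_raw_values(smart_table: str) -> dict:
    """Return {attribute_name: raw_value} for the watched attributes (hypothetical helper)."""
    values = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])
    return values

# Sample rows copied from the ada0 report.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

raw = watched_raw_values(sample)
bad = {name: value for name, value in raw.items() if value > 0}
print("OK" if not bad else f"Check drive: {bad}")
```

For the ada0 rows all three raw values are 0, so nothing is flagged.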
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Out of curiosity, what was your failure message? How did FreeNAS communicate the error? Or did you just look at the output yourself and assume an error was in progress? And what model drive is it? I see it reports a Helium_Level attribute, so it's special in my book.

To break down what I see: your drive ada0 reported an error only 62 hours into the drive's life, as @melloa said. That may have happened during your burn-in testing, or during some routine SMART test, when a READ operation was attempted at LBA 0 and failed. That was over a year ago.

Do you have a problem? Not likely.
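For anyone wondering what the `40 41` bytes in the ER and ST columns of the error log mean, here is a rough sketch that decodes them. The bit names follow the ATA error and status register layout as I understand it, so treat the two tables as an assumption rather than a reference; the `decode` helper is just for illustration.

```python
# Assumed bit layout of the ATA error (ER) and status (ST) registers.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

def decode(value: int, names: dict) -> list:
    """Return the names of all bits that are set in value."""
    return [name for mask, name in names.items() if value & mask]

# Register bytes from the line "40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0".
er, st = 0x40, 0x41

print("ER:", decode(er, ERROR_BITS))
print("ST:", decode(st, STATUS_BITS))
```

The ER byte decodes to UNC (uncorrectable data), which matches the "Error: UNC" text smartctl printed, and the ST byte says the drive was ready and flagged an error on that command.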
 