Failed SMART usage Attribute:

Status: Not open for further replies.

James Harris (Dabbler; joined Mar 19, 2017; 11 messages)
Just built a new system: FreeNAS 11, an LSI 9211-8i HBA and 8 x 2 TB HDDs. The disks aren't attached to a volume yet, and FreeNAS is already flashing a CRITICAL alert.

Code:
CRITICAL: July 24, 2017, 9:47 p.m. - Device: /dev/da1 [SAT], Failed SMART usage Attribute: 184 End-to-End_Error.
CRITICAL: July 24, 2017, 9:47 p.m. - Device: /dev/da3 [SAT], Failed SMART usage Attribute: 184 End-to-End_Error.


I have swapped in disks from another bay, replaced the disks with new ones, and even swapped the two SAS leads on the LSI card, shutting down before each change, but FreeNAS is still flashing CRITICAL for da1 and da3.

This looks like a FreeNAS issue to me rather than a disk problem. What can I do to resolve it?
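One way to check whether the alert follows the physical disk or the controller port is to compare serial numbers: names such as da1 and da3 are assigned by the operating system, so the name alone does not say which drive sits behind it. Below is a minimal POSIX sh sketch, assuming smartctl is installed (FreeNAS ships it); serial_from_info, list_serials and the SMARTCTL override variable are hypothetical names, and the functions are meant to be loaded into sh, since the interactive root shell may be csh.

```shell
# Hypothetical helper: print the value of the "Serial Number:" line from
# smartctl -i output read on stdin.
serial_from_info() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Hypothetical helper: print "device serial" for each device name given,
# e.g. list_serials da1 da3. Set SMARTCTL to use a different smartctl binary.
list_serials() {
    for dev in "$@"; do
        serial=$(${SMARTCTL:-smartctl} -i "/dev/$dev" | serial_from_info)
        printf '%s %s\n' "$dev" "$serial"
    done
}
```

Running list_serials da1 da3 before and after moving drives between bays shows whether the same serial numbers keep appearing under the names that raise the alert.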
 

m0nkey_ (MVP; joined Oct 27, 2015; 2,739 messages)
Please paste the full output of smartctl -a /dev/da1 and smartctl -a /dev/da3, wrapped in [code] tags.
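If copying from the console is awkward, the reports can be saved to files first and pasted from there. A minimal POSIX sh sketch; save_smart_reports and the SMARTCTL override are hypothetical names, and smartctl -a is the command requested above:

```shell
# Hypothetical helper: write the full smartctl -a report for each named
# device (e.g. da1 da3) to smart_<dev>.txt in the current directory.
save_smart_reports() {
    for dev in "$@"; do
        # smartctl's exit status is a bit mask that is non-zero for drives
        # with, for example, entries in their error log, so it is not
        # treated as a failure here.
        ${SMARTCTL:-smartctl} -a "/dev/$dev" > "smart_$dev.txt"
        echo "wrote smart_$dev.txt"
    done
}
```

For example, save_smart_reports da1 da3 leaves smart_da1.txt and smart_da3.txt in the current directory.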
 

Ericloewe (Server Wrangler, Moderator; joined Feb 15, 2014; 20,194 messages)
End-to-end errors? Those are typically a very bad sign.
 

James Harris (Dabbler; joined Mar 19, 2017; 11 messages)
Output as requested...

Code:
root@thesun:~ # smartctl -a /dev/da3
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda 7200.14 (AF)
Device Model:  ST2000DM001-9YN164
Serial Number:  S2401WM9
LU WWN Device Id: 5 000c50 0469cf904
Firmware Version: CC46
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  7200 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Tue Jul 25 19:22:34 2017 AEST

==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  592) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 270) minutes.
Conveyance self-test routine
recommended polling time:  (  2) minutes.
SCT capabilities:  (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   090   006    Pre-fail  Always       -       182608984
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       142
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1144
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       109298114
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4530
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       118
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   001   001   099    Old_age   Always   FAILING_NOW 420
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1990
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 6
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   049   045    Old_age   Always       -       30 (Min/Max 20/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       103
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2001
194 Temperature_Celsius     0x0022   030   051   000    Old_age   Always       -       30 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   001   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1228h+38m+28.201s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       5809371864716
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       37994802063524

SMART Error Log Version: 1
ATA Error Count: 1291 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1291 occurred at disk power-on lifetime: 4511 hours (187 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 05 b7 05  Error: UNC at LBA = 0x05b70568 = 95880552

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 80 ff 04 b7 45 00  02:18:51.533  READ DMA EXT
  25 03 80 7f 04 b7 45 00  02:18:51.531  READ DMA EXT
  25 03 80 ff 03 b7 45 00  02:18:51.529  READ DMA EXT
  25 03 80 7f 03 b7 45 00  02:18:51.527  READ DMA EXT
  25 03 80 ff 02 b7 45 00  02:18:51.525  READ DMA EXT

Error 1290 occurred at disk power-on lifetime: 4511 hours (187 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 05 b7 05  Error: UNC at LBA = 0x05b70568 = 95880552

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 80 ff 04 b7 45 00  02:18:51.533  READ DMA EXT
  25 03 80 7f 04 b7 45 00  02:18:51.531  READ DMA EXT
  25 03 80 ff 03 b7 45 00  02:18:51.529  READ DMA EXT
  25 03 80 7f 03 b7 45 00  02:18:51.527  READ DMA EXT
  25 03 80 ff 02 b7 45 00  02:18:51.525  READ DMA EXT

Error 1289 occurred at disk power-on lifetime: 4509 hours (187 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 05 b7 05  Error: UNC at LBA = 0x05b70568 = 95880552

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 80 5f 05 b7 45 00  09:41:49.556  READ DMA EXT
  25 03 60 7f f8 b6 45 00  09:41:49.554  READ DMA EXT
  25 03 80 ff f7 b6 45 00  09:41:49.552  READ DMA EXT
  25 03 80 7f f7 b6 45 00  09:41:49.550  READ DMA EXT
  25 03 80 ff f6 b6 45 00  09:41:49.548  READ DMA EXT

Error 1288 occurred at disk power-on lifetime: 4509 hours (187 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 05 b7 05  Error: UNC at LBA = 0x05b70568 = 95880552

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 80 5f 05 b7 45 00  09:41:49.556  READ DMA EXT
  25 03 60 7f f8 b6 45 00  09:41:49.554  READ DMA EXT
  25 03 80 ff f7 b6 45 00  09:41:49.552  READ DMA EXT
  25 03 80 7f f7 b6 45 00  09:41:49.550  READ DMA EXT
  25 03 80 ff f6 b6 45 00  09:41:49.548  READ DMA EXT

Error 1287 occurred at disk power-on lifetime: 4487 hours (186 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 05 b7 05  Error: UNC at LBA = 0x05b70568 = 95880552

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 80 5f 05 b7 45 00  2d+14:41:40.512  READ DMA EXT
  25 03 80 7f f2 b6 45 00  2d+14:41:40.485  READ DMA EXT
  25 03 80 ff f1 b6 45 00  2d+14:41:40.461  READ DMA EXT
  25 03 80 7f f1 b6 45 00  2d+14:41:40.458  READ DMA EXT
  25 03 80 ff f0 b6 45 00  2d+14:41:40.455  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
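Each error-log entry above ends with the LBA of the failed read (UNC means uncorrectable). The printed value is consistent with reading the register dump as a 28-bit address: the low four bits of DH, then CH, CL and SN. A small POSIX sh sketch of that arithmetic; lba_from_regs is a hypothetical name, and its arguments are the SN, CL, CH and DH bytes in hex, exactly as printed:

```shell
# Hypothetical helper: rebuild the 28-bit LBA from the SN, CL, CH and DH
# register bytes of a smartctl error-log entry (hex, without 0x prefix):
#   LBA = (DH & 0x0F) << 24 | CH << 16 | CL << 8 | SN
lba_from_regs() {
    sn=$1 cl=$2 ch=$3 dh=$4
    echo $(( ((0x$dh & 0x0F) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}
```

For da3, lba_from_regs 68 05 b7 05 prints 95880552, and all five entries still in that drive's error log point at this same LBA, so the drive keeps failing to read the same sector.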



Code:
smartctl -a /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda 7200.14 (AF)
Device Model:  ST2000DM001-9YN164
Serial Number:  S2401KKV
LU WWN Device Id: 5 000c50 0469c5649
Firmware Version: CC46
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  7200 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Tue Jul 25 19:20:45 2017 AEST

==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  600) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 264) minutes.
Conveyance self-test routine
recommended polling time:  (  2) minutes.
SCT capabilities:  (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   089   006    Pre-fail  Always       -       64283592
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       149
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       2016
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       4400196893
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       4217
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       126
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   001   001   099    Old_age   Always   FAILING_NOW 251
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       467
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       2 2 2
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   049   045    Old_age   Always       -       30 (Min/Max 20/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       109
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       3250
194 Temperature_Celsius     0x0022   030   051   000    Old_age   Always       -       30 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   001   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1259h+00m+59.843s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       5821904640220
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       19867167320892

SMART Error Log Version: 1
ATA Error Count: 42 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 42 occurred at disk power-on lifetime: 2055 hours (85 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 70 d2 a6 03  Error: UNC at LBA = 0x03a6d270 = 61264496

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 6f d2 a6 43 00  3d+19:23:31.516  READ DMA EXT
  25 03 08 67 d2 a6 43 00  3d+19:23:30.683  READ DMA EXT
  25 03 08 5f d2 a6 43 00  3d+19:23:28.383  READ DMA EXT
  25 03 08 57 d2 a6 43 00  3d+19:23:25.482  READ DMA EXT
  25 03 08 4f d2 a6 43 00  3d+19:23:22.632  READ DMA EXT

Error 41 occurred at disk power-on lifetime: 2055 hours (85 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 70 d2 a6 03  Error: UNC at LBA = 0x03a6d270 = 61264496

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 6f d2 a6 43 00  3d+19:23:31.516  READ DMA EXT
  25 03 08 67 d2 a6 43 00  3d+19:23:30.683  READ DMA EXT
  25 03 08 5f d2 a6 43 00  3d+19:23:28.383  READ DMA EXT
  25 03 08 57 d2 a6 43 00  3d+19:23:25.482  READ DMA EXT
  25 03 08 4f d2 a6 43 00  3d+19:23:22.632  READ DMA EXT

Error 40 occurred at disk power-on lifetime: 2055 hours (85 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 57 d2 a6 03  Error: UNC at LBA = 0x03a6d257 = 61264471

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 57 d2 a6 43 00  3d+19:23:25.482  READ DMA EXT
  25 03 08 4f d2 a6 43 00  3d+19:23:22.632  READ DMA EXT
  25 03 08 47 d2 a6 43 00  3d+19:23:19.700  READ DMA EXT
  25 03 08 3f d2 a6 43 00  3d+19:23:16.940  READ DMA EXT
  25 03 08 37 d2 a6 43 00  3d+19:23:12.005  READ DMA EXT

Error 39 occurred at disk power-on lifetime: 2055 hours (85 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 4f d2 a6 03  Error: UNC at LBA = 0x03a6d24f = 61264463

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 4f d2 a6 43 00  3d+19:23:22.632  READ DMA EXT
  25 03 08 47 d2 a6 43 00  3d+19:23:19.700  READ DMA EXT
  25 03 08 3f d2 a6 43 00  3d+19:23:16.940  READ DMA EXT
  25 03 08 37 d2 a6 43 00  3d+19:23:12.005  READ DMA EXT
  25 03 08 2f d2 a6 43 00  3d+19:23:09.130  READ DMA EXT

Error 38 occurred at disk power-on lifetime: 2055 hours (85 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 d2 a6 03  Error: UNC at LBA = 0x03a6d247 = 61264455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 47 d2 a6 43 00  3d+19:23:19.700  READ DMA EXT
  25 03 08 3f d2 a6 43 00  3d+19:23:16.940  READ DMA EXT
  25 03 08 37 d2 a6 43 00  3d+19:23:12.005  READ DMA EXT
  25 03 08 2f d2 a6 43 00  3d+19:23:09.130  READ DMA EXT
  25 03 08 27 d2 a6 43 00  3d+19:23:06.222  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
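In both attribute tables above, 184 End-to-End_Error has a normalized VALUE of 001 against a THRESH of 099. A normalized value at or below a non-zero threshold is what smartctl reports as FAILING_NOW, which matches the "Failed SMART usage Attribute" alert for this Old_age attribute, even though the overall self-assessment still says PASSED. A minimal sh/awk sketch of that check; failing_attrs is a hypothetical name, it assumes smartctl's attribute-table column order, and it simplifies the rule by treating a threshold of 000 as never failing and ignoring past failures:

```shell
# Hypothetical helper: read a smartctl attribute table on stdin and print
# "ID NAME" for each attribute whose normalized VALUE is at or below a
# non-zero THRESH (simplified: past failures are not reported).
failing_attrs() {
    awk '
        # Attribute rows: numeric ID in field 1 and a hex FLAG in field 3.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ {
            value = $4 + 0
            thresh = $6 + 0
            if (thresh > 0 && value <= thresh) print $1, $2
        }
    '
}
```

Usage: smartctl -A /dev/da3 | failing_attrs. For either of the tables posted above, it prints only 184 End-to-End_Error.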

 

droeders (Contributor; joined Mar 21, 2016; 179 messages)
Your drives are not in good shape.

Besides the End-to-End_Error counts that @Ericloewe mentioned, you have more than 3,000 reallocated sectors between the two drives. On top of that, no SMART self-tests have ever been run on them.

I would replace both drives immediately.
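The reallocated-sector count is the raw value of attribute 5 in each table above: 1144 on da3 plus 2016 on da1, or 3160 in total. A small sh/awk sketch that adds it up across any number of drives' attribute tables; total_reallocated is a hypothetical name, and it assumes the raw value is a plain number, as it is for these drives:

```shell
# Hypothetical helper: sum the raw Reallocated_Sector_Ct (attribute 5,
# last field of its row) over all smartctl attribute tables on stdin.
total_reallocated() {
    awk '$1 == "5" && $2 == "Reallocated_Sector_Ct" { total += $NF }
         END { print total + 0 }'
}
```

For example, { smartctl -A /dev/da1; smartctl -A /dev/da3; } | total_reallocated would print 3160 for the drives in this thread.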
 

danb35 (Hall of Famer; joined Aug 16, 2011; 15,504 messages)
droeders said: "I would replace both drives immediately."
Concur. And set up regular SMART tests on all your drives (and you'll need to update that test schedule when you replace a drive).
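For a one-off check, for example on a replacement drive before it is added to a volume, smartctl can start the drive's built-in self-tests and print their results; on FreeNAS the recurring schedule itself is normally set up in the web UI rather than by hand. A minimal POSIX sh sketch; start_selftests, show_selftest_logs and the SMARTCTL override are hypothetical names, while -t short, -t long and -l selftest are standard smartctl options:

```shell
# Hypothetical helper: start a self-test of the given kind ("short" or
# "long") on each named device. The drive runs the test internally and
# smartctl returns immediately; per the output in this thread, a long test
# on these 2 TB drives takes roughly 264 to 270 minutes, a short one ~1.
start_selftests() {
    kind=$1; shift
    case $kind in
        short|long) ;;
        *) echo "usage: start_selftests short|long dev..." >&2; return 2 ;;
    esac
    for dev in "$@"; do
        ${SMARTCTL:-smartctl} -t "$kind" "/dev/$dev"
    done
}

# Hypothetical helper: print each named device's self-test log.
show_selftest_logs() {
    for dev in "$@"; do
        ${SMARTCTL:-smartctl} -l selftest "/dev/$dev"
    done
}
```

For example, start_selftests long da1 da3, then show_selftest_logs da1 da3 once the recommended polling time has passed.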
 