SOLVED Help understanding critical alerts and SMART test output

VictorR

Contributor
Joined
Dec 9, 2015
Messages
143
FreeNAS-11.1-U4

45 Drives Q30
SuperMicro X10DRL
2x E52620 v3 CPU
2x 120GB SSD Boot Drive
256GB RAM
2x LSI 9201 HBA
30x WD Re 4TB drives
3x X540T2BLK Intel X540 DA2

Our 3-year-old, 120TB, 30-drive (RAID-Z2) NAS recently started showing critical alerts for multiple drives. Over the years, five individual drives have each developed 8-10 bad sectors, and we replaced them under warranty. These drives are getting long in the tooth, so this is no surprise. I'm just wondering whether any of these errors are worthy of concern.

CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da11 [SAT], 2 Currently unreadable (pending) sectors
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da11 [SAT], 1 Offline uncorrectable sectors
CRITICAL: April 19, 2018, 4:55 p.m. - Device: /dev/da11 [SAT], Self-Test Log error count increased from 0 to 1
CRITICAL: April 20, 2018, 3:54 p.m. - Device: /dev/da11 [SAT], Self-Test Log error count increased from 1 to 2
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da7 [SAT], 1 Currently unreadable (pending) sectors
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da7 [SAT], 1 Offline uncorrectable sectors
CRITICAL: April 20, 2018, 6:56 p.m. - Device: /dev/da7 [SAT], Self-Test Log error count increased from 0 to 1
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da27 [SAT], 1 Currently unreadable (pending) sectors
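
To see at a glance which drives the alerts above point at, they can be grouped by device. This is only a minimal Python sketch: it assumes the alert text has been pasted into a string exactly as shown, and the names `ALERT_RE` and `group_alerts` are made up for illustration, not part of FreeNAS.

```python
import re
from collections import defaultdict

# Alert lines copied verbatim from the list above.
ALERTS = """\
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da11 [SAT], 2 Currently unreadable (pending) sectors
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da11 [SAT], 1 Offline uncorrectable sectors
CRITICAL: April 19, 2018, 4:55 p.m. - Device: /dev/da11 [SAT], Self-Test Log error count increased from 0 to 1
CRITICAL: April 20, 2018, 3:54 p.m. - Device: /dev/da11 [SAT], Self-Test Log error count increased from 1 to 2
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da7 [SAT], 1 Currently unreadable (pending) sectors
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da7 [SAT], 1 Offline uncorrectable sectors
CRITICAL: April 20, 2018, 6:56 p.m. - Device: /dev/da7 [SAT], Self-Test Log error count increased from 0 to 1
CRITICAL: April 18, 2018, 4:45 p.m. - Device: /dev/da27 [SAT], 1 Currently unreadable (pending) sectors
"""

# Hypothetical pattern: captures the device path and the message after "[SAT], ".
ALERT_RE = re.compile(r"Device: (/dev/\w+) \[SAT\], (.+)$")


def group_alerts(text):
    """Map each device path to the list of alert messages reported for it."""
    by_device = defaultdict(list)
    for line in text.splitlines():
        match = ALERT_RE.search(line)
        if match:
            by_device[match.group(1)].append(match.group(2))
    return dict(by_device)


for device, messages in group_alerts(ALERTS).items():
    print(device, len(messages), "alert(s)")
```

With the lines above, da11 collects four alerts, da7 three, and da27 one.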

I have short and long SMART tests scheduled once a week and bi-weekly, respectively. The most recent "freenas.local daily run output" returned:
Checking status of gmirror(8) devices:
Name          Status    Components
mirror/swap0  COMPLETE  da29p1 (ACTIVE)
                        da28p1 (ACTIVE)
mirror/swap1  DEGRADED  da26p1 (ACTIVE)
mirror/swap2  COMPLETE  da25p1 (ACTIVE)
                        da24p1 (ACTIVE)
mirror/swap3  COMPLETE  da23p1 (ACTIVE)
                        da22p1 (ACTIVE)
mirror/swap4  COMPLETE  da21p1 (ACTIVE)
                        da20p1 (ACTIVE)

-- End of daily output --
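
The gmirror status above can be read mechanically to see which swap mirror is short a member. The following is a rough Python sketch, assuming the daily output is available as plain text; `parse_gmirror_status` and `degraded` are hypothetical helper names, and the parser only handles the layout shown here (a `mirror/...` row followed by optional continuation rows for extra components).

```python
# Part of the status table from the daily run output above.
STATUS = """\
Name Status Components
mirror/swap0 COMPLETE da29p1 (ACTIVE)
da28p1 (ACTIVE)
mirror/swap1 DEGRADED da26p1 (ACTIVE)
mirror/swap2 COMPLETE da25p1 (ACTIVE)
da24p1 (ACTIVE)
"""


def parse_gmirror_status(text):
    """Return {mirror name: {"status": str, "components": [str, ...]}}."""
    mirrors = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the column header
        if fields[0].startswith("mirror/") and len(fields) >= 3:
            current = fields[0]
            mirrors[current] = {"status": fields[1], "components": [fields[2]]}
        elif current is not None:
            # Continuation row: another component of the current mirror.
            mirrors[current]["components"].append(fields[0])
    return mirrors


def degraded(mirrors):
    """List the mirrors whose status is anything other than COMPLETE."""
    return [name for name, info in mirrors.items() if info["status"] != "COMPLETE"]


mirrors = parse_gmirror_status(STATUS)
for name in degraded(mirrors):
    print(name, "has components", mirrors[name]["components"])
```

Note that mirror/swap1 lists only one component, da26p1, and that component is reported ACTIVE. One reading is that the DEGRADED status reflects a missing second member of the swap mirror rather than a fault on da26 itself.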

Scrub runs with no errors or repairs.

My concern is the "DEGRADED da26p1" drive, even though its SMART tests find no errors while the other drives' tests do.

Code:
[root@freenas ~]# smartctl -a /dev/da26

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)															 
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org														 
																																   
=== START OF INFORMATION SECTION ===																								
Model Family:	 Western Digital Re																								
Device Model:	 WDC WD4000FYYZ-01UL1B2																							
Serial Number:	WD-WCC134xxxxx																								   
LU WWN Device Id: 5 0014ee 2b75a8f52																								
Firmware Version: 01.01K03																										 
User Capacity:	4,000,787,030,016 bytes [4.00 TB]																				 
Sector Size:	  512 bytes logical/physical																						
Rotation Rate:	7200 rpm																										 
Device is:		In smartctl database [for details use: -P show]																   
ATA Version is:   ATA8-ACS (minor revision not indicated)																		   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)																			
Local Time is:	Mon Apr 23 21:09:55 2018 PDT																					 
SMART support is: Available - device has SMART capability.																		 
SMART support is: Enabled																										   
																																   
=== START OF READ SMART DATA SECTION ===																							
SMART overall-health self-assessment test result: PASSED																			
																																   
General SMART Values:																											   
Offline data collection status:  (0x82) Offline data collection activity															
									   was completed without error.																
									   Auto Offline Data Collection: Enabled.													 
Self-test execution status:	  (   0) The previous self-test routine completed													
									   without error or no self-test has ever													 
									   been run.																				   
Total time to complete Offline																									 
data collection:				(48240) seconds.																					
Offline data collection																											 
capabilities:					(0x7b) SMART execute Offline immediate.															
									   Auto Offline data collection on/off support.												
									   Suspend Offline collection upon new														 
									   command.																					
									   Offline surface scan supported.															 
									   Self-test supported.																		
									   Conveyance Self-test supported.															 
									   Selective Self-test supported.															 
SMART capabilities:			(0x0003) Saves SMART data before entering															
									   power-saving mode.																		 
									   Supports SMART auto save timer.															 
Error logging capability:		(0x01) Error logging supported.																	
									   General Purpose Logging supported.														 
Short self-test routine																											 
recommended polling time:		(   2) minutes.																					
Extended self-test routine																										 
recommended polling time:		( 521) minutes.
Conveyance self-test routine																										
recommended polling time:		(   5) minutes.																					
SCT capabilities:			  (0x70bd) SCT Status supported.																	   
									   SCT Error Recovery Control supported.													   
									   SCT Feature Control supported.															 
									   SCT Data Table supported.																   
																																   
SMART Attributes Data Structure revision number: 16																				 
Vendor Specific SMART Attributes with Thresholds:																				   
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE									
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0											
  3 Spin_Up_Time			0x0027   211   158   021	Pre-fail  Always	   -	   8425										 
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   107										 
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0											
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0											
  9 Power_On_Hours		  0x0032   074   074   000	Old_age   Always	   -	   19030										
 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0											
 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0											
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   107										 
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0											
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   105										 
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1											
194 Temperature_Celsius	 0x0022   117   107   000	Old_age   Always	   -	   35										   
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0											
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0											
198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   0											
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0											
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0											
																																   
SMART Error Log Version: 1																										 
No Errors Logged																													
																																   
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	 18960		 -													 
# 2  Short offline	   Completed without error	   00%	 18959		 -													 
# 3  Short offline	   Completed without error	   00%	 18958		 -													 
# 4  Short offline	   Completed without error	   00%	 18957		 -													 
# 5  Short offline	   Completed without error	   00%	 18956		 -													 
# 6  Short offline	   Completed without error	   00%	 18955		 -													 
# 7  Short offline	   Completed without error	   00%	 18954		 -													 
# 8  Short offline	   Completed without error	   00%	 18953		 -													 
# 9  Short offline	   Completed without error	   00%	 18952		 -													 
#10  Short offline	   Completed without error	   00%	 18951		 -													 
#11  Short offline	   Completed without error	   00%	 18950		 -													 
#12  Extended offline	Completed without error	   00%	 18937		 -													 
#13  Short offline	   Completed without error	   00%	 18928		 -													 
#14  Extended offline	Completed without error	   00%	 18918		 -													 
#15  Short offline	   Completed without error	   00%	 18909		 -	 
#16  Short offline	   Completed without error	   00%	 14426		 -													 
#17  Short offline	   Completed without error	   00%	 14425		 -													 
#18  Short offline	   Completed without error	   00%	 14424		 -													 
#19  Short offline	   Completed without error	   00%	 14423		 -													 
#20  Short offline	   Completed without error	   00%	 14422		 -													 
#21  Short offline	   Completed without error	   00%	 14421		 -													 
																																   
SMART Selective self-test log data structure revision number 1																	 
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
   1		0		0  Not_testing																								
   2		0		0  Not_testing																								
   3		0		0  Not_testing																								
   4		0		0  Not_testing																								
   5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay. 


Code:
 
[root@freenas ~]# smartctl -a /dev/da11

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)															 
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org														 
																																   
=== START OF INFORMATION SECTION ===																								
Model Family:	 Western Digital Re																								
Device Model:	 WDC WD4000FYYZ-01UL1B2																							
Serial Number:	WD-WCC130ADRTTJ																								   
LU WWN Device Id: 5 0014ee 20cb0a76e																								
Firmware Version: 01.01K03																										 
User Capacity:	4,000,787,030,016 bytes [4.00 TB]																				 
Sector Size:	  512 bytes logical/physical																						
Rotation Rate:	7200 rpm																										 
Device is:		In smartctl database [for details use: -P show]																   
ATA Version is:   ATA8-ACS (minor revision not indicated)																		   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)																			
Local Time is:	Mon Apr 23 20:54:51 2018 PDT																					 
SMART support is: Available - device has SMART capability.																		 
SMART support is: Enabled																										   
																																   
=== START OF READ SMART DATA SECTION ===																							
SMART overall-health self-assessment test result: PASSED																			
																																   
General SMART Values:																											   
Offline data collection status:  (0x82) Offline data collection activity															
									   was completed without error.																
									   Auto Offline Data Collection: Enabled.													 
Self-test execution status:	  (   0) The previous self-test routine completed													
									   without error or no self-test has ever													 
									   been run.																				   
Total time to complete Offline																									 
data collection:				(47100) seconds.																					
Offline data collection																											 
capabilities:					(0x7b) SMART execute Offline immediate.															
									   Auto Offline data collection on/off support.												
									   Suspend Offline collection upon new														 
									   command.																					
									   Offline surface scan supported.															 
									   Self-test supported.																		
									   Conveyance Self-test supported.															 
									   Selective Self-test supported.															 
SMART capabilities:			(0x0003) Saves SMART data before entering															
									   power-saving mode.																		 
									   Supports SMART auto save timer.															 
Error logging capability:		(0x01) Error logging supported.																	
									   General Purpose Logging supported.														 
Short self-test routine																											 
recommended polling time:		(   2) minutes.																					
Extended self-test routine																										 
recommended polling time:		( 509) minutes.
Conveyance self-test routine																										
recommended polling time:		(   5) minutes.																					
SCT capabilities:			  (0x70bd) SCT Status supported.																	   
									   SCT Error Recovery Control supported.													   
									   SCT Feature Control supported.															 
									   SCT Data Table supported.																   
																																   
SMART Attributes Data Structure revision number: 16																				 
Vendor Specific SMART Attributes with Thresholds:																				   
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE									
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0											
  3 Spin_Up_Time			0x0027   188   159   021	Pre-fail  Always	   -	   9566										 
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   101										 
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0											
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0											
  9 Power_On_Hours		  0x0032   074   074   000	Old_age   Always	   -	   19028										
 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0											
 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0											
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   101										 
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0											
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   99										   
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1											
194 Temperature_Celsius	 0x0022   122   111   000	Old_age   Always	   -	   30										   
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0											
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   2											
198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   1											
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0											
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   8											
																																   
SMART Error Log Version: 1																										 
No Errors Logged																													
																																   
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	 18959		 -													 
# 2  Short offline	   Completed without error	   00%	 18958		 -													 
# 3  Short offline	   Completed without error	   00%	 18957		 -													 
# 4  Short offline	   Completed without error	   00%	 18956		 -													 
# 5  Short offline	   Completed without error	   00%	 18955		 -													 
# 6  Short offline	   Completed without error	   00%	 18954		 -													 
# 7  Short offline	   Completed without error	   00%	 18953		 -													 
# 8  Short offline	   Completed without error	   00%	 18952		 -													 
# 9  Short offline	   Completed: read failure	   90%	 18951		 28449989											   
#10  Short offline	   Completed without error	   00%	 18950		 -													 
#11  Short offline	   Completed without error	   00%	 18949		 -													 
#12  Extended offline	Completed: read failure	   90%	 18928		 28357609											   
#13  Short offline	   Completed without error	   00%	 18927		 -													 
#14  Extended offline	Completed without error	   00%	 18917		 -													 
#15  Short offline	   Completed without error	   00%	 18908		 -	 
#16  Short offline	   Completed without error	   00%	 17393		 -													 
#17  Short offline	   Completed without error	   00%	 17392		 -													 
#18  Short offline	   Completed without error	   00%	 17391		 -													 
#19  Short offline	   Completed without error	   00%	 17353		 -													 
#20  Short offline	   Completed without error	   00%	 17352		 -													 
#21  Short offline	   Completed without error	   00%	 17351		 -													 
																																   
SMART Selective self-test log data structure revision number 1																	 
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
   1		0		0  Not_testing																								
   2		0		0  Not_testing																								
   3		0		0  Not_testing																								
   4		0		0  Not_testing																								
   5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.		 
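
The failed entries in a self-test log like the one above can be pulled out with a short script. This is a sketch in Python under the assumption that the log rows are pasted as text; the regular expression `LOG_RE` and the function `failed_tests` are illustrative names, and the pattern is only tuned to the row layout shown in this output.

```python
import re

# A few rows copied from the da11 self-test log above.
LOG = """\
# 8  Short offline       Completed without error       00%     18952         -
# 9  Short offline       Completed: read failure       90%     18951         28449989
#10  Short offline       Completed without error       00%     18950         -
#12  Extended offline    Completed: read failure       90%     18928         28357609
"""

# Columns: number, test type, status text, remaining %, lifetime hours, first-error LBA.
LOG_RE = re.compile(
    r"^#\s*(\d+)\s+(\w+) offline\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)$"
)


def failed_tests(text):
    """Return (num, type, status, lifetime_hours, lba) for every non-passing row."""
    failures = []
    for line in text.splitlines():
        match = LOG_RE.match(line.strip())
        if not match:
            continue
        num, kind, status, _remaining, hours, lba = match.groups()
        if status != "Completed without error":
            first_lba = int(lba) if lba.isdigit() else None
            failures.append((int(num), kind, status, int(hours), first_lba))
    return failures


for entry in failed_tests(LOG):
    print(entry)
```

On da11 this picks out the two read failures, at LBA 28449989 and LBA 28357609.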
 

VictorR

Code:
[root@freenas ~]# smartctl -a /dev/da7

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)															 
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org														 
																																   
=== START OF INFORMATION SECTION ===																								
Model Family:	 Western Digital Re																								
Device Model:	 WDC WD4000FYYZ-01UL1B2																							
Serial Number:	WD-WCC130ADRJP6																								   
LU WWN Device Id: 5 0014ee 2b75b755f																								
Firmware Version: 01.01K03																										 
User Capacity:	4,000,787,030,016 bytes [4.00 TB]																				 
Sector Size:	  512 bytes logical/physical																						
Rotation Rate:	7200 rpm																										 
Device is:		In smartctl database [for details use: -P show]																   
ATA Version is:   ATA8-ACS (minor revision not indicated)																		   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)																			
Local Time is:	Mon Apr 23 21:01:05 2018 PDT																					 
SMART support is: Available - device has SMART capability.																		 
SMART support is: Enabled																										   
																																   
=== START OF READ SMART DATA SECTION ===																							
SMART overall-health self-assessment test result: PASSED																			
																																   
General SMART Values:																											   
Offline data collection status:  (0x82) Offline data collection activity															
									   was completed without error.																
									   Auto Offline Data Collection: Enabled.													 
Self-test execution status:	  (   0) The previous self-test routine completed													
									   without error or no self-test has ever													 
									   been run.																				   
Total time to complete Offline																									 
data collection:				(47580) seconds.																					
Offline data collection																											 
capabilities:					(0x7b) SMART execute Offline immediate.															
									   Auto Offline data collection on/off support.												
									   Suspend Offline collection upon new														 
									   command.																					
									   Offline surface scan supported.															 
									   Self-test supported.																		
									   Conveyance Self-test supported.															 
									   Selective Self-test supported.															 
SMART capabilities:			(0x0003) Saves SMART data before entering															
									   power-saving mode.																		 
									   Supports SMART auto save timer.															 
Error logging capability:		(0x01) Error logging supported.																	
									   General Purpose Logging supported.														 
Short self-test routine																											 
recommended polling time:		(   2) minutes.																					
Extended self-test routine																										 
recommended polling time:		( 514) minutes.				 
Conveyance self-test routine																										
recommended polling time:		(   5) minutes.																					
SCT capabilities:			  (0x70bd) SCT Status supported.																	   
									   SCT Error Recovery Control supported.													   
									   SCT Feature Control supported.															 
									   SCT Data Table supported.																   
																																   
SMART Attributes Data Structure revision number: 16																				 
Vendor Specific SMART Attributes with Thresholds:																				   
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE									
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0											
  3 Spin_Up_Time			0x0027   185   158   021	Pre-fail  Always	   -	   9716										 
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   101										 
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0											
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0											
  9 Power_On_Hours		  0x0032   074   074   000	Old_age   Always	   -	   19030										
 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0											
 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0											
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   101										 
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0											
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   99										   
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1											
194 Temperature_Celsius	 0x0022   122   112   000	Old_age   Always	   -	   30										   
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0											
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   1											
198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   1											
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0											
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   19										   
																																   
SMART Error Log Version: 1																										 
No Errors Logged																													
																																   
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	 18960		 -													 
# 2  Short offline	   Completed without error	   00%	 18959		 -													 
# 3  Short offline	   Completed without error	   00%	 18958		 -													 
# 4  Short offline	   Completed without error	   00%	 18957		 -													 
# 5  Short offline	   Completed without error	   00%	 18956		 -													 
# 6  Short offline	   Completed: read failure	   90%	 18955		 28056577											   
# 7  Short offline	   Completed without error	   00%	 18954		 -													 
# 8  Short offline	   Completed without error	   00%	 18953		 -													 
# 9  Short offline	   Completed without error	   00%	 18952		 -													 
#10  Short offline	   Completed without error	   00%	 18951		 -													 
#11  Short offline	   Completed without error	   00%	 18951		 -													 
#12  Extended offline	Completed without error	   00%	 18937		 -													 
#13  Short offline	   Completed without error	   00%	 18928		 -													 
#14  Extended offline	Completed without error	   00%	 18918		 -													 
#15  Short offline	   Completed without error	   00%	 18909		 -	
#16  Short offline	   Completed without error	   00%	 17395		 -													 
#17  Short offline	   Completed without error	   00%	 17394		 -													 
#18  Short offline	   Completed without error	   00%	 17393		 -													 
#19  Short offline	   Completed without error	   00%	 17354		 -													 
#20  Short offline	   Completed without error	   00%	 17353		 -													 
#21  Short offline	   Completed without error	   00%	 17352		 -													 
																																   
SMART Selective self-test log data structure revision number 1																	 
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
   1		0		0  Not_testing																								
   2		0		0  Not_testing																								
   3		0		0  Not_testing																								
   4		0		0  Not_testing																								
   5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.	 


Code:
[root@freenas ~]# smartctl -a /dev/da27

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)															 
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org														 
																																   
=== START OF INFORMATION SECTION ===																								
Model Family:	 Western Digital Re																								
Device Model:	 WDC WD4000FYYZ-01UL1B2																							
Serial Number:	WD-WCC130HXDKTE																								   
LU WWN Device Id: 5 0014ee 20ce756f5																								
Firmware Version: 01.01K03																										 
User Capacity:	4,000,787,030,016 bytes [4.00 TB]																				 
Sector Size:	  512 bytes logical/physical																						
Rotation Rate:	7200 rpm																										 
Device is:		In smartctl database [for details use: -P show]																   
ATA Version is:   ATA8-ACS (minor revision not indicated)																		   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)																			
Local Time is:	Mon Apr 23 21:04:24 2018 PDT																					 
SMART support is: Available - device has SMART capability.																		 
SMART support is: Enabled																										   
																																   
=== START OF READ SMART DATA SECTION ===																							
SMART overall-health self-assessment test result: PASSED																			
																																   
General SMART Values:																											   
Offline data collection status:  (0x82) Offline data collection activity															
									   was completed without error.																
									   Auto Offline Data Collection: Enabled.													 
Self-test execution status:	  (   0) The previous self-test routine completed													
									   without error or no self-test has ever													 
									   been run.																				   
Total time to complete Offline																									 
data collection:				(48240) seconds.																					
Offline data collection																											 
capabilities:					(0x7b) SMART execute Offline immediate.															
									   Auto Offline data collection on/off support.												
									   Suspend Offline collection upon new														 
									   command.																					
									   Offline surface scan supported.															 
									   Self-test supported.																		
									   Conveyance Self-test supported.															 
									   Selective Self-test supported.															 
SMART capabilities:			(0x0003) Saves SMART data before entering															
									   power-saving mode.																		 
									   Supports SMART auto save timer.															 
Error logging capability:		(0x01) Error logging supported.																	
									   General Purpose Logging supported.														 
Short self-test routine																											 
recommended polling time:		(   2) minutes.																					
Extended self-test routine																										 
recommended polling time:		( 521) minutes.		 
Conveyance self-test routine																										
recommended polling time:		(   5) minutes.																					
SCT capabilities:			  (0x70bd) SCT Status supported.																	   
									   SCT Error Recovery Control supported.													   
									   SCT Feature Control supported.															 
									   SCT Data Table supported.																   
																																   
SMART Attributes Data Structure revision number: 16																				 
Vendor Specific SMART Attributes with Thresholds:																				   
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE									
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0											
  3 Spin_Up_Time			0x0027   181   158   021	Pre-fail  Always	   -	   9933										 
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   49										   
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0											
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0											
  9 Power_On_Hours		  0x0032   092   092   000	Old_age   Always	   -	   6429										 
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0											
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0											
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   49										   
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0											
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   47										   
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1											
194 Temperature_Celsius	 0x0022   118   111   000	Old_age   Always	   -	   34										   
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0											
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   1											
198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   0											
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0											
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0											
																																   
SMART Error Log Version: 1																										 
No Errors Logged																													
																																   
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	  6359		 -													 
# 2  Short offline	   Completed without error	   00%	  6358		 -													 
# 3  Short offline	   Completed without error	   00%	  6357		 -													 
# 4  Short offline	   Completed without error	   00%	  6356		 -													 
# 5  Short offline	   Completed without error	   00%	  6355		 -													 
# 6  Short offline	   Completed without error	   00%	  6354		 -													 
# 7  Short offline	   Completed without error	   00%	  6353		 -													 
# 8  Short offline	   Completed without error	   00%	  6352		 -													 
# 9  Short offline	   Completed without error	   00%	  6351		 -													 
#10  Short offline	   Completed without error	   00%	  6350		 -													 
#11  Short offline	   Completed without error	   00%	  6350		 -													 
#12  Extended offline	Completed without error	   00%	  6337		 -													 
#13  Short offline	   Completed without error	   00%	  6327		 -													 
#14  Extended offline	Completed without error	   00%	  6317		 -													 
#15  Short offline	   Completed without error	   00%	  6308		 -	
SMART Selective self-test log data structure revision number 1																	 
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
   1		0		0  Not_testing																								
   2		0		0  Not_testing																								
   3		0		0  Not_testing																								
   4		0		0  Not_testing																								
   5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.	   
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The drives with errors need to be replaced.
Three years is not "long in the tooth" for those drives. They should last 5 years easily. I have a server at work with 60 of them. They do have a higher failure rate than the 6TB Red Pro drives in one of the other servers. Out of the 60 x 4TB drives, I have had to replace 9, whereas the server with 60 x 6TB drives has only lost 4.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Ideally, you want to replace a faulty drive before you get data errors that the scrub would need to try to fix. Your goal is to never have a scrub show an error.
You run SMART testing to find the bad drive that might cause an error, so you can replace it before the damage is done.
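
As a rough illustration of that idea, here is a minimal Python sketch that scans the attribute table from `smartctl -A` style output and reports the sector-related counters with a nonzero raw value. The watch list (attributes 5, 197 and 198) and the function name `flag_attributes` are assumptions made for this example, not part of FreeNAS or smartmontools. The sample rows are copied from the output posted above.

```python
import re

# Assumed watch list: attributes whose raw value counts bad or suspect sectors.
WATCHED = {5, 197, 198}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def flag_attributes(smart_output):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines, blank lines, log sections
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(3))
        if attr_id in WATCHED and raw > 0:
            flagged[name] = raw
    return flagged

sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

print(flag_attributes(sample))  # {'Current_Pending_Sector': 1}
```

A single pending sector, as in this sample, is something to watch rather than an automatic replacement. Deciding when to pull a drive is still a judgment call.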

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The gmirror error will go away with a reboot. That's the swap space, which is destroyed at shutdown and recreated at boot.

 
Johnnie Black

Joined
May 10, 2017
Messages
838
da26 looks fine.
da11, da7 should be replaced as soon as possible.
da27 should be fine for now, monitor.

Any reason for hourly short SMART tests? That's overkill. Make sure extended tests are regularly scheduled as well.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
VictorR said:
I have short and long SMART tests set to once a week, and bi-weekly, respectively.

I think your schedule is not set up correctly because, as @Johnnie Black pointed out, your short tests are running every hour, based on the log you showed. I usually run my short test daily after business hours and schedule the long test for the weekend.
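
To show how the hourly pattern can be read off the self-test log, here is a minimal Python sketch that pulls the LifeTime(hours) column for the short tests and prints the gap between consecutive runs. The function name `short_test_gaps` and the regular expression are assumptions for this example. The sample rows are the first four entries of the log posted above.

```python
import re

# One self-test log row: "# N", test type, status text, remaining %, lifetime hours.
LOG_ROW = re.compile(
    r"^#\s*\d+\s+(Short|Extended) offline\s+.*?\s+\d+%\s+(\d+)\b"
)

def short_test_gaps(log_text):
    """Return the hours between consecutive short tests, newest first."""
    hours = []
    for line in log_text.splitlines():
        m = LOG_ROW.match(line)
        if m and m.group(1) == "Short":
            hours.append(int(m.group(2)))
    # The log lists the newest test first, so each gap is newer minus older.
    return [newer - older for newer, older in zip(hours, hours[1:])]

sample_log = """\
# 1  Short offline       Completed without error       00%      6359         -
# 2  Short offline       Completed without error       00%      6358         -
# 3  Short offline       Completed without error       00%      6357         -
# 4  Short offline       Completed without error       00%      6356         -
"""

print(short_test_gaps(sample_log))  # [1, 1, 1]
```

Gaps of 1 hour mean the short test is firing every hour. A weekly schedule would show gaps of roughly 168 hours.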

Since you mentioned the drives were getting old, are you looking to replace them with something larger / newer? The underlying server should still be fine for another 5 years easily as the important parts of the technology have not changed significantly from what you already have.
 

VictorR

Contributor
Joined
Dec 9, 2015
Messages
143
Thanks for the replies, I really appreciate it.

Chris, it’s (partially) reassuring to know we aren’t the only ones seeing an unusually high number of these drives turn up errors. As I said, we’ve had ~5 replaced already, and another two need to go now. That’s a >20% error rate across our 30 drives. Luckily, I’ve got a couple of spares around, and all of ours are covered by a 5-year replacement warranty. [RMA for da11 today]

The hourly SMART test setting was a brain fart on my part. I’ll set the short test to run nightly at 4am.
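
For reference, the failure-rate figures quoted in this thread work out as follows. This is only a quick arithmetic sketch. The helper name `failure_rate` is made up for the example, and the first figure treats "~5 replaced" as exactly 5.

```python
def failure_rate(failed, total):
    """Percentage of drives that failed or were flagged for replacement."""
    return 100.0 * failed / total

# This NAS: ~5 drives already replaced plus 2 more flagged, out of 30.
print(round(failure_rate(5 + 2, 30), 1))  # 23.3

# Chris Moore's servers: 9 of 60 x 4TB drives, 4 of 60 x 6TB drives.
print(round(failure_rate(9, 60), 1))      # 15.0
print(round(failure_rate(4, 60), 1))      # 6.7
```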
 