Device: /dev/ada1, Self-Test Log error count increased from 0 to 1

Status
Not open for further replies.

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
I awoke this morning to this critical error. Can anyone advise what I need to do? I have six 2TB WD enterprise drives in a RAIDZ2 configuration.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yes, follow this guide first to check for errors. Post the output of "smartctl -a /dev/ada1" in code brackets.
 

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)		 
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org	 
																				
=== START OF INFORMATION SECTION ===											
Model Family:	 Western Digital Re											
Device Model:	 WDC WD2000FYYZ-01UL1B2										
Serial Number:	WD-WMC1P0E2HE15											   
LU WWN Device Id: 5 0014ee 003f22207											
Firmware Version: 01.01K03													 
User Capacity:	2,000,398,934,016 bytes [2.00 TB]							 
Sector Size:	  512 bytes logical/physical									
Rotation Rate:	7200 rpm													 
Device is:		In smartctl database [for details use: -P show]			   
ATA Version is:   ATA8-ACS (minor revision not indicated)					   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)						
Local Time is:	Sat Oct 29 09:06:39 2016 EDT								 
SMART support is: Available - device has SMART capability.					 
SMART support is: Enabled													   
																				
=== START OF READ SMART DATA SECTION ===										
SMART overall-health self-assessment test result: PASSED
General SMART Values:														   
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:	  ( 115) The previous self-test completed having
										the read element of the test failed.	
Total time to complete Offline												 
data collection:				(24840) seconds.								
Offline data collection														 
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering		
										power-saving mode.					 
										Supports SMART auto save timer.		 
Error logging capability:		(0x01) Error logging supported.				
										General Purpose Logging supported.	 
Short self-test routine														 
recommended polling time:		(   2) minutes.								
Extended self-test routine													 
recommended polling time:		( 271) minutes.								
Conveyance self-test routine													
recommended polling time:		(   5) minutes.								
SCT capabilities:			  (0x70bd) SCT Status supported.				   
										SCT Error Recovery Control supported.   
										SCT Feature Control supported.		 
										SCT Data Table supported.			   
																				
SMART Attributes Data Structure revision number: 16							 
Vendor Specific SMART Attributes with Thresholds:							   
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   166   021    Pre-fail  Always       -       6116
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12894
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       28
194 Temperature_Celsius     0x0022   119   093   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
																				
SMART Error Log Version: 1													 
No Errors Logged															   
																				
SMART Self-test log structure revision number 1								 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       30%     12886         9249797
# 2  Short offline	   Completed without error	   00%	 12718		 - 
# 3  Short offline	   Completed without error	   00%	 12551		 - 
# 4  Short offline	   Completed without error	   00%	 12383		 - 
# 5  Extended offline	Completed without error	   00%	 12222		 - 
# 6  Short offline	   Completed without error	   00%	 12180		 - 
# 7  Short offline	   Completed without error	   00%	 11999		 - 
# 8  Short offline	   Completed without error	   00%	 11831		 - 
# 9  Short offline	   Completed without error	   00%	 11664		 - 
#10  Extended offline	Completed without error	   00%	 11503		 - 
#11  Short offline	   Completed without error	   00%	 11425		 - 
#12  Short offline	   Completed without error	   00%	 11424		 - 
#13  Short offline	   Completed without error	   00%	 11256		 - 
#14  Short offline	   Completed without error	   00%	 10921		 - 
#15  Extended offline	Completed without error	   00%	 10759		 - 
#16  Short offline	   Completed without error	   00%	 10681		 - 
#17  Short offline	   Completed without error	   00%	 10513		 - 

#18  Short offline	   Completed without error	   00%	 10345		 - 
#19  Short offline	   Completed without error	   00%	 10178		 - 
#20  Extended offline	Completed without error	   00%	 10016		 - 
#21  Short offline	   Completed without error	   00%	  9962		 - 
																				
SMART Selective self-test log data structure revision number 1				 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS									
	1		0		0  Not_testing											
	2		0		0  Not_testing											
	3		0		0  Not_testing											
	4		0		0  Not_testing											
	5		0		0  Not_testing											
Selective self-test flags (0x0):												
  After scanning selected spans, do NOT read-scan remainder of disk.			
If Selective self-test is pending on power-up, resume after 0 minute delay.	 
																			   
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Follow the guide and run a Long test. If the long test passes then I'd say you had something happen to your drive like it lost power (just a guess). I don't see any issues with the drive except that one Short test failure.

EDIT: The reason I say run the Long test instead of a Short test is simply to run the most thorough test and fully exercise your drive. You could run a Short test if you like, since it only takes 2 minutes, but then also run a Long test to cover your bases.
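
For anyone scripting this, here is a minimal Python sketch of the two test commands and a rough finish-time estimate. The durations are the "recommended polling time" values from the smartctl output above (2 and 271 minutes for this drive). The helper names and the start time are made up for illustration, and the estimate assumes nothing interrupts the test.

```python
from datetime import datetime, timedelta

# Recommended polling times in minutes, copied from the smartctl output above.
POLLING_MINUTES = {"short": 2, "long": 271}


def selftest_command(kind, device):
    """Return the smartctl argument list that starts a self-test."""
    if kind not in POLLING_MINUTES:
        raise ValueError(f"unknown test type: {kind}")
    return ["smartctl", "-t", kind, device]


def earliest_finish(kind, started):
    """Estimate when the test could finish if it runs uninterrupted."""
    return started + timedelta(minutes=POLLING_MINUTES[kind])


start = datetime(2016, 10, 29, 9, 30)  # hypothetical start time
print(" ".join(selftest_command("long", "/dev/ada1")))
print(earliest_finish("long", start))
```

Once the estimated time has passed, "smartctl -a /dev/ada1" shows the result in the self-test log.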
 

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
I ran the long test and it appears to have passed but there were some failures that I don't understand. The output came from smartctl -a /dev/ada1.
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)		 
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org	 
																				
=== START OF INFORMATION SECTION ===											
Model Family:	 Western Digital Re											
Device Model:	 WDC WD2000FYYZ-01UL1B2										
Serial Number:	WD-WMC1P0E2HE15											   
LU WWN Device Id: 5 0014ee 003f22207											
Firmware Version: 01.01K03													 
User Capacity:	2,000,398,934,016 bytes [2.00 TB]							 
Sector Size:	  512 bytes logical/physical									
Rotation Rate:	7200 rpm													 
Device is:		In smartctl database [for details use: -P show]			   
ATA Version is:   ATA8-ACS (minor revision not indicated)					   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)						
Local Time is:	Sat Oct 29 14:33:36 2016 EDT								 
SMART support is: Available - device has SMART capability.					 
SMART support is: Enabled													   
																				
=== START OF READ SMART DATA SECTION ===										
SMART overall-health self-assessment test result: PASSED						
																				
General SMART Values:														   
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:	  ( 120) The previous self-test completed having
										the read element of the test failed.	
Total time to complete Offline												 
data collection:				(24840) seconds.								
Offline data collection														 
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
										Suspend Offline collection upon new	 
										command.								
										Offline surface scan supported.		 
										Self-test supported.					
										Conveyance Self-test supported.		 
										Selective Self-test supported.		 
SMART capabilities:			(0x0003) Saves SMART data before entering		
										power-saving mode.					 
										Supports SMART auto save timer.		 
Error logging capability:		(0x01) Error logging supported.				
										General Purpose Logging supported.	 
Short self-test routine														 
recommended polling time:		(   2) minutes.								
Extended self-test routine													 
recommended polling time:		( 271) minutes.								
Conveyance self-test routine													
recommended polling time:		(   5) minutes.								
SCT capabilities:			  (0x70bd) SCT Status supported.				   
										SCT Error Recovery Control supported.   
										SCT Feature Control supported.		 
										SCT Data Table supported.			   
																				
SMART Attributes Data Structure revision number: 16							 
Vendor Specific SMART Attributes with Thresholds:							   
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   166   021    Pre-fail  Always       -       6116
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12900
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       28
194 Temperature_Celsius     0x0022   120   093   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
																				
SMART Error Log Version: 1													 
No Errors Logged																
																				
SMART Self-test log structure revision number 1								 
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%     12896         9249797
# 2  Short offline       Completed: read failure       30%     12886         9249797
# 3  Short offline	   Completed without error	   00%	 12718		 - 
# 4  Short offline	   Completed without error	   00%	 12551		 - 
# 5  Short offline	   Completed without error	   00%	 12383		 - 
# 6  Extended offline	Completed without error	   00%	 12222		 - 
# 7  Short offline	   Completed without error	   00%	 12180		 - 
# 8  Short offline	   Completed without error	   00%	 11999		 - 
# 9  Short offline	   Completed without error	   00%	 11831		 - 
#10  Short offline	   Completed without error	   00%	 11664		 - 
#11  Extended offline	Completed without error	   00%	 11503		 - 
#12  Short offline	   Completed without error	   00%	 11425		 - 
#13  Short offline	   Completed without error	   00%	 11424		 - 
#14  Short offline	   Completed without error	   00%	 11256		 - 
#15  Short offline	   Completed without error	   00%	 10921		 - 
#16  Extended offline	Completed without error	   00%	 10759		 - 
#17  Short offline	   Completed without error	   00%	 10681		 - 
#18  Short offline	   Completed without error	   00%	 10513		 - 
#19  Short offline	   Completed without error	   00%	 10345		 - 
#20  Short offline	   Completed without error	   00%	 10178		 - 
#21  Extended offline	Completed without error	   00%	 10016		 - 
																				
SMART Selective self-test log data structure revision number 1				 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS									
	1		0		0  Not_testing											
	2		0		0  Not_testing											
	3		0		0  Not_testing											
	4		0		0  Not_testing											
	5		0		0  Not_testing											
Selective self-test flags (0x0):												
  After scanning selected spans, do NOT read-scan remainder of disk.			
If Selective self-test is pending on power-up, resume after 0 minute delay.	 
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The test failed; it did not pass.
Code:
Num  Test_Description	Status			 Remaining  LifeTime(hours)  LBA_of_first_error
#1 Extended offline Completed: read failure	 80%		12896		9249797
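
To make that row easier to read, here is a rough Python sketch that pulls the status and LBA out of a self-test log line and converts the LBA to a byte offset. It assumes the wrapped line has been rejoined, and it only recognizes the two test types and two status strings that appear in this thread. The 512-byte sector size comes from the "Sector Size" line of the smartctl output. The function and variable names are mine, not part of smartctl.

```python
import re

# One row of the self-test log, with the wrapped line rejoined.
# Only the test types and statuses seen in this thread are matched.
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>Completed(?: without error|: read failure))\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)"
)

SECTOR_BYTES = 512  # "Sector Size: 512 bytes logical/physical"


def parse_selftest_row(line):
    """Return a dict for one log row, or None if the line does not match."""
    m = ROW.search(line)
    if m is None:
        return None
    row = m.groupdict()
    row["failed"] = row["status"] != "Completed without error"
    row["lba"] = None if row["lba"] == "-" else int(row["lba"])
    return row


line = "# 1  Extended offline  Completed: read failure  80%  12896  9249797"
row = parse_selftest_row(line)
print(row["failed"], row["lba"], row["lba"] * SECTOR_BYTES)
# prints: True 9249797 4735896064
```

Both the Short and the Extended test stopped at the same LBA, which points to one specific unreadable spot rather than a random glitch.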
 

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
Thanks. From the guide I couldn't figure out what this error meant. I thought the drive passed due to the comment in the report, "SMART overall-health self-assessment test result: PASSED".

Do you have any suggestions about what I should do now? Replace the drive? Is it urgent?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I will clarify that in the instructions, thanks for the feedback.

You should run a Scrub on your pool and if there are any data issues, they should be automatically repaired because you have a RAIDZ2 setup. This is more of a "feel good" move right now.

Your drive is not in a critical state right now unless after a scrub it tells you that your pool is degraded. You have time to obtain another drive and replace it per the user manual. If you have any critical/important data, back it up to some other media just for safe keeping until the drive is replaced.

Here is what I'd do from here if I were you based on the single MultiZone Error (ID 200) and the LBA read error...
1) RMA the drive based on ID200 and the read failure (SMART Test Failure).
2) Replace the hard drive as soon as possible if I had a replacement available.
3) If I have to wait on a replacement hard drive, I'd shut down the NAS as a preservation tactic and power it up only if I needed it.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
"SMART overall-health self-assessment test result: PASSED"
This relates to whether any of the SMART attributes have reached their threshold, not to the results of SMART tests. It's a common source of confusion.
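
As a rough illustration of that threshold check, the Python sketch below compares the normalized VALUE column against THRESH for a few attribute rows from this thread. It assumes each row is on a single unwrapped line and that an attribute counts as failed when VALUE is less than or equal to THRESH, which is my understanding of how smartmontools describes it. The drive makes the real overall-health decision itself, so treat this as an approximation. The function name and the last example row are hypothetical.

```python
def attributes_at_threshold(rows):
    """Return names of attributes whose normalized VALUE is <= THRESH."""
    failing = []
    for line in rows:
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, ...
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        if value <= thresh:
            failing.append(name)
    return failing


# Rows taken from the second smartctl output in this thread.
rows = [
    "1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0",
    "5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0",
    "200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1",
]
print(attributes_at_threshold(rows))  # prints: []

# Hypothetical row, not from this drive: VALUE 130 is below THRESH 140.
bad = ["5 Reallocated_Sector_Ct 0x0033 130 130 140 Pre-fail Always - 900"]
print(attributes_at_threshold(bad))  # prints: ['Reallocated_Sector_Ct']
```

Nothing here looks at the self-test log, which is why a drive can report PASSED while its most recent self-tests show read failures.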
 

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
I managed to start the scrub. How do I know when it is done, and how do I know whether it repaired anything or found further issues?

Thanks for all the help. This is super.
 

theaddies

Contributor
Joined
Mar 28, 2015
Messages
105
This is the result of my scrub. Nothing was found.
Code:
zpool status																									 
  pool: BigdaddyZFS																												 
state: ONLINE																													 
  scan: scrub repaired 0 in 15h49m with 0 errors on Sun Oct 30 12:49:24 2016														
config:																															 
																																	
		NAME											STATE	 READ WRITE CKSUM												 
		BigdaddyZFS									 ONLINE	   0	 0	 0												 
		  raidz2-0									  ONLINE	   0	 0	 0												 
			gptid/ff70f3ef-f66c-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
			gptid/ffae78a0-f66c-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
			gptid/ffea6bff-f66c-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
			gptid/00283d8e-f66d-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
			gptid/00635ce8-f66d-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
			gptid/009f941c-f66d-11e4-8308-0cc47a688be6  ONLINE	   0	 0	 0												 
																																	
errors: No known data errors																										
																																	
  pool: freenas-boot																												
state: ONLINE																													 
  scan: scrub repaired 0 in 0h8m with 0 errors on Wed Oct  5 03:53:20 2016														 
config:																															 
																																	
		NAME										  STATE	 READ WRITE CKSUM													
		freenas-boot								  ONLINE	   0	 0	 0													
		  gptid/6ccd19df-e373-11e4-b685-382c4a740a91  ONLINE	   0	 0	 0													
																																	
errors: No known data errors																			 
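
For anyone who wants to check this from a script, here is a minimal Python sketch that reads the "scan:" line of zpool status output. The finished-scrub pattern is modeled on the line shown above. The "scrub in progress" wording is an assumption about what a running scrub prints, and the function only reports the first pool it finds. The names are hypothetical.

```python
import re

# Pattern modeled on the finished-scrub line in the output above.
DONE = re.compile(
    r"scan: scrub repaired (?P<repaired>\S+) in (?P<duration>\S+) "
    r"with (?P<errors>\d+) errors on (?P<finished>.+)"
)


def scrub_summary(status_text):
    """Summarize the first scrub line found in zpool status text."""
    for line in status_text.splitlines():
        line = line.strip()
        # Assumed wording for a scrub that is still running.
        if line.startswith("scan: scrub in progress"):
            return {"state": "running"}
        m = DONE.match(line)
        if m:
            info = m.groupdict()
            info["state"] = "finished"
            info["errors"] = int(info["errors"])
            return info
    return {"state": "unknown"}


sample = """  pool: BigdaddyZFS
 state: ONLINE
  scan: scrub repaired 0 in 15h49m with 0 errors on Sun Oct 30 12:49:24 2016
"""
print(scrub_summary(sample))
```

A result with state "finished", repaired "0", and 0 errors matches what the output above shows: the scrub completed and had nothing to fix.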
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
ZFS is working fine. You have nothing to worry about except you have a failing hard drive that needs replacing.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Exactly.
 