Possible Bad Drive?

Status
Not open for further replies.

HumbleTechie

Cadet
Joined
Apr 6, 2017
Messages
3
I need some assistance reading logs from tests. I noticed this error as a critical alert in FreeNAS:
Device: /dev/ada0, 1 Currently unreadable (pending) sectors

Current setup:
3x2TB WD Red

Code:
zpool status																									   
  pool: Primary_Storage																											 
state: ONLINE																													 
status: Some supported features are not enabled on the pool. The pool can														   
		still be used, but some features are unavailable.																		   
action: Enable all features using 'zpool upgrade'. Once this is done,															   
		the pool may no longer be accessible by software that does not support													 
		the features. See zpool-features(7) for details.																			
  scan: scrub repaired 0 in 3h10m with 0 errors on Sun Apr  2 03:10:47 2017														 
config:																															 
																																	
		NAME											STATE	 READ WRITE CKSUM												 
		Primary_Storage								 ONLINE	   0	 0	 0												 
		  raidz1-0									  ONLINE	   0	 0	 0												 
			gptid/8f5428f2-6e04-11e4-b897-448a5bce7ef7  ONLINE	   0	 0	 0												 
			gptid/90084381-6e04-11e4-b897-448a5bce7ef7  ONLINE	   0	 0	 0												 
			gptid/90ba66f0-6e04-11e4-b897-448a5bce7ef7  ONLINE	   0	 0	 0												 
																																	
errors: No known data errors																										
																																	
  pool: freenas-boot																												
state: ONLINE																													 
  scan: scrub repaired 0 in 0h1m with 0 errors on Sun Apr  2 03:46:01 2017														 
config:																															 
																																	
		NAME		STATE	 READ WRITE CKSUM																					 
		freenas-boot  ONLINE	   0	 0	 0																					
		  da0p2	 ONLINE	   0	 0	 0																					 
																																	
errors: No known data errors


After running a SMART long test on ada0:

Code:
=== START OF READ SMART DATA SECTION ===																							
SMART overall-health self-assessment test result: PASSED																			
																																	
General SMART Values:																											   
Offline data collection status:  (0x00) Offline data collection activity															
										was never started.																		 
										Auto Offline Data Collection: Disabled.													 
Self-test execution status:	  (   0) The previous self-test routine completed													
										without error or no self-test has ever													 
										been run.																				   
Total time to complete Offline																									 
data collection:				(26100) seconds.																					
Offline data collection																											 
capabilities:					(0x7b) SMART execute Offline immediate.															
										Auto Offline data collection on/off support.												
										Suspend Offline collection upon new														 
										command.																					
										Offline surface scan supported.															 
										Self-test supported.																		
										Conveyance Self-test supported.															 
										Selective Self-test supported.															 
SMART capabilities:			(0x0003) Saves SMART data before entering															
										power-saving mode.																		 
										Supports SMART auto save timer.															 
Error logging capability:		(0x01) Error logging supported.																	
										General Purpose Logging supported.														 
Short self-test routine																											 
recommended polling time:		(   2) minutes.																					
Extended self-test routine																										 
recommended polling time:		( 264) minutes.							   
Conveyance self-test routine																										
recommended polling time:		(   5) minutes.																					
SCT capabilities:			  (0x703d) SCT Status supported.																	   
										SCT Error Recovery Control supported.													   
										SCT Feature Control supported.															 
										SCT Data Table supported.																   
																																	
SMART Attributes Data Structure revision number: 16																				 
Vendor Specific SMART Attributes with Thresholds:																				   
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       9
  3 Spin_Up_Time            0x0027   172   172   021    Pre-fail  Always       -       4400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       20642
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       210
194 Temperature_Celsius     0x0022   119   112   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
-- > 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8
																																	
SMART Error Log Version: 1																										 
No Errors Logged																													
																																	
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	 20589		 -													 
# 2  Extended offline	Completed without error	   00%	 20575		 -													 
																																	
SMART Selective self-test log data structure revision number 1																	 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
	1		0		0  Not_testing																								
	2		0		0  Not_testing																								
	3		0		0  Not_testing																								
	4		0		0  Not_testing																								
	5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.


It seems as though everything is passing except the 1 pending sector?

Is this enough to RMA the drive?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Run a SMART long test on the drive and see if the problem is still there.

Even if the problem remains, this is not terrible just yet. You also have ID 200 (Multi_Zone_Error_Rate) errors, which you should keep an eye on.

Technically it's enough to RMA the drive; however, I'd run the long test first. If the problem clears, then you will be fine for now. If your drive is near the end of the 3-year warranty, then I'd RMA it. If you have more than 6-8 months left, then I'd wait unless you start getting more errors.

I take it that you do not run routine SMART tests on your drives. My advice: run a daily SMART short test and a weekly SMART long test on all your drives.
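
For readers curious what a daily-short, weekly-long schedule looks like underneath, here is a minimal sketch that builds a smartd `-s` schedule expression in the `T/MM/DD/d/HH` layout described in the smartd.conf manual. FreeNAS normally generates this from the S.M.A.R.T. test tasks in its web UI, so hand-editing is not assumed here. The function name and the default times (short test at 02:00 daily, long test at 03:00 on Saturday) are illustrative assumptions, not something recommended in this thread.

```python
def smartd_schedule(short_hour=2, long_weekday=6, long_hour=3):
    """Build a smartd '-s' expression: short test daily, long test weekly.

    Fields follow the T/MM/DD/d/HH layout, where T is the test type
    (S = short, L = long), d is the day of the week (1 = Monday ... 7 = Sunday),
    and dots match any value. The default times are hypothetical.
    """
    if not (0 <= short_hour <= 23 and 0 <= long_hour <= 23):
        raise ValueError("hours must be in 0..23")
    if not 1 <= long_weekday <= 7:
        raise ValueError("weekday must be in 1..7")
    short = f"S/../.././{short_hour:02d}"               # every day, given hour
    long_ = f"L/../../{long_weekday}/{long_hour:02d}"   # one weekday, given hour
    return f"({short}|{long_})"


if __name__ == "__main__":
    print(smartd_schedule())  # (S/../.././02|L/../../6/03)
```

Staggering the long test away from the scrub window (the pool above scrubbed on a Sunday at about 03:00) is one possible reason to pick a different day, but that choice is left to the reader.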
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Set up SMART tests to run automatically. Don't worry about the pending sector yet.

 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Now, I'm ultra-hard-core, I pull the panic button much much earlier than most people. But if I have pending sector(s) *and* multi-zone errors (you have 8 of them, apparently), then I consider the drive toast.

Just another view.

Speaking from a statistical standpoint, there is a decent chance your drive will still provide ample service going forward, especially if nothing gets worse after you kick off a SMART long test.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
DrKK said:
"Now, I'm ultra-hard-core, I pull the panic button much much earlier than most people."
I wouldn't say that is true. I think you are more laid back than that, but you have a valid point.
 

HumbleTechie

Cadet
Joined
Apr 6, 2017
Messages
3
Thanks for the information all.

Should I re-run a SMART long test and post the results? The last one I ran was Sunday, and I was still getting the 1 pending sector error after that.

I have also set up automatic tests going forward.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I say run the test now; maybe the error goes away, but don't hold your breath. Feel free to post the data if you like, but if you haven't already done so, check out the link in my signature to see which IDs are important to look at. If you would feel better sending the drive in for RMA, then do it. You can also perform an Advance RMA, where WD ships you a new-to-you drive (you need to secure it with a credit card). When the new drive arrives, you swap it in and use the box it came in for the return; the last time I did it, the return shipping was paid for. Nothing gets charged to your credit card if you return the failed drive within a certain period of time.
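
As a rough illustration of checking the important IDs, the sketch below parses attribute rows in the `smartctl -A` table layout and reports watched attributes whose raw value is nonzero. The watched set (5, 196, 197, 198, 199, plus 200 because it came up in this thread) is an assumption for illustration, not the list from the signature link, and the helper names are hypothetical. The sample rows are copied from the output posted above.

```python
# Attribute IDs to watch. This set is an assumption for illustration only.
WATCHED_IDS = {5, 196, 197, 198, 199, 200}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from smartctl-style table rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
            attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def flag_nonzero(attrs, watched=WATCHED_IDS):
    """Return sorted (id, name, raw) tuples for watched IDs with raw != 0."""
    return sorted(
        (attr_id, name, raw)
        for attr_id, (name, raw) in attrs.items()
        if attr_id in watched and raw != 0
    )


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8
"""

if __name__ == "__main__":
    for attr_id, name, raw in flag_nonzero(parse_attributes(SAMPLE)):
        print(f"ID {attr_id} {name}: raw value {raw}")
```

On the sample rows this flags IDs 197 and 200, the same two attributes discussed in the replies. Some drives report raw values with extra text after the number; this sketch only reads the first raw field.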
 

HumbleTechie

Cadet
Joined
Apr 6, 2017
Messages
3
Thanks for the great suggestion. I will most likely go this route as the warranty expires later this year.

The long test finished. I'm still seeing 1 pending sector error, but the Multi_Zone_Error_Rate dropped down to 3 (not sure if that makes any difference). Everything else is the same.
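
A small sketch of comparing two readings, in case it helps when posting before-and-after results. The raw values are the ones reported in this thread (IDs 5 and 198 stayed at 0, ID 197 stayed at 1, ID 200 went from 8 to 3); the function name is hypothetical.

```python
def diff_raw_values(before, after):
    """Return {id: (old, new)} for attributes whose raw value changed."""
    return {
        attr_id: (before[attr_id], after[attr_id])
        for attr_id in sorted(before.keys() & after.keys())
        if before[attr_id] != after[attr_id]
    }


# Raw values by attribute ID, before and after the second long test.
before = {5: 0, 197: 1, 198: 0, 200: 8}
after = {5: 0, 197: 1, 198: 0, 200: 3}

if __name__ == "__main__":
    for attr_id, (old, new) in diff_raw_values(before, after).items():
        print(f"ID {attr_id}: {old} -> {new}")
```

Only ID 200 shows up as changed; the pending sector count (ID 197) is unchanged at 1.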
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You will likely see the pending sector error until that location on the hard drive is written over again. It's good that the multi-zone errors dropped.

Since you are nearing the end of the warranty period, I'd consider doing the Advance RMA. Typically, a pending sector error eventually leads to more and worse errors.
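
As a rough cross-check on drive age, Power_On_Hours (ID 9) from the output above can be converted to years. This is only a lower bound on how long the drive has been in service, since it counts powered-on time rather than time since purchase, and the warranty clock is set by the manufacturer, not by this counter. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap years


def powered_on_years(power_on_hours):
    """Convert a Power_On_Hours raw value to years of powered-on time."""
    return power_on_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    years = powered_on_years(20642)  # raw value of ID 9 in the posted output
    print(f"{years:.2f} years powered on")  # 2.35 years powered on
```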
 