8 Offline uncorrectable sectors / 8 Currently unreadable (pending) sectors

Robert Thomspon · Aug 20, 2017

So, I've had this issue since the drive was installed and sort of just ignored it, but I'm trying to tackle it now so I can clear the log message. (The NAS stores ONLY videos, TV shows and movies, so it isn't critical in any way; feel free to skip the lecture on using a drive with a SMART error.) Anyway, I have the following error in FreeNAS:

CRITICAL: Aug. 19, 2017, 12:55 p.m. - Device: /dev/ada2, 8 Currently unreadable (pending) sectors
CRITICAL: Aug. 19, 2017, 12:55 p.m. - Device: /dev/ada2, 8 Offline uncorrectable sectors

I have tried running smartctl to find where the errors are and then using dd to write to the specific sector to force it to be remapped. But it seems from the log that all 8 errors are at the same sector (is this possible?), and the error count doesn't go down (nor has it ever gone up, to be fair). Does anyone know why this drive won't let me force a bad sector to be remapped?

Attached below is the output of smartctl -a /dev/ada2 (this reading was taken in the middle of a long test):
-------------------------------------------------------------------------------------------------------------------------------------------

Code:

% sudo smartctl -a /dev/ada2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST5000DM000-1FK178
Serial Number:    W4J17D9Y
LU WWN Device Id: 5 000c50 0906a07fd
Firmware Version: CC49
User Capacity:    5,000,981,078,016 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Aug 20 04:03:48 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 248) Self-test routine in progress...
                                        80% of test remaining.
Total time to complete Offline
data collection:                (   96) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 615) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   112   097   006    Pre-fail  Always       -       45372328
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       415
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       8945321617
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11744
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       415
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   040   040   000    Old_age   Always       -       60
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   055   045    Old_age   Always       -       36 (Min/Max 33/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       386
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       523
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   112   100   000    Old_age   Always       -       45372328
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       11739h+17m+11.114s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       36827315422
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       122627591389
 
SMART Error Log Version: 1
ATA Error Count: 56 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 56 occurred at disk power-on lifetime: 11159 hours (464 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.531  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.522  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.471  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.461  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.445  READ DMA EXT

Error 55 occurred at disk power-on lifetime: 11159 hours (464 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.522  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.471  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.461  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.445  READ DMA EXT
  25 00 08 ff ff ff 4f 00   5d+12:50:49.427  READ DMA EXT

Error 54 occurred at disk power-on lifetime: 11159 hours (464 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.471  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.461  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.445  READ DMA EXT
  25 00 08 ff ff ff 4f 00   5d+12:50:49.427  READ DMA EXT
  ca 00 10 90 02 40 e0 00   5d+12:50:49.415  WRITE DMA

Error 53 occurred at disk power-on lifetime: 11159 hours (464 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.461  READ DMA EXT
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.445  READ DMA EXT
  25 00 08 ff ff ff 4f 00   5d+12:50:49.427  READ DMA EXT
  ca 00 10 90 02 40 e0 00   5d+12:50:49.415  WRITE DMA
  35 00 10 ff ff ff 4f 00   5d+12:50:49.411  WRITE DMA EXT

Error 52 occurred at disk power-on lifetime: 11159 hours (464 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c0 ff ff ff 4f 00   5d+12:50:49.445  READ DMA EXT
  25 00 08 ff ff ff 4f 00   5d+12:50:49.427  READ DMA EXT
  ca 00 10 90 02 40 e0 00   5d+12:50:49.415  WRITE DMA
  35 00 10 ff ff ff 4f 00   5d+12:50:49.411  WRITE DMA EXT
  35 00 10 ff ff ff 4f 00   5d+12:50:49.411  WRITE DMA EXT
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  80%        11744            -
# 2  Extended offline    Completed without error        00%        11048            -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

joeschmuck · Aug 20, 2017

Robert Thomspon said:
I have tried running smartctl to find where the errors are and then using dd to write to the specific sector to force it to be remapped. But it seems from the log that all 8 errors are at the same sector (is this possible?), and the error count doesn't go down (nor has it ever gone up, to be fair). Does anyone know why this drive won't let me force a bad sector to be remapped?

The log only shows the last 5 errors; it does not show all 8 different sectors. You can try to force the bad sectors to remap, but I think you are better off running the Extended test to completion and, if it all passes, checking those values again. Make sure you run the Extended test frequently; this may help remap the data to good sectors and lock out the bad ones.

How exactly are you using the dd command (the exact string you are entering)? And after you run the dd command, are you running a scrub?

Have you looked at the Wiki page for S.M.A.R.T.? It explains part of the answer you want.

EDIT: Also, please put your data in code brackets. I fixed your post so it looks proper and is easier to read.

Robert Thomspon · Aug 20, 2017

Thanks... I wasn't sure how to code bracket something (I figured it out :) ), sorry. And yes, I'm aware that the output only shows the last 5 errors. I'm not sure how to view the other 3; I figured they'd show up when I clear these 5 :)

The dd command I am running is:

"dd if=/dev/zero of=/dev/ada2 bs=512 count=1 seek=268435455"

I also tried: "dd if=/dev/zero of=/dev/ada2 bs=512 count=1 seek=268435455 conv=noerror,sync"

both of which return:

Code:

1+0 records in
1+0 records out
512 bytes transferred in 0.000244 secs (2094815 bytes/sec)
1+0 records in
1+0 records out
512 bytes transferred in 0.000244 secs (2094815 bytes/sec)
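
A note on the sector number used in these dd commands: 268435455 is 0x0fffffff, the largest value a 28-bit LBA can hold. As far as I know, the summary SMART error log only has room for 28-bit addresses, while the failing commands here are 48-bit ones (READ DMA EXT), so the logged LBA may well be a saturated placeholder rather than the real location. That would be consistent with dd reporting a successful write while the pending count stays at 8. smartctl's extended error log (-l xerror) is the usual place to look for 48-bit addresses. The Python sketch below only decodes the register bytes printed in the log above; the function name is made up for illustration.

```python
MAX_LBA28 = (1 << 28) - 1  # largest address a 28-bit LBA field can hold


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine the SN, CL, CH and DH register bytes into a 28-bit LBA."""
    # Only the low 4 bits of DH carry address bits (LBA bits 27..24).
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes from the "registers were" row of the error log: SN CL CH DH
logged_lba = lba28_from_registers(0xFF, 0xFF, 0xFF, 0x0F)

print(logged_lba)               # 268435455
print(logged_lba == MAX_LBA28)  # True
```

Every one of the five logged errors carries the same ff ff ff 0f bytes, so all of them decode to this same ceiling value.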

joeschmuck · Aug 20, 2017

Robert Thomspon said:
I'm not sure how to view the other 3... figured they'd show up when I clear these 5 :)

Nope, you will never see the other error messages; they scroll off, as far as I know. The most recent 5 will always be there, even if the errors clear.

Unfortunately, you are writing zeros to that one sector, which means you destroyed whatever data resided there, assuming there was data there, and I suspect there was. Run a scrub after the SMART long/extended test has completed to repair the damage. If the test completes without error, then you are temporarily in the clear.

Let's say your SMART test fails again at the same LBA and you want to try to remap this block out of the drive table. I would use the following command:
dd if=/dev/ada2 of=/dev/ada2 bs=16M
This would read all the data on the hard drive (from end to end) and write it back to the drive, thus refreshing the data rather than writing zeros.

I'm pretty sure you could do this as well, but I have not tested it, and if you do this, the outcome is all on you, as it could screw up all the data on the hard drive:
dd if=/dev/ada2 skip=26843000 of=/dev/ada2 seek=26843000 bs=16M
I believe this command would start copying just before your suspect area and then continue copying the remainder of the drive. Again, use at your own risk and refer to the dd man page for more details.
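
One thing worth checking before trying a command like this: dd counts skip= and seek= in units of bs, not in 512-byte sectors. Here is a rough Python sketch of the arithmetic, using the drive capacity from the smartctl output and taking the sector number from the error log at face value. The helper names are illustrative only.

```python
SECTOR_BYTES = 512                # logical sector size reported by smartctl
BLOCK_BYTES = 16 * 1024 * 1024    # bs=16M
DRIVE_BYTES = 5_000_981_078_016   # "User Capacity" from the smartctl output


def byte_offset(blocks: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Byte position that dd reaches after skipping `blocks` blocks."""
    return blocks * block_bytes


def dd_block_for_sector(lba: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Index of the bs-sized block that contains the given 512-byte sector."""
    return (lba * SECTOR_BYTES) // block_bytes


# skip=26843000 with bs=16M lands beyond the end of this drive.
print(byte_offset(26843000) > DRIVE_BYTES)   # True

# The 16 MiB block that actually contains sector 268435455.
print(dd_block_for_sector(268435455))        # 8191
```

Under these assumptions, a skip/seek value around 8191 (or a little lower, for padding) would be the starting point that matches the stated intent with bs=16M.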

However, if you can do it, I would run badblocks on the entire drive at least 2 times just to play it safe, and if you find a problem spot, target that area with badblocks and run it 50 times if needed. Just ensure you pad it well before the problem location and well after.

I'm not sure if you understand why a sector goes bad when you have a lot of hours on the hard drive, and you might know this, but if you don't... Generally it's because the surface area has become damaged. This could be due to a head crash or heat, or some surface defect where the thin layer of film is peeling off. Typically the affected area gets worse over time. While I understand that your data is not important to you, you can expect this problem to get worse over time.

Robert Thomspon · Aug 20, 2017

I had anticipated that this would get worse over time, but the error popped up when the drive was brand new. It never got worse, but it also never went away. I will look into badblocks (I've used it before, but it's been YEARS since). As for the time on the drives, they're only 20%-ish into their expected lifetime, I think. (They're just over a year old.)

joeschmuck · Aug 20, 2017

Well, it should still be under warranty then, and your error is covered. I'd RMA the drive, to be honest. I don't know if Seagate does advanced RMAs like WD does, where you give them your credit card in case you do not return the failed drive, they send you a replacement drive immediately, and you ship the failed drive back in the same box.

If you plan to keep the drive, I'd run badblocks on it. This will force the remapping of the bad sectors. Here is a link for running badblocks.

Robert Thomspon · Aug 20, 2017

UNFORTUNATELY, Seagate won't warranty the drives. They were purchased as externally enclosed drives (saved close to a grand vs. buying the damned things as desktop drives), and I didn't keep the enclosures. Seagate won't do anything with the bare drive; it has to be in the original enclosure. On the upside, I DID buy it from Costco, so they will take anything back.

joeschmuck · Aug 20, 2017

If you are forced to retain the drive, at least you have something you can try. I'd also make sure you run a SMART long test weekly for at least a few months (I always run mine weekly) to confirm that no more failures occur. After that, since you have a 5TB drive, maybe relax the long test to once every 2 weeks.

Robert Thomspon · Aug 20, 2017

Before the migration to a server machine (I was running everything off of a Dell SFF machine that kept killing power supplies), I had the SMART tests running every 10 days, I think. The migration is a dump from that old machine to a bunch of externals, a move of the 4x5TB drives to a better box, and a dump of all the data back. So, since I've only put 2TB back on the machine, I'm running badblocks on all the drives, since all the data is currently on other drives anyway. I will revisit them in a day or two :)

joeschmuck · Aug 20, 2017

Very good call, and it is good luck to have the data already backed up.

NASbox · Aug 20, 2017

Robert Thomspon said:
UNFORTUNATELY, Seagate won't warranty the drives. They were purchased as externally enclosed drives (saved close to a grand vs. buying the damned things as desktop drives), and I didn't keep the enclosures. Seagate won't do anything with the bare drive; it has to be in the original enclosure. On the upside, I DID buy it from Costco, so they will take anything back.

AFAIK Costco's return window is 90 days for computer/electronic stuff, so if you are close to that, move fast!

You might find this helpful: https://www.backblaze.com/blog/hard-drive-smart-stats/
Keep an eye on 187 Reported_Uncorrect; if that increases, ditch the drive really quickly.
If it's stable, then you may get away with it.
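
One way to follow this advice is to record the raw values of a few attributes after each test and compare them over time. Below is a minimal Python sketch, assuming the attribute table layout shown in the smartctl output earlier in the thread (ten whitespace-separated columns, raw value last). The watched IDs and function names are illustrative choices, not anything prescribed by smartmontools.

```python
WATCHED_IDS = {5, 187, 197, 198}  # reallocated, reported uncorrectable, pending, offline uncorrectable


def parse_raw_values(table_text: str) -> dict:
    """Return {attribute_id: raw_value} for the watched attributes."""
    raw = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED_IDS and fields[9].isdigit():
            raw[attr_id] = int(fields[9])
    return raw


def increased(previous: dict, current: dict) -> list:
    """List watched attribute IDs whose raw value grew since the last check."""
    return sorted(a for a, v in current.items() if v > previous.get(a, 0))


# Rows copied from the smartctl output posted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   040   040   000    Old_age   Always       -       60
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

baseline = parse_raw_values(SAMPLE)
print(baseline)  # {5: 0, 187: 60, 197: 8, 198: 8}
```

Saving the baseline dictionary after each long test and passing the old and new ones to increased() would flag a rising 187 count without having to eyeball the whole table.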

Based on my experience, there is a reason why WD/HGST drives cost more... better quality, but no drive is perfect, and that's what backups are for.

Good luck.

Robert Thomspon · Aug 20, 2017

NASbox said:
AFAIK Costco's return window is 90 days for computer/electronic stuff, so if you are close to that, move fast!
Good luck.

Just looked... Costco now sells them for $114. I'm comfortable paying that much for 5TB. And the new box has 5 SATA ports rather than the 4 I had before, so I may just pick up 2 and have a total of 25TB of storage.

joeschmuck · Aug 21, 2017

But you are using a non-NAS drive which was not designed for 24/7 operation. I'm sure you understand the risk so I won't pester you about it.
