Help with badblocks on one of my drives

Oct 1, 2013
Hey everyone, I've been running FreeNAS 9.1.1 for quite a few years, and one of my drives has finally failed. I think it's toast, but I wanted to see if there's a way to attempt a repair so I can recover some files.

Some background on my setup: because of my budget at the time of the build, I spent most of my money on the PC that runs FreeNAS, and not on the drives. I have two pools: one has 5 drives of different sizes and no parity, which is where I save my files; the other (called Backup) has two mirrored WD Greens. I have rsync jobs that copy my files to this Backup pool.

All my really important files were backed up, but I found out that some of the jobs for the less important files hadn't been running for quite a while, and I would like to try to get those files back.

After resolving this issue, my plan is to install FreeNAS 11, import the Backup pool, and then set up a new RAIDZ1 with 3 new WD Reds that I just got.

Getting to the point... here's the error message I started to see:

Jun 13 14:10:33 FreeNAS kernel: (ada2:ahcich2:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jun 13 14:10:33 FreeNAS kernel: (ada2:ahcich2:0:0:0): RES: 41 40 ac 00 40 40 00 00 00 00 00
Jun 13 14:10:33 FreeNAS kernel: (ada2:ahcich2:0:0:0): Retrying command
Jun 13 14:10:33 FreeNAS kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 80 00 40 40 00 00 00 01 00 00
Jun 13 14:10:33 FreeNAS kernel: (ada2:ahcich2:0:0:0): CAM status: ATA Status Error
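
For anyone reading along, here is a rough Python sketch of how I read those two hex values. It assumes the first two bytes of the `RES:` line are the status and error registers (that matches the `ATA status: 41 ... error: 40` line above it) and uses the usual ATA bit names. The function names are just mine, not part of any tool.

```python
# Bit names for the ATA status and error registers (usual ATA definitions).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True) if value & mask]


def decode_res(res_line):
    """Decode the first two bytes of a CAM 'RES:' line.

    Assumption: byte 0 is the status register and byte 1 is the error register.
    """
    hex_bytes = res_line.split("RES:")[1].split()
    status, error = int(hex_bytes[0], 16), int(hex_bytes[1], 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


line = "(ada2:ahcich2:0:0:0): RES: 41 40 ac 00 40 40 00 00 00 00 00"
print(decode_res(line))  # (['DRDY', 'ERR'], ['UNC'])
```

So the drive is ready (DRDY), reports an error (ERR), and the error is an uncorrectable read (UNC), which is the same thing the kernel prints in parentheses.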

Output from `smartctl`, showing the read failure:

# smartctl -a /dev/ada2
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke,

Model Family:	 Western Digital Caviar Blue Serial ATA
Device Model:	 WDC WD5000AAKS-22TMA0
Serial Number:	WD-WCAPW4102538
LU WWN Device Id: 5 0014ee 255bcb94d
Firmware Version: 12.01C01
User Capacity:	500,107,862,016 bytes [500 GB]
Sector Size:	  512 bytes logical/physical
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:	Wed Jun 13 14:30:55 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
				   was completed without error.
				   Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0)   The previous self-test routine completed
				   without error or no self-test has ever
				   been run.
Total time to complete Offline
data collection:		(12000) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
				   Auto Offline data collection on/off support.
				   Suspend Offline collection upon new
				   Offline surface scan supported.
				   Self-test supported.
				   Conveyance Self-test supported.
				   Selective Self-test supported.
SMART capabilities:			(0x0003)   Saves SMART data before entering
				   power-saving mode.
				   Supports SMART auto save timer.
Error logging capability:		(0x01)   Error logging supported.
				   General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 ( 150) minutes.
Conveyance self-test routine
recommended polling time:	 (   6) minutes.
SCT capabilities:		   (0x303f)   SCT Status supported.
				   SCT Error Recovery Control supported.
				   SCT Feature Control supported.
				   SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate	 0x000f   200   200   051	Pre-fail  Always	   -	   284
  3 Spin_Up_Time			0x0003   177   168   021	Pre-fail  Always	   -	   6116
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   520
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000e   200   200   051	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   019   019   000	Old_age   Always	   -	   59419
 10 Spin_Retry_Count		0x0012   100   100   051	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0012   100   100   051	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   466
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   380
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   520
194 Temperature_Celsius	 0x0022   111   090   000	Old_age   Always	   -	   39
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0012   194   194   000	Old_age   Always	   -	   489
198 Offline_Uncorrectable   0x0010   195   195   000	Old_age   Offline	  -	   455
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   197   197   051	Old_age   Offline	  -	   248

SMART Error Log Version: 1
ATA Error Count: 3473 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3473 occurred at disk power-on lifetime: 59418 hours (2475 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 00 80 5c 38 40  Error: UNC at LBA = 0x00385c80 = 3693696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 80 5c 38 3a 08	  00:04:12.832  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:10.601  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:08.373  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:06.433  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:04.485  READ DMA EXT

Error 3472 occurred at disk power-on lifetime: 59418 hours (2475 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 00 80 5c 38 40  Error: UNC at LBA = 0x00385c80 = 3693696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 80 5c 38 3a 08	  00:04:10.601  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:08.373  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:06.433  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:04.485  READ DMA EXT
  2f 00 01 10 00 00 00 08	  00:04:04.482  READ LOG EXT

Error 3471 occurred at disk power-on lifetime: 59418 hours (2475 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 00 80 5c 38 40  Error: UNC at LBA = 0x00385c80 = 3693696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 80 5c 38 3a 08	  00:04:08.373  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:06.433  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:04.485  READ DMA EXT
  2f 00 01 10 00 00 00 08	  00:04:04.482  READ LOG EXT
  60 00 00 80 5a 38 3a 08	  00:04:02.387  READ FPDMA QUEUED

Error 3470 occurred at disk power-on lifetime: 59418 hours (2475 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 00 80 5c 38 40  Error: UNC at LBA = 0x00385c80 = 3693696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 80 5c 38 3a 08	  00:04:06.433  READ DMA EXT
  25 00 00 80 5c 38 3a 08	  00:04:04.485  READ DMA EXT
  2f 00 01 10 00 00 00 08	  00:04:04.482  READ LOG EXT
  60 00 00 80 5a 38 3a 08	  00:04:02.387  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08	  00:04:02.385  READ LOG EXT

Error 3469 occurred at disk power-on lifetime: 59418 hours (2475 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 00 80 5c 38 40  Error: UNC at LBA = 0x00385c80 = 3693696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 80 5c 38 3a 08	  00:04:04.485  READ DMA EXT
  2f 00 01 10 00 00 00 08	  00:04:04.482  READ LOG EXT
  60 00 00 80 5a 38 3a 08	  00:04:02.387  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 08	  00:04:02.385  READ LOG EXT
  60 00 00 80 5a 38 3a 08	  00:03:59.898  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   90%	 59311		 62207

SMART Selective self-test log data structure revision number 1
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
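
As a sanity check on the error log, here is a small Python sketch that rebuilds the failing LBA from the register bytes. It assumes the register row is ordered ER ST SC SN CL CH DH (the column header did not survive in my paste, so that ordering is my assumption based on the legend) and that only the low nibble of DH holds LBA bits. The function name is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes, from "Sector Size: 512 bytes logical/physical"


def lba_from_registers(register_line):
    """Rebuild a 28-bit LBA from a smartctl error-log register row.

    Assumed column order: ER ST SC SN CL CH DH.
    """
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in register_line.split()[:7])
    # SN, CL and CH hold LBA bits 0-23; the low nibble of DH holds bits 24-27.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


lba = lba_from_registers("40 51 00 80 5c 38 40")
print(hex(lba), lba)       # 0x385c80 3693696
print(lba * SECTOR_SIZE)   # byte offset of the failing sector: 1891172352

# Power-on lifetime from the same log entry, converted to days and hours.
print(divmod(59418, 24))   # (2475, 18)
```

Both results agree with what `smartctl` printed (LBA 3693696, 2475 days + 18 hours), so all five logged errors point at the same sector.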

I also ran `badblocks` in non-destructive mode, and it found a large number of bad blocks. I can't find where I saved the output, so I'm running it again and will post the results here.
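
If the bad-block list is written to a file (`badblocks` can do that with `-o`, one block number per line), something like the Python sketch below could condense it into ranges so it is easier to post. The helper names are hypothetical, and the sample numbers are made up for illustration; they are not from my drive.

```python
def group_ranges(blocks):
    """Collapse a sorted list of block numbers into (first, last) ranges."""
    ranges = []
    for block in blocks:
        if ranges and block == ranges[-1][1] + 1:
            # Extends the current run of consecutive blocks.
            ranges[-1] = (ranges[-1][0], block)
        else:
            # Starts a new run.
            ranges.append((block, block))
    return ranges


def read_badblocks(path):
    """Read a badblocks output file: one block number per line."""
    with open(path) as handle:
        return sorted({int(line) for line in handle if line.strip()})


# Made-up block numbers, only to show the grouping.
sample = [100, 101, 102, 250, 400, 401]
print(group_ranges(sample))  # [(100, 102), (250, 250), (400, 401)]
```

With a real file it would be `group_ranges(read_badblocks(path))`. Keep in mind that the numbers are in units of the `badblocks` block size, not necessarily 512-byte sectors, so they may not line up directly with the LBAs in the SMART log.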

Additional system info:


# camcontrol devlist
<WDC WD30EZRX-00D8PB0 80.00A80>	at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD30EZRX-00DC0B0 80.00A80>	at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD5000AAKS-22TMA0 12.01C01>   at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD30EZRX-00DC0B0 80.00A80>	at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD10EADS-00L5B1 01.01A01>	 at scbus4 target 0 lun 0 (ada4,pass4)
<WDC WD20EARS-00MVWB0 50.0AB50>	at scbus5 target 0 lun 0 (ada5,pass5)
<Kingmax USB2.0 FlashDisk 0.00>	at scbus6 target 0 lun 0 (pass6,da0)

# zpool status -v
  pool: Backup
 state: ONLINE
  scan: scrub repaired 0 in 2h8m with 0 errors on Sun Jun 10 02:08:43 2018

   Backup										ONLINE	   0	 0	 0
	 gptid/2bcb9274-74ba-11e3-a683-94de806dfb29  ONLINE	   0	 0	 0
	 gptid/2cc595fd-74ba-11e3-a683-94de806dfb29  ONLINE	   0	 0	 0

errors: No known data errors

# /sbin/ifconfig | grep media
   media: Ethernet autoselect (1000baseT <full-duplex>)

Thanks for any help.
