Critical Drive Errors

Status
Not open for further replies.

ggoosen

Dabbler
Joined
Aug 21, 2017
Messages
21
  • CRITICAL: Sept. 6, 2017, 9:18 p.m. - Device: /dev/ada0, 12584 Currently unreadable (pending) sectors
  • CRITICAL: Sept. 6, 2017, 9:18 p.m. - Device: /dev/ada0, 12584 Offline uncorrectable sectors
  • CRITICAL: Sept. 6, 2017, 9:19 p.m. - The volume Disk12TB state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

I ran smartctl -t short -C /dev/ada0 (a short self-test in captive mode).

Below are the results.

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda 7200.14 (AF)
Device Model:	 ST2000DM001-9YN164
Serial Number:	W1E07H33
LU WWN Device Id: 5 000c50 0454f83f1
Firmware Version: CC9D
User Capacity:	2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	7200 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:	Wed Sep  6 21:33:12 2017 ACST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline
data collection:		 (  592) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   1) minutes.
Extended self-test routine
recommended polling time:	 ( 244) minutes.
Conveyance self-test routine
recommended polling time:	 (   2) minutes.
SCT capabilities:		   (0x3081)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   090   090   006	Pre-fail  Always	   -	   66762136
  3 Spin_Up_Time			0x0003   095   094   000	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   100   100   020	Old_age   Always	   -	   156
  5 Reallocated_Sector_Ct   0x0033   100   100   036	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000f   078   060   030	Pre-fail  Always	   -	   8735073364
  9 Power_On_Hours		  0x0032   083   082   000	Old_age   Always	   -	   14944
10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   125
183 Runtime_Bad_Block	   0x0032   093   093   000	Old_age   Always	   -	   7
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   001   001   000	Old_age   Always	   -	   248
188 Command_Timeout		 0x0032   100   099   000	Old_age   Always	   -	   9 9 9
189 High_Fly_Writes		 0x003a   098   098   000	Old_age   Always	   -	   2
190 Airflow_Temperature_Cel 0x0022   082   039   045	Old_age   Always   In_the_past 18 (0 94 18 17 0)
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   91
193 Load_Cycle_Count		0x0032   033   033   000	Old_age   Always	   -	   134125
194 Temperature_Celsius	 0x0022   018   061   000	Old_age   Always	   -	   18 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   063   017   000	Old_age   Always	   -	   6184
198 Offline_Uncorrectable   0x0010   063   017   000	Old_age   Offline	  -	   6184
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   2927h+51m+00.851s
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   1450126304226
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   838585338767

SMART Error Log Version: 1
ATA Error Count: 226 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 226 occurred at disk power-on lifetime: 14944 hours (622 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 d5 b4 05  Error: UNC at LBA = 0x05b4d520 = 95737120

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 20 d5 b4 e5 00	  00:30:50.532  READ DMA
  c8 00 28 20 d5 b4 e5 00	  00:30:47.125  READ DMA
  b0 d5 01 06 4f c2 e0 00	  00:30:47.021  SMART READ LOG
  c8 00 28 20 d5 b4 e5 00	  00:30:43.593  READ DMA
  b0 d5 01 00 4f c2 e0 00	  00:30:43.524  SMART READ LOG

Error 225 occurred at disk power-on lifetime: 14944 hours (622 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 d5 b4 05  Error: UNC at LBA = 0x05b4d520 = 95737120

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 20 d5 b4 e5 00	  00:30:47.125  READ DMA
  b0 d5 01 06 4f c2 e0 00	  00:30:47.021  SMART READ LOG
  c8 00 28 20 d5 b4 e5 00	  00:30:43.593  READ DMA
  b0 d5 01 00 4f c2 e0 00	  00:30:43.524  SMART READ LOG
  c8 00 28 20 d5 b4 e5 00	  00:30:40.100  READ DMA

Error 224 occurred at disk power-on lifetime: 14944 hours (622 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 d5 b4 05  Error: UNC at LBA = 0x05b4d520 = 95737120

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 20 d5 b4 e5 00	  00:30:43.593  READ DMA
  b0 d5 01 00 4f c2 e0 00	  00:30:43.524  SMART READ LOG
  c8 00 28 20 d5 b4 e5 00	  00:30:40.100  READ DMA
  b0 d0 01 00 4f c2 e0 00	  00:30:39.445  SMART READ DATA
  c8 00 28 20 d5 b4 e5 00	  00:30:36.550  READ DMA

Error 223 occurred at disk power-on lifetime: 14944 hours (622 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 d5 b4 05  Error: UNC at LBA = 0x05b4d520 = 95737120

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 20 d5 b4 e5 00	  00:30:40.100  READ DMA
  b0 d0 01 00 4f c2 e0 00	  00:30:39.445  SMART READ DATA
  c8 00 28 20 d5 b4 e5 00	  00:30:36.550  READ DMA
  c8 00 08 d8 73 5d e0 00	  00:30:36.549  READ DMA
  c8 00 08 48 5f 55 e0 00	  00:30:36.549  READ DMA

Error 222 occurred at disk power-on lifetime: 14944 hours (622 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 20 d5 b4 05  Error: UNC at LBA = 0x05b4d520 = 95737120

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 20 d5 b4 e5 00	  00:30:36.550  READ DMA
  c8 00 08 d8 73 5d e0 00	  00:30:36.549  READ DMA
  c8 00 08 48 5f 55 e0 00	  00:30:36.549  READ DMA
  c8 00 08 20 7e 54 e0 00	  00:30:36.549  READ DMA
  ca 00 10 90 02 40 e0 00	  00:30:36.549  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive	   Completed: read failure	   90%	 14944		 95737120

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Can someone help me understand how bad it is?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Morphix

Cadet
Joined
Jun 13, 2016
Messages
6
197 Current_Pending_Sector 0x0012 063 017 000 Old_age Always - 6184
198 Offline_Uncorrectable 0x0010 063 017 000 Old_age Offline - 6184

The 000 is the threshold column; the raw value for both attributes is 6184. Any pending or uncorrectable sectors at all are a bad sign, so thousands of them is extreme.
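
For anyone who wants to check these columns without reading them by eye, here is a minimal Python sketch. It assumes the attribute rows have the same whitespace-separated layout as the smartctl output earlier in the thread. The set of "watched" attribute IDs and the function names are my own choices for illustration, not anything defined by smartmontools.

```python
# Rows copied from the smartctl output posted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   083   082   000    Old_age   Always       -       14944
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       248
197 Current_Pending_Sector  0x0012   063   017   000    Old_age   Always       -       6184
198 Offline_Uncorrectable   0x0010   063   017   000    Old_age   Offline      -       6184
"""

# Assumption of this sketch: any non-zero raw value on these IDs is worth a look.
WATCHED_IDS = {5, 187, 197, 198}


def parse_attributes(text):
    """Return {id: (name, threshold, raw_value_string)} for each attribute row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID;
        # this skips the header line and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # Column 6 is THRESH; everything from column 10 onward is RAW_VALUE.
        rows[int(fields[0])] = (fields[1], int(fields[5]), " ".join(fields[9:]))
    return rows


def nonzero_watched(rows):
    """List (id, name, raw) for watched attributes whose raw count is above zero."""
    flagged = []
    for attr_id in sorted(rows):
        name, _thresh, raw = rows[attr_id]
        first = raw.split()[0]
        if attr_id in WATCHED_IDS and first.isdigit() and int(first) > 0:
            flagged.append((attr_id, name, int(first)))
    return flagged


for attr_id, name, raw in nonzero_watched(parse_attributes(SAMPLE)):
    print(f"{attr_id} {name}: raw value {raw}")
```

On the rows above this flags 187, 197 and 198 and leaves 5 alone, since Reallocated_Sector_Ct is still 0 on this drive.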

 

ggoosen

Dabbler
Joined
Aug 21, 2017
Messages
21
This is the output from zpool status -v

Code:
root@freenas:/mnt/Disk12TB # zpool status -v
  pool: Disk12TB
 state: ONLINE
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

		NAME										  STATE	 READ WRITE CKSUM
		Disk12TB									  ONLINE	  43	 0	 0
		  gptid/d40f6179-923a-11e7-bd75-00215ad49ec2  ONLINE	  45	 0	 0

errors: Permanent errors have been detected in the following files:

		/var/db/system/syslog-a7c4a4a3d45a4720a1e6b8ad799731fd/log/debug.log
		/var/db/system/rrd-a7c4a4a3d45a4720a1e6b8ad799731fd/freenas.goosen/df-mnt-Disk12TB-D1WindowsDS/df_complex-free.rrd
		Disk12TB/jails/sickrage_1:<0x0>
		/mnt/Disk12TB/d1unixds/movies/A Dark Song (2016)
		Disk12TB/jails/plexmediaserver_1:<0x0>

  pool: Disk21TB
 state: ONLINE
  scan: none requested
config:

		NAME										  STATE	 READ WRITE CKSUM
		Disk21TB									  ONLINE	   0	 0	 0
		  gptid/eda92b7b-923a-11e7-bd75-00215ad49ec2  ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Wed Sep  6 03:47:03 2017
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  da0p2	 ONLINE	   0	 0	 0

errors: No known data errors

 

ggoosen

Dabbler
Joined
Aug 21, 2017
Messages
21
OK, so it's safe to assume this drive is pretty much rubbish and I shouldn't use it.

I should have also said this is a test system: I pulled the disk out of an external USB drive and configured it with no redundancy.

The only data on there are some jails that I set up. Is there an easy way to back those up so I don't have to redo them? If not, it doesn't matter; I can just recreate them.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You could try upgrading the single disk to a mirror, which apparently has to be done manually from the command line. Whatever data is damaged will still be damaged though.
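
As a rough illustration of the manual step, the sketch below only builds and prints a `zpool attach` command line; it does not run anything. The pool name and existing member are taken from the zpool status output earlier in the thread. The new device name `ada1p2` is purely hypothetical and would need to be replaced with the real partition on the second disk, and the helper name is my own.

```python
import shlex


def build_attach_command(pool, existing_dev, new_dev):
    """Build the argv for `zpool attach`, which adds new_dev as a mirror of existing_dev."""
    return ["zpool", "attach", pool, existing_dev, new_dev]


cmd = build_attach_command(
    "Disk12TB",
    "gptid/d40f6179-923a-11e7-bd75-00215ad49ec2",  # existing member, from zpool status
    "ada1p2",  # hypothetical name for the new disk's partition
)

# Dry run: print the command for review instead of executing it.
print(shlex.join(cmd))
```

Printing rather than executing is deliberate, so the device names can be double-checked before anything touches the pool.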
 