Critical Disk Error but Volume is Healthy??

Status
Not open for further replies.

djseto

Dabbler
Joined
Aug 19, 2013
Messages
32
My Setup (it's old):
Motherboard is ASUS M4A78LT-M
CPU is AMD Athlon II X2 250
RAM is 8GB
HDs: Seagate 1TB ST1000DM003-1ER162
FreeNAS 9.10.2-U4 (27ae72978)

I am getting errors on /dev/ada3, but when I look at my Volume Status, there appear to be no issues on /dev/ada3p2. I guess my first question is: how do I find out what /dev/ada3 is? From the Volume Status, it appears my data is on /dev/ada3p2 (does this mean partition 2?). Below is the output of smartctl. I could run it on /dev/ada3 but not on /dev/ada3p2. I need some guidance on where to start to troubleshoot...

Screen Shot 2017-06-26 at 12.03.02 AM.png



smartctl -a /dev/ada3

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1ER162
Serial Number:    Z4YBKW5N
LU WWN Device Id: 5 000c50 087b68c15
Firmware Version: CC46
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Jun 26 00:08:55 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   80) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 105) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       140339832
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       60000073
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       11066
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       20
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (Min/Max 30/47 #19)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       68
194 Temperature_Celsius     0x0022   041   056   000    Old_age   Always       -       41 (0 9 0 0 0)
197 Current_Pending_Sector  0x0012   094   093   000    Old_age   Always       -       1008
198 Offline_Uncorrectable   0x0010   094   093   000    Old_age   Offline      -       1008
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       11066h+38m+55.002s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       10519176548
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       37509088737

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10949         -
# 2  Short offline       Completed without error       00%     10686         -
# 3  Extended offline    Completed without error       00%     10613         -
# 4  Extended offline    Completed without error       00%     10205         -
# 5  Short offline       Completed without error       00%     10182         -
# 6  Extended offline    Completed without error       00%      9940         -
# 7  Extended offline    Interrupted (host reset)      10%      9932         -
# 8  Short offline       Completed without error       00%      9931         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
djseto said: "I need some guidance on where to start to troubleshoot"
What's to troubleshoot? ada3 has over 1000 bad sectors. Replace it.
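For anyone who wants to pull those numbers out of the smartctl output without reading the table by eye, here is a rough Python sketch. It assumes the ten-column attribute layout shown in the output above, and the choice of attributes 5, 197 and 198 as the ones to watch is my own assumption, not an official list. The function name `bad_sector_counts` is made up for this example.

```python
import re

# Attribute IDs treated here as signs of failing media. This set is an
# assumption of the sketch, not something smartctl defines.
WATCHED_IDS = {5, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def bad_sector_counts(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is nonzero."""
    found = {}
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines, blank lines, other sections
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(3))
        if attr_id in WATCHED_IDS and raw > 0:
            found[name] = raw
    return found


# Four rows copied from the output in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   094   093   000    Old_age   Always       -       1008
198 Offline_Uncorrectable   0x0010   094   093   000    Old_age   Offline      -       1008
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(bad_sector_counts(SAMPLE))
```

On the sample rows this reports Current_Pending_Sector and Offline_Uncorrectable at 1008 each, which is where the "over 1000 bad sectors" figure comes from.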
 

djseto

Dabbler
Joined
Aug 19, 2013
Messages
32
So I'm confused as to why my Volume shows as healthy? This whole ada3 vs. ada3p2 still confuses me. This drive still should be under warranty from Seagate...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
djseto said: "So I'm confused as to why my Volume shows as healthy?"
SMART tests and ZFS volume status are independent. ZFS hasn't encountered an error yet, meaning an I/O error when reading from or writing to that disk, or data read from that disk failing its checksum, but the disk's SMART attributes show that it has over 1000 bad sectors.
djseto said: "This whole ada3 vs. ada3p2 still confuses me."
You got it in your OP: ada3 is the disk, and ada3p2 is a partition on that disk. By default, FreeNAS partitions every disk it adds to a pool, creating a 2 GB swap partition at the beginning of the disk and using the rest for ZFS. SMART data belongs to the whole disk, not to a partition, which is why smartctl works on /dev/ada3 but not on /dev/ada3p2.
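If it helps to see the naming rule spelled out, here is a small Python sketch that splits a device name into its disk and partition parts. It only handles the plain `<driver><unit>p<index>` form seen in this thread (ada3, ada3p2); labels such as gptid paths are deliberately rejected, and the function names `split_device` and `smartctl_target` are made up for this example. On the box itself, `gpart show ada3` should list the actual partitions on that disk.

```python
import re

# FreeBSD GPT partition nodes look like <driver><unit>p<index>, e.g. ada3p2.
# Anything else (gptid labels, slices) is out of scope for this sketch.
DEVICE = re.compile(r"^(?:/dev/)?([a-z]+\d+)(?:p(\d+))?$")


def split_device(path):
    """Return (disk, partition_index); partition_index is None for a whole disk."""
    m = DEVICE.match(path)
    if m is None:
        raise ValueError(f"unrecognized device name: {path!r}")
    disk, part = m.group(1), m.group(2)
    return disk, (int(part) if part is not None else None)


def smartctl_target(path):
    """Whole-disk node to hand to smartctl for a given disk or partition path."""
    disk, _ = split_device(path)
    return f"/dev/{disk}"


print(split_device("/dev/ada3p2"))
print(smartctl_target("/dev/ada3p2"))
```

So the partition the pool uses, /dev/ada3p2, maps back to /dev/ada3, which is the node to pass to `smartctl -a`.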
 

djseto

Dabbler
Joined
Aug 19, 2013
Messages
32
Do I need to replace with the exact same model drive or does it just have to be the same capacity?
 

djseto

Dabbler
Joined
Aug 19, 2013
Messages
32
So switching from this unreliable 7200rpm 1TB Seagate crap to a WD Red 1TB 5400rpm drive is safe?
 

djseto

Dabbler
Joined
Aug 19, 2013
Messages
32
If I go larger, will I be able to take advantage of the extra space, since the rest are still 1TB? I'm thinking of going to 2TB based on the price difference, but only if I can use that extra TB.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338