SOLVED Weird Issue After Resilvering

Status
Not open for further replies.

MeCJay12

Dabbler
Joined
Jul 20, 2018
Messages
18
Hello. I have an array of 12 disks: 8 are 1TB and 4 are 2TB. I am slowly replacing the 1TB disks with 2TB disks as they fail. A 1TB disk failed yesterday, so I followed my usual replacement procedure: pull the drive (hot-swappable chassis), verify the serial number, insert the new drive, reboot FreeNAS, and click Replace Disk. The resilver took about 24 hours, and afterwards everything seemed fine: the critical error went away, disk speeds returned to normal, etc. About 10 minutes later I got a notification that there was an error. I assumed it was invalid, but I double-checked FreeNAS and there is a new error:

The volume Storage state is ONLINE: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.

The only unusual thing I noticed is that the new disk shows 103 under Checksum in the Volume Status, and the count is climbing. After an hour or so the disk went to DEGRADED. This is a brand-new disk that I unboxed just before installing it. I pulled the drive out of the system and CDI says it's fine. I also tried smartctl from the FreeNAS web shell, but all I could get was:

Read Device Identity failed: Inappropriate ioctl for device

This is my first FreeNAS build, and the company that hosts my backup server said it suffered a double disk failure and lost my backup, so I'm not looking to take any chances at the moment. Thanks in advance.
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343

MeCJay12

Dabbler
Joined
Jul 20, 2018
Messages
18
Here is the output from smartctl -A /dev/da16. Hopefully you can read it:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   100   006    Pre-fail  Always       -       8018784
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   065   060   030    Pre-fail  Always       -       3596397
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       30
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   063   045    Old_age   Always       -       34 (Min/Max 33/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   034   040   000    Old_age   Always       -       34 (0 30 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
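For anyone who would rather check a table like this mechanically than by eye, here is a minimal Python sketch. It assumes the usual smartctl -A column layout, and the function names (parse_attributes, failing, nonzero_raw) and the watch list of counters are my own choices for illustration, not part of smartmontools. smartctl treats an attribute as failed when its normalized VALUE is at or below THRESH, and that is the rule the sketch applies.

```python
import re

# A few rows copied from the smartctl -A output in this post.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   105   100   006    Pre-fail  Always       -       8018784
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   065   060   030    Pre-fail  Always       -       3596397
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Turn attribute rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        attr_id, name, value, worst, thresh, when_failed, raw = m.groups()
        rows.append({
            "id": int(attr_id),
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "when_failed": when_failed,
            "raw": raw.strip(),
        })
    return rows


def failing(rows):
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


def nonzero_raw(rows, names):
    """Names from the watch list whose raw counter (leading integer) is non-zero."""
    hits = []
    for r in rows:
        m = re.match(r"\d+", r["raw"])
        if r["name"] in names and m and int(m.group()) != 0:
            hits.append(r["name"])
    return hits


# Hypothetical watch list: counters that usually point at media or cabling trouble.
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "UDMA_CRC_Error_Count"}

rows = parse_attributes(SAMPLE)
print("failing:", failing(rows))
print("non-zero counters:", nonzero_raw(rows, WATCH))
```

For the rows above both lists come back empty, so by these checks SMART shows nothing wrong even though ZFS is counting checksum errors on the disk.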
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
The drive looks fine.

I would get another drive, burn it in, and use it to replace the disk with the checksum errors.

Then burn in the disk you removed. If it passes, that now known-good drive can replace one of the small drives or be kept as a cold spare.
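The thread does not spell out what "burn in" involves, so the following is only a sketch of one common sequence, not necessarily the procedure meant here: a short and a long SMART self-test, an optional destructive badblocks write pass, and a final look at the SMART attributes. The helper name burn_in_plan and the device path /dev/daX are placeholders I made up. The sketch only builds and prints the command lines; it does not run anything.

```python
def burn_in_plan(device, destructive=False):
    """Return the burn-in commands for `device` as argument lists (nothing is executed)."""
    plan = [
        # smartctl starts self-tests in the background; wait for each one
        # to finish before moving on to the next step.
        ["smartctl", "-t", "short", device],
        ["smartctl", "-t", "long", device],
    ]
    if destructive:
        # badblocks -w overwrites the whole disk with test patterns,
        # so this step is only for a disk that holds no data. -s shows progress.
        plan.append(["badblocks", "-ws", device])
    # Re-read the attribute table afterwards and compare it with the earlier one.
    plan.append(["smartctl", "-A", device])
    return plan


# /dev/daX is a placeholder; substitute the real device node of the new disk.
for cmd in burn_in_plan("/dev/daX", destructive=True):
    print(" ".join(cmd))
```

Keeping the write pass behind an explicit destructive flag is deliberate: it should never be pointed at a disk that is still a member of a pool.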

Have Fun
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
Pull the disk with the errors, replace it with a burnt-in drive, then burn in the drive you removed.

Are you using a RAID card or an HBA?

24 hours is a long time for a 1TB resilver. Can we see the output of zpool status?
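To put a number on that: a rough calculation, assuming the resilver wrote at most 1 TB (decimal, 10^12 bytes, the size of the disk being replaced) over the full 24 hours. The real amount written depends on how full the vdev was, so this is an upper bound on the average rate, not a measurement.

```python
def avg_rate_mb_s(bytes_written, hours):
    """Average throughput in MB/s (decimal megabytes) over the given number of hours."""
    return bytes_written / (hours * 3600) / 1e6


# Assumed upper bound: 1 TB written in 24 hours.
rate = avg_rate_mb_s(1e12, 24)
print(f"{rate:.1f} MB/s")  # prints "11.6 MB/s"
```

An average of roughly 11.6 MB/s or less is far below what a healthy hard disk sustains on sequential writes, which is why the duration alone already points at the new disk or its connection.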

Have Fun
Code:
root@freenas:~ # zpool status
 
  pool: WDRed
 state: ONLINE
  scan: scrub repaired 0 in 9h55m with 0 errors on Sat Jul 21 09:55:15 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        WDRed                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/8b6e9691-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/8cb509ee-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/8d72e644-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/9039b411-a428-11e7-998b-0cc47aab6f2a  ONLINE       0     0     0
            gptid/8eef42c2-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/8fcc9b9a-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/9092843c-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
            gptid/915353b9-46c3-11e7-b75f-0cc47aab6f2a  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/904f454a-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/911e7d80-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/91cafc26-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/92dda435-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/1f1c9cbb-4b39-11e8-b61b-0cc47aab6f2a  ONLINE       0     0     0
            gptid/946276dc-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/950d1ac3-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0
            gptid/95b819f8-0ca3-11e8-a51e-0cc47aab6f2a  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Jul 19 03:45:12 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada0p2      ONLINE       0     0     0

errors: No known data errors
 

MeCJay12

Dabbler
Joined
Jul 20, 2018
Messages
18
Code:
root@freenas:~ # zpool status
  pool: ESXi
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:03:41 with 0 errors on Sun Jul  8 00:03:41 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        ESXi                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/114a0892-23f4-11e8-97f9-000c29412593  ONLINE       0     0     0
            gptid/11ad119a-23f4-11e8-97f9-000c29412593  ONLINE       0     0     0
            gptid/11fe48a6-23f4-11e8-97f9-000c29412593  ONLINE       0     0     0
            gptid/125e8169-23f4-11e8-97f9-000c29412593  ONLINE       0     0     0

errors: No known data errors

  pool: Storage
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
		attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
		using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 483M in 0 days 00:19:15 with 0 errors on Fri Jul 20 14:25:02 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/10423df5-3838-11e8-b57e-000c29412593  ONLINE       0     0     0
            gptid/e1e21a68-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/e2b72606-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/e3d793b6-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            gptid/e6bd3ba9-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/e7ff184b-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/e98315ee-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/ea812566-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/ec0113f5-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/edef2bfb-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/f0453447-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/d1ba9fb0-8b97-11e8-86ea-000c29412593  DEGRADED     0     0   238  too many errors

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:24 with 0 errors on Wed Jul 18 03:45:24 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors


I will get a new drive on Monday to burn in and swap in.
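In case it helps anyone watching a counter like this climb, here is a small Python sketch that picks out the rows of zpool status whose READ, WRITE or CKSUM counters are non-zero. The function name devices_with_errors is made up for this post, and the parsing assumes the plain column layout shown above with whole-number counters; abbreviated counts such as 1.2K are not handled.

```python
# Trimmed copy of the Storage pool section from the output above.
SAMPLE = """\
  pool: Storage
 state: DEGRADED
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         DEGRADED     0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            gptid/f0453447-24c6-11e8-8808-000c29412593  ONLINE       0     0     0
            gptid/d1ba9fb0-8b97-11e8-86ea-000c29412593  DEGRADED     0     0   238  too many errors

errors: No known data errors
"""


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    hits = []
    in_table = False
    for line in status_text.splitlines():
        parts = line.split()
        if parts[:2] == ["NAME", "STATE"]:
            in_table = True  # header row of a pool's config table
            continue
        if not in_table:
            continue
        # A blank line or any row without three integer counters ends the table.
        if len(parts) < 5 or not all(p.isdigit() for p in parts[2:5]):
            in_table = False
            continue
        read, write, cksum = (int(p) for p in parts[2:5])
        if read or write or cksum:
            hits.append((parts[0], parts[1], read, write, cksum))
    return hits


for name, state, read, write, cksum in devices_with_errors(SAMPLE):
    print(f"{name}: {state} read={read} write={write} cksum={cksum}")
```

On a live system the text could come from running zpool status through subprocess.run instead of the pasted sample. Here the only row reported is the replacement disk, with 238 checksum errors.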
 

MeCJay12

Dabbler
Joined
Jul 20, 2018
Messages
18
To any lurkers or future readers: I got a new drive, swapped it in, and everything is fixed. With the second new drive the array resilvered in hours, not days, and there are no checksum errors. The first new drive still shows as "good" in CDI, so it must have some subtle malfunction that caused my headache.
 