First Resilver [Surprised it took this long to happen]

Status
Not open for further replies.

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Hey Gurus,

I started my FreeNAS journey after knocking a WD MyBook drive off my desk one day 3 years ago, and here we are today with me still knowing almost nothing about FreeNAS other than that it just works amazingly. I wanted to ask whether you think this is good performance, and, if I had to replace a drive, what the total rebuild time would be. I think my build is still in my sig.
I'm assuming it took 30m to do 80MB, which would mean I'd be looking at two years to rebuild, so I know my math is off somewhere.
Code:
pool: zedpm2.0
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
		continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun May 14 00:02:59 2017
		13.4T scanned out of 13.5T at 3.33G/s, 0h0m to go
		80.1M resilvered, 99.55% done
config:

		NAME											STATE	 READ WRITE CKSUM
		zedpm2.0										ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			gptid/e5fbed04-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e6b18810-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e7288a04-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e7ce827f-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0  (resilvering)
			gptid/e83bc95a-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/fe497ac2-aa41-11e3-aa5a-20cf3007fa56  ONLINE	   0	 0	 0

 

zoomzoom

Guru
Joined
Sep 6, 2015
Messages
677
With a 32TB Z2 pool and ~10TB utilization, it normally takes ~24-32 hrs to resilver one of my drives, which have a 5900 RPM spindle speed. Provided all the drives are hooked to SATA 3 ports, resilver speeds should be around 350MBytes/s.
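
As a back-of-envelope check on those two figures, here is a minimal sketch. It assumes decimal units (1 TB = 1e12 bytes) and treats the ~10TB of utilization as the amount that has to be read during the resilver; the function names are only illustrative.

```python
# Back-of-envelope resilver estimate using the figures from the post above.
# Assumption: decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
TB = 1e12
MB = 1e6


def resilver_hours(data_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to get through data_bytes at a sustained rate."""
    return data_bytes / rate_bytes_per_s / 3600


def implied_rate_mb_s(data_bytes: float, hours: float) -> float:
    """Average MB/s implied by getting through data_bytes in `hours`."""
    return data_bytes / (hours * 3600) / MB


used = 10 * TB
print(f"10 TB at a sustained 350 MB/s: {resilver_hours(used, 350 * MB):.1f} h")
for h in (24, 32):
    print(f"10 TB in {h} h: {implied_rate_mb_s(used, h):.0f} MB/s on average")
```

Under these assumptions, a sustained 350 MB/s would finish in roughly 8 hours, while 24-32 hours corresponds to an average closer to 90-115 MB/s. That suggests 350 MB/s is better read as a peak rate than as a sustained one.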
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
It looks like it was almost finished (99.55%) :) when you ran zpool status.

Code:
scan: resilver in progress since Sun May 14 00:02:59 2017
		13.4T scanned out of 13.5T at 3.33G/s, 0h0m to go
		80.1M resilvered, 99.55% done
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
80MB is the data that was wrong and got corrected. All the data was checked, i.e. 13.5TiB.
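
To put rough numbers on that, here is a small sketch using the figures from the status output in the first post. It assumes the T/G/M suffixes are binary units and that the reported rate is an average over the scan so far; the variable names are just for illustration.

```python
# Sanity check of the `zpool status` figures quoted in the first post.
# Assumptions: binary unit suffixes, and the rate is an average over the scan.
M, G, T = 1024**2, 1024**3, 1024**4

scanned = 13.4 * T      # "13.4T scanned"
scan_rate = 3.33 * G    # "at 3.33G/s"
resilvered = 80.1 * M   # "80.1M resilvered"

elapsed_s = scanned / scan_rate
print(f"time spent scanning so far: about {elapsed_s / 60:.0f} min")
print(f"written to the drive: {resilvered / scanned:.6%} of what was scanned")
```

The time estimate comes from the scanned amount and the scan rate, which works out to a bit over an hour here. The 80.1M figure is only what had to be written to the drive, so dividing it by the elapsed time says nothing about how long a full rebuild would take.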
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
That sounds more like a 'catch-up' resilver, not a full resilver. Did the drive drop out and then come back online? Better check it out.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
rs225 said:
That sounds more like a 'catch-up' resilver, not a full resilver. Did the drive drop out and then come back online? Better check it out.

That's certainly what it looks like. If you power cycle a drive or pull it and put it back, it'll resilver like that, and the 80MB is most likely the data that was written while the drive was not part of the array.

BUT why did it drop out?
 

zoomzoom

Guru
Joined
Sep 6, 2015
Messages
677
@Stux Great question. @Nomad have you seen any CAM status errors in the output during or after boot?
  • I know when I had issues with drives dropping out and resilvering, it was due to SATA cables intermittently failing and throwing CAM status errors (SilverStone CP11 gen 2's [black]; the gen 1's [blue] don't have this issue).
 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Code:
  pool: zedpm2.0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
		continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 25 01:24:15 2017
		6.95T scanned out of 13.0T at 3.45G/s, 0h29m to go
		173M resilvered, 53.48% done
config:

		NAME											STATE	 READ WRITE CKSUM
		zedpm2.0										ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			gptid/e5fbed04-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e6b18810-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e7288a04-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/e7ce827f-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0  (resilvering)
			gptid/e83bc95a-a8aa-11e3-ad86-20cf3007fa56  ONLINE	   0	 0	 0
			gptid/fe497ac2-aa41-11e3-aa5a-20cf3007fa56  ONLINE	   0	 0	 0


Same drive, happened again. Guess it's time to swap it out.
 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
Code:
[root@freenas] /mnt/zedpm2.0# smartctl -a /dev/ada4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Caviar Green (AF)
Device Model:	 WDC WD30EZRS-11J99B1
Serial Number:	WD-WCAWZ0396719
LU WWN Device Id: 5 0014ee 205aaa971
Firmware Version: 80.00A80
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:	Sun Jun 25 01:53:05 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
										was completed without error.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(50160) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 482) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x3035) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   197   134   021	Pre-fail  Always	   -	   7116
  4 Start_Stop_Count		0x0032   099   099   000	Old_age   Always	   -	   1144
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   039   039   000	Old_age   Always	   -	   45112
 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   200
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   118
193 Load_Cycle_Count		0x0032   001   001   000	Old_age   Always	   -	   1263643
194 Temperature_Celsius	 0x0022   123   080   000	Old_age   Always	   -	   29
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended captive	Interrupted (host reset)	  90%	 13392		 -
# 2  Extended offline	Aborted by host			   90%	 13392		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
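
For anyone skimming that output, here is a minimal sketch that pulls a few raw values out of the attribute table. The `WATCHED` set is an illustrative choice of attributes that are commonly looked at first (reallocated, pending and uncorrectable sectors, plus link CRC errors), not an official list. The sample rows are copied from the output above, and `raw_values` is a hypothetical helper name.

```python
# Minimal parser for the attribute table printed by `smartctl -a`.
# WATCHED is an illustrative selection, not an official list.
WATCHED = {5, 197, 198, 199}


def raw_values(smart_text: str) -> dict:
    """Map attribute ID -> (name, raw value) for the watched attributes."""
    found = {}
    for line in smart_text.splitlines():
        parts = line.split()
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attr_id = int(parts[0])
            if attr_id in WATCHED:
                found[attr_id] = (parts[1], int(parts[9]))
    return found


# Rows copied from the smartctl output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

for attr_id, (name, raw) in sorted(raw_values(sample).items()):
    print(f"{attr_id:>3} {name}: raw={raw}")
```

All four raw values are 0 in this output, which is worth knowing before deciding whether the drive itself is at fault.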

 

Nomad

Contributor
Joined
Oct 14, 2013
Messages
125
zoomzoom said:
@Stux Great question. @Nomad have you seen any CAM status errors in the output during or after boot?
  • I know when I had issues with drives dropping out and resilvering, it was due to SATA cables intermittently failing and throwing CAM status errors (SilverStone CP11 gen 2's [black]; the gen 1's [blue] don't have this issue).
How would I check?
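
One possible way to look, sketched under assumptions: on FreeBSD-based systems such as FreeNAS, kernel messages normally show up in the output of `dmesg` and in `/var/log/messages`, so filtering that text for "CAM status" should surface any such errors. The helper name is hypothetical and the sample lines are made up for illustration; they are not from this system.

```python
import re

# Hypothetical helper: pull CAM-related lines out of kernel log text.
CAM_PATTERN = re.compile(r"CAM status", re.IGNORECASE)


def cam_status_lines(log_text: str) -> list:
    """Return every line of log_text that mentions a CAM status."""
    return [line for line in log_text.splitlines() if CAM_PATTERN.search(line)]


# Illustrative sample only; real input would be the text of `dmesg`
# or the contents of /var/log/messages.
sample = """\
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
(ada4:ahcich4:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada4:ahcich4:0:0:0): Retrying command
"""

for line in cam_status_lines(sample):
    print(line)
```

To run it against the real log, read the file contents (for example with `open("/var/log/messages").read()`) and pass the text to `cam_status_lines`. An empty result would mean no CAM status errors were logged in that file.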
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479