Automatic Resilvering

Not open for further replies.

Simon Tiplady

Jun 19, 2015
While I was away I received an automated email from my FreeNAS server that it was resilvering a drive.
The volume z2 (ZFS) state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.

I hadn't noticed any issues before this, but being away meant I had little access, and I more or less forgot about it until today, when I was working remotely and realised I couldn't SSH into the server.

I took a look tonight and couldn't reach it over the local network either; the web GUI was also inaccessible. When I plugged a keyboard directly into the server, switched TTY with ALT+F2 and typed "root", it froze and nothing responded. The only option appeared to be the restart button on the case.

zpool status is showing:
# zpool status z2
  pool: z2
state: ONLINE
  scan: resilvered 30.7M in 0h24m with 0 errors on Sun Jan 29 05:04:48 2017

NAME                                            STATE     READ WRITE CKSUM
z2                                              ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/bbfac008-1e39-11e5-bcb6-74d435beb194  ONLINE       0     0     0
    gptid/bc6092b0-1e39-11e5-bcb6-74d435beb194  ONLINE       0     0     0
    gptid/bcc6dad2-1e39-11e5-bcb6-74d435beb194  ONLINE       0     0     0
    gptid/bd254b2e-1e39-11e5-bcb6-74d435beb194  ONLINE       0     0     0

errors: No known data errors

OK, so only 30.7M appears to have needed resilvering, but how can I find out what caused the resilver, and whether I should order some spare hard drives ready for a replacement?

From the Reporting page of the web GUI, it looks like the stats stopped recording at around 4:35am on Sun Jan 29, which is before the resilver appears to have finished. The stats are not stored on the z2 pool.

Simon Tiplady

Jun 19, 2015
Please provide hardware details as per the forum rules. You're more likely to get responses by providing them.
Build FreeNAS-9.10.2-U1 (86c7ef5)
Platform Intel(R) Pentium(R) CPU G3240 @ 3.10GHz
Memory 8047MB

Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z97M-DS3H

Version: Intel(R) Pentium(R) CPU G3240 @ 3.10GHz

Drives from camcontrol devlist. The first drive was a temporary drive I used to test FreeNAS before I bought the other drives and set up the ZFS pool. I didn't realise I still had the old Seagate drive in there; I thought it had already been swapped out. It will be interesting to see if that was the problem drive.
<WDC WD10EFRX-68PJCN0 82.00A82>	at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD30EFRX-68EUZN0 82.00A82>	at scbus1 target 0 lun 0 (pass1,ada1)
<ST33000651AS \271\006\010???>	 at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD30EFRX-68EUZN0 82.00A82>	at scbus3 target 0 lun 0 (pass3,ada3)
<WDC WD30EFRX-68EUZN0 82.00A82>	at scbus4 target 0 lun 0 (pass4,ada4)

The GUI shows raidz2 as the RAID type; 4 x 3TB drives give me about 4TB of space.

Robert Trevellyan

Pony Wrangler
May 16, 2014
The good news is that the pool looks healthy right now. I'd hazard a guess that a spontaneous resilver might be caused by a drive becoming unresponsive for a while, then coming back online.

Probably time to have a look at your SMART data using smartctl -x for each device.
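
For anyone following along, here is a rough sketch of running that across every disk in one go. The device names are an assumption taken from the camcontrol devlist output earlier in the thread, and smart_report_all is just a made-up helper name; adjust both for your own system.

```shell
# Print the full SMART report for each device name given as an argument.
# Device names (e.g. ada0) are assumptions based on the camcontrol listing.
smart_report_all() {
    for dev in "$@"; do
        # Separator line so the reports are easy to tell apart
        echo "===== /dev/${dev} ====="
        smartctl -x "/dev/${dev}"
    done
}

# Example (run as root), saving everything to one file for later reading:
#   smart_report_all ada0 ada1 ada2 ada3 ada4 > /tmp/smart_all.txt
```

Saving the output to a file makes it easier to compare drives side by side than scrolling through the console.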

Simon Tiplady

Jun 19, 2015
Probably time to have a look at your SMART data using smartctl -x for each device.

That seems to produce a lot of info. This is just one device; what should I be looking out for?

# smartctl -x /dev/ada0

smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke,

Model Family:	 Western Digital Red
Device Model:	 WDC WD10EFRX-68PJCN0
Serial Number:	WD-WCC4J6YPXNKT
LU WWN Device Id: 5 0014ee 2b6618fc5
Firmware Version: 82.00A82
User Capacity:	1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Wed Feb  1 07:33:54 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (13320) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 152) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   141   139   021    -    3908
  4 Start_Stop_Count        -O--CK   100   100   000    -    22
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   081   081   000    -    14210
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    21
192 Power-Off_Retract_Count -O--CK   200   200   000    -    9
193 Load_Cycle_Count        -O--CK   200   200   000    -    446
194 Temperature_Celsius     -O---K   106   093   000    -    37
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
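
As a rough guide to what to look for in the table above: the raw values of Reallocated_Sector_Ct (5), Current_Pending_Sector (197), Offline_Uncorrectable (198) and UDMA_CRC_Error_Count (199) are the ones usually checked first, and on this drive all four are 0. Below is a small sketch for pulling just those rows out of saved smartctl -x output. flag_smart_attrs is a made-up helper name, and treating any non-zero raw value as worth a closer look is a rule of thumb I am assuming, not a smartmontools feature.

```shell
# Read smartctl -x output on stdin and print the four attributes above,
# marking any whose raw value (8th column in the -x layout) is non-zero.
flag_smart_attrs() {
    awk '
        # Attribute rows: numeric ID in column 1, attribute name in column 2.
        # The name check keeps unrelated rows that start with 5 out of the match.
        NF >= 8 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) && $2 ~ /^[A-Za-z]/ {
            status = ($8 + 0 > 0) ? "CHECK" : "ok"
            printf "%-24s raw=%s %s\n", $2, $8, status
        }
    '
}

# Example (run as root):
#   smartctl -x /dev/ada0 | flag_smart_attrs
```

A drive that passes this quick check can still be the culprit, so it is only a first filter before reading the error and self-test logs.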

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  6  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  SATA NCQ Queued Error log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS	  16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS	   1  Device vendor specific log
0xbd	   GPL,SL  VS	   1  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL	 VS	  93  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 14203		 -
# 2  Short offline	   Completed without error	   00%	 14179		 -
# 3  Extended offline	Completed without error	   00%	 14160		 -
# 4  Short offline	   Completed without error	   00%	 14155		 -
# 5  Short offline	   Completed without error	   00%	 14131		 -
# 6  Short offline	   Completed without error	   00%	 14107		 -
# 7  Short offline	   Completed without error	   00%	 14083		 -
# 8  Short offline	   Completed without error	   00%	 14059		 -
# 9  Short offline	   Completed without error	   00%	 14035		 -
#10  Short offline	   Completed without error	   00%	 14011		 -
#11  Extended offline	Completed without error	   00%	 13992		 -
#12  Short offline	   Completed without error	   00%	 13987		 -
#13  Short offline	   Completed without error	   00%	 13963		 -
#14  Short offline	   Completed without error	   00%	 13939		 -
#15  Short offline	   Completed without error	   00%	 13915		 -
#16  Short offline	   Completed without error	   00%	 13891		 -
#17  Short offline	   Completed without error	   00%	 13867		 -
#18  Short offline	   Completed without error	   00%	 13843		 -

SMART Selective self-test log data structure revision number 1
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   258 (0x0102)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					37 Celsius
Power Cycle Min/Max Temperature:	 28/42 Celsius
Lifetime	Min/Max Temperature:	 19/50 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:	 2
Temperature Sampling Period:		 1 minute
Temperature Logging Interval:		1 minute
Min/Max recommended Temperature:	  0/60 Celsius
Min/Max Temperature Limit:		   -41/85 Celsius
Temperature History Size (Index):	478 (176)

Index	Estimated Time   Temperature Celsius
177	2017-01-31 23:36	37  ******************
...	..(202 skipped).	..  ******************
380	2017-02-01 02:59	37  ******************
381	2017-02-01 03:00	38  *******************
...	..( 24 skipped).	..  *******************
406	2017-02-01 03:25	38  *******************
407	2017-02-01 03:26	37  ******************
...	..(154 skipped).	..  ******************
  84	2017-02-01 06:01	37  ******************
  85	2017-02-01 06:02	38  *******************
...	..(  4 skipped).	..  *******************
  90	2017-02-01 06:07	38  *******************
  91	2017-02-01 06:08	39  ********************
...	..( 14 skipped).	..  ********************
106	2017-02-01 06:23	39  ********************
107	2017-02-01 06:24	38  *******************
...	..( 35 skipped).	..  *******************
143	2017-02-01 07:00	38  *******************
144	2017-02-01 07:01	37  ******************
...	..( 31 skipped).	..  ******************
176	2017-02-01 07:33	37  ******************

SCT Error Recovery Control:
		  Read:	 70 (7.0 seconds)
		  Write:	 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x0001  2			0  Command failed due to ICRC error
0x0002  2			0  R_ERR response for data FIS
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0005  2			0  R_ERR response for non-data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS
0x0008  2			0  Device-to-host non-data FIS retries
0x0009  2		   30  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2		   30  Device-to-host register FISes sent due to a COMRESET
0x000b  2			0  CRC errors within host-to-device FIS
0x000f  2			0  R_ERR response for host-to-device data FIS, CRC
0x0012  2			0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4	  2402075  Vendor specific
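
One way to skim this last table is to list only the counters that are non-zero. Below is a minimal sketch; nonzero_phy_counters is a hypothetical helper name, and skipping the 0x8000 vendor-specific row is my own choice. As far as I know, non-zero values here are not necessarily a fault, since link resets also happen at power-up, so the trend over time probably matters more than a single reading.

```shell
# Read smartctl -x output on stdin and print SATA Phy event counters
# whose value (3rd column) is greater than zero.
nonzero_phy_counters() {
    awk '
        # Counter rows start with a four-digit hex ID such as 0x0009.
        # IDs from 0x8000 upward are vendor specific and are skipped here.
        $1 ~ /^0x[0-7][0-9a-f][0-9a-f][0-9a-f]$/ && $3 + 0 > 0 {
            # Rebuild the description from the remaining fields
            desc = ""
            for (i = 4; i <= NF; i++) desc = desc (i > 4 ? " " : "") $i
            printf "%s %s %s\n", $1, $3, desc
        }
    '
}

# Example (run as root):
#   smartctl -x /dev/ada0 | nonzero_phy_counters
```

Running it again after a few days of uptime and comparing the numbers would show whether the counters are still climbing.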