
Drives unavail after update

Status: Not open for further replies.

othnin (Cadet; joined Feb 6, 2014; 7 messages)
Hi,
I upgraded from 9.3 to 9.10 by creating a new USB install stick from the ISO and then going through the installation/upgrade process. When I get into the GUI, it shows that my storage is degraded. I took a look from the CLI, and here's what I see:
Code:
[root@elysium] ~# zpool status -v
  pool: Lyceum_Volume
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub in progress since Fri Mar 31 10:03:30 2017
        21.9G scanned out of 8.42T at 121M/s, 20h13m to go
        0 repaired, 0.25% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Lyceum_Volume                                   DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/662623c3-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            gptid/66c6668f-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            7655017483856963116                         UNAVAIL      0     0     0  was /dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b
            gptid/6860bb9c-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            gptid/6900d3e8-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            5236701102348180652                         UNAVAIL      0     0     0  was /dev/gptid/699d0a61-b553-11e3-9b2b-d850e64ec83b

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da6p2       ONLINE       0     0     0

errors: No known data errors


[root@elysium] ~# camcontrol devlist
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 4 lun 0 (pass2,da2)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 5 lun 0 (pass3,da3)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 6 lun 0 (pass4,da4)
<ATA WDC WD30EFRX-68E 0A82>        at scbus0 target 7 lun 0 (pass5,da5)
<SanDisk Cruzer 1.27>              at scbus8 target 0 lun 0 (pass6,da6)


[root@elysium] ~# gpart show
=>        34  5860533101  da0  GPT  (2.7T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)

=>        34  5860533101  da1  GPT  (2.7T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)

=>        34  5860533101  da2  GPT  (2.7T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)

=>        34  5860533101  da3  GPT  (2.7T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)

=>        34  5860533101  da5  GPT  (2.7T)
          34           6       - free -  (3.0K)
          40        1024    1  bios-boot  (512K)
        1064  5860532064    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)

=>      34  15633341  da6  GPT  (7.5G)
        34      1024    1  bios-boot  (512K)
      1058         6       - free -  (3.0K)
      1064  15632304    2  freebsd-zfs  (7.5G)
  15633368         7       - free -  (3.5K)

[root@elysium] ~# glabel status
                                      Name  Status  Components
gptid/662623c3-b553-11e3-9b2b-d850e64ec83b     N/A  da0p2
gptid/66c6668f-b553-11e3-9b2b-d850e64ec83b     N/A  da1p2
gptid/6860bb9c-b553-11e3-9b2b-d850e64ec83b     N/A  da2p2
gptid/6900d3e8-b553-11e3-9b2b-d850e64ec83b     N/A  da3p2
gptid/a8a756e5-1504-11e7-962d-d850e64ec83b     N/A  da5p1
gptid/a8b41233-1504-11e7-962d-d850e64ec83b     N/A  da5p2
gptid/1056f1ff-161f-11e7-9a38-d850e64ec83b     N/A  da6p1

[root@elysium] ~# zpool online Lyceum_Volume /dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b
warning: device '/dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present


It seems unlikely that I lost two drives in an upgrade. Any ideas on how, or whether, I can bring them back online?

thanks.
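A minimal sketch of the cross-check being done by hand in this thread: the "was /dev/gptid/..." paths that zpool status prints for UNAVAIL members can be looked up in glabel status, and a label that no partition carries means no attached disk currently presents it. The text is copied from the output above; the function names are hypothetical, and the parsing assumes the FreeBSD 10-era output format shown here.

```python
# Cross-reference pool members from `zpool status` with `glabel status` (sketch).
# Assumes leaf vdevs are gptid/... labels, or bare GUIDs followed by
# "was /dev/gptid/..." when ZFS can no longer open the device.

ZPOOL_CONFIG = """\
        Lyceum_Volume                                   DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/662623c3-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            gptid/66c6668f-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            7655017483856963116                         UNAVAIL      0     0     0  was /dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b
            gptid/6860bb9c-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            gptid/6900d3e8-b553-11e3-9b2b-d850e64ec83b  ONLINE       0     0     0
            5236701102348180652                         UNAVAIL      0     0     0  was /dev/gptid/699d0a61-b553-11e3-9b2b-d850e64ec83b
"""

GLABEL_STATUS = """\
                                      Name  Status  Components
gptid/662623c3-b553-11e3-9b2b-d850e64ec83b     N/A  da0p2
gptid/66c6668f-b553-11e3-9b2b-d850e64ec83b     N/A  da1p2
gptid/6860bb9c-b553-11e3-9b2b-d850e64ec83b     N/A  da2p2
gptid/6900d3e8-b553-11e3-9b2b-d850e64ec83b     N/A  da3p2
gptid/a8a756e5-1504-11e7-962d-d850e64ec83b     N/A  da5p1
gptid/a8b41233-1504-11e7-962d-d850e64ec83b     N/A  da5p2
gptid/1056f1ff-161f-11e7-9a38-d850e64ec83b     N/A  da6p1
"""


def pool_members(config_text):
    """Return (label, state) for every leaf device in a zpool status config block."""
    members = []
    for line in config_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("gptid/"):
            members.append((fields[0], fields[1]))
        elif fields[0].isdigit() and "was" in fields:
            # ZFS shows the vdev GUID plus the path it last used for that member.
            old_path = fields[fields.index("was") + 1]
            members.append((old_path.removeprefix("/dev/"), fields[1]))
    return members


def glabel_components(glabel_text):
    """Map each gptid label to the partition that currently carries it, e.g. da0p2."""
    components = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            components[fields[0]] = fields[2]
    return components


def cross_reference(config_text, glabel_text):
    """Pair each pool member with its partition, or None if no disk carries the label."""
    components = glabel_components(glabel_text)
    return [(label, state, components.get(label))
            for label, state in pool_members(config_text)]


if __name__ == "__main__":
    for label, state, partition in cross_reference(ZPOOL_CONFIG, GLABEL_STATUS):
        print(f"{label}  {state:<8}  {partition or '(no disk carries this label)'}")
```

Against this thread's output, the four ONLINE members resolve to da0p2 through da3p2 and both UNAVAIL labels resolve to nothing, which only says the old labels are gone, not why.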
 

joeschmuck (Old Man, Moderator; joined May 28, 2011; 9,557 messages)
While it may seem like a coincidence to have two drive failures, is it possible this existed before your upgrade?

Regardless, I would make sure you have a backup of your important data, because your pool is close to complete failure. I'd then try to online one drive and see where that takes you. There is more to try, but let's see what happens first.
 

othnin (Cadet; joined Feb 6, 2014; 7 messages)
I don't think this was an issue before the upgrade. My USB stick was acting crappy, and that was the impetus to update the system. In the GUI there is no Online button for the drives, only Replace. When I run zpool online from the CLI, I get:
Code:
[root@elysium] ~# zpool online Lyceum_Volume /dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b
warning: device '/dev/gptid/ab7c0ab6-f2d4-11e4-b2ee-d850e64ec83b' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
 

othnin (Cadet; joined Feb 6, 2014; 7 messages)
I did some poking around the system, trying to figure out which drives are failing. Nothing in the GUI points to the failing drives except the error traffic light in the upper-right corner. It's pretty troubling in itself that the GUI can't tell me anything more about this. Is this standard behavior?
At the CLI, zpool status shows the pool as degraded, as I showed earlier. glabel status shows the following:
Code:
[root@elysium] ~# glabel status
                                      Name  Status  Components
gptid/662623c3-b553-11e3-9b2b-d850e64ec83b     N/A  da0p2
gptid/66c6668f-b553-11e3-9b2b-d850e64ec83b     N/A  da1p2
gptid/6860bb9c-b553-11e3-9b2b-d850e64ec83b     N/A  da2p2
gptid/6900d3e8-b553-11e3-9b2b-d850e64ec83b     N/A  da3p2
gptid/a8a756e5-1504-11e7-962d-d850e64ec83b     N/A  da5p1
gptid/a8b41233-1504-11e7-962d-d850e64ec83b     N/A  da5p2
gptid/1056f1ff-161f-11e7-9a38-d850e64ec83b     N/A  da6p1

The IDs of the failing drives aren't shown here, so by process of elimination it looks like da5 is failing. But I don't see da4 in this list at all. It does show up with the following:
Code:
[root@elysium] ~# smartctl -a /dev/da4
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T0857780
LU WWN Device Id: 5 0014ee 2086dcc0a
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Apr  7 13:30:54 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


So is this drive failing also?
Does it seem like replacing da4 and da5 is correct? Have I dropped the ball somewhere?

Thanks.
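The process of elimination described in this post can be sketched as a set difference: list every da disk from camcontrol devlist, drop the disks whose partitions back ONLINE pool members, and show which labelled partitions (if any) the remaining disks carry. The device list and glabel components are copied from the first post; the function names are hypothetical.

```python
import re

CAMCONTROL_DEVLIST = """\
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 4 lun 0 (pass2,da2)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 5 lun 0 (pass3,da3)
<ATA WDC WD30EFRX-68A 0A80>        at scbus0 target 6 lun 0 (pass4,da4)
<ATA WDC WD30EFRX-68E 0A82>        at scbus0 target 7 lun 0 (pass5,da5)
<SanDisk Cruzer 1.27>              at scbus8 target 0 lun 0 (pass6,da6)
"""

# "Components" column of glabel status in this thread.
GLABEL_COMPONENTS = ["da0p2", "da1p2", "da2p2", "da3p2", "da5p1", "da5p2", "da6p1"]

# Partitions behind the four ONLINE members of Lyceum_Volume (zpool status + glabel status).
ONLINE_MEMBER_PARTITIONS = {"da0p2", "da1p2", "da2p2", "da3p2"}


def da_disks(devlist_text):
    """Map daN -> model string from camcontrol devlist output."""
    disks = {}
    for line in devlist_text.splitlines():
        model = re.match(r"<([^>]*)>", line)
        periphs = re.search(r"\(([^)]*)\)\s*$", line)
        if not (model and periphs):
            continue
        for name in periphs.group(1).split(","):
            if re.fullmatch(r"da\d+", name):
                disks[name] = model.group(1)
    return disks


def disk_of(partition):
    """Strip the partition suffix: 'da5p2' -> 'da5'."""
    return re.sub(r"p\d+$", "", partition)


def unaccounted_disks(devlist_text, labelled_partitions, online_partitions):
    """Disks that back no ONLINE pool member, with the labelled partitions they carry."""
    in_pool = {disk_of(p) for p in online_partitions}
    report = {}
    for disk, model in sorted(da_disks(devlist_text).items(), key=lambda kv: int(kv[0][2:])):
        if disk in in_pool:
            continue
        parts = sorted(p for p in labelled_partitions if disk_of(p) == disk)
        report[disk] = (model, parts)
    return report


if __name__ == "__main__":
    report = unaccounted_disks(CAMCONTROL_DEVLIST, GLABEL_COMPONENTS, ONLINE_MEMBER_PARTITIONS)
    for disk, (model, parts) in report.items():
        print(f"{disk}: {model}; labelled partitions: {', '.join(parts) or 'none'}")
```

On this data it leaves da4 (no labelled partitions at all), da5 (labels, but not the ones the pool is looking for) and the USB stick, which lines up with the two suspects raised in this post.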
 

Ericloewe (Not-very-passive-but-aggressive, Moderator; joined Feb 15, 2014; 18,084 messages)

othnin (Cadet; joined Feb 6, 2014; 7 messages)
Here is the full output:
Code:
[root@elysium] ~# smartctl -a /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0H87M9W
LU WWN Device Id: 5 0014ee 60546961b
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Apr  7 13:31:02 2017 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(39840) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 399) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   192   177   021    Pre-fail  Always       -       5400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15421
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       28
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       26
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       398
194 Temperature_Celsius     0x0022   123   111   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
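A short sketch of how the attribute table in this da5 output could be screened, assuming the column layout smartctl 6.5 prints here. The watch list (reallocated, pending, offline-uncorrectable and UDMA CRC counts) is a common rule of thumb chosen for illustration, not a threshold set defined by smartctl or FreeNAS. Clean attributes do not by themselves prove a drive healthy, and no self-test had been logged on this disk yet.

```python
# Screen a smartctl -a attribute table (sketch; column layout assumed from smartctl 6.5).

ATTRIBUTE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   192   177   021    Pre-fail  Always       -       5400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15421
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       28
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       26
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       398
194 Temperature_Celsius     0x0022   123   111   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
"""

# Rule-of-thumb watch list (an assumption for this sketch): non-zero raw counts
# here usually deserve a closer look.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count"}


def parse_attributes(table_text):
    """Return {name: {...}} for each numbered attribute row."""
    rows = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or non-attribute line
        rows[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": fields[9],  # first token only; some raw values carry extra text
        }
    return rows


def concerns(rows):
    """List attributes smartctl marked as failed, or watched counters that are non-zero."""
    found = []
    for name, row in rows.items():
        if row["when_failed"] != "-":
            found.append(f"{name}: WHEN_FAILED={row['when_failed']}")
        elif name in WATCHED and row["raw"].isdigit() and int(row["raw"]) > 0:
            found.append(f"{name}: raw value {row['raw']}")
    return found


if __name__ == "__main__":
    issues = concerns(parse_attributes(ATTRIBUTE_TABLE))
    print("\n".join(issues) if issues else "no watched attribute is non-zero")
```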
 

Ericloewe (Not-very-passive-but-aggressive, Moderator; joined Feb 15, 2014; 18,084 messages)
Yeah, it's hard to reach conclusions without any tests.
 

othnin (Cadet; joined Feb 6, 2014; 7 messages)
The short test on da4:
Code:
[root@elysium ~]# smartctl -l selftest /dev/da4
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     15421         -
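For reading results like this one, a minimal parser for the smartctl -l selftest table might look like the sketch below. It assumes the column layout shown above and that smartctl numbers the most recent test "# 1"; the regex and function names are illustrative. A short test covers much less of the disk than an extended one (399 minutes recommended for da5, per its smartctl -a output).

```python
import re

SELFTEST_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     15421         -
"""

# One log entry: number, description, status, remaining %, lifetime hours, first error LBA.
ENTRY = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)


def parse_selftests(log_text):
    """Return one dict per log entry; header and other lines are ignored."""
    entries = []
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entry = m.groupdict()
            entry["num"] = int(entry["num"])
            entry["hours"] = int(entry["hours"])
            entries.append(entry)
    return entries


def latest_selftest(log_text):
    """Return the newest entry (assumed to be '# 1'), or None if none were logged."""
    entries = parse_selftests(log_text)
    return min(entries, key=lambda e: e["num"]) if entries else None


if __name__ == "__main__":
    latest = latest_selftest(SELFTEST_LOG)
    if latest is None:
        print("no self-tests logged")
    else:
        ok = latest["status"] == "Completed without error"
        print(f"{latest['desc']} at {latest['hours']} h: {latest['status']} ({'ok' if ok else 'check'})")
```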


da5, the one I'm "pretty sure" is failing, shows the following when I try to run the test:
Code:
[root@elysium ~]# smartctl -t short /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da5: Unknown USB bridge [0x0781:0x5530 (0x127)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
 

Robert Trevellyan (Pony Wrangler; joined May 16, 2014; 3,778 messages)
/dev/da5: Unknown USB bridge
This caught my eye.

Your original gpart show output lists six drives, two of which have FreeNAS installed, and one of those two (da5) is a 3TB drive. So it looks like you lost one of your pool devices to a mistake during installation or boot-device mirroring. There was no trace of da4, which kinda suggests that drive is blank.
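The reading of gpart show in this post (a 3TB disk with the same bios-boot + freebsd-zfs layout as the boot stick, and a disk that gpart show doesn't list at all) can be reproduced mechanically. A minimal sketch, assuming the layout heuristic holds for this particular system only; the output is copied from the first post and the function names are hypothetical. A disk absent from gpart show has no partition scheme gpart recognizes, which is consistent with a blank drive but does not prove it.

```python
# Classify disks by the partition layout `gpart show` reports (sketch).
# Heuristic assumed from this system only: pool members carry freebsd-swap +
# freebsd-zfs, while the boot stick (da6) carries bios-boot + freebsd-zfs.

DATA_DISK = """\
=>        34  5860533101  {disk}  GPT  (2.7T)
          34          94       - free -  (47K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)
"""

# da0-da3 printed identical layouts, so they are generated from one template.
GPART_SHOW = "".join(DATA_DISK.format(disk=d) for d in ("da0", "da1", "da2", "da3")) + """\
=>        34  5860533101  da5  GPT  (2.7T)
          34           6       - free -  (3.0K)
          40        1024    1  bios-boot  (512K)
        1064  5860532064    2  freebsd-zfs  (2.7T)
  5860533128           7       - free -  (3.5K)
=>      34  15633341  da6  GPT  (7.5G)
        34      1024    1  bios-boot  (512K)
      1058         6       - free -  (3.0K)
      1064  15632304    2  freebsd-zfs  (7.5G)
  15633368         7       - free -  (3.5K)
"""

# Every da disk listed by camcontrol devlist in the first post.
ALL_DISKS = ["da0", "da1", "da2", "da3", "da4", "da5", "da6"]


def parse_gpart(text):
    """Return {disk: {"size": ..., "types": [...]}} from gpart show output."""
    layouts, current = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "=>":
            current = fields[3]
            layouts[current] = {"size": fields[5].strip("()"), "types": []}
        elif current and len(fields) >= 4 and fields[2].isdigit():
            # Numbered rows are partitions; "- free -" rows have "-" here and are skipped.
            layouts[current]["types"].append(fields[3])
    return layouts


def classify(types):
    if "bios-boot" in types:
        return "boot-device layout"
    if "freebsd-swap" in types and "freebsd-zfs" in types:
        return "pool-member layout"
    return "other"


def survey(disks, gpart_text):
    """Map each disk to (size, layout class), or (None, ...) if gpart show omits it."""
    layouts = parse_gpart(gpart_text)
    return {d: (layouts[d]["size"], classify(layouts[d]["types"])) if d in layouts
            else (None, "not listed by gpart show")
            for d in disks}


if __name__ == "__main__":
    for disk, (size, kind) in survey(ALL_DISKS, GPART_SHOW).items():
        print(f"{disk}: {size or '-'}  {kind}")
```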

I suggest you begin by following @joeschmuck's advice to back up your data, because your pool currently has no redundancy. Then carefully and positively identify each drive that's active in the degraded pool, and remove and wipe the 3TB drive, which isn't meant to be part of the boot pool. Use that drive to replace one missing pool device and restore some redundancy. Then proceed with further troubleshooting.
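One way to make the identification step less error-prone is to record each device's model and serial number at the moment you look at it, because FreeBSD daN names reflect the order devices are detected and are not guaranteed to stay fixed (one reason FreeNAS builds pools on gptid labels). In this thread, /dev/da5 returned a WD Red serial in one post and a USB bridge error in a later one; USB vendor ID 0x0781 belongs to SanDisk, the maker of the Cruzer stick. A sketch that flags such a name; the captures are copied from the thread, and the helper names are hypothetical. Matching a serial to the label on the physical drive is still a manual step.

```python
import re

# (device, source, smartctl output excerpt), copied from posts in this thread.
CAPTURES = [
    ("/dev/da4", "smartctl -a",
     "Device Model:     WDC WD30EFRX-68AX9N0\nSerial Number:    WD-WCC1T0857780\n"),
    ("/dev/da5", "smartctl -a",
     "Device Model:     WDC WD30EFRX-68EUZN0\nSerial Number:    WD-WMC4N0H87M9W\n"),
    ("/dev/da5", "smartctl -t short (later post)",
     "/dev/da5: Unknown USB bridge [0x0781:0x5530 (0x127)]\n"
     "Please specify device type with the -d option.\n"),
]

INFO_FIELD = re.compile(r"^(Device Model|Serial Number):\s+(.+?)\s*$", re.MULTILINE)
USB_BRIDGE = re.compile(r"Unknown USB bridge \[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)")


def identify(output):
    """Extract model/serial, or the USB vendor:product pair if smartctl stopped at a bridge."""
    bridge = USB_BRIDGE.search(output)
    if bridge:
        return {"USB bridge": f"{bridge.group(1)}:{bridge.group(2)}"}
    return dict(INFO_FIELD.findall(output))


def names_with_changed_identity(captures):
    """Device names that resolved to more than one identity across captures."""
    seen = {}
    for device, _source, output in captures:
        seen.setdefault(device, set()).add(tuple(sorted(identify(output).items())))
    return sorted(dev for dev, ids in seen.items() if len(ids) > 1)


if __name__ == "__main__":
    for device, source, output in CAPTURES:
        print(f"{device} ({source}): {identify(output)}")
    print("re-check before acting:", names_with_changed_identity(CAPTURES))
```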
 