Unable to Offline Disk: Solved

Status
Not open for further replies.

Jerami1981

Dabbler
Joined
Jan 4, 2018
Messages
32
I have a failing HDD. When I try to offline the disk, I get an error saying "[MiddlewareError: b'Disk offline failed: "cannot offline gptid/a040ac5b-dd2d-11e7-9393-782bcb2ca227: no valid replicas, "']"

How do I go about prepping the system so I can replace this disk?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
You need to provide all details, including model numbers of all components. In addition to that, we need to see the output of zpool status.
 

Jerami1981

Dabbler
Joined
Jan 4, 2018
Messages
32
Dell R510
Dual Xeon X5650 2.67 GHz Hexacore
64 GB DDR3 ECC RAM
Mix of drives. Mostly 4TB Seagate drives, with a few WD Reds mixed in if I recall correctly.
Mellanox 10G Fiber card

https://pastebin.com/pV514QQ5
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
How are your drives connected? What is your HBA?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Your issue is that one disk is already missing:
Code:
3386550101567640421  UNAVAIL  0  0  0  was /dev/gptid/9ee95243-dd2d-11e7-9393-782bcb2ca227

And now you're trying to remove another drive:
a040ac5b-dd2d-11e7-9393-782bcb2ca227
You need to replace the missing disk (9ee95243....) and fully resilver before you can remove a040ac5b.... or any other drive. If you force it, you will lose the entire array.
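
The check behind the "no valid replicas" error can be sketched roughly as follows. This is only an illustration, assuming a single-vdev pool and device rows laid out like the `zpool status` line quoted above; the names `count_missing` and `safe_to_offline` are made up for this example and are not part of ZFS or FreeNAS.

```python
# States in which a member disk is not contributing to the vdev.
MISSING_STATES = {"UNAVAIL", "OFFLINE", "FAULTED", "REMOVED"}


def count_missing(status_text):
    """Count device rows whose STATE column shows the disk is missing."""
    missing = 0
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [note...]
        if len(fields) >= 5 and fields[1] in MISSING_STATES:
            missing += 1
    return missing


def safe_to_offline(status_text, parity=1):
    """True only if one more disk can go away without exceeding parity.

    parity=1 corresponds to RAIDZ1, parity=2 to RAIDZ2 (assumed inputs).
    """
    return count_missing(status_text) < parity


# Trimmed sample built from the rows quoted in this thread.
sample = """
raidz1-0 DEGRADED 0 0 0
gptid/9735eeb9-dd2d-11e7-9393-782bcb2ca227 ONLINE 0 0 0
3386550101567640421 UNAVAIL 0 0 0 was /dev/gptid/9ee95243-dd2d-11e7-9393-782bcb2ca227
gptid/a040ac5b-dd2d-11e7-9393-782bcb2ca227 ONLINE 0 0 0
"""

print(safe_to_offline(sample, parity=1))
```

With one member already UNAVAIL, a RAIDZ1 vdev has no parity left to spend, which is why offlining a040ac5b.... is refused.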
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
It really is better to keep things on the forums--there's very rarely a need to point to external sites for text or image hosting, and it just makes your posts harder to read.

The problem, as noted above, is that you already have one disk offline, and your 12 x 4 TB RAIDZ1 pool only has one disk's redundancy. Taking another disk offline will lose the pool. You need to fix the disk that's already offline before trying to remove another one.

Once your pool is healthy, you should seriously re-evaluate your pool configuration. RAIDZ1 isn't recommended for much of anything any more, and 12-disk-wide vdevs further increase your risk (and decrease your performance).
 

Jerami1981

Dabbler
Joined
Jan 4, 2018
Messages
32
When pasting an output, is there something other than simply dumping it right into the standard chat box that I should use?

I am fairly certain the HBA is a Perc H310 flashed to IT mode.

How do I go about figuring out which disk is offline already? Just watch the activity lights on the server? I assume I need to power the system down when I do ANY disk replacements?

Unfortunately, given the smaller size of these disks, RAIDZ1 was the only way to avoid already being out of space. My backup NAS server is RAIDZ2 with 8TB disks, but I haven't been able to afford to upgrade the primary from 4's to 8's yet. So for now I just need to keep it limping along.

Edit: I believe I figured out the first offline disk. When I go to shut down the server, it gives me a warning that a scrub/resilver is currently running. Do I need to wait for this to finish before shutting down, or should I go ahead and shut down anyway?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
When pasting an output, is there something other than simply dumping it right into the standard chat box that I should use?
Use the code tags in the editor. It will keep it readable and keep the formatting from the terminal output.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I assume I need to power the system down when I do ANY disk replacements?
If your hardware supports hot-swap, FreeNAS does as well, so you may not need to power down the system--but it's probably safer to do so.
How do I go about figuring out which disk is offline already?
Process of elimination always works, even if it's a pain--note the serial numbers of the disks currently in the pool (the View Disks page shows them), check your existing disks, and the one whose serial isn't in the pool is your culprit.
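
The process of elimination amounts to a set difference. A minimal sketch, assuming you have typed both lists in by hand; the serial numbers below are invented placeholders, and `find_culprits` is a hypothetical helper, not a FreeNAS feature.

```python
def find_culprits(pool_serials, chassis_serials):
    """Return serials that are physically installed but not listed in the pool."""
    return sorted(set(chassis_serials) - set(pool_serials))


# Placeholder serials, for illustration only.
pool_serials = ["SER-0001", "SER-0002", "SER-0004"]                  # from the View Disks page
chassis_serials = ["SER-0001", "SER-0002", "SER-0003", "SER-0004"]  # read off the drive labels

print(find_culprits(pool_serials, chassis_serials))
```

Recording the bay next to each serial while you do this saves repeating the exercise the next time a disk drops out.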
 

Jerami1981

Dabbler
Joined
Jan 4, 2018
Messages
32
I have now documented all serial numbers and locations. I replaced the disk that was completely offline. I powered the system back on and it appears to be doing another scrub. Will the new disk get resilvered in once the scrub is done, or will I need to start that manually?

Zpool status now shows:
Code:
[root@freenas ~]# zpool status
  pool: FreeNasStorage
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub in progress since Sat Jul 21 11:18:10 2018
        693G scanned out of 30.3T at 321M/s, 26h53m to go
        0 repaired, 2.23% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        FreeNasStorage                                  DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/9735eeb9-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/98976049-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/99d5d6e9-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/9ad2b7c0-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/9bf206f5-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/9d57bc7a-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            3386550101567640421                         UNAVAIL      0     0     0  was /dev/gptid/9ee95243-dd2d-11e7-9393-782bcb2ca227
            gptid/a040ac5b-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/a1954532-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/a2e25b7b-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/a45475bb-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0
            gptid/a596ae03-dd2d-11e7-9393-782bcb2ca227  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Sat Jul 21 03:46:12 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da14p2    ONLINE       0     0     0
            da15p2    ONLINE       0     0     0

errors: No known data errors

Should I be worried about that disk still showing "unavailable", or is that typical until after it has been resilvered? I did notice the new drive's serial number shows up in my View Disks mode, so it appears to be recognizing the disk at least.

Edit: I figured out how to bring the new disk online, and it appears to be doing the resilver process. I will check back in a day or two when it finishes.
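
For anyone curious how the scan line in the output above hangs together, the percentage and time remaining can be roughly reproduced from the three figures it reports. This is a back-of-the-envelope sketch, assuming the G/T/M suffixes are binary units (powers of 1024) and that the rate stays constant; `to_bytes` and `scrub_estimate` are names invented for this example.

```python
# Binary unit multipliers (assumption: zpool reports in powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a figure such as '30.3T' or '321M' to a byte count."""
    return float(text[:-1]) * UNITS[text[-1]]


def scrub_estimate(scanned, total, rate_per_second):
    """Return (percent done, seconds remaining) at a constant scan rate."""
    done = to_bytes(scanned)
    size = to_bytes(total)
    speed = to_bytes(rate_per_second)
    percent = 100 * done / size
    seconds_left = (size - done) / speed
    return percent, seconds_left


# Figures taken from the scan line in the zpool status output.
percent, seconds_left = scrub_estimate("693G", "30.3T", "321M")
hours, remainder = divmod(int(seconds_left), 3600)
print(f"{percent:.2f}% done, about {hours}h{remainder // 60}m to go")
```

The result lands close to what zpool itself reported. The inputs are rounded and the scan rate changes over time, so treat any such estimate as approximate.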
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
If you *replaced* the unavailable disk, it should be gone and replaced with the new one. Can you provide the output of zpool list -v? Derp, you just did that.
EDIT: Sorry I'm reading a book, watching training videos, and working on some iocage stuff all at the same time. :confused:
 

Jerami1981

Dabbler
Joined
Jan 4, 2018
Messages
32
No worries kdragon.
Final update: After the first disk finished resilvering, I was able to offline the disk with bad sectors. I replaced it, and the resilver process has finished, and now the system is blinking the happy light! Thanks for the help everyone. Now to set up alerts so I don't have 2 disks go down without me knowing about it :(
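
On the alerting point: FreeNAS has its own alert and email settings, which are the sensible place to start. Purely as an illustration of the underlying idea, here is a rough sketch that relies on `zpool status -x` printing "all pools are healthy" when nothing is wrong. The function names are made up for this example, and how you deliver the warning (email, cron output, and so on) is left open.

```python
import subprocess

HEALTHY_MESSAGE = "all pools are healthy"


def needs_attention(status_x_output):
    """True if `zpool status -x` reported anything other than the healthy message."""
    return status_x_output.strip() != HEALTHY_MESSAGE


def check_pools():
    """Run `zpool status -x` and return (problem_found, raw_output).

    Assumes the zpool binary is on PATH and the script may run it.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    return needs_attention(result.stdout), result.stdout
```

Calling `check_pools()` from a scheduled job and sending yourself the output whenever the first value is true would have flagged the first missing disk before the second one started failing.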
 