Replacing disk in one of a pair of mirrors - decommissioning a mirror

Status
Not open for further replies.

alexmc

Dabbler
Joined
Sep 21, 2013
Messages
10
Hi,

I think I just need someone to tell me my plan is correct and that I should proceed.

I have a 4x2TB FreeNAS system which a mate set up several years ago. It now runs FreeNAS 9.3, which I realise is a bit old. One drive is failing SMART but still seems to be working... sorta. I believe I should, of course, replace the disk rather than wait for it to fail entirely.

Code:
[root@freenas ~]# smartctl -A /dev/ada1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   056   056   016    Pre-fail  Always       -       128006279
  2 Throughput_Performance  0x0005   013   013   054    Pre-fail  Offline  FAILING_NOW 4576
  3 Spin_Up_Time            0x0007   149   149   024    Pre-fail  Always       -       360 (Average 409)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       63
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1612
  etc etc etc


Scrubs are taking a long time (the last one took 45 hours), which I guess is due to the failing drive.
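For anyone reading the SMART output above later: smartctl reports an attribute as FAILING_NOW when its normalized VALUE has dropped to or below THRESH, which is why attributes 2 and 5 are flagged even though the drive still mostly works. A minimal sketch, assuming the smartmontools 6.x column layout shown above; `failing_attrs` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# failing_attrs (hypothetical helper): read `smartctl -A` output on stdin and
# print every attribute whose normalized VALUE is at or below its THRESH.
# Assumed column layout (smartmontools 6.x):
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_attrs() {
    awk '
        # Attribute rows are the only lines whose first field is a number.
        $1 ~ /^[0-9]+$/ {
            value = $4 + 0
            thresh = $6 + 0
            # A threshold of 0 means the attribute can never trip.
            if (thresh > 0 && value <= thresh)
                printf "%s %s value=%d thresh=%d %s\n", $1, $2, value, thresh, $9
        }'
}

# On the NAS this would be:  smartctl -A /dev/ada1 | failing_attrs
# Here it is fed three of the rows pasted above instead.
failing_attrs <<'EOF'
  1 Raw_Read_Error_Rate     0x000b   056   056   016    Pre-fail  Always       -       128006279
  2 Throughput_Performance  0x0005   013   013   054    Pre-fail  Offline  FAILING_NOW 4576
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1612
EOF
```

With those rows it prints only the lines for attributes 2 and 5; Raw_Read_Error_Rate (56 against a threshold of 16) is not flagged.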

ZFS seems to be running OK, though; presumably the mirror is doing its job.

Code:
[root@freenas ~]# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:23 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 25.1M in 45h2m with 0 errors on Mon Jan  8 21:02:52 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/57fffda3-2545-11e2-8c0c-00151784eecc  ONLINE       0     0     0
            gptid/58add88c-2545-11e2-8c0c-00151784eecc  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/72ef860a-2545-11e2-8c0c-00151784eecc  ONLINE       0     0     0
            gptid/73995ac8-2545-11e2-8c0c-00151784eecc  ONLINE       0     0     0
        logs
          gptid/3cee3aee-b952-4d66-b591-3c9ffe5c743f    ONLINE       0     0     0  block size: 512B configured, 4096B native
        cache
          ada0p2                                        ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#


So I have been reading the FreeNAS documentation for my version (9.3).

The plan is

A) Delete unwanted data, move as much as I can off the NAS, and back up whatever is left.
B) Offline the device. (I think that /dev/ada1 is gptid/57fffda3-2545-11e2-8c0c-00151784eecc but am not sure; I think I can offline /dev/ada1 no matter what its ID is.)
C) Because the drive is half of a mirror, I don't believe I have to wait for anything. I don't know whether offlining a drive causes its data to be copied onto other devices, or whether FreeNAS just uses the other 'copies' of the data already stored.
D) Shut down the whole NAS, because my hardware does not seem to be AHCI compliant (so no hot-swapping), and also I do not know physically which drive is which.
E) Remove the drive which seems to match ada1. (The FreeNAS View Disks page tells me its serial number is MN5220F33W67EK.)
F) Put in the new drive and power the system back up to get to the GUI again.
G) In the GUI, find the OFFLINE disk and confirm I removed the right one, then "click the disk again and then click its “Replace” button. Select the replacement disk from the drop-down menu and click the “Replace Disk” button".
H) The previous step will resilver the disk, copying all the data from the still-working mirror disk onto the new disk. I need to wait a long time for this. Presumably I can only see what is going on with "zpool status -v".
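On steps B and E: ZFS only knows each disk by its gptid label, so rather than guessing which label belongs to ada1, the mapping can be read from FreeBSD's `glabel status`, and `smartctl -i` prints the serial number to match against the sticker on the physical drive. A sketch, assuming the usual `glabel status` layout of Name / Status / Components columns; `gptids_for_disk` is a hypothetical helper, and the sample labels below are made-up placeholders, not this system's real mapping.

```shell
#!/bin/sh
# gptids_for_disk (hypothetical helper): read `glabel status` output on stdin
# and print every gptid label on a partition of disk $1 (e.g. ada1).
# The pool member is whichever of these also appears in `zpool status`.
gptids_for_disk() {
    awk -v disk="$1" '
        # Column 3 is the partition (e.g. ada1p2); anchor the match so that
        # ada1 does not also match ada10p2 or ada11p2.
        $1 ~ /^gptid\// && $3 ~ ("^" disk "p[0-9]+$") { print $1, $3 }'
}

# On the NAS itself (as root) this would be:
#   glabel status | gptids_for_disk ada1
#   smartctl -i /dev/ada1 | grep 'Serial Number'
# Placeholder sample; these labels are invented for illustration.
gptids_for_disk ada1 <<'EOF'
                                      Name  Status  Components
gptid/00000000-0000-0000-0000-000000000001     N/A  ada1p1
gptid/00000000-0000-0000-0000-000000000002     N/A  ada1p2
gptid/00000000-0000-0000-0000-000000000003     N/A  ada10p2
EOF
```

The offlining itself can still be done from the GUI as planned; the plain ZFS equivalent would be `zpool offline tank gptid/<label>`, where `<label>` stands for whichever label the lookup returns.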


Is that correct?


NOW I have two issues.

Problem One:
I bought a 4TB drive without fully investigating the process. Presumably if I put a 4TB drive in a mirror with a 2TB drive then I will have wasted half of the drive until I also replace that second 2TB drive with another 4TB one. (I believe I also need to check the autoexpand setting, which seems to be switched on already.)
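The capacity question can be checked with a little arithmetic: a mirror vdev is only as large as its smallest member, and a pool of mirrors stripes across its vdevs, so its size is the sum of theirs. A minimal sketch with sizes in whole TB, ignoring ZFS overhead and the TB/TiB difference; the function names are hypothetical.

```shell
#!/bin/sh
# mirror_usable (hypothetical): usable size of one mirror vdev, i.e. the
# size of its smallest member. Arguments are member sizes in whole TB.
mirror_usable() {
    min=$1
    shift
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo "$min"
}

# pool_usable (hypothetical): each argument is one mirror vdev written as a
# comma-separated list of member sizes, e.g. "4,2". Usable space is the sum
# of the per-vdev figures.
pool_usable() {
    total=0
    for vdev in "$@"; do
        # Unquoted on purpose: turns "4,2" into two arguments, "4" and "2".
        total=$((total + $(mirror_usable $(printf '%s' "$vdev" | tr ',' ' '))))
    done
    echo "$total"
}

pool_usable 2,2 2,2   # today: 4
pool_usable 4,2 2,2   # one 4TB disk swapped in: still 4
pool_usable 4,4 2,2   # both disks of one mirror upgraded: 6 (with autoexpand on)
```

Whether autoexpand is on can be checked with `zpool get autoexpand tank`.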

Problem Two:
I am seeing people suggest RAIDZ2 as a better (though possibly slower) option than mirrors. If I move to that it sounds like I need to move all the data off and create a brand new FreeNAS 11 system with the same hardware wiping all the disks.
Is the benefit (any 2 disks can fail before data loss) worth it?

Does this all make sense?

Thanks


Hardware:
Code:
Build FreeNAS-9.3-STABLE-201503270027
Platform AMD Turion(tm) II Neo N40L Dual-Core Processor
Memory 8133MB
 

alexmc

Dabbler
Joined
Sep 21, 2013
Messages
10
I should say that the title was meant to include a third option:

Problem three: Should I decommission the damaged mirror so that all the data is moved off of it before replacing the failed drive? If so how might I do that?


I should also say that I do see a warning in the zpool status response.

Code:
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.


I believe that the drive they are referring to is ada0 which is the 16GB USB flash disk which stores the OS.
So it is something that could be improved, but I don't think it is a major problem.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
alexmc said: "Is that correct?"
It will work, but it contains many unnecessary steps; in particular, there's no reason to be manually copying data around. Assuming you have a spare SATA port, here's a much simpler set of steps:
  • Shut down the server
  • Install the replacement drive
  • In the GUI, find ada1, click Replace, and select the new disk.
  • When resilvering completes (you can track this on the volume status screen), shut down the server again and remove the old disk.
If you don't have a spare SATA port, then the steps look like this:
  • Offline ada1 through the GUI
  • Shut down the server
  • Remove ada1 (identify it by its serial number) and replace it with the new disk
  • In the GUI, find the long number that has replaced ada1, click Replace, and select the new disk.
Someone should write a resource about that.
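Besides the volume status screen, resilver progress shows up on the `scan:` line of `zpool status`, which during a resilver usually carries a few continuation lines. A small sketch that extracts just that part, assuming the 9.3-era layout in which the scan section ends at `config:`; `scan_progress` is a hypothetical name.

```shell
#!/bin/sh
# scan_progress (hypothetical helper): print the scan: line of `zpool status`
# output (stdin) plus any continuation lines, stopping at config:.
scan_progress() {
    awk '
        /^config:/ { show = 0 }
        /^[ \t]*scan:/ { show = 1 }
        show { sub(/^[ \t]+/, ""); print }'
}

# On the NAS one could poll it, e.g. every 10 minutes:
#   while true; do zpool status tank | scan_progress; sleep 600; done
# Here it is fed the header of the tank output pasted earlier.
scan_progress <<'EOF'
  pool: tank
 state: ONLINE
  scan: scrub repaired 25.1M in 45h2m with 0 errors on Mon Jan  8 21:02:52 2018
config:
EOF
```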

alexmc said: "Presumably if I put a 4TB drive in a mirror with a 2TB drive then I will have wasted half of the drive until I also replace that second 2TB drive with another 4TB one."
Correct.

alexmc said: "I am seeing people suggest RAIDZ2 as a better (though possibly slower) option than mirrors. If I move to that it sounds like I need to move all the data off and create a brand new FreeNAS 11 system with the same hardware wiping all the disks."
Well, there's no need to migrate to FreeNAS 11 just to create a RAIDZ2 pool. I prefer RAIDZ2 to mirrors for most use cases, but with a four-disk pool, I don't see any real benefit to making the change.
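For a four-disk pool the usable space comes out the same either way; what differs is which second disk failure the pool survives. A rough sketch under simplifying assumptions (sizes in whole TB, no ZFS overhead, two-way mirrors, and a second failure equally likely to hit any remaining disk); the function names are hypothetical.

```shell
#!/bin/sh
# raidz_usable (hypothetical): usable TB of one RAIDZ vdev.
#   $1 = number of disks, $2 = parity level (2 for RAIDZ2), $3 = disk size in TB
raidz_usable() {
    echo $(( ($1 - $2) * $3 ))
}

# mirrors_usable (hypothetical): usable TB of a pool of two-way mirrors.
#   $1 = number of disks (even), $2 = disk size in TB
mirrors_usable() {
    echo $(( $1 / 2 * $2 ))
}

# fatal_second_failure (hypothetical): with two-way mirrors, once one disk has
# died only its partner is fatal, i.e. 1 of the remaining ($1 - 1) disks.
# RAIDZ2 survives any second failure.
fatal_second_failure() {
    echo "1/$(( $1 - 1 ))"
}

echo "RAIDZ2 4x2TB:  $(raidz_usable 4 2 2) TB, any 2 disks may fail"
echo "mirrors 4x2TB: $(mirrors_usable 4 2) TB, second failure fatal $(fatal_second_failure 4) of the time"
```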
alexmc said: "Should I decommission the damaged mirror so that all the data is moved off of it before replacing the failed drive? If so how might I do that?"
There's no way to do this.
alexmc said: "I believe that the drive they are referring to is ada0 which is the 16GB USB flash disk which stores the OS."
No, it's referring to the log device (SLOG) you have in your pool, but shouldn't. Remove the cache (L2ARC) device as well while you're at it; neither of them is doing anything for you, and the L2ARC in particular is only going to slow things down until you get a lot more RAM.
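Whichever route is used, the underlying ZFS operation for detaching log and cache devices from a live pool is `zpool remove`; on FreeNAS the GUI is generally the recommended way to change a pool. The sketch below only reads `zpool status` output and prints (does not run) the commands; `removal_commands` is a hypothetical helper, and it assumes single, non-mirrored log and cache devices, as in this pool.

```shell
#!/bin/sh
# removal_commands (hypothetical helper): read `zpool status` output on stdin
# and print, without running, a `zpool remove` command for every device
# listed under a pool's logs or cache section.
# Assumes non-mirrored log and cache devices.
removal_commands() {
    awk '
        $1 == "pool:" { pool = $2; section = ""; next }
        # Section headers inside the config listing.
        $1 == "logs" || $1 == "cache" { section = $1; next }
        # A blank line, spares or the errors: line ends the section.
        NF == 0 || $1 == "spares" || $1 == "errors:" { section = ""; next }
        section != "" { printf "zpool remove %s %s # %s\n", pool, $1, section }'
}

# Fed the relevant part of the tank output pasted earlier:
removal_commands <<'EOF'
  pool: tank
        logs
          gptid/3cee3aee-b952-4d66-b591-3c9ffe5c743f    ONLINE       0     0     0
        cache
          ada0p2                                        ONLINE       0     0     0

errors: No known data errors
EOF
```

For this pool it prints one `zpool remove tank ...` line for the gptid/3cee3aee log device and one for ada0p2.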
 

alexmc

Dabbler
Joined
Sep 21, 2013
Messages
10
Thanks.

I was basically copying/moving/deleting data so that I had less to worry about if I really messed up. I was just being paranoid.

The machine seems to be maxed out at 4 drives. I cannot physically fit another drive in there. There might be a spare SATA port on the motherboard, and there might be a spare power connector, but I honestly cannot see them.

Thanks for all those tips and recommendations. They were very helpful.
 

alexmc

Dabbler
Joined
Sep 21, 2013
Messages
10
And an especially big thanks to the link to the resource. I did google and RTFM, honest :smile:
 