Replacing a Faulted Disk

Status
Not open for further replies.

netman06

Dabbler
Joined: Sep 13, 2012
Messages: 21

Hello,

I have been using FreeNAS for more than two years and have not run into this issue before.

I normally follow the document quoted below, and it has fixed all of my problems so far.

But this time I have a disk labeled FAULTED, and the volume is DEGRADED.

I have been researching this for two days and have read many posts.

I'm running smartctl -t long /dev/ada7 (a long SMART self-test), but I think I just need to replace this disk.

As of now, the only option available for the disk is Replace.

Is it safe to shut down, replace the drive, and then follow this article?

Thanks for any information,

Before physically removing the failed device, go to Storage ‣ Volumes ‣ View Volumes. Next, select your volume’s name. At the bottom of the interface you will see several icons, one of which is “Volume Status”. Click the “Volume Status” icon and locate the failed disk. Once you have located the failed device in the GUI, perform the following steps:

  1. If the disk is formatted with ZFS, click the disk’s entry then its “Offline” button in order to change that disk’s status to OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues. If your hardware supports hot-pluggable disks, click the disk’s “Offline” button, pull the disk, then skip to step 3. If there is no “Offline” button but only a “Replace” button, then the disk is already offlined and you can safely skip this step.

    Note: If the process of changing the disk’s status to OFFLINE fails with a “disk offline failed - no valid replicas” message, you will need to scrub the ZFS volume first using its “Scrub Volume” button in Storage ‣ Volumes ‣ View Volumes. Once the scrub completes, try to “Offline” the disk again before proceeding.

  2. If the hardware is not AHCI capable, shut down the system in order to physically replace the disk. When finished, return to the GUI and locate the OFFLINE disk.

  3. Once the disk has been replaced and is showing as OFFLINE, click the disk again and then click its “Replace” button. Select the replacement disk from the drop-down menu and click the “Replace Disk” button. If the disk is a member of an encrypted ZFS pool, the menu will also prompt you to input and confirm the passphrase for the pool. Once you click the “Replace Disk” button, the ZFS pool will start to resilver and the status of the resilver will be displayed.
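The manual's branching (Offline button or not, hot-swap or shutdown, encrypted pool or not) can be summarized as a small checklist generator. This is a hypothetical Python sketch that only restates the quoted steps; replacement_steps and its flags are illustrative names, not anything FreeNAS provides.

```python
from typing import List


def replacement_steps(has_offline_button: bool, hot_swap: bool,
                      encrypted: bool = False) -> List[str]:
    """Checklist for swapping a failed ZFS disk, paraphrasing the manual's steps."""
    steps = []
    # Step 1: only offline the disk if the GUI still offers "Offline"; a disk
    # that shows only "Replace" (e.g. FAULTED) is already offline.
    if has_offline_button:
        steps.append("Offline the disk in Volume Status (if this fails with "
                     "'no valid replicas', scrub the volume first and retry)")
    # Step 2: hot-pluggable (AHCI) hardware can swap live; otherwise power down.
    if hot_swap:
        steps.append("Pull the failed disk and insert the replacement (no reboot)")
    else:
        steps.append("Shut down, swap the disk (match the serial number), power on")
    # Step 3: use the disk's Replace button, never the Volume Manager.
    replace = "Click the OFFLINE disk, then Replace, pick the new disk, click Replace Disk"
    if encrypted:
        replace += " and enter the pool passphrase"
    steps.append(replace)
    # Step 4: wait for the resilver, then Detach the old entry if it lingers.
    steps.append("Monitor the resilver (zpool status); Detach the old disk if still listed")
    return steps
```

For a FAULTED disk on hardware without hot-swap, replacement_steps(False, False) yields three steps starting with the shutdown.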
 

danb35

Hall of Famer
Joined: Aug 16, 2011
Messages: 15,504

Well, without knowing anything about your version of FreeNAS, your hardware, your pool configuration, or where you got those instructions from, it's hard to say. The instructions look like they're from one of the manuals, which is a good thing--if they're from the manual matching your installed version, even better.

Make sure you remove the correct disk. The View Disks option in the GUI will let you match up /dev/ada7 with its serial number; make sure that number matches the disk you pull. With that caveat, and assuming the instructions you quoted are from the manual that matches your installed version of FreeNAS, you should be good to go.
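If you prefer the shell to View Disks, smartctl -i prints the drive's identity information, including a "Serial Number:" line. A minimal Python sketch, assuming smartctl is on the PATH and the script runs as root; parse_serial and disk_serial are hypothetical helper names.

```python
import re
import subprocess
from typing import Optional

# smartctl -i prints "Serial Number:" for ATA drives ("Serial number:" for SCSI).
SERIAL_RE = re.compile(r"^Serial Number:\s*(\S+)", re.MULTILINE | re.IGNORECASE)


def parse_serial(smartctl_info: str) -> Optional[str]:
    """Return the serial number from `smartctl -i` text, or None if absent."""
    match = SERIAL_RE.search(smartctl_info)
    return match.group(1) if match else None


def disk_serial(device: str) -> Optional[str]:
    """Run `smartctl -i` on a device such as /dev/ada7 (needs root)."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True
    )
    return parse_serial(result.stdout)
```

Calling disk_serial("/dev/ada7") and comparing the result with the label on the drive you are about to pull is the same check danb35 describes, just from the command line.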
 

netman06

Dabbler
Joined: Sep 13, 2012
Messages: 21

Hello danb35,

Thanks for looking at my issue.

I'm running FreeNAS 9.2.1.3.

Also, this is from a FreeNAS manual, not some Internet site.

I looked it up again in the User Guide for my version:

6.3.12 Replacing a Failed Drive

(FreeNAS® 9.2.1 Users Guide, page 133)

If you are using any form of redundant RAID, you should replace a failed drive as soon as possible to repair the degraded state of the RAID. Depending upon the capability of your hardware, you may or may not need to reboot in order to replace the failed drive. AHCI capable hardware does not require a reboot.

NOTE: a stripe (RAID0) does not provide redundancy. If you lose a disk in a stripe, you will need to recreate the volume and restore the data from backup.

Before physically removing the failed device, go to Storage → Volumes → View Volumes → Volume Status and locate the failed disk. Once you have located the failed device in the GUI, perform the following steps:

  1. If the disk is formatted with ZFS, click the disk's entry then its “Offline” button in order to change that disk's status to OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues. If your hardware supports hot-pluggable disks, click the disk's “Offline” button, pull the disk, then skip to step 3. If there is no “Offline” button but only a “Replace” button, then the disk is already offlined and you can safely skip this step.

    NOTE: if the process of changing the disk's status to OFFLINE fails with a “disk offline failed - no valid replicas” message, you will need to scrub the ZFS volume first using its Scrub Volume button in Storage → Volumes → View Volumes. Once the scrub completes, try to Offline the disk again before proceeding.

  2. If the hardware is not AHCI capable, shut down the system in order to physically replace the disk. When finished, return to the GUI and locate the OFFLINE disk.

  3. Once the disk is showing as OFFLINE, click the disk again and then click its “Replace” button. Select the replacement disk from the drop-down menu and click the “Replace Disk” button. If the disk is a member of an encrypted ZFS pool, you will be prompted to input the passphrase for the pool. Once you click the “Replace Disk” button, the ZFS pool will start to resilver. You can use the zpool status command in Shell to monitor the status of the resilvering.

  4. If the replaced disk continues to be listed after resilvering is complete, click its entry and use the “Detach” button to remove the disk from the list.

In the example shown in Figure 6.3s, a failed disk is being replaced by disk ada2 in the volume named volume1.
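Step 3 suggests zpool status for monitoring; its scan: line reads "resilver in progress since ..." while the resilver runs and "resilvered ... with N errors on ..." once it finishes. A rough Python sketch that polls it, assuming the zpool binary is available and the pool name is passed in; scan_state and wait_for_resilver are hypothetical names, and the string matching relies only on those two phrasings.

```python
import subprocess
import time


def scan_state(zpool_status: str) -> str:
    """Classify the 'scan:' line of `zpool status` output."""
    for line in zpool_status.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            detail = stripped[len("scan:"):].strip()
            if detail.startswith("resilver in progress"):
                return "resilvering"
            if detail.startswith("resilvered"):
                return "resilver finished"
            # Anything else (e.g. a scrub) is returned verbatim.
            return detail or "unknown"
    return "no scan information"


def wait_for_resilver(pool: str, interval: float = 60.0) -> None:
    """Poll `zpool status <pool>` until no resilver is in progress."""
    while True:
        out = subprocess.run(
            ["zpool", "status", pool], capture_output=True, text=True, check=True
        ).stdout
        state = scan_state(out)
        print(state)
        if state != "resilvering":
            return
        time.sleep(interval)
```

Running wait_for_resilver("volume1") right after clicking "Replace Disk" would print "resilvering" once a minute until the scan line changes.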


But my main concern is this: the disk does not show an Offline button or option, the volume is in a DEGRADED state, and I'm running ZFS RAIDZ2.

My plan is to shut down the system and replace the correct disk, identifying it by its serial number.

Then I'd start the system, find the old disk in the GUI, and click its Replace button, which should start the resilvering process.

Thanks for the help,
 

danb35

Hall of Famer
Joined: Aug 16, 2011
Messages: 15,504

If the disk is marked as faulted, I believe it's already offline. You should be fine to shut down, identify the faulty disk by serial, replace it, power on, then replace through the web GUI. DO NOT UNDER ANY CIRCUMSTANCES use the volume manager to do this, because it won't be doing what you think it is (the caps is because too many people seem to think that's the way to replace a disk, and end up striping a single disk into their RAIDZ array).
 

Ed Clarke

Dabbler
Joined: Jul 22, 2014
Messages: 11

I'm sort of in the same situation as above, but I have a different question. When I first installed FreeNAS, I followed the instructions to the best of my ability. This included a SMART test that took several days to ensure that the drives were in good condition. I now have a failed drive (email messages at the bottom of this post).

When I get the new drive (I'm not going to try to re-seat it or do other repairs), how do I get those SMART tests done? Do I need to shut down FreeNAS, pull the good drives, and run the SMART tests standalone on this one drive? It'll be annoying to be without a NAS for several days, but I'll do it if necessary.

------------------------------------ email messages below ----------------------------

Code:
Device: /dev/da2 [SAT], 192 Currently unreadable (pending) sectors
Device: /dev/da2 [SAT], Read SMART Self-Test Log Failed
Device: /dev/da2 [SAT], 192 Offline uncorrectable sectors
Device: /dev/da2 [SAT], not capable of SMART self-check
Device: /dev/da2 [SAT], Read SMART Error Log Failed
Device: /dev/da2 [SAT], failed to read SMART Attribute Data
The volume volume1 (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
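The smartd messages above follow a regular "Device: <dev> [<type>], <count> <description>" shape, so the two bad-sector counters can be pulled out mechanically, for example in a script that decides when a drive needs replacing. A minimal Python sketch keyed on exactly the two phrasings shown above; other smartd message types are ignored, and bad_sector_counts is a hypothetical name.

```python
import re
from typing import Dict

# Matches smartd lines such as:
#   Device: /dev/da2 [SAT], 192 Currently unreadable (pending) sectors
#   Device: /dev/da2 [SAT], 192 Offline uncorrectable sectors
SECTOR_RE = re.compile(
    r"^Device: (?P<dev>\S+) \[[^\]]*\], (?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors",
    re.MULTILINE,
)
KIND_NAMES = {
    "Currently unreadable (pending)": "pending",
    "Offline uncorrectable": "offline_uncorrectable",
}


def bad_sector_counts(mail_text: str) -> Dict[str, Dict[str, int]]:
    """Collect pending/offline-uncorrectable sector counts per device."""
    counts: Dict[str, Dict[str, int]] = {}
    for m in SECTOR_RE.finditer(mail_text):
        counts.setdefault(m["dev"], {})[KIND_NAMES[m["kind"]]] = int(m["count"])
    return counts
```

Fed the six lines quoted above, it reports 192 pending and 192 offline-uncorrectable sectors for /dev/da2.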

And another message:
Code:
  pool: volume1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 60K in 0h0m with 0 errors on Sat Oct 29 17:04:13 2016
config:

	NAME                                            STATE     READ WRITE CKSUM
	volume1                                         DEGRADED     0     0     0
	  raidz3-0                                      DEGRADED     0     0     0
	    gptid/1e13fe7f-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0
	    gptid/1e7de006-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0
	    10349220779226024032                        UNAVAIL      3   239     0  was /dev/gptid/1ef014db-1b75-11e4-ab2a-001ec9aadb11
	    gptid/1f5ec7bc-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0
	    gptid/1fd498da-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0
	    gptid/2043e8b7-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0
	    gptid/20b59400-1b75-11e4-ab2a-001ec9aadb11  ONLINE       0     0     0

errors: No known data errors
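To spot the missing member in output like this without reading the whole table, here is a sketch that walks the config: section and lists every entry whose STATE is not ONLINE, along with the "was /dev/gptid/..." path ZFS prints for a missing device. It assumes the layout shown above (name and state as the first two columns, the section ending at errors:); the pool and the raidz vdev are reported too, since they are DEGRADED. unhealthy_devices is a hypothetical name.

```python
from typing import List, Optional, Tuple


def unhealthy_devices(zpool_status: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, state, previous_path) for config entries not ONLINE."""
    results = []
    in_config = False
    for line in zpool_status.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            break
        # Skip everything before config:, blank lines, and the column header.
        if not in_config or not stripped or stripped.startswith("NAME"):
            continue
        fields = stripped.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        was = None
        # A missing device is shown by GUID with "was <old path>" appended.
        if "was" in fields:
            idx = fields.index("was")
            if idx + 1 < len(fields):
                was = fields[idx + 1]
        if state != "ONLINE":
            results.append((name, state, was))
    return results
```

On the output above, the last tuple is the UNAVAIL member 10349220779226024032 with its old gptid path, which is the label to match against glabel status or View Disks before pulling a drive.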
 

danb35

Hall of Famer
Joined: Aug 16, 2011
Messages: 15,504

If you have a place to plug in the replacement drive, there's no reason you can't run the badblocks and SMART tests on it while the rest of the pool is online. Just be very careful to run the tests on the right drive--badblocks is destructive, so you really don't want to inadvertently run it on a disk that has data on it.
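Because badblocks in write mode erases the disk, here is a sketch that refuses to plan a burn-in for any disk backing a pool member. It maps the gptid labels in zpool status to disks via glabel status, assuming that command's "gptid/<uuid>  N/A  <disk>p<N>" row layout. It does not protect the boot device or disks in pools whose status you don't pass in, and pool_disks / burn_in_commands are hypothetical names. The badblocks flags (-b 4096 block size, -w destructive write test, -s show progress) are the commonly used burn-in form; the commands are only returned as strings, never executed.

```python
import re
from typing import Dict, List, Set


def gptid_to_disk(glabel_status: str) -> Dict[str, str]:
    """Map gptid labels to whole-disk names using `glabel status` output."""
    mapping = {}
    for line in glabel_status.splitlines():
        fields = line.split()
        # Data rows look like: gptid/<uuid>  N/A  ada0p2
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            # Strip the partition suffix (ada0p2 -> ada0).
            mapping[fields[0]] = re.sub(r"p\d+$", "", fields[2])
    return mapping


def pool_disks(zpool_status: str, glabel_status: str) -> Set[str]:
    """Disks that back a gptid listed in `zpool status` output."""
    labels = gptid_to_disk(glabel_status)
    disks = set()
    for line in zpool_status.splitlines():
        fields = line.split()
        if fields and fields[0] in labels:
            disks.add(labels[fields[0]])
    return disks


def burn_in_commands(device: str, zpool_status: str, glabel_status: str) -> List[str]:
    """Return burn-in commands for a NEW disk; refuse if it is a pool member."""
    disk = device.rsplit("/", 1)[-1]  # /dev/da5 -> da5
    if disk in pool_disks(zpool_status, glabel_status):
        raise ValueError(f"{device} backs a pool member; refusing destructive test")
    return [
        f"smartctl -t long {device}",
        # -w overwrites every block on the disk: DESTRUCTIVE.
        f"badblocks -b 4096 -ws {device}",
    ]
```

For example, burn_in_commands("/dev/da5", status_text, glabel_text) returns the two command strings if da5 is not in the pool and raises ValueError if it is.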
 

djakdarippa

Cadet
Joined: Aug 25, 2015
Messages: 3

Thanks netman06, everything worked like a charm.

There is one more thing that everyone needs to be aware of: the scheduled, automated SMART tests DO NOT include the new hard drive. You have to add it yourself.

So, from the GUI, choose "Tasks" and then go into "S.M.A.R.T. Tests".
Ideally, you'll find two scheduled recurring tasks there: a SMART Long Self-Test and a SMART Short Self-Test.
Select each test and click "Edit" at the bottom of the screen.
The new screen shows the properties of the test, and at the very top is the list of devices it will run on. Your new drive is in the list but not highlighted, so it is not included in the test. CTRL+click it to include it, then click "OK" to save, and you're done.

Bear in mind that your scrubs DO include the new drive: if everything has been done properly, the new drive simply replaced the old drive in the existing pool.
 