[SOLVED] Disk offline, can't replace

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
So I was too clever by half, and have now brought my pool to a state that I'm not sure the best way to recover from. I wanted to remove one of my drives to run a couple of tests, and didn't want to bother with the few minutes of system downtime to power down. So I offlined the drive from the GUI, pulled it out of its hot-swap bay, ran the tests, put it back in, and tried to replace it with itself. That's when the fun started.
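(For the record, the rough command-line equivalent of what I was doing through the GUI is something like the sketch below; the GUI does extra bookkeeping behind the scenes, and the gptid comes from the zpool status output further down, so take this as an approximation only.)
Code:
# take the member offline so it can be pulled (gptid as shown in zpool status below)
zpool offline tank gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d
# ...pull the disk, run the tests, put the same disk back in...
# then try to replace the member with itself
zpool replace tank gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d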

The web GUI gave me this error:
Code:
Nov 13 09:09:35 freenas2 manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/2b1fae37-8a10-11e5-bec2-002590de8695 is part of active pool 'tank'


...so I figured I had to reboot anyway, and the system should pick up the drive on bootup. No dice. When I rebooted and ran zpool status, this is what came up:
Code:
[root@freenas2] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Nov  5 03:45:51 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       gptid/1b6fb23e-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0
       gptid/1b7f00c5-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: scrub repaired 0 in 26h21m with 0 errors on Mon Nov  2 01:21:42 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            DEGRADED     0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/9a85d15f-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9afa89ae-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9b6cc00b-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9c501d57-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9cc41939-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9d39e31d-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
     raidz2-1                                      DEGRADED     0     0     0
       gptid/f5b737a6-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       7019498564335405691                         OFFLINE      0     0     0  was /dev/gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d
       gptid/f68f4fa9-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f722e509-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f7d115c2-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f84821c1-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors


So then I figured I should be able to online the device, as the status message says:
Code:
[root@freenas2] ~# zpool online tank /dev/gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d
warning: device '/dev/gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
[root@freenas2] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Nov  5 03:45:51 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       gptid/1b6fb23e-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0
       gptid/1b7f00c5-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 26h21m with 0 errors on Mon Nov  2 01:21:42 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            DEGRADED     0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/9a85d15f-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9afa89ae-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9b6cc00b-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9c501d57-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9cc41939-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9d39e31d-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
     raidz2-1                                      DEGRADED     0     0     0
       gptid/f5b737a6-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       7019498564335405691                         UNAVAIL      0     0     0  was /dev/gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d
       gptid/f68f4fa9-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f722e509-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f7d115c2-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f84821c1-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors


Since this is a RAIDZ2 vdev, I'm not critically worried, but obviously it's something that needs to be fixed quickly. I could, I guess, wipe the disk with DBAN and then replace it, but that seems kind of wasteful. Any other ideas (other than "next time, just shut down the server!")?
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Really... you don't know? Come on, you must be joking. =) I see a system that is working exactly as its administrator told it to. P.S. https://forums.freenas.org/index.php?threads/confusion-about-proper-drive-removal.25644/
Just run zpool labelclear /dev/whatever against the disk you poked, on another machine, and hope it has enough usable sectors to satisfy ZFS when you try to replace the UNAVAIL disk again.

I'm not sure whether things are different on FreeNAS 9; I still use FreeNAS 8. But if you poke ZFS on the command line, shouldn't you also update some list manually if you decide to force ZFS to accept back an admin-offlined disk? It was stupidly hard to swap disks out of a live pool manually using the command line... Anyway, have fun. I'm going to fetch some popcorn and follow this topic with interest.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Unfortunately, that didn't do it. I removed the disk again, put it in another machine, booted the FreeNAS installer, dropped to the shell, and ran 'zpool labelclear /dev/ada0'. It returned to the shell prompt without reporting any errors.

I returned the drive to the FreeNAS server, went to Storage -> select pool -> Volume Status, selected the UNAVAIL device, clicked Replace, and chose da6 (the only available option) in the pop-up window. It returned this error:

Code:
Nov 13 16:18:35 freenas2 manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/196e3bd7-8a4c-11e5-bec2-002590de8695 is part of active pool 'tank'
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
OK, stop. Just stop. Are you sure you pulled the right HDD? And are you selecting the right drive to offer as the replacement? That gptid looks weird...
Edit:
I think someone else should confirm whether it's safe to continue after you have onlined the disk on the command line (ZFS may still think the disk is there when you mess with it from the command line, as I tried to mention earlier). Or you did not properly clear the HDD.

Edit 2: Ah yes, OK, good. But I would still wait for confirmation from someone who knows FreeNAS 9 that it's safe to continue after poking ZFS from the command line. And I'm going to offline myself now; it's 23:50 here and, well, you know what I mean. Sleep. =)
Hmm, you can't replace a faulted disk with a disk that ZFS thinks is still part of a pool. =)


Edit 3: I did some testing on your scenario. You can online the disk from the console after pressing the Replace button in the GUI; it accepts the disk back into the pool, but in a faulted state, so you need to replace it anyway.

But if you put the disk offline in the GUI and then online it from the console, it accepts it back into the pool with the button happily green. And if you offline a disk and yank it from the machine, FreeNAS knows the disk was taken out (on proper hardware); when you put it back and online it, the GUI button goes yellow and an attempt is made to correct the error.

There still seem to be some swap issues, or maybe it's just my beta version of FreeNAS 9. It seems the safest bet is still to reboot to get a disk back into the pool. There's a cyberjock post about how to check whether swap is in use on a disk and how to take it off, but I'm starting to lose interest in this topic, as I've already wasted one bag of popcorn. =/
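Roughly the kind of check I mean (the device name here is only an example; on FreeNAS the swap partitions are normally the small p1 partitions, and the exact names will differ on your box):
Code:
# list the swap devices currently in use
swapinfo
# if the disk you are about to pull is backing one of them,
# turn that swap device off before yanking the disk (example name only)
swapoff /dev/da6p1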
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, I'm sure I was pulling the right disk, and yes, I was selecting the right drive for replacement. And yes, zpool labelclear was the right answer--but, since FreeNAS doesn't use the entire device for the pool (instead creating a 2 GB (by default) swap partition at the start), I needed to do zpool labelclear /dev/ada0p2. When I did that, it warned about that device being part of a pool from another system, and required the -f flag to do the job. After doing zpool labelclear -f /dev/ada0p2, I was then able to return the disk to my server and start resilvering. It's running now. Thanks for the tip on zpool labelclear--I would have liked to have put the disk back online without having to resilver the whole thing, but that will do.
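In case it helps anyone later, the sequence on the other machine boiled down to roughly this (ada0 is simply where the disk showed up on that box; check gpart show to confirm which partition is the ZFS one):
Code:
# show the partition layout: p1 is the ~2 GB swap partition FreeNAS creates,
# p2 is the partition that actually belongs to the pool
gpart show ada0
# clearing the label on the whole disk accomplishes nothing here...
zpool labelclear /dev/ada0
# ...the ZFS label lives on p2, and -f is needed because the label says the
# partition belongs to a pool from another system
zpool labelclear -f /dev/ada0p2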
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The resilver is still running, now 88% complete with an estimated two hours to go. In retrospect, I think if I'd 'zpool online'd the disk when I put it back, rather than trying to replace it with itself, this probably would have worked much more smoothly, and the resilver would have been just a matter of a few seconds (since it would have only written the data that had changed since I pulled the drive). Or if I'd simply rebooted the server after putting the disk back.
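Something like this is what I mean (a sketch only, using the same pool and gptid as above):
Code:
# after putting the disk back, bring the same device online instead of
# replacing it; ZFS then only resilvers what changed while it was offline
zpool online tank gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d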
 