unable to remove old faulted pool

Status
Not open for further replies.

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
I have had a long run of trying to remove and purge an old faulted pool. This started with a failed drive that I replaced. While it was resilvering, another drive failed. In the end I decided to "start over", as I had a backup. Now, when I reboot, the system fails to mount my newly created pool, and the old pool (of the same name) is listed from the CLI as FAULTED. I have tried wiping the disks, zpool labelclear, and so on, but still no luck. If I run zdb, I see details regarding the old pool and the disk replacement. How do I remove the history of this faulted pool and recycle these drives for use again?
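For reference, the label-clearing attempt was roughly along these lines (a sketch only; the device and partition names are illustrative):
Code:
# Sketch of the cleanup I attempted; device/partition names are illustrative.
zpool export tank                  # detach the pool first, if the system allows it
zpool labelclear -f /dev/ada0p2    # clear any ZFS label on each former member partition
zpool labelclear -f /dev/ada1p2
zpool labelclear -f /dev/ada2p2
zpool labelclear -f /dev/ada3p2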

Details:
FreeNAS-8.3.1-RELEASE-p2-x64
HP
Code:
[root@rome ~]# zpool list
NAME  SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
tank      -      -      -      -      -  FAULTED  -

Code:
zdb
tank:
    version: 28
    name: 'tank'
    state: 0
    txg: 2399702
    pool_guid: 16661726734032266985
    hostid: 2270805793
    hostname: ''
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 16661726734032266985
        children[0]:
            type: 'raidz'
            id: 0
            guid: 13347037953643918062
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 34
            ashift: 12
            asize: 3992209850368
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 439340949831327204
                path: '/dev/gptid/76708775-5e7d-11e2-8efb-6805ca09f08c'
                phys_path: '/dev/gptid/76708775-5e7d-11e2-8efb-6805ca09f08c'
                whole_disk: 1
                DTL: 33521
                create_txg: 4
            children[1]:
                type: 'replacing'
                id: 1
                guid: 5301956354414286026
                whole_disk: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 2323197472043909345
                    path: '/dev/gptid/af99d237-4b31-11e2-b9ae-6805ca09f08c'
                    phys_path: '/dev/gptid/af99d237-4b31-11e2-b9ae-6805ca09f08c'
                    whole_disk: 1
                    not_present: 1
                    DTL: 8540
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 2075023451200656815
                    path: '/dev/gptid/1735b055-c9a5-11e2-b4d4-6805ca09f08c'
                    phys_path: '/dev/gptid/1735b055-c9a5-11e2-b4d4-6805ca09f08c'
                    whole_disk: 1
                    DTL: 86017
                    create_txg: 4
                    resilvering: 1
            children[2]:
                type: 'disk'
                id: 2
                guid: 13740987679321907475
                path: '/dev/gptid/b0400532-4b31-11e2-b9ae-6805ca09f08c'
                phys_path: '/dev/gptid/b0400532-4b31-11e2-b9ae-6805ca09f08c'
                whole_disk: 1
                DTL: 8539
                create_txg: 4
            children[3]:
                type: 'replacing'
                id: 3
                guid: 15401327230276251409
                whole_disk: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 2348113808455552577
                    path: '/dev/dsk/gptid/b0b5f7cf-4b31-11e2-b9ae-6805ca09f08c'
                    phys_path: '/dev/gptid/b0b5f7cf-4b31-11e2-b9ae-6805ca09f08c'
                    whole_disk: 1
                    DTL: 8538
                    create_txg: 4
                    offline: 1
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 2714461895960674896
                    path: '/dev/gptid/9fb695f4-adb0-11e0-ae66-0025900b7d98'
                    phys_path: '/dev/gptid/9fb695f4-adb0-11e0-ae66-0025900b7d98'
                    whole_disk: 1
                    DTL: 106498
                    create_txg: 4
                    resilvering: 1
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
Sorry about the mess with the code formatting; I'll fix it.

I did try to mark these disks as new from the GUI, with no success. Further, when that failed, I did the following after destroying the pool:
Code:
dd if=/dev/zero of=/dev/ada[0,1,2,3] bs=512 count=34
 
dd if=/dev/zero of=/dev/ada[0,1,2,3] bs=512 seek=<end - 34>


However, I still see the following after reboot:
Code:
[root@rome] ~# zpool list
NAME  SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
tank      -      -      -      -      -  FAULTED  -


I then destroyed the pool once again with:

Code:
[root@rome] ~# zpool destroy -f tank


The output from zdb is attached.

Is a write of zeros across the entire disk required? What am I missing?
 

Attachments

  • zdb.txt
    3.2 KB

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I did try to mark these disks as new from the GUI, with no success.
How did it fail? Check /var/log/messages. You need to stop the GUI from importing the pool. You could reset to factory defaults to clear the db.

Is a write of zeros across the entire disk required? What am I missing?
A quick wipe via the GUI is sufficient as long as GEOM isn't protecting the drives at the time.
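For context, zeroing the whole disk shouldn't be needed: ZFS keeps four 256 KiB labels in the first and last 512 KiB of each member partition (adaXp2 here), so dd runs that only touch the first and last 34 sectors of the raw disk clear the GPT areas but leave those labels alone. If you do it by hand instead of through the GUI, a sketch would look like the following (device names are illustrative, the old p2 partitions must still exist, and GEOM has to permit the writes):
Code:
# Sketch only: zero the ZFS label areas (first and last 512 KiB) of each former
# member partition. Device names are illustrative; adjust to suit.
sysctl kern.geom.debugflags=0x10                      # temporarily allow writes to in-use providers
for p in ada0p2 ada1p2 ada2p2 ada3p2; do
    bytes=$(diskinfo /dev/${p} | awk '{print $3}')    # partition size in bytes
    dd if=/dev/zero of=/dev/${p} bs=512k count=1                        # front labels (L0/L1)
    dd if=/dev/zero of=/dev/${p} bs=512k seek=$((bytes / 524288 - 1))   # back labels (L2/L3); dd stops at end of device
done
sysctl kern.geom.debugflags=0                         # restore the default afterwards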

What did you run zdb -l against, anyway?
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
I don't see much in /var/log/messages other than the devices being picked up, as shown below, but there is no mention of the pool, which was healthy when I initiated the reboot. Now it is as follows:
Code:
[root@rome] ~# zpool list
NAME  SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
tank      -      -      -      -      -  FAULTED  -

And the log results:
Code:
Jun 20 14:12:36 rome kernel: ZFS filesystem version 5
Jun 20 14:12:36 rome kernel: ZFS storage pool version 28
Jun 20 14:12:36 rome kernel: GEOM_ELI: Device ada0p1.eli created.
Jun 20 14:12:36 rome kernel: GEOM_ELI: Encryption: AES-XTS 256
Jun 20 14:12:36 rome kernel: GEOM_ELI:    Crypto: software
Jun 20 14:12:36 rome kernel: GEOM_ELI: Device ada1p1.eli created.
Jun 20 14:12:36 rome kernel: GEOM_ELI: Encryption: AES-XTS 256
Jun 20 14:12:36 rome kernel: GEOM_ELI:    Crypto: software
Jun 20 14:12:36 rome kernel: GEOM_ELI: Device ada2p1.eli created.
Jun 20 14:12:36 rome kernel: GEOM_ELI: Encryption: AES-XTS 256
Jun 20 14:12:36 rome kernel: GEOM_ELI:    Crypto: software
Jun 20 14:12:36 rome kernel: GEOM_ELI: Device ada3p1.eli created.
Jun 20 14:12:36 rome kernel: GEOM_ELI: Encryption: AES-XTS 256
Jun 20 14:12:36 rome kernel: GEOM_ELI:    Crypto: software
Jun 20 14:12:36 rome kernel: GEOM_ELI: Device ada4p1.eli created.
Jun 20 14:12:36 rome kernel: GEOM_ELI: Encryption: AES-XTS 256
Jun 20 14:12:36 rome kernel: GEOM_ELI:    Crypto: software


The labels are present:
Code:
[root@rome] ~# glabel status
                                      Name  Status  Components
gptid/47fd2817-d9ed-11e2-a64a-6805ca09f08c    N/A  ada0p2
gptid/489f6c14-d9ed-11e2-a64a-6805ca09f08c    N/A  ada1p2
gptid/4a20da5e-d9ed-11e2-a64a-6805ca09f08c    N/A  ada2p2
gptid/4951d7b9-d9ed-11e2-a64a-6805ca09f08c    N/A  ada3p2


With regard to zdb, that's just it: no command-line options work when the pool is in this state. If I have a valid pool I can use some of the more typical options (e.g. zdb -l vdev). However, in this instance, with the pool faulted, there is nothing unless I run it with no command-line options at all. That is when I observe the output attached in the post above.

With the pool in this state, I can destroy it and create a new one of the same name and things will appear to work fine until a reboot.
Code:
[root@rome] ~# zpool destroy -f tank
[root@rome] ~# zpool list
no pools available
--- now create the new pool from the GUI ----
[root@rome] ~# zpool list
NAME  SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  3.62T  1.96M  3.62T    0%  1.00x  ONLINE  /mnt
[root@rome] ~# zpool status
  pool: tank
state: ONLINE
  scan: none requested
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    tank                                            ONLINE      0    0    0
      raidz1-0                                      ONLINE      0    0    0
        gptid/9d37c588-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9dbe3467-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9e5ad759-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9f248126-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
 
errors: No known data errors
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
More information on this, as I never did solve it. The box is on a UPS and I just never shut it down (until now).

I just imported it under a new name as follows:
Code:
zpool import -R /mnt <pool_guid> <new_pool_name>
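
For anyone following along: the numeric pool GUID can be read from the "id:" field that a bare zpool import prints when it lists importable pools. A quick sketch:
Code:
zpool import                                       # list importable pools and their "id:" GUIDs
zpool import -R /mnt <pool_guid> <new_pool_name>   # then import that GUID under a new name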

This worked, and I can now reboot and get the newly named pool each time. However, the faulted pool (under the old name) remains in the output of zpool list. Its status is FAULTED. I ran zdb and it shows up in that result as well, whereas zdb -l for each member of the newly named pool is correct. How do I purge/destroy the faulted pool?
Code:
[root@rome] ~# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup  2.72T  2.19T   542G    80%  1.00x  ONLINE  /mnt
rename  3.62T  3.26T   373G    89%  1.00x  ONLINE  /mnt
tank        -      -      -      -      -  FAULTED  -


Here is the status of the newly renamed pool:
Code:
[root@rome] ~# zpool status rename
  pool: rename
state: ONLINE
  scan: scrub repaired 0 in 8h46m with 0 errors on Sun Sep 15 08:46:41 2013
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    rename                                          ONLINE      0    0    0
      raidz1-0                                      ONLINE      0    0    0
        gptid/9d37c588-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9dbe3467-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9e5ad759-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
        gptid/9f248126-d9f0-11e2-b93b-6805ca09f08c  ONLINE      0    0    0
 
errors: No known data errors


And the status of the old faulted pool, which I'm trying to eliminate:
Code:
[root@rome] ~# zpool status tank
  pool: tank
state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
    replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
  see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:
 
    NAME                      STATE    READ WRITE CKSUM
    tank                      UNAVAIL      0    0    0
      raidz1-0                UNAVAIL      0    0    0
        439340949831327204    UNAVAIL      0    0    0  was /dev/gptid/76708775-5e7d-11e2-8efb-6805ca09f08c
        replacing-1            UNAVAIL      0    0    0
          2323197472043909345  UNAVAIL      0    0    0  was /dev/gptid/af99d237-4b31-11e2-b9ae-6805ca09f08c
          2075023451200656815  UNAVAIL      0    0    0  was /dev/gptid/1735b055-c9a5-11e2-b4d4-6805ca09f08c
        13740987679321907475  UNAVAIL      0    0    0  was /dev/gptid/b0400532-4b31-11e2-b9ae-6805ca09f08c
        replacing-3            UNAVAIL      0    0    0
          2348113808455552577  OFFLINE      0    0    0  was /dev/dsk/gptid/b0b5f7cf-4b31-11e2-b9ae-6805ca09f08c
          2714461895960674896  UNAVAIL      0    0    0  was /dev/gptid/9fb695f4-adb0-11e0-ae66-0025900b7d98
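
My suspicion is that this entry now lives only in a cached pool configuration (the same data a bare zdb dumps), not on the disks themselves, since the members are listed purely by GUID. A sketch of inspecting that cache directly; the /data/zfs/zpool.cache path is my assumption for FreeNAS 8.x, so adjust as needed:
Code:
# Sketch: point zdb at a specific cache file instead of the default one.
# The /data/zfs/zpool.cache location is an assumption for FreeNAS 8.x.
zdb -U /data/zfs/zpool.cache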
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
How do I purge/destroy the faulted pool?
Did you export the old pool from the GUI? Do not mark as new.

For each disk run:
Code:
zdb -l adaX

zdb -l adaXp1
You already checked adaXp2 for each disk, and the four labels are all consistent, yes?
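Something like this covers all of them in one pass (a sketch; ada0 through ada3 assumed from your glabel output):
Code:
# Sketch: dump the ZFS labels from each disk and its partitions in one pass.
for d in ada0 ada1 ada2 ada3; do
    for dev in ${d} ${d}p1 ${d}p2; do
        echo "== /dev/${dev} =="
        zdb -l /dev/${dev}
    done
done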
 

xMx

Cadet
Joined
Mar 25, 2018
Messages
1
A brute-force solution is the following (proceed with extreme caution!); a rough command sketch follows the steps:
1) Change the cachefile for the valid pools you want to keep (zpool set cachefile=<new cachefile> <poolname>), then export them.
2) Move the ZFS cachefile /etc/zfs/zpool.cache to a backup location.
3) Reboot forcefully, or try to restart the ZFS daemon.
4) Import the zpools you want to keep from their cachefiles (zpool import -c <cachefile> <poolname>).

(I am running Ubuntu 16 with ZFS filesystem version 5; package zfs-zed/xenial-updates, version 0.6.5.6-0ubuntu19.)
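
A rough rendering of those steps as commands (the pool name and file paths are placeholders; step 4 uses zpool import, and exact behavior may vary between ZFS versions):
Code:
# Rough sketch of the steps above; <pool> and the file paths are placeholders.
zpool set cachefile=/root/<pool>.cache <pool>      # 1) record the pool's config in its own cachefile
zpool export <pool>                                #    then export the pool
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # 2) move the system cachefile aside
reboot                                             # 3) or restart the ZFS services/daemon instead
zpool import -c /root/<pool>.cache <pool>          # 4) re-import the pools you want to keep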
 