How to re-attach a drive to an encrypted pool?

Eld

Cadet
Joined
Jun 5, 2019
Messages
4
Hi,

I have an encrypted pool that consists of a 2-disk mirror. I'm new to ZFS, so I wanted to check redundancy. I shut down, removed one of the disks, and rebooted. The pool was degraded, but accessible. All good.

However, after shutting down again and re-inserting the disk I had removed, the disk then showed as "unused" and the pool remains degraded.

I've managed to re-attach the drive using some of the commands described in this blog post, but it appears to be resilvering the whole drive (which seems unnecessary as the drive already contained all the data).

What's the recommended way to re-attach an (encrypted) mirror disk to a pool, and is it possible to do so without resilvering the whole drive?

Thanks for any pointers/advice!
 

Thund3rDuck

Explorer
Joined
Aug 24, 2013
Messages
64
Hey there, unfortunately if you remove a drive and then reattach it, it will need to be resilvered. While it was absent from the pool, the data changed, so the two disks no longer match. The resilvering process rebuilds the missing data from the rest of the pool; in the case of a mirror, it copies the data from the other drive. This is the case regardless of the RAID type. Also, you should be able to detach and reattach directly from the GUI without running any commands.
 

Eld

Cadet
Joined
Jun 5, 2019
Messages
4
Thanks for your reply!

I understand that a disk that is absent from a pool and then re-inserted will need to be resilvered. However, the files in the pool in question were not modified while the drive was removed (it's only file storage), so I would have thought that, if I had done things correctly, resilvering would take only a short time (verifying that the contents match) rather than treating the drive as completely new (re-copying the entire contents).

After the disk was re-inserted, the GUI reported it as "unused" and the encrypted pool as "DEGRADED". I'm pretty sure the GUI wasn't allowing me to remedy this using the pool>status>replace option, so I used the command line to detach and re-attach the encrypted volume to the pool. I've included the commands I used below. The fact that resilvering seems to be treating the drive as a blank disk makes me think that my process was not quite correct. So my question is really: what should I have done instead?

It's a 10TB pool and resilvering from scratch is taking a long time. Once the disk is finished resilvering I can repeat the experiment with a smaller encrypted pool, so that I can be more exact with my question(s) if need be.

The commands I used to get it back to HEALTHY/resilvering were:
Code:
root@freenas[~]# zpool status FILES
  pool: FILES
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 7.23T in 1 days 13:27:47 with 0 errors on Thu Jun  6 08:00:35 2019
config:

    NAME                                                STATE     READ WRITE CKSUM
    FILES                                              DEGRADED     0     0     0
      mirror-0                                          DEGRADED     0     0     0
        4951056523714673127                             UNAVAIL      0     0     0  was /dev/gptid/6f5114d7-7c4f-11e9-b846-842b2bb89456.eli
        gptid/4269ca3f-86a3-11e9-915e-842b2bb89456.eli  ONLINE       0     0     0

errors: No known data errors

root@freenas[~]# zpool detach FILES 4951056523714673127

root@freenas[~]# geli attach -k /data/geli/14f7ae18-3917-43e3-91e7-1862103023d9.key /dev/gptid/6f5114d7-7c4f-11e9-b846-842b2bb89456

root@freenas[~]# zpool attach FILES /dev/gptid/4269ca3f-86a3-11e9-915e-842b2bb89456.eli /dev/gptid/6f5114d7-7c4f-11e9-b846-842b2bb89456.eli

root@freenas[~]# zpool status FILES
  pool: FILES
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jun  6 11:53:03 2019
    1.07T scanned at 1.31G/s, 8.33G issued at 41.2M/s, 6.78T total
    8.32G resilvered, 0.12% done, 1 days 23:52:23 to go
config:

    NAME                                                STATE     READ WRITE CKSUM
    FILES                                              ONLINE       0     0     0
      mirror-0                                          ONLINE       0     0     0
        gptid/4269ca3f-86a3-11e9-915e-842b2bb89456.eli  ONLINE       0     0     0
        gptid/6f5114d7-7c4f-11e9-b846-842b2bb89456.eli  ONLINE       0     0     0

errors: No known data errors
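
As a sanity check on the scan lines in the second status output above, here is a minimal sketch that redoes the arithmetic behind the "% done" and "to go" figures. It assumes binary units (1T = 1024G, 1G = 1024M) and a constant issue rate; `resilver_estimate` is a hypothetical helper name, not a ZFS tool.

```shell
#!/bin/sh
# Figures copied from the 'zpool status' scan lines above.
ISSUED_G=8.33   # "8.33G issued"
TOTAL_T=6.78    # "6.78T total"
RATE_M=41.2     # "issued at 41.2M/s"

# Hypothetical helper: estimate progress and remaining time.
# Assumes binary units and that the issue rate stays constant.
resilver_estimate() {
    awk -v issued="$1" -v total="$2" -v rate="$3" 'BEGIN {
        total_g = total * 1024                      # total to resilver, in G
        pct     = 100 * issued / total_g            # share already issued
        secs    = (total_g - issued) * 1024 / rate  # remaining M divided by M/s
        printf "%.2f%% done, about %.1f hours to go\n", pct, secs / 3600
    }'
}

resilver_estimate "$ISSUED_G" "$TOTAL_T" "$RATE_M"
```

The result is close to two days, in line with the "to go" value that zpool reports, which suggests the resilver is indeed copying the full 6.78T rather than only recent changes.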
 

Thund3rDuck

Explorer
Joined
Aug 24, 2013
Messages
64
The GUI does allow you to do so. I don't feel like breaking my pool to take snapshots so I looked up a video. Take a look at this.

Also, that is not how RAID usually works. Typical non-ZFS RAID rebuilds block by block. That said, I did find this Oracle article that says resilvering should be fast, depending on the amount of data. I'm not sure whether that applies here, since this is OpenZFS and not Oracle's ZFS.
 

Eld

Cadet
Joined
Jun 5, 2019
Messages
4
Thanks again, it's a great help for my understanding.

If I understand what you're saying, there's no time difference between resilvering a drive that was simply absent for a single boot and shutdown, and resilvering a brand new drive which is a complete replacement for an old (failing) drive?

I'd also like to understand better how a pool being encrypted changes the process of returning a drive that was removed and then re-inserted. For example:

My boot pool consists of 2 x (non-encrypted) mirrored drives. When I remove one, the system boots fine, and when I re-insert the removed disk it automatically starts resilvering (and I suppose this would be the same with a regular non-encrypted pool as well).

But with encrypted drives (in my experience so far), the re-inserted drive is not automatically detected, and is instead shown as "unused" in the list of drives. So what action should I take at that point to get the drive back into the *encrypted* pool?

Very helpful video. A difference here is that it is not a new disk and I am not replacing a drive, but simply re-inserting a drive that (for whatever reason) was absent for a single boot and shutdown.

Now I'm leaning toward thinking there is no difference in the resilver between re-inserting a removed drive and replacing a failed drive with a brand new one, and that the automatic detection and attachment of a re-inserted mirror is a nicety with non-encrypted pools but not available with encrypted pools (please correct me on that if I am wrong).
 

Thund3rDuck

Explorer
Joined
Aug 24, 2013
Messages
64
From the research I've done over the past few days, ZFS resilvers much faster because it only mirrors the actual data, not block by block. As for the drive being encrypted, I'm not sure. Maybe that changes things. I also don't know what the exact workflow is for replacing a drive in a pool, encrypted or not. I'd have to test to verify.
 

Eld

Cadet
Joined
Jun 5, 2019
Messages
4
Yes, I believe you're right that resilvering time is dependent on the amount of data (quicker with an emptier pool than a full one), and I think that aspect is no different with an encrypted pool, except the volume being resilvered will need to be unlocked first.

What I have yet to understand is whether resilvering time should also be reduced if little (or no) data has changed since the volume was last "synchronized" ...

... Because with the way I re-introduced the encrypted mirror drive as described above, resilvering took just as much time as if the drive had been a completely new one, and therefore I had wondered if I had re-introduced the drive incorrectly. The drive had only been absent for one boot, and no user files were modified, so I expected the resilvering to be quicker than it was.

Hopefully someone familiar with this will contribute some thoughts.
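
For reference, one likely explanation for the full resilver: `zpool detach` removes the disk from the mirror entirely, so a later `zpool attach` treats it as a new device and copies everything. The DEGRADED status output earlier in the thread instead suggests attaching the missing device and bringing it online with `zpool online`, which keeps the disk's membership in the mirror so that ZFS can resilver only what changed while it was away. Below is a minimal, untested sketch of that path. It only prints the commands rather than running them; the pool name, key path, gptid, and GUID are copied from the commands above, `print_reattach_plan` is a hypothetical helper name, and how the FreeNAS GUI reacts to a manually attached GELI provider is an open assumption.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# All values below are copied from the commands earlier in this thread.
POOL="FILES"
KEY="/data/geli/14f7ae18-3917-43e3-91e7-1862103023d9.key"
PROVIDER="/dev/gptid/6f5114d7-7c4f-11e9-b846-842b2bb89456"
MISSING_GUID="4951056523714673127"   # the UNAVAIL entry in 'zpool status'

# Hypothetical helper: list the steps in order, without 'zpool detach'.
print_reattach_plan() {
    # 1. Unlock the re-inserted disk so the .eli provider exists again.
    echo "geli attach -k $KEY $PROVIDER"
    # 2. Bring the existing mirror member back online (no detach/attach).
    echo "zpool online $POOL $MISSING_GUID"
    # 3. Check the resilver progress.
    echo "zpool status $POOL"
}

print_reattach_plan
```

Repeating the experiment on a small test pool, as proposed above, would be a safe way to confirm whether this path resilvers only the changes.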
 