TrueNAS Core 13, GELI disk replacement

fcm

Cadet
Joined
Jun 16, 2023
Messages
7
Hi,
I have an old GELI encrypted pool. One of the disks failed. I got a new disk and followed the instructions from [1]. However, the replacement procedure asks for a passphrase which I don't have; I only have a key file without a passphrase.

2023-06-16_10-27.png


How does one replace a GELI encrypted disk in TrueNAS Core 13? The only thing mentioned in the current disk replacement documentation about GELI is

Can I replace a disk in a GELI-encrypted (Legacy) pool?

Although GELI encryption is deprecated, TrueNAS implements GELI encryption during a “GELI-Encrypted (Legacy) pool” disk replacement. TrueNAS uses GELI encryption for the lifetime of that pool, even after replacement.

which is not helpful.

[1]: https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/

Best,
-F
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
The documentation for TrueNAS Core 12+ sort of "leaves behind" the nuance of FreeNAS 11.x and earlier, including how to handle replacements of GELI drives in a pool.

From what I recall, it's a dangerous process that can lock you out of your data if you accidentally do a step in the wrong order or misunderstand what's happening. I think it's implying that you're "re-keying" all the drives with a new passphrase that you're using for the replacement drive, which will then allow the (new) GELI device to be resilvered into the pool's existing vdev. And then at a later point, you can "re-key" the pool's drives (again) to use a keyfile, passphrase, and/or recovery key. It is this new key that you can export as a file to save a copy somewhere safe. (The old key will no longer be usable.)

It's really unnerving, I'll be honest...

I honestly wouldn't try anything until you get a solid confirmation by one of the iXsystems people.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello @fcm

I don't have a GELI-encrypted pool to test this on, but the legacy documentation [1] references a requirement to set a passphrase and back up your encryption keys before performing the replacement operation.

Encrypted pools must have a valid passphrase to replace a failed disk. Set a passphrase and back up the encryption key using the pool Encryption Operations before attempting to replace the failed drive.

I wouldn't consider this as "solid confirmation" because I haven't tried replicating the workflow of a GELI encrypted pool on the current CORE 13 interface at this time.

[1] https://www.ixsystems.com/documentation/freenas/11.3-U5/storage.html#replacing-a-failed-disk
 

fcm

Thank you @winnielinnie and @HoneyBadger

I guess this is the price to pay for using rock-solid systems; problems don't arise often and when they do, a lot of things might have changed.

@HoneyBadger Can you confirm if it is safe to set the passphrase and back up the encryption key now that the old disk is offline and removed, i.e., can I perform this operation while the pool is in a degraded state? The "...before attempting to replace the failed drive." part of the quote is a little confusing on this aspect.

Thanks,
-F
 

HoneyBadger

I'm presently working through a scenario that should mimic yours. I've created a GELI pool on FN11, I'm upgrading now to TN12 and then will go to TN13, and at that point I'll try to validate that the steps still work on setting up the passphrase.

Generally speaking though, it should be safe to set the passphrase and back up the key in a degraded state because you can't expect to predict that you'll need to do this before a failure happens. However, I would suggest that once this is done and the resilver completes, that the key/passphrase combo is reset again.
 

HoneyBadger

I didn't get prompted for a passphrase when I replaced the disk in the failed pool - it was an encrypted, unlocked dataset, I selected REPLACE, it prompted for an empty disk, then said "thank you" and set it up.

Code:
  pool: gelipool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 755M in 00:00:17 with 0 errors on Fri Jun 16 09:15:39 2023
config:

        NAME                                                STATE     READ WRITE CKSUM
        gelipool                                            ONLINE       0     0     0
          mirror-0                                          ONLINE       0     0     0
            gptid/70680b33-0c5d-11ee-a0b9-000c292ff71e.eli  ONLINE       0     0     0
            gptid/fe6f7b9c-0c60-11ee-8f0b-000c292ff71e.eli  ONLINE       0     0     0

errors: No known data errors
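
For anyone who wants to check the same thing from a script rather than by eye, here is a minimal Python sketch, assuming the `zpool status` text has already been captured into a string. The function name `eli_members` and the `SAMPLE` variable are illustrative only and not anything TrueNAS provides. It lists the GELI-backed members (device names ending in `.eli`) with their state, so a missing or degraded provider stands out.

```python
# Sketch: list GELI providers (".eli" devices) from captured `zpool status` text.
# SAMPLE is copied from the config section of the output above.
SAMPLE = """\
  pool: gelipool
 state: ONLINE
config:

        NAME                                                STATE     READ WRITE CKSUM
        gelipool                                            ONLINE       0     0     0
          mirror-0                                          ONLINE       0     0     0
            gptid/70680b33-0c5d-11ee-a0b9-000c292ff71e.eli  ONLINE       0     0     0
            gptid/fe6f7b9c-0c60-11ee-8f0b-000c292ff71e.eli  ONLINE       0     0     0

errors: No known data errors
"""


def eli_members(status_text):
    """Return (device, state) pairs for every .eli device in the config section."""
    members = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM
        if len(fields) >= 2 and fields[0].endswith(".eli"):
            members.append((fields[0], fields[1]))
    return members


if __name__ == "__main__":
    for device, state in eli_members(SAMPLE):
        flag = "" if state == "ONLINE" else "  <-- needs attention"
        print(f"{device}: {state}{flag}")
```

On a degraded pool the failed member would show up with a state other than `ONLINE` (or not at all, if it is listed by a numeric GUID instead of its `.eli` name), which is the situation where the passphrase operations become risky.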
 

HoneyBadger

After resetting the keys and setting the passphrase, I was able to get the same prompt dialog and perform a replacement.

Do you have the option highlighted below in the Storage -> Pools -> (Gear) menu for managing the encryption key/passphrase?

1686933647107.png


If so, this implies that your pool and keys do have a passphrase associated with them.
 

HoneyBadger

The challenge I'm having now is that while in a degraded state, any attempt to add or change a passphrase fails with an attempt to delete (edit: the key from) the missing/unavail .ELI file on the detached disk. Legacy GELI encryption with a missing disk seems to be in a very dangerous space, so I'm going to recommend we move very slowly, especially if it's expecting a passphrase that you don't seem to have readily available.

Do you have sufficient free space on another set of (unencrypted) disks that you can copy this data to, either via ZFS replication or at a higher level with a file copy process?
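
As a rough sketch of what the ZFS replication route could look like, the Python snippet below only builds and prints the commands; it does not run anything. The target dataset `backup/gelipool` and the snapshot name `evacuate` are hypothetical placeholders, and the helper `replication_plan` is made up for illustration. Note that `zfs receive -F` rolls back the target dataset to accept the stream, so the printed commands should be reviewed carefully before being run by hand.

```python
import shlex


def replication_plan(source_pool, target_dataset, snapshot_name):
    """Build (but do not run) the commands for a one-off recursive replication."""
    snap = f"{source_pool}@{snapshot_name}"
    snapshot_cmd = ["zfs", "snapshot", "-r", snap]        # recursive snapshot of the pool
    send_cmd = ["zfs", "send", "-R", snap]                # replication stream incl. child datasets
    recv_cmd = ["zfs", "receive", "-F", target_dataset]   # -F forces a rollback on the target
    return [
        shlex.join(snapshot_cmd),
        f"{shlex.join(send_cmd)} | {shlex.join(recv_cmd)}",
    ]


if __name__ == "__main__":
    # All names below are placeholders; substitute your own pool and target.
    for step in replication_plan("gelipool", "backup/gelipool", "evacuate"):
        print(step)
```

Because GELI encrypts below ZFS, the stream sent from an unlocked pool is ordinary ZFS data, so the receiving pool does not need to be GELI-encrypted.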
 

fcm

HoneyBadger said:
"After resetting the keys and setting the passphrase, I was able to get the same prompt dialog and perform a replacement.

"Do you have the option highlighted below in the Storage -> Pools -> (Gear) menu for managing the encryption key/passphrase?

"If so, this implies that your pool and keys do have a passphrase associated with them."

Yes, I do have this option!

Interesting, I don't recall setting a passphrase, and I don't need one to unlock the pool.

HoneyBadger said:
"I'm presently working through a scenario that should mimic yours. I've created a GELI pool on FN11, I'm upgrading now to TN12 and then will go to TN13, and at that point I'll try to validate that the steps still work on setting up the passphrase.

"Generally speaking though, it should be safe to set the passphrase and back up the key in a degraded state because you can't expect to predict that you'll need to do this before a failure happens. However, I would suggest that once this is done and the resilver completes, that the key/passphrase combo is reset again."

Thank you, I did not expect this level of commitment! I really appreciate it.

HoneyBadger said:
"The challenge I'm having now is that while in a degraded state, any attempt to add or change a passphrase fails with an attempt to delete the missing/unavail .ELI file on the detached disk. Legacy GELI encryption with a missing disk seems to be in a very dangerous space, so I'm going to recommend we move very slowly, especially if it's expecting a passphrase that you don't seem to have readily available.

"Do you have sufficient free space on another set of (unencrypted) disks that you can copy this data to, either via ZFS replication or at a higher level with a file copy process?"

I do not have sufficient free space on another system.

What happens if I try to replace the disk with the wrong passphrase? I could try a couple...
 

fcm

I tried a couple of passwords and it seems I found the right one! The disk is currently resilvering and should be completed in a little more than 5h.

Thank you @winnielinnie and @HoneyBadger for taking the time to help me. It looks like replacing a GELI disk still works in TrueNAS Core 13.

If I may, I would suggest enhancing the GELI-related section of the TrueNAS Core 13 documentation with the information found in this thread, as well as any other relevant information you can think of. While GELI encryption is a legacy feature, one may still have to replace a disk in such a pool. The current documentation is quite slim on the subject.

Thanks again,
-F
 

HoneyBadger

Thanks for the feedback @fcm and very glad that you found your passphrase! I've got an internal dialog open with engineering - while GELI is definitely a legacy encryption method, and there's no "in-place migration" available, I certainly don't want to leave any upgrading users out in the cold without a way to manage their setups.
 