OFFLINE, ONLINE of encrypted disk does not work

r2p2

Cadet
Joined
Jan 29, 2019
Messages
8
I might be doing something wrong, but I can't figure out what. Before going live, I thought it would be a good idea to test things in a virtual machine. The result was a VM with 2 disks for the operating system and 4 disks in an encrypted raidz2 storage pool. While virtually pulling and reinserting drives, I noticed that it seems impossible to get the same drive running again after it has been pulled. It does work if I replace the drive with a completely new one. I eventually reduced it to the simplest failing scenario, which looks like this:

  1. Set up everything as described (4 drives in an encrypted raidz2 storage pool)
  2. Storage > Pool > Pool Operations (Gear) > Status > Offline any of the disks
  3. Storage > Pool > Pool Operations (Gear) > Status > Online the offlined disk
The result is the following error message:

Code:
Error: Traceback (most recent call last):

  File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 886, in online_disk
    notifier().zfs_online_disk(obj, deserialized.get('label'))

  File "./freenasUI/middleware/notifier.py", line 1064, in zfs_online_disk
    assert volume.vol_encrypt == 0

AssertionError


As far as I know, it should be fine to offline and then online a working drive?
In case it is helpful, I am using FreeNAS-11.2-RELEASE-U1 (Build Date: Dec 20, 2018 22:41), and zpool status looks like this:

Code:
root@freenas[~]# zpool status storage
  pool: storage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 37.8M in 0 days 00:00:39 with 0 errors on Wed Jan 30 00:23:55 2019
config:

        NAME                                                STATE     READ WRITE CKSUM
        storage                                             DEGRADED     0     0     0
          raidz2-0                                          DEGRADED     0     0     0
            gptid/7afeb39f-1e8f-11e9-9423-080027323634.eli  ONLINE       0     0     0
            12034077341320501396                            OFFLINE      0     0     0  was /dev/gptid/d41df718-241c-11e9-8b42-080027323634.eli
            gptid/1351f31c-1fdd-11e9-a5f1-080027323634.eli  ONLINE       0     0     0
            gptid/7ce8fda5-1e8f-11e9-9423-080027323634.eli  ONLINE       0     0     0

errors: No known data errors


Thank you in advance,
Robert
 

r2p2

Cadet
Joined
Jan 29, 2019
Messages
8
Amendment: It seems to work when using the shell with the following command:
Code:
zpool online storage 12034077341320501396


Is that a bug? I'll test what happens if I pull and reinsert the same device.
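For anyone who wants to avoid copying the identifier by hand, here is a minimal sketch that pulls the name of every OFFLINE member out of zpool status output. The function name offline_members is made up for illustration. It assumes the usual column layout shown above (NAME first, STATE second).

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# NAME column of every row whose STATE column is exactly OFFLINE.
offline_members() {
    awk '$2 == "OFFLINE" { print $1 }'
}

# Usage on a live system (pool name "storage" taken from the post above):
#   zpool status storage | offline_members
```

The printed value can then be passed to zpool online as in the command above.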
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I think it could be a case that was not tested. You might want to do a bug report so that the development team can look at it.
 

r2p2

Cadet
Joined
Jan 29, 2019
Messages
8
They have requested debug information and changed the ticket visibility to private, I guess in order to protect my data. The only other activity was the assignment to another person 18 days ago.

If I take a look into the logs myself, I would bet they do something wrong when building device paths. The double /dev/ does not look right.

Code:
/dev//dev/gptid/157b7104-247e-11e9-9f0d-080027323634.eli


I will keep this post updated if new information arises.
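To illustrate the kind of mistake I suspect, here is a small sketch. This is not the FreeNAS middleware code, just a hypothetical helper named dev_path. Prepending /dev/ unconditionally to a name that is already absolute yields the doubled prefix seen in the log, while checking first avoids it.

```shell
#!/bin/sh
# Hypothetical helper (not taken from FreeNAS): add the /dev/ prefix only
# when the argument is not already an absolute /dev path.
dev_path() {
    case "$1" in
        /dev/*) printf '%s\n' "$1" ;;        # already absolute, leave as-is
        *)      printf '%s\n' "/dev/$1" ;;   # relative provider name
    esac
}

# The naive form "/dev/$1" would turn /dev/gptid/x.eli into /dev//dev/gptid/x.eli.
```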
 

r2p2

Cadet
Joined
Jan 29, 2019
Messages
8
The ticket is public again. I'm thinking about not using encrypted pools in my real setup. It does not seem to be the smooth experience I was hoping for.
 

Magnetz

Dabbler
Joined
Jun 6, 2016
Messages
15
Encrypted pools are definitely "half-supported" and most people around this forum suggest not using them.

You can get an encrypted disk back online if you wrangle it manually.

https://forums.freenas.org/index.ph...h-onlining-encrypted-drive.26367/#post-167145

FreeNAS becomes unaware of the disk, so it won't automatically bring it back after a reboot unless you detach the volume and re-import it (this must be done while the disk is attached to geli, i.e. before you reboot). If you export and re-import while the disk is offline, it won't even let you treat the offline disk as a new disk.

Other than those pitfalls it works fine: it resilvered and the volume is back online.

I don't see a technical reason why the FreeNAS UI can't reattach a drive.

This is also interesting: https://forums.freenas.org/index.ph...ng-a-failed-encrypted-disk.58278/#post-454459 regarding re-keying drives

This post is essential if you want to know how geli works on FreeNAS: https://forums.freenas.org/index.php?threads/recover-encryption-key.16593/#post-85497

There is a script somewhere on the forum to back up your disk master keys too.
 

FreeNas2019

Cadet
Joined
Apr 2, 2019
Messages
3
1. I have the same problem: I am unable to online an encrypted disk that has been disconnected, either by physically pulling a cable or by offlining it in the GUI. As a workaround, I unlock the storage pool (if it is not already unlocked), go to the storage section of the GUI, open Pools, and attempt to "import an existing pool" from the disconnected and subsequently reconnected disk. I select "Yes, decrypt the disks", select the offline disk from the drop-down, upload my .geli key, enter my password, and click Next. The next page asks me to select the pool to import. When I click the drop-down, the "Pool *" indicator turns red and the wizard freezes, so I click Cancel. The pool status then shows the disk as online and healthy, and reports that a resilver completed in two seconds. I assume the wizard runs the unlock commands on the drive in the background, and that this allows it to be brought back online? If so, why doesn't restarting FreeNAS accomplish the same thing? A restart won't bring the drive back without the goofy workaround.

2. I've also found that the disk can be brought back online by disconnecting the entire storage pool and then re-importing it (selecting all of the disks in the mirror).

3. After bringing the disk online by either of these two methods and running a scrub, the system reports no data integrity errors. It does show checksum errors on the removed disk if I physically pulled a cable, but not if I just offlined it through the GUI.

4. I don't know whether there actually are data errors. Why can't we online an encrypted disk the normal way? And does my workaround actually work, or am I destroying data in a way that FreeNAS doesn't properly detect?

5. If my workaround is legitimate, then let's add a simple GUI unlock wizard that runs when attempting to online a disk in an encrypted pool.
 

FreeNas2019

Cadet
Joined
Apr 2, 2019
Messages
3
Update: I get the same result if I unlock the geli disk using the unofficial GUI trick I found and then manually bring it online from the CLI with zpool online. This skips the step of trying to import the volume in the wizard (which fails). It appears to work... am I missing something? We need a GUI unlock wizard when trying to unlock from the disks screen.
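A rough sketch of what the unlock plus online sequence might look like purely from the shell, printed as a dry run rather than executed. The assumptions are mine: that the pool uses a geli key file (geli attach -k takes a key file and will prompt for the passphrase if one is set), that the key file path is whatever you saved from the GUI, and that the provider is the "was" path from zpool status with the .eli suffix removed. The function name plan_online is hypothetical.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, but do not run, the two commands.
#   $1 = pool name
#   $2 = numeric identifier shown by zpool status for the offline disk
#   $3 = the "was" path from zpool status (ends in .eli)
#   $4 = path to the geli key file (assumed location, adjust to your system)
plan_online() {
    provider=${3%.eli}   # geli attaches the underlying provider, not the .eli device
    printf 'geli attach -k %s %s\n' "$4" "$provider"
    printf 'zpool online %s %s\n' "$1" "$2"
}

# Example:
#   plan_online storage 12034077341320501396 \
#       /dev/gptid/d41df718-241c-11e9-8b42-080027323634.eli /path/to/pool.key
```

Review the printed commands before running anything; I have only confirmed the GUI trick followed by zpool online, not this exact sequence.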
 

FreeNas2019

Cadet
Joined
Apr 2, 2019
Messages
3
The "trick" to online an encrypted disk that was taken offline from a pool still works for me in 11.3-U4, but it now requires the disk to be brought online from the shell.

Here is how I did it:

Step 1 (Decrypt the disk):

Storage ---> Pools ---> Add ---> Import Existing Pool ---> Yes, Decrypt the Disks ---> Select Offline Disk ---> Browse For Encryption Key ---> Enter passphrase ---> Next ---> Nothing shows up in the drop-down ---> Click Cancel

Step 2 (Get the disk identifier):

Go to the shell and run zpool status. Note the disk in the degraded pool that is listed as offline and annotated with "was gptid/e8c2e45f....". This is my offline disk. I copy the numeric identifier shown in the NAME column for that disk, for example: 9201068361032811225
I do not copy anything from the gptid/ part.

Step 3 (Bring the disk online):

zpool online mypoolname 9201068361032811225

Step 4 (Allow the disk to resilver itself):

I check the progress by running zpool status in the shell, which reports the resilver progress. It is usually pretty quick.
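If you would rather not read the whole status output each time, here is a minimal sketch that reduces it to one word. It assumes the scan: line reads "resilver in progress" while running and "resilvered ..." once finished, which matches the output earlier in this thread. The function name resilver_state is made up.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and report the
# resilver state based on the "scan:" line.
resilver_state() {
    awk '
        /scan: resilver in progress/ { print "in-progress"; found = 1; exit }
        /scan: resilvered/           { print "done"; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Usage on a live system:
#   zpool status mypoolname | resilver_state
```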



Comments/Questions to developers:

Unless I'm missing something, it still appears to be possible to bring an encrypted disk back online after taking it offline in the GUI, even though the GUI warns that it isn't possible. So this whole "bring encrypted disk back online" issue appears to be a GUI problem? Why can't the GUI be updated to perform the functions I am doing in the shell?
 