Re-key failure & cannot read metadata

Status
Not open for further replies.

darowley

Dabbler · Joined Mar 16, 2017 · Messages: 16
I think I fell victim to a combination of two "bugs".

No, I don't have a backup of my 24TB of data. I had literally just bought new hardware to set up as a full backup the day before, and had started to copy a little bit of data over.

I'm new to this, so here is what happened...

I added a passphrase to my 8-disk system (identical 5TB drives, RAIDZ2). (I was running FreeNAS 9.10.1 but have now upgraded to 9.10.02.) I actually thought I already had a passphrase, but I never did. I deeply regret adding that passphrase...

An error popped up and then disappeared right after I added the passphrase. I didn't get a chance to read it, but I think I saw the word "failed". I now think it was a re-key error. I didn't change the encryption key; I only added a passphrase.

I rebooted.
Then the volume was LOCKED, and when I tried to unlock it, it said "Volume Failed Unlock".
I detached the volume to try to reattach it, and only 3 of the drives showed up on page 2 of 3 (pick disks to decrypt). Even with the correct key file and passphrase, I got an error that it couldn't attach.
Via the shell I was able to "geli attach" those three drives, but the other FIVE just give me this error: geli: Cannot read metadata from ada1p2: Invalid argument.
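For anyone checking the same thing: as I understand it, GELI keeps its metadata in the last sector of the provider (here the partition, e.g. ada1p2), and that block starts with the magic string "GEOM::ELI". "Cannot read metadata ... Invalid argument" generally means geli did not find a metadata block it recognizes there. Below is a minimal, read-only Python sketch that looks for that magic. It assumes a 512-byte sector size and, for raw devices, a media size you supply yourself (for example the mediasize from `diskinfo -v`). The function names are hypothetical, and it is safest to run it against a dd image rather than the live disk.

```python
import os

# Magic string at the start of a GELI metadata block (G_ELI_MAGIC in g_eli.h).
GELI_MAGIC = b"GEOM::ELI"


def read_last_sector(path, sector_size=512, media_size=None):
    """Return the last `sector_size` bytes of a file or device, read-only.

    For image files the size is found by seeking to the end. For raw
    devices pass media_size (bytes) explicitly, e.g. from `diskinfo -v`.
    """
    # buffering=0 so each read is a single unbuffered read() call
    with open(path, "rb", buffering=0) as f:
        if media_size is None:
            media_size = f.seek(0, os.SEEK_END)
        if media_size < sector_size:
            raise ValueError("provider is smaller than one sector")
        f.seek(media_size - sector_size)
        return f.read(sector_size)


def has_geli_magic(sector):
    """True if the sector starts with the GELI magic string."""
    return sector.startswith(GELI_MAGIC)
```

Usage would be something like `has_geli_magic(read_last_sector("ada1p2.img"))`. A False result on the five problem disks would match the error above.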

I have read a lot and have found this:
The FreeNAS documentation says this, HIGHLIGHTED in ORANGE: If a re-key fails on a multi-disk system, an alert is generated. Do not ignore this alert as doing so may result in the loss of data.
https://doc.freenas.org/9.10/storage.html#managing-encrypted-volumes
So what is the user supposed to do when a re-key error happens? (I didn't even see the error long enough to read it.)
I restarted... Was that bad?

Bug #1: https://bugs.pcbsd.org/issues/13409

So after trying to attach my keys and constantly getting the "geli: Cannot read metadata from ada1p2: Invalid argument." error, I searched some more and found that the metadata might be bad. WHAT EXACTLY IS IN THIS METADATA? Apparently it's pretty important to have a backup copy, and I don't think I have one, unless backups are made as part of the config or somewhere else. From the bug reports below, it looks like users like me are left in the dust because we just don't know enough... And, according to Xin, even if I did know about metadata: "Those who want to backup GELI metadata either do not understand or are ignorant about how encryption should work". If only I had known that a drive's metadata needed its own backup, I would have made one. The one thing I know about encryption is that without a backup of your key, you are toast if you forget your password. In the case of FreeNAS, you have to have the key file (very understandable), and apparently sometimes a copy of your metadata too (how was I supposed to know?).
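Partly answering "what exactly is in this metadata": my reading of FreeBSD's g_eli.h (struct g_eli_metadata) is that the block holds the magic, version, flags, encryption/authentication algorithms, key length, provider size, sector size, a bitmask of which key slots are in use, the PBKDF2 iteration count, a 64-byte salt, two encrypted copies of the master key (slots 0 and 1), and an MD5 checksum over everything before it. The Python sketch below decodes that ASSUMED layout (version 1 and later, little-endian, packed). The offsets are my reading and have not been checked against the FreeNAS 9.10 source, and all names are hypothetical.

```python
import hashlib
import struct
from collections import namedtuple

# ASSUMED on-disk layout of struct g_eli_metadata (version >= 1):
# little-endian, packed, 511 bytes, stored at the start of the last sector.
GELI_LAYOUT = struct.Struct("<16sIIHHHQIBi64s384s16s")
GeliMetadata = namedtuple("GeliMetadata", [
    "magic", "version", "flags", "ealgo", "keylen", "aalgo",
    "provsize", "sectorsize", "keys", "iterations",
    "salt", "mkeys", "md5",
])
HASHED_LEN = GELI_LAYOUT.size - 16  # the MD5 covers every byte before it


def parse_geli_metadata(sector):
    """Decode the assumed fields; raise ValueError if the magic is missing."""
    md = GeliMetadata(*GELI_LAYOUT.unpack_from(sector))
    if not md.magic.startswith(b"GEOM::ELI"):
        raise ValueError("no GELI magic in this sector")
    return md


def checksum_ok(sector):
    """Recompute the MD5 over the encoded fields and compare it to md5."""
    stored = sector[HASHED_LEN:HASHED_LEN + 16]
    return hashlib.md5(sector[:HASHED_LEN]).digest() == stored


def used_key_slots(md):
    """md.keys is a bitmask: bit 0 = slot 0, bit 1 = slot 1."""
    return [slot for slot in (0, 1) if md.keys & (1 << slot)]
```

If this layout is right, the "two slots" are just two encrypted copies of the same per-disk master key. GELI numbers them 0 and 1.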

Bug #2: From the two reports below, it appears that it was decided that if you don't know about metadata backups, too bad.
https://bugs.freenas.org/issues/3206
https://bugs.freenas.org/issues/2375
My opinion is that the FreeNAS GUI should have a "backup" page that helps users understand everything that should be backed up, with some tips / warnings on how to store the sensitive backup data and keys.

https://bugs.freenas.org/issues/2375 / Xin Li wrote:
First of all, the documentation is exaggerating the problem. While it's true that losing the metadata means a total loss of the whole drive, it fails to consider this: how is it different from losing a single drive due to hardware failure? No, there is no difference, and any qualified system administrator should take hardware failures into account. Is it likely to happen? No, because the data is not being overwritten frequently, it's not even being read often, if a hard drive is fragile like this, better chances that one already have other data corruptions and it's not something that can be solved by software.

Well, I personally disagree that there is no difference. IT IS VERY DIFFERENT. My 5 drives did not "FAIL"; the FreeNAS GUI seems not to have correctly added / updated the keys or metadata. This is a documented issue (re-key failure) and it appears to be marked as "fixed", but without metadata backups it might mean a "total loss of the whole drive" (5 of 8 for me). I'm really hoping that isn't the case...

So I'm at a point where I hope someone can help me with this issue "geli: Cannot read metadata from ada1p2: Invalid argument."

I tried "geli backup" on one of the 5 unreadable disks; it just gave the same "Invalid argument" error.

I did a "geli backup" from one of my 3 working disks and restored that backup onto one of the seemingly broken drives... It seemed to work to the extent that "geli attach" asked for the passphrase and didn't complain, so I no longer get "Invalid argument" on that drive. But I don't think the disk is actually readable / unlocked. I'm thinking that, if anything, I need a copy of the metadata from a drive that has the previous metadata with just the key... I don't know if this is even possible (restoring metadata from a different drive), and if it is: how would I create a drive with the same old key? I have backups of my keys, and I have a few more identical 5TB drives. Could I set up another FreeNAS machine (or even use the same one), make a new drive with the old key, and then copy that working drive's metadata onto one of those 5 drives?
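Before trying more restore experiments like this, it is worth saving whatever each disk currently has in its last sector (where GELI keeps its metadata), so every step can be undone. Below is a minimal Python sketch of that. It assumes 512-byte sectors and, for raw devices, a media size you supply (e.g. from `diskinfo -v`). `snapshot_last_sector` is a hypothetical helper, not part of geli.

```python
import os


def snapshot_last_sector(device, out_path, sector_size=512, media_size=None):
    """Save the current last sector of `device` to `out_path` before
    anything (such as `geli restore`) overwrites it.

    media_size: provider size in bytes for raw devices; for image files
    it is found by seeking to the end.
    """
    with open(device, "rb", buffering=0) as dev:
        if media_size is None:
            media_size = dev.seek(0, os.SEEK_END)
        if media_size < sector_size:
            raise ValueError("device is smaller than one sector")
        dev.seek(media_size - sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise OSError(f"short read: got {len(data)} of {sector_size} bytes")
    # Mode "xb" refuses to overwrite an existing file, so an earlier
    # snapshot of the original metadata can never be clobbered by accident.
    with open(out_path, "xb") as out:
        out.write(data)
    return data
```

Writing the saved sector back is the reverse operation. The point is simply to keep one untouched copy per disk before anything else writes to that sector.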

If that isn't an option... I am also wondering about metadata slots 1 and 2. I'm hoping someone knows enough about this metadata stuff to get a few of my drives spinning again. If all 8 of my identical drives were part of the same volume / encryption key, then each of those slots would be the same on every drive, right? I assume that my 3 working drives have a different set of keys, but the 5 broken ones would all have the same (old) key?
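On the slot question, my understanding (hedged) is that `geli init` picks a random salt and a random master key for each provider. Two disks set up with the same key file would therefore still have different slot contents, and one disk's metadata would only yield that disk's master key. That would also fit the restore experiment described above: geli attach can accept the passphrase, but the data still will not decrypt. Below is a TOY Python model of the idea. It uses PBKDF2 plus XOR masking, which is NOT GELI's real cipher construction, and all names are hypothetical.

```python
import hashlib
import secrets


def toy_init_disk(user_key, iterations=1000):
    """TOY per-disk setup: a random salt and a random master key, with the
    master key stored XOR-masked by a key derived from user_key + salt."""
    salt = secrets.token_bytes(64)
    master = secrets.token_bytes(32)
    mask = hashlib.pbkdf2_hmac("sha256", user_key, salt, iterations, dklen=32)
    slot = bytes(a ^ b for a, b in zip(master, mask))
    return {"salt": salt, "slot": slot}, master


def toy_unlock(metadata, user_key, iterations=1000):
    """Recover whichever master key this metadata record wraps."""
    mask = hashlib.pbkdf2_hmac("sha256", user_key, metadata["salt"],
                               iterations, dklen=32)
    return bytes(a ^ b for a, b in zip(metadata["slot"], mask))


# Same user key, two disks: different slots, different master keys.
meta_a, master_a = toy_init_disk(b"same key file")
meta_b, master_b = toy_init_disk(b"same key file")
# Copying disk A's metadata onto disk B would hand disk B master_a, not master_b.
```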

What I "think" I know about MS BitLocker is that the drives are encrypted with a key, and there is a partition of around 300MB that holds the master key that unlocks the rest of the drive. So the user has to decrypt the 300MB partition first, then the system decrypts the rest of the drive. If the user wants to change the encryption key, only the 300MB partition is re-encrypted, not the entire drive. I assume this is somewhat similar, so my data should be intact. SO I HAVE HOPE.
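The master-key idea described here is usually called envelope encryption. As far as I know, GELI works the same way in principle: changing the key or passphrase re-encrypts only the master key stored in the metadata, not the data. Below is a TOY Python sketch that shows both sides of this: re-keying leaves the data untouched, and losing the small metadata record makes that untouched data unreadable. It uses XOR plus a SHA-256 keystream, which is not real GELI crypto, and `ToyVolume` and its methods are hypothetical.

```python
import hashlib
import secrets


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def mask_for(user_key, salt):
    # Stretch the user key into a 32-byte mask (toy stand-in for a real KDF).
    return hashlib.pbkdf2_hmac("sha256", user_key, salt, 1000, dklen=32)


def keystream(master, n):
    # Toy counter-mode keystream from SHA-256; NOT a real disk cipher.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(master + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:n]


class ToyVolume:
    """Bulk data is encrypted once under a random master key; only the small
    metadata record (salt + slot) depends on the user's key."""

    def __init__(self, user_key, plaintext):
        master = secrets.token_bytes(32)
        self.salt = secrets.token_bytes(16)
        self.slot = xor(master, mask_for(user_key, self.salt))
        self.data = xor(plaintext, keystream(master, len(plaintext)))

    def master_key(self, user_key):
        # No integrity check here: a wrong key silently yields garbage.
        return xor(self.slot, mask_for(user_key, self.salt))

    def read(self, user_key):
        master = self.master_key(user_key)
        return xor(self.data, keystream(master, len(self.data)))

    def rekey(self, old_key, new_key):
        # Re-wrap the same master key; self.data is never touched.
        master = self.master_key(old_key)
        self.salt = secrets.token_bytes(16)
        self.slot = xor(master, mask_for(new_key, self.salt))
```

The flip side is also visible here: if `slot`/`salt` are lost or overwritten, the data bytes are still on disk but no key can recover the master key.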

Thanks for any help! Please remember that I consider myself fairly new to this. I've been using FreeNAS since 2011 and have set up about 6 - 8 servers, but I really haven't had to do much troubleshooting on them.

As a reminder, I have backups of the new key and of the old key that was on the system (I copied it from geli/data and renamed it to master.key, dated AUG 2016). I'm hoping that renaming the key doesn't affect anything... I also have several backup copies of the encryption key that I downloaded over the past few months.
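On the renaming worry: geli reads a key file's contents (it is passed with -k), so the file name should not matter. Below is a small Python sketch, with hypothetical helper names, that fingerprints key files by the SHA-256 of their contents. This shows which downloaded copies are actually identical, whatever they are called.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """SHA-256 of the file's bytes; independent of the file's name."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def group_key_files(paths):
    """Map each distinct fingerprint to the list of files that share it."""
    groups = {}
    for p in paths:
        groups.setdefault(fingerprint(p), []).append(str(p))
    return groups
```

Running `group_key_files` over every saved key copy would show, for example, whether master.key is byte-for-byte the same as one of the later downloads.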
 

darowley

No. I'm hoping someone knows something about re-keying and encryption.

I find it hard to believe that FreeNAS would be able to (or be allowed to) trash my drives in seconds, so I'm hoping that someone knows how to fix this issue.
 

dlavigne

Guest
I think it is worthwhile to create a ticket at bugs.freenas.org so a dev can take a closer look at it. Please post the issue number here after you do so.
 

darowley

Unfortunately this is above my pay grade. I don't know much about metadata, and no one seems to respond to this post or the other one:
https://forums.freenas.org/index.php?threads/locked-volume-volume-failed-unlock-after-changing-encryption-passphrase.51707/

But I have seen that others have ended up in the same or a very similar situation: an encrypted pool where some of the disks open / decrypt but the others don't, which prevents the entire pool from being imported.

https://forums.freenas.org/index.php?threads/cant-import-unlock-encrypted-zpool-anymore.51154/

This person simply didn't know that his passphrase wasn't the only thing he needed; he also needed to download a recovery key. AND what I have learned now is that you also need to make a backup of the metadata from EACH drive. I really wish the developers would take a few minutes to update the icon tooltips and add a HELP button next to the existing icons (click Storage, then your encrypted volume; the existing icons are at the bottom).
https://forums.freenas.org/index.php?threads/import-encrypted-volume-into-9-10-2.54399/

So what I have learned is:
The key (recovery) is not enough by itself. You must also know your passphrase (yes, I did know this) and have a copy of the metadata from EACH disk.
The passphrase is not for the encryption or for changing the encryption key.
A passphrase simply protects the (already) encrypted disks from mounting within FreeNAS after a reboot (or after a user manually locks the volume) until a user enters the passphrase. This just stops the volume from auto-mounting.

There are multiple steps to take after changing a passphrase.
Changing the passphrase and saving the encryption key is not enough. A user must also "replace" (which should say "Replace & Download") the recovery key, which downloads a "geli_recovery.key" to the user's local computer. This "recovery key" is the passphrase recovery key. I'm not really sure how it's used.
The geli.key is the ENCRYPTION recovery key and the geli_recovery.key is the PASSPHRASE recovery key. (THE PROGRAMMERS NEED TO RENAME THESE FILES SO USERS UNDERSTAND WHAT THEY ARE / HAVE DOWNLOADED).
The user must also keep track of / remember the passphrase and passphrase recovery key as the PASSPHRASE is not a temporary passphrase for making / setting a new encryption key (does not re-encrypt the disks).

The add / change passphrase box should say: Adding a passphrase to an encrypted volume is a way to add an additional layer of security. It does not change your encryption key. A passphrase will lock your drive after a reboot and it will remain locked / unusable until you return to this page and unlock the drive by choosing "Unlock this Drive" and entering your passphrase. Remember to SAVE / download your passphrase recovery key (and the passphrase itself, in addition to the encryption recovery key) as this action invalidates the previous recovery key.

The "Wrong Key" error should be changed to "WRONG GELI KEY FILE". This would make it clear to the user that the FILE was incorrect after just typing in a password / passphrase. (I think this error might come directly from the underlying OS (FreeBSD's geli) rather than from FreeNAS itself; if so, hopefully that will get fixed in a future release.)

Overall, the documentation on encryption needs to be updated so it makes sense to the user. The GUI also needs to have this info right there so people can read the notes and know what they are doing and SUPPOSED TO DO. - Please EDUCATE US, give us the knowledge (in the GUI) so we know what we are doing.

I don't know what caused this, but in my opinion it could be fixed in a future release:
1. Before re-keying, make a backup of the metadata of EACH disk into one encrypted file using either the passphrase or whatever the user wants. - Yes, this may be a security risk, but most of us that have enabled encryption can probably deal with keeping that ENCRYPTED file safe. OR put it on the USB drive that will auto delete after a week or so, allowing the user to have enough time to notice a problem, AND REQUEST, THEN RECEIVE HELP.

2. Once the backup has been made / completed, THEN re-key. - If there is a failure for ANY reason, the BACKUP meta data can be downloaded (via the GUI please)

3. Ask the user if they want to download a new encrypted backup copy of the metadata from all the disks.

4. Allow the user to download the metadata from all drives as one encrypted file at any time (not just during a re-key), with a page explaining the importance of keeping the metadata safe.
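Until something like this exists in the GUI, the per-disk metadata backup can be scripted by hand. Below is a minimal Python sketch that assumes the stock `geli backup prov file` syntax from geli(8) and hypothetical provider names such as ada1p2 (FreeNAS often uses gptid/... labels instead). It only prints the commands unless `dry_run=False`. The bundle step produces a plain, UNENCRYPTED tar.gz, which would still need to be encrypted with a tool of your choice.

```python
import subprocess
import tarfile
from pathlib import Path


def backup_commands(providers, dest_dir):
    """One `geli backup <prov> <file>` command per provider (geli(8) syntax)."""
    dest = Path(dest_dir)
    cmds = []
    for prov in providers:
        # gptid/... style names contain "/", which cannot go in a file name.
        fname = prov.replace("/", "_") + ".eli"
        cmds.append(["geli", "backup", f"/dev/{prov}", str(dest / fname)])
    return cmds


def run_backups(providers, dest_dir, dry_run=True):
    """Print the commands (default) or run them, stopping on the first error."""
    cmds = backup_commands(providers, dest_dir)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds


def bundle(dest_dir, archive_path):
    """Pack every *.eli backup into one tar.gz. NOTE: this is NOT encrypted."""
    files = sorted(Path(dest_dir).glob("*.eli"))
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)
    return archive_path
```

Keeping the dry run as the default means the commands can be reviewed before anything touches the disks.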


From previous post above:
I did a "geli backup" from one of my 3 working disks and restored that backup onto one of the seemingly broken drives... It seemed to work to the extent that "geli attach" asked for the passphrase and didn't complain, so I no longer get "Invalid argument" on that drive. But I don't think the disk is actually readable / unlocked. I'm thinking that, if anything, I need a copy of the metadata from a drive that has the previous metadata with just the key... I don't know if this is even possible (restoring metadata from a different drive), and if it is: how would I create a drive with the same old key? I have backups of my keys, and I have a few more identical 5TB drives. Could I set up another FreeNAS machine (or even use the same one), make a new drive with the old key, and then copy that working drive's metadata onto one of those 5 drives?

If that isn't an option... I am also wondering about metadata slots 1 and 2. I'm hoping someone knows enough about this metadata stuff to get a few of my drives spinning again. If all 8 of my identical drives were part of the same volume / encryption key, then each of those slots would be the same on every drive, right? I assume that my 3 working drives have a different set of keys, but the 5 broken ones would all have the same (old) key?

I guess I just don't understand how downloading the recovery key and knowing my passphrase isn't enough to decrypt a drive. This really just doesn't make sense to me, which is why I hope someone who knows about metadata can help.
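As I understand stock GELI (geom_eli.c), the key file and passphrase are combined into a "user key". That user key only decrypts the master key stored in a metadata slot, and the PBKDF2 salt used for the passphrase also lives in that same metadata. That would explain why the key, recovery key and passphrase are not enough once the metadata sector is gone. Below is a simplified Python sketch of that derivation, assuming HMAC-SHA512 with an empty HMAC key, fed the key file contents and then the passphrase (stretched with PBKDF2-HMAC-SHA512 when iterations > 0). `geli_style_user_key` is a hypothetical name, and FreeNAS's middleware may add its own steps on top.

```python
import hashlib
import hmac


def geli_style_user_key(keyfile_data, passphrase, salt, iterations):
    """Simplified sketch of a GELI-style user key (assumed construction).

    salt and iterations come from the on-disk metadata in real GELI, so
    without that sector even the correct key file + passphrase lack inputs.
    """
    mac = hmac.new(b"", digestmod=hashlib.sha512)  # empty HMAC key
    mac.update(keyfile_data)
    if passphrase:
        if iterations > 0:
            # Passphrase stretched with PBKDF2 using the metadata's salt.
            stretched = hashlib.pbkdf2_hmac("sha512", passphrase, salt,
                                            iterations, dklen=64)
            mac.update(stretched)
        else:
            mac.update(passphrase)
    return mac.digest()  # 64 bytes; used only to decrypt a metadata slot
```

The point of the sketch is the dependency: the result changes with the salt, and in real GELI the salt is stored in the very metadata sector that is reported as unreadable here.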
 