I think I fell victim to a combination of two "bugs".
No, I don't have a backup of my 24TB of data, but I had literally just bought new hardware to set up as a full backup the day before and had started to copy a little bit of data over.
I'm new to this so here is what happened...
I added a passphrase to my 8-disk system (identical 5TB drives, RAIDZ2). I was running FreeNAS 9.10.1 but have now upgraded to 9.10.02. I actually thought I already had a passphrase, but I never did to begin with. Huge regret in adding that passphrase...
An error popped up and then disappeared right after I added the passphrase. I didn't get a chance to read it, but I think I saw the word "failed". I now think this was a re-key error, even though I didn't change the encryption key, I just added a passphrase.
I rebooted.
Then the volume was LOCKED, and when trying to unlock, it said "Volume Failed Unlock".
I detached the volume to try to reattach it, and only 3 of the drives showed up on page 2/3 (pick disks to decrypt). Even with the correct key file and passphrase, I got an error that it couldn't attach.
Via the shell I was able to "geli attach" those three drives, but the other FIVE all give me this error: geli: Cannot read metadata from ada1p2: Invalid argument.
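For anyone wanting to check the same thing on their own disks: as far as I understand, GELI keeps its metadata in the last sector of the provider, and that sector begins with the magic string "GEOM::ELI". Below is a minimal Python sketch that reads the last sector and looks for that magic. It assumes a 512-byte sector and a regular file or disk image; a raw device node may not report its size through a seek, so the size might have to come from elsewhere. The function names are made up for illustration. It only reads, never writes.

```python
import os

# Magic string GELI is understood to write at the start of its metadata
# sector (an assumption based on FreeBSD's g_eli.h, not verified here).
GELI_MAGIC = b"GEOM::ELI"


def read_last_sector(path, sector_size=512):
    """Return the final sector of a file or disk image, read-only."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new offset
        if size < sector_size:
            raise ValueError("provider is smaller than one sector")
        f.seek(size - sector_size)
        return f.read(sector_size)


def looks_like_geli_metadata(sector):
    """True if the sector starts with the GELI magic string."""
    return sector.startswith(GELI_MAGIC)
```

Usage would be something like looks_like_geli_metadata(read_last_sector("disk.img")), where disk.img is a hypothetical image of the partition. A matching magic does not prove the metadata is intact (there are further fields and a checksum after it), but a missing magic would at least be consistent with the "Cannot read metadata" error.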
I have read a lot and have found this:
The FreeNAS documentation says this, HIGHLIGHTED in ORANGE: If a re-key fails on a multi-disk system, an alert is generated. Do not ignore this alert as doing so may result in the loss of data.
https://doc.freenas.org/9.10/storage.html#managing-encrypted-volumes
So what is the user supposed to do when a re-key error happens? (I didn't even see the error long enough to read it)
I restarted... Was that bad?
Bug #1: https://bugs.pcbsd.org/issues/13409
So after trying to attach with my keys and constantly getting the "geli: Cannot read metadata from ada1p2: Invalid argument." error, I decided to search some more and found that the metadata might be bad. WHAT EXACTLY IS IN THIS METADATA? Apparently it's pretty important to have a backup copy, and I don't think I have one, unless backups are made as part of the config or somewhere else. From the bug reports below, it looks like users like me are left in the dust because they just don't know enough... And, according to Xin, even if I had known about metadata: "Those who want to backup GELI metadata either do not understand or are ignorant about how encryption should work". If only I had known that a drive's metadata was this crucial, I would have made a backup of it. The one thing I know about encryption is that without a backup of your key you will be toast if you forget your password. In the case of FreeNAS, you have to have the key file (very understandable). And sometimes a copy of your metadata (how was I supposed to know?).
Bug #2: From these two below, it appears that it was decided that if you don't know about metadata backups then too bad.
https://bugs.freenas.org/issues/3206
https://bugs.freenas.org/issues/2375
My opinion is that the freenas GUI should have a "backup" page to help users understand all the things that should be backed up and some tips / warnings on how to store the sensitive backup data / keys.
https://bugs.freenas.org/issues/2375 / Xin Li wrote:
First of all, the documentation is exaggerating the problem. While it's true that losing the metadata means a total loss of the whole drive, it fails to consider this: how is it different from losing a single drive due to hardware failure? No, there is no difference, and any qualified system administrator should take hardware failures into account. Is it likely to happen? No, because the data is not being overwritten frequently, it's not even being read often, if a hard drive is fragile like this, better chances that one already have other data corruptions and it's not something that can be solved by software.
Well, I personally disagree that it is no different. IT IS VERY DIFFERENT. My 5 drives did not "FAIL"; the FreeNAS GUI seems not to have correctly added / updated the keys / metadata. This is a documented issue (re-key failure) and it appears to be marked as "fixed", but without metadata backups it appears that it might mean a "total loss of the whole drive" (5 of 8 for me). I'm really hoping that isn't the case...
So I'm at a point where I hope someone can help me with this issue "geli: Cannot read metadata from ada1p2: Invalid argument."
I tried "geli backup" on one of the 5 unreadable disks; it just gave the same "Invalid argument" error.
I did a "geli backup" from one of my 3 working disks and wrote that backup onto one of the seemingly broken drives... It seemed to work, to the extent that "geli attach" asked for the passphrase and didn't complain, so I don't get an "Invalid argument" on that drive now. But I don't think the disk is actually readable / unlocked. I'm thinking that, if anything, I need a copy of the metadata from a drive that still has the previous metadata with just the key... I don't know if it is even possible to restore metadata from a different drive, and if it is possible: how would I create a drive with the same old key? I have backups of my keys... I have a few more identical 5TB drives... Can I set up another FreeNAS machine, or even use the same one, and make a new drive with the old key? Then back up the metadata from that working drive and write it to one of those 5 drives?
If that isn't an option, I am also wondering about metadata slots 1 and 2. I'm hoping someone knows enough about this metadata stuff to get a few of my drives spinning again. If all 8 of my identical drives were part of the same volume / encryption key, then each of those slots would be the same on each drive, right? I assume that my 3 working drives have a different set of keys, but the 5 broken ones would all share the same (old) key?
What I "think" I know about MS BitLocker is that the drives are encrypted with a key, and then there is a 300 MB partition (or so) that holds the master key that unlocks the rest of the drive. So the user has to decrypt the 300 MB partition first, then the system decrypts the rest of the drive. If the user wants to change the encryption key, just the 300 MB partition is re-encrypted, not the entire drive. I assume that this is somewhat similar, so my data should be intact. SO I HAVE HOPE.
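To make the "master key wrapped by a user key" idea concrete, here is a toy Python model. It is NOT GELI's real on-disk format or crypto: the XOR wrapping, the iteration count, and every function name are made up for illustration. The two assumptions it encodes are that the user key is derived from the key file plus passphrase, and that each disk gets its own random master key at init time (which, as far as I understand, is how GELI behaves). Under those assumptions it shows two things: adding a passphrase only re-wraps the master key and leaves the data key unchanged, and metadata copied from another disk can "attach" cleanly while still yielding the wrong master key for that disk.

```python
import hashlib
import hmac
import os


def derive_user_key(keyfile, passphrase, salt):
    """Toy stand-in for combining a key file and passphrase into a user key."""
    stretched = hashlib.pbkdf2_hmac("sha512", passphrase, salt, 1000)
    return hmac.new(keyfile, stretched, hashlib.sha512).digest()


def _keystream(user_key, length):
    # Toy keystream; a real system would use a proper cipher.
    return hashlib.sha512(b"wrap" + user_key).digest()[:length]


def wrap(user_key, master_key):
    """Encrypt the master key under the user key and append a check tag."""
    stream = _keystream(user_key, len(master_key))
    blob = bytes(a ^ b for a, b in zip(master_key, stream))
    return blob + hmac.new(user_key, blob, hashlib.sha256).digest()


def unwrap(user_key, wrapped):
    """Recover the master key, or raise if the user key is wrong."""
    blob, tag = wrapped[:-32], wrapped[-32:]
    expected = hmac.new(user_key, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key file or passphrase")
    return bytes(a ^ b for a, b in zip(blob, _keystream(user_key, len(blob))))


keyfile, salt = os.urandom(64), os.urandom(64)
master_a, master_b = os.urandom(32), os.urandom(32)  # one master key per disk

# Initial state: key file only, no passphrase.
old_user_key = derive_user_key(keyfile, b"", salt)
meta_a = wrap(old_user_key, master_a)

# Adding a passphrase re-wraps the SAME master key; no data is re-encrypted.
new_user_key = derive_user_key(keyfile, b"new passphrase", salt)
meta_a = wrap(new_user_key, unwrap(old_user_key, meta_a))
master_unchanged = unwrap(new_user_key, meta_a) == master_a

# Writing disk A's metadata onto disk B: the unwrap succeeds without
# complaint, but the recovered key is A's, so B's data would stay garbage.
recovered_on_b = unwrap(new_user_key, meta_a)
attaches_but_wrong_key = recovered_on_b != master_b
```

If this model is roughly right, it would explain why a disk that received another disk's metadata asks for the passphrase and attaches without error but is still not readable.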
Thanks for any help! Please remember that I consider myself fairly new to this. I've been using FreeNAS since 2011 and have set up about 6 - 8 servers, but I really haven't had to do a lot of troubleshooting on them.
As a reminder, I have backups of the new key and of the old key that was on the system (I copied it from geli/data and renamed it to master.key, dated AUG of 2016). I'm hoping that renaming the key doesn't affect anything... I also have several backup encryption key copies that I downloaded over the past few months.
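With several key backups lying around under different names, it may help to see which ones are actually the same key. Here is a small Python sketch that groups files by the SHA-256 of their contents. The function name is made up for illustration. Since only the contents are hashed, a renamed copy lands in the same group as its original.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(paths):
    """Map each SHA-256 digest to the list of files with that exact content."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return dict(groups)
```

Calling group_by_content on the list of backup key files gives one entry per distinct key, which makes it easier to tell the old key from the new one before trying any of them with "geli attach".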