Multiple Problems - Raidz5

valhallen282

Cadet
Joined
Apr 21, 2020
Messages
8
So, the system specs are in my signature; to keep things simple I'll refer to the pools by name. This is a fresh, recently built system. I had just moved a lot of my data onto it, and I have no access to the old copies, as I had to wipe the old drives to return them to my former business. (Yes, I know it was stupid.)
MonStor is a Raidz5 used for storing movies and older files and is mainly accessed by two devices. Slicer was a striped pool used for temporarily offloading data from my work machine. Everything is connected to an APC BR1000G UPS, but the UPS isn't connected to or controlled by the machine (I unfortunately lost the cable while moving).
One unfortunate morning I was fast asleep and the power was out for two hours. During that time someone at home inadvertently hit the power button. When I woke up the power was back on and I was none the wiser. I logged in and found Slicer degraded and unable to find its second drive, though it seemed to be working. It also held the system dataset. So:
Slicer: Moved the system dataset to a USB drive -> Destroyed the pool (it was empty) -> Recreated it as a striped array -> Scrubbed the drives -> Works fine.
MonStor was locked the whole time -> Attempted to unlock with the correct key, FAILED -> Giving it the key again failed -> Downloaded a second backup of the key and destroyed the pool -> Import failed with no available disks -> Rebooted -> The USB drive failed. Fiddlesticks.
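(For context, the unlock attempts went through the GUI; at the shell they should boil down to roughly the following. The key path and gptid are placeholders, not what I actually typed.)

Code:
# attach the encrypted provider with the saved key (-p assumes no passphrase; use -j <passfile> if one was set)
geli attach -p -k /path/to/MonStor.key /dev/gptid/<rawuuid-of-freebsd-zfs-partition>
# then try to import the now-decrypted pool
zpool import -R /mnt MonStor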

Code:
root@freenas[~]# zpool status
  pool: Slicer
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Slicer                                          ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/74e8cc59-83bc-11ea-962b-503eaa0d3bdd  ONLINE       0     0     0
            gptid/751a75d1-83bc-11ea-962b-503eaa0d3bdd  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors 

Code:
root@freenas[~]# gpart show -l
=>        40  1953525088  ada3  GPT  (932G)
          40          88        - free -  (44K)
         128     4194304     1  (null)  (2.0G)
     4194432  1949330696     2  (null)  (930G)

=>        40  1953525088  ada4  GPT  (932G)
          40          88        - free -  (44K)
         128     4194304     1  (null)  (2.0G)
     4194432  1949330696     2  (null)  (930G)

=>      40  31266736  da0  GPT  (15G)
        40    532480    1  (null)  (260M)
    532520  30703616    2  (null)  (15G)
  31236136     30640       - free -  (15M) 

The drives show up in the GUI, but they can only be added to a FRESH pool and don't show up as a recoverable pool. Where did I goof…
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
OK, what does zpool import show?
 

valhallen282

Cadet
Joined
Apr 21, 2020
Messages
8
You destroyed the pool?
No, I did not destroy the pool. I Export/Disconnected it, very deliberately, because that's what a LOT of posts here say to do: download the key, Export/Disconnect, and reimport. I looked through the zpool import man page to understand what commands I was giving and why.
-Df was used in case the system assumed, for whatever reason, that the pool was destroyed.
So, to keep things moving and to cover everything else I might be asked for, here are all the import attempts plus camcontrol and zdb output:
Code:
root@freenas[~]# zpool import
root@freenas[~]# zpool import -a
root@freenas[~]# zpool import -Df
Code:
root@freenas[~]# camcontrol devlist
<HITACHI HUA723030ALA640 MKAONS00>  at scbus0 target 0 lun 0 (pass0,ada0)
<HITACHI HUA723030ALA640 MKAONS00>  at scbus1 target 0 lun 0 (pass1,ada1)
<Hitachi HUA723030ALA640 MKAOAA50>  at scbus2 target 0 lun 0 (pass2,ada2)
<ST1000DM003-1SB102 CC43>          at scbus4 target 0 lun 0 (pass3,ada3)
<ST1000DM003-1SB10C CC43>          at scbus8 target 0 lun 0 (pass4,ada4)
<SanDisk Ultra 1.00>               at scbus14 target 0 lun 0 (pass5,da0)
Code:
root@freenas[~]# zdb -l /dev/ada0
------------------------------------
LABEL 0
------------------------------------
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
failed to unpack label 2
------------------------------------
LABEL 3
------------------------------------
failed to unpack label 3
root@freenas[~]# zdb -l /dev/ada1
------------------------------------
LABEL 0
------------------------------------
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
failed to unpack label 2
------------------------------------
LABEL 3
------------------------------------
failed to unpack label 3
root@freenas[~]# zdb -l /dev/ada2
------------------------------------
LABEL 0
------------------------------------
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
failed to unpack label 2
------------------------------------
LABEL 3
------------------------------------
failed to unpack label 3
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700

valhallen282

Cadet
Joined
Apr 21, 2020
Messages
8
I just updated to it today! I was on the previous release! Is there any possible recovery method?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Is there any possible recovery method?
Looks like the damage is done directly to the disks with that issue, so there's no specific recovery method that I'm aware of.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Here's the link to the update info https://www.ixsystems.com/blog/library/freenas-11-3-u2-1/

In it you'll find a link to the bug report, which includes information on the underlying issue and an analysis of it. Perhaps if you add to the bug report and describe the details of your situation, you may be offered some helpful comments towards recovery.
 
Joined
Oct 18, 2018
Messages
969
@valhallen282 do you have backups of the geli metadata as well as the keys?
 

valhallen282

Cadet
Joined
Apr 21, 2020
Messages
8
@valhallen282 do you have backups of the geli metadata as well as the keys?
Keys yes. Metadata no. I wasn't aware I would need a backup of it. Else it would've been the first thing on my build list!

Here's the link to the update info https://www.ixsystems.com/blog/library/freenas-11-3-u2-1/ Perhaps if you add to the bug report and describe the details of your situation, you may be offered some helpful comments towards recovery.
I've posted there as well. Let's hope something good comes out of it.
 
Joined
Oct 18, 2018
Messages
969
Keys yes. Metadata no. I wasn't aware I would need a backup of it. Else it would've been the first thing on my build list!
My understanding of the bug is that the geli metadata is destroyed on the disk as well. What version were you on before?

If you were affected by this bug, do you have backups?
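For anyone reading along whose encrypted pools are still intact: backing up the geli metadata is a one-liner per provider. The device and output path below are just examples.

Code:
# back up the GELI metadata of each encrypted freebsd-zfs partition to a file kept off the box
geli backup /dev/gptid/<rawuuid-of-freebsd-zfs-partition> /root/gptid_<rawuuid>.eli.backup
# (it can later be written back with: geli restore <backup-file> <provider>)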
 

mongoose

Cadet
Joined
Apr 25, 2020
Messages
4
Hey there,

I just wanted to join the conversation. I've been hit by the bug too; at least that's what it looks like at the moment.

I have (had?) a FreeNAS box on version 11.2-U7 with a RaidZ2 of 6 x 4 TB NAS HDDs (a mix of IronWolf and WD Red). The zpool was encrypted.

In addition I had two 240 GB SSDs, non-RAID: one for testing VMs and one as the system pool (logs and such), attached to a HighPoint Rocket SATA/PCIe controller.

Before the update to 11.3-U2 I had to swap out that SATA/PCIe controller because of SMART errors. Nothing wrong with the SSDs, but the HighPoint has a history here (there was a report in a German tech magazine).

So, after changing it to a new one with a Marvell 88SE9215 chip, everything looked good, and I applied the 11.3-U2 update... and this is where the fun started. At this point FreeNAS also prompted me to upgrade the pool feature flags, which I did as advised.

After the reboot I tried to unlock my big pool "Goliath" (there must be some sort of irony here, I know) in the pool section, but it didn't work.
In the disk view I saw that the disks were there, but they were marked as "unused" instead of showing the pool name.

The system pool mentioned above was still there and working, so the config data for "Goliath" was still sitting there. I tried booting back into U7, but no luck there.
Then I tried to export/disconnect the pool (without checking the options to destroy data or delete the config) to clean up the config and then do an import. That obviously did not work...

Then I tried the shell to gather more status (zpool status, zpool import and so on) - no luck there either.

I hooked the HDDs up to SATA write blockers and looked for file system structures and partition tables with a hex viewer... I found none at first glance. Overnight I additionally searched with UFS Explorer for GPT backup headers, and the program found none either.
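If I attach the disks to a FreeBSD box again, the equivalent spot check for a backup GPT header would be roughly the following (the sector count comes from diskinfo; 512-byte sectors assumed, adjust for 4Kn drives):

Code:
# sector size and "mediasize in sectors" for the disk
diskinfo -v /dev/ada0
# dump the last sector, where the backup GPT header ("EFI PART" signature) would normally live
dd if=/dev/ada0 bs=512 skip=$(( <mediasize_in_sectors> - 1 )) count=1 | hexdump -C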

I will look for them manually over the next few days, and will also look for the mentioned GEOM::ELI structures.

If nothing turns up, I will at least try to rebuild the GPT headers from scratch; maybe I can find the old GUIDs of the devices in a DB backup of the FreeNAS config.
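For digging through that config backup (it's a sqlite database), I'm thinking of something along these lines; I don't know the exact table names, so I'll just list them and dump everything, then grep for gptid references, assuming they are stored there at all:

Code:
# list the tables in the config backup (a sqlite file)
sqlite3 /path/to/freenas-config-backup.db ".tables"
# brute-force: dump everything and look for gptid references
sqlite3 /path/to/freenas-config-backup.db .dump | grep -i gptid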

To be honest, I am somewhat familiar with file systems (NTFS, FAT, and so on), but ZFS is new to me. That is why I wanted to check out FreeNAS in the first place. Unfortunately I don't have a backup of the encrypted files... but up front: no backup, no pity, right?!

This isn't my most important data, but I would still be interested in a recovery. Right now I see it more as a sporting challenge...
But any hints or help would be very much appreciated.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I just wanted to join the conversation. I've been hit by the bug too; at least that's what it looks like at the moment.
If you never destroyed a pool, your problem is different.
 

mongoose

Cadet
Joined
Apr 25, 2020
Messages
4
I found this thread when I looked at the Jira ticket concerning the above-mentioned bug/hotfix for U2.1, which seems to be the underlying problem for him and for me.

Or shall I open a new thread?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I found this thread when I looked at the Jira ticket concerning the above-mentioned bug/hotfix for U2.1, which seems to be the underlying problem for him and for me.

Or shall I open a new thread?
If you read the ticket, the trigger is destroying a pool while another pool is locked.
 

mongoose

Cadet
Joined
Apr 25, 2020
Messages
4
Sorry, I forgot that part, indeed.

Before changing the SATA controller I destroyed the pool for the VMs on one of the SSDs.

So it should be the same problem.
 
Joined
Oct 18, 2018
Messages
969
If nothing turns up, I will at least try to rebuild the GPT headers from scratch; maybe I can find the old GUIDs of the devices in a DB backup of the FreeNAS config.
I agree with @SweetAndLow: if you did not export AND destroy any pools while on 11.3-RELEASE/U1/U2, you likely were not affected by the bug.

I will say, though, that what you describe is exactly what I would expect from the bug. I was hit by it and did some research, and found that not only are the partition tables destroyed but also the geli metadata. Even if you recreate the partitions exactly as they were, you would still need the geli metadata to unlock the partition. What I'm not 100% certain of is whether /dev/zero is run across the entire drive or just near the partition boundaries. If it is just near the boundaries and you have the geli metadata, you may be able to get something back.
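One rough way to spot-check that is to hexdump a region well inside where the freebsd-zfs partition used to be (the offset below is an arbitrary example):

Code:
# read 1 MiB starting ~50 GiB into the disk; encrypted data should look random, not zero
# (an all-zero region is not conclusive on its own, since never-written areas can also read as zeroes,
#  but finding non-zero data would prove the wipe did not cover the whole drive)
dd if=/dev/ada0 bs=1m skip=51200 count=1 | hexdump -C | head -n 20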

To be honest, I am somewhat familiar with file systems (NTFS, FAT, and so on), but ZFS is new to me. That is why I wanted to check out FreeNAS in the first place. Unfortunately I don't have a backup of the encrypted files... but up front: no backup, no pity, right?!
Are you saying you don't have the geli metadata or the keys? If data recovery is possible at all you'll need both.

This isn't my most important data, but I would still be interested in a recovery. Right now I see it more as a sporting challenge...
But any hints or help would be very much appreciated.
My hint would be as follows
Code:
# recreate the partition layout FreeNAS would have created on the disk
gpart create -s GPT <device>
gpart add -a 4k -b 128 -s 2G -t freebsd-swap <device>
gpart add -a 4k -t freebsd-zfs <device>
# write the saved geli metadata back onto the new freebsd-zfs partition
geli restore /path/to/geli/metadata/backup.file /dev/gptid/<rawuuid of freebsd-zfs partition of disk>

Or at least something like that. I wrote the above from memory, so if you opt to give it a shot, please double-check the commands against what FreeNAS expects. After that you can try to import the pool and see if it works. You'll need your keys to do that.
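The subsequent unlock and import would then look roughly like this (again from memory; key path, gptid and pool name are placeholders):

Code:
# attach the provider with the saved pool key (-p assumes no passphrase; use -j <passfile> if one was set)
geli attach -p -k /path/to/pool_encryption.key /dev/gptid/<rawuuid>
# then see whether ZFS can find the pool on the decrypted .eli provider
zpool import
zpool import -R /mnt <poolname>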
 

mongoose

Cadet
Joined
Apr 25, 2020
Messages
4
As written above: sorry, I forgot that specific part in my write-up.

Thanks for the quick hints and comments!

Right now I am making forensic copies of all the drives plus the boot/system pool (just for safety; maybe there is some metadata left to use). Two disks at a time, it will probably take me until the middle of the week.
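The imaging itself is nothing fancy; per disk it's roughly something like this (device names and paths are just examples):

Code:
# raw image of the source disk onto a scratch location, skipping over read errors
dd if=/dev/ada1 of=/mnt/scratch/ada1.img bs=1m conv=noerror,sync
# (FreeBSD's recoverdisk(1) would be an alternative that retries bad blocks)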

After that I will continue working with the copies and spin up the drives in a totally new install.

Then I will try to run your gpart commands and do a full search for the geli metadata on a forensic copy.

I will also speak to a colleague who, from what I hear, has done some forensic work with geli in the past.

I'll keep you posted. If you hit any milestones, please let me know.
 
Joined
Oct 18, 2018
Messages
969
To be honest, @mongoose, my bet is that your attempts will be in vain, but I am interested to know if I end up being wrong.

After that I will continue working with the copies and spin up the drives in a totally new install.
The bug's effects will not be reversed by a fresh install. You should upgrade to 11.3-U2.1 though to avoid being hit by it again.

Then I will try to run your gpart commands and do a full search for the geli metadata on a forensic copy.
I'm not sure what you mean by a forensic copy (is that an industry term I'm not familiar with?), unless you mean basically using dd or similar to copy the drive wholesale over to another drive. Unless you already have a copy of the metadata in hand, or a way to read bits after a pass with /dev/zero, my guess is that your geli metadata is not recoverable, and therefore your encrypted partition, even if recreated, is unusable.

I will also speak to a colleague who, from what I hear, has done some forensic work with geli in the past.
I'm certainly no geli expert. I'd love to know if there is some secret path to metadata recovery.

Then I will try to run your gpart commands and do a full search for the geli metadata on a forensic copy.
Definitely worth double-checking me, though. You may be able to find the partitioning commands FreeNAS uses in your logs; they are also discoverable via the source code.

Sorry if I seem a little discouraging. I am very interested to know if you make a breakthrough, though. I had planned to experiment with my own drives but opted not to because I really needed my backup pools online again ASAP.
 