Recover data from a pool that cannot be imported...


jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
I had a raidz1 pool of 6 x 1TB disks on a FreeNAS 8.3.0 box until recently, when I made (I think) a series of mistakes and now cannot import it...
After a disk failure I physically replaced the disk, but instead of replacing it I accidentally added it to the pool, ending up in a 'not recommended' situation: the pool became a stripe of a degraded raidz vdev and a single disk...
The real problem came when the single disk also failed and I could no longer import the pool... The error is "One or more devices are missing from the system."
Since I have not written any new data to the pool after the "replacement", is there any way I could extract the data from the degraded raidz vdev? Can the missing top-level vdev somehow be bypassed, or is there any advanced way to import/recover the pool...?
Just for the record, when I ran zdb it looked like the labels were missing on all of the disks of the pool, but all the other info seems to be there (apart from the failed single disk)...
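(For reference, this is roughly how I was checking the labels; the device path below is a placeholder, not one of my actual gptids.)
Code:
# dump the four vdev labels stored on a pool member
zdb -l /dev/gptid/<member-gptid>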

Thanks in advance.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Post the output of zpool import and zpool status, in code tags. The code tags will save the formatting, which is very important.
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Post the output of zpool import and zpool status, in code tags. The code tags will save the formatting, which is very important.

The output of zpool import:
Code:
[root@neptune] ~# zpool import
  pool: tank2
    id: 2303632421064316926
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
  see: http://www.sun.com/msg/ZFS-8000-6X
config:
 
        tank2                                          UNAVAIL  missing device
          raidz1-0                                      DEGRADED
            gptid/9aa2891a-5734-11e2-8ebb-001e8c089abb  ONLINE
            gptid/9b61dcb6-5734-11e2-8ebb-001e8c089abb  ONLINE
            gptid/00ba9619-d784-11e2-90cf-001e8c089abb  ONLINE
            gptid/9ca58184-5734-11e2-8ebb-001e8c089abb  ONLINE
            16389829440877692601                        UNAVAIL  cannot open
            gptid/9e61fdfb-5734-11e2-8ebb-001e8c089abb  ONLINE
 
        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.


The output of zpool status:
Code:
[root@neptune] ~# zpool status
  pool: tank1
state: ONLINE
  scan: scrub repaired 0 in 0h53m with 0 errors on Fri Sep 20 09:30:43 2013
config:
 
        NAME        STATE    READ WRITE CKSUM
        tank1      ONLINE      0    0    0
          ada1      ONLINE      0    0    0
          ada0      ONLINE      0    0    0
          ada7      ONLINE      0    0    0
 
errors: No known data errors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, it's like this...

tank1 is a stripe of 3 disks with no redundancy; if any of those fails or otherwise has significant problems, you will lose the pool.

tank2 clearly has problems. I'm not sure which disks are missing, since it says there are additional disks but their configuration cannot be determined. But your assessment is correct: without the missing disks you cannot mount the pool, and since you can't mount the pool, all data in it is lost. There is no bypassing the missing vdev, and the second that disk was added to the pool as another vdev there was no going back. Sorry, but I don't see any way to get any data from the pool. Here's where I'd say recover from backup, but something tells me you don't have backups....
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Thanks for the prompt response...
As far as tank1 is concerned, it is intentionally left without "protection" since it only holds volatile data...
Now for tank2: the missing device is the single one I accidentally added, which created a new top-level vdev in my pool... After a thorough search on the net, and leaving out the hardcore solutions with mdb and zdb, I found that in a similar case a user simply forged a vdev label onto a new disk to match the missing device's entry in the MOS...
I think I will give it a try and update with the results...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, it doesn't work like that... but good luck!
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Well just an update...

I managed to import my pool, albeit in a degraded state, and I am in the process of getting the data out of it...
The painful process of importing it involved examining the ZFS source code (along with the excellent ZFS On-Disk Specification document) and hex-editing the raw disk in order to correct the vdev labels...!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.
 

rovan

Dabbler
Joined
Sep 30, 2013
Messages
33
Best of luck with your data recovery, jaggel! Pretty cool if you get your data back with that method.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.
Perhaps someone like you?

I know I would be interested. I take it you are importing the pool read-only. Now that you have the failed vdev "back", I'm curious what rolling back to an earlier txg would do. Assuming you have an old enough valid one, and only after copying off what you can, of course.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have noticed that rollbacks are practically impossible for systems that are mounted. One system with no jails, no sharing services running, etc. would still do 1 transaction every 6-10 seconds. ZFS keeps up to 127 transactions that can be rolled back (if my understanding is correct). So at best you are looking at 20 minutes with the zpool mounted, and any activity on the pool will seriously curtail those 20 minutes. Anyone who mounts a pool thinking they will copy data off and then try to do rollbacks doesn't have a chance: you get to copy your data off or do rollbacks, not both. Most people will naturally go for their data and not try a rollback. ;)
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.


Of course I can provide info on what I've done, but as you said the "solution" is highly customized to my situation and my configuration... Anyhow, I think that someone who understands a little of what ZFS does under the hood could (given some assumptions) probably rescue some, or all, of the data...

First of all, and before trying anything on my actual disks, I simulated my situation on a virtual machine with the exact same disk configuration and FreeNAS version. Having no way to get back to my initial configuration of a 6-drive RAIDZ pool, I tried to go just one step back and regenerate the pool as a degraded RAIDZ vdev (5 of 6 disks) plus a single disk. I first read almost all of the ZFS On-Disk Specification in order to understand how the vdev labels are stored on each disk.
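If someone wants to play with this safely, the broken layout can be reproduced with file-backed vdevs on a test box. The paths, sizes and pool name below are made up; in my case it was a FreeNAS VM with virtual disks rather than files.
Code:
# six backing files for the raidz1 plus one for the "accidental" single-disk vdev
truncate -s 1g /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4 /tmp/d5 /tmp/d6 /tmp/d7
zpool create sim raidz1 /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4 /tmp/d5 /tmp/d6

# degrade the raidz and add the spare file as a new top-level vdev
# (-f is needed because the replication levels do not match)
zpool offline sim /tmp/d6
zpool add -f sim /tmp/d7

# now lose the single-disk vdev and watch the import fail
zpool export sim
rm /tmp/d7
zpool import -d /tmp        # pool shows up as UNAVAIL / missing device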

Then I used zdb to extract the label/uberblock info stored on the disks of the RAIDZ vdev. The only zdb options that work on pools that are not loaded are -l, -u and -e; I used the first two to extract the label and uberblock info. The single disk had no label on it (since it was empty), so initially I tried to create one by using glabel and gpart to create the appropriate GPT label and partitions. Then I created a new pool (zpool create) containing only this single disk. This forged a vdev label onto my empty disk, and I realized it was not far from the one I needed. I destroyed the newly created pool (zpool destroy) and checked (with zdb) that my label survived (with its state marked as 2 - destroyed).
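The commands for that part were roughly the following; "ada5" and the pool name "scratch" are placeholders for the new disk and a throw-away pool name.
Code:
# partition the blank disk (placeholder device ada5)
gpart create -s gpt ada5
gpart add -t freebsd-zfs ada5

# create and immediately destroy a throw-away pool so ZFS writes
# real vdev labels onto the new partition
zpool create -f scratch /dev/ada5p1
zpool destroy scratch

# verify the labels survived the destroy (state: 2 = destroyed)
zdb -l /dev/ada5p1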

Now comes the hard part: editing this destroyed label on the disk so that it matches the missing vdev of my real pool... I saw immediately that I should change the following fields of the label:
Code:
pool name
pool state
pool txg
pool guid
top guid
guid
vdev children
along with the
Code:
id
guid
of the vdev_tree node. Also, from my reading I deduced that I should change the
Code:
txg
guid_sum
on the "latest" uberblock. All the vdev fields can be found from the output of zdb on the RAIDZ disks apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdevs in the pool (guid_sum field), by substracting from the guid_sum the sum of all the guids from the zdb output on RAIDZ disks without the single disk.

Finally, you have to hex-edit the raw disk in order to change all 4 instances of the vdev label. I used a modified version of SystemRescueCD ;) that could also access ZFS partitions and hex-edited the disk there (beware that the uberblock part is little-endian formatted). After this last step I tried (on the SystemRescueCD Linux) to zpool import the pool (using the readonly and -f options) and it worked... The pool is of course marked as degraded (and the single disk is marked as faulted), but the data is there and I have started copying it...
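The import itself was nothing special once the label matched; it was something like the following, where the altroot path is just an example.
Code:
# import read-only and under an alternate root so nothing gets rewritten
zpool import -f -o readonly=on -R /mnt/recovery tank2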

I repeat that this is not a panacea, but in my situation, and keeping in mind that no data was written to the pool after the "bad replacement", I got my data back... I believe that in most other situations involving a missing device in a stripe set it would probably be impossible to get ALL your data back!

P.S. I tried to roll back using zpool import -T, but as you said it was practically impossible to find the correct txg number to go to... So I decided to get my data out instead and then recreate the pool correctly.
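For completeness, the rollback attempt was along these lines. The gptid and txg are placeholders, and -T was an undocumented option at the time, so consider the exact usage an assumption.
Code:
# list the txgs of the uberblocks still present in the labels
zdb -lu /dev/gptid/<surviving-disk> | grep txg

# then try importing the pool rewound to one of them, read-only
zpool import -f -o readonly=on -T <txg> tank2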
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
The single disk had no label on it (since it was empty), so initially I tried to create one by using glabel and gpart to create the appropriate GPT label and partitions.
You only need gpart here.

All the vdev fields can be found in the zdb output from the RAIDZ disks, apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdev guids in the pool (the guid_sum field), by subtracting from the guid_sum the sum of all the guids that do appear in the zdb output from the RAIDZ disks (i.e. everything except the missing single disk).
Interesting. Not having needed to recreate a label, I have never tried figuring this out. I think I will look at my 2 x mirrored pool later.

I used a modified version of SystemRescueCD ;) that could also access ZFS partitions and hex-edited the disk there (beware that the uberblock part is little-endian formatted).
Assuming a ZFS v28 or lower pool, mfsBSD would also work for this.

P.S. I tried to roll back using zpool import -T, but as you said it was practically impossible to find the correct txg number to go to... So I decided to get my data out instead and then recreate the pool correctly.
You would want to copy the data out first anyway. I was curious about a zpool import -T now that you have "recreated" the missing vdev, or did you try that already?



I have noticed that rollbacks are practically impossible for systems that are mounted. One system with no jails, no sharing services running, etc. would still do 1 transaction every 6-10 seconds.
Recovery should be done in single user mode and ideally with the pool imported read-only. However, you're right in that I shouldn't assume people are doing this. Also, when my pool is idle then it's idle.

ZFS keeps up to 127 transactions that can be rolled back (if my understanding is correct).
It's zero-based, so actually 128 uberblocks at 1K apiece. Of course there are far fewer with an ashift=12 pool.
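The arithmetic, in case anyone wants to check it: each label has a 128 KiB uberblock ring, and each slot is padded to at least 1 << ashift.
Code:
echo $(( 131072 / 1024 ))   # ashift <= 10: 128 uberblock slots per label
echo $(( 131072 / 4096 ))   # ashift = 12:   32 uberblock slots per label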
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
You would want to copy the data out first anyway. I was curious about a zpool import -T now that you have "recreated" the missing vdev, or did you try that already?

No, I haven't tried that... As a matter of fact I didn't even think of trying it, but just out of curiosity I will give it a try and get back with the results... :)
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
All the vdev fields can be found in the zdb output from the RAIDZ disks, apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdev guids in the pool (the guid_sum field), by subtracting from the guid_sum the sum of all the guids that do appear in the zdb output from the RAIDZ disks (i.e. everything except the missing single disk).
Interesting. Not having needed to recreate a label, I have never tried figuring this out. I think I will look at my 2 x mirrored pool later.
I took a glance at this earlier and my sum did not match. I guess I added wrong or was looking at the wrong thing.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Recovery should be done in single user mode and ideally with the pool imported read-only. However, you're right in that I shouldn't assume people are doing this. Also, when my pool is idle then it's idle.

On the system I was looking at, it was idle too. The writes/reads were in 4k increments, which pretty much implied that it was making no actual data reads or writes, since the transactions can't be less than 4k in size. I was a little confused by this, but I didn't really look into it further. Not to mention that many people leave atime enabled, which updates the "last accessed" time and counts as transactions.

Not to mention that even when it's mounted as readonly, if ZFS finds an error it can correct with parity data, it will actually correct it. The only thing that seems to actually be read-only is anything the user would do; at least some internal components of ZFS seem to still write to the pool for corrections. Of course, if things are already borked you might not want it to correct anything. In one situation readonly didn't save me from someone who had ada0p2 as part of a mirrored vdev but then added ada0 as a stripe via the CLI. We learned the hard way that readonly will still try to "fix" things ZFS finds broken, but in our case fixing anything made things much worse, because the same physical sectors were used by 2 different vdevs in the same zpool.

Overall, I think that unless you realize within seconds that something you did was bad and immediately unmount your pool, a transaction rollback is unlikely to be possible. I have tried to do rollbacks before and never had success. That doesn't prove rollbacks are pretty much impossible on FreeNAS, only that I couldn't get them to work.
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
I took a glance at this earlier and my sum did not match. I guess I added wrong or was looking at the wrong thing.

Yep... That was exactly what I got the first time... No match... But after a while I figured it out... The guid sum covers all leaf devices (files + disks), as the documentation implies, but it also includes the pool guid and every interior vdev guid (mirror, raidz, etc.)...

And finally, keep in mind that the sum is stored as a uint64 and manipulated in C, so it wraps around: anything above 64 bits is cut off (i.e. the sum is modulo 2^64)...
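As a sketch with made-up guids (bash arithmetic is 64-bit and wraps on overflow, which is exactly the uint64 behaviour we want; printf %u then shows the wrapped result as an unsigned number):
Code:
# made-up values: pool guid + raidz vdev guid + the leaf guids still readable
known_sum=0
for g in 1111111111111111111 2222222222222222222 3333333333333333333 4444444444444444444; do
    known_sum=$(( known_sum + g ))   # wraps at 64 bits, like the on-disk uint64
done

# made-up guid_sum as reported by the newest uberblock
guid_sum=5123456789012345678

# the remainder (mod 2^64) is the guid of the missing single disk
printf 'missing guid: %u\n' $(( guid_sum - known_sum ))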
 