Recover data from a pool that cannot be imported...


jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
I had a raidz1 pool of 6 x 1TB disks on a FreeNAS 8.3.0 box until recently, when I made (I think) a series of mistakes and now cannot import it...
After a disk failure I physically replaced the disk, but instead of replacing it I accidentally added it to the pool, ending up in a 'not recommended' situation: the pool became a stripe of a degraded raidz vdev and a single disk...
The real problem came when the single disk also failed and I could no longer import the pool... The error is "One or more devices are missing from the system."
Since I have not written any new data to the pool after the "replacement", is there any way I could extract the data from the degraded raidz vdev? Can the missing top-level vdev somehow be bypassed, or is there any advanced way to import/recover the pool...?
Just for the record, when I ran zdb it looked like the labels were missing on all of the disks of the pool, but all the other info seems to be there (apart from the failed single disk)...
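(For reference, this is roughly how I was checking the labels; the device path below is a placeholder, not one of my actual gptids.)
Code:
# dump the four vdev labels stored on a pool member
zdb -l /dev/gptid/<member-gptid>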

Thanks in advance.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Post the output of zpool import and zpool status, in code tags. The code tags will save the formatting, which is very important.
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Post the output of zpool import and zpool status, in code tags. The code tags will save the formatting, which is very important.

The output of zpool import:
Code:
[root@neptune] ~# zpool import
  pool: tank2
    id: 2303632421064316926
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
  see: http://www.sun.com/msg/ZFS-8000-6X
config:
 
        tank2                                          UNAVAIL  missing device
          raidz1-0                                      DEGRADED
            gptid/9aa2891a-5734-11e2-8ebb-001e8c089abb  ONLINE
            gptid/9b61dcb6-5734-11e2-8ebb-001e8c089abb  ONLINE
            gptid/00ba9619-d784-11e2-90cf-001e8c089abb  ONLINE
            gptid/9ca58184-5734-11e2-8ebb-001e8c089abb  ONLINE
            16389829440877692601                        UNAVAIL  cannot open
            gptid/9e61fdfb-5734-11e2-8ebb-001e8c089abb  ONLINE
 
        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.


The output of zpool status:
Code:
[root@neptune] ~# zpool status
  pool: tank1
state: ONLINE
  scan: scrub repaired 0 in 0h53m with 0 errors on Fri Sep 20 09:30:43 2013
config:
 
        NAME        STATE    READ WRITE CKSUM
        tank1      ONLINE      0    0    0
          ada1      ONLINE      0    0    0
          ada0      ONLINE      0    0    0
          ada7      ONLINE      0    0    0
 
errors: No known data errors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, it's like this...

tank1 is a stripe of 3 disks with no redundancy; if any of those fails or otherwise has significant problems, you will lose the pool.

tank2 clearly has problems. I'm not sure which disks are missing, since it says there are additional disks but their configuration cannot be determined. But your assessment is correct: without the missing disks you cannot mount the pool, and since you can't mount the pool, all data in it is lost. There is no bypassing the missing vdev, and the second that disk was added to the pool as another vdev there was no going back. Sorry, but I don't see any way to get any data from the pool. Here's where I'd say recover from backup, but something tells me you don't have backups....
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Thanks for the prompt response...
As far as tank1 is concerned, it is intentionally left without "protection" since it only holds volatile data...
Now for tank2: the missing device is the single one I accidentally added, which created a new top-level vdev in my pool... After a thorough search on the net, and leaving out the hardcore solutions with mdb and zdb, I found that in a similar case a user simply forged a vdev label onto a new disk to match the missing device's entry in the MOS...
I think I will give it a try and update with the results...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, it doesn't work like that... but good luck!
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Well just an update...

I managed to import my pool, albeit in a degraded state, and I am in the process of getting the data out of it...
The painful process of importing it involved examining the ZFS source code (along with the excellent ZFS On-Disk Specification document) and hex-editing the raw disk in order to correct the vdev labels...!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.
 

rovan

Dabbler
Joined
Sep 30, 2013
Messages
33
Best of luck with your data recovery, jaggel! Pretty cool if you get your data back with that method.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.
Perhaps someone like you?

I know I would be interested. I take it you are importing the pool read-only. Now that you have the failed vdev "back", I'm curious what rolling back to an earlier txg would do. Assuming you have an old enough valid one, and only after copying off what you can, of course.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have noticed that rollbacks are practically impossible for systems that are mounted. One system with no jails, no sharing services running, etc. would still do 1 transaction every 6-10 seconds. ZFS keeps up to 127 transactions that can be rolled back (if my understanding is correct). So at best you are looking at 20 minutes with the zpool mounted, and any activity on the pool will seriously curtail those 20 minutes. Anyone who mounts a pool thinking they will copy data off and then try to do rollbacks doesn't have a chance: you get to copy your data off or do rollbacks, not both. Most people will naturally go for their data and not try a rollback. ;)
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
Any chance you can provide some steps on what you did? Naturally the commands you ran won't work for others, but someone might be interested in the process.


Of course I can provide info on what I've done, but as you said the "solution" is highly customized to my situation and my configuration... Anyhow, I think that someone who understands a little of what ZFS does under the hood could (given some assumptions) probably rescue some, or all, of the data...

First of all, and before trying anything on my actual disks, I simulated my situation on a virtual machine with the exact same disk configuration and FreeNAS version. Having no way to get back to my initial configuration of a 6-drive RAIDZ pool, I tried to go just one step back and regenerate the pool as a degraded RAIDZ vdev (5 of 6 disks) plus a single disk. I first read almost all of the ZFS On-Disk Specification in order to understand how the vdev labels are stored on each disk.
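If someone wants to play with this safely, the broken layout can be reproduced with file-backed vdevs on a test box. The paths, sizes and pool name below are made up; in my case it was a FreeNAS VM with virtual disks rather than files.
Code:
# six backing files for the raidz1 plus one for the "accidental" single-disk vdev
truncate -s 1g /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4 /tmp/d5 /tmp/d6 /tmp/d7
zpool create sim raidz1 /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4 /tmp/d5 /tmp/d6

# degrade the raidz and add the spare file as a new top-level vdev
# (-f is needed because the replication levels do not match)
zpool offline sim /tmp/d6
zpool add -f sim /tmp/d7

# now lose the single-disk vdev and watch the import fail
zpool export sim
rm /tmp/d7
zpool import -d /tmp        # pool shows up as UNAVAIL / missing device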

Then I used zdb to extract the label/uberblock info stored on the disks of the RAIDZ vdev. The only zdb options that work on pools that are not loaded are -l, -u and -e; I used the first two to extract the label and uberblock info. The single disk had no label on it (since it was empty), so initially I tried to create one by using glabel and gpart to create the appropriate GPT label and partitions. Then I created a new pool (zpool create) containing only this single disk. This forged a vdev label onto my empty disk, and I realized it was not far from the one I needed. I destroyed the newly created pool (zpool destroy) and checked (with zdb) that my label survived (with its state marked as 2 - destroyed).
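The commands for that part were roughly the following; "ada5" and the pool name "scratch" are placeholders for the new disk and a throw-away pool name.
Code:
# partition the blank disk (placeholder device ada5)
gpart create -s gpt ada5
gpart add -t freebsd-zfs ada5

# create and immediately destroy a throw-away pool so ZFS writes
# real vdev labels onto the new partition
zpool create -f scratch /dev/ada5p1
zpool destroy scratch

# verify the labels survived the destroy (state: 2 = destroyed)
zdb -l /dev/ada5p1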

Now comes the hard part: editing this destroyed label on the disk so that it matches the missing vdev of my real pool... I saw immediately that I should change the following fields of the label:
Code:
pool name
pool state
pool txg
pool guid
top guid
guid
vdev children
along with the
Code:
id
guid
of the vdev_tree node. Also, from my reading I deduced that I should change the
Code:
txg
guid_sum
on the "latest" uberblock. All the vdev fields can be found from the output of zdb on the RAIDZ disks apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdevs in the pool (guid_sum field), by substracting from the guid_sum the sum of all the guids from the zdb output on RAIDZ disks without the single disk.

Finally, you have to hex-edit the raw disk in order to change all 4 instances of the vdev label. I used a modified version of SystemRescueCD ;) that could also access ZFS partitions and hex-edited the disk there (beware that the uberblock part is little-endian formatted). After this last step I tried (on the SystemRescueCD Linux) to zpool import the pool (using the readonly and -f options) and it worked... The pool is of course marked as degraded (and the single disk is marked as faulted), but the data is there and I have started copying it...
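The import itself was nothing special once the label matched; it was something like the following, where the altroot path is just an example.
Code:
# import read-only and under an alternate root so nothing gets rewritten
zpool import -f -o readonly=on -R /mnt/recovery tank2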

I repeat that this is not a panacea, but in my situation, and keeping in mind that no data was written to the pool after the "bad replacement", I got my data back... I believe that in most other situations involving a missing device in a stripe set it would probably be impossible to get ALL your data back!

P.S. I tried to roll back using zpool import -T, but as you said it was practically impossible to find the correct txg number to go to... So I decided to get my data out instead and then recreate the pool correctly.
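For completeness, the rollback attempt was along these lines. The gptid and txg are placeholders, and -T was an undocumented option at the time, so consider the exact usage an assumption.
Code:
# list the txgs of the uberblocks still present in the labels
zdb -lu /dev/gptid/<surviving-disk> | grep txg

# then try importing the pool rewound to one of them, read-only
zpool import -f -o readonly=on -T <txg> tank2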
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
The single disk had no label on it (since it was empty), so initially I tried to create one by using glabel and gpart to create the appropriate GPT label and partitions.
You only need gpart here.

All the vdev fields can be found in the zdb output from the RAIDZ disks, apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdev guids in the pool (the guid_sum field), by subtracting from the guid_sum the sum of all the guids that do appear in the zdb output from the RAIDZ disks (i.e. everything except the missing single disk).
Interesting. Not having needed to recreate a label, I have never tried figuring this out. I think I will look at my 2 x mirrored pool later.

I used a modified version of SystemRescueCD ;) that could also access ZFS partitions and hex-edited the disk there (beware that the uberblock part is little-endian formatted).
Assuming a ZFS v28 or lower pool, mfsBSD would also work for this.

P.S. I tried to roll back using zpool import -T, but as you said it was practically impossible to find the correct txg number to go to... So I decided to get my data out instead and then recreate the pool correctly.
You would want to copy the data out first anyway. I was curious about a zpool import -T now that you have "recreated" the missing vdev, or did you try that already?



I have noticed that rollbacks are practically impossible for systems that are mounted. One system with no jails, no sharing services running, etc. would still do 1 transaction every 6-10 seconds.
Recovery should be done in single user mode and ideally with the pool imported read-only. However, you're right in that I shouldn't assume people are doing this. Also, when my pool is idle then it's idle.

ZFS keeps up to 127 transactions that can be rolled back (if my understanding is correct).
It's zero-based, so actually 128 uberblocks at 1K apiece. Of course there are far fewer with an ashift=12 pool.
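The arithmetic, in case anyone wants to check it: each label has a 128 KiB uberblock ring, and each slot is padded to at least 1 << ashift.
Code:
echo $(( 131072 / 1024 ))   # ashift <= 10: 128 uberblock slots per label
echo $(( 131072 / 4096 ))   # ashift = 12:   32 uberblock slots per label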
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
You would want to copy the data out first anyway. I was curious about a zpool import -T now that you have "recreated" the missing vdev, or did you try that already?

No, I haven't tried that... As a matter of fact I didn't even think of trying it, but just out of curiosity I will give it a try and get back with the results... :)
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
All the vdev fields can be found in the zdb output from the RAIDZ disks, apart from the guid of the single disk. The latter was found, based on the assumption that the uberblocks hold a running sum of all the vdev guids in the pool (the guid_sum field), by subtracting from the guid_sum the sum of all the guids that do appear in the zdb output from the RAIDZ disks (i.e. everything except the missing single disk).
Interesting. Not having needed to recreate a label, I have never tried figuring this out. I think I will look at my 2 x mirrored pool later.
I took a glance at this earlier and my sum did not match. I guess I added wrong or was looking at the wrong thing.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Recovery should be done in single user mode and ideally with the pool imported read-only. However, you're right in that I shouldn't assume people are doing this. Also, when my pool is idle then it's idle.

On the system I was looking at, it was idle too. The writes/reads were in 4k increments, which pretty much implied that it was making no actual data reads or writes, since the transactions can't be less than 4k in size. I was a little confused by this, but I didn't really look into it further. Not to mention that many people leave atime enabled, which updates the "last accessed" time and counts as transactions.

Not to mention that even when it's mounted as readonly, if ZFS finds an error it can correct with parity data, it will actually correct it. The only thing that seems to actually be read-only is anything the user would do; at least some internal components of ZFS seem to still write to the pool for corrections. Of course, if things are already borked you might not want it to correct anything. In one situation readonly didn't save me from someone who had ada0p2 as part of a mirrored vdev but then added ada0 as a stripe via the CLI. We learned the hard way that readonly will still try to "fix" things ZFS finds broken, but in our case fixing anything made things much worse, because the same physical sectors were used by 2 different vdevs in the same zpool.

Overall, I think that unless you realize within seconds that something you did was bad and immediately unmount your pool, a transaction rollback is unlikely to be possible. I have tried to do rollbacks before and never had success. That doesn't prove rollbacks are pretty much impossible on FreeNAS, only that I couldn't get them to work.
 

jaggel

Cadet
Joined
Aug 13, 2012
Messages
7
I took a glance at this earlier and my sum did not match. I guess I added wrong or was looking at the wrong thing.

Yep... That was exactly what I got the first time... No match... But after a while I figured it out... The guid sum covers all leaf devices (files + disks), as the documentation implies, but it also includes the pool guid and every interior vdev guid (mirror, raidz, etc.)...

And finally, keep in mind that the sum is stored as a uint64 and manipulated in C, so it wraps around: anything above 64 bits is cut off (i.e. the sum is modulo 2^64)...
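As a sketch with made-up guids (bash arithmetic is 64-bit and wraps on overflow, which is exactly the uint64 behaviour we want; printf %u then shows the wrapped result as an unsigned number):
Code:
# made-up values: pool guid + raidz vdev guid + the leaf guids still readable
known_sum=0
for g in 1111111111111111111 2222222222222222222 3333333333333333333 4444444444444444444; do
    known_sum=$(( known_sum + g ))   # wraps at 64 bits, like the on-disk uint64
done

# made-up guid_sum as reported by the newest uberblock
guid_sum=5123456789012345678

# the remainder (mod 2^64) is the guid of the missing single disk
printf 'missing guid: %u\n' $(( guid_sum - known_sum ))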
 