Restoring vdev without resilvering

devinkb

Cadet
Joined
Jul 25, 2023
Messages
3
Hello,

Quick backstory:
I've got eight 10 TB drives in a pool split into two vdevs using RaidZ1 (I was trying to hit a 10 Gbit transfer speed, and it worked, by the way!). One of the drives had a couple of minor errors months ago, and I thought, what the heck, I have a bunch of extra drives, I'll replace it. Let's call the disk with the errors "disk 3 of vdev1". I replaced it with "disk 3 replacement of vdev1".

Of course, while I was resilvering, another disk failed. It is completely unrecognizable and makes a terrible noise on startup. Let's call this one "disk 4 of vdev1". The resilver hit a bajillion errors but completed, and I can see my file system, but of course most of the data is corrupt.

Current situation:

Disk 3 of vdev1 hadn't fully failed. It just had a SMART error over a month ago, but all of my data was still working fine. So I'm thinking that disk 1, disk 2, and disk 3 of vdev1 should still hold a valid copy of all my data. The problem, however, is that disk 3 is no longer part of the vdev: I replaced it with "disk 3 replacement of vdev1", and I can't just swap disk 3 of vdev1 back in via the GUI without resilvering.

My question:
I'm wondering if there's any way I can force my vdev to be disks 1, 2, 3 without resilvering.

Thanks!
Devin

P.S. - I've learned my lesson, I'll never do RaidZ1 again!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It may be technically possible by rolling your pool back to a transaction group (txg) from before the disk in question was removed, but you will lose data for sure and it may not work anyway.

At this point, your pool is toast anyway, so maybe worth a shot.
 

sretalla

It's likely you'll need to read up on the -F, -T and -X options of zpool import... you may want to run with -F -nTX first to see what might happen before removing the n to actually try it.
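For anyone following along, here is a minimal sketch of what that dry run might look like. It assumes a hypothetical pool named "tank" and only prints the commands so they can be reviewed first. It leaves out -T, since that option needs a specific txg number that isn't known at this point in the thread.

```shell
#!/bin/sh
# Sketch only: prints the commands for review instead of running them.
# "tank" is a hypothetical pool name; substitute your own.

# Dry run: -F requests recovery mode, -n reports whether the rewind would
# succeed without modifying the pool, -X permits an "extreme" rewind.
rewind_dry_run_cmd() {
    echo "zpool import -F -n -X $1"
}

# The real attempt: the same flags with -n removed.
rewind_cmd() {
    echo "zpool import -F -X $1"
}

rewind_dry_run_cmd tank
rewind_cmd tank
```

Only the second command changes anything on disk, so it is worth checking the dry run's output before going further.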

 

devinkb

Thanks for the advice and pointing me to the specific thing I need to read up on! I will give that a shot.
 

devinkb

I spent about 6 hours on this and eventually gave up. It just didn't want to roll back to the specific txg: it kept giving an error about a missing drive, and I couldn't force it to bring the pool back. I'm pretty confident I could have recovered some data with a data recovery tool like recoverme, because I could still access some of my data (files small enough to fit in one block on a single drive). In the end I stopped, though, because I had other backups of my important data on another pool and another file server.

My steps were essentially:
  1. Look up the txg history with zpool history -i poolname.
  2. Offline the pool (I did this in the GUI).
  3. Run zpool import to check that the pool is recognized.
  4. Try to import the pool at a past txg state with zpool import -T txgnumber poolname. I tried many other flag combinations (-f, -F, -m, -X, etc.) and none of them did the trick.
Along the way I also looked up the details of all the hard drives with "gpart list", which might help somebody in the future who needs to map gptids back to physical drives.
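The steps above can be sketched roughly as follows. This is not a record of the exact commands that were run. The pool name and txg number are hypothetical placeholders. Using "zpool export" as the command-line equivalent of the GUI step, adding "-o readonly=on", and listing "glabel status" are all assumptions beyond what was described. The script only prints the commands rather than executing them.

```shell
#!/bin/sh
# Sketch only: prints the commands from the steps above instead of running them.
# POOL and TXG are hypothetical placeholders.
POOL="${POOL:-tank}"
TXG="${TXG:-1234567}"

recovery_steps() {
    pool="$1"
    txg="$2"
    # 1. Internal event history, which includes txg numbers.
    echo "zpool history -i $pool"
    # 2. Take the pool out of service (assumed CLI equivalent of the GUI step).
    echo "zpool export $pool"
    # 3. List pools that are available for import.
    echo "zpool import"
    # 4. Attempt an import rewound to the chosen txg; read-only is an added
    #    precaution (assumption) so nothing is written during the attempt.
    echo "zpool import -o readonly=on -T $txg $pool"
    # Extra: FreeBSD tools for mapping gptid labels back to physical disks.
    echo "gpart list"
    echo "glabel status"
}

recovery_steps "$POOL" "$TXG"
```

Printing first makes it easy to double-check the txg and pool name before running anything, since a rewind import can be destructive.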

What may have helped had I continued:
While zpool import worked fine on the current degraded pool, it didn't seem to want to import a degraded pool from the past with a disk missing from that past state. It may have just been too much. So my idea, had I continued, would have been to make it think the entire pool was online by plugging another blank drive into the same physical location (which seems to trick the offline pool into thinking it is fine), and then run zpool import -T txgnumber poolname again with the old disks also plugged into the system. Might not have worked, but just maybe...

Conclusion
Anyway - as I said in the intro, the data is backed up in a few other locations, so I'm just going to rebuild everything... with RaidZ2 this time ;).

Thanks for the guidance sretalla - I learned a lot!
 