Removing log partition "wipes" whole disk - special vdev partition lost forever (?)

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
Hello community!

I just ran into a pretty serious issue on my TrueNAS Scale box. I had a pool similar to the one in the first screenshot (from another box that is still running TrueNAS Core), with the NVMe drives partitioned to use ~95% of their space as a special vdev and the remainder as slog. Because the UPS attached to these boxes had to be serviced, I figured I had to remove the slog devices, as the SSDs themselves have no power loss protection. So far so good - this went fine on the TrueNAS Core box, where I simply removed them from within the status view. The Scale box, however, seems to have "wiped" the whole SSD, eliminating the other partition, which was used as the special vdev (as can be seen in the second screenshot). As the third screenshot shows, fdisk still reports a zfs_member signature on at least one SSD.

Is there any way to recover from here, or is my pool likely gone forever? I have a backup, but it's over a month old. I can fall back to it if there is no other way, but I'd like to at least attempt a recovery before rebuilding/overwriting that pool. I already made a full disk image using dd, so I can work on that before breaking anything further. Any help is appreciated.
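
For context, the split layout amounts to something like the following, expressed as zpool commands (pool and partition names here are just placeholders, not the exact ones from my system):

zpool add tank special mirror /dev/nvme0n1p1 /dev/nvme1n1p1   # ~95% partitions as mirrored special vdev
zpool add tank log mirror /dev/nvme0n1p2 /dev/nvme1n1p2       # remaining space as mirrored slog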

(If you think this removal behavior of Scale is erroneous too, I'd happily open a bug report. I just want to hear at least one other opinion on this first, especially as it was a non-standard pool setup. Also, this happened right before my holiday, so I'd like to fix my pool first and report the issue afterwards, when I have more time.)

Best regards

[Attachments: 1653599758236.png, 1653600164813.png, 1653600252165.png]
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Partitioning disks is not supported. If you do something like this, I recommend you use the command line for all pool manipulation, too. The UI and middleware simply assume that one device has one role.

I am pretty sure your bug report would be closed on that argument.
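
For a layout like that, removing the slog from the shell would look roughly like this (pool and vdev names are placeholders - check zpool status for the real ones):

zpool status tank            # note the exact name of the log mirror, e.g. mirror-2
zpool remove tank mirror-2   # detach the log vdev only; the special vdev partitions stay untouched
zpool status tank            # verify that only the log vdev is gone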
 

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
Partitioning disks is not supported. If you do something like this, I recommend you use the command line for all pool manipulation, too. The UI and middleware simply assume that one device has one role.

I am pretty sure your bug report would be closed on that argument.
While I fully agree that the UI is not meant to work with such pools, it is still able to show the actual pool layout, even when partitions are used. Even if the behavior I encountered is to be expected (which I doubt, as it worked fine on Core), I still wonder why the Core instance had no issues, since the middleware should behave identically, at least regarding zpool actions. At the very least, the UI could include some safeguards against using its features on partitions if the outcome is unpredictable.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
So - you wiped your special vdev?
A special vdev is pool-critical - so you lost the pool (you just wiped the metadata and any "small files"). Do you have backups?

Unfortunately you are firmly in the "here be dragons" realm when manually partitioning drives like you have (incidentally, I do something vaguely similar myself with a pair of Optanes that I use as SLOG for three different pools).
 

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
So - you wiped your special vdev?
A special vdev is pool-critical - so you lost the pool (you just wiped the metadata and any "small files"). Do you have backups?

Unfortunately you are firmly in the "here be dragons" realm when manually partitioning drives like you have (incidentally, I do something vaguely similar myself with a pair of Optanes that I use as SLOG for three different pools).
As I wrote in the initial post, the remove operation triggered from within the GUI somehow "wiped" the disk. I'm not sure, though, how far this "wipe" actually went, as I can still see one partition (which should be the special vdev one) on the disk nvme0n1. fdisk even claims to see a zfs_member signature when accessing it, so the data should still be there - but apparently the partition table has gone missing. If it's really just that, I should get out of this pretty easily, but I'm not completely sure and don't know how to proceed.

I do have backups, just not very recent ones. As I can't get to that machine physically until next week anyway, and I'd like to recover the data added after the last backup, I was hoping for some advice on how to proceed in the meantime.
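
For reference, this is roughly what I plan to check next (device paths are from my box and assume the partition node still shows up; adjust as needed):

lsblk -f /dev/nvme0n1     # does the kernel still list a partition with FSTYPE zfs_member?
blkid /dev/nvme0n1*       # where exactly is the zfs_member signature being reported?
zdb -l /dev/nvme0n1p1     # if the partition node exists, dump the ZFS vdev labels on it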
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
You'll probably find it just zeroed out the first 1K of the disk, so you could put back a cloned copy of the mirror partner's partition table with gpart backup/restore (somebody posted about that recently):


Although that's not for SCALE, so you would have to work it out with fdisk or some equivalent command.
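
On SCALE, a rough equivalent would be sgdisk - something along these lines, assuming the other mirror member still has an intact GPT (device names are placeholders; the source must be the intact disk and the target the wiped one):

sgdisk --backup=gpt-backup.bin /dev/nvme1n1        # save the GPT of the intact mirror member
sgdisk --load-backup=gpt-backup.bin /dev/nvme0n1   # write that table onto the wiped disk
sgdisk -G /dev/nvme0n1                             # randomize GUIDs so the two disks don't clash
partprobe /dev/nvme0n1                             # make the kernel re-read the partition table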
 

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
You'll probably find it just zeroed out the first 1K of the disk, so you could put back a cloned copy of the mirror partner's partition table with gpart backup/restore (somebody posted about that recently):


Although that's not for SCALE, so you would have to work it out with fdisk or some equivalent command.
This sounds like a recovery should be possible. I'd do it exactly this way if my mirror weren't completely broken (I removed the log partitions from the pool at "the same time"; I just waited five minutes between the two commands to check whether anything ill would happen). Unluckily, the mess only manifested after a reboot, when it was already too late to revert anything, and now I've got to work with a dd image of one of those mirrored SSDs.
In case just the first 1K of the disk got lost, I might be able to restore it, as the partitions, and thus the data on them, should still be there. I just hope native ZFS pool encryption does not get in the way...
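
In case it helps anyone following along, this is roughly how I'm poking at the dd image (the image path is a placeholder; loop partitions will only appear once a valid partition table is back on the image):

losetup --find --show -P /mnt/scratch/nvme0n1.img   # attach the image; -P asks the kernel to scan for partitions
lsblk -f /dev/loop0                                 # check whether any partitions show up on the loop device
zdb -l /dev/loop0p1                                 # if a partition appears, look for intact ZFS vdev labels
losetup -d /dev/loop0                               # detach when done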
 

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
I tried multiple testdisk runs on the SSDs and on the image of them, but it didn't find anything. Is testdisk even the appropriate tool to look for ZFS partitions?
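
One crude fallback I'm considering, on the assumption that the vdev labels still contain the pool name: grep the raw image for the pool name to get a rough idea of where the partition used to start (pool name and image path below are placeholders):

grep -a -b -o 'mypool' /mnt/scratch/nvme0n1.img | head
# the first two of the four vdev labels sit at the start of the partition, the second
# roughly 256 KiB after the first, so a pair of early hits about 256 KiB apart
# points at the old partition boundary (give or take the label header)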
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Is testdisk even the appropriate tool to look for ZFS partitions?
IIRC testdisk does mention support for ZFS partitions in recent versions.

More reliable would be Klennet (although it's paid software if you want to actually recover anything, it isn't cheap, and it requires a Windows setup to run).
 

Migsi

Dabbler
Joined
Mar 3, 2021
Messages
40
IIRC testdisk does mention support for ZFS partitions in recent versions.

More reliable would be Klennet (although it's paid software if you want to actually recover anything, it isn't cheap, and it requires a Windows setup to run).
After a few tries, I decided to tear down the rest of the pool and start over from my last backup. Apparently nothing irreplaceable was lost, so everything should be more or less okay. Still annoying, though...

I don't understand why this kind of professional recovery software is available for Windows only, especially when it deals with ZFS.
 