HELP ZFS Pool data recovery

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Could you point me to this manual? I am not sure which one to use. I greatly appreciate your help.
 

Dawson


Attachments

  • Screenshot 2023-06-17 131748.png (8.3 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Could you point me to this manual? I am not sure which one to use. I greatly appreciate your help.

Top of the page, Documentation, TrueNAS, TrueNAS CORE, Configuration Tutorials, Storage, Disk, Disk Replacement
 

jgreco

but I can't offline the disk when it isn't there. I also can't get to the replace disk screen (see screenshot)

Mmm, yes, because you can't replace something in a pool that isn't there. Fascinating. You have a bootstrapping problem because you can't import the pool.

Are you able to add a virtual disk to the TrueNAS VM? It should be possible to relabel a disk to make that gptid appear, but I don't have the time to go set up a testbed to figure out the correct way to do that this afternoon.
 

Dawson

Mmm, yes, because you can't replace something in a pool that isn't there. Fascinating. You have a bootstrapping problem because you can't import the pool.

Are you able to add a virtual disk to the TrueNAS VM? It should be possible to relabel a disk to make that gptid appear, but I don't have the time to go set up a testbed to figure out the correct way to do that this afternoon.
Yes, I can definitely add a virtual disk. Does it have to be 4 TB? Also, what is the command to rename the disk? Or is that something I have to do in Proxmox?
 

jgreco

Yes, I can definitely add a virtual disk. Does it have to be 4 TB? Also, what is the command to rename the disk? Or is that something I have to do in Proxmox?

I don't think it needs to be 4TB. Just a few GB probably. From the FreeBSD shell, you have to look for the new disk that got added, which may just be comparing output from "camcontrol devlist" or seeing what shows up in "dmesg" if you can do a hot add. It looks to me like it might be something in the upper ada's like ada4 or ada5.

Pretending ada5 were to show up, partitioning works along the lines of

gpart create -s gpt ada5
gpart add -t freebsd-zfs -a 1m -s 500m -l ourrescue ada5

This gets you a partitioned ada5 that should be along the lines of TrueNAS compatible. If you then look at it under "gpart list" you should see it with some made up rawuuid, and this should show up in "glabel list" output too.

Now I think what needs to happen is

glabel label gptid/11b39573-ad95-11ed-8d1c-7df9cea98351 ada5p2

or it may be ada5p1 because we didn't create two partitions... take a look in glabel list output to see what it ended up as.

At that point, I think you should be able to try to import the pool but it may freak out at the smaller disk. But I *think* it may let you replace the disk. Otherwise we have to try again with a larger disk, or someone has to come up with a more clever solution.
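
For spotting the new disk, one option is to save the "camcontrol devlist" output before and after adding the virtual disk and compare the two. A minimal sketch, assuming the listings were saved to files; the function names disk_names and new_disks are made up for illustration and are not part of FreeBSD or TrueNAS:

```shell
# Extract disk names from saved `camcontrol devlist` output.
# Each line ends with the device names in parentheses, e.g. "(ada5,pass5)"
# or "(pass5,ada5)"; the passN passthrough entries are filtered out.
disk_names() {
    sed -n 's/.*(\([^)]*\)).*/\1/p' "$1" | tr ',' '\n' | grep -v '^pass' | sort -u
}

# Print the disks that appear in the "after" listing but not in "before".
new_disks() {
    tmp_before=$(mktemp) || return 1
    disk_names "$1" > "$tmp_before"
    disk_names "$2" | grep -vxF -f "$tmp_before"
    rm -f "$tmp_before"
}
```

On the TrueNAS shell this would be something like: camcontrol devlist > /tmp/before.txt, add the disk, camcontrol devlist > /tmp/after.txt, then new_disks /tmp/before.txt /tmp/after.txt.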
 

Dawson

Maybe I am doing it wrong? But it doesn't look to have worked. zpool import -f Tank results in the same I/O error. Honestly, I am totally willing to send you a link if you want to remote-control my PC and mess with it yourself?
 

Attachments

  • Screenshot 2023-06-17 143630.png (39.9 KB)

Dawson

I've been up all night trying to figure this out, and you seem like you could do it in a matter of minutes.
 

jgreco

Maybe I am doing it wrong? But it doesn't look to have worked. zpool import -f Tank results in the same I/O error. Honestly, I am totally willing to send you a link if you want to remote-control my PC and mess with it yourself?

I appreciate the faith, but that doesn't serve to show other people how to resolve their problems in the future, which is a large part of my goal in documenting steps and techniques here in the forums. It's some variation on the "give a person a fish/teach a person to fish" distinction: if I just solve your problem for you and the solution stays only in my head, what good is that?

What I'm trying here is basically just stuff that is "right-ish" shaped to solve what might not be exactly right. This is computer stuff and computers like the details to be exactly right. The fact that we're still getting an UNAVAIL device suggests we got something wrong, so now that's curious enough to me that I'll see if I can figure THAT out. Just to make sure this is really the problem, please repeat our "ls" test from before over in /dev/gptid. Check

ls /dev/gptid/11b39573*

to see if it exists. In the meantime I'll run through the same steps you did. So here we have a disk, da9:

Code:
root@nas4:/tmp # gpart create -s gpt da9
da9 created
root@nas4:/tmp # gpart add -t freebsd-zfs -a 1m -s 500m -l ourrescue da9
da9p1 added
root@nas4:/tmp # glabel label gptid/11b39573-ad95-11ed-8d1c-7df9cea98351 da9p1
root@nas4:/tmp # glabel list da9p1
Geom name: da9p1
Providers:
1. Name: label/gptid/11b39573-
   Mediasize: 524287488 (500M)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 1048576
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 1023999
   length: 524287488
   index: 0
Consumers:
1. Name: da9p1
   Mediasize: 524288000 (500M)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 1048576
   Mode: r0w0e0



1. Name: label/gptid/11b39573-

Oh. Well, that's not what we want. Mmm. glabel created the provider as label/gptid/11b39573-, so the name was truncated and it lives under label/ rather than gptid/. Having now read through the documentation and looked at some code, it does not appear that there is a way to set the GPTID. That sucks, because it means we'd have to move on to somewhat more irreversible operations that I'm less familiar with.

I think that this really needs to be made into ZFS's problem to resolve. My suggestion is to try "zpool import -F Tank" (note the capital F) and see if that works. My understanding is that this is supposed to work even with missing devices, but I don't generally have that issue on any of the pools here.
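
Before letting the capital-F import change anything, it may be worth a dry run. As far as the zpool-import man page describes it, -F is a recovery mode that tries to make the pool importable by discarding the last few transactions, and adding -n reports whether that would work without actually doing it. A minimal sketch; the ZPOOL variable and the check_rewind function are illustrative names, not something TrueNAS provides:

```shell
# Command used to invoke zpool; overridable so the sketch can be exercised
# against a stub on a machine without ZFS.
ZPOOL=${ZPOOL:-zpool}

# Show what ZFS currently sees, then ask whether a recovery-mode import
# could work. With -n, "zpool import -F" only reports; it does not import.
check_rewind() {
    pool=$1
    echo "== pools available for import =="
    "$ZPOOL" import
    echo "== dry run of recovery import for $pool =="
    "$ZPOOL" import -F -n "$pool"
}
```

Usage would be "check_rewind Tank". If the dry run looks sane, the real attempt is the same command without -n.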
 

jgreco

Hey, @HoneyBadger ... do you have any good recovery suggestions here? This feels like there should be something obvious but I really almost never have to recover ZFS pools, so my Zfu is weak in this area.
 

Dawson

Also, if it makes any difference: last night, when I was fumbling through TrueNAS trying to fix this, I exported the pool hoping I could reimport it. Once that failed, I thought I was screwed because I had exported it, so I reverted to a save from about 3 days before this all happened. So the drive that TrueNAS lives on and the three 4 TB HDs are about 3 days apart. I don't know if this is why we're having issues. I was so dumb and tired last night that I never made a backup before I reverted, so this 3-day-old version of TrueNAS is the latest I've got.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Isn't this thread a duplicate?

Why post another thread?
 

Dawson

Isn't this thread a duplicate?

Why post another thread?
Just trying to get multiple eyes on it.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Just trying to get multiple eyes on it.
I think you have the correct eyes on it right now, if there is any hope to fix it. Also, if you start getting advice and doing steps in another thread, how are the folks in this thread going to know what is happening? For this reason I will be removing your second thread. I understand your data is important, and that is why I'm taking this action: it will protect you against doing something incorrect. You may need to wait for some more good advice to arrive.
 
Joined
Oct 22, 2019
Messages
3,641
I'm confused. Aside from the whole virtualization rigmarole, why can't you just first offline and then replace (resilver) the missing drive in the RAIDZ1 vdev?

Even with only two of three drives online, shouldn't you be able to import the pool in a degraded state?

Am I wrong to think you would "offline" the XXXX9573 drive, and then "replace" it with an available drive?
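
For reference, the command-line shape of that offline-then-replace idea would be roughly as follows. This is a sketch under assumptions: it only applies once the pool has actually been imported (degraded is fine), the pool name Tank and the old gptid are the ones from earlier in the thread, and gptid/NEW-DISK-GPTID is a placeholder for the replacement partition. The helper names run and replace_missing are made up, and the commands are printed by default instead of being executed:

```shell
# Print (default) or run the offline-and-replace sequence.
# Set DRY_RUN=0 to actually execute the zpool commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_missing() {
    pool=$1
    old=$2
    new=$3
    run zpool offline "$pool" "$old" || return 1         # mark the missing member offline
    run zpool replace "$pool" "$old" "$new" || return 1  # start the resilver onto the new disk
    run zpool status "$pool"                             # check resilver progress
}

replace_missing Tank gptid/11b39573-ad95-11ed-8d1c-7df9cea98351 gptid/NEW-DISK-GPTID
```

On TrueNAS the Disk Replacement procedure in the web UI documentation is the usual route; the sketch only shows what the equivalent zpool steps look like.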
 

Dawson

I think you have the correct eyes on it right now, if there is any hope to fix it. Also, if you start getting advice and doing steps in another thread, how are the folks in this thread going to know what is happening? For this reason I will be removing your second thread. I understand your data is important, and that is why I'm taking this action: it will protect you against doing something incorrect. You may need to wait for some more good advice to arrive.
I appreciate it. I am planning to do a full write-up on how I solved this, on both threads. But no worries.
 

Dawson

I'm confused. Aside from the whole virtualization rigmarole, why can't you just first offline and then replace (resilver) the missing drive in the RAIDZ1 vdev?

Even with only two of three drives online, shouldn't you be able to import the pool in a degraded state?

Am I wrong to think you would "offline" the XXXX9573 drive, and then "replace" it with an available drive?
How would I go about doing that?
 