Removed a device, now my pool stopped working due to "corruption"

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is a stripe pool supposed to not work when a drive is not present?
Yes, that's the design of a striped pool--when any device fails, the pool is toast.
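For anyone who wants to see the difference on scratch hardware, a rough sketch (device names `da0`/`da1` are placeholders, and these are destructive commands -- only run them against disposable disks):

```shell
# Striped pool: data is spread across both devices with no redundancy.
# Losing either device loses the whole pool.
zpool create scratch da0 da1

# Mirrored pool: every block is written to both devices,
# so either one can fail and the pool keeps working.
zpool create scratch mirror da0 da1

# Check the vdev layout and health at any time:
zpool status scratch
```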
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I only put it into my computer with windows but didn't format it or anything.
Windows likes to put files like Thumbs.db, and sometimes folders like System Volume Information, on anything it has write access to the second it sees it. I don't know if this is what corrupted it, but it could possibly be it.

But OP, you are a brave soul for playing a "joke" on data you would weep over losing. I've never understood the rationale of people who do that to non-experimental data.
 

Hexanilix

Dabbler
Joined
Oct 1, 2022
Messages
34
How is a pool supposed to work if some portions of its data are not available from anywhere?
Well, you tell me, because I tried this with another test pool: I ripped the USB stick out of it, and the pool was fine, apart from the missing drive. So I have no idea what's happening here.
 

Hexanilix

Dabbler
Joined
Oct 1, 2022
Messages
34
Windows likes to put files like Thumbs.db, and sometimes folders like System Volume Information, on anything it has write access to the second it sees it. I don't know if this is what corrupted it, but it could possibly be it.
Well, it would have had to do this to only the SD card, because it didn't do anything to my USB stick.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Well, you tell me, because I tried this with another test pool: I ripped the USB stick out of it, and the pool was fine, apart from the missing drive. So I have no idea what's happening here.
Your other pool probably has redundancy, rather than being a simple striped pool with none, like this one.

Also, I don't know why anyone would ever make a ZFS pool out of a bunch of flash drives. They die so fast in any kind of write-heavy environment, which a ZFS pool is. It could very well be that your SD card decided to go belly-up after you removed it and couldn't power back on.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So I have no idea what's happening here

You had no redundancy. You forcibly removed a device. ZFS says that there is insufficient replica data to sustain the pool. Bang, dead. That's what is happening here.
 

Sparkey

Dabbler
Joined
Nov 1, 2021
Messages
36
Due to my limited knowledge I make it a point to never monkey with a functioning TrueNAS box. Especially with one holding valuable data. Having said this I do occasionally shoot myself in the foot as you have done. Good luck. Hope you can sort this out.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Due to my limited knowledge I make it a point to never monkey with a functioning TrueNAS box. Especially with one holding valuable data.

Heck, I have lots of knowledge and I dread monkeying with a functioning TrueNAS box. I am always painfully aware of that slim chance that the valuable data might turn into a pile of gibberish bits. ZFS isn't to be trifled with.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Really, most people shouldn't have to toy with their configuration after initial setup. If it was done properly, it should be a set-up-once-and-forget type of thing.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I am speechless.

Someone purposefully experiments on a supposedly valuable data pool. Then, when they have problems, they have neither a full backup nor a back-out plan.

And it's neither TrueNAS's nor ZFS's fault.
 

Hexanilix

Dabbler
Joined
Oct 1, 2022
Messages
34
Look, I already know I'm dumb; I could have just done that on a separate pool, but I didn't. And now I'm really curious.
I am speechless.

Someone purposefully experiments on a supposedly valuable data pool. Then when they have problems, they don't have a full backup or back out plan.

And it's not TrueNAS nor ZFS' fault.
Yes, I know. I know that all too well now. I could have done that on a separate pool and nothing would have happened, but look at me now.
 

Hexanilix

Dabbler
Joined
Oct 1, 2022
Messages
34
Let's leave my original problem aside for now. I'm kind of confused about how TrueNAS handles broken drives. If the pool can come up without a drive (which I tested: it does), why won't it come up when a drive is, as in this case, corrupt? It does work just fine without it; I recreated my original scenario, and it was weird how it just didn't care whether the drive was there or not. It did show me that the pool was unhealthy, but I could do everything I normally would: read, write, and delete files. It even included the extra drive's space in the total: the pool without the USB stick was 899 GB, with it 904 GB, and it showed 904 even when the stick was unplugged. So what's going on here?
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Let's leave my original problem aside for now. I'm kind of confused about how TrueNAS handles broken drives. If the pool can come up without a drive (which I tested: it does), why won't it come up when a drive is, as in this case, corrupt? It does work just fine without it; I recreated my original scenario, and it was weird how it just didn't care whether the drive was there or not. It did show me that the pool was unhealthy, but I could do everything I normally would: read, write, and delete files. It even included the extra drive's space in the total: the pool without the USB stick was 899 GB, with it 904 GB, and it showed 904 even when the stick was unplugged. So what's going on here?
I'm guessing your stick was an 8 GB stick?
If so, then here's what I think happened: after overhead, it comes out to 5 GB, and since you striped it in, you gained an extra 5 GB (904 GB total).

Here's the thing: once you stripe a device into a pool, it stays striped; it's an irreversible operation. So when you decided to "unplug" it, TrueNAS is just saying, "Listen, this pool used to be 904 GB because there was another drive that I can no longer find, so I'm refusing to mount your pool." It stays 904 GB; you just can't mount it.
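The capacity arithmetic above can be sketched in a few lines (the sizes are the ones from this thread; the model is simplified, ignoring real ZFS overhead, but it captures the two rules at play: a stripe's capacity is roughly the sum of its vdevs, and the pool only imports if every vdev is present):

```python
# Toy model of a striped (no-redundancy) pool's capacity and importability.
def stripe_capacity_gb(vdev_sizes_gb):
    """A stripe's usable capacity is roughly the sum of its member vdevs."""
    return sum(vdev_sizes_gb)

def can_import(vdevs_present):
    """With no redundancy, every top-level vdev must be present to import."""
    return all(vdevs_present)

pool = [899, 5]                   # 899 GB disk + ~5 GB usable from the 8 GB stick
print(stripe_capacity_gb(pool))   # 904
print(can_import([True, True]))   # True: both devices attached
print(can_import([True, False]))  # False: stick unplugged -> pool won't import
```

The reported size stays 904 GB even with the stick gone because the size is a property of the pool's layout, not of whichever devices happen to be attached.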
 

Sparkey

Dabbler
Joined
Nov 1, 2021
Messages
36
Using a USB stick as part of a pool seems kinda silly to me but what do I know.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I'm kind of confused about how TrueNAS handles broken drives. If the pool can come up without a drive (which I tested: it does), why won't it come up when a drive is, as in this case, corrupt?

This is really a ZFS question, not a TrueNAS question.

ZFS handles broken drives by looking for redundancy elsewhere. In a mirror, that means the other paired disk. In RAIDZ, that means recomputing the contents of the broken drive by using the parity to back-compute what was supposed to be there.
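The parity idea can be sketched with plain XOR, which is the scheme single-parity RAIDZ1 uses (toy byte strings here, not the real on-disk format):

```python
# Toy single-parity reconstruction: parity = XOR of the data strips,
# so XORing the survivors back-computes whatever the dead disk held.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Two equal-length data strips plus one parity strip, as on 3-disk RAIDZ1.
d0 = b"hello top"
d1 = b"zfs rocks"
parity = xor_bytes(d0, d1)

# The disk holding d1 dies; recover its contents from d0 and parity.
recovered = xor_bytes(d0, parity)
assert recovered == d1
print(recovered)  # b'zfs rocks'
```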

In the very special case where no redundancy is available, ZFS doesn't know what to do. When all redundancy sources for a block are permanently lost, the file that they are part of becomes permanently damaged. If corrupt data gets written into a pool, it can be very difficult to expunge that as well.

ZFS pools do not have a "fsck" or "chkdsk" type command. This makes sense, if you think about it. How would you fsck a pool that was 5 petabytes large? It'd take forever, and the memory required to track state would be crazy.

ZFS instead relies on the administrator to have provided for appropriate redundancy during the pool design stage. With the redundancy, ZFS will attempt to correct the error and then move on.
 

Hexanilix

Dabbler
Joined
Oct 1, 2022
Messages
34
This is really a ZFS question, not a TrueNAS question.

ZFS handles broken drives by looking for redundancy elsewhere. In a mirror, that means the other paired disk. In RAIDZ, that means recomputing the contents of the broken drive by using the parity to back-compute what was supposed to be there.

In the very special case where no redundancy is available, ZFS doesn't know what to do. When all redundancy sources for a block are permanently lost, the file that they are part of becomes permanently damaged. If corrupt data gets written into a pool, it can be very difficult to expunge that as well.

ZFS pools do not have a "fsck" or "chkdsk" type command. This makes sense, if you think about it. How would you fsck a pool that was 5 petabytes large? It'd take forever, and the memory required to track state would be crazy.

ZFS instead relies on the administrator to have provided for appropriate redundancy during the pool design stage. With the redundancy, ZFS will attempt to correct the error and then move on.
Ok, thank you for that; it helps me a lot in terms of how ZFS works.
Now, sorry to be such a burden again, but back to the original problem. Sooo... what now? What should I do? What steps should I take to recover my pool? (All options that don't lose any data from the original drive are welcome.)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sooo... what now?
If the SD card does not work (which seems somewhat unlikely, but Murphy's Law and all that...), your best hope is to import the pool read-only, with whatever special incantation is needed to import a pool with missing vdevs. Consider all data from after the SD card was added as lost, but old data is likely to be recoverable - if you can get this to work.
I know it should be possible, but I've never actually seen it done.
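For reference, the incantation would start with a read-only import; the missing-vdev part relies on an OpenZFS tunable (`zfs_max_missing_tvds`), which I believe is what that work turned into. The pool name `tank` is a placeholder, the exact tunable name and availability vary by platform and release, so verify against your OpenZFS docs first, and keep everything strictly read-only:

```shell
# Linux: allow import with up to 1 missing top-level vdev (OpenZFS tunable).
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds

# FreeBSD/TrueNAS CORE equivalent (sysctl name may differ by release):
sysctl vfs.zfs.max_missing_tvds=1

# Then attempt a strictly read-only import and copy data off immediately.
zpool import -o readonly=on -f tank
```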
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Hexanilix - Sorry I was harsh; this just caught me so off guard.


@Ericloewe may be right. A few years ago I saw a feature request / work in progress for importing a ZFS pool that has missing vdevs; all data that was not affected would be recoverable without errors. That said, I have not followed that feature, and I'm not even sure whether it was implemented. I can't tell you how to check whether it's available, nor how to use it if it is.


@Hexanilix - One other note that does not seem to have come up: ZFS stores all metadata with redundancy, even without vdev redundancy. ZFS takes the approach that directory entries, free-space lists and the like are more important than data, because a lost directory block could take out an entire file, or even an entire directory. Thus, standard metadata has 2 copies, with critical metadata having 3 copies. This does not add much extra space, but it does consume more than other file systems.

So in the case of a single disk, data has 1 copy, and the data's directory entries have 2 copies.

In the case of a 2 disk Mirror, data has 2 copies, and the data's directory entries have 4 copies. This allows a disk to fail, yet still maintain the more important metadata copies.


It is possible to adjust some of these features, but in general the defaults are good as they are. Occasionally someone with a single disk wants some data-level redundancy. This can be done at the pool level or the dataset level by changing "copies" from the default of 1 to 2. Of course, that data then takes up twice as much space (after any compression, if compression is used).
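As a concrete example of the knob described above (the dataset name is made up, and note that `copies` only affects blocks written after the change, not data already on disk):

```shell
# Keep two copies of every data block on this dataset, even on a single disk.
# This guards against bad sectors, not whole-disk failure.
zfs set copies=2 tank/important

# Verify the property took effect:
zfs get copies tank/important
```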
 