SOLVED ZFS pool corrupted

renangv

Cadet
Joined
Jan 25, 2022
Messages
2
Hi,
I am running Proxmox (7.1-10) and a number of VMs. I accidentally force-imported, on the Proxmox host, a ZFS pool that was in use by a VM (TrueNAS 12.0-U7):

zpool import -f Pool-1

The command returned no output, and I thought I was OK.
I then rebooted Proxmox, and the pool was not there. I then ran:

Code:
zpool import

pool: Pool-1
     id: 9292035031829486490
  state: FAULTED
status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
    The pool may be active on another system, but can be imported using
    the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

    Pool-1        FAULTED  corrupted data
      mirror-0    FAULTED  corrupted data
        sdd2      ONLINE
        sdb2      ONLINE
      indirect-1  ONLINE
      indirect-2  ONLINE
      indirect-3  ONLINE


At that time, I had not realised the mistake I was making and tried to import once again:
Code:
zpool import -f Pool-1
internal error: cannot import 'Pool-1': Invalid exchange
Aborted

By then, I realised I was importing the wrong pool. I went back to TrueNAS, and the pool was offline. I checked whether TrueNAS could see the pool:
Code:
truenas# zpool import

   pool: Pool-1
     id: 9292035031829486490
  state: FAULTED
status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
    The pool may be active on another system, but can be imported using
    the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

    Pool-1                                          FAULTED  corrupted data
      mirror-0                                      FAULTED  corrupted data
        gptid/f475cf25-9d3a-11eb-a1a4-0cc47a30748c  ONLINE
        gptid/f485d5c5-9d3a-11eb-a1a4-0cc47a30748c  ONLINE
      indirect-1                                    ONLINE
      indirect-2                                    ONLINE
      indirect-3                                    ONLINE

and tried first:
Code:
truenas# zpool import Pool-1
cannot import 'Pool-1': pool was previously in use from another system.
Last accessed by proxmox (hostid=2e5301d3) at Tue Jan 25 16:21:58 2022
The pool can be imported, use 'zpool import -f' to import the pool.

and then:
Code:
truenas# zpool import -f Pool-1
internal error: cannot import 'Pool-1': Integrity check failed
Abort trap (core dumped)

So now, if I try to force an import from Proxmox I get "Invalid exchange", and if I try to force one from TrueNAS I get "Integrity check failed".
Does anyone have an idea how to solve this?

Thanks,
RG
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Your only recourse is to destroy that pool and reconstitute it from backup, sorry. ZFS makes no allowances for administrator error; it implicitly assumes you know what you're doing at all times.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, before going that far, it's worth a shot to try a rollback on import, in the hope that it gets you somewhere.
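In practice, a rewind-on-import attempt would look something like this (just a sketch, using the pool name from this thread; the -n pass is a dry run that only reports what would be discarded):

Code:
# dry run: report how far back the import would rewind and what would be lost
zpool import -Fn Pool-1
# -f may also be needed, since the pool was last accessed by another host
# if the report looks acceptable, commit the rewind
zpool import -F Pool-1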
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@Ericloewe, would rollback even work without a checkpoint?
 
Joined
Oct 22, 2019
Messages
3,641
Your only recourse is to destroy that pool and reconstitute it from backup
Because of importing before exporting from a different server? I just can't fathom that ZFS, which prides itself on data integrity, fault tolerance, and modernity, would put all of your data at risk (really, "grind it up") due to a procedural mix-up.

What does this output (it's a dry run, so nothing is committed):

zpool import -F -n Pool-1
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@winnielinnie, ZFS is not designed for multi-host access to the same pool. All bets are off in this scenario.
 
Joined
Oct 22, 2019
Messages
3,641
Does that apply to physical servers too? If you shut down a TrueNAS server (but forget to export the pool), then later pull the drives, plug them into a different server, and try to import the pool, is your data at risk of being permanently lost?

Or is that scenario perfectly safe, and what makes the issue in the opening post unique is the host and the VM simultaneously importing the same pool/drives?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
No, I'm talking about simultaneous access from multiple hosts. With serial access, there's no problem, other than having to use forced imports.

What happened in this scenario is that the OP tried to import, on the host, the same pool that was in use within a VM. This resulted in simultaneous multi-host access to the same pool, with two systems writing at the same time but neither aware of the other's writes.
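For contrast, the safe serial handoff between hosts is simply an export followed by an import (a sketch, using this thread's pool name):

Code:
# on the host that currently has the pool: export it cleanly first
zpool export Pool-1
# then, on the other host: import it (no -f needed after a clean export)
zpool import Pool-1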
 
Joined
Oct 22, 2019
Messages
3,641
What happened in this scenario is that the OP tried to import, on the host, the same pool that was in use within a VM. This resulted in simultaneous multi-host access to the same pool, with two systems writing at the same time but neither aware of the other's writes.
I thought ZFS would refuse to allow such an action, but then I realized the OP used "-f" which forces an import. :frown:

Well, hopefully zpool import -F -n Pool-1 offers some hope of recovery.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ZFS would refuse to allow such an action, but then I realized the OP used "-f" which forces an import.

High availability setups may in fact RELY on such actions. I don't know what TrueNAS HA does, but years ago, Nexenta implemented HA failover via a heartbeat system and an import -f on the surviving node. It is absolutely mandatory to make sure that you do not develop a split brain situation. The pool indeed can and ultimately will be shredded if you attempt to access it from multiple hosts simultaneously. ZFS is not a cluster-aware filesystem, which always struck me as an unfortunate deficiency.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@jgreco, isn't clustering one of the goals for Scale?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco, isn't clustering one of the goals for Scale?

As in... what, exactly?

HA failover is where you have a "server" that has two heads, and a shared storage pool. Either head can run the storage, and if one fails, the other is supposed to transparently take over. In practice this doesn't seem to work quite like that. Variations on this theme include two heads and two pools in an active/active scenario where each head is normally responsible for half the storage. If one head fails, the other head adopts the other pool and the other node's filesharing responsibilities. This is dark magic. Also fallible magic. As far as I know, TrueNAS Enterprise has always targeted this as one of their big value-adds over the free product.

Clustering is something else. Gluster is to fileservers what ZFS is to hard drives. It's an abstraction layer that takes a relatively solid foundation and expands it in a useful way. Consider ZFS, which is substantially more complex than FFS on a hard drive, and substantially more resource-piggy, but it lets you detect corruption, aggregate the storage of multiple HDDs, take snapshots, and do things to optimize your storage through compsci tricks and sorcery. Gluster applies a similar abstraction layer to fileservers, allowing for replication, scaling up, and a variety of other features.

Basically software has eaten the world. ZFS was early to that party, working to kill off legacy proprietary RAID controllers, but it really took the cloud computing revolution to force evolution of these new software abstractions that allow for some really cool things...

Unfortunately, Gluster is a fscking trainwreck, and that's on a good day. It's complicated to learn, complicated to set up, and easily outcomplexifies ZFS. Scale may be doing a good thing for Gluster by making an easy-to-manage system out of it all...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
@Ericloewe, would rollback even work without a checkpoint?
Not a rollback to a checkpoint; that would need uncorrupted metadata. I mean zpool import -F, which rolls back a few TXGs.
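Roughly, the two mechanisms look like this (a sketch only; the checkpoint route works only if a checkpoint was created before the damage):

Code:
# checkpoint-based rewind (requires a checkpoint taken beforehand):
#   zpool checkpoint Pool-1
#   zpool import --rewind-to-checkpoint Pool-1
# TXG rewind at import time, no checkpoint needed:
zpool import -Fn Pool-1   # dry run: report what would be discarded
zpool import -F Pool-1    # commit the rewind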
High availability setups may in fact RELY on such actions. I don't know what TrueNAS HA does, but years ago, Nexenta implemented HA failover via a heartbeat system and an import -f on the surviving node. It is absolutely mandatory to make sure that you do not develop a split brain situation. The pool indeed can and ultimately will be shredded if you attempt to access it from multiple hosts simultaneously.
There's even recent work to make this a bit more robust, but I haven't followed the details.
HA failover is where you have a "server" that has two heads, and a shared storage pool. Either head can run the storage, and if one fails, the other is supposed to transparently take over. In practice this doesn't seem to work quite like that. Variations on this theme include two heads and two pools in an active/active scenario where each head is normally responsible for half the storage. If one head fails, the other head adopts the other pool and the other node's filesharing responsibilities. This is dark magic. Also fallible magic. As far as I know, TrueNAS Enterprise has always targeted this as one of their big value-adds over the free product.
I'm fairly certain TrueNAS Enterprise uses a single pool, too.
do things to optimize your storage through compsci tricks and sorcery
Best description of ZFS I've heard all year.
Unfortunately, Gluster is a fscking trainwreck
Best description of Gluster I've heard all year.
It's complicated to learn, complicated to set up, and easily outcomplexifies ZFS
My favorite is that there doesn't seem to be an equivalent to zpool status that tells you which bricks form mirrored groups.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I'm fairly certain TrueNAS Enterprise uses a single pool, too.

Yeah, me too, but not having personally seen it, I avoid saying things I don't know to be true.

Best description of ZFS I've heard all year.

Heh.

Best description of Gluster I've heard all year.

The year's still young.

My favorite is that there doesn't seem to be an equivalent to zpool status that tells you which bricks form mirrored groups.

Gluster seems to be perpetually at the ZFS-in-2008(?) era. You need to be a Gluster propellerhead and spend lots of time getting your system built and tested, and it's easy to make mistakes. This is really funny, since Gluster and ZFS are of similar age; ZFS has had GUI appliances for managing it for some time, and it is reasonably well understood.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Gluster seems to be perpetually at the ZFS-in-2008(?) era. You need to be a Gluster propellerhead and spend lots of time getting your system built and tested, and it's easy to make mistakes. This is really funny, since Gluster and ZFS are of similar age; ZFS has had GUI appliances for managing it for some time, and it is reasonably well understood.
For better or worse, they seem to be dropping some of the cruft that never worked, like RDMA. I hope it means some introspection happened and they are aware that they need to get the basics right first.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
On the other hand, failing to predict the future gets you crap like the block pointer rewrite fiasco.
 

renangv

Cadet
Joined
Jan 25, 2022
Messages
2
Hi everyone,
I proceeded with truenas# zpool import -f -FXn Pool-1

And after 12 hours I got:

Code:
Would be able to return Pool-1 to its state as of Tue Jan 25 16:18:53 2022.
Would discard approximately 3 minutes of transactions.

I then went with zpool import -f -FX Pool-1

And my pool is now safe and sound.
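For anyone hitting the same situation, here is my rough reading of those flags (a summary only; check zpool-import(8) before relying on it):

Code:
# -f   force the import even though the pool looks active on another system
# -F   rewind to an earlier transaction group if the current state is unimportable
# -X   extreme rewind: search much further back for a usable TXG (slow; last resort)
# -n   dry run: only report what -F/-X would do
zpool import -f -FXn Pool-1   # check first (the 12-hour step above)
zpool import -f -FX Pool-1    # then commit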

Thanks for all suggestions,
RG
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Good thing that worked, but be careful next time - this was honestly a bit of a long shot.
 

realizze

Cadet
Joined
Feb 3, 2023
Messages
1
Hi everyone,
I proceeded with truenas# zpool import -f -FXn Pool-1

And after 12 hours I got:

Code:
Would be able to return Pool-1 to its state as of Tue Jan 25 16:18:53 2022.
Would discard approximately 3 minutes of transactions.

I then went with zpool import -f -FX Pool-1

And my pool is now safe and sound.

Thanks for all suggestions,
RG
Your solution helped me save my data, and I came here to say thank you for that.
 