Pool TruNAS state is OFFLINE

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Not sure what happened, but my pool went offline. I'm a bit of a newbie on TrueNAS, so any help would be greatly appreciated. This is what I am seeing in the UI. I didn't want to start messing with anything until I asked here first, for fear of losing all my data. Anyone have any ideas?

(screenshots of the pool status UI attached)
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
From a shell, what does zpool import show? Also, please provide details of your hardware, per the Forum Rules.
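
(Roughly, from an SSH session or the local console as root — just the standard read-only status commands:)

Code:
        # Scan attached disks for importable pools and report their state
        zpool import
        # Status of any pools that are currently imported
        zpool status -v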
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Thanks. This is what I see when running zpool import.

(screenshot of the zpool import output attached)


My hardware is as follows:

Running TrueNAS as a VM in an ESXi environment

(screenshots of the VM hardware configuration attached)
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
How are you providing drives to the TrueNAS VM?


If you're doing raw device mapping, it's known to result in data loss like this. The only tried-and-true way of doing this is passing through a PCI HBA with physical disks attached.

Do you have backups? If you don't, recovering from this will involve expensive tools like Klennet ZFS Recovery.
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
I'm passing through a PCI HBA with physical disks attached. I don't have any backups.

I saw an article where someone suggested the following, but noted it may result in some data loss even if it works. Still better than losing everything. I thought I might try it; I'm just not sure how to determine which disk to run it on.

zpool import -fF -R /mnt Vol0
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
What model of HBA, and is it running in IT mode? You may have a RAID controller running in JBOD mode instead of a real HBA.
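
(If you're not sure, something along these lines from the TrueNAS shell should identify the controller; I'm assuming a CORE/FreeBSD install here — on SCALE the Linux equivalent would be lspci.)

Code:
        # List PCI devices with names and pick out the storage/RAID controllers (FreeBSD/CORE)
        pciconf -lv | grep -E -B 4 -i 'storage|raid|sas'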


The syntax in your case is zpool import -fF -R /mnt TruNAS; you use the pool name, not the individual disks. However, I don't think this will work, as the pool metadata is corrupt.
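
(For reference, roughly what those flags do:)

Code:
        # -f  force the import even though the pool wasn't cleanly exported
        # -F  recovery mode: roll back the last few transactions if that produces an importable pool
        # -R  set an alternate root so the datasets mount under /mnt
        zpool import -fF -R /mnt TruNAS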
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
I'm running a PERC H700 in an old PowerEdge 710. I don't think the PERC can be run in IT mode. It's been working fine for a few years, but we recently moved and sadly had a power outage before I had the UPS hooked up.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Yes, unfortunately, the H700 is a RAID controller. I think your only option is to try Klennet, if the zpool import -fF -R /mnt TruNAS fails. You've been running at risk for too long, and Murphy finally had his say. Why were you even running your server without a UPS in the first place? Why were you running critical data without backups?
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Guess I was living a bit dangerously. Only had it off the UPS for a few days but I see Murphy's Law is still in full effect. I'll look into Klennet. Maybe a few other things first. If you think of anything else, let me know. Thanks for your help.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,599
Let us be exceptionally clear: ZFS was specifically designed by Sun Microsystems to survive any and all power failures without data loss. You may lose data in flight, but that would apply to any file system.

The problem here appears to be a hardware RAID controller, which likely cached and/or re-ordered writes, independently of what ZFS needs. It worked fine as long as a power failure did not occur at the wrong time.

ZFS' copy-on-write nature means that it writes updates to free space and then, at the very end, updates the uberblocks to reference the new data. If the uberblocks get written first (out-of-order writes), and a power failure occurs such that the underlying data and metadata did not get written (or were written incompletely), then you have corruption in that tree.

It may be possible to go back 1 or more transaction trees to recover the data. However, this is risky, and while I know the ability exists, I have no skill in how to make that happen.


One last note about ZFS surviving any and all power failures: you can have hardware failures caused by the power failure that will take out a ZFS pool, like losing 2 disks in a RAID-Z1 or a 2-way mirror pool. It is just that ZFS' on-disk structure, and the methodology of how it writes to storage, will survive unexpected interruptions.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Arwen said: "It may be possible to go back 1 or more transaction trees to recover the data. However, this is risky, and while I know the ability exists, I have no skill in how to make that happen."

Obligatory "Here Be Dragons" warning.

Adding the -X parameter to the "-fF" will do the next step of aggressive txg discards, i.e.: zpool import -fFX -R /mnt TruNAS

Failing that, we'll have to go hunting for valid uberblocks with zdb:

zdb -e -ul /dev/gptid/some-gptid-of-a-pool-device

You'll get output looking like this, repeating several times:

Code:
        Uberblock[0]
        magic = 0000000000bab10c
        version = 5000
        txg = 23227136
        guid_sum = 14295331914005933530
        timestamp = 1580580010 UTC = Sat Feb  1 13:00:10 2020
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
            labels = 0 1 2 3 
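
(Side note: if you're not sure which gptid belongs to a pool member, glabel status maps the gptid labels to their underlying partitions — that's on CORE/FreeBSD.)

Code:
        # Map /dev/gptid/* labels to their disk partitions (FreeBSD/CORE)
        glabel status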


Find the most recent txg number, and try to import your pool read-only, passing that value to the -T parameter:

zpool import -c /data/zfs/zpool.cache -o readonly=on -T <txgid> yourpoolname

Look for your data.

You may need to repeat the process with progressively lower txg values. Hopefully one of them is in a consistent state and gives you data.
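
If there are a lot of candidates, a helper one-liner along these lines (hypothetical — adjust the device path to your actual gptid) will pull out the distinct txg values, newest first, so you can walk down the list:

Code:
        # Extract candidate txg values from the uberblock labels, newest first
        zdb -e -ul /dev/gptid/some-gptid-of-a-pool-device | grep 'txg =' | awk '{print $3}' | sort -rn | uniq

Then feed each value to the read-only import command above until one gives you a consistent pool.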
 