Pool TruNAS state is OFFLINE

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Not sure what happened, but my pool went offline. I'm a bit of a newbie on TrueNAS, so any help would be greatly appreciated. This is what I am seeing in the UI. I didn't want to start messing with anything until I asked here first, for fear of losing all my data. Anyone have any ideas?

(screenshots of the pool status UI attached)
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
From a shell, what does zpool import show? Also, please provide details of your hardware, per the Forum Rules.
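
(Roughly, from an SSH session or the local console as root — just the standard read-only status commands:)

Code:
        # Scan attached disks for importable pools and report their state
        zpool import
        # Status of any pools that are currently imported
        zpool status -v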
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Thanks. This is what I see when running zpool import.

(screenshot of the zpool import output attached)


My hardware is as follows:

Running TrueNAS as a VM in an ESXi environment

(screenshots of the VM hardware configuration attached)
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
How are you providing drives to the TrueNAS VM?


If you're doing raw device mapping, it's known to result in data loss like this. The only tried-and-true way of doing this is passing through a PCI HBA with physical disks attached.

Do you have backups? If you don't, recovering from this will involve expensive tools like Klennet ZFS Recovery.
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
I'm passing through a PCI HBA with physical disks attached. I don't have any backups.

I saw an article where someone suggested the following, but noted it may result in some data loss even if it works. Still better than losing everything. I thought I might try it; I'm just not sure how to determine which disk to run it on.

zpool import -fF -R /mnt Vol0
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
What model of HBA, and is it running in IT mode? You may have a RAID controller running in JBOD mode instead of a real HBA.
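
(If you're not sure, something along these lines from the TrueNAS shell should identify the controller; I'm assuming a CORE/FreeBSD install here — on SCALE the Linux equivalent would be lspci.)

Code:
        # List PCI devices with names and pick out the storage/RAID controllers (FreeBSD/CORE)
        pciconf -lv | grep -E -B 4 -i 'storage|raid|sas'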


The syntax in your case is zpool import -fF -R /mnt TruNAS; you use the pool name, not the individual disks. However, I don't think this will work, as the pool metadata is corrupt.
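
(For reference, roughly what those flags do:)

Code:
        # -f  force the import even though the pool wasn't cleanly exported
        # -F  recovery mode: roll back the last few transactions if that produces an importable pool
        # -R  set an alternate root so the datasets mount under /mnt
        zpool import -fF -R /mnt TruNAS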
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
I'm running a PERC H700 in an old PowerEdge 710. I don't think the PERC can be run in IT mode. It's been working fine for a few years, but we recently moved and sadly had a power outage before I had the UPS hooked up.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Yes, unfortunately, the H700 is a RAID controller. I think your only option is to try Klennet, if the zpool import -fF -R /mnt TruNAS fails. You've been running at risk for too long, and Murphy finally had his say. Why were you even running your server without a UPS in the first place? Why were you running critical data without backups?
 

jcknox

Cadet
Joined
Sep 9, 2022
Messages
5
Guess I was living a bit dangerously. Only had it off the UPS for a few days but I see Murphy's Law is still in full effect. I'll look into Klennet. Maybe a few other things first. If you think of anything else, let me know. Thanks for your help.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,599
Let us be exceptionally clear: ZFS was specifically designed by Sun Microsystems to survive any and all power failures without data loss. You may lose data in flight, but that would apply to any file system.

The problem here appears to be a hardware RAID controller, which likely cached and/or re-ordered writes, independently of what ZFS needs. It worked fine as long as a power failure did not occur at the wrong time.

ZFS' copy-on-write nature means that it writes updates to free space and then, at the very end, updates the uberblocks to reference the new data. If the uberblocks get written first (out-of-order writes), and a power failure occurs such that the underlying data and metadata did not get written (or were written incompletely), then you have corruption in that tree.

It may be possible to go back 1 or more transaction trees to recover the data. However, this is risky, and while I know the ability exists, I have no skill in how to make that happen.


One last note about ZFS surviving any and all power failures: you can have hardware failures caused by the power failure that will take out a ZFS pool, like losing 2 disks in a RAID-Z1 or a 2-way mirror pool. It is just that ZFS' on-disk structure, and the methodology of how it writes to storage, will survive unexpected interruptions.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Arwen said: "It may be possible to go back 1 or more transaction trees to recover the data. However, this is risky, and while I know the ability exists, I have no skill in how to make that happen."

Obligatory "Here Be Dragons" warning.

Adding the -X parameter to the "-fF" will do the next step of aggressive txg discards, i.e.: zpool import -fFX -R /mnt TruNAS

Failing that, we'll have to go hunting for valid uberblocks with zdb:

zdb -e -ul /dev/gptid/some-gptid-of-a-pool-device

You'll get output looking like this, repeating several times:

Code:
        Uberblock[0]
        magic = 0000000000bab10c
        version = 5000
        txg = 23227136
        guid_sum = 14295331914005933530
        timestamp = 1580580010 UTC = Sat Feb  1 13:00:10 2020
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
            labels = 0 1 2 3 
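
(Side note: if you're not sure which gptid belongs to a pool member, glabel status maps the gptid labels to their underlying partitions — that's on CORE/FreeBSD.)

Code:
        # Map /dev/gptid/* labels to their disk partitions (FreeBSD/CORE)
        glabel status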


Find the most recent txg number, and try to import your pool read-only, passing that value to the -T parameter:

zpool import -c /data/zfs/zpool.cache -o readonly=on -T <txgid> yourpoolname

Look for your data.

You may need to repeat the process with progressively lower txg values. Hopefully one of them is in a consistent state and gives you data.
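
If there are a lot of candidates, a helper one-liner along these lines (hypothetical — adjust the device path to your actual gptid) will pull out the distinct txg values, newest first, so you can walk down the list:

Code:
        # Extract candidate txg values from the uberblock labels, newest first
        zdb -e -ul /dev/gptid/some-gptid-of-a-pool-device | grep 'txg =' | awk '{print $3}' | sort -rn | uniq

Then feed each value to the read-only import command above until one gives you a consistent pool.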
 