Pool disappeared?!? Help!

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
Version: TrueNAS-12.0-U8.1

A few days ago I had a hard drive failure in one of my pools. I have hot spares, so things are still working as normal, and I will take care of the failed drive at a later date.

However, today I noticed that 2 of the 5 vdevs in this pool had randomly become unavailable (the degraded vdev was not one of the two), which made the whole pool unavailable. I rebooted the server and the pool came back up normally.

It happened a second time later in the day, so I tried the same reboot "fix", and things were fine again.

After this I was doing some configuration maintenance on the TrueNAS server (mainly SSL certificates) and rebooted once more. However, it did not restart normally, and I had to do a hard shutdown and start it again.

After it started, the pool no longer showed up in `zpool status`, and the UI offered an option to disconnect/export the pool without destroying the data and configuration.

I disconnected the pool thinking I could re-import it. However, this did not work: the existing pool was not found. I tried to force the import (`zpool import -D -f data`), but it still was not found.

I can see all the disks attached, but I'm not sure why I ran into this issue. Did my forced shutdown keep the pool from shutting down properly and corrupt it?

How can I recover my lost pool?

Thanks in advance.

Art
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
ZFS was specifically designed to survive hard / graceless shutdowns / power downs. This will NOT damage pre-existing data. Only data in flight could be lost on such a shutdown / power down.

That said, there ARE things that can make ZFS lose its pools. Hardware RAID controllers are one, and generally the biggest cause. Virtualizing TrueNAS can be a problem if not implemented with HBA pass-through. Western Digital SMR disks are another (aka the WD Red line, though not the larger ones; check the docs). Last, power supply problems can take out disks.

Please list your hardware configuration, especially your disk host controller (e.g. Intel motherboard SATA, LSI HBA set to IT mode, etc.), as well as your disk manufacturer and models. Also, let us know whether you virtualized TrueNAS.
 

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
I updated my signature with my current hardware. Hope this is enough info for you.

I've been running with my current HBA for a little over 3 years now without issue. I can't remember if it was set to IT mode or not.

I will take a closer look at the HBA tonight, but are there any other items I should look at more closely?

Thanks,

Art

Update: here is the status of my "lost pool":


Code:
$ zpool import
   pool: data
     id: 9546008639022548689
  state: UNAVAIL
status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
    devices and try again.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-3C
 config:

    data                                            UNAVAIL  insufficient replicas
      raidz2-0                                      ONLINE
        gptid/4c7cb250-e6ad-11ea-9068-0cc47a7b6a5a  ONLINE
        gptid/48519975-5dd8-11ec-b1e5-0cc47a7b6a5a  ONLINE
        gptid/bbb66329-5dd8-11ec-b1e5-0cc47a7b6a5a  ONLINE
        gptid/067bc6af-1cab-11e9-83c1-0cc47a7b6a5a  ONLINE
        gptid/4cf06948-91f5-11eb-9f95-0cc47a7b6a5a  ONLINE
        gptid/fa1791c5-5dd8-11ec-b1e5-0cc47a7b6a5a  ONLINE
        gptid/18d7c1c9-d22a-11e5-81f2-0cc47a7b6672  ONLINE
        gptid/3c7f55eb-5dd9-11ec-b1e5-0cc47a7b6a5a  ONLINE
      raidz2-1                                      UNAVAIL  insufficient replicas
        gptid/e27cc72a-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/e4f8e049-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/682b4b58-6b01-11eb-9da7-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/ec88bfbb-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/f40cff0f-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/e032a3ca-6b01-11eb-9da7-0cc47a7b6a5a  UNAVAIL  cannot open
      raidz2-2                                      UNAVAIL  insufficient replicas
        gptid/49c97f84-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/4c265f06-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/4e90d6ee-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/50f675f2-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/5358307a-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/55bb4f54-2f08-11ea-9d05-0cc47a7b6a5a  UNAVAIL  cannot open
      raidz2-3                                      ONLINE
        gptid/72de57ab-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
        gptid/74a67f1b-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
        gptid/750a5d88-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
        gptid/773c835f-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
        gptid/76e50c94-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
        gptid/7784f838-3d2b-11eb-ae9b-0cc47a7b6a5a  ONLINE
      raidz2-4                                      ONLINE
        gptid/4c9807af-618c-11ec-8335-0cc47a7b6a5a  ONLINE
        gptid/4dc31b11-618c-11ec-8335-0cc47a7b6a5a  ONLINE
        gptid/4e8870f9-618c-11ec-8335-0cc47a7b6a5a  ONLINE
        gptid/4ea3bd44-618c-11ec-8335-0cc47a7b6a5a  ONLINE
        gptid/4db43c83-618c-11ec-8335-0cc47a7b6a5a  ONLINE
        gptid/4e961c4e-618c-11ec-8335-0cc47a7b6a5a  ONLINE
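To see at a glance which top-level vdevs are affected, the config listing above can be summarized per vdev. This is a minimal sketch, not an official parser: it assumes the indentation layout shown in this thread (pool on the first line, vdevs such as `raidz2-0` one level deeper, member disks deeper still) and does not handle extra sections such as `spares` or `logs`. The sample text is trimmed from the output above.

```python
import re
from collections import defaultdict

# Trimmed sample of the `zpool import` config section posted above.
SAMPLE = """\
    data                                            UNAVAIL  insufficient replicas
      raidz2-0                                      ONLINE
        gptid/4c7cb250-e6ad-11ea-9068-0cc47a7b6a5a  ONLINE
        gptid/48519975-5dd8-11ec-b1e5-0cc47a7b6a5a  ONLINE
      raidz2-1                                      UNAVAIL  insufficient replicas
        gptid/e27cc72a-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
        gptid/e4f8e049-3b5f-11e9-a164-0cc47a7b6a5a  UNAVAIL  cannot open
"""

# Captures leading whitespace, the device name, and its state column.
LINE_RE = re.compile(r"^(\s*)(\S+)\s+(\S+)")

def summarize(config_text):
    """Map each top-level vdev to (vdev_state, {member_state: count})."""
    summary = {}
    pool_indent = None
    vdev_indent = None
    current = None
    for line in config_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or single-token lines
        indent, name, state = len(m.group(1)), m.group(2), m.group(3)
        if pool_indent is None:
            pool_indent = indent  # the first line is the pool itself
            continue
        if vdev_indent is None or indent == vdev_indent:
            vdev_indent = indent  # a new top-level vdev
            current = name
            summary[current] = (state, defaultdict(int))
        elif indent > vdev_indent and current is not None:
            summary[current][1][state] += 1  # a member disk of the current vdev
    return {k: (st, dict(members)) for k, (st, members) in summary.items()}

for vdev, (state, members) in summarize(SAMPLE).items():
    print(vdev, state, members)
```

Run against the full listing, this groups the twelve "cannot open" disks under just two vdevs, which is the pattern discussed in the next reply.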
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Hmm. This looks like a backplane failure, since 2 of your RAIDZ2 vdevs are showing as unavailable. It's very unlikely for all those adjacent disks to fail at the same time. Either power distribution to those banks failed, or your HBA or backplane no longer operates those banks.
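The "insufficient replicas" message follows from a simple rule: a RAIDZ2 vdev can keep serving data with at most two members missing, and because data is striped across all top-level vdevs, the pool needs every one of them. The sketch below is a simplified model under that assumption (it ignores hot spares, resilvering and special vdevs); the `Vdev` class is hypothetical, and the disk counts are taken from the `zpool import` output above.

```python
from dataclasses import dataclass

@dataclass
class Vdev:
    name: str
    width: int    # number of member disks
    parity: int   # 2 for raidz2
    missing: int  # members that "cannot open"

    def available(self) -> bool:
        # A RAIDZ vdev can reconstruct data while at most `parity` members are missing.
        return self.missing <= self.parity

def pool_available(vdevs):
    # Data is striped across all top-level vdevs, so losing any one loses the pool.
    return all(v.available() for v in vdevs)

# Counts from the zpool import listing in this thread.
VDEVS = [
    Vdev("raidz2-0", width=8, parity=2, missing=0),
    Vdev("raidz2-1", width=6, parity=2, missing=6),
    Vdev("raidz2-2", width=6, parity=2, missing=6),
    Vdev("raidz2-3", width=6, parity=2, missing=0),
    Vdev("raidz2-4", width=6, parity=2, missing=0),
]

print(pool_available(VDEVS))                          # False
print([v.name for v in VDEVS if not v.available()])  # ['raidz2-1', 'raidz2-2']
```

Every member of both failed vdevs is missing, far beyond the two a RAIDZ2 vdev can tolerate, which is why a shared component (power, backplane or HBA) is a more plausible explanation than twelve independent disk failures.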
 

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
I think you are right; it is an issue with my JBOD backplane. It's almost 2 years old now, so that is unfortunate.

But after closer inspection, I believe my backplane ports 0-14 do not work. During power-on/boot, the disks in every port except those light up, so I can only assume it's an issue with the backplane and not the HBA controller itself.

I tried various HBA ports and connections to the backplane, but to no avail.

I will see if Supermicro support can assist further.
 