Pool died.... now arisen again... but why/how did it die?

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Until last night, my system was running fine. No pool degradation, no SMART errors, etc.

I then upgraded TrueNAS from 12.0-U6.1 to 12.0-U7. The system came back up but the pool had a big red dot with a white X. The only option offered: disconnect / export. Trying to import the dead pool afterwards resulted in an alphabet soup of middleware errors that likely only benefit the customer support team at iXsystems.

Dropping into the shell, "zpool import XXX" yielded "I/O Error, destroy and recreate the pool from backup".

Looking through the UI Storage/Disks submenu, the apparent failure mode is that all three SSDs of the mirrored special VDEV (SVDEV) disappeared. The eight primary HDDs were still listed. I presume my first task should be to pull the SSDs, swap them around, and see what happens?

Reverting to U6.1 did not fix the issue, so it's likely more to do with the reboot than the U7 upgrade.
 

Constantin

... so I hot-swapped the three 1.46TB SSDs around and they now *do* get registered with the system - they show up in the Storage/Disks UI window, for example. Then if I drop down into the CLI and enter "zpool import XXX", the pool imports cleanly, and "zpool status XXX" shows 0 errors, yet the pool does not show up in the UI. So I rebooted again...

After reboot, importing pool via the UI into TrueNAS was no problem. Everything is fine again, pool is online, no errors.

Quite a delta from 10 minutes ago when the pool was listed as dead. Now... for the $64,000 question: Why would the SSDs go offline in the first place? ... and why would swapping them around SATA ports make them go back online again?

When I hot-swapped them, the usual multiple screenfuls of status updates started scrolling down the console screen, so clearly they became "alive" once I pulled them and reinserted them into the SATA backplanes. On the one hand I am glad the system is back... yet I am also perplexed that 3 mirrored SSDs could simply vanish from the SATA bus without the system giving better feedback.
 

Constantin

So here is my suggestion to @morganL and other folks at iXsystems: The pool config presumably contains a list of the drives it expects to connect to (by UUID, capacity, type, intended use, etc.). When a pool import fails, why can't the UI tell the user: "Hey, I cannot import your pool because the following disks (by SN, capacity, and SATA slot) are missing from VDEV XXX in the pool"? Then the user has a better starting point than "your pool is so dead that it needs to be destroyed and rebuilt from backups", which is all I got.
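
Until the UI offers something like this, part of that information can be dug out by hand. A rough sketch, not anything iXsystems ships: run with no pool name, "zpool import" only lists the pools available for import, together with the state of each member device, so filtering for anything that is not ONLINE should give a starting list of what the pool thinks is missing. The helper name missing_vdevs is made up for illustration, and it reports device labels as ZFS prints them, not serial numbers or SATA slots.

```shell
#!/bin/sh
# Sketch only: missing_vdevs is a hypothetical helper, not a ZFS or
# TrueNAS command. It reads `zpool import` or `zpool status` text on
# stdin and prints the name and state of every config entry whose
# STATE column is not ONLINE. Header lines such as "state: UNAVAIL"
# are skipped because their first field ends in a colon.
missing_vdevs() {
    awk '$1 !~ /:$/ && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE|DEGRADED)$/ { print $1, $2 }'
}

# Usage (as root). Without a pool name, `zpool import` lists importable
# pools and does not import anything:
#   zpool import | missing_vdevs
```

The pool itself and any affected mirror will show up in the list alongside the individual devices, which at least says which VDEV is the one in trouble.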
 

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
1641728533987.png

That's all I can say...
 

Constantin

Good suggestion. I reported the bug via Jira in 12/21, along with suggestions on how to improve the user experience. No response yet. They seem preoccupied with other bugs (see the upcoming 12.0-U7.1 Core release).
 

flashdrive

Patron
Joined
Apr 2, 2021
Messages
264
Hello @Constantin

Having to deal with this:


I now use the following "workflow" to check for missing drives:

- see the TrueNAS GUI error report, which may give the serial number of the drive - I have labeled my drives with their serial numbers so I can check inside the box without having to pull them out

shell:
- zpool status
- glabel status

I second your wish for an easier overview.
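
The shell half of that workflow can be strung together, roughly as below. This is a sketch that assumes TrueNAS CORE (FreeBSD), where "glabel status" prints Name, Status and Components columns and smartctl is available; the helper name gptid_to_disk is made up for illustration. It turns the labels that "zpool status" shows into plain disk names, which can then be asked for their serial numbers.

```shell
#!/bin/sh
# Sketch only: gptid_to_disk is a hypothetical helper name. It reads
# `glabel status` text on stdin and prints "label disk" pairs, skipping
# the header row and stripping the partition suffix (ada0p2 -> ada0).
gptid_to_disk() {
    awk '$1 != "Name" && NF >= 3 { d = $3; sub(/p[0-9]+$/, "", d); print $1, d }'
}

# Usage (as root): print the serial number behind every label so it can
# be matched against the sticker on the drive.
#   glabel status | gptid_to_disk | while read -r label disk; do
#       printf '%s %s ' "$label" "$disk"
#       smartctl -i "/dev/$disk" | grep -i 'serial number'
#   done
```

A label that "zpool status" lists but that never appears in this output is a reasonable candidate for the drive that has dropped off the bus.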
 