TrueNAS machine keeps rebooting when trying to import pool

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Hi all,

I have been using TrueNAS for a year now without any problems. My system is a Dell PowerEdge R730xd (dual Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 32 GB RAM, no RAID card) with one SSD boot disk and six 8 TB NAS disks in a RAIDZ2 configuration. After a series of power outages the pool went offline.

I tried to import (reconnect) it with the zpool import -f 'pool name' command. It showed that an import was possible with about 10 seconds of changes lost. I then ran zpool import -F 'pool name', but the machine kept rebooting while trying to load the pool: it hits a KDB panic, tries to use some specific uberblock, and reboots again. The only way to get access to TrueNAS again is to take out all the NAS disks. In the GUI I can see the pool name marked as offline, and when I insert the disks back they show N/A in the pool column. When I run zpool import in the shell, it shows the pool name and all the disks as ONLINE. I ran S.M.A.R.T. tests on the disks and they all came back successful. I have tried exporting the pool from the GUI, rebooting the machine, and importing the pool again, with the same result: on reboot it gets stuck at the same precise point, using that uberblock, and reboots infinitely.
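For reference, OpenZFS also offers a non-destructive dry run of this recovery mode: -n combined with -F only reports whether a rewind import would succeed, without modifying the pool. A minimal sketch, with tank standing in for the actual pool name:

# List importable pools and the state of their member disks (read-only scan):
zpool import

# Dry run of the rewind recovery: reports whether -F could make the pool
# importable again, but performs no actual recovery or writes:
zpool import -F -n tank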

I'm really concerned. I have read most of the posts here looking for a solution but haven't found a way around it. Please, someone help; it would be highly appreciated.

Thanks,
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Please provide the output of zpool import. It's likely there's one damaged disk. If we remove that disk from the pool and then import, your pool should return.
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Hi Samuel,

Thanks for your prompt answer. Here's the output of the zpool import command:
 

Attachments

  • Capture.PNG (19.3 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
OK, we'll have to do this systematically:
  1. Physically remove the 1st disk from the pool
  2. Try the import (a read-only attempt, sketched below, is safest). If successful, then you've found the bad disk, and can proceed with the normal disk replacement procedure.
  3. If not, then reinstall the 1st disk. Try steps 1-2 with the next disk, until you've worked through all pool disks.
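For the import attempt in step 2, a read-only import is the safer first try, since a failed attempt then can't write anything further to the pool. A minimal sketch, with tank standing in for your pool name:

# Import without allowing any writes; a panic or failure here can't make
# the on-disk state any worse:
zpool import -o readonly=on -f tank

# If it imports, copy the data off, then export cleanly:
zpool export tank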
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
OK, we'll have to do this systematically:
  1. Physically remove the 1st disk from the pool
  2. Try the import (a read-only attempt, sketched below, is safest). If successful, then you've found the bad disk, and can proceed with the normal disk replacement procedure.
  3. If not, then reinstall the 1st disk. Try steps 1-2 with the next disk, until you've worked through all pool disks.
Hi Samuel,
I tried removing each of the six drives in turn and importing the pool again, with the same result. This time I also got an additional error, error 2, on the attached VGA monitor, which wasn't present before. The machine just keeps rebooting, entering a KDB panic when it tries to import the pool. Any other steps I should try?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Can you take a picture of the exact error? Also, how are these disks connected to the motherboard?
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Can you take a picture of the exact error? Also, how are these disks connected to the motherboard?
Hi Samuel,

Sorry for the quality of the attached photos; they were taken from the VGA screen in the server room. They show the moment when the system tries to import the pool and when it panics. I'm really desperate at this point. This pool holds critical data for a TV station, and unfortunately we had no budget for a backup. Please advise...
 

Attachments

  • 20220824_155612.jpg (256.4 KB)
  • 20220824_155643.jpg (304 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
I'm afraid I don't have any good news for you. The screenshots show you have 2 failed drives in your RAIDZ2 pool, which means the pool is lost. The first failed drive has GPTID 0b5a08cd-f079-11eb-81ec-246e96051998 and is the 2nd disk in your pool. It is throwing error 2 (failed to open VDEV GEOM), which indicates a damaged ZFS partition. The second failed drive has GPTID 0c24001a-f079-11eb-81ec-246e96051998 and is the 3rd disk in your pool. This is the one causing the KDB panics, due to damaged data structures.

Did you have a UPS on this server to prevent this sort of thing? The only way to recover your data now is to use an expensive recovery tool, Klennet ZFS Recovery.
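To map those GPTIDs to physical drives on a FreeBSD-based TrueNAS, something like the following should work (da2 below is a placeholder for whatever device glabel actually reports):

# Show which device node each gptid label lives on:
glabel status | grep -e 0b5a08cd -e 0c24001a

# Read the serial number of the matching device so the right physical
# drive can be identified in the chassis:
smartctl -i /dev/da2 | grep -i serial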
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Hi Samuel,
Thanks a lot for your support. Would you please just double-check the screenshots below? I'm sending you the screenshots taken during the procedure you advised, removing the disks one by one. The photos attached show the real system situation with all 6 disks attached. There are 6 NAS disks and one SSD system disk in the system.
1.
20220824_163439.jpg

2.
20220824_163501.jpg
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Try running a memory test on your server using a Linux live thumb drive. I suspect your server may also have suffered RAM damage from the power bumps, and this could be a contributing factor to the KDB panics.
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Try running a memory test on your server using a Linux live thumb drive. I suspect your server may also have suffered RAM damage from the power bumps, and this could be a contributing factor to the KDB panics.
Hi Samuel,
Thanks for your help. The system has 32 GB of ECC RAM and doesn't show any errors during POST or in the TrueNAS reports. Do you have a specific way to test it in mind?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Like I said, boot from a Linux live thumb drive to run memtest86+.
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Like I said, boot from a Linux live thumb drive to run memtest86+.
Hi Samuel,

I booted from a live Linux USB drive and ran memtest86+. After a while it completed without finding any RAM errors. Is there a way to force the import command to choose an earlier metadata point?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Is there a way to force the import command to choose an earlier metadata point?

Yes, but this is extremely hazardous, and could further damage your pool. I think you should run your disks through Klennet instead to recover your data.

For the sake of completeness, the command is zpool import -f -FX -R /mnt -c /data/zfs/zpool.cache TV-TEUTA. However, the man page for zpool-import says:

-X
Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort.
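Before reaching for -X, the uberblocks available for rollback can be inspected non-destructively with zdb. A sketch, with the device path as a placeholder for one of the pool's healthy member disks:

# Dump the vdev label and all uberblocks on a member device; each
# uberblock records the txg and timestamp it would roll back to:
zdb -lu /dev/gptid/<member-disk-gptid>

# Some ZFS builds also accept an undocumented -T flag to import at a
# chosen txg; if attempted at all, combine it with a read-only import:
zpool import -o readonly=on -f -T <txg> -R /mnt TV-TEUTA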
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
Hmm, I always thought RAID-Z2 is designed to survive a 2-disk loss (like RAID-Z survives one); is that not always so?

Another thing: in data recovery, the first thing you usually do is clone each and every disk to a new disk (borrow them if you can't buy them); then you can experiment with the copies while keeping the originals intact. You can also run disk repair tools on the originals. You can try connecting these drives, or the clones, to a different computer (not even a server), install TrueNAS onto a thumb drive there, and try importing, to rule out any issues not related to the disks themselves.
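As a sketch of that cloning step with GNU ddrescue (device names below are placeholders; double-check them first, since reversing source and target destroys the original):

# First pass: copy everything readable, skipping bad areas quickly;
# progress is tracked in the mapfile so the run can be resumed:
ddrescue -f -n /dev/da1 /dev/da7 /root/da1.map

# Second pass: go back and retry the bad areas a few times:
ddrescue -f -r3 /dev/da1 /dev/da7 /root/da1.map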

There are many options. It should be cheaper than professional software/services (service costs usually dwarf the cost of spare parts in these cases).

I can't check the real cost because the Klennet link doesn't open for me.
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
Yes, RAID-Z2 is supposed to survive a 2-disk loss without data loss. There are caveats, though.

For example, a certain well-known Internet computer personality (who seems to do some things for entertainment, not real computer work) had problems using ZFS. It turned out he and his team were not managing that NAS: no regular scrubs, no SMART tests, and no verification that the server and its data were healthy. So it was no surprise to those of us here that he had data loss and problems.

It may be possible to remove the 2 failing disks and try to re-import your pool.


There are other causes of ZFS data loss. One user pulled the wrong disk out and re-used it elsewhere (MS-Windows). Western Digital Red SMR drives are not compatible with ZFS, seemingly due to firmware bugs in the drives.

One last note: ZFS was specifically designed to survive unexpected power loss under normal conditions. That said, it is possible that a disk corrupted itself (or that the computer sent corrupt data to it, independent of ZFS). A disk write cache that lies about flushing to stable media can cause data loss, as can a disk's firmware lying about write fencing. So, back to pulling both affected disks.
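As an aside, on a FreeBSD-based system a SATA disk's volatile write cache setting can be checked with camcontrol (ada0 below is a placeholder; SAS disks expose this via mode pages instead):

# Dump the drive's identify data and look for the write cache feature:
camcontrol identify ada0 | grep -i "write cache"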
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
Yes, but this is extremely hazardous, and could further damage your pool. I think you should run your disks through Klennet instead to recover your data.

For the sake of completeness, the command is zpool import -f -FX -R /mnt -c /data/zfs/zpool.cache TV-TEUTA. However, the man page for zpool-import says:

-X
Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort.
Hi Samuel,

Thanks for your answers. When I try the command zpool import -f -FX -R /mnt -c /data/zfs/zpool.cache TV-TEUTA, this is what I get. The zpool import command still shows the pool as ONLINE with all the disks involved.

Capture.PNG


At this point I'm really desperate...
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
OK, try it without the -c /data/zfs/zpool.cache, and using the pool ID 14869030329496604096: zpool import -f -FX -R /mnt 14869030329496604096. However, I don't have high hopes, and think you should look at Klennet at this point. @Arwen is correct that your pool should've survived the power outage and the loss of 2 disks. However, with 2 disks out of the pool there is no more redundancy, and an ill-timed power bump could've scrambled the TXGs enough to result in your situation.
 

Longfellow75

Dabbler
Joined
Aug 24, 2022
Messages
12
OK, try it without the -c /data/zfs/zpool.cache, and using the pool ID 14869030329496604096: zpool import -f -FX -R /mnt 14869030329496604096. However, I don't have high hopes, and think you should look at Klennet at this point. @Arwen is correct that your pool should've survived the power outage and the loss of 2 disks. However, with 2 disks out of the pool there is no more redundancy, and an ill-timed power bump could've scrambled the TXGs enough to result in your situation.
Hi Samuel,

Same behavior: KDB panic and rebooting in a loop. At this point I'll look into the Klennet option, if there's anything that can be done. Thanks for your help and advice.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
There is another thing to try: boot an Ubuntu 22.04 live thumb drive, and try the zpool import -f -FX -R /mnt 14869030329496604096 there. Since Ubuntu uses a different ZFS implementation, you may be able to mount your pool cleanly there. If it does mount, save your data and then export your pool. A clean export will make the pool importable in TrueNAS again.
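A sketch of that sequence in the Ubuntu 22.04 live session (the ZFS tools aren't on the live image by default):

# Install the OpenZFS userland tools and load the kernel module:
sudo apt update && sudo apt install -y zfsutils-linux
sudo modprobe zfs

# Safer first attempt: read-only import by pool ID:
sudo zpool import -o readonly=on -f -R /mnt 14869030329496604096

# If that fails, escalate to the rewind import:
sudo zpool import -f -FX -R /mnt 14869030329496604096

# After copying the data off, export cleanly so TrueNAS can import it:
sudo zpool export TV-TEUTA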
 