pool import hangs at boot

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
Hi everyone,
After forcing a shutdown on a server that had crawled to a halt while deleting things on the pool,
it now stalls at boot while importing the data pool, with these messages on screen:
20231003_090523.jpg


20231003_113446.jpg

I read somewhere that it could be a problem with the SAS cable or the HBA, so I changed both,
but the problem remains. If I press CTRL+C it skips the import and I can access the system,
but the pool is not imported.
I ran short and long SMART tests on all the disks and they're all fine, no errors reported.
I don't know if the pool is still committing some operation it was doing before the forced shutdown.
Is there a way to check a pool's status before importing it?
Can I import a pool with a command-line option to skip whatever it is doing that is blocking it?
This is a new situation for me and I don't know what to do. Please advise.

Thanks
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I don't have any suggestions at present.

But, please supply the complete hardware configuration, including make, model and how it is wired (make sure that disk makes and models are listed too), plus the TrueNAS version. Then include the output of the following command from the Shell, in code tags:
zpool import

Without more to go on, it is pure guesswork.


There are several hardware devices proven to be problematic for ZFS. For example, hardware RAID (though you do mention an HBA, so that sounds good) or SMR disks. Even SATA port multipliers can cause problems. So a complete hardware list is always a good first step in ruling those out.
 

Davide Zanon

Hi @Arwen, here you go:

System specs
Code:
Motherboard: Supermicro X8SIE6-F
On-board SAS controller: LSI SAS2008-IR
PCIe SAS controller: LSI ServerRAID M1015
Drives: 12xSeagate ST33000650NS

zpool import
Code:
   pool: BACKUP
     id: 14362081959487675084
  state: ONLINE
status: Some supported features are not enabled on the pool.
        (Note that they may be intentionally disabled if the
        'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        BACKUP                                          ONLINE
          raidz1-0                                      ONLINE
            gptid/1631aeb0-0280-3b4d-82d6-f3a2284f0e83  ONLINE
            gptid/3e73b568-25d9-3be6-ecff-df49054cfd73  ONLINE
            gptid/43da68d3-498b-d8eb-c75f-d5b38ac32270  ONLINE
            gptid/4cc94ad3-5218-11ee-90f6-002590c05e02  ONLINE
            gptid/66c8332a-0529-146c-fee7-c132182366c8  ONLINE
            gptid/f9009aa8-189a-2fcd-a2a5-ab1f1d998436  ONLINE
            gptid/b2d00373-0fd8-8940-8847-949b0ed109cf  ONLINE
            gptid/b34e2931-9efc-30c5-fedd-dd6511b0ed8f  ONLINE
            gptid/f806d035-6a65-11ed-9bc7-002590c05e02  ONLINE
            gptid/82fe6cee-d2a0-5e43-a934-ec189e83e062  ONLINE
            gptid/cec706dd-54e3-d7ee-b481-899035cac61c  ONLINE
            gptid/d3edb832-1b4a-87e2-fb2e-8b15b20004d6  ONLINE


The disks are now connected to the PCIe controller, but as I wrote before, the problem remains.
I had to disconnect all the drives and export the pool from the GUI in order to access the system at
boot. Now if I try to import the pool, the GUI gets stuck forever on "importing pool..".
Accessing the system via ssh, I can see the data while it's still importing, but I'm afraid to do anything with it.
Let me know if you need more info.

Thanks for helping
 

Arwen

On-board SAS controller: LSI SAS2008-IR
PCIe SAS controller: LSI ServerRAID M1015
Both of these are problematic unless they have specific firmware. I don't have details, except that hardware RAID is REALLY not wanted by ZFS. The "-IR" indicates "Integrated RAID" instead of the proper IT firmware. But that could just be what you copied, and the cards could be using the IT firmware. You should research the firmware and make sure it's not RAID.

As for importing the pool, one really old problem with ZFS was Dataset destroy. This was solved more than 10 years ago, with the Async Dataset destroy feature. So I doubt that is your problem.

However, another issue can come up. Modern ZFS pools have Async Delete, which creates a list of items to delete in the background. On import, ZFS would resume deleting which could take a while. Also, if a ZFS scrub is in progress, that would slow things down too.

There are ways to see this. If you import from the GUI, you can then check from the command line with these 2 commands:
zpool status BACKUP
zpool get freeing BACKUP
The first will let you know if a ZFS scrub is in progress.
The second will show how much is left to free up, after dataset destroys.
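To watch whether that backlog is shrinking, the second command can be run repeatedly. This is only a minimal sketch, assuming the pool name BACKUP from this thread. The helper name `freeing_bytes` is made up for illustration; `-H` (scripted output) and `-p` (exact values) are options of `zpool get`.

```shell
# Hypothetical helper: pull the numeric "freeing" value (in bytes) out of
# `zpool get -Hp freeing <pool>` output read on stdin.
# With -H the output is a single tab-separated line: name, property, value, source.
freeing_bytes() {
    awk -F '\t' '$2 == "freeing" { print $3 }'
}

# Example use on a live system (commented out so nothing touches the pool here):
#   zpool get -Hp freeing BACKUP | freeing_bytes
# Repeating it once a minute shows whether the number is going down:
#   while sleep 60; do zpool get -Hp freeing BACKUP | freeing_bytes; done
```

A value that keeps dropping suggests the pool is still working through deferred frees; a steady 0 suggests the delay has another cause.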
 

Davide Zanon

If you import from the GUI
Can I import from command line instead of GUI? So if it stalls again I can always cancel the operation with ctrl-c (right?).
Modern ZFS pools have Async Delete, which creates a list of items to delete in the background. On import, ZFS would resume deleting which could take a while.
Can I abort that operation somehow?
The second will show how much is left to free up, after dataset destroys.
I didn't delete a dataset, just files/folders, would that command show me those as well?
 

Arwen

Can I import from command line instead of GUI? So if it stalls again I can always cancel the operation with ctrl-c (right?).
...
Yes, but I can't remember if there are alternate root mount options. So a simple import might not do the right thing (though not bad per se...).
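For reference, `zpool import` does accept an alternate root with `-R`, and a pool can be imported read-only with `-o readonly=on`. The sketch below only prints the command lines instead of running them. The helper name `import_cmd` is hypothetical, and `/mnt` is assumed as the alternate root because that is where TrueNAS mounts its pools. A read-only import should not write to the pool, so it may be a safer way to look at the data, but whether it avoids the stall seen here is not something this thread confirms. Also note that a pool imported from the shell is not registered with the GUI.

```shell
# Hypothetical helper: print (not run) a zpool import command line.
# Usage: import_cmd POOL [ro]
import_cmd() {
    pool=$1
    mode=${2:-rw}
    if [ "$mode" = "ro" ]; then
        # readonly=on asks ZFS to import the pool without writing to it
        echo "zpool import -o readonly=on -R /mnt $pool"
    else
        # -R sets the alternate root under which datasets are mounted
        echo "zpool import -R /mnt $pool"
    fi
}

# Example: show both variants for the pool from this thread
#   import_cmd BACKUP
#   import_cmd BACKUP ro
```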
...
Can I abort that operation somehow?
...
You can stop a scrub, but you can't stop any deletes in progress. However, the pool should still be functional even with that occurring.
...
I didn't delete a dataset, just files/folders, would that command show me those as well?
I don't know for certain. I just know that Async Delete was seriously being considered as a new feature more than 3 years ago.


There is probably something else going on. I just mention the above because they are straightforward to check.


Be sure to check your LSI cards' firmware and make sure they are using IT and not RAID firmware.
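One possible way to check, assuming the LSI `sas2flash` utility is available on the system: its `-list` output normally includes a "Firmware Product ID" line ending in `(IT)` or `(IR)`. The helper name `fw_mode` below is made up for illustration, and a card still running MegaRAID firmware may not be listed by `sas2flash` at all.

```shell
# Hypothetical helper: read `sas2flash -list` output on stdin and print
# IT or IR, taken from the "Firmware Product ID ... (IT)" style line.
fw_mode() {
    sed -n 's/.*Firmware Product ID.*(\(I[TR]\)).*/\1/p'
}

# Example use on a live system (not run here):
#   sas2flash -list | fw_mode
```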
 

Davide Zanon

I just issued "zpool import BACKUP", and all other "zpool" commands (status, get freeing) are now stuck
on the other consoles... Is there a way to know what's going on here?
 

Arwen

I don't know.

But, you have not said if you checked your LSI cards' firmware for IT mode.
 

Davide Zanon

I don't think it's a firmware problem; this system has been running for years without problems.
The screen is now full of messages like "pid ... was killed: failed to reclaim memory".
I think whatever it's trying to do while importing the pool is filling up the memory until the system crawls
to a halt. I'll try to limit the ARC size and see what happens.
Meanwhile I'll try to find IT firmware for both cards.

Thanks
 

Arwen

RAID firmware has been known to corrupt disk writes for ZFS under unusual circumstances, like an unexpected power-off. Probably not your case.

ZFS was specifically designed to handle hundreds, if not thousands, of unexpected power-offs, IF, note IF, disk writes are both in order and complete before the next ones are done. RAID firmware can do elevator seeks when writing and thus write the disk blocks out of order. Not a problem unless there is an unexpected power-off in the middle.

Memory can also corrupt a pool. Rare, but it appears to happen (because ZFS is now used to store an enormous amount of data). It is more likely to happen on consumer hardware that does not support ECC or, worse, uses over-clocking.
 

Davide Zanon

zpool status BACKUP
Code:
  pool: BACKUP
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.10T in 1 days 09:06:56 with 0 errors on Thu Sep 14 20:38:18 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        BACKUP                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/1631aeb0-0280-3b4d-82d6-f3a2284f0e83  ONLINE       0     0     0
            gptid/3e73b568-25d9-3be6-ecff-df49054cfd73  ONLINE       0     0     0
            gptid/43da68d3-498b-d8eb-c75f-d5b38ac32270  ONLINE       0     0     0
            gptid/4cc94ad3-5218-11ee-90f6-002590c05e02  ONLINE       0     0     0
            gptid/66c8332a-0529-146c-fee7-c132182366c8  ONLINE       0     0     0
            gptid/f9009aa8-189a-2fcd-a2a5-ab1f1d998436  ONLINE       0     0     0
            gptid/b2d00373-0fd8-8940-8847-949b0ed109cf  ONLINE       0     0     0
            gptid/b34e2931-9efc-30c5-fedd-dd6511b0ed8f  ONLINE       0     0     0
            gptid/f806d035-6a65-11ed-9bc7-002590c05e02  ONLINE       0     0     0
            gptid/82fe6cee-d2a0-5e43-a934-ec189e83e062  ONLINE       0     0     0
            gptid/cec706dd-54e3-d7ee-b481-899035cac61c  ONLINE       0     0     0
            gptid/d3edb832-1b4a-87e2-fb2e-8b15b20004d6  ONLINE       0     0     0

errors: No known data errors

zpool get freeing BACKUP
Code:
NAME    PROPERTY  VALUE    SOURCE
BACKUP  freeing   0        -

I've set the min/max ARC size via sysctl to 12 GB/24 GB (the system has only 32 GB and can't be upgraded).
The pool is still importing right now, but I can see it mounted on the system. I hope it finishes importing
the pool without running out of memory.
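For anyone wanting to try the same limits, here is a rough sketch of how the values could be worked out. It assumes the 12/24 figures mean GiB and that the tunables are named `vfs.zfs.arc_min` and `vfs.zfs.arc_max` (newer FreeBSD-based releases also spell them `vfs.zfs.arc.min` and `vfs.zfs.arc.max`). The helper names are made up, and the script only prints the commands rather than running them.

```shell
# Convert whole GiB to bytes, the unit the ARC tunables expect.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: print (not run) the sysctl commands for an ARC
# minimum and maximum given in GiB. The sysctl names are an assumption.
arc_sysctls() {
    echo "sysctl vfs.zfs.arc_min=$(gib_to_bytes "$1")"
    echo "sysctl vfs.zfs.arc_max=$(gib_to_bytes "$2")"
}

# Example for the values mentioned above:
#   arc_sysctls 12 24
```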
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

Arwen

@Davide Zanon - One thing that can cause import problems is the use of De-Dup. Except in very special cases, ZFS De-Duplication is really discouraged.

Are you using De-Dup on any Datasets or zVols?
 

Davvo

Also, I have just noticed you are running twelve disks in RAIDZ1: in the best case that's a backup waiting to be pulled.

 

Davide Zanon

Are you using De-Dup on any Datasets or zVols?
twelve disks in RAIDZ1:
you're running IR firmware

Yes to all that.
This was a Netgear system that came with IR firmware, RAIDZ1 and dedup enabled.
A while ago I dumped the proprietary Netgear OS for FreeNAS (and now TrueNAS), but
I have to deal with all the design problems it originally came with.
Right now the data amounts to almost 50 TB. I can't move it elsewhere to change
the RAIDZ1, and I can't disable dedup due to the nature of the backups the system receives.
I just need to import it again and delete some stuff. I'll let you know tomorrow if the
operation finished successfully.

Thanks
 

Arwen

Here is a Resource I wrote about ZFS De-Dup;

I am not trying to stop you, (or others), from using ZFS De-Dup. But, I am trying to make sure your eyes are open to known issues with ZFS De-Dup. Like long import times and memory requirements.
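To get a feel for the memory side, the size of the dedup table (DDT) can be estimated from the number of entries and the per-entry in-core size, both of which `zpool status -D BACKUP` reports on an imported pool. This is a rough sketch only. The helper name `ddt_mib` is made up, and the 320 bytes per entry used in the example is just a commonly quoted rule of thumb, not a figure from this pool.

```shell
# Hypothetical helper: estimate DDT memory use in MiB.
# Usage: ddt_mib ENTRIES BYTES_PER_ENTRY
ddt_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# Example with assumed numbers (100 million entries, 320 bytes each):
#   ddt_mib 100000000 320
# On a live system the real figures come from:
#   zpool status -D BACKUP
```

If the estimate is close to or above the RAM left over for the ARC, slow imports and memory exhaustion like those described in this thread become more likely.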

If you find something to improve on that ZFS De-Dup Resource, it has a discussion thread where you can comment. The forum will notify me of such comments and I can make improvements to the Resource as needed.
 