Newbie needs help troubleshooting pool import after failing reboot

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Hello!
I'm pretty new to FreeNAS (set it up early this year), and have run into my first issue that I haven't been able to figure out myself.

While copying about 6 GB of files (in Windows Explorer) to an SMB share on the pool "greenTripple" (see setup below), the share became unresponsive. I was not able to log into the web GUI or SSH into the server, so I rebooted it. I didn't have time for much debugging then, so when that did not seem to work I turned the server off (note that I could not connect a display at the time, as I had previously removed the GPU and the MB does not have integrated graphics).
Rebooting a few days later with a GPU and display connected, I saw that the boot process halted for a long time after an "import pool version 5000 (...)" message:
pool import after reboot (org boot).jpg

I left it like that for more than 24 hours before I tried reinstalling the latest FreeNAS version (FreeNAS-11.3-U5) to a fresh USB stick and restoring from my config backup, after which the boot halted in a similar place:
pool import after reboot.jpg

I tried reinstalling the latest FreeNAS version again to a fresh USB, and this time only imported the pool "greenTripple" from the GUI. That simply made the GUI hang at "Importing Pool". Trying to log into the GUI in another window, I got the login screen, but after logging in, all dashboard pages are empty: "Drives" lists nothing, and neither does "Pools". Refreshing the GUI at this point simply displays a message:
Code:
Connecting to FreeNAS ...
Make sure the FreeNAS system is powered on and connected to the network.


I left it like this for more than 24 hours, without any apparent change.

Rebooting once again (with the fresh boot on USB), I saw that it "hangs" in a similar place (this time with verbose output):
pool import after reboot - verbose.jpg

I tried being more patient, and left it like this for about 50 hours, seemingly without any changes.

I have since tried the following, to no avail:
  • Moving the 3 disks in "greenTripple" to different slots in the 2 backplanes
  • Booting with 1 disk from the pool disconnected at a time
    • Disk 1 disconnected: Same result as before
    • Disk 2 disconnected: Seems to output some error messages during pool import; the GUI works, the pool is listed as "offline - unknown", and the 2 other connected disks are listed as unused in the disk view
    • Disk 3 disconnected: Same as for Disk 2
  • Disconnecting SSDs with original boot partition
  • Bypassing backplane (connecting power and SATA cables directly to MB)
The first time I rebooted after bypassing the backplane, I saw these "unable to alias" warnings:
reboot after GUI import backplane bypass unable to alias.jpg

Also, whenever I press Ctrl+C at the step where it always seems to halt (the pool import), it later hangs at "Updating CPU Microcode":
Update CPU Microcode - after unable to alias.jpg

Lastly, I reinstalled fresh once again to a USB stick (this time FreeNAS-11.3-U1, the same image I originally installed on the 2x SSDs) and extracted smartctl reports (attached) for all 3 disks in the seemingly problematic pool. I then tried importing the pool through the GUI again, which again made the GUI stop working like the previous time. This time I was already at the shell locally on the server, and I noticed I was also able to SSH into the server. Running htop, I see that a python middlewared worker is using 100 % of a CPU core, and the local shell console keeps displaying a python exception (a few minutes apart or so):
htop after pool import GUI.jpg
reappearing exception after pool import GUI.jpg
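For reference, I pulled the SMART reports with smartctl along these lines (the adaX device names are just how the disks happened to enumerate here, so adjust to your system):
Code:
# one report per disk in the pool; device nodes are illustrative
smartctl -a /dev/ada0 > smart-disk1.txt
smartctl -a /dev/ada1 > smart-disk2.txt
smartctl -a /dev/ada2 > smart-disk3.txt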

I must admit I have not dug deep into understanding how ZFS and pools work, so sorry for the possible "information overload" from me sharing lots of screenshots with output I don't fully understand; I have tried to summarize everything I attempted after searching posts where people describe similar issues.

I greatly appreciate any help!
There is data on the "greenTripple" pool that I hope to recover (this is also where my jails are located, if that's of any importance).

My setup:
  • FreeNAS: FreeNAS-11.3-U1
  • MB: ASUS SABERTOOTH B3 P67 S-1155 ATX
  • CPU: INTEL CORE I7 2600K 3.40GHZ 8MB S-1155
  • RAM: CORSAIR 16GB DDR3 VENGEANCE PC3-12800 1600MHZ CL9 (4X4GB)
  • PSU: Corsair Vengeance 750M
  • Pools:
    • boot: 2x OCZ AGILITY 3 2.5" 120GB SSD SATA/600 MLC (Mirror)
    • greenTripple: 3x Seagate IronWolf 4TB 3.5'' NAS HDD (RAIDZ)
    • mirror: 2x SEAGATE BARRACUDA GREEN 2TB 5900RPM SATA/600 64MB (Mirror)
  • 2x Chieftec Backplane CMR-2131 SAS
 

Attachments

  • smart - greenTripple pool - Disk 3.txt
    7.1 KB · Views: 200
  • smart - greenTripple pool - Disk 2.txt
    7.3 KB · Views: 222
  • smart - greenTripple pool - Disk 1.txt
    7.1 KB · Views: 223

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
hangs at "Updating CPU Microcode":
You might want to try booting a Linux live distribution from USB on that system first and see if you can get that message to stop.

My guess is that you updated the system BIOS which means your CPU is owed a microcode update which is somehow not working in FreeNAS.
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
You might want to try booting a Linux live distribution from USB on that system first and see if you can get that message to stop.

My guess is that you updated the system BIOS which means your CPU is owed a microcode update which is somehow not working in FreeNAS.
I have not intentionally updated it in that case.

Should I not then get the same problem before I even try importing the problematic pool?
Booting succeeds (both on the fresh USB and the original SSDs) with all 3 disks in the apparently problematic pool disconnected, without that message (as far as I can tell).
The other pool "backupMirror" (I see that I mistakenly called it simply "mirror" in the OP) also imports successfully then, and I am able to access its files.
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Also, a small erratum: the original boot is running FreeNAS-11.3-U3.2, not FreeNAS-11.3-U3.1 as stated in the OP (it seems I'm not allowed to edit my post since I'm a fresh member)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have not intentionally updated it in that case.
If it's not recent, it may be something you've been living with for a long time (since the last BIOS update). Probably still good to clear it if you can.

Should I not then get the same problem before I even try importing the problematic pool?
My assumption from your OP was that you were getting that at every boot... if not, then we can ignore it.

So what do you see from zpool import?
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
So what do you see from zpool import?
Problem is, if I boot with the drives in greenTripple connected, it halts as mentioned above, before it gets to the point where the GUI and SSH (or the local shell) are enabled.
If I boot with the drives disconnected (which gets it up and running) and run zpool import, I get nothing.
Should I reinstall to a fresh USB again, and attempt to run zpool import from the CLI?
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
If I boot with the drives disconnected (which gets it up and running) and run zpool import, I get nothing.
This was when booting the original OS on the mirrored SSDs.

If I boot on the USB running FreeNAS-11.3-U3.1 from yesterday (where the GUI hung after I tried to import the greenTripple pool through it), I get:
Code:
root@freenas[~]# zpool import
   pool: freenas-boot
     id: 1911636244280595309
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        freenas-boot  ONLINE
          mirror-0  ONLINE
            ada3p2  ONLINE
            ada2p2  ONLINE

   pool: backupMirror
     id: 10986685335335328541
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        backupMirror                                    ONLINE
          mirror-0                                      ONLINE
            gptid/4b88e6b7-c1e8-11ea-aadc-f46d04b01019  ONLINE
            gptid/4c011b3c-c1e8-11ea-aadc-f46d04b01019  ONLINE

Note that this is still with the 3 drives in greenTripple disconnected, as it otherwise gets stuck as explained in the OP.
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Installed FreeNAS-11.3-U1 freshly on USB again:
Code:
root@freenas[~]# zpool import
   pool: greenTripple
     id: 7786540398784956987
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        greenTripple                                    ONLINE
          raidz1-0                                      ONLINE
            gptid/bc0ed76f-bcb6-11ea-b699-f46d04b01019  ONLINE
            gptid/5fded968-c15e-11ea-aadc-f46d04b01019  ONLINE
            gptid/2d66b180-b9eb-11ea-b6bb-f46d04b01019  ONLINE

   pool: freenas-boot
     id: 1911636244280595309
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        freenas-boot  ONLINE
          mirror-0  ONLINE
            ada3p2  ONLINE
            ada2p2  ONLINE

   pool: backupMirror
     id: 10986685335335328541
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        backupMirror                                    ONLINE
          mirror-0                                      ONLINE
            gptid/4b88e6b7-c1e8-11ea-aadc-f46d04b01019  ONLINE
            gptid/4c011b3c-c1e8-11ea-aadc-f46d04b01019  ONLINE
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You can try to import the problematic pool at the CLI then...
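That is, something like this (a plain import by pool name; the recovery flags come later only if this fails):
Code:
zpool import greenTripple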
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
You can try to import the problematic pool at the CLI then...
Ok. Is it normal that it does not give any output at first, and takes a long time?
import through CLI.jpg

Edit: This also caused the GUI to become unavailable again with the "Connecting to FreeNAS... (...)" message as before, and I'm seeing the same exceptions on the locally connected display:
20201013_103744.jpg
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It seems maybe you have an unhealthy pool and it is causing issues for FreeNAS when it's imported (he said, stating the obvious).

If you don't have a backup of the data from that pool and really need it, you might be forced to consider using a recovery tool like Klennet (unfortunately it's commercial software, but you can test it for free and it will show you what can be recovered before you need to pay).

Before getting to that, you might want to take an image of those drives first (to ensure you don't make things worse in an attempt to get it back) and then try zpool import with the -Fn switch, which will tell you whether dropping the last few transactions on the filesystem looks like it would make the pool importable... then zpool import -F to let it actually try.

Also there's -FX to attempt extreme measures (entire transaction groups discarded) to mount.

And in addition there's -o readonly=on to attempt a read-only mount which might work to just let you get the data off.
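Roughly, the escalation looks like this (a sketch only, using your pool name; take the drive images before any of the non-dry-run steps):
Code:
# dry run: reports whether discarding the last few transactions would allow import
zpool import -Fn greenTripple

# actual recovery attempt: roll back the last few transactions
zpool import -F greenTripple

# extreme measures: discard entire transaction groups
zpool import -FX greenTripple

# read-only import, to copy the data off without writing to the pool
zpool import -o readonly=on greenTripple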
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
So the fact that zpool import greenTripple does not output anything, and that the process has run for 2+ hours at 100 % on a single core, is an indication that it will probably never finish?

Before getting to that, you might want to take an image of those drives first (to ensure you don't make things worse in an attempt to get it back) and then try zpool import with the -Fn switch, which will tell you whether dropping the last few transactions on the filesystem looks like it would make the pool importable... then zpool import -F to let it actually try.

Also there's -FX to attempt extreme measures (entire transaction groups discarded) to mount.

And in addition there's -o readonly=on to attempt a read-only mount which might work to just let you get the data off.
Ok, I'll have a go at those options.
What's the easiest way to take an image of the drives?

Thanks for the help so far, btw!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
To image a drive, use dd:
Code:
dd if=/dev/adaX of=/path/adaX-file.img bs=1024k
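If a drive throws read errors along the way, adding conv=noerror,sync keeps dd going (padding unreadable blocks with zeros) instead of aborting:
Code:
dd if=/dev/adaX of=/path/adaX-file.img bs=1024k conv=noerror,sync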
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Even though zpool import greenTripple hung when run (and I had to kill the terminal), I noticed the pool is actually available when I SSH into the server, even while the import process is still running at 100 % CPU on one core:
1602718996735.png


I am therefore able to make a backup of the important data in the pool so that I can try the zpool import commands with the special flags you suggested, sretalla (I have not yet got my hands on other storage large enough to image the 3x 4 TB drives in the pool)
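The copy itself is nothing fancy; the pool mounts at /greenTripple on this fresh install, and the dataset and destination paths below are just placeholders:
Code:
# copy the important data off to other storage (paths are illustrative)
rsync -a /greenTripple/important/ /backupMirror/rescue/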

I have tried to kill the zpool import greenTripple process, but it seems unkillable, which from what I understand means that it's blocked in a system call. But shouldn't the STATE then be D (uninterruptible wait) and not T (stopped)?
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Before getting to that, you might want to take an image of those drives first (to ensure you don't make things worse in an attempt to get it back) and then try zpool import with the -Fn switch, which will tell you whether dropping the last few transactions on the filesystem looks like it would make the pool importable... then zpool import -F to let it actually try.

I did try this, but it does not seem to change anything.
From the docs (https://www.freebsd.org/cgi/man.cgi?zpool(8)), I guess that's because "This option is ignored if the pool is importable".

And, like I indicated in my previous post, it actually seems like the import is successful, as it becomes available in the filesystem.
However, the pool then seems to be instantly busy for some reason, and the zpool import command never finishes (it sits at 100 % CPU, unkillable).

I tried exporting the pool afterwards, but only got an error that it cannot unmount my Shinobi jail:
Code:
root@freenas[~]# zpool export greenTripple
cannot unmount '/greenTripple/iocage/jails/guiltyShinobi': Device busy


I then tried a forced export, which also seems to fail:
Code:
root@freenas[/]# zpool export -f greenTripple
cannot export 'greenTripple': pool is busy

but after a hard reboot (since the import command is unkillable), the system came back up without the pool imported, so the forced export must have succeeded in some respect.
EDIT: Correction, the pool was still imported, but not showing in the GUI or the filesystem, and not available for import. However, running zpool export greenTripple another time now succeeded.

I then tried importing the pool as read-only, which seems to work fine without any issues: the server stays operational and I am able to access the data. Exporting the pool and attempting another import (without readonly), I get the same issue where the command hangs at 100 % CPU forever, although the import does seem to succeed; running commands in another shell, I see the pool as imported and I'm still able to access the files.
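To recap that sequence as commands (from memory):
Code:
zpool import -o readonly=on greenTripple   # imports cleanly, data readable
zpool export greenTripple                  # exports fine after the read-only import
zpool import greenTripple                  # hangs at 100 % CPU, yet the pool shows as imported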

Attempting to export it again returns an error stating that it can't unmount my Shinobi jail.
Running zpool iostat shows me that no reads/writes actually occur on the pool...

Any ideas what to try next?
 
Last edited:

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Importing the pool with zpool import -N greenTripple also seems to succeed.
Will look into mounting the file systems individually by hand (mounting the iocage datasets last), along these lines:
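A sketch of the plan (the dataset names are just the pattern, not the full list):
Code:
# import without mounting any file systems
zpool import -N greenTripple

# list the datasets, then mount them one at a time, leaving iocage for last
zfs list -r greenTripple
zfs mount greenTripple
# ... other datasets ...
zfs mount greenTripple/iocage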
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
Mounting the file systems that show up with zfs list one by one works fine, except for:
Code:
root@freenas[~]# zfs mount greenTripple/iocage/jails/guiltyShinobi/root

This causes the mount command to hang and climb to 100 % CPU usage, again being unkillable; i.e. similar behaviour to running a normal zpool import greenTripple.

So I guess I've found the culprit. I hope simply destroying that dataset will make my pool importable again; it is not much work to reinstall and reconfigure the Shinobi jail.

Still don't understand what has gone wrong here though.
 

JHBR

Dabbler
Joined
Oct 9, 2020
Messages
13
I solved the problem by importing the pool without mounting its file systems and then nuking the Shinobi jail.
After that, uploading my backup config to the new boot device brought the system back up to where it was, except of course for the Shinobi jail.
Fortunately, reconfiguring that jail did not take much work.
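In case it helps anyone else, the rough sequence was (reconstructed from the steps above; the destroy is irreversible, so only do this after backing up what you need):
Code:
# import without mounting anything, then destroy the dataset that hung on mount
zpool import -N greenTripple
zfs destroy -r greenTripple/iocage/jails/guiltyShinobi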

Still don't know the root cause though...
 