Looking for suggested recovery steps after an odd/annoying zpool import experience

mjch

Cadet
Joined
Apr 25, 2021
Messages
9
I've got a FreeNAS host I built just over a year ago which I'm now having issues with. It's offline at the moment, otherwise I'd collect more precise information about it; for now I'm after some general guidance and suggested next steps.

In summary: FreeNAS was configured with a four-spindle raidz volume of about 30TB built on GELI crypto devices. The host is an HP ProLiant Gen8 with 16GB of RAM (from memory), and prior to this experience I'd had no issues with the install or the hardware. The following tale of woe is entirely my own fault, I recognise this.

Last night I was looking for a spare drive for a different project and found a 500GB USB2 disk which appeared to hold a single-vdev zpool. My Linux hosts couldn't read it; the pool version was reported as 5000 (the feature-flags version), which is also what FreeNAS uses, so I figured I'd just hook the disk up directly to my server and import it there to see what was on it and whether I could repurpose it...

Well... having used ZFS on Solaris and FreeBSD for a while, I do know how to import a zpool in a way that avoids exactly what happened next, but it was late and I didn't think about it. In most cases I let my external zpools mount under their own top-level mountpoint, as per the default. So when I ssh-ed into my FreeNAS host and ran zpool import from the console, I was very surprised when the ZFS datasets on the USB2 disk neatly mounted directly over all the FreeNAS root filesystems, which suddenly made things much more complicated. My guess is that the USB2 disk came from an earlier FreeNAS (10.x series?) or FreeBSD test install, but the zpool name was ztmp, so it didn't give me a clue going in...
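For anyone landing here with the same problem, here is a minimal sketch of the "safe" import alluded to above, written in Python purely for illustration. It assumes the standard zpool import options: -o readonly=on (nothing on the foreign pool is modified), -N (import without mounting any datasets) and -R (set an alternate root, which also sets cachefile=none, so mounts land under that directory instead of over /). The helper name safe_import_argv and the /mnt/ztmp path are hypothetical; the script only prints the command rather than running it.

```python
from __future__ import annotations

import shlex


def safe_import_argv(pool: str, altroot: str, *, readonly: bool = True,
                     mount: bool = False) -> list[str]:
    """Build a 'zpool import' command that cannot mount over the running system.

    -o readonly=on  import the pool read-only
    -N              import without mounting any of the pool's datasets
    -R altroot      set altroot (and cachefile=none), so any later mounts
                    end up under altroot rather than on top of /
    """
    argv = ["zpool", "import"]
    if readonly:
        argv += ["-o", "readonly=on"]
    if not mount:
        argv.append("-N")
    argv += ["-R", altroot, pool]
    return argv


if __name__ == "__main__":
    # Hypothetical altroot; print the command instead of executing it.
    print(shlex.join(safe_import_argv("ztmp", "/mnt/ztmp")))
    # prints: zpool import -o readonly=on -N -R /mnt/ztmp ztmp
```

Passing the returned list to subprocess.run() would execute it. Even if datasets are mounted later with zfs mount, the altroot keeps them under /mnt/ztmp rather than on top of the FreeNAS root filesystems.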

Anyway, once the USB2 zpool was imported I was unable to do any zpool operations, as the binaries in my PATH were now the out-of-date ones from the USB2 disk. I couldn't export the pool again, so I had to power off the host without a clean shutdown. I expected this might cause some minor issues with my encrypted data zpool (raccoon1), but it should not have had any impact at all on the FreeNAS OS volume...

On reboot, however, I found that I still can't do any zpool operations from the Web UI or the console, as the commands just appear to hang. So at this point I'm unable to apply the GELI keys and get at the contents of the data zpool to see what state it's in...

What suggestions do people have for recovery?

I'd like to assure myself that the data disks are good, so my plan is to boot from a FreeBSD DVD, bring up the GELI devices manually and at least confirm that the zpool imports OK; that would be a good first step. I'm happy to blow away what's on the FreeNAS OS device and redeploy it, but I'll only do that with the data disks out of the host...
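That rescue-environment plan can be sketched as a dry run, again in Python for illustration only. Assumptions, all hypothetical: the GELI key file has been copied somewhere readable (the /tmp/raccoon1.key path is made up), the data partitions are the /dev/adaXp2 providers shown, and whether a passphrase was also set is up to the reader. It uses geli attach with -r (attach read-only), -k (key file) and -p (no passphrase, only when none was set), then a bare zpool import to list what is visible, then a read-only, non-mounting import under an altroot.

```python
from __future__ import annotations

import shlex


def readonly_recovery_plan(
    pool: str,
    providers: list[str],
    keyfile: str,
    *,
    has_passphrase: bool = False,
    altroot: str = "/mnt",
) -> list[list[str]]:
    """Return the commands (as argv lists) to check a GELI-backed pool read-only."""
    plan: list[list[str]] = []
    for prov in providers:
        # -r: attach the provider read-only; -k: key file for this provider.
        argv = ["geli", "attach", "-r", "-k", keyfile]
        if not has_passphrase:
            # -p: the key file alone unlocks the provider (no passphrase prompt).
            argv.append("-p")
        argv.append(prov)
        plan.append(argv)
    # With no pool argument, 'zpool import' only lists the pools it can find.
    plan.append(["zpool", "import"])
    # The .eli providers are read-only, so the pool import must be read-only too.
    plan.append(["zpool", "import", "-o", "readonly=on", "-N", "-R", altroot, pool])
    return plan


if __name__ == "__main__":
    # Hypothetical provider names and key path; substitute the real ones.
    disks = ["/dev/ada0p2", "/dev/ada1p2", "/dev/ada2p2", "/dev/ada3p2"]
    for argv in readonly_recovery_plan("raccoon1", disks, "/tmp/raccoon1.key"):
        print(shlex.join(argv))
```

Keeping both the GELI attach and the pool import read-only means the check can't make anything worse; once the pool is known to be healthy it can be exported, and the providers detached with geli detach, before reinstalling.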

Thoughts?
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Typically, if you disconnect the offending new devices and reboot, the previous pool will import properly as expected. Have you tried disconnecting and rebooting first?

Also, did you import the new devices from the CLI or from the WebUI?
 

mjch

Kris, thanks for the reply. Yes, that would have been my assumption as well; I honestly can't imagine how the state of the OS zpool became altered. I removed the external USB2 disk as soon as I had rebooted, so it's no longer attached to the system, and it wasn't attached when I last had the OS up and was trying my luck at getting the data zpool online.

I'd done the zpool import from the CLI ... would that have made a difference, do you think? Does the GUI layer offer protection from this sort of thing?

TBH I think I may be too comfortable doing ZFS operations manually on the CLI, and I might have inadvertently crossed a line with regard to the system state that FreeNAS manages, which is always a risk with appliance-type things...

This morning I was wondering whether the solution might be to start the system from a previous Boot Environment as it comes up (beadm seems to be another binary that was hanging, because it relies on the zpool binary), but to do that I'll need to hook up a keyboard and monitor...
 

mjch

OK, I had to wait until the weekend before I could work on this, hence the delay.

I can boot back into the initial-install boot environment, which seems to be operational, but I don't see my encrypted pool in the BUI (I'm not too concerned about that yet). The disks are found, but I think the relevant config for them lives in the more recent boot environment, which isn't active. I'm glad I remembered the root password I'd used back when that was the active boot environment!

I've run a scrub on the freenas-boot zpool and it comes up clean, so I'm a bit happier about the state of the system. Clearly, though, the 11.5-U5 boot environment is having issues: watching the console as the host boots, it gets to a certain point and just stops for a very long time (just after messages about starting devd, from memory). It did eventually come up to a login: prompt on the console, and I was able to log in to the BUI, which was a relief.

From the BUI it does know about my encrypted zpool, but for some reason zfsd has been reading those disks at around 34MB/s ever since the system booted. I haven't entered my encryption key yet, so I don't know what it thinks it's doing; the zpool isn't available, so it's unlikely to be a scrub...

There does seem to be some cruft from the ztmp pool still around: on the CLI console both zpool status and zpool list still report the old disk, and it also appears in the BUI's list of disks. For the moment it doesn't seem to be impacting anything, so I'm leaving it alone.
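A small sketch of checking for that kind of leftover pool from a script rather than by eye, assuming zpool list -H -o name (no header, one pool name per line). The function names are hypothetical, and the timeout is there because zpool commands were hanging earlier in this thread; subprocess.run raises TimeoutExpired instead of blocking forever.

```python
from __future__ import annotations

import subprocess


def listed_pools(zpool_list_output: str) -> set[str]:
    """Parse 'zpool list -H -o name' output: one pool name per line, no header."""
    return {line.strip() for line in zpool_list_output.splitlines() if line.strip()}


def unexpected_pools(zpool_list_output: str, expected: set[str]) -> list[str]:
    """Pools ZFS still knows about that are not in the expected set."""
    return sorted(listed_pools(zpool_list_output) - expected)


def run_zpool_list(timeout_s: float = 30.0) -> str:
    """Only meaningful on the NAS itself; raises TimeoutExpired if zpool hangs."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name"],
        capture_output=True, text=True, check=True, timeout=timeout_s,
    )
    return result.stdout
```

On the host, unexpected_pools(run_zpool_list(), {"freenas-boot", "raccoon1"}) would list ztmp for as long as the stale pool is still known to ZFS, and return an empty list once it has been exported.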

Current plan is to let zfsd or whatever finish reading the disks and see what happens after that ...
 

Attachments

  • zpool-status.jpg
  • iostat.jpg

mjch

Also, I wasn't clear that I rebooted into the most recent Boot Environment midway through paragraph three of my previous post, which is why saying I could see the encrypted zpool, after previously saying I couldn't, might have been confusing. Apologies.
 

mjch

Did some trussing of the zfsd process and found that it was spamming /var/log/messages with 'ZFS: vdev state changed, pool_guid=975482569780627184 vdev_guid=12320530512475991818', where the vdev GUID matches the ztmp disk. So I thought I'd try a 'zpool export ztmp', and after that things seemed to clear up and the swap GELI devices were created, as below. IO on the system has also dropped to nothing, so I think zfsd was continually trying to import ztmp from the missing USB2 disk and was stuck in a loop. Haven't looked at the BUI yet to see what that's saying.

May 1 03:50:51 freenas kernel: GEOM_MIRROR: Device mirror/swap0 launched (2/2).
May 1 03:50:51 freenas kernel: GEOM_MIRROR: Device mirror/swap1 launched (2/2).
May 1 03:50:51 freenas kernel: GEOM_ELI: Device mirror/swap0.eli created.
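The zfsd loop described above can be confirmed without truss by counting the repeated message in the log. This is a minimal sketch, assuming only the message text quoted in the post (the regex uses search, so whatever syslog prefix precedes it doesn't matter); the function names are hypothetical. A single pool_guid/vdev_guid pair with a large and growing count is the loop; the pool GUID can be compared with the output of zpool get guid for the suspect pool while it is still imported.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Message text as quoted from /var/log/messages in the post above.
VDEV_CHANGED = re.compile(r"ZFS: vdev state changed, pool_guid=(\d+) vdev_guid=(\d+)")


def count_vdev_changes(lines: Iterable[str]) -> Counter[tuple[str, str]]:
    """Count 'vdev state changed' messages per (pool_guid, vdev_guid) pair."""
    counts: Counter[tuple[str, str]] = Counter()
    for line in lines:
        match = VDEV_CHANGED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def count_in_log(path: str = "/var/log/messages") -> Counter[tuple[str, str]]:
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_vdev_changes(fh)
```

On the host, iterating over count_in_log().most_common() prints the noisiest pair first; running it again a minute later shows whether the count is still climbing.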
 

mjch

The USB2 disk is still listed in the BUI's Disks pane, but on the console neither zpool list nor zpool status mentions it at all, so I think the BUI entry is just a leftover artifact. Will reboot and see what things look like coming up after that.
 

mjch

Well, it booted OK, but it still pauses for a long time just after 'Starting devd' appears on screen...

Just before presenting the login: prompt, there were a lot of messages about not being able to start the middleware daemon...

I have been able to log in to the BUI, so I'm not sure what that was complaining about. Could it have been waiting for the missing USB2 disk to come online? That disk is still listed in the BUI, but as far as I can tell there's no way to remove it from there...
 