SLOG of encrypted pool destroyed -> Can't unlock pool anymore?

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Hi,

I was trying to put my Intel Optane in use again, but ran into some trouble...

I made 3 partitions on the Optane (rough layout sketched below):
* 20GB for SLOG
* 16GB for SWAP
* The remainder as non-redundant test pool
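For reference, a layout like this is typically created with gpart; the device name and labels below are just placeholders, not my exact commands:
Code:
# nvd0 and the labels are placeholders for the Optane device
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -s 20G -l optane-slog nvd0
gpart add -t freebsd-swap -s 16G -l optane-swap nvd0
gpart add -t freebsd-zfs -l optane-test nvd0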

I added the SLOG to my encrypted pool with
[screenshot of the command]

Which worked well

Then I added the remainder as a pool with
[screenshot of the command]

Then I exported the optanepool,
and finally imported it again in the GUI.
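(For context, the rough CLI equivalents of those two steps look something like this; the gptids are placeholders, and these are not necessarily the exact commands from the screenshots:)
Code:
# gptids are placeholders for the Optane partitions
zpool add hgstpool log gptid/<slog-partition-gptid>
zpool create optanepool gptid/<testpool-partition-gptid>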

The export and re-import also worked, but when I tried enabling compression, I got an error.

So I tried some things, retried some things, rebooted a couple times...

Next I wanted to retry creating and importing the optanepool, but after I had exported it via the GUI, I could no longer destroy that specific zpool, because it was already exported. So I redid the export of the optanepool with the "destroy" checkbox selected in the GUI, but that destroyed not only the optanepool partition, but also the SLOG partition.

Now when I try to unlock my encrypted pool, it fails with the error below:
Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 977, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
    self.logger.error(
  File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
    zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
  File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: one or more devices is currently unavailable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 361, in run
    await self.future
  File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 397, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool_/encryption_freebsd.py", line 290, in unlock
    raise e
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool_/encryption_freebsd.py", line 272, in unlock
    await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1203, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1209, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1136, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('one or more devices is currently unavailable',)


I can see in the console that it unlocks all the HDDs one by one, but then at the end (I guess when it should unlock the SLOG) it fails and rolls back all the unlocking steps.

I can't add my newly created SLOG, because the pool is still locked, but I don't know how to unlock it either...

Anyone have a suggestion on how to proceed?
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Would it be OK / risk-free if I tried to unlock the HDDs from the command line, using "geli attach"?

edit:
something like:
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
...
for each HDD (and not the SLOG, as it was destroyed).
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Suspecting that TrueNAS might be confused because the Optane device is still present in the server while the LOG partition on it can no longer be found, I just tried physically removing the Intel Optane from my server and booting without it.

But even now I'm getting the same error when I try to unlock my pool.

I find it very strange that this causes the pool unlocking to stop working. What if a LOG device of an encrypted pool dies? Is it normal that you then lose your data? Is a LOG device on an encrypted pool a single point of failure?
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
I just tried unlocking the pool while running the debug kernel, but it has the same issue and didn't give any more details on what exactly it was doing.

Is there perhaps anyone who can direct me to the TrueNAS middleware script that does the actual unlocking? Perhaps I can decipher which commands it tries to run and execute them manually... Hopefully that should get me to an unlocked (but possibly degraded because of the missing SLOG?) pool, so that I can at least
  • update my offline backup (it's more than 1 month old)
  • perhaps try to remove the SLOG from the pool configuration, so that it stops "missing" it
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I just tried unlocking the pool while running the debug kernel, but it has the same issue and didn't give any more details on what exactly it was doing.

Is there perhaps anyone who can direct me to the TrueNAS middleware script that does the actual unlocking? Perhaps I can decipher which commands it tries to run and execute them manually... Hopefully that should get me to an unlocked (but possibly degraded because of the missing SLOG?) pool, so that I can at least
  • update my offline backup (it's more than 1 month old)
  • perhaps try to remove the SLOG from the pool configuration, so that it stops "missing" it

The simplest fix is just to remove the SLOG from your pool via zpool remove <pool name> <device>. However, if you still want to manually replicate the middleware actions, according to lines 1098-1099 of /usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py, the middleware runs disk.geli_detach_single, followed by pool.remove_from_storage_encrypted_disk.
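For your pool that would be something like this (substitute the actual gptid of the SLOG partition):
Code:
# <slog-gptid> is a placeholder for the gptid of the destroyed SLOG partition
zpool remove hgstpool gptid/<slog-gptid>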
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Thanks a lot for the response Samuel Tai!!

The problem is that I can't remove the SLOG from my pool because the pool is locked :confused: (as long as the pool is locked, the OS can't find or doesn't know about the pool)
As the GUI doesn't allow me to unlock the pool, I guess the only thing I can do is try to manually unlock it and then try to remove the SLOG, right?
I'll have a look at the script you mentioned later this evening... Thanks!

But if you know off the top of your head whether the commands below (which I mentioned earlier) should do the trick and are safe to use, please feel free to let me know:
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
...
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I'm afraid your only recourse is to destroy and recreate your hgstpool pool. Mixing CLI and GUI actions is dangerous, especially if you're using partitions on devices instead of whole devices, which is what the GUI assumes.
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Do you mean that the CLI geli commands I mentioned will certainly not work and that it is certainly impossible to recover my pool?
Or do you mean that trying to recover my pool might be possible, but could be dangerous, and you advise against it?

I'm afraid my last offline backup is more than 1 month old, so if there is a chance to recover, I would certainly like to try...
Even if it is only to temporarily bring my pool online and update my offline backup, before destroying everything and restarting from scratch...
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
Did you go ahead with the attempt to import your pool manually through the cli?

Theoretically it should be possible to import the pool without the log device, discarding unfinished transactions:
-m Allows a pool to import when there is a missing log device. Recent transactions can be lost because the log device will be discarded.
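In practice that would be something like:
Code:
zpool import -m hgstpool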


For some additional pointers:
While the OP of this issue was not successful, maybe you'll have more luck.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Every case I've seen where it's been a corrupted log partition has resulted in data loss and the need to rebuild the pool. If it had been a whole log device, there would be a good chance of recovering the pool.
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Did you go ahead with the attempt to import your pool manually through the cli?

Theoretically it should be possible to import the pool without the log device, discarding unfinished transactions:



For some additional pointers:
While the OP of this issue was not successful, maybe you'll have more luck.
Thanks!

I already knew about zpool import and '-m'. That is not the problem. The problem is that TrueNAS refuses to unlock my pool and, while it is locked, the pool is invisible to the OS, so there is nothing to import...

Every case I've seen where it's been a corrupted log partition has resulted in data loss and the need to rebuild the pool. If it had been a whole log device, there would be a good chance of recovering the pool.
The good news might be that my pool was still locked after a reboot when I accidentally destroyed the LOG partition, so the LOG should actually have been empty at that point, I think...
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
I just gave it a shot:
Code:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe
Enter passphrase:
data# zpool import
   pool: hgstpool
     id: 11186148882051621824
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-6X
config:

        hgstpool                                            UNAVAIL  missing device
          raidz2-0                                          ONLINE
            gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
            gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe.eli  ONLINE
        logs
          gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe        UNAVAIL  cannot open

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.
data#

It seems like this worked! I can see my pool again...

Now I wonder what would be best to try next:

1)
  • zpool remove hgstpool gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe (should remove the log device from the pool)
  • geli detach /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe
  • geli detach /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe
  • then reboot TrueNAS
  • And finally try to mount it again using the GUI
  • Update my offline backup
2)
  • zpool import -m hgstpool or zpool import -m -d /dev/disk/??? hgstpool (or something... dunno really)
  • zfs mount hgstpool (not sure if I need to make or specify a mountpoint? The usual mountpoint /mnt/hgstpool currently does not exist)
  • Try to get it shared somehow if TrueNAS doesn't automatically share it with my already existing share
  • Update my offline backup
3)
  • Or something else

edit:
I just also discovered that it is possible to export the pool before unlocking it... That's perhaps even safer... That would be something like this:

4)
  • Export hgstpool in the TrueNAS GUI
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe
  • zpool remove hgstpool gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe
  • geli detach /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • geli detach /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe (or not?)
  • reboot TrueNAS (or not?)
  • Import hgstpool using the GUI
  • Unlock hgstpool using the GUI
  • Update my offline backup
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Sorry, I had to edit this as the pool's not imported.

You'll need to run zpool import -f -m hgstpool first, to force an import without the log. Then try removing the log device, and then export the pool; this will automatically geli detach each disk. Then you should be able to import it in the GUI.
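Putting it together, roughly this sequence (using the gptid of the missing log from your zpool import output):
Code:
zpool import -f -m hgstpool
zpool remove hgstpool gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe
zpool export hgstpool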
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Thanks, Samuel Tai!

Can you please clarify below:
1) I need to import the pool before I can remove the LOG device from the pool?
2) Should I "export / disconnect" the pool from the GUI first? (even though the pool is not imported atm, it is still "visible / known" in the GUI (as a locked pool)) That way I suspect TrueNAS will be "less confused" if I start "messing" with the pool manually through the command line, right?

I'll try to summarize again below:
  • "Export/Disconnect" hgstpool in the TrueNAS GUI (the option is available, not sure if it will work though)
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe
  • zpool import -m hgstpool or zpool import -m -f hgstpool (if it doesn't work without -f)
  • zpool remove hgstpool gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe
  • zpool export hgstpool
  • reboot TrueNAS (or not?)
  • "Import" hgstpool using the GUI
  • "Unlock" hgstpool using the GUI
  • Update my offline backup
Correct?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
1) I need to import the pool before I can remove the LOG device from the pool?

Correct. zpool won't operate on pools that aren't imported.

2) Should I "export / disconnect" the pool from the GUI first?

No, avoid GUI actions initially, as the GUI makes assumptions which led to your current predicament. After the log is removed, export from CLI. This should remove the pool from visibility in the GUI.
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
After already doing the steps below yesterday:
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efde27c0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef5b1297-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f03634f0-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/efd8967a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f00c5450-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f04a1e2c-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/ef1cec0a-cdd4-11ea-a82a-d05099d3fdfe
  • geli attach -k /data/geli/012fff85-1065-49d7-a500-a4efe62e27a8.key /dev/gptid/f01fd8e3-cdd4-11ea-a82a-d05099d3fdfe
    • Which made my pool visible again in CLI
I've just tried the following steps:
  • zpool import -m hgstpool
    • This worked, and the pool immediately became fully visible in the GUI as degraded because of the missing SLOG
  • zpool remove hgstpool gptid/f8725bbb-64c5-11eb-bbcd-d05099d3fdfe
    • Also this worked very well. The pool became healthy again in the GUI
  • zpool export hgstpool
    • Since you stressed not doing this from the GUI, I did the above in the CLI. However, this didn't work as it should have: my pool didn't disappear from the GUI; it only became "offline".

Next I rebooted and unlocked the pool from the GUI, which worked fine. My data is available again. So I'll start updating my offline backup first now...

However... Should I perhaps still try to export the pool from the GUI and then import it in the GUI again, so that all the pool config gets properly loaded again?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
No need. If the GUI sees it as healthy without the SLOG, you're back in business.
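If you want to double-check from the shell, zpool status should show the pool ONLINE with no log vdev listed:
Code:
zpool status hgstpool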
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Thanks!

Just for future reference, if anyone else runs into this problem:
Do you still suggest doing the CLI "zpool export", like I did? Or was it already OK before the export?

Also, do you think my TrueNAS setup is "healthy" after all this "CLI hacking"? Or do you suggest a rebuild from scratch, after I've updated my offline backups?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
It was probably OK. You should run a scrub to verify.
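From the CLI that's simply:
Code:
zpool scrub hgstpool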
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
A scrub was the first thing I started after the pool came online :)

Thanks again for your support...

edit: scrub completed without issues and backup = up-to-date
 