So back in May I created the thread Can drives on encrypted zpools ever be "replaced"? and posted a workaround there, since there were some issues with gptids not being removed from the FreeNAS configuration when a disk is replaced. A day or two ago I asked in IRC about the actual procedure, since the manual has no instructions for rekeying the zpool and because I knew this would go south somehow. Well, I am creating a new thread... so guess how it turned out? :p
About a month ago the geli key + passphrase stopped working. Presumably the USB stick that the key was on was having issues. Hooray for the recovery key though. That has been used ever since then to mount the zpool.
Fast forward to tonight. I upgraded from FreeNAS 8.3.1 to 9.1 and figured that afterwards I would rekey the zpool, create a new passphrase, and then download the new key and recovery key, in that order. I felt that doing it in 9.1 was smarter because I've put in several tickets to improve the encryption system FreeNAS uses. First ticket #2178, then #2242. I did several tests in a VM to verify that my steps worked perfectly (and they did).
But not so fast there hotshot.
After the upgrade I remounted the zpool without any issues. zpool status showed the zpool as healthy and everything was in good working order. So I clicked the "encryption rekey" button in the GUI and I got an error...
Aug 9 20:35:06 freenas notifier: dd: /data/geli/fcf2894c-7183-492c-9780-f04888349ce7.key.tmp: No such file or directory
Aug 9 20:35:06 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Unable to set key: geli: Cannot open keyfile /data/geli/fcf2894c-7183-492c-9780-f04888349ce7.key.tmp: No such file or directory. ]
At first I thought this might be related to me editing the database and removing the old drive, as described in the thread back in May. But the last set of characters (f04888349ce7) doesn't match either of my zpools. I'm really not sure where it came from.
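For anyone wanting to check the same thing, one quick test is to see which key files actually exist in /data/geli. Below is a minimal Python sketch. The function name find_geli_keys is made up for illustration, and it assumes only what the log lines above show: that the middleware looks for a file named <id>.key in /data/geli. It only lists files and changes nothing.

```python
import os


def find_geli_keys(key_dir, expected_id):
    """Return (expected_present, other_keys) for a directory of key files.

    key_dir and expected_id are supplied by the caller; the "<id>.key"
    naming is an assumption taken from the error messages in the log.
    """
    expected_name = expected_id + ".key"
    try:
        names = sorted(os.listdir(key_dir))
    except OSError:
        # Directory missing or unreadable: nothing can be reported.
        return False, []
    keys = [n for n in names if n.endswith(".key")]
    others = [n for n in keys if n != expected_name]
    return expected_name in keys, others


if __name__ == "__main__":
    present, others = find_geli_keys(
        "/data/geli", "fcf2894c-7183-492c-9780-f04888349ce7"
    )
    print("expected key present: %s" % present)
    print("other key files: %s" % others)
```

If the expected file is missing but another .key file is listed, that would suggest the configuration is pointing at the wrong key name rather than the key being gone entirely.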
In any case, I then tried to add a new passphrase and I got a similar error:
Aug 9 20:39:06 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Unable to set passphrase: geli: Cannot open keyfile /data/geli/fcf2894c-7183-492c-9780-f04888349ce7.key: No such file or directory. ]
Then I thought I'd try to download the new key. After all, if the rekey worked despite the error and the server reboots, I believe the zpool will not be mountable again. I got a nice error...
Code:
IOError at /storage/volume/2/key/download/
[Errno 2] No such file or directory: u'/data/geli/fcf2894c-7183-492c-9780-f04888349ce7.key'
Request Method: GET
Request URL: https://192.168.2.100/storage/volume/2/key/download/
Django Version: 1.5.1
Exception Type: IOError
Exception Value: [Errno 2] No such file or directory: u'/data/geli/fcf2894c-7183-492c-9780-f04888349ce7.key'
Exception Location: /usr/local/www/freenasUI/../freenasUI/storage/views.py in volume_key_download, line 983
Python Executable: /usr/local/bin/python
Python Version: 2.7.5
Python Path:
['/usr/local/www/freenasUI',
 '/usr/local/lib/python2.7/site-packages/distribute-0.6.35-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/flup-1.0.2-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/South-0.7.6-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/pyasn1-0.1.4-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/pyasn1_modules-0.0.5-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/httplib2-0.7.6-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/oauth2-1.5.211-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/python_ldap-2.4.10-py2.7-freebsd-9.1-STABLE-amd64.egg',
 '/usr/local/lib/python2.7/site-packages/django_json_rpc-0.6.2-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/python_dateutil-2.1-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/rose-1.0.0-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/django_tastypie-0.9.15-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/python_daemon-1.5.5-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/eventlet-0.12.1-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/django_simple_captcha-0.3.8-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/requests-1.1.0-py2.7.egg',
 '/usr/local/lib/python27.zip',
 '/usr/local/lib/python2.7',
 '/usr/local/lib/python2.7/plat-freebsd9',
 '/usr/local/lib/python2.7/lib-tk',
 '/usr/local/lib/python2.7/lib-old',
 '/usr/local/lib/python2.7/lib-dynload',
 '/usr/local/lib/python2.7/site-packages',
 '/usr/local/lib/python2.7/site-packages/PIL',
 '/usr/local/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg-info',
 '/usr/local/www/freenasUI/..',
 '/usr/local/www',
 '/usr/local/www/freenasUI']
Server time: Fri, 9 Aug 2013 21:38:00 -0500
And now the fun part. I can't make this too easy now, can I? I've just been told by the zpool owner that his backups have been trashed for more than a month and currently no backup exists. Nice huh?
We're in the process of copying the most important data off of the zpool in case the encryption goes horribly wrong and we end up with an unmountable zpool. He's now decided that replacing his failed backup drives is a priority (shocker), but we need to figure out how to recover from this once the data is safely backed up.
Destroying and recreating the pool really isn't an option since some of the data won't be able to be backed up for almost a week. But the most important data is being backed up as we speak.
So how do I recover from this? I'm pretty amazed that I keep having all of these problems. It's server class hardware, only I have the password to admin it, and it's a basic server. Zpool shared with CIFS and that's all. Nothing fancy.