Disk unavailable

runevn

Explorer
Joined
Apr 4, 2019
Messages
63
I have two disks that became unavailable after I removed the drives by mistake... yeah, I know, pretty stupid of me!

The pool is RAIDZ2 and encrypted. It is working but degraded.

What should I do?

Code:
root@nas[~]# zpool status storagepool
  pool: storagepool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 832K in 0 days 07:20:33 with 0 errors on Sun Jan 19 07:20:34 2020
config:

        NAME                                                STATE     READ WRITE CKSUM
        storagepool                                         DEGRADED     0     0     0
          raidz2-0                                          DEGRADED     0     0     0
            gptid/5613e975-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5831da4a-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5a699763-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            3737989742100079509                             UNAVAIL      0     0     0  was /dev/gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli
            gptid/5eb3d5de-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            3228546123072300007                             UNAVAIL      0     0     0  was /dev/gptid/60b6bd83-aee3-11e9-9fbd-000c29ee2069.eli

errors: No known data errors


If I try to bring the disks online or replace them, I get the following error message:
Code:
Error: Traceback (most recent call last):

  File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 892, in online_disk
    notifier().zfs_online_disk(obj, deserialized.get('label'))

  File "./freenasUI/middleware/notifier.py", line 1084, in zfs_online_disk
    raise MiddlewareError('Disk cannot be set to online in encrypted pool.')

freenasUI.middleware.exceptions.MiddlewareError: [MiddlewareError: Disk cannot be set to online in encrypted pool.]


Any help is much appreciated.
 

runevn

To add to this, I just found these messages in the console:

Code:
Feb  4 17:52:11 nas uwsgi: [middleware.exceptions:36] [MiddlewareError: Unable to geli attach gptid/5613e975-aee3-11e9-9fbd-000c29ee2069: geli: Cannot access gptid/5613e975-aee3-11e9-9fbd-000c29ee2069 (error=1).
]
Feb  4 17:52:13 nas uwsgi: [middleware.exceptions:36] [MiddlewareError: Unable to geli attach gptid/5831da4a-aee3-11e9-9fbd-000c29ee2069: geli: Cannot access gptid/5831da4a-aee3-11e9-9fbd-000c29ee2069 (error=1).
]
Feb  4 17:52:15 nas uwsgi: [middleware.exceptions:36] [MiddlewareError: Unable to geli attach gptid/5a699763-aee3-11e9-9fbd-000c29ee2069: geli: Cannot access gptid/5a699763-aee3-11e9-9fbd-000c29ee2069 (error=1).
]
Feb  4 17:52:17 nas uwsgi: [middleware.exceptions:36] [MiddlewareError: Unable to geli attach gptid/5eb3d5de-aee3-11e9-9fbd-000c29ee2069: geli: Cannot access gptid/5eb3d5de-aee3-11e9-9fbd-000c29ee2069 (error=1).
]
Feb  4 18:01:48 nas uwsgi: [middleware.exceptions:36] [MiddlewareError: Disk cannot be set to online in encrypted pool.]


I don't know if that helps.
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
You're in a rather urgent situation. You have two disks unavailable in a RAIDZ2 vdev. Another disk issue could cause data loss if you're unable to get the drives back. Here are a few things you can try.

First, try to unlock the disks manually and online them afterwards. To do that, identify the correct key and then the correct drives. From your post above, the two drives in question are /dev/gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli and /dev/gptid/60b6bd83-aee3-11e9-9fbd-000c29ee2069.eli. If you look in /dev/gptid you should see those devices listed, but without the .eli suffix. If they are not there, that indicates a problem which my steps may not solve, and a different approach is needed. You can verify the correct key with the following command: sqlite3 /data/freenas-v1.db 'select vol_encryptkey from storage_volume where vol_name = "storagepool";'. This tells you which key in /data/geli belongs to this pool. Then attempt to unlock each drive with geli attach -k /data/geli/<key> /dev/gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069. You will be prompted for the decryption passphrase. If you did not set a passphrase, include the -p flag. From there, attempt to "online" the disk. You may have better luck doing this manually.
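
The unlock-then-online sequence described here can be sketched as a small dry-run helper. This is only an illustration: it prints the commands instead of running them, so they can be reviewed first. The function names and the example key file name are made up for this sketch; the real key file is whatever the sqlite3 query points to under /data/geli.

```shell
#!/bin/sh
# Dry-run sketch: print (do NOT execute) the commands for unlocking one
# encrypted disk and bringing it back online. Function names are hypothetical.

# provider_from_eli: strip the trailing ".eli" to get the underlying provider,
# e.g. /dev/gptid/<uuid>.eli -> /dev/gptid/<uuid>
provider_from_eli() {
    printf '%s\n' "${1%.eli}"
}

# print_unlock_plan KEYFILE ELI_DEVICE POOL [-p]
# Pass -p as the fourth argument when no passphrase was set on the pool.
print_unlock_plan() {
    keyfile=$1
    eli_dev=$2
    pool=$3
    flags="-k $keyfile"
    if [ "${4:-}" = "-p" ]; then
        flags="-p $flags"
    fi
    provider=$(provider_from_eli "$eli_dev")
    # Step 1: attach (decrypt) the provider so the .eli device appears.
    echo "geli attach $flags $provider"
    # Step 2: ask ZFS to bring the now-available .eli device online.
    echo "zpool online $pool ${eli_dev#/dev/}"
}

# Example call; "EXAMPLE.key" is a placeholder, not a real key name.
print_unlock_plan /data/geli/EXAMPLE.key \
    /dev/gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli storagepool
```

Run it once per unavailable disk, check the printed commands against your own pool name and key file, and only then run them by hand as root.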

Alternatively, you can "resilver" the disks: select each unavailable disk, click "replace", and select the original disk as the replacement.

Whatever you do, you will want to have up-to-date backups. Failure to maintain up-to-date backups is a recipe for disaster.
 

runevn

Hi PhiloEpisteme,

You are the hero of the day! It worked!

Now my volume looks like this
Code:
root@nas[~]# zpool status storagepool
  pool: storagepool
 state: ONLINE
  scan: resilvered 452K in 0 days 00:00:00 with 0 errors on Wed Feb  5 07:38:22 2020
config:

        NAME                                                STATE     READ WRITE CKSUM
        storagepool                                         ONLINE       0     0     0
          raidz2-0                                          ONLINE       0     0     0
            gptid/5613e975-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5831da4a-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5a699763-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069.eli  ONLINE       0     0     0
            gptid/5eb3d5de-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0
            gptid/60b6bd83-aee3-11e9-9fbd-000c29ee2069.eli  ONLINE       0     0     0

errors: No known data errors


Is there anything that I should do to make sure that everything is fine?

Once again thanks a million!
 
PhiloEpisteme

I am happy to hear it worked.


"Is there anything that I should do to make sure that everything is fine?"
I usually run a scrub after anything strange has happened. I would be surprised if anything went wrong.
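
As a rough sketch of that check: start the scrub with zpool scrub, wait for it to finish, and then look at the state and errors lines of zpool status. The helper below only reads zpool status text on stdin, so it is safe to try; its name is made up for this example.

```shell
#!/bin/sh
# pool_is_healthy: read `zpool status` output on stdin and succeed only if
# the pool state is ONLINE and no data errors are reported.
# (Hypothetical helper name; it only inspects text, it changes nothing.)
pool_is_healthy() {
    awk '
        $1 == "state:" { state = $2 }
        /^errors: No known data errors/ { clean = 1 }
        END { exit !(state == "ONLINE" && clean) }
    '
}

# Typical use on the NAS, as root (pool name taken from this thread):
#   zpool scrub storagepool
#   ... wait until `zpool status storagepool` shows the scrub has finished ...
#   zpool status storagepool | pool_is_healthy && echo "pool looks healthy"
```

The scan line of zpool status also reports how much data the scrub repaired, which is worth a look even when the pool is ONLINE.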

As I said above, make sure you have good backups. No amount of vdev redundancy is a replacement for a good backup. Your server could experience a catastrophic failure, encryption could get the better of you, etc.

Once again, happy you got it all sorted. :)
 

runevn

Okay, thanks... luckily I always have an off-site backup of my data in case I do anything stupid or in case of a hardware failure.

But once again. Thanks for your great help!
 

Tool73

Cadet
Joined
Mar 5, 2018
Messages
6
PhiloEpisteme,
I'm finding myself in the same situation as runevn. I am a complete newbie when it comes to Linux commands. I do not understand the sqlite3 command or how to use it. I typed in sqlite3 /data/freenas-v1.db 'select vol_encryptkey from storage_volume where vol_name = "storagepool";' as listed and got nothing. On the plus side, I do have everything backed up to another hard drive, but I'd like to get my two replacement drives online soon. Thank you in advance.
 

Tool73

OK, so after watching a sqlite3 introduction on YouTube, I realized you had listed two separate commands.

sqlite3 /data/freenas-v1.db
select vol_encryptkey from storage_volume where vol_name = "storagepool";
(I used the name of my pool inside the quotation marks)

This gave me a long number/letter sequence, I'm assuming that is the geli key. From there, do I get out of sqlite3 with the .exit command and get back to the root@freenas:~ # prompt? The next command you call out is,

geli attach -k /data/geli/<key> /dev/gptid/4bbd87aa-d4ba-11e9-9716-000c29ee2069

Do I then use the long sequence in place of <key> in the above command? I do not have a passphrase, so do I then add the -p flag after the -k flag? Sorry for all the questions, I just don't want to dig myself in any deeper.
 
PhiloEpisteme

Hi. One thing that is generally helpful is to paste the exact command and output you got from that command and share it here. It is also useful to post the version you're using.
 
PhiloEpisteme

Yes, the vol_encryptkey column (in the storage_volume table) will give you the name of the geli key for the disks in your pool.

Before continuing on though, it may be useful to know what your specific situation is. You have an encrypted pool I assume? Did you remove one or more of the encrypted disks like the person posting above? What do you get with the command zpool status <pool>?
 

Tool73

My situation is almost exactly like the person above: I have 5 drives that are encrypted. I am using FreeNAS-11.2-U7. I had 2 drives go bad at 2 separate times; I replaced and resilvered each drive with success. Later I tried to upgrade to FreeNAS 11.3 and everything locked up. I couldn't access my files through Windows, the pool was locked, and my geli key wouldn't unlock the pool. So I went back to what worked before, 11.2-U7. I did a fresh install on a pair of new SSDs, uploaded my last saved config, and got access to my files again. Afterwards I realized my pool was degraded because the 2 new drives I had put in the pool became unavailable. I'd post the results of zpool status, but the shell won't let me copy the text; it looks just like the previous person's results, though, with 3 online and 2 unavailable.
 
PhiloEpisteme

Okay, that is helpful, thanks.

So my guess is that two things happened:

"tried to do the upgrade to Freenas 11.3 and everything locked up. I couldn't access my files thru windows, the pool was locked up, and my geli key wouldn't unlock the pool"
My guess is that you forgot to _also_ back up your geli key(s) in addition to just the general config. When you have encrypted pools, you need the passphrase (if set) _and_ the geli key(s).

"I went back to what worked before, 11.2-U7. Did a fresh install on a pair of new SSD drives, uploaded my last saved config, and got access to my files again. Afterwards realized my pool was degraded because the 2 new drives I replaced in the pool became unavailable."
My guess is that your last saved config was from _before_ you resilvered your disks.

"I'd post the results of the zpool status, but the shell won't let me copy the text, but it looks just like the previous person's results, 3 online, 2 unavailable."
You can always take a screenshot of just the text. If you're not familiar with how, look up how to take a screenshot of only part of the screen on Windows. The output is _very_ helpful.



There are a few ways to approach this. One approach is to say, "let's try to get those new disks working ASAP". If you're using RAIDZ2 and two disks are already down, you're in a tough spot. You could lose all of your data if one more drive goes down. So time is important.

The catch is that this approach can be tricky to follow and error-prone. So the other approach is to say, "let's just try to resilver".

I think in your case the second option is probably better. But, just in case, I'll provide a few debug commands to see if we can avoid that.

First, let's check which disks FreeNAS is configured to expect in your pool. From my understanding you have 7 total disks:
- 3 good disks
- 2 old disks
- 2 new disks
Your pool of 5 disks had two disks go bad (the 2 old disks) and you replaced them with the 2 new disks. So, let's see which disks FreeNAS expects to be in your pool.

Before you begin, make sure you have a backup of the encryption key and passphrase for this pool. https://www.ixsystems.com/documenta...l?highlight=resilvering#encryption-operations
  1. zpool status {pool} This will tell us which disks ZFS thinks should be a part of your pool. It should list 2 of them as unavailable or in some other error state. Depending on what state ZFS is in, those are either the two new or the two old disks. Can you provide the list here?
  2. sqlite3 /data/freenas-v1.db 'select sed.encrypted_provider from storage_encrypteddisk as sed join storage_volume sv on sv.id = sed.encrypted_volume_id where sv.vol_name = "{pool}";' This will tell us which disks FreeNAS thinks should be a part of your pool. The three ids that match the 3 good disk ids from step 1 are the 3 good disks. The other 2 are either the old disks or the new disks. Note whether these 2 disks are the same as the errored disks in step 1. Again, provide the list here.
  3. Now, let's see which disks are actually in your system. ls /dev/gptid/ will list your devices with a gptid. Note whether the two ids from step 1 are in this list; hopefully they are. If so, proceed. Otherwise, stop and provide me with the output from steps 1, 2 and 3.
  4. Get the geli key with sqlite3 /data/freenas-v1.db 'select vol_encryptkey from storage_volume where vol_name = "{pool}";'. Note that this value corresponds to the encryption key file found at /data/geli/{value}.key. When I refer to {key} I mean this full path, with ".key" on the end. For example /data/geli/jd9end86-986d-7395-b68c-0ajd8dnakda8.key. You know you have the right value when ls {key} lists the file.
  5. Let's try to open the two errored or unavailable disks from step 1. geli attach -k {key} /dev/gptid/{uuid} where {uuid} is just the id from step 1 without the .eli extension and gptid/ prefix. You can double check by trying ls /dev/gptid/{uuid}.
  6. If step 5 works, check zpool status again. If the pool is no longer degraded, great, we are part way there! There is still work to be done though. If this is the case, my guess is that the disk lists from step 1 and step 2 are different.
  7. If step 5 does _not_ work, or the pool is still degraded in step 6, paste the output here.
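
Steps 1 and 3 can be partly automated. The sketch below is only an illustration under a couple of assumptions: that zpool status marks missing members with UNAVAIL and a "was /dev/gptid/<uuid>.eli" note (as in the output earlier in this thread), and that the helper names, which are made up here, suit you. It reads text and checks for files; it does not change anything.

```shell
#!/bin/sh
# unavail_gptids: read `zpool status` output on stdin and print the bare uuid
# of every UNAVAIL member, taken from its "was /dev/gptid/<uuid>.eli" note.
unavail_gptids() {
    awk '$2 == "UNAVAIL" {
        for (i = 1; i < NF; i++) {
            if ($i == "was") {
                p = $(i + 1)
                sub(/^\/dev\/gptid\//, "", p)   # drop the /dev/gptid/ prefix
                sub(/\.eli$/, "", p)            # drop the .eli suffix
                print p
            }
        }
    }'
}

# check_present [DIR]: read uuids on stdin and report, for each one, whether
# a device node with that name exists in DIR (default: /dev/gptid).
check_present() {
    dir=${1:-/dev/gptid}
    while IFS= read -r id; do
        if [ -e "$dir/$id" ]; then
            echo "present $id"
        else
            echo "missing $id"
        fi
    done
}

# Typical use on the NAS ({pool} is your pool name):
#   zpool status {pool} | unavail_gptids | check_present
```

A "present" line means the disk is in the system and is a candidate for geli attach in step 5; a "missing" line means you should stop and post the output from steps 1 to 3 instead.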

If all of the above is too much work and you prefer to just try to resilver again, you can go through those steps. Please note though that _after_ you resilver you should absolutely store another backup of your system configs & make sure you have a backup of your encryption key(s). The risk here is that resilvering is hard on disks. If you experience a disk failure before you get one of the disks resilvered, you could lose everything.
 
PhiloEpisteme

Also note that using encryption is not without risks. If you lose your key or forget your passphrase, you've lost all of your encrypted data. If you don't follow the user guide's steps correctly the system could become misconfigured and you could lose your data. I use it. But if you do not use it correctly it can really burn you. I put together a resource on encryption some time ago. You can find it at the following link. The resource links out to several other good articles and descriptions of encryption. https://www.truenas.com/community/resources/encryption-debugging-tips.134/

There are a _lot_ of people out there who will say that encryption is never worth it and that it is too easy to lose your data. I find that this view is often expressed too simplistically, but it has some truth to it. I would phrase it differently and advise that encryption is best used only if you satisfy 3 conditions (pertaining to FreeNAS; I have not yet used encryption in TrueNAS):

  1. You have a genuine need for it. Maybe your work requires it?
  2. You understand how to use it properly and have thoroughly read the FreeNAS manual on using encryption in pools. I generally consider this box checked once someone has practiced the typical encryption-related pool actions on a test pool and confirmed that everything worked.
  3. You understand and accept that if you make a mistake, such as forgetting your passphrase, losing your keys, or otherwise not following the procedures exactly, you run a significant risk of eventually losing your own data.

I personally have used encryption since day 1. I've resilvered several drives. I've added and removed SLOG and L2ARC devices. I've extended pools and I've exported/imported pools as well. I practiced all of these things before I started using encryption, and then I wrote down the exact steps I follow when something happens, such as a drive going down. I also securely back up my passphrase, main key, and backup key.
 

Tool73

So what is the secret to posting a screenshot? I click the insert image button and a pop-up window asks for an http:// website. Why can't I just add it from my saved folder?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Just paste your images into the editor.
 

Tool73

I'll do a ctrl+c on the .png file I want to share, then when I try to do a ctrl+v to paste the image, nothing shows up.
 

Tool73

I have everything backed up to another local hard drive. Would I be better off formatting all the drives and starting a fresh install of the latest version of TrueNAS? Will the encryption on the drives prevent me from reformatting the drives?
 

Ericloewe

"I'll do a ctrl+c on the .png file I want to share, then when I try to do a ctrl+v to paste the image, nothing shows up."
In that case, try just dragging and dropping.
 