Very New to FreeNAS. Please help!

caacceo

Cadet
Joined
Apr 22, 2020
Messages
3
Hello All,

I have been gifted a new network that utilizes FreeNAS, which I have very, very limited knowledge of (I've been doing my research, though!). Now to the nitty gritty!

I have a degraded pool and a degraded boot pool in which at least one drive has become unavailable. I'm lost in locating the actual physical drive because this FreeNAS setup is using at least two different storage servers.

I've created logs for review (although my knowledge is limited). Please let me know what I can do, next steps, anything to get this resolved.

Thank you in advance
------------------------------------------------------------------------------------------------------------------------


 
Joined
Oct 18, 2018
Messages
969
Hi @caacceo, welcome to the forums. The first thing you'll want to do is read up on some useful resources; I think this terminology post is a great start. I would suggest that you continue reading resources until the following paragraph really makes sense.

FreeNAS uses ZFS, and ZFS exposes datasets to store data in. Datasets live in what is called a "pool". A pool is a collection of one or more "vdevs", and a vdev is made up of one or more physical disks. You lose your entire pool if any one vdev within that pool fails. Therefore, it is important that you have redundancy/parity within your vdevs, for example by using mirror, raidz1, raidz2, or raidz3 vdevs.
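
To make the layering concrete, here is a minimal sketch (the pool name and device names are made up, not from any real system): one pool containing a single raidz2 vdev of four disks, with a dataset inside it.
Code:
    zpool create tank raidz2 da0 da1 da2 da3   # pool "tank" = one raidz2 vdev of 4 disks
    zfs create tank/mydata                     # a dataset living in the pool
    zpool status tank                          # shows the pool/vdev/disk tree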

In your case, it sounds like your boot pool and one of your data pools are experiencing a drive failure. A pool is "degraded" when one or more of its vdevs has lost a drive but, because of redundancy/parity, the pool is able to remain functional for the time being.

Some useful things that would help folks give you more specific advice are the output of zpool status and the version of FreeNAS you're running.
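
If you're comfortable with the shell, both are one-liners (a quick sketch; /etc/version is where FreeNAS keeps its build string):
Code:
    zpool status -v    # health of every pool, plus any files with errors
    cat /etc/version   # the FreeNAS build you're running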
 

caacceo

Cadet
Joined
Apr 22, 2020
Messages
3
Thanks! I was actually just reading up on this.

1. The pool in question is running RAIDZ2 with one of the drives offline, meaning I should be able to pull another one out without any loss, correct?

2. I'm running version 11.2.

3. I've added the zpool status, glabel status, and dmidecode output to the original post.
Here is another link to the logs : https://textuploader.com/1qa4z

ZPOOL status for quick review:

--------------------------------------------------------------------------------------------------------------



root@stor:~ # zpool status
  pool: da-ssd1
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 00:00:56 with 0 errors on Sun Mar 15 00:00:58 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        da-ssd1                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/7ff87fb7-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            gptid/80355970-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            gptid/8093f5f7-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            gptid/8112016c-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            gptid/81914c54-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            5719957880158157784                         OFFLINE      0     0     0  was /dev/gptid/821d803a-e6ee-11e8-82e2-90e2ba889e9c
            gptid/829db516-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
            gptid/831f9ed4-e6ee-11e8-82e2-90e2ba889e9c  ONLINE       0     0     0
        logs
          mirror-1                                      ONLINE       0     0     0
            gptid/78904a91-0f64-11e9-898c-90e2ba889e9c  ONLINE       0     0     0
            gptid/7a8b464d-0f64-11e9-898c-90e2ba889e9c  ONLINE       0     0     0

errors: No known data errors

  pool: da-ssd2
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:04:25 with 0 errors on Sun Mar 15 00:04:25 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        da-ssd2                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e07c301e-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e11301b9-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e1cf733d-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e269713b-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/e3254fd5-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e3c6edf2-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e463a82c-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
            gptid/e5131ee7-f8e7-11e8-9751-90e2ba889e9c  ONLINE       0     0     0
        logs
          mirror-2                                      ONLINE       0     0     0
            gptid/99faa807-2450-11e9-898c-90e2ba889e9c  ONLINE       0     0     0
            gptid/9b0f2ddd-2450-11e9-898c-90e2ba889e9c  ONLINE       0     0     0

errors: No known data errors

  pool: da-vol0
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 09:24:10 with 0 errors on Sun Mar 29 09:24:27 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        da-vol0                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/389e0415-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/39b14b26-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/3ac91d0c-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/3be422be-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/3cf40f35-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/3e0de825-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/3f2c7ffa-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
            gptid/403fecbf-27b8-11e8-b32b-0007431242e0  ONLINE       0     0     0
        logs
          mirror-1                                      ONLINE       0     0     0
            gptid/c5931c10-0f47-11e9-898c-90e2ba889e9c  ONLINE       0     0     0
            gptid/c684e89c-0f47-11e9-898c-90e2ba889e9c  ONLINE       0     0     0

errors: No known data errors

  pool: ex-bk
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-JQ
  scan: scrub repaired 0 in 0 days 02:17:16 with 0 errors on Sun Feb 2 02:17:16 2020
config:

        NAME                   STATE     READ WRITE CKSUM
        ex-bk                  UNAVAIL      0     0     0
          5878147398582000115  REMOVED      0     0     0  was /dev/gptid/ee2c7590-030b-11e9-9751-90e2ba889e9c

errors: 2 data errors, use '-v' for a list

  pool: freenas-boot
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 0 days 00:03:53 with 0 errors on Thu Apr 16 03:48:53 2020
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            da23p2    ONLINE       0     0     0
            da25p2    FAULTED    303 1.23K   896  too many errors

errors: No known data errors
 
Joined
Oct 18, 2018
Messages
969
1. The pool in question is running RAIDZ2 with one of the drives offline, meaning I should be able to pull another one out without any loss, correct?
Yes, but you shouldn't. :) RAIDZ2 tolerates two drive failures without losing the vdev; three drive failures and it's gone. Since one drive in your raidz2 vdev is already offline, it can only absorb one more failure right now.

One bit of good advice before you move on is to do the following:
1. Make a backup of your system configuration; the User Guide for your version will have advice on that (there's also a CLI sketch after this list).
2. Determine whether you're using encryption, identified by pools with a "lock" icon. For every encrypted pool you want to back up your keys. Each encrypted pool has two keys; back them up by clicking the lock, first clicking "Download Encrypt Key" to get the "primary" key, and then "Add Recovery Key" to regenerate and download the second, "recovery" key.
3. Consider how important the data on the system is and whether you have an adequate backup for your risk tolerance.
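
For item 1, a minimal sketch assuming shell access (the GUI route is System -> General -> Save Config; on FreeNAS 11.x the configuration database lives at /data/freenas-v1.db, but verify against the User Guide for your exact version):
Code:
    # copy the config database somewhere off the boot pool,
    # e.g. onto one of your healthy data pools
    cp /data/freenas-v1.db /mnt/da-vol0/freenas-config-backup.db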

From looking at your logs I see that the pool da-ssd1 is degraded because one drive is "missing". This could be because of a bad cable connection, a failed drive, etc.

The pool ex-bk is totally unavailable. You can see it was made of a single vdev composed of a single disk, and that single disk is faulted. If that drive cannot be brought back, that vdev, and therefore the pool, is gone. Do you know the purpose of that pool?

Your boot pool, as you said above, is also degraded.

A typical debugging response to something like this is to first double-check your cable connections; if that does not solve the problem, you would replace the drive. The replacement procedure goes something like this:
  1. Acquire a replacement drive that is at least as large as the drive it is replacing. You absolutely cannot replace a drive with one that is smaller than the original. You can replace it with a larger disk, but you will not benefit from that extra space until EVERY disk in the vdev is of the larger capacity. Unless you're trying to increase your pool's capacity, you shouldn't need to worry about this.
  2. Burn in that new drive. If the new drive is an HDD rather than an SSD, you'll want to run a burn-in that consists of short, long, and conveyance SMART tests, then badblocks, followed by another long SMART test (see the sketch after this list). This can take a long time for large drives. Search the forums for how to do this.
  3. Identify the physical drive that has failed and then follow the User Guide to the letter for how to replace it.
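
A sketch of the burn-in in step 2, assuming the new drive shows up as /dev/ada5 (a hypothetical device name; triple-check it first, because badblocks -w destroys everything on the disk):
Code:
    # each SMART test runs in the background; wait for it to finish
    # (smartctl -a shows progress) before starting the next step
    smartctl -t short /dev/ada5       # quick self-test
    smartctl -t conveyance /dev/ada5  # checks for transport damage
    smartctl -t long /dev/ada5        # full surface read test
    badblocks -ws /dev/ada5           # destructive write+verify pass (slow!)
    smartctl -t long /dev/ada5        # final full surface test
    smartctl -a /dev/ada5             # review: look for reallocated/pending sectors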

Identifying the correct drive to replace can be a pain in the ass. I recommend you go by the serial number. So, the trick is to get a perfect mapping from your failed drive to the serial number of that drive. There are a lot of ways to do it, and some early record keeping helps here; but if the prior owner did not keep clear records, no problem: the following commands will tell you what you need to know.

  1. First, look under Storage -> Disks and write down on a piece of paper the name-to-serial-number mapping. You may need to select "Serial" from the columns drop-down on the upper right. With this, you have a full accounting of all of the disks in your system that FreeNAS recognizes. (If you prefer the shell, there's a sketch of some CLI shortcuts after this list.)
    Code:
    /dev/ada0: DFLDG134
    /dev/ada1: DSDF09123
    
  2. Now you need to identify the gptid for each of those drives. To do that, you can use this convenient command: gpart list | grep -e 'Geom name:' -e '^\s*rawuuid:' -e '^\s*type:'. This will give you output sorta like this
    Code:
    Geom name: ada0
        rawuuid: ads0f98asd0f978-asdf096asdf06-asdf
        type: freebsd-swap
        rawuuid: 8ds0fadfsd0f978-asdf096asdf06-asdf
        type: freebsd-zfs
    Geom name: ada1
        rawuuid: 8976asdfasd0f978-asdf096asdf06-asdf
        type: freebsd-swap
        rawuuid: 1234aadfsd0f978-asdf096asdf06-asdf
        type: freebsd-zfs
    
  3. With this information you can associate each serial number with two rawuuids; the one you're interested in is the one with the freebsd-zfs type. Update your sheet.
    Code:
    /dev/gptid/8ds0fadfsd0f978-asdf096asdf06-asdf: /dev/ada0: DFLDG134
    /dev/gptid/1234aadfsd0f978-asdf096asdf06-asdf: /dev/ada1: DSDF09123
    
  4. You can now look at the output of zpool status and, for each disk, figure out what its serial number is. You may have some disks in the output of zpool status which are not on your list; this can occur when a disk is so badly fouled that the system cannot recognize it. ZFS still thinks the disk should be a part of the pool, but your system can't see the drive.
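
A couple of CLI shortcuts for steps 1 and 2 (a sketch; the device name is an example): smartctl prints a disk's serial number directly, and glabel status prints the gptid-to-device mapping in one shot, matching the glabel output you already collected.
Code:
    smartctl -i /dev/ada0 | grep -i 'serial'   # serial number for one disk
    glabel status                              # gptid <-> device for every disk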

With all of this information in hand, you can shut down your machine, look at the S/N on each drive, and determine exactly which pool it belongs to. If the S/N does not show up on your list, and you took a full accounting of every disk your system recognizes, you can assume the S/N in question is a disk not detected by the system, possibly because it failed or the cable connection is bad. Do take care not to remove any drives currently in good shape and in use by a pool. Also take care to follow the User Guide's instructions on how to replace a disk, and note that replacing a data disk may be a bit different than replacing the boot disk.
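
For reference, a sketch of what the CLI replacement for da-ssd1 would look like, per the 'action:' line in your own zpool status output (the User Guide's GUI procedure is the recommended route; the new gptid below is a placeholder for whatever partition the replacement drive ends up with):
Code:
    # 5719957880158157784 is the guid of the OFFLINE member of da-ssd1
    zpool replace da-ssd1 5719957880158157784 gptid/<new-partition-gptid>
    zpool status da-ssd1   # watch the resilver progress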
 

caacceo

Cadet
Joined
Apr 22, 2020
Messages
3
Thanks for the info! I'm going to give it a try and will update the thread... Quick question, though: with the boot drive being degraded, can I expect any issues with shutdown/startup?
 
Joined
Oct 18, 2018
Messages
969
Thanks for the info! I'm going to give it a try and will update the thread... Quick question, though: with the boot drive being degraded, can I expect any issues with shutdown/startup?
It is possible. You may consider replacing the boot drive first. I cannot stress enough the importance of making sure you get the drive S/Ns right; pulling the wrong disk can make things worse and add additional troubleshooting steps. Also, before you do anything, definitely back up your config via System -> General -> Save Config, and back up the encryption keys if your pools use encryption.
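
For the boot mirror specifically, a rough sketch assuming you go the CLI route (in 11.2 the GUI equivalent is under System -> Boot -> Status; da25p2 is the faulted member from your zpool status, and the replacement device name is a placeholder that must be partitioned to match da23 first, which the GUI handles for you):
Code:
    zpool status freenas-boot                      # confirm which member is faulted
    zpool replace freenas-boot da25p2 <newdev>p2   # resilver onto the new device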

All of this is assuming you care about the data on the system; if you don't, you can just blow it all away and restart from scratch. :)
 