Disk Died, couldn't offline, replaced, now can't bring pool online.

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
I had a disk die and it crashed my system, so I could not offline it or remove it. Then the system wouldn't boot at all with that disk connected. I replaced it and was able to boot, but the pool is offline. I've tried to import it via the CLI and it reports an I/O error and tells me to destroy the pool and re-create it from a backup source. Is that really it? It's a 4-disk ZFS1 array, so I can't believe it can't resilver. I just can't figure out how to get the pool to recognize the new disk and come back online.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
First, there is no such thing as ZFS1. I'm guessing you mean RAID-Z1.

ZFS does not normally import a pool with missing / failed components. Please supply the output of zpool import and we can begin helping you.
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
Yes, you are correct, my mistake.

Code:
KaubleStore                                     DEGRADED
      raidz1-0                                      DEGRADED
        gptid/932be3e1-d86c-11ec-ae76-e839355f25b2  ONLINE
        gptid/914d3a73-950d-11e9-a953-e839355f25b2  ONLINE
        gptid/936c9461-950d-11e9-a953-e839355f25b2  UNAVAIL  cannot open
        gptid/f63d6a7c-ded9-11e9-af8f-e839355f25b2  ONLINE
Edit: the unavailable drive is the new one.
If I try to import it, I get:
Code:
[root@freenas /]# zpool import KaubleStore
cannot import 'KaubleStore': I/O error
    Destroy and re-create the pool from
    a backup source.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Please supply the entire output from zpool import as it may contain details we need. Plus, a more complete description of your hardware and which TrueNAS version you are using.

You can try this, which is the first attempt at recovery.

zpool import -f -R /mnt KaubleStore


Beyond that, there are options that discard the most recent transactions, which of course causes data loss. So we don't want to use those without trying other things first.
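One of the "other things" sometimes tried before any rollback is a read-only import, which avoids writing to the pool while checking whether the data is reachable. Nobody in this thread suggested it; what follows is only a minimal sketch, assuming the standard readonly=on import property and a POSIX sh (the FreeNAS root shell may be csh, so start sh first). The name try_readonly_import is made up for illustration.

```shell
# Attempt a read-only import so nothing is written to the pool.
# try_readonly_import is a hypothetical wrapper; the zpool flags are standard:
#   -o readonly=on  import without allowing writes
#   -f              force the import if the pool appears to be in use elsewhere
#   -R /mnt         alternate root, as used elsewhere in this thread
try_readonly_import() {
    pool=$1
    if zpool import -o readonly=on -f -R /mnt "$pool"; then
        echo "imported $pool read-only; copy data off before anything else"
    else
        echo "read-only import of $pool failed" >&2
        return 1
    fi
}

# Usage on the affected machine:
#   try_readonly_import KaubleStore
```

If this also fails with the same I/O error, nothing has been lost by trying, and the rollback options remain available.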
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
Here is the entire output:
Code:
# zpool import
   pool: KaubleStore
     id: 947141681392497495
  state: DEGRADED
status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:

    KaubleStore                                     DEGRADED
      raidz1-0                                      DEGRADED
        gptid/932be3e1-d86c-11ec-ae76-e839355f25b2  ONLINE
        gptid/914d3a73-950d-11e9-a953-e839355f25b2  ONLINE
        gptid/936c9461-950d-11e9-a953-e839355f25b2  UNAVAIL  cannot open
        gptid/f63d6a7c-ded9-11e9-af8f-e839355f25b2  ONLINE
[root@freenas /]#


And then the next command...
Code:
[root@freenas /]# zpool import -f -R /mnt KaubleStore
cannot import 'KaubleStore': I/O error
    Destroy and re-create the pool from
    a backup source.
[root@freenas /]#


This doesn't seem to be looking good.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
The first part, showing that only 1 drive is unavailable, is somewhat expected.

For the second part, I also kinda expected it based on your earlier posts, but no harm in trying. It appears that something has damaged the pool.

It may be possible to roll-back a transaction and import the pool. But first, please supply the hardware configuration. And specifically how the disks are connected to the server. There are known configurations that will not work reliably, like hardware RAID.
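To pull just the problem members out of a zpool import listing like the one posted above, a small filter is enough. This is a minimal sketch, assuming a POSIX sh (start sh first if the root shell is csh) and gptid-labelled members as in this pool. The function name unavail_members is made up for illustration.

```shell
# Print the name of every gptid-labelled pool member whose state is not ONLINE.
# Reads a `zpool import` (or `zpool status`) listing on stdin.
# unavail_members is a hypothetical helper name, not a ZFS command.
unavail_members() {
    awk '$1 ~ /^gptid\// && $2 != "ONLINE" { print $1 }'
}

# Usage on the affected machine:
#   zpool import | unavail_members
```

The remaining ONLINE gptids can be matched to physical devices with glabel status, which by elimination shows which bay holds the member that ZFS cannot open.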
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
Manufacturer: Hewlett-Packard
Product Name: HP Compaq 8200 Elite CMT PC
Version: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Memory Device: 8GB DDR3
Manufacturer: Samsung
<HGST HUS724040ALE640 MJAOA580> 4TB
<WDC WD4000FYYZ-05UL1B0 00.0NS05> 4TB
<WDC WD4000FYYZ-01UL1B1 01.01K02> 4TB
<Hitachi HUS724040ALE641 MJAOA5F0> 4TB
The disks are all connected to the motherboard SATA ports, no hardware RAID.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Have you checked whether any of those disks are SMR models?

Your memory is probably half of the required amount. What TrueNAS version are you using?

Edit: How much power does your PSU deliver? How old is the machine? The CPU indicates about 10 years, but just to be sure.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Okay, it's possible to check whether a pool is importable via more forceful measures. But we will first do it as a dry run:

zpool import -Fn -R /mnt KaubleStore

This should not hurt your pool, as the "-n" option just tests. But, if it seems importable, you can remove the "-n" and discard the last few transactions which may have corrupted your pool. That may cause new data loss... so it's up to you.


One last comment. Some people get into messes with ZFS because they have not done the following:
  • Performed regular scrubs
  • Performed regular SMART tests
  • Examined the status sent via E-Mail

Here is a description of the 2 options:
Code:
             -F      Recovery mode for a non-importable pool.  Attempt to return the pool to an importable state
                     by discarding the last few transactions.  Not all damaged pools can be recovered by using
                     this option.  If successful, the data from the discarded transactions is irretrievably lost.
                     This option is ignored if the pool is importable or already imported.

             -n      Used with the -F recovery option.  Determines whether a non-importable pool can be made
                     importable again, but does not actually perform the pool recovery.  For more details about
                     pool recovery mode, see the -F option, above.
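Because the dry run can finish without printing anything, it helps to capture both its output and its exit status before deciding whether to drop the "-n". The following is a minimal sketch, assuming a POSIX sh. The name dry_run_recovery is hypothetical, and treating silent output as inconclusive is my assumption, not documented behaviour.

```shell
# Run the -Fn dry run and report what it said, without touching the pool.
# dry_run_recovery is a hypothetical helper name.
dry_run_recovery() {
    pool=$1
    status=0
    # Capture stdout and stderr together; remember the exit status.
    out=$(zpool import -Fn -R /mnt "$pool" 2>&1) || status=$?
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
    else
        # Assumption: no output tells us nothing either way.
        echo "dry run printed nothing (exit status $status); treat the result as inconclusive"
    fi
    return "$status"
}

# Usage on the affected machine:
#   dry_run_recovery KaubleStore
```

Only if the dry run clearly reports that the pool can be returned to an importable state is it worth repeating the command without "-n".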
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
I ran zpool import -Fn -R /mnt KaubleStore and it didn't seem to do anything. There was no output.
I then ran it without the n flag and received:
Code:
[root@freenas /]# zpool import -F -R /mnt KaubleStore
cannot import 'KaubleStore': one or more devices is currently unavailable
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
You may need the lower case "-f" too. It forces the import even when the pool appears to be in use by another system, which can happen when the pool was never cleanly exported.

But remember, "-F" can cause data loss.
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
Ugh..... It still says the same thing. At this point I have no data, which makes no sense, because with 3 available drives from the original configuration this shouldn't be so hard. Unfortunately due to the pool size, I don't have any backups.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Then there is the extreme option, "-X", but again, use at your own risk.

Code:
             -X      Used with the -F recovery option.  Determines whether extreme measures to find a valid txg
                     should take place.  This allows the pool to be rolled back to a txg which is no longer
                     guaranteed to be consistent.  Pools imported at an inconsistent txg may contain uncorrectable
                     checksum errors.  For more details about pool recovery mode, see the -F option, above.
                     WARNING: This option can be extremely hazardous to the health of your pool and should only
                     be used as a last resort.
 

kaub07

Dabbler
Joined
May 19, 2015
Messages
12
I may just be hosed. It did something. But now the dashboard won't load. 'zpool list' just hangs. It may be a lost cause. I've had to replace and resilver pools before but never without being able to first offline and remove the damaged drive. Pretty disheartening losing a pool with 10TB of data.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry to hear that.

It may be worth doing a root cause analysis to get a clue why this happened. As I listed earlier, there are some known causes of problems. Here is a list:
  • Not performing regular scrubs
  • Not performing regular SMART tests
  • Not examining the status sent via E-Mail
  • Using SMR disks
  • Using desktop disks
  • Using USB attached disks
  • Using hardware RAID supplied LUNs
  • Using virtualized storage to a VM TrueNAS
Several of these bit a famous computer-themed YouTuber (and he actually admitted it).
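The first two items in the list can be kicked off by hand from the shell. This is a minimal sketch, assuming a POSIX sh and FreeBSD-style device names such as ada0. The name routine_checks is hypothetical. On TrueNAS these tasks are normally scheduled in the web UI rather than scripted.

```shell
# Start a scrub and long SMART self-tests, then show any unhealthy pools.
# routine_checks is a hypothetical helper; pool and device names are examples.
routine_checks() {
    pool=$1
    shift
    zpool scrub "$pool"               # start a scrub; it runs in the background
    for dev in "$@"; do
        smartctl -t long "/dev/$dev"  # queue a long SMART self-test on each disk
    done
    zpool status -x                   # report only pools that have problems
}

# Usage (device names are an assumption about this machine):
#   routine_checks KaubleStore ada0 ada1 ada2 ada3
```

The SMART results are read later with smartctl -a on each device, once the self-tests have had time to finish.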
 