Replacing disk in raidz1; can't offline

nycvelo

Dabbler
Joined
Feb 15, 2019
Messages
18
Greetings. A disk in a raidz1 zpool threw a bunch of read errors, but I am unable to offline it from the GUI, even after scrubbing the pool. There are a couple of previous threads here about this, one of which suggests I can't offline the disk because it's raidz1 (I have a friend who calls this scary RAID).

What is the optimal replace/repair procedure? Is it safe to just pull the faulted disk and replace it?

Here is the 'zpool status' for the degraded zpool. Thanks in advance for troubleshooting clues, and please let me know if you need other info.

Code:
root@freenas[~]# zpool status

..

pool: zpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 13:47:41 with 0 errors on Sat Aug  7 01:09:11 2021
config:

        NAME                                                STATE     READ WRITE CKSUM
        zpool                                               DEGRADED     0     0     0
          raidz1-0                                          DEGRADED     0     0     0
            gptid/e6717ad1-07f4-11e9-82db-ac1f6b855914.eli  ONLINE       0     0     0
            gptid/e813a700-07f4-11e9-82db-ac1f6b855914.eli  ONLINE       0     0     0
 

nycvelo

Dabbler
Joined
Feb 15, 2019
Messages
18
Where's the rest of it?

Sorry, there was a copy/paste issue from the GUI shell. Here's the output from a terminal session.

Code:
% zpool status
 
 ..
 
   pool: zpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: scrub repaired 0B in 13:47:41 with 0 errors on Sat Aug  7 01:09:11 2021
config:

    NAME                                                STATE     READ WRITE CKSUM
    zpool                                               DEGRADED     0     0     0
      raidz1-0                                          DEGRADED     0     0     0
        gptid/e6717ad1-07f4-11e9-82db-ac1f6b855914.eli  ONLINE       0     0     0
        gptid/e813a700-07f4-11e9-82db-ac1f6b855914.eli  ONLINE       0     0     0
        gptid/e9b402fa-07f4-11e9-82db-ac1f6b855914.eli  FAULTED    510     0     0  too many errors
        gptid/eb4d2209-07f4-11e9-82db-ac1f6b855914.eli  ONLINE       0     0     0

errors: No known data errors
%
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Recurring question…
If you have a spare port, plug in the new drive without removing the old one. In GUI, go to Storage>Drives and note the device name of the drive you want to replace (adaN), then go to Storage>Pool>(gear)>Status>(drive)>(3-dot)>Replace. Resilver will take place with full redundancy.

RAIDZ1 does not make sense with less than 3 disks—and is not a recommended geometry with large HDDs. If you've removed the failing drive, try putting it back.
Sub-questions: Are the drives CMR or SMR? Do you have safe backups of your encryption keys?
Encrypted pools are a very good way of irrecoverably shooting oneself in the foot.
 

nycvelo

Dabbler
Joined
Feb 15, 2019
Messages
18
Recurring question…


RAIDZ1 does not make sense with less than 3 disks—and is not a recommended geometry with large HDDs. If you've removed the failing drive, try putting it back.


Thanks. It's a 4-disk zpool. Due to a copy/paste error from the GUI shell the output didn't show the final two disks; I've repasted the output from a terminal shell but that post is awaiting moderator approval.

As it happens there are spare drive bays available. Thanks for that pointer.

Code:
% sudo sesutil show
ses0: <AHCI SGPIO Enclosure 2.00>; ID: 3061686369656d30
Desc     Dev     Model                     Ident                Size/Status
Slot 00  ada0    WDC WD6001F4PZ-49ZWCM0    WD-WX21D15A7FKV      6T
Slot 01  ada1    WDC WD6001F4PZ-49ZWCM0    WD-WX21DC42E64H      6T
Slot 02  ada2    WDC WD6001F4PZ-49ZWCM0    WD-WX21D1526H9F      6T, LED=locate
Slot 03  ada3    WDC WD6001F4PZ-49ZWCM0    WD-WX21D1526LD9      6T
Slot 04  -       -                         -                    Not Installed
Slot 05  -       -                         -                    Not Installed
Slot 06  -       -                         -                    Not Installed
Slot 07  -       -                         -                    Not Installed
%


Sub-questions: Are the drives CMR or SMR? Do you have safe backups of your encryption keys?
Encrypted pools are a very good way of irrecoverably shooting oneself in the foot.

I was today years old when I learned about CMR vs. SMR. These are relatively old Western Digital WD6001F4PZ-49ZWCM0 drives. I have not been able to find a definitive CMR or SMR spec for these.

Excellent point about the encryption keys -- fortunately I do have config and pool backups. Thanks again.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
These are relatively old Western Digital WD6001F4PZ-49ZWCM0 drives. I have not been able to find a definitive CMR or SMR spec for these.
Enterprise drives optimised for cold storage… These may not be the best choice for hot storage in a ZFS NAS.
In the short term, I would replace the failing drive without removing it, since you have spare bays.

In the medium term, all other Digital Ae are likely to fail in the same way. Rather than just replacing them as they fail, or even replacing them in advance, I would look into moving to a new pool, preferably raidZ2 or better—you need to buy a whole set of new drives sooner or later anyway.
These drives are listed with an unrecoverable error rate of 1e-14 per bit read. Your current pool can hold up to 3*6 TB of data, i.e. 1.8e13 bytes or 1.44e14 bits. Which means that you are more likely than not to encounter an unrecoverable error while reading the whole pool (e.g. resilvering). As long as there is redundancy, ZFS will correct for it. If a drive had failed and there were no redundancy, the corresponding data would be lost: In practical terms, with multi-terabyte drives raidZ1 no longer fully protects data against the loss of one drive in the array.
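
The arithmetic above can be checked with a short Python sketch. It assumes the quoted 1e-14 figure is a per-bit probability and that bit errors are independent, which is a simplification; the datasheet figure is normally an upper bound, and a resilver only reads allocated data, so treat the result as a pessimistic estimate. The function name is made up for illustration.

```python
import math

def p_at_least_one_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error.

    Assumes every bit fails independently with probability ure_per_bit.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))

pool_bytes = 3 * 6e12               # three data disks of 6 TB each
expected_errors = pool_bytes * 8 * 1e-14
probability = p_at_least_one_ure(pool_bytes)

print(f"bits read:        {pool_bytes * 8:.3g}")
print(f"expected errors:  {expected_errors:.2f}")
print(f"P(at least one):  {probability:.1%}")
```

Under these assumptions the expected number of errors for a full read is about 1.44 and the chance of hitting at least one is roughly three in four, which is where "more likely than not" comes from.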
 

nycvelo

Dabbler
Joined
Feb 15, 2019
Messages
18
Enterprise drives optimised for cold storage… These may not be the best choice for hot storage in a ZFS NAS.
In the short term, I would replace the failing drive without removing it, since you have spare bays.

Thanks. A CMR drive, a Western Digital WD102KRYZ, is now in another bay and the zpool is resilvering as I write this.

In the medium term, all other Digital Ae are likely to fail in the same way. Rather than just replacing them as they fail, or even replacing them in advance, I would look into moving to a new pool, preferably raidZ2 or better—you need to buy a whole set of new drives sooner or later anyway.
These drives are listed with an unrecoverable error rate of 1e-14 per bit read. Your current pool can hold up to 3*6 TB of data, i.e. 1.8e13 bytes or 1.44e14 bits. Which means that you are more likely than not to encounter an unrecoverable error while reading the whole pool (e.g. resilvering). As long as there is redundancy, ZFS will correct for it. If a drive had failed and there were no redundancy, the corresponding data would be lost: In practical terms, with multi-terabyte drives raidZ1 no longer fully protects data against the loss of one drive in the array.

In case it wasn’t already abundantly clear, my knowledge of storage tech would fill a thimble. I’m also unfamiliar with the term “Digital Ae” — what does Ae mean?

There are four free drive bays available in this system, so it’s possible I could create a new raidz2 pool with different drives. If not the WD Gold enterprise drives, what else would you recommend? This system is “just” for backups but then again it’s at a remote location so minimal trips to go replace stuff is also a consideration. Thanks again.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
"Digital Ae" is the product name I've found for your WD6001F4PZ drives. "A" should be for "archive".

What I recommend is obviously to create a new pool with new drives. What to do depends on the level of redundancy/data security you want for this remote backup system (already a good point!). 4 bays is enough for a raidZ2… with only 50% space efficiency, which is not so great, and adding a second 4-wide raidZ2 vdev for more space would still be at 50% space efficiency. 6 to 8-wide raidZ2 or 8-wide raidZ3 would make a good use of your system, but raises the question how you could transfer your data from the old to the new pool.
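
To compare the layouts mentioned above, here is a rough Python sketch of nominal space efficiency, i.e. data disks divided by total disks in a vdev. It deliberately ignores padding, metadata and allocation overhead, so real usable space will be somewhat lower; the helper name is only illustrative.

```python
def raidz_efficiency(width: int, parity: int) -> float:
    """Nominal fraction of raw space available for data in one raidz vdev."""
    if parity not in (1, 2, 3):
        raise ValueError("raidz parity must be 1, 2 or 3")
    if width <= parity:
        raise ValueError("vdev needs more disks than parity devices")
    return (width - parity) / width

layouts = [
    ("4-wide raidZ1 (current pool)", 4, 1),
    ("4-wide raidZ2", 4, 2),
    ("6-wide raidZ2", 6, 2),
    ("8-wide raidZ2", 8, 2),
    ("8-wide raidZ3", 8, 3),
]

for label, width, parity in layouts:
    print(f"{label:30s} {raidz_efficiency(width, parity):.1%}")
```

Adding a second 4-wide raidZ2 vdev doubles the space but leaves the ratio at 50%, since each vdev still spends two of its four disks on parity.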
As for the drives, any (CMR!) NAS or enterprise drives would do: Seagate IronWolf (Pro or not) or Exos, WD Red Plus/Pro (but not plain Red) or Gold, the corresponding Toshiba products (which I'm less familiar with). Pick whatever is available and cheapest.
You may also consider "shucking" external HDDs when they go on discount… These often hold "white label" versions of NAS drives—but buyer beware!
 

nycvelo

Dabbler
Joined
Feb 15, 2019
Messages
18
What I recommend is obviously to create a new pool with new drives. What to do depends on the level of redundancy/data security you want for this remote backup system (already a good point!). 4 bays is enough for a raidZ2… with only 50% space efficiency, which is not so great, and adding a second 4-wide raidZ2 vdev for more space would still be at 50% space efficiency. 6 to 8-wide raidZ2 or 8-wide raidZ3 would make a good use of your system, but raises the question how you could transfer your data from the old to the new pool.

Thanks - I'll look at using the four available bays to create a new raidz2 pool with larger drives, and maybe add to it over time after decommissioning the existing raidz1 pool.

One thing I noticed is that the current raidz1 pool retained its current size after resilvering, even though the new replacement disk is 4 Tbytes larger than the three other disks. I don't know if the pool will expand on next scrub, or expand some other way, or if I bought 4T too much capacity for nothing.

As for the drives, any (CMR!) NAS or enterprise drives would do: Seagate IronWolf (Pro or not) or Exos, WD Red Plus/Pro (but not plain Red) or Gold, the corresponding Toshiba products (which I'm less familiar with). Pick whatever is available and cheapest.
You may also consider "shucking" external HDDs when they go on discount… These often hold "white label" versions of NAS drives—but buyer beware!

Thanks very much for these tips. CMR NAS drives it is from here on out.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
One thing I noticed is that the current raidz1 pool retained its current size after resilvering, even though the new replacement disk is 4 Tbytes larger than the three other disks.
That's to be expected. The pool won't expand until you replace all the disks.
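
A small Python sketch of why: in a raidz vdev every member is treated as if it were the size of the smallest disk, so one larger disk adds nothing until the others catch up. Sizes are in nominal TB and overhead is ignored; the 10 TB figure is taken from the "4 Tbytes larger" remark above, and the function name is hypothetical.

```python
def raidz_nominal_capacity_tb(disk_sizes_tb: list[float], parity: int = 1) -> float:
    """Nominal data capacity of one raidz vdev, limited by its smallest disk."""
    if len(disk_sizes_tb) <= parity:
        raise ValueError("vdev needs more disks than parity devices")
    return min(disk_sizes_tb) * (len(disk_sizes_tb) - parity)

before = raidz_nominal_capacity_tb([6, 6, 6, 6])         # original four disks
after_one = raidz_nominal_capacity_tb([6, 6, 10, 6])     # one disk replaced
after_all = raidz_nominal_capacity_tb([10, 10, 10, 10])  # every disk replaced

print(before, after_one, after_all)
```

The extra 4 TB on the new disk stays unused for now, but it is not wasted if the remaining disks are eventually replaced with larger ones.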
 