Are 2 of My Drives Failed? (See Edit: Moving Data Onto New Vdev, To Remove Old)

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Normally I wouldn't suggest weekly scrubs, but we are outside of standard procedures.
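If you want to kick one off by hand between the scheduled ones, a minimal sketch (assuming your pool is still called PrimaryPool):
Code:
# Start a scrub manually and check on its progress afterwards:
zpool scrub PrimaryPool
zpool status PrimaryPool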

You could try reaching out to HGST (or WD?) to see if they have a way to fix this, but I wouldn't be very optimistic.

I suggest you update your HBA firmware anyway.

Edit: found this. It has a zip file compatible with your drives.
 
Last edited:

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Hm okay, I'm wondering if this is becoming more effort than it's worth at this point. I will definitely update my HBA firmware, but maybe when I replace the drives.

As stated already, the drives are old with a lot of hours on them, and I've already been trying to figure out what I want to do about replacing them. I feel a bit stuck: I really don't want to spend the money, but it already feels stupid having this many 4TB drives in my system.

I made a post around the same time to gather suggestions for drives.
I was really considering just getting 4x 20TB drives and replacing all my 4TB drives with those. It's going to be like $1200-1400 which I really just don't want to spend, but I feel it'd give me a bit more peace of mind as they would be proper drives. I think maybe I'm a little worried about one or two failing, as I will incur another price hit to replace them. If it's like 2-3+ years down the line, okay. Not really sure how long these high density drives will last because all the I/O will be focused on 4 drives and that seems like kind of a lot.

Also not sure which drives I want to go with. The Seagate Exos drives are on sale right now for like $380 each. Or I can get 14-16TB Seagate IronWolf Pros for slightly cheaper and have more drives. Not sure. I feel a bit stuck.
The other thing is the Seagates are SATA, whereas all my stuff is SAS. And there'd be a bunch of 4TB drives vs. only 4 high-density ones (which I would probably add more of in the future).
I don't think I'm going to notice a performance difference from what I have now, but maybe I'm being optimistic.

And 4 drives would imply I'm sticking with the same 2 mirror setup, which, I don't know... it really scares me when a drive fails. But I don't have 2 grand to drop on hard drives. I mean, 2 of the 4 drives would be spares. But still. Resilvering would take forever, and it'd be sketchy because if the other drive fails during the resilver I'm fucked.

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Not really sure how long these high density drives will last because all the I/O will be focused on 4 drives and that seems like kind of a lot.
The safest bet is, once you are outside the warranty period, to plan for them to fail. That does not mean they will fail right away; it means you should plan for the drives to start failing and have a plan to move forward.

It's going to be like $1200-1400 which I really just don't want to spend
Remember one thing that you should have known from the start... hard drives are consumable items and you should expect to replace them at the end of the warranty. The rest of the system should continue to work fine, except that the fans will also wear out. This is the cost of any home NAS. A business can pay for these as part of its operating costs, but you and I need to open up our wallets and pay some hard-earned cash for this luxury.

I personally would not use 20TB drives because, as you said, resilvering times are very long. But if you do choose these very large drives, make sure you have a RAIDZ level you are comfortable with. RAIDZ3 is what I'd use if I had 20TB drives, but that is just my preference. There are configuration options out there that will minimize the risk of a second drive failure during resilvering.
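Purely as an illustration of the idea (the pool name and device names below are made up, not from your system), a wide RAIDZ3 vdev is created in one shot like this:
Code:
# Hypothetical example: six new disks in a single RAIDZ3 vdev,
# so three drives can fail before data is at risk. All names are placeholders.
zpool create BigPool raidz3 da20 da21 da22 da23 da24 da25

On TrueNAS you would normally do this through the pool creation wizard in the GUI rather than the shell, but the layout decision is the same either way.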

Also, you could create periodic backups to an external hard drive or two. It's a way to reduce risk.
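A rough sketch of one way to do that from the shell, assuming the external drive carries its own pool (the pool, dataset, and snapshot names below are placeholders):
Code:
# Snapshot the data, then replicate the snapshot to a pool on the external drive:
zfs snapshot -r PrimaryPool/media@backup-2023-08
zfs send -R PrimaryPool/media@backup-2023-08 | zfs receive -u BackupPool/media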
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Up until recently, hot spares triggered into a pool would not automatically go back to being spares after a successful replacement, and you needed to enter the commands below to do so.

Code:
# Detach each in-use hot spare so it returns to the AVAILABLE state:
zpool detach PrimaryPool gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76
zpool detach PrimaryPool gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76
Ok so, here's the deal. I got two more replacement drives to replace one drive in each of the mirrors just in case.

For a sanity check, here are the mirrors and gptids.
Code:
  NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       ONLINE       0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
          mirror-5                                        ONLINE       0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
          mirror-6                                        ONLINE       0     0   0
            gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use


So yes, it looks like those are the two gptids for the spares that Johnny listed, which makes sense.

It feels safer to do it this way, but correct me if I am wrong:
Instead, do a replacement on the two drives that the spares are currently standing in for in the pool?

gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76
gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76
Correct?

So in the Web GUI, if I simply do a Pool Status > Replace Drive on da9 and da14 and pop in the new drives, the spares should detach automatically after it resilvers successfully.
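For my own notes, the CLI equivalent of that GUI action should look roughly like this (the new-drive gptids are placeholders; the GUI partitions the new disks for you, which is why it's the recommended route):
Code:
# Replace each suspect member with its new disk; the NEW-DISK gptids are placeholders.
zpool replace PrimaryPool gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76 gptid/NEW-DISK-1
zpool replace PrimaryPool gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76 gptid/NEW-DISK-2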


At least I am keeping the spares intact, in case there was an actual reason for them attaching themselves to replace those two drives.

Worst case scenario, the data isn't actually on the spares despite them being attached, and I take out the one drive in each mirror that had the data on it and replace it. Then I simply put it back in and replace my new drives with that. Right?


And beyond that, should I now be concerned about the other few drives throwing SMART errors in your script, @joeschmuck?
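In the meantime, a quick manual spot-check of those drives from the shell (the device name is just an example, and the exact attribute names differ between SAS and SATA drives):
Code:
# Dump SMART data for one drive and pick out the usual warning signs:
smartctl -a /dev/da9 | grep -Ei 'health|reallocat|pending|uncorrect'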
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The spares might not detach automatically; if so, you will have to run the commands @Johnny Fartpants wrote. The plan seems solid, you just have to make sure to not replace the wrong drives... and even if you do, it's not the end of the world, you will have just wasted time.

The other path is to zpool clear PrimaryPool to see if the spares detach automatically, then replace the suspect drives: this is riskier because it exposes you to a potential degraded vdev during the resilver.
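Spelled out as commands, that second path would look roughly like this (the detach lines are the same ones Johnny posted):
Code:
# Clear the pool's error state and see whether the spares go back to AVAILABLE:
zpool clear PrimaryPool
zpool status PrimaryPool
# If they stay INUSE, detach them manually:
zpool detach PrimaryPool gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76
zpool detach PrimaryPool gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76
# Then replace the suspect drives, one at a time, with zpool replace.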
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
you just have to make sure to not replace the wrong drives... and even if you do, it's not the end of the world, you will have just wasted time.
It would be these two to replace correct?
gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76
gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
It would be these two to replace correct?
gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76
gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76
It appears to be so.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Just went ahead and replaced the two. It is resilvering now.
But now that I think about it, it seems it'd have made more sense to promote the two spares into that vdev, and then simply add the two new drives as new spares. No?
Not seeing an option for promoting them. But it'd reduce wear on the spare drives, since you'd constantly be cycling new drives into the spare role.
Too late now anyway, but it would be good to know how for next time.
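(For future reference, if I understand the docs right, the CLI way would be: detach the original member so the in-use spare becomes a permanent member of that mirror, then add the new disks back as spares. Something like this, with the new-disk gptids as placeholders:)
Code:
# Detach the original members; the in-use spares then stay in mirror-5 and mirror-6:
zpool detach PrimaryPool gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76
zpool detach PrimaryPool gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76
# Add the two new disks back into the pool as hot spares (placeholder gptids):
zpool add PrimaryPool spare gptid/NEW-DISK-1 gptid/NEW-DISK-2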
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Ok, so this has been running since yesterday at 8:18 AM, and the resilver is nowhere near done.
I've replaced drives before, and it usually takes nowhere near this amount of time... and it's estimating 10 DAYS?

Is this concerning?

RESILVER
Status: SCANNING
Completed: 08.65%
Time Remaining: 10 days, 1 hour, 28 minutes, 16 seconds
Errors: 0

Code:
# zpool status -v
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Aug 16 08:02:32 2023
        1.68T scanned at 21.3M/s, 1.68T issued at 21.3M/s, 19.4T total
        644G resilvered, 8.65% done, 10 days 01:33:28 to go
config:

        NAME                                                STATE     READ WRITE CKSUM
        PrimaryPool                                         DEGRADED     0     0     0
          mirror-0                                          ONLINE       0     0     0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
          mirror-1                                          ONLINE       0     0     0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
          mirror-2                                          ONLINE       0     0     0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
          mirror-3                                          ONLINE       0     0     0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2      ONLINE       0     0     0
          mirror-4                                          ONLINE       0     0     0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2      ONLINE       0     0     0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2      ONLINE       0     0     0
          mirror-5                                          DEGRADED     0     0     0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76      ONLINE       0     0     0
            spare-1                                         DEGRADED     0     0     0
              replacing-0                                   DEGRADED     0     0     0
                gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76  OFFLINE      0     0     0
                gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  ONLINE       0     0     0  (resilvering)
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76    ONLINE       0     0     0
          mirror-6                                          DEGRADED     0     0     0
            gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76      ONLINE       0     0     0
            spare-1                                         DEGRADED     0     0     0
              replacing-0                                   DEGRADED     0     0     0
                gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  OFFLINE      0     0     0
                gptid/c774316e-3c2c-11ee-96af-ac1f6be66d76  ONLINE       0     0     0  (resilvering)
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76    ONLINE       0     0     0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76        INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76        INUSE     currently in use

errors: No known data errors


Code:

root@hinata[~]# zpool iostat -v
                                                       capacity     operations     bandwidth
pool                                                 alloc   free   read  write   read  write
---------------------------------------------------  -----  -----  -----  -----  -----  -----
PrimaryPool                                          19.5T  5.86T    215    139  6.50M  10.4M
  mirror-0                                           3.59T  37.0G     12      5   525K   641K
    gptid/d7476d46-32ca-11ec-b815-002590f52cc2           -      -      6      2   262K   321K
    gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2           -      -      6      2   263K   321K
  mirror-1                                           3.59T  36.6G     12      6   525K   628K
    gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2           -      -      6      3   262K   314K
    gptid/db71bcb5-32ca-11ec-b815-002590f52cc2           -      -      6      3   263K   314K
  mirror-2                                           3.59T  37.1G     10      5   520K   618K
    gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2           -      -      5      2   260K   309K
    gptid/d96847a9-32ca-11ec-b815-002590f52cc2           -      -      5      2   260K   309K
  mirror-3                                           3.59T  35.9G     11      6   513K   622K
    gptid/d9fb7757-32ca-11ec-b815-002590f52cc2           -      -      5      3   256K   311K
    gptid/da1e1121-32ca-11ec-b815-002590f52cc2           -      -      5      3   257K   311K
  mirror-4                                           2.16T  1.46T     34     10   887K  1.22M
    gptid/9fd0872d-8f64-11ec-8462-002590f52cc2           -      -     17      5   443K   622K
    gptid/9ff0f041-8f64-11ec-8462-002590f52cc2           -      -     16      5   443K   622K
  mirror-5                                           1.49T  2.14T     65     38  1.75M  3.46M
    gptid/14811777-1b6d-11ed-8423-ac1f6be66d76           -      -     36      5   850K   764K
    spare-1                                              -      -     28     33   944K  2.72M
      replacing-0                                        -      -      6     89   201K  6.13M
        gptid/03daa071-505c-11ed-a9fe-ac1f6be66d76       -      -      2      3  58.8K   359K
        gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76       -      -      0     79  17.8K  5.04M
      gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76         -      -     26      5   880K   764K
  mirror-6                                           1.51T  2.11T     67     66  1.85M  3.23M
    gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76           -      -     38      5   918K   636K
    spare-1                                              -      -     29     60   973K  2.61M
      replacing-0                                        -      -      6    171   172K  6.13M
        gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76       -      -      1      3  50.8K   278K
        gptid/c774316e-3c2c-11ee-96af-ac1f6be66d76
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
the spares should detach automatically after it resilvers successfully.
That is not a thing that ever happens... don't hold your breath.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I've replaced drives before, and it usually takes nowhere near this amount of time... and it's estimating 10 DAYS?

Is this concerning?
Maybe not, since the estimation starts far from reality and gradually approaches it as the counter goes down (usually faster than real time if you watch it).

But...

you're running multiple resilvers, so there's that... the pool is doing double duty, reading the surviving copies to rebuild 2 drives at once, so it will certainly be slower than a single drive replacement... maybe more than double the time due to the doubled resource drain.
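If you want to keep an eye on it without staring at the GUI, something like this from the shell (the zpool wait subcommand assumes a reasonably recent OpenZFS):
Code:
# Show the current scan line; the speed and ETA normally improve as it goes:
zpool status PrimaryPool | grep -A 2 'scan:'
# Optionally block until all resilvering finishes (OpenZFS 2.0+):
zpool wait -t resilver PrimaryPool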
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
That is not a thing that ever happens... don't hold your breath.
Them NOT detaching never happens?
Or them detaching never happens?
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Maybe not, since the estimation starts far from reality and gradually approaches it as the counter goes down (usually faster than real time if you watch it).

But...

you're running multiple resilvers, so there's that... the pool is doing double duty, reading the surviving copies to rebuild 2 drives at once, so it will certainly be slower than a single drive replacement... maybe more than double the time due to the doubled resource drain.
I figured both of these.
Although it was at about 7% five hours ago, and I remember it being at about 3% around 9 or 10 AM...
which is what's kind of scaring me. It just seems like way too long a time.

Resilver priority is disabled as well. Would enabling it before I go to bed tonight, and setting it for overnight hours, help? Or just make things worse? The system isn't used much by then, at least not actively, as I am the only user.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Just went ahead and replaced the two. It is resilvering now.
But now that I think about it, it seems it'd have made more sense to promote the two spares into that vdev, and then simply add the two new drives as new spares. No?
Is there any way to revert the resilver and do this instead? It seems a bit safer.
I'm figuring it's too late, though.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
They won't automatically detach.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
They won't automatically detach.
I thought I recalled them doing so in the past, but I guess I'm wrong.
Maybe I'm just wasting my time resilvering new drives here then. :smile:
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I think your time issue is resilvering two drives at the same time as previously mentioned. You should just wait it out and let them finish.

As for resilver priority, if you are not accessing the data all the time, then the priority setting should not affect this; at least that is my understanding. If you had the system serving up files often, then it would take effect there.
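If you want to poke at it from the shell, this is the sort of knob involved on a FreeBSD-based system, though the exact tunable names vary by OpenZFS version, so check yours before changing anything:
Code:
# Milliseconds per txg that ZFS spends on resilver I/O (higher = more aggressive):
sysctl vfs.zfs.resilver_min_time_ms
# Example of nudging it up while the resilver runs (revert it afterwards):
sysctl vfs.zfs.resilver_min_time_ms=5000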
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I think your time issue is resilvering two drives at the same time as previously mentioned. You should just wait it out and let them finish.
Ok, just nervous because that seems like a long period of time and I know resilvering is very stressful on the drives so I just get scared that the other drive might fail lol
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Ok, just nervous because that seems like a long period of time and I know resilvering is very stressful on the drives so I just get scared that the other drive might fail lol
Unfortunately that is a real concern in most situations. The good thing is you are only stressing the drives in the specific mirror, as I understand it. And you already have two good drives (the original and a spare), and you are reading from both of those drives to create the new mirror for that set. So you should not be as concerned, since you have a good pair even before the resilvering is complete. If I have this wrong, I know someone will chime in; I am not the resilvering guru.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The long time might be because of a CPU cap. Looks at your specs. Nah, unless resilvering is single-threaded.
 