another slow resilver post

outofnames

Cadet
Joined
Jan 1, 2023
Messages
4
So I got an offline sector count increase on Tuesday 12-27-22, and knowing I had another empty drive sitting in the system, I started a resilver onto it. It chugged along at 120 MB/s reading and writing for about an hour, then dropped to a few KB/s a couple of hours later. With no real progress being made, I did a Google search and found someone recommending (to someone else) a server restart to get it back up to speed. Before restarting, zpool status said about 12% done; after restarting it was back up to 150 MB/s, and a few more hours later I was at 35%. But then I noticed it had slowed back down to a few KB/s again. I rebooted again and it went exactly the same way as before: full speed for a while, then a drop in speed.

On further investigation, I found I had used the wrong drive, not the one I intended to use. I don't think it's SMR, but I haven't used these drives much (not really sure why it was in the machine). Can I offline the drive and make it pull the data from the other two drives? Or, if I could stop the resilvering, I could restart it with a better drive.

The pool is 3 RAIDZ1 vdevs (I know it's not good) of 3 {HGST HUS726060ALA640} each; the drive that's replacing one of them is a {WD WD6001F4PZ-49CWHM0}.

The machine is a Dell R610 with an Intel X5650 2.67GHz and 110 GB DDR3 ECC, with an MD1200 to hold the discs. Can't recall the HBA.

Running TrueNAS SCALE 22.12.0.


Here's the output of zpool status:

Code:
 pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:07:17 with 0 errors on Wed Dec 28 05:52:19 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdl3      ONLINE       0     0     0

errors: No known data errors

  pool: nastyv3
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 27 22:51:29 2022
        20.1T scanned at 28.9G/s, 16.1T issued at 23.2G/s, 20.3T total
        8.52G resilvered, 79.14% done, 00:03:07 to go
config:

        NAME                                        STATE     READ WRITE CKSUM
        nastyv3                                     ONLINE       0     0     0
          raidz1-0                                  ONLINE       0     0     0
            effaadfe-c15f-4d2d-9ab2-e59f7966b330    ONLINE       0     0     0
            7d0b34db-f52e-42a7-8202-d8ebac0eecf8    ONLINE       0     0     0
            0d3ad415-ee0e-4352-aae9-fded352cc641    ONLINE       0     0     0
          raidz1-1                                  ONLINE       0     0     0
            e1954a4e-8a4f-40c5-a3d0-c89784cfd04f    ONLINE       0     0     0
            3a2c19bf-482a-47f4-962a-8217496f316b    ONLINE       0     0     0
            05af7f6e-f4d6-472e-adb0-100f8abf3fdb    ONLINE       0     0     0
          raidz1-2                                  ONLINE       0     0     0
            c6360361-d97a-4c09-a7bb-8c1180751d2d    ONLINE       0     0     0
            replacing-1                             ONLINE       0     0     0
              88981ba5-21bc-498e-8e1c-63dfe8ffc61b  ONLINE       0     0     0
              782eba58-b3cf-48a9-b2ad-31160a0bd071  ONLINE       0     0     0  (resilvering)
            03398c7d-4876-452a-bb8e-987b601dc6aa    ONLINE       0     0     0
        cache
          875ccb9d-3c33-4aee-bfbc-20da16489905      ONLINE       0     0     0
        spares
          b0443905-75ee-4f9a-b36b-20c06f0698fe      AVAIL   

errors: No known data errors
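
Since the later replies point out that disk throughput is a poor gauge of a resilver, one option is to track the "% done" figure from this output instead. Below is a minimal Python sketch, assuming the scan section keeps the wording shown above; `parse_resilver_progress` and `SAMPLE` are hypothetical names for illustration, not part of any ZFS tooling.

```python
import re

# Scan section copied from the `zpool status` output in this post.
SAMPLE = """\
  scan: resilver in progress since Tue Dec 27 22:51:29 2022
        20.1T scanned at 28.9G/s, 16.1T issued at 23.2G/s, 20.3T total
        8.52G resilvered, 79.14% done, 00:03:07 to go
"""


def parse_resilver_progress(status_text):
    """Return (amount_resilvered, percent_done), or None if no resilver line is found.

    Assumes the "<size> resilvered, <pct>% done" wording seen in the sample.
    """
    match = re.search(r"(\S+) resilvered, ([\d.]+)% done", status_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


print(parse_resilver_progress(SAMPLE))
```

Sampling this value every so often and comparing successive readings gives a progress trend that does not depend on per-disk MB/s figures.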
 

outofnames

Cadet
Joined
Jan 1, 2023
Messages
4
Just in case someone else runs into this issue: I offlined the bad drive and it is now back to 150 MB/s. So we shall see.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Just to note that you can't measure a resilver by throughput of one or more disks.

While it works through metadata, a resilver can be very IOPS-heavy, so it moves hardly any data yet takes a long time (remember that write IOPS on a single HDD will be 100-300 at best).
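
To put rough numbers on that, here is a small Python sketch of the arithmetic. The 100-300 IOPS range comes from the paragraph above; the 4 KiB and 128 KiB I/O sizes are assumptions picked only for illustration, and `throughput_kib_per_s` is a hypothetical helper name.

```python
def throughput_kib_per_s(iops, io_size_kib):
    """Approximate throughput when each I/O is a separate operation of the given size."""
    return iops * io_size_kib


# IOPS range taken from the post; I/O sizes are illustrative assumptions.
for iops in (100, 300):
    for io_size_kib in (4, 128):
        rate = throughput_kib_per_s(iops, io_size_kib)
        print(f"{iops} IOPS x {io_size_kib} KiB = {rate} KiB/s ({rate / 1024:.1f} MiB/s)")
```

Under these assumptions, small metadata-sized I/O lands in the hundreds of KiB/s even on a healthy disk, which is why a low MB/s reading alone does not prove the resilver has stalled.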

Just be patient (I think you've probably made it start from scratch 3 times already by trying to make it go faster).
 

outofnames

Cadet
Joined
Jan 1, 2023
Messages
4
Maybe so, but it was strange. One thing to note: almost instantly after I offlined the old HDD, it went back up to full speed and finished in less than an hour, whereas before I offlined the drive it was saying it would take 3+ days. Also, I didn't include this in the original post because I couldn't find it in the log, but at one point the disk said something about not being able to be read. Thanks for responding!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If there's a "failing" drive in the mix (but somehow not bad enough to be faulted), you can indeed see a much more difficult situation... as the system knows the "failing" drive has the data it wants, it won't calculate it from parity, but reads may be very slow if the disk is struggling to get to the sectors in question.

So as you saw, offlining the "failing" drive forced the pool to seek that data elsewhere, which was much faster in that case.
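
For anyone wanting to script that step, here is a hedged Python sketch that only builds (and by default just prints) a `zpool offline <pool> <device>` command. The pool name comes from the output earlier in the thread; the device ID is assumed to be the old member of `replacing-1`, which is an inference and not something the thread confirms. `offline_device` is a hypothetical helper, and on TrueNAS the web UI is normally the safer route.

```python
import subprocess


def offline_device(pool, device, dry_run=True):
    """Build a `zpool offline` command; run it only when dry_run is False."""
    cmd = ["zpool", "offline", pool, device]
    if dry_run:
        # Default: show what would be executed without touching the pool.
        print("would run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Assumption: this ID is the struggling old disk listed under replacing-1.
offline_device("nastyv3", "88981ba5-21bc-498e-8e1c-63dfe8ffc61b")
```

Keeping `dry_run=True` as the default means a copy-paste of this sketch cannot take a disk out of a pool by accident.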
 