Continuous Resilvering after data cleanup

midnite_tank

Cadet
Joined
Dec 28, 2022
Messages
2
Hi all,

A couple of days ago I cleared out a large amount of archived footage from my pool, end of the year cleanup essentially. Since then, my pool has been continuously running resilvering scans. When a scan completes it waits for a little bit, usually 5-10 mins, then triggers another one. According to zpool status -v, there are no data errors and all drives are online and healthy. The status of the pool itself is also online and healthy. The resilvering is preventing my weekly scrub from executing properly. Is this something I need to be concerned about? Is there some way to stop it from resilvering?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
there are no data errors and all drives are online and healthy.
Well that's clearly BS... ZFS won't resilver anything if that's the case. (OK, maybe there are no data errors...)

The status of the pool itself is also online and healthy.
Maybe in-between the resilvers, it might be.

The resilvering is preventing my weekly scrub from executing properly
Forget about scrubs until you sort out the erroneous resilvers.

Is this something I need to be concerned about? Is there some way to stop it from resilvering?
Yes. Find and fix the issue causing the resilvers.

Must either be a system/hardware issue (I note you have supplied none of the required information to the forum) or a cabling/disk issue (or both).

Have a look at dmesg and share your hardware details with us and maybe we can suggest where to look next.
 

midnite_tank

Maybe in-between the resilvers, it might be.
Nope, been keeping an eye on it all day. In between and during the resilvers the pool status has remained healthy and no alerts about degradation.

Output of zpool status -v

Code:
pool: Omniverse
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec 28 13:59:22 2022
    85.8T scanned at 4.26G/s, 85.3T issued at 4.24G/s, 107T total
    99.7G resilvered, 79.75% done, 01:27:11 to go
config:


    NAME                                            STATE     READ WRITE CKSUM
    Omniverse                                       ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/1f60dcd0-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0
        gptid/1f98fade-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0
        gptid/1fc2be7c-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0
      raidz1-1                                      ONLINE       0     0     0
        gptid/1db213d9-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0
        gptid/1dcd96a9-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0  (resilvering)
        gptid/1f1a1d6b-b894-11ec-b41c-a8a1598d2bd3  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/a170c8dd-2b6a-11ed-8998-80615f0ee42e  ONLINE       0     0     0
        gptid/a6369d8f-2ada-11ed-b431-80615f0ee42e  ONLINE       0     0     0
        gptid/960c7633-2a35-11ed-9e23-80615f0ee42e  ONLINE       0     0     0
    logs   
      gptid/fdfe0d8a-2cb3-11ed-8ca9-80615f0ee42e    ONLINE       0     0     0
    cache
      gptid/fea60ab1-2cb3-11ed-8ca9-80615f0ee42e    ONLINE       0     0     0


errors: No known data errors


  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:15 with 0 errors on Fri Dec 23 03:45:15 2022
config:


    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      ada4p2    ONLINE       0     0     0


errors: No known data errors


Drive info from dmesg
Code:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada0: Serial Number 9MHABMDA
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 13351936MB (27344764928 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD160EDGZ-11B2DA0 85.00A85> ACS-4 ATA SATA 3.x device
ada1: Serial Number 2BK8420N
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 15259648MB (31251759104 512 byte sectors)
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <Samsung SSD 850 EVO 500GB EMT02B6Q> ACS-2 ATA SATA 3.x device
ada2: Serial Number S21HNXAG873724F
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada2: Command Queueing enabled
ada2: 476940MB (976773168 512 byte sectors)
ada2: quirks=0x3<4K,NCQ_TRIM_BROKEN>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada3: Serial Number 9LHA1XRG
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 13351936MB (27344764928 512 byte sectors)
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4: <CT1000MX500SSD4 M3CR023> ACS-3 ATA SATA 3.x device
ada4: Serial Number 2009E28EC298
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 953869MB (1953525168 512 byte sectors)
ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
ada5: <WDC WD160EDGZ-11B2DA0 85.00A85> ACS-4 ATA SATA 3.x device
ada5: Serial Number 3XG3537U
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 15259648MB (31251759104 512 byte sectors)
ada6 at ahcich8 bus 0 scbus8 target 0 lun 0
ada6: <WDC WD160EDGZ-11B2DA0 85.00A85> ACS-4 ATA SATA 3.x device
ada6: Serial Number 3XGMLZMU
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 15259648MB (31251759104 512 byte sectors)
ada7 at ahcich9 bus 0 scbus9 target 0 lun 0
ada7: <Samsung SSD 850 EVO 500GB EMT03B6Q> ACS-2 ATA SATA 3.x device
ada7: Serial Number S3PTNF0JA03371D
ada7: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada7: Command Queueing enabled
ada7: 476940MB (976773168 512 byte sectors)
ada7: quirks=0x3<4K,NCQ_TRIM_BROKEN>
ada8 at ahcich10 bus 0 scbus10 target 0 lun 0
ada8: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada8: Serial Number Y6GULK7C
ada8: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 13351936MB (27344764928 512 byte sectors)
ada9 at ahcich11 bus 0 scbus11 target 0 lun 0
ada9: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada9: Serial Number Y6GGMZLD
ada9: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 13351936MB (27344764928 512 byte sectors)
ada10 at ahcich12 bus 0 scbus12 target 0 lun 0
ada10: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada10: Serial Number Y6G40STC
ada10: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada10: Command Queueing enabled
ada10: 13351936MB (27344764928 512 byte sectors)
ada11 at ahcich13 bus 0 scbus13 target 0 lun 0
ada11: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada11: Serial Number 9MGYE8GJ
ada11: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada11: Command Queueing enabled
ada11: 13351936MB (27344764928 512 byte sectors)


Looked at dmesg and don't see anything out of the ordinary except maybe this? It's repeated over and over again.
Code:
ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> s/n 9MHABMDA detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada0: Serial Number 9MHABMDA
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 13351936MB (27344764928 512 byte sectors)
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Looked at dmesg and don't see anything out of the ordinary except maybe this? It's repeated over and over again.
That would indicate that ada0 is "going away" and coming back.
Power issue or drive broken.
Is ada0 the drive that is being resilvered in zpool status?
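
To see how often the drive is dropping out, and whether any other disk is doing the same, the detach events can be counted per device. Here is a minimal Python sketch, assuming the dmesg output has been saved as text. The name count_detaches and the regular expression are illustrative assumptions, and the sample is just the excerpt posted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> s/n 9MHABMDA detached
DETACH_RE = re.compile(r"^(?P<dev>[a-z]+\d+): <.*> s/n (?P<serial>\S+) detached\s*$")


def count_detaches(dmesg_text):
    """Return a Counter mapping (device, serial) to the number of detach events."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = DETACH_RE.match(line.strip())
        if match:
            counts[(match.group("dev"), match.group("serial"))] += 1
    return counts


# Sample taken from the dmesg excerpt posted in this thread.
SAMPLE = """\
ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> s/n 9MHABMDA detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> ACS-2 ATA SATA 3.x device
ada0: Serial Number 9MHABMDA
"""

if __name__ == "__main__":
    for (dev, serial), n in count_detaches(SAMPLE).items():
        print(f"{dev} (s/n {serial}) detached {n} time(s)")
```

Replacing SAMPLE with the full saved dmesg text should show how many times each device detached. If only ada0 shows up, that points at that one disk, its cable or its port rather than something shared.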
 

sretalla

In between and during the resilvers the pool status has remained healthy and no alerts about degradation.
That's what I said.

ada0: <WDC WD140EDGZ-11B1PA0 85.00A85> s/n 9MHABMDA detached (ada0:ahcich0:0:0:0): Periph destroyed
As mentioned already, this means you have a problem with ada0 and I'm willing to bet that this is at least a part of your resilvering issue.

You need to work out why this is happening ASAP. It may be a bad SATA port or cable, the disk itself may be no good, or it may be an early sign of SATA controller/HBA issues. (We still don't have a good idea about your hardware, so you're going to be mostly on your own here until you share.)

I would not be very surprised to find that the output of glabel status shows that:
ada0p2 equates to gptid/1dcd96a9-b894-11ec-b41c-a8a1598d2bd3
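
One way to check that guess is to look up the resilvering gptid in the glabel status output. This is a rough Python sketch, assuming the usual three-column layout (Name, Status, Components). The function name parse_glabel_status is made up for illustration, and the sample text simply hard-codes the guess above; it is not output from the poster's system.

```python
def parse_glabel_status(text):
    """Map each label name (e.g. 'gptid/...') to its component (e.g. 'ada0p2')."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Name Status Components' header row.
        if len(fields) != 3 or fields[0] == "Name":
            continue
        name, _status, component = fields
        mapping[name] = component
    return mapping


# The member marked "(resilvering)" in the zpool status output posted earlier.
RESILVERING = "gptid/1dcd96a9-b894-11ec-b41c-a8a1598d2bd3"

# HYPOTHETICAL glabel status output: it encodes the guess, not real data.
HYPOTHETICAL_GLABEL = """\
                                      Name  Status  Components
gptid/1dcd96a9-b894-11ec-b41c-a8a1598d2bd3     N/A  ada0p2
"""

if __name__ == "__main__":
    labels = parse_glabel_status(HYPOTHETICAL_GLABEL)
    print(RESILVERING, "->", labels.get(RESILVERING, "not found"))
```

With the real glabel status output pasted in place of the hypothetical text, a result of ada0p2 would tie the resilvering member to the drive that keeps detaching.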
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just to chime in and agree with my fellow forum members, ada0 appears to be the problem here.

If you perform the steps requested above and the problem does not go away, please follow the forum rules (posted in red at the top of every forum page) and post your system specs and the version of TrueNAS you are running (we know it's Core based on the drive being ada0). Also post the output from glabel status and from smartctl -a /dev/ada0; this should give us enough information to suggest the next step. And please post all the data from those commands; do not assume something is not important. It might not be, but it gives us a complete picture, and sometimes a person will remove a critical piece and we have to ask for it again.
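
To make it easier to post everything unedited, the command outputs can be gathered into one block of text. The following Python sketch is only one way to do it, assuming Python is available on the host and the script is run as root. The names run_command and build_report are hypothetical, and the command list is just the commands mentioned in this thread.

```python
import subprocess

# Commands requested in this thread.
COMMANDS = [
    ["zpool", "status", "-v"],
    ["glabel", "status"],
    ["smartctl", "-a", "/dev/ada0"],
]


def run_command(argv):
    """Run one command and return its complete output (stdout plus stderr)."""
    try:
        # check=False: a non-zero exit status should not discard the output.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"(could not run {argv[0]}: {exc})"
    return result.stdout + result.stderr


def build_report(commands, runner=run_command):
    """Join each command line and its full output into one text report."""
    sections = []
    for argv in commands:
        header = "$ " + " ".join(argv)
        sections.append(header + "\n" + runner(argv).rstrip("\n"))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(build_report(COMMANDS), end="")
```

The runner argument is there so the formatting can be tried without touching real disks. Nothing is filtered or trimmed, so the report can be pasted into a Code block as it is.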
 