Resilvering for the last 1.5 months, 3 months to go.

dave_sp

Cadet
Joined
Aug 19, 2020
Messages
2
I replaced a bad hard drive in my 4-drive setup. FreeNAS started a resilver on July 6, and it has been running ever since. This seems extremely slow. I did some reading and verified that my ashift is set to 12. I am looking for ideas on what to change or check.

Code:
root@freenas:~ # zpool status zorro
  pool: zorro
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jul  6 11:05:15 2020
    3.53T scanned at 5.03M/s, 2.61T issued at 1.80M/s, 7.89T total
    598G resilvered, 33.09% done, no estimated completion time
config:

    NAME                                              STATE     READ WRITE CKSUM
    zorro                                             DEGRADED     0     0     0
      raidz2-0                                        DEGRADED     0     0     0
        gptid/acc8eb5c-0d00-11e7-b3b2-d05099c09d74    ONLINE       0     0     0
        gptid/ada9f484-0d00-11e7-b3b2-d05099c09d74    ONLINE       0     0     0
        gptid/ae7fe21d-0d00-11e7-b3b2-d05099c09d74    ONLINE       0     0     0
        replacing-3                                   DEGRADED     0     0     0
          3028245828089523359                         UNAVAIL      0     0     0  was /dev/gptid/af1375a0-0d00-11e7-b3b2-d05099c09d74
          gptid/b66e4460-bf91-11ea-877d-d05099c38b69  ONLINE       0     0     0

errors: No known data errors
root@freenas:~ #
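
As a rough sanity check on the "3 months to go" in the title, the progress figures in the status output above can be extrapolated linearly. This is only a sketch: the capture date is an assumption (the output does not say when it was taken, so mid-August 2020 is used here), `remaining_days` is a hypothetical helper, and a resilver does not necessarily keep a constant pace.

```python
from datetime import datetime

# Figures copied from the `zpool status` output above.
started = datetime(2020, 7, 6, 11, 5, 15)  # "resilver in progress since Mon Jul  6 11:05:15 2020"
fraction_done = 0.3309                      # "33.09% done"

# ASSUMPTION: the status was captured around Aug 19, 2020.
# The real capture time is not part of the output.
observed = datetime(2020, 8, 19, 12, 0, 0)


def remaining_days(started, observed, fraction_done):
    """Linear extrapolation: assume the rest runs at the average pace so far.

    Returns (elapsed_days, remaining_days).
    """
    elapsed = (observed - started).total_seconds() / 86400
    total = elapsed / fraction_done
    return elapsed, total - elapsed


elapsed, remaining = remaining_days(started, observed, fraction_done)
print(f"elapsed: {elapsed:.0f} days, estimated remaining: {remaining:.0f} days")
```

Under that assumed date, this gives roughly 44 days elapsed and about 89 days remaining, which lines up with "1.5 months in, 3 months to go".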


Code:
root@freenas:~ # zdb -U /data/zfs/zpool.cache
zorro:
    version: 5000
    name: 'zorro'
    state: 0
    txg: 15370820
    pool_guid: 11217001964387628572
    hostid: 1573268323
    hostname: ''
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 11217001964387628572
        children[0]:
            type: 'raidz'
            id: 0
            guid: 11208868727535434611
            nparity: 2
            metaslab_array: 41
            metaslab_shift: 36
            ashift: 12
            asize: 11993752797184
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 36
            children[0]:
                type: 'disk'
                id: 0
                guid: 17062546436440342488
                path: '/dev/gptid/acc8eb5c-0d00-11e7-b3b2-d05099c09d74'
                DTL: 523
                create_txg: 4
                com.delphix:vdev_zap_leaf: 37
            children[1]:
                type: 'disk'
                id: 1
                guid: 6998765710189217916
                path: '/dev/gptid/ada9f484-0d00-11e7-b3b2-d05099c09d74'
                DTL: 522
                create_txg: 4
                com.delphix:vdev_zap_leaf: 38
            children[2]:
                type: 'disk'
                id: 2
                guid: 1629230559443109953
                path: '/dev/gptid/ae7fe21d-0d00-11e7-b3b2-d05099c09d74'
                DTL: 521
                create_txg: 4
                com.delphix:vdev_zap_leaf: 39
            children[3]:
                type: 'replacing'
                id: 3
                guid: 5583997063394720348
                whole_disk: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 3028245828089523359
                    path: '/dev/gptid/af1375a0-0d00-11e7-b3b2-d05099c09d74'
                    not_present: 1
                    DTL: 520
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 40
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 10264237243552341098
                    path: '/dev/gptid/b66e4460-bf91-11ea-877d-d05099c38b69'
                    DTL: 43
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 42
                    resilver_txg: 15329553
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
root@freenas:~ #
 
Jurgen Segaert

Joined
Jul 10, 2016
Messages
521
What type of disks do you have? WD Red SMR by any chance?

 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Welcome to the forums.

While there are a lot of helpful people here, please note that asking vague questions will usually result in low-quality, less helpful answers.

For technical support, we ask that you provide reasonably detailed information on your hardware platform, configuration, and the version of software you are running. There are specific suggestions in the Forum Rules, conveniently linked at the top of every page in red, as to useful things to describe.

The usual culprit for slow resilvers has been SMR drives lately, though the timeframe you describe seems excessive even for SMR. If you are also unfortunate enough to have done anything bad on your RAIDZ, like running databases, VM block storage, or torrents, or storing millions of small files, this could be the bonus multiplier that is killing you. RAIDZ is best suited for the storage of large files, written sequentially and not rewritten or overwritten; think "archival purposes".

If you're not familiar with what "SMR" is, Ars Technica has been doing a fine job of covering the topic.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That is definitely unfortunate and I am sorry to hear it. As you can maybe imagine, this is disheartening for those of us on the forum as well, as we've been running into this again and again. You will notice that @Jurgen Segaert didn't even ask for the details of your system before suggesting it.
 