SMB Share performance went downhill fast

JohnFLi

Contributor
Joined
Sep 26, 2016
Messages
139
I have 4 separate SMB shares on my FreeNAS box. They had been working great with ZERO issues until last week. (No changes had been made to FreeNAS.)
3 shares continue to work great; 1 is being a pain. Via Windows Explorer, I can browse through the share in question, and sometimes it works without issue until I try to delete something... then it sits until it eventually times out.
I have a program that does file backups and writes to the same share. Some of the backups work fine; others get an error saying the share doesn't exist, or that it cannot be written to.
I have read through some of the other postings about SMB issues, and to be honest, they just don't make sense to me.

In the rolling log at the bottom of the GUI, I do see this kernel message: "Failed to fully fault in a core file segment at VA 0x81a6af000 with size 0x89000 to be written at offset 0xb00000 for process smbd"

Under VFS Objects, 'zfs_space', 'zfsacl' and 'streams_xattr' are selected for all my SMB shares.


Any ideas why 1 share would just kinda go to poo?



  • motherboard make and model ------------ SuperMicro X10DRL
  • CPU make and model ----------------- Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
  • RAM quantity -----------------------65,392MB
  • hard drives, quantity, model numbers, and RAID configuration --------- 39, WD101KRYZ , some raidz2, some raidz1, some striped
  • hard disk controllers ---------------- 2x Rocket 750
  • network cards ------------ x540T2BLK
  • chassis -------------------- XL60 turbo
 

JohnFLi

This came in via email the day after I restarted the system:

messages:
> SMP: AP CPU #11 Launched!
> SMP: AP CPU #18 Launched!
> SMP: AP CPU #19 Launched!
> SMP: AP CPU #30 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #24 Launched!
> SMP: AP CPU #29 Launched!
> SMP: AP CPU #26 Launched!
> SMP: AP CPU #22 Launched!
> SMP: AP CPU #16 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #13 Launched!
> SMP: AP CPU #14 Launched!
> SMP: AP CPU #27 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #10 Launched!
> SMP: AP CPU #25 Launched!
> SMP: AP CPU #28 Launched!
> SMP: AP CPU #4 Launched!
> SMP: AP CPU #21 Launched!
> SMP: AP CPU #17 Launched!
> Timecounter "TSC" frequency 2100039012 Hz quality 1000
> uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
> uhub1: 2 ports with 2 removable, self powered
> uhub3 numa-domain 0 on uhub2
> uhub3: <vendor 0x8087 product 0x8002, class 9/0, rev 2.00/0.05, addr 2> on usbus2
> ugen1.2: <vendor 0x8087 product 0x800a> at usbus1
> uhub4 numa-domain 0 on uhub1
> uhub4: <vendor 0x8087 product 0x800a, class 9/0, rev 2.00/0.05, addr 2> on usbus1
> uhub4: 6 ports with 6 removable, self powered
> uhub3: 8 ports with 8 removable, self powered
> da0: da4: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> da0: Serial Number 7PG70NYR
> da0: 9537536MB (19532873728 512 byte sectors)
> da6 at hptnr0 bus 0 scbus11 target 6 lun 0
> da1 at hptnr0 bus 0 scbus11 target 1 lun 0
> da6: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da6: Serial Number 7PGGUL3G
> da6: 9537536MB (19532873728 512 byte sectors)
> da12 at hptnr0 bus 0 scbus11 target 12 lun 0
> da12: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> ada0: <SanDisk SD8SBAT128G1122 Z2333000> ACS-2 ATA SATA 3.x device
> ada0: Serial Number 162614405539
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> da10: 9537536MB (19532873728 512 byte sectors)
> da16 at hptnr0 bus 0 scbus11 target 16 lun 0
> ada1: <SanDisk SD8SBAT128G1122 Z2333000> ACS-2 ATA SATA 3.x device
> ada1: Serial Number 162614405494
> ada1: 600.000MB/s transfersda12: Serial Number 7PGHUJZG
> da12: 9537536MB (19532873728 512 byte sectors)
> da18 at hptnr0 bus 0 scbus11 target 18 lun 0
> da18: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da18: Serial Number 7PGHU3AG
> da3: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da3: Serial Number 7PGAN7GG
> da3: 9537536MB (19532873728 512 byte sectors)
> da9 at hptnr0 bus 0 scbus11 target 9 lun 0
> da16: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da20: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da20: Serial Number 7PGJ11UG
> da5: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da9: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da9: Serial Number 7PGJ01UG
> da9: 9537536MB (19532873728 512 byte sectors)
> da15 at hptnr0 bus 0 scbus11 target 15 lun 0
> (SATA 3.x, UDMA6, PIO 512bytes)
> ada1: Command Queueing enabled
> da15: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da1: da5: Serial Number 7PGHRX3G
> da5: 9537536MB (19532873728 512 byte sectors)
> da11 at hptnr0 bus 0 scbus11 target 11 lun 0
> da11: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da11: Serial Number 7PGH521G
> da16: Serial Number 7PGB800G
> da16: 9537536MB (19532873728 512 byte sectors)
> da22 at hptnr0 bus 0 scbus11 target 22 lun 0
> da22: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da22: Serial Number 7PG5KYLR
> da22: 9537536MB (19532873728 512 byte sectors)
> da28 at hptnr0 bus 0 scbus11 target 28 lun 0
> da28: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da28: Serial Number 7PJYJ5GC
> da28: 9537536MB (19532873728 512 byte sectors)
> da35 at hptnr1 bus 0 scbus12 target 3 lun 0
> da35: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da35: Serial Number 2TJ9BUND
> da35: 9537536MB (19532873728 512 byte sectors)
> da1: Serial Number 7PG7RPJR
> da1: 9537536MB (19532873728 512 byte sectors)
> da7 at hptnr0 bus 0 scbus11 target 7 lun 0
> da33: Serial Number 2TJ9667D
> da33: 9537536MB (19532873728 512 byte sectors)
> da39 at umass-sim0 bus 0 scbus13 target 0 lun 0
> da39: <Kingston DataTraveler 3.0 PMAP> Removable Direct Access SPC-4 SCSI device
> da30: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da7: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da7: Serial Number 7PGHUJJG
> da7: 9537536MB (19532873728 512 byte sectors)
> da13 at hptnr0 bus 0 scbus11 target 13 lun 0
> da13: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da13: Serial Number 7PGG469G
> da13: 9537536MB (19532873728 512 byte sectors)
> da19 at hptnr0 bus 0 scbus11 target 19 lun 0
> da11: 9537536MB (19532873728 512 byte sectors)
> da17 at hptnr0 bus 0 scbus11 target 17 lun 0
> da17: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da17: Serial Number 7PGHSUHG
> da17: 9537536MB (19532873728 512 byte sectors)
> da23 at hptnr0 bus 0 scbus11 target 23 lun 0
> da23: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da23: Serial Number 7PG7HTBR
> da23: 9537536MB (19532873728 512 byte sectors)
> da29 at hptnr0 bus 0 scbus11 target 29 lun 0
> da30: Serial Number 7PJYD7KC
> da38: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da38: Serial Number 2YHT4BDD
> da38: 9537536MB (19532873728 512 byte sectors)
> da30: 9537536MB (19532873728 512 byte sectors)
> da29: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da29: Serial Number 7PJYMASC
> da29: 9537536MB (19532873728 512 byte sectors)
> da15: Serial Number 7PGHU7RG
> da36 at hptnr1 bus 0 scbus12 target 4 lun 0
> da15: 9537536MB (19532873728 512 byte sectors)
> da21 at hptnr0 bus 0 scbus11 target 21 lun 0
> da21: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da21: Serial Number 7PGE84AG
> da21: 9537536MB (19532873728 512 byte sectors)
> da27 at hptnr0 bus 0 scbus11 target 27 lun 0
> da27: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da27: Serial Number 7PJYP78C
> da27: 9537536MB (19532873728 512 byte sectors)
> da34 at hptnr1 bus 0 scbus12 target 2 lun 0
> da37 at hptnr1 bus 0 scbus12 target 5 lun 0
> da37: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da37: Serial Number 2YHS6RND
> da37: 9537536MB (19532873728 512 byte sectors)
> da36: <ATA WDC WD101KRYZ-01 01.0> Fixed Direct Access SPC-3 SCSI device
> da36: Serial Number 2YHU2GPD
> da36: 9537536MB (19532873728 512 byte sectors)
> GEOM_MIRROR: Cancelling unmapped because of da33p1.
> GEOM_MIRROR: Cancelling unmapped because of da32p1.
> GEOM_MIRROR: Cancelling unmapped because of da30p1.
> GEOM_MIRROR: Cancelling unmapped because of da28p1.
> GEOM_MIRROR: Cancelling unmapped because of da26p1.
> Failed to fully fault in a core file segment at VA 0x81a6b9000 with size 0xa2000 to be written at offset 0x8b0a000 for process smbd
> Failed to fully fault in a core file segment at VA 0x81a772000 with size 0xb000 to be written at offset 0x8bc3000 for process smbd
> Failed to fully fault in a core file segment at VA 0x81cd10000 with size 0xbb000 to be written at offset 0x8de6000 for process smbd
> pid 14045 (smbd), uid 0: exited on signal 6 (core dumped)
> Failed to fully fault in a core file segment at VA 0x81a6af000 with size 0x89000 to be written at offset 0x8b00000 for process smbd
> Failed to fully fault in a core file segment at VA 0x81a738000 with size 0xa2000 to be written at offset 0x8b89000 for process smbd
> Failed to fully fault in a core file segment at VA 0x81cd10000 with size 0xbb000 to be written at offset 0x8e62000 for process smbd
> pid 14443 (smbd), uid 0: exited on signal 6 (core dumped)
> Failed to fully fault in a core file segment at VA 0x81a6b9000 with size 0xa2000 to be written at offset 0x8b09000 for process smbd
> Failed to fully fault in a core file segment at VA 0x81d316000 with size 0xbb000 to be written at offset 0x8de8000 for process smbd
> pid 12013 (smbd), uid 0: exited on signal 6 (core dumped)
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Do you run scheduled SMART tests on your disks? If not, it's a good idea to run the short and long tests regularly -- I run the short tests daily and the long tests weekly.

Judging from the message text you posted above, it looks like you may have problems with drives da26, da28, da30, da32, and/or da33.

To find out if this is the case, examine the smartctl program output for these drives and see if any of them have bad sectors or other indications of failure.

EDIT: Also, please show us the output of zpool status, preferably enclosed in code tags (for better readability).
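
For anyone who wants to script that check, here is a minimal Python sketch, not a FreeNAS tool. It assumes the attribute table layout that smartctl -A prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The names flag_attributes, WATCHED_RAW and SAMPLE are my own illustration. It applies two rules: a normalized VALUE at or below a non-zero THRESH counts as a failing attribute, and a non-zero raw count of reallocated, pending or uncorrectable sectors deserves a closer look.

```python
import re

# Attributes where a non-zero RAW_VALUE usually points at media trouble.
WATCHED_RAW = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# ID  NAME  FLAG  VALUE  WORST  THRESH  TYPE  UPDATED  WHEN_FAILED  RAW
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)


def flag_attributes(smartctl_text):
    """Return (attribute_name, reason) pairs for rows that look suspicious."""
    findings = []
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines, blank lines, other sections
        _id, name, value, _worst, thresh, _when_failed, raw = m.groups()
        value, thresh, raw = int(value), int(thresh), int(raw)
        # A threshold of 000 means the attribute is informational only.
        if thresh > 0 and value <= thresh:
            findings.append((name, "normalized value at or below threshold"))
        if name in WATCHED_RAW and raw > 0:
            findings.append((name, f"raw count {raw}"))
    return findings


# Sample rows in smartctl's attribute-table format.
SAMPLE = """\
  2 Throughput_Performance  0x0005   001   001   054    Pre-fail  Offline  FAILING_NOW 25053
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2112
"""

for name, reason in flag_attributes(SAMPLE):
    print(f"{name}: {reason}")
```

Feed it the text of smartctl -a /dev/daN for each suspect drive; an empty result only means these two rules found nothing, not that the drive is healthy.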
 

JohnFLi

I run the short tests 4 times a month and the long tests 2 times a month. But upon checking... I haven't added the new drives to them since I set the tests up.

Short test results:

da26 - completed without error
da28 - completed without error
da30 -

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Interrupted (host reset)      10%     14133         -

da32 - completed without error
da33 - completed without error

The pool named "vRangerBackups" is the one I'm having issues with (the last one):

Code:
root@G1PPFreeNas01:/etc # zpool status
  pool: PhotoArchive
 state: ONLINE
  scan: scrub repaired 0 in 0 days 16:53:58 with 0 errors on Sun Dec 2 16:54:10 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        PhotoArchive                                    ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/e1d0a939-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e25e697c-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e2f06a22-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e36d1caf-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e3f9afed-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e47daea6-2bbe-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/64b1bb8a-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/65b5dbd1-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/664e579a-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/67682e7f-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/67f4f0a1-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/690c69bb-2bc0-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0

errors: No known data errors

  pool: SharepointBackup
 state: ONLINE
  scan: scrub repaired 0 in 0 days 10:49:54 with 0 errors on Sun Dec 2 10:49:54 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        SharepointBackup                                ONLINE       0     0     0
          gptid/3a660f5c-1644-11e7-86d0-a0369fb4a0dc    ONLINE       0     0     0
          gptid/3ad214d4-1644-11e7-86d0-a0369fb4a0dc    ONLINE       0     0     0

errors: No known data errors

  pool: VideoStorage
 state: ONLINE
  scan: scrub repaired 0 in 1 days 13:07:59 with 0 errors on Sun Nov 4 12:08:00 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        VideoStorage                                    ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/61263c2f-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/619daac7-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/62100d00-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/628b1c0c-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/6301d8cb-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/63782d93-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/63f266ac-3a65-11e7-8936-a0369fb4a0dc  ONLINE       0     0     0
            gptid/2c5e9358-a01d-11e8-a591-a0369fb4a0dc  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:45 with 0 errors on Wed Dec 19 03:45:45 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors

  pool: vRangerBackups
 state: ONLINE
  scan: scrub in progress since Sun Dec 23 00:00:01 2018
        15.3T scanned at 2.17G/s, 10.6T issued at 215M/s, 64.5T total
        0 repaired, 16.48% done, 3 days 01:02:57 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        vRangerBackups                                  ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5217d447-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5312744e-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5415909e-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/8a248d06-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/8aa47cc6-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/8b2a2d96-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            gptid/e5590f18-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e6792ee9-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e7930aa7-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            gptid/57eab543-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/59fd1cf8-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5b90a570-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0

errors: No known data errors

root@G1PPFreeNas01:/etc #
 

Spearfoot

I recommend you run the short tests daily, FWIW. And of course you need to add any new drives to the SMART tests so that all of your drives get tested. It's easy to forget this... Ask me how I know! :)

Looks like da30 may be failing. See if da30 is a member of your 'problem' pool (vRangerBackups).

I would run a short test first (smartctl -t short /dev/da30), then a long test (smartctl -t long /dev/da30) and see what shows up. Before you start (i.e., right now) and after each test runs, show us the output of smartctl -a /dev/da30, in code tags.

Good luck!
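
To check pool membership from the shell, glabel status lists each gptid label next to the partition it lives on, and zpool status lists the gptid labels in each pool. Below is a rough Python sketch of matching the two. The helper names gptids_for_disk and disk_in_pool are made up for this example, and the sample text is hypothetical (the gptid values and pool name are placeholders, not taken from this system); paste in the real command output to use it.

```python
import re


def gptids_for_disk(glabel_text, disk):
    """Return the gptid labels whose component is a partition of `disk`."""
    # fullmatch keeps "da3" from matching partitions of "da30".
    part = re.compile(rf"{re.escape(disk)}p\d+")
    labels = []
    for line in glabel_text.splitlines():
        fields = line.split()
        # Data rows look like: <label> <status> <component>
        if (
            len(fields) == 3
            and fields[0].startswith("gptid/")
            and part.fullmatch(fields[2])
        ):
            labels.append(fields[0])
    return labels


def disk_in_pool(labels, zpool_status_text):
    """True if any of the disk's gptid labels appears in the zpool output."""
    return any(label in zpool_status_text for label in labels)


# Hypothetical sample data, shaped like `glabel status` and `zpool status`.
SAMPLE_GLABEL = """\
                                      Name  Status  Components
gptid/00000000-0000-0000-0000-000000000001     N/A  da29p2
gptid/00000000-0000-0000-0000-000000000002     N/A  da30p2
"""

SAMPLE_ZPOOL = """\
  pool: examplepool
            gptid/00000000-0000-0000-0000-000000000002  ONLINE       0     0     0
"""

labels = gptids_for_disk(SAMPLE_GLABEL, "da30")
print(labels, disk_in_pool(labels, SAMPLE_ZPOOL))
```

Running zpool status for just the problem pool keeps the comparison limited to that pool.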
 

JohnFLi


Currently, that particular pool is still running a scrub... it's been running since Sunday:
Code:
root@G1PPFreeNas01:~ # zpool status vRangerBackups
  pool: vRangerBackups
 state: ONLINE
  scan: scrub in progress since Sun Dec 23 00:00:01 2018
        16.3T scanned at 62.0M/s, 16.3T issued at 61.9M/s, 64.5T total
        0 repaired, 25.28% done, 9 days 11:03:46 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        vRangerBackups                                  ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5217d447-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5312744e-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5415909e-35b9-11e7-b98a-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/8a248d06-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/8aa47cc6-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/8b2a2d96-2612-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            gptid/e5590f18-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e6792ee9-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/e7930aa7-2d20-11e8-8c1e-a0369fb4a0dc  ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            gptid/57eab543-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/59fd1cf8-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0
            gptid/5b90a570-fcc1-11e8-aa1e-a0369fb4a0dc  ONLINE       0     0     0

errors: No known data errors
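
As a sanity check on that estimate, the time remaining is roughly (total - issued) / issue rate. Here is a small Python sketch of the arithmetic. It assumes zpool's T and M suffixes are binary units (TiB and MiB/s) and that the rate stays constant, which it rarely does while the pool is also serving backups. The function names are mine.

```python
TIB = 1024 ** 4  # bytes per TiB
MIB = 1024 ** 2  # bytes per MiB


def scrub_eta_seconds(total_tib, issued_tib, rate_mib_per_s):
    """Seconds left if the scrub keeps issuing at the current rate."""
    remaining_bytes = (total_tib - issued_tib) * TIB
    return remaining_bytes / (rate_mib_per_s * MIB)


def format_eta(seconds):
    """Render seconds in the 'N days HH:MM:SS' style that zpool uses."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days {hours:02d}:{minutes:02d}:{secs:02d}"


# Figures from the scan lines above: 64.5T total, 16.3T issued at 61.9M/s.
eta = scrub_eta_seconds(64.5, 16.3, 61.9)
print(format_eta(eta))
```

With those numbers the result lands in the same range as the "9 days 11:03:46 to go" that zpool reports; the small difference comes from rounding in the displayed figures.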



RESULT from smartctl -t short /dev/da30 :

Code:
root@G1PPFreeNas01:~ # smartctl -a /dev/da30
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Gold
Device Model:     WDC WD101KRYZ-01JPDB0
Serial Number:    7PJYD7KC
LU WWN Device Id: 5 000cca 251e98bbb
Firmware Version: 01.01H01
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 27 14:19:06 2018 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Warning: This result is based on an Attribute check.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1070) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   083   083   016    Pre-fail  Always       -       65969
  2 Throughput_Performance  0x0005   001   001   054    Pre-fail  Offline  FAILING_NOW 25053
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14163
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   093   093   000    Old_age   Always       -       9562
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always       -       9562
194 Temperature_Celsius     0x0002   230   230   000    Old_age   Always       -       26 (Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2112
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    10%     14163         0
# 2  Short offline       Completed: unknown failure    10%     14135         0
# 3  Short offline       Aborted by host               10%     14134         -
# 4  Short offline       Interrupted (host reset)      10%     14133         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



It's going to take a while for the long test to finish; I will post those results tomorrow. It says it will take 1070 min.
Would it be a good idea to stop the scrub and just replace da30?
 

Spearfoot

Code:
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2112

...snip...

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    10%     14163         0
# 2  Short offline       Completed: unknown failure    10%     14135         0
# 3  Short offline       Aborted by host               10%     14134         -
# 4  Short offline       Interrupted (host reset)      10%     14133         -


It's going to take awhile for the long test to finish, I will update those results tomorrow. It says it will take 1070 min.
would it be good to stop the scrub and just replace da30?
Do you know for certain that da30 is used in this pool?

The 'Current_Pending_Sector' count is a bad sign, and the short test never completes cleanly: every run fails or aborts with 10% remaining. If it were me, I'd replace it with a known-good, tested spare.
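
A quick way to see that pattern across a self-test log is to parse the rows and check whether any run ever finished cleanly. This is only a sketch in Python: parse_selftest_log is a made-up helper name, and it assumes the column layout of the smartctl self-test log quoted above, with "Completed without error" being the status string smartctl prints for a passing test.

```python
import re

# "# N  <test>  <status>  <remaining>%  <power-on hours>  <first error LBA>"
# Test and status may contain single spaces, so split on runs of 2+ spaces.
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Turn smartctl self-test log rows into a list of dicts."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or unrelated line
        num, test, status, remaining, hours, _lba = m.groups()
        entries.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_pct": int(remaining),
            "power_on_hours": int(hours),
            "passed": status == "Completed without error",
        })
    return entries


LOG = """\
# 1  Short offline       Completed: unknown failure    10%     14163         0
# 2  Short offline       Completed: unknown failure    10%     14135         0
# 3  Short offline       Aborted by host               10%     14134         -
# 4  Short offline       Interrupted (host reset)      10%     14133         -
"""

entries = parse_selftest_log(LOG)
clean_runs = [e for e in entries if e["passed"]]
print(f"{len(entries)} tests logged, {len(clean_runs)} completed without error")
```

A drive with no clean runs at all, plus pending sectors, is a strong hint that it should be swapped rather than retested.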
 

JohnFLi

The attached image shows that da30 is in the pool that is giving issues.

Results for the long test:

Code:
root@G1PPFreeNas01:~ # smartctl -a /dev/da30
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Gold
Device Model:     WDC WD101KRYZ-01JPDB0
Serial Number:    7PJYD7KC
LU WWN Device Id: 5 000cca 251e98bbb
Firmware Version: 01.01H01
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 28 08:00:34 2018 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Warning: This result is based on an Attribute check.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1070) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   058   058   016    Pre-fail  Always       -       674106825
  2 Throughput_Performance  0x0005   001   001   054    Pre-fail  Offline  FAILING_NOW 25053
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14181
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   093   093   000    Old_age   Always       -       9563
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always       -       9563
194 Temperature_Celsius     0x0002   230   230   000    Old_age   Always       -       26 (Min/Max 17/33)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2128
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%     14163         0
# 2  Short offline       Completed: unknown failure    10%     14163         0
# 3  Short offline       Completed: unknown failure    10%     14135         0
# 4  Short offline       Aborted by host               10%     14134         -
# 5  Short offline       Interrupted (host reset)      10%     14133         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@G1PPFreeNas01:~ #


The current scrub that is going on for the offending pool is at 33% with another 7 days to go.
Am I able to stop the scrub and then just replace the drive?
 

Attachments

  • da30.jpg (42.4 KB)

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I don't see any point in allowing the current scrub on that pool to finish.

I would stop the scrub, if need be by simply rebooting the system, and then replace the failing drive.
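
For anyone reading the SMART report above and wondering which lines justify the replacement, here is a minimal Python sketch (not something posted in this thread) that picks the attribute table apart. It assumes the usual `smartctl -a` ATA attribute layout shown above. The helper names (`parse_attributes`, `failing`, `nonzero_watched`) and the choice of IDs 5, 197 and 198 as a watch list are illustrative assumptions, not smartmontools features. The sample rows are copied from the da30 report.

```python
from typing import NamedTuple

# A few rows copied from the da30 report earlier in the thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   058   058   016    Pre-fail  Always       -       674106825
  2 Throughput_Performance  0x0005   001   001   054    Pre-fail  Offline  FAILING_NOW 25053
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   230   230   000    Old_age   Always       -       26 (Min/Max 17/33)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2128
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

# Assumption: sector-health counters worth watching even when they have
# not crossed a threshold (reallocated, pending, offline-uncorrectable).
WATCHED_IDS = {5, 197, 198}


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    thresh: int
    when_failed: str
    raw: str


def parse_attributes(text):
    """Parse rows of the 'Vendor Specific SMART Attributes' table."""
    attrs = []
    for line in text.splitlines():
        # Ten columns; the last (RAW_VALUE) may itself contain spaces.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header line, blank line, or unrelated text
        attrs.append(Attr(int(parts[0]), parts[1], int(parts[3]),
                          int(parts[5]), parts[8], parts[9].strip()))
    return attrs


def failing(attrs):
    """Names of attributes at or below a non-zero threshold, or marked FAILING_NOW."""
    return [a.name for a in attrs
            if a.when_failed == "FAILING_NOW"
            or (a.thresh > 0 and a.value <= a.thresh)]


def nonzero_watched(attrs):
    """Watched counters whose raw value is non-zero."""
    found = {}
    for a in attrs:
        if a.id in WATCHED_IDS:
            count = int(a.raw.split()[0])
            if count:
                found[a.name] = count
    return found


attrs = parse_attributes(SAMPLE)
print("failing:", failing(attrs))
print("non-zero watched counters:", nonzero_watched(attrs))
```

On the rows above, this flags Throughput_Performance (value 001 against a threshold of 054) as the failed attribute and reports the 2128 pending sectors separately, since that counter has a threshold of 000 and so never trips the attribute check on its own.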
 

JohnFLi

Contributor
Joined
Sep 26, 2016
Messages
139
I figured you would say that, so I stopped the scrub a little bit ago.
So I went to the STORAGE tab, selected the volume in question, clicked on "volume status", highlighted da30, then clicked 'replace'. It asked what I wanted to replace it with, so I selected da34. Poof, it started resilvering.

It has now been sitting at 53.18% for over an hour.

Code:
 zpool status vRangerBackups
  pool: vRangerBackups
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Dec 28 08:16:00 2018
        38.7T scanned at 1.50G/s, 34.1T issued at 1.32G/s, 64.1T total
        61.5G resilvered, 53.18% done, 0 days 06:27:15 to go
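
As a rough cross-check of the numbers in that status output, the sketch below redoes the arithmetic in Python. It assumes that `zpool status` reports T and G in binary units (TiB, GiB) and that the ETA is simply the remaining data divided by the current issue rate. The function names are made up for illustration.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def resilver_estimate(issued_tib, total_tib, rate_gib_s):
    """Return (fraction done, seconds remaining) at a constant issue rate."""
    fraction = issued_tib / total_tib
    remaining_bytes = (total_tib - issued_tib) * TIB
    return fraction, remaining_bytes / (rate_gib_s * GIB)


def format_duration(seconds):
    """Format seconds in the 'D days HH:MM:SS' style used by zpool status."""
    seconds = int(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days {hours:02d}:{minutes:02d}:{secs:02d}"


# Values taken from the status output above.
fraction, eta = resilver_estimate(issued_tib=34.1, total_tib=64.1, rate_gib_s=1.32)
print(f"{fraction:.2%} done, {format_duration(eta)} to go")
```

With the rounded figures from the output, this lands within about a minute of the reported "06:27:15 to go". The estimate only holds while the issue rate holds: if a struggling drive drags the rate down, the same remaining data takes longer and the ETA moves out.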
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Yes, it's always nerve-wracking to replace a drive. And especially so when you're using a set of striped RAIDZ1 vdevs, because if any of the RAIDZ1 vdevs loses more than one drive, you lose your entire pool.

Knock on wood, and hope that da32 and da33 both hold up.
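
To make the risk concrete, here is a small Python sketch of the rule just described: a pool striped across vdevs survives only if every vdev stays within its parity budget. The layout below is hypothetical (the thread does not list the actual vdev membership), and `pool_survives` is an illustrative helper, not a ZFS API.

```python
# Number of drive failures each vdev type can absorb.
PARITY = {"stripe": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}


def pool_survives(vdevs, failed_disks):
    """True if no vdev has lost more disks than its parity level allows."""
    failed = set(failed_disks)
    for kind, disks in vdevs:
        lost = len(failed & set(disks))
        if lost > PARITY[kind]:
            return False  # one dead vdev takes the whole striped pool with it
    return True


# Hypothetical layout: two RAIDZ1 vdevs striped together.
vdevs = [
    ("raidz1", ["da30", "da32", "da33"]),
    ("raidz1", ["da35", "da36", "da37"]),
]

print(pool_survives(vdevs, ["da30"]))          # one failure in one vdev
print(pool_survives(vdevs, ["da30", "da35"]))  # one failure in each vdev
print(pool_survives(vdevs, ["da30", "da32"]))  # two failures in the same vdev
```

The first two cases survive and the third does not, which is why a second failure in the same RAIDZ1 vdev during a long resilver is the thing to worry about.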
 

JohnFLi

Contributor
Joined
Sep 26, 2016
Messages
139
Well, after a few days of resilvering and little progress, I deleted the volume and re-created it, leaving out da30. It is now working great again. Is resilvering a drive usually that slow? Or was it because the drive went partially bad?
 
Joined
Jan 18, 2017
Messages
525
Have you checked the SMART data on the rest of the drives to make sure they are not also having issues? Do you now have all your drives running regular SMART tests? Do you have email notification set up? If I remember correctly, resilvers slow down when there is load on the server: the heavier the load, the slower they go. If the resilver never finished, I would be concerned about the health of the pool.
 

JohnFLi

Contributor
Joined
Sep 26, 2016
Messages
139
I do have SMART tests running against all the drives now. I have email set up, and I do get weekly emails about anybody trying to log in... I don't remember getting emailed results for the SMART tests, though.

Concerning the resilvering: I had stopped everything that was reading from and writing to the pool, but the time to completion kept getting pushed out further and further.

So I killed the resilver and killed the pool, rebuilt it (without the drive in question), and it's been running fine.
 
Joined
Jan 18, 2017
Messages
525
You will not get the SMART stats emailed to you unless you have a script running to do that specifically (there is a thread on the forums with such a script, if you would like that). You should, however, get an email immediately if something happens, such as a drive dropping from the pool or the number of bad sectors starting to increase.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You get emails when something goes wrong. If SMART just passes, you don't get emails. If SMART fails, you get an email.
 