ZFS scrub finding 'MEDIUM ERRORs' and 'unrecovered read errors' on two disks

Status
Not open for further replies.
Joined
Sep 5, 2017
Messages
8
Hello, I have a FreeNAS 8 setup that I plan to migrate soon. A scheduled scrub started yesterday, and it is finding a lot of 'MEDIUM ERRORs' on two disks in my zpool. My first instinct was to replace both of these disks, but when I checked zpool status, the scrub was still running and both disks showed the status (repairing). Does that mean I don't have to intervene and can just let the scrub do its work?

/var/log/dmesg.today

(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 42 ff d6 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b430087 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 bf 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4825bf asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 bf 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4825e9 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 79 ba 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b79bd asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4e 58 b4 0 0 c9 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4e58c9 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
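
For anyone reading the log above, here is a rough sketch of how a READ(10) line can be decoded. It assumes the standard 10-byte READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte block count in bytes 7-8) and treats the `info:` field as the LBA of the failing block. The helper name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) from a READ(10) CDB as printed by CAM."""
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")    # bytes 2-5: starting LBA
    count = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# First da10 entry from the log: CDB plus the "info:" value from its sense line.
lba, count = decode_read10("28 0 1b 42 ff d6 0 0 e5 0")
failed = int("1b430087", 16)  # assumed to be the LBA of the unreadable block
offset = failed - lba

print(f"request starts at LBA {lba}, length {count} blocks")
print(f"failing LBA {failed} is {offset} blocks into the request")
```

If that reading is right, each error is a single bad block somewhere inside a larger read request, and the repeated entries are retries of the same request.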



# zpool status
  pool: data-1
 state: ONLINE
  scan: scrub in progress since Mon Oct 15 04:00:06 2018
        20.4T scanned out of 38.7T at 193M/s, 27h39m to go
        7.43M repaired, 52.68% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        data-1                                          ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/a561d946-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a62215c0-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a6e15509-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a79fbb22-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a8604dd2-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a91ee918-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a9d23977-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/aa8e0dd8-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0  (repairing)
            gptid/ab49d529-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0  (repairing)
            gptid/ac069583-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/acc25cbe-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0

errors: No known data errors
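
As a sanity check on the scrub numbers above, this small sketch redoes the arithmetic. It assumes the `T` and `M` suffixes are binary units (TiB and MiB/s) and that the scan rate stays constant, which it may well not. The variable names are only for illustration.

```python
# Figures copied from the "scan:" lines of zpool status.
scanned_tib = 20.4
total_tib = 38.7
rate_mib_s = 193  # assumed to be MiB/s

fraction_done = scanned_tib / total_tib
remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
eta_s = remaining_mib / rate_mib_s  # only valid if the rate holds steady

hours, rem = divmod(int(eta_s), 3600)
print(f"{fraction_done:.1%} done, roughly {hours}h{rem // 60:02d}m left")
```

The result lands within a few minutes of the `27h39m to go` that zpool reports; the small difference is expected because the inputs are rounded.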


smartctl output on first disk

# smartctl -a /dev/da9
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor: WD
Product: WD4001FYYG-01SL3
Revision: VR07
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50000c0f0459ebb4
Serial number: WCC1F0439239
Device type: disk
Transport protocol: SAS
Local Time is: Tue Oct 16 10:49:53 2018 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 39 C
Drive Trip Temperature: 69 C
Manufactured in week 17 of year 2014
Specified cycle count over device lifetime: 1048576
Accumulated start-stop cycles: 13
Specified load-unload count over device lifetime: 1114112
Accumulated load-unload cycles: 42
Elements in grown defect list: 3

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      33686     4501   2879022     38187      14475     344951.070        9009
write:    375885     1100      1100    376985       1100       4990.270           0

Non-medium error count: 3377

SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 38041 - [- - -]
# 2 Background long Failed in segment --> 6 37947 949720328 [0x3 0x11 0x0]
# 3 Background short Completed - 37873 - [- - -]
# 4 Background short Completed - 37657 - [- - -]
# 5 Background long Failed in segment --> 6 37563 949720328 [0x3 0x11 0x0]
# 6 Background short Completed - 37489 - [- - -]
# 7 Background short Completed - 37322 - [- - -]
# 8 Background long Failed in segment --> 6 37228 949720328 [0x3 0x11 0x0]
# 9 Background short Completed - 37154 - [- - -]
#10 Background short Completed - 36915 - [- - -]
#11 Background long Failed in segment --> 4 36827 950896396 [0x3 0x11 0x0]
#12 Background short Completed - 36747 - [- - -]
#13 Background short Completed - 36595 - [- - -]
#14 Background long Failed in segment --> 6 36501 949720328 [0x3 0x11 0x0]
#15 Background short Completed - 36427 - [- - -]
#16 Background short Completed - 36188 - [- - -]
#17 Background long Failed in segment --> 6 36094 949720328 [0x3 0x11 0x0]
#18 Background short Completed - 36020 - [- - -]
#19 Background short Completed - 35852 - [- - -]
#20 Background long Failed in segment --> 6 35770 949720328 [0x3 0x11 0x0]


smartctl output on second disk

# smartctl -a /dev/da10
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor: WD
Product: WD4001FYYG-01SL3
Revision: VR07
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50000c0f045a3574
Serial number: WCC1F0450253
Device type: disk
Transport protocol: SAS
Local Time is: Tue Oct 16 10:49:49 2018 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 38 C
Drive Trip Temperature: 69 C
Manufactured in week 17 of year 2014
Specified cycle count over device lifetime: 1048576
Accumulated start-stop cycles: 13
Specified load-unload count over device lifetime: 1114112
Accumulated load-unload cycles: 41
Elements in grown defect list: 79

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      42925    10316     78653     53241      10423     344936.896         106
write:    372500     1084      3089    373584       1089       4988.426           0

Non-medium error count: 44

SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background long Aborted (by user command) - 38143 - [- - -]
# 2 Background short Completed - 38041 - [- - -]
# 3 Background long Failed in segment --> 6 37947 456748810 [0x3 0x11 0x0]
# 4 Background short Completed - 37873 - [- - -]
# 5 Background short Completed - 37657 - [- - -]
# 6 Background long Failed in segment --> 6 37563 456748810 [0x3 0x11 0x0]
# 7 Background short Completed - 37490 - [- - -]
# 8 Background short Completed - 37322 - [- - -]
# 9 Background long Failed in segment --> 6 37227 456765342 [0x3 0x11 0x0]
#10 Background short Completed - 37154 - [- - -]
#11 Background short Completed - 36914 - [- - -]
#12 Background long Failed in segment --> 6 36820 457472312 [0x3 0x11 0x0]
#13 Background short Completed - 36746 - [- - -]
#14 Background short Completed - 36595 - [- - -]
#15 Background long Completed - 36509 - [- - -]
#16 Background short Completed - 36427 - [- - -]
#17 Background short Completed - 36188 - [- - -]
#18 Background long Failed in segment --> 6 36093 456848313 [0x3 0x11 0x0]
#19 Background short Completed - 36020 - [- - -]
#20 Background short Completed - 35852 - [- - -]
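
To see whether the scrub errors and the failed long self-tests point at the same part of each disk, the hex `info:` values from dmesg can be converted to decimal and compared with the `LBA_first_err` column. This is only a sketch: it assumes `info:` is the failing LBA, uses the 512-byte logical block size that smartctl reports, and takes one sample `info:` value per disk from the log. The function name `nearest_selftest_lba` is hypothetical.

```python
SECTOR_BYTES = 512  # "Logical block size" from the smartctl output


def nearest_selftest_lba(info_hex, selftest_lbas):
    """Return (failing LBA, closest self-test LBA, distance between them in MiB)."""
    failed = int(info_hex, 16)
    nearest = min(selftest_lbas, key=lambda lba: abs(lba - failed))
    gap_mib = abs(nearest - failed) * SECTOR_BYTES / 2**20
    return failed, nearest, gap_mib


# One "info:" value per disk from dmesg, and the distinct LBA_first_err values.
samples = {
    "da9": ("38ad7006", [949720328, 950896396]),
    "da10": ("1b430087", [456748810, 456765342, 457472312, 456848313]),
}

for disk, (info_hex, lbas) in samples.items():
    failed, nearest, gap_mib = nearest_selftest_lba(info_hex, lbas)
    print(f"{disk}: scrub error at LBA {failed}, "
          f"nearest self-test failure at LBA {nearest} ({gap_mib:.1f} MiB away)")
```

On both disks the sampled scrub error sits within tens of MiB of a self-test failure, which would suggest a localized bad region rather than errors scattered across the whole surface.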
 