ZFS scrub finding 'MEDIUM ERRORs' and 'unrecovered read errors' on two disks

Status
Not open for further replies.

wilhelmkeiserII

Neophyte
Joined
Sep 5, 2017
Messages
8
Hello, I have a FreeNAS 8 setup that I plan to migrate soon. A scheduled scrub started yesterday, and it is finding a lot of 'MEDIUM ERRORs' on two disks of my zpool. My first instinct was to replace both of these disks, but when I checked zpool status, the scrub was still running and both disks showed the status (repairing). Does that mean I don't have to intervene and can just let the scrub do its work?

/var/log/dmesg.today

(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 42 ff d6 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b430087 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 bf 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4825bf asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 bf 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4825e9 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 48 25 69 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b48256d asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 79 ba 0 0 1d 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b79bd asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4b 7a 2d 0 0 e5 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4b7a77 asc:11,0 (Unrecovered read error) actual retry count: 277
(da10:mps0:0:18:0): READ(10). CDB: 28 0 1b 4e 58 b4 0 0 c9 0
(da10:mps0:0:18:0): CAM status: SCSI Status Error
(da10:mps0:0:18:0): SCSI status: Check Condition
(da10:mps0:0:18:0): SCSI sense: MEDIUM ERROR info:1b4e58c9 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
(da9:mps0:0:17:0): SCSI status: Check Condition
(da9:mps0:0:17:0): SCSI sense: MEDIUM ERROR info:38ad7006 asc:11,0 (Unrecovered read error) actual retry count: 277
(da9:mps0:0:17:0): READ(10). CDB: 28 0 38 ad 6f b1 0 0 e5 0
(da9:mps0:0:17:0): CAM status: SCSI Status Error
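For anyone reading the log above, here is a minimal Python sketch of how the hex fields can be decoded. It assumes the standard READ(10) layout (opcode 0x28, bytes 2-5 are the starting LBA, bytes 7-8 are the block count, both big-endian) and that the `info:` value in the sense line is the LBA of the block that could not be read. The function names `decode_read10` and `locate_failure` are made up for illustration.

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB printed as space-separated hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: starting LBA
    length = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: number of blocks
    return lba, length


def locate_failure(cdb_text, info_hex):
    """Relate the sense 'info' LBA (assumed: failing block) to the request."""
    lba, length = decode_read10(cdb_text)
    bad = int(info_hex, 16)
    return {
        "start_lba": lba,
        "blocks": length,
        "failed_lba": bad,
        "offset": bad - lba,                      # blocks into the request
        "in_range": lba <= bad < lba + length,    # sanity check
    }


# First error reported for da10 in the log above.
print(locate_failure("28 0 1b 42 ff d6 0 0 e5 0", "1b430087"))
```

For the first da10 entry this places the failing block 177 blocks into a 229-block read, so the reported LBA falls inside the requested range.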



# zpool status
  pool: data-1
 state: ONLINE
  scan: scrub in progress since Mon Oct 15 04:00:06 2018
        20.4T scanned out of 38.7T at 193M/s, 27h39m to go
        7.43M repaired, 52.68% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        data-1                                          ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/a561d946-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a62215c0-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a6e15509-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a79fbb22-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a8604dd2-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a91ee918-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/a9d23977-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/aa8e0dd8-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0  (repairing)
            gptid/ab49d529-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0  (repairing)
            gptid/ac069583-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0
            gptid/acc25cbe-fa35-11e3-98e0-90e2ba6dbb00  ONLINE       0     0     0

errors: No known data errors
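The progress figures in the scan lines can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes zpool's size suffixes are binary units (powers of 1024) and that the scan rate stays constant, and the helper names `to_bytes` and `scrub_estimate` are hypothetical.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(text):
    """Convert a zpool-style size such as '20.4T' or '193M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def scrub_estimate(scanned, total, rate_per_s):
    """Return (percent done, hours left, minutes left) at a constant rate."""
    done = to_bytes(scanned)
    size = to_bytes(total)
    remaining_s = (size - done) / to_bytes(rate_per_s)
    hours, rem = divmod(int(remaining_s), 3600)
    return done / size * 100, hours, rem // 60


# Values taken from the zpool status output above.
pct, h, m = scrub_estimate("20.4T", "38.7T", "193M")
print(f"{pct:.1f}% done, about {h}h{m:02d}m to go")
```

The result lands within a few minutes and a fraction of a percent of what zpool reports; the small difference is expected because the displayed sizes are rounded.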


smartctl output on first disk

# smartctl -a /dev/da9
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor: WD
Product: WD4001FYYG-01SL3
Revision: VR07
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50000c0f0459ebb4
Serial number: WCC1F0439239
Device type: disk
Transport protocol: SAS
Local Time is: Tue Oct 16 10:49:53 2018 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 39 C
Drive Trip Temperature: 69 C
Manufactured in week 17 of year 2014
Specified cycle count over device lifetime: 1048576
Accumulated start-stop cycles: 13
Specified load-unload count over device lifetime: 1114112
Accumulated load-unload cycles: 42
Elements in grown defect list: 3

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      33686     4501   2879022     38187      14475     344951.070        9009
write:    375885     1100      1100    376985       1100       4990.270           0

Non-medium error count: 3377

SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed - 38041 - [- - -]
# 2 Background long Failed in segment --> 6 37947 949720328 [0x3 0x11 0x0]
# 3 Background short Completed - 37873 - [- - -]
# 4 Background short Completed - 37657 - [- - -]
# 5 Background long Failed in segment --> 6 37563 949720328 [0x3 0x11 0x0]
# 6 Background short Completed - 37489 - [- - -]
# 7 Background short Completed - 37322 - [- - -]
# 8 Background long Failed in segment --> 6 37228 949720328 [0x3 0x11 0x0]
# 9 Background short Completed - 37154 - [- - -]
#10 Background short Completed - 36915 - [- - -]
#11 Background long Failed in segment --> 4 36827 950896396 [0x3 0x11 0x0]
#12 Background short Completed - 36747 - [- - -]
#13 Background short Completed - 36595 - [- - -]
#14 Background long Failed in segment --> 6 36501 949720328 [0x3 0x11 0x0]
#15 Background short Completed - 36427 - [- - -]
#16 Background short Completed - 36188 - [- - -]
#17 Background long Failed in segment --> 6 36094 949720328 [0x3 0x11 0x0]
#18 Background short Completed - 36020 - [- - -]
#19 Background short Completed - 35852 - [- - -]
#20 Background long Failed in segment --> 6 35770 949720328 [0x3 0x11 0x0]


smartctl output on second disk

# smartctl -a /dev/da10
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor: WD
Product: WD4001FYYG-01SL3
Revision: VR07
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50000c0f045a3574
Serial number: WCC1F0450253
Device type: disk
Transport protocol: SAS
Local Time is: Tue Oct 16 10:49:49 2018 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 38 C
Drive Trip Temperature: 69 C
Manufactured in week 17 of year 2014
Specified cycle count over device lifetime: 1048576
Accumulated start-stop cycles: 13
Specified load-unload count over device lifetime: 1114112
Accumulated load-unload cycles: 41
Elements in grown defect list: 79

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      42925    10316     78653     53241      10423     344936.896         106
write:    372500     1084      3089    373584       1089       4988.426           0

Non-medium error count: 44

SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background long Aborted (by user command) - 38143 - [- - -]
# 2 Background short Completed - 38041 - [- - -]
# 3 Background long Failed in segment --> 6 37947 456748810 [0x3 0x11 0x0]
# 4 Background short Completed - 37873 - [- - -]
# 5 Background short Completed - 37657 - [- - -]
# 6 Background long Failed in segment --> 6 37563 456748810 [0x3 0x11 0x0]
# 7 Background short Completed - 37490 - [- - -]
# 8 Background short Completed - 37322 - [- - -]
# 9 Background long Failed in segment --> 6 37227 456765342 [0x3 0x11 0x0]
#10 Background short Completed - 37154 - [- - -]
#11 Background short Completed - 36914 - [- - -]
#12 Background long Failed in segment --> 6 36820 457472312 [0x3 0x11 0x0]
#13 Background short Completed - 36746 - [- - -]
#14 Background short Completed - 36595 - [- - -]
#15 Background long Completed - 36509 - [- - -]
#16 Background short Completed - 36427 - [- - -]
#17 Background short Completed - 36188 - [- - -]
#18 Background long Failed in segment --> 6 36093 456848313 [0x3 0x11 0x0]
#19 Background short Completed - 36020 - [- - -]
#20 Background short Completed - 35852 - [- - -]
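
The indicators that stand out in the two smartctl dumps above (grown defect list, total uncorrected errors, failed long self-tests) can be pulled out of the text automatically. This is a rough Python sketch, assuming the SAS output layout shown in this post; `summarize_sas_smart` is a hypothetical helper, not part of smartmontools, and the sample text is copied from the da10 output.

```python
import re


def summarize_sas_smart(text):
    """Extract a few health indicators from `smartctl -a` output of a SAS disk."""
    summary = {}
    m = re.search(r"Elements in grown defect list:\s*(\d+)", text)
    if m:
        summary["grown_defects"] = int(m.group(1))
    for op in ("read", "write"):
        m = re.search(rf"^{op}:\s+(.*)$", text, re.MULTILINE)
        if m:
            # Last column of the error counter log: total uncorrected errors.
            summary[f"{op}_uncorrected"] = int(m.group(1).split()[-1])
    # Count long self-tests that ended in a failure.
    summary["failed_long_tests"] = len(
        re.findall(r"Background long\s+Failed", text)
    )
    return summary


sample = """\
Elements in grown defect list: 79
read: 42925 10316 78653 53241 10423 344936.896 106
write: 372500 1084 3089 373584 1089 4988.426 0
# 3 Background long Failed in segment --> 6 37947 456748810 [0x3 0x11 0x0]
"""
print(summarize_sas_smart(sample))
```

Feeding it the full output of each disk would make it easy to compare da9 and da10 side by side, or to track the same counters over time.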
 