One of the 4TB WD Reds (2+ years old) in my HP N36L (specs in sig) logged this error during a bi-weekly scrub:
Code:
Jan 16 04:42:52 (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f8 fb f8 40 04 00 00 01 00 00
Jan 16 04:42:52 (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Jan 16 04:42:52 (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jan 16 04:42:52 (ada0:ahcich0:0:0:0): RES: 41 40 a0 fc f8 40 04 00 00 00 00
Jan 16 04:42:52 (ada0:ahcich0:0:0:0): Retrying command
Jan 16 04:42:56 (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f8 fb f8 40 04 00 00 01 00 00
Jan 16 04:42:56 (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Jan 16 04:42:56 (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jan 16 04:42:56 (ada0:ahcich0:0:0:0): RES: 41 40 a0 fc f8 40 04 00 00 00 00
Jan 16 04:42:56 (ada0:ahcich0:0:0:0): Retrying command
Jan 16 04:42:59 (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f8 fb f8 40 04 00 00 01 00 00
Jan 16 04:42:59 (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Jan 16 04:42:59 (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jan 16 04:42:59 (ada0:ahcich0:0:0:0): RES: 41 40 a0 fc f8 40 04 00 00 00 00
Jan 16 04:42:59 (ada0:ahcich0:0:0:0): Retrying command
Jan 16 04:43:03 (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f8 fb f8 40 04 00 00 01 00 00
Jan 16 04:43:03 (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Jan 16 04:43:03 (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jan 16 04:43:03 (ada0:ahcich0:0:0:0): RES: 41 40 a0 fc f8 40 04 00 00 00 00
Jan 16 04:43:03 (ada0:ahcich0:0:0:0): Retrying command
Jan 16 04:43:06 (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f8 fb f8 40 04 00 00 01 00 00
Jan 16 04:43:06 (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Jan 16 04:43:06 (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Jan 16 04:43:06 (ada0:ahcich0:0:0:0): RES: 41 40 a0 fc f8 40 04 00 00 00 00
Jan 16 04:43:06 (ada0:ahcich0:0:0:0): Error 5, Retries exhausted
ZFS repaired it (the timestamps match exactly):
Code:
  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 128K in 4h56m with 0 errors on Mon Jan 16 04:57:02 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/39f2dbfd-4794-11e4-8a24-68b59972b65f  ONLINE       0     0     0
            gptid/3adbcda1-4794-11e4-8a24-68b59972b65f  ONLINE       0     0     0
            gptid/3bc8e677-4794-11e4-8a24-68b59972b65f  ONLINE       0     0     0
            gptid/3cb63aab-4794-11e4-8a24-68b59972b65f  ONLINE       0     0     0
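To cross-check that the kernel error and the scrub repair are the same event, the ACB and RES bytes from the kernel log can be decoded. Below is a rough Python sketch of that. The register order in `ACB_FIELDS` and `RES_FIELDS` is my assumption about how FreeBSD's CAM layer prints these bytes, and the 512-byte logical sector size is also an assumption, so treat the result as a sanity check rather than proof. All function and variable names are made up for illustration.

```python
# Assumed byte order of the ACB (command) and RES (result) dumps printed by
# FreeBSD's CAM ATA layer. Verify against your kernel source before relying on it.
ACB_FIELDS = ["command", "features", "lba_low", "lba_mid", "lba_high", "device",
              "lba_low_exp", "lba_mid_exp", "lba_high_exp",
              "features_exp", "sector_count", "sector_count_exp"]
RES_FIELDS = ["status", "error", "lba_low", "lba_mid", "lba_high", "device",
              "lba_low_exp", "lba_mid_exp", "lba_high_exp",
              "sector_count", "sector_count_exp"]

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def parse_regs(hex_string, fields):
    """Map a space-separated hex dump onto named ATA registers."""
    values = [int(byte, 16) for byte in hex_string.split()]
    if len(values) != len(fields):
        raise ValueError("expected %d bytes, got %d" % (len(fields), len(values)))
    return dict(zip(fields, values))


def lba48(regs):
    """Assemble the 48-bit LBA, most significant register first."""
    order = ["lba_high_exp", "lba_mid_exp", "lba_low_exp",
             "lba_high", "lba_mid", "lba_low"]
    lba = 0
    for name in order:
        lba = (lba << 8) | regs[name]
    return lba


def ncq_sector_count(acb):
    """NCQ reads carry the sector count in the features registers; 0 means 65536."""
    count = (acb["features_exp"] << 8) | acb["features"]
    return count or 65536


# Bytes copied from the kernel log above.
acb = parse_regs("60 00 f8 fb f8 40 04 00 00 01 00 00", ACB_FIELDS)
res = parse_regs("41 40 a0 fc f8 40 04 00 00 00 00", RES_FIELDS)

start_lba = lba48(acb)
failed_lba = lba48(res)
sectors = ncq_sector_count(acb)

print("request start LBA : 0x%x" % start_lba)
print("failing LBA       : 0x%x" % failed_lba)
print("request length    : %d sectors (%d KiB)" % (sectors, sectors * SECTOR_BYTES // 1024))
print("failing LBA inside request:", start_lba <= failed_lba < start_lba + sectors)
```

If the layout assumption holds, the failed read covers 256 sectors, i.e. 128 KiB at 512 bytes per sector, and the reported bad LBA falls inside that range. That would be consistent with the 128K the scrub says it repaired.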
However, the disk passed a long SMART test the very next day, even though the error is recorded in the SMART data (smartctl -x /dev/ada0).
As an aside, shouldn't a SMART test catch this, given that the error is already in the SMART log? I only found out about it from the "daily security run output" email, which includes the kernel logs.
Even after extensive googling, I can't tell whether this is a failing disk, a loose cable or PSU problem, or just a one-off read error that ZFS has fully handled. Since this box is a backup and the pool is RAID-Z2, I guess there's no need to press the RMA button yet? In any case, nothing too bad shows up in "smartctl -a /dev/ada0", so I guess it wouldn't qualify anyway.
Any thoughts? Am I reading the situation correctly? Btw, is there a link to WD's RMA policy/process somewhere?
Regards,
Saurav.