1.) FreeNAS Version: FreeNAS-9.10-STABLE-201606270534 (dd17351)
2.) Hardware:
- Lenovo TS440 Server
- Intel Xeon CPU E3-1245 V3 @ 3.40GHz
- 32GB Crucial ECC Memory
- Mobo - Not sure on Mobo but it came w/ Lenovo TS440 Server
- 8x 5TB WD Red HDD in RAIDZ2 (Roughly 40TB Raw, 27TB Accessible after Z2)
- RAID Controller: RAID 500 (Discrete, 0/1/10) - AKA MegaRAID 9240-8i RAID Controller Card
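As a sanity check on the capacity figures above, here is a rough sketch of the arithmetic. It assumes each drive is exactly 5 TB in decimal bytes and ignores ZFS metadata, padding, and reserved space, so the real usable number will be somewhat lower. The function name is only for illustration.

```python
def raidz_usable_bytes(drive_count, drive_bytes, parity):
    """Approximate usable bytes for one RAIDZ vdev (ignores ZFS overhead)."""
    return (drive_count - parity) * drive_bytes

TB = 10**12   # decimal terabyte, as printed on the drive label
TiB = 2**40   # binary tebibyte, as most tools report it

raw = 8 * 5 * TB
usable = raidz_usable_bytes(8, 5 * TB, parity=2)

print(f"raw:    {raw / TB:.0f} TB")
print(f"usable: {usable / TB:.0f} TB = {usable / TiB:.1f} TiB")
```

Under those assumptions, the "27TB" figure matches 30 TB of data drives expressed in binary units.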
Let me preface this by saying I've got SMART tests and scrubs set up according to Cyberjock's suggestions. Yesterday, my bi-monthly scrub encountered an unrecoverable error. It was 1 read error, but the status said the scrub was able to repair it and finished with 0 errors. I didn't think much of it. I cleared the error and ran another scrub, and it found no errors.
This morning, I awoke to another error on a different drive, and this time it was a write error instead of a read error. I copied and pasted that one below:
[root@Titan ~]# zpool status
  pool: MyVolume
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 24K in 9h43m with 0 errors on Wed Jun 29 06:54:59 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        MyVolume                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/64c807a0-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/6529a774-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/6580e983-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/65dbc685-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/66383d7a-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/6699e15e-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     1     0
            gptid/66f8757b-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0
            gptid/675834f9-0ef4-11e5-9200-6c0b8409b5a4  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h3m with 0 errors on Tue May 31 03:48:22 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/028346a5-0d3b-11e5-837a-6c0b8409b5a4  ONLINE       0     0     0

errors: No known data errors
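To make the one affected device easier to spot in output like the above, here is a minimal sketch that scans `zpool status` text for rows with a nonzero READ, WRITE, or CKSUM counter. The function name and the shortened sample are my own, and it assumes plain integer counters (abbreviated counts such as `1.2K` would be skipped).

```python
def devices_with_errors(zpool_status_text):
    """Return (name, read, write, cksum) for rows with any nonzero counter."""
    hits = []
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # Device rows have five fields: NAME STATE READ WRITE CKSUM
        if len(fields) != 5 or not all(f.isdigit() for f in fields[2:]):
            continue
        name, _state, read, write, cksum = fields
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            hits.append((name, *counts))
    return hits

# Shortened sample taken from the output above.
sample = """\
NAME STATE READ WRITE CKSUM
MyVolume ONLINE 0 0 0
gptid/66383d7a-0ef4-11e5-9200-6c0b8409b5a4 ONLINE 0 0 0
gptid/6699e15e-0ef4-11e5-9200-6c0b8409b5a4 ONLINE 0 1 0
"""

for name, r, w, c in devices_with_errors(sample):
    print(f"{name}: read={r} write={w} cksum={c}")
```

Splitting on whitespace means the sketch does not depend on how the columns are indented or aligned.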
Now, I suspect I could clear the error, run another scrub, and maybe it would come back clean, but should I be concerned? These are large drives with roughly 8TB of data on them, and part of me expects an error here or there, but I'm relatively new to FreeNAS and don't want to ignore telltale signs of something bad to come. I've seen single read or write errors like this about 3 other times in the past 12 months. Could someone explain what may be going on and what I should do?
My SMART tests run according to Cyberjock's settings as well and all their last runs went through fine with no errors.
I'm not sure if it's related, but when I pull up the status shell in FreeNAS, I see the following:
Code:
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): WRITE(10). CDB: 2a 00 20 7f 17 08 00 00 08 00
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): SCSI status: Check Condition
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): Info: 0x207f1708
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): Error 22, Unretryable error
Jun 28 20:43:50 Titan smartd[2721]: Warning via /usr/local/www/freenasUI/tools/smart_alert.py to edwardcantuii@gmail.com produced unexpected output (114 bytes) to STDOUT/STDERR:
Jun 28 20:43:50 Titan smartd[2721]: usage: smart_alert.py [-h] [-d DEV] [-s S]
Jun 28 20:43:50 Titan smartd[2721]: smart_alert.py: error: unrecognized arguments: edwardcantuii@gmail.com
Jun 28 20:43:50 Titan smartd[2721]: Warning via /usr/local/www/freenasUI/tools/smart_alert.py to edwardcantuii@gmail.com: failed (32-bit/8-bit exit status: 512/2)
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 1a 26 11 90 00 00 28 00 length 20480 SMID 971 terminated ioc 804b scsi 0 state 0 xfer 0
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d 70 00 00 28 00 length 20480 SMID 595 terminated ioc 804b scsi 0 state 0 xfer (da4:mps0:0:4:0): READ(10). CDB: 28 00 1a 26 11 90 00 00 28 00
Jun 28 21:36:48 Titan 0
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: CCB request completed with an error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d a0 00 00 28 00 length 20480 SMID 147 terminated ioc 804b scsi 0 state 0 xfer (da4:0
Jun 28 21:36:48 Titan mps0:0:4:0): Retrying command
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d 18 00 00 28 00 length 20480 SMID 261 terminated ioc 804b scsi 0 state 0 xfer (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d 70 00 00 28 00
Jun 28 21:36:48 Titan 0
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: CCB request completed with an error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d d8 00 00 30 00 length 24576 SMID 137 terminated ioc 804b scsi 0 state 0 xfer (da4:0
Jun 28 21:36:48 Titan mps0:0:4:0): Retrying command
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d a0 00 00 28 00
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: CCB request completed with an error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): Retrying command
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d 18 00 00 28 00
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: CCB request completed with an error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): Retrying command
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 9e 56 2d d8 00 00 30 00
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: CCB request completed with an error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): Retrying command
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): READ(10). CDB: 28 00 1a 26 11 b8 00 00 30 00
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): CAM status: SCSI Status Error
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): SCSI status: Check Condition
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): Info: 0x1a2611b8
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): Error 5, Unretryable error
Jun 29 00:00:00 Titan syslog-ng[1438]: Configuration reload request received, reloading configuration;
Jun 29 01:34:49 Titan zfsd: CaseFile::Serialize: Unable to open /etc/zfs/cases/pool_17446544915582231449_vdev_9123050385180188811.case.
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): READ(10). CDB: 28 00 54 d7 6c 70 00 00 28 00 length 20480 SMID 147 terminated ioc 804b scsi 0 state 0 xfer 0
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): READ(10). CDB: 28 00 54 d7 6c 70 00 00 28 00
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): CAM status: CCB request completed with an error
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): Retrying command
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): WRITE(10). CDB: 2a 00 23 f7 69 f8 00 00 30 00
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): CAM status: SCSI Status Error
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): SCSI status: Check Condition
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): Info: 0x23f769f8
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): Error 22, Unretryable error
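Since the log is hard to read by eye, here is a rough sketch for summarizing it: it tallies the SCSI sense errors per device and decodes the block address from a READ(10)/WRITE(10) CDB. The function names and the trimmed sample are mine, and the regex only covers the `daN` line format shown in this log, so treat it as a starting point rather than a general parser.

```python
import re
from collections import Counter

# Matches lines like "(da7:mps0:0:7:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (...)"
SENSE_RE = re.compile(r"\((da\d+):[^)]*\): SCSI sense: ([A-Z ]+) asc:")

def sense_errors_by_device(log_text):
    """Count (device, sense key) pairs found in the log text."""
    return Counter(SENSE_RE.findall(log_text))

def decode_cdb10(cdb_hex):
    """Return (lba, block_count) from a 10-byte READ(10)/WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks

# Trimmed sample taken from the log in this post.
sample_log = """\
Jun 28 20:33:49 Titan (da7:mps0:0:7:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Jun 28 21:36:48 Titan (da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jun 29 16:02:53 Titan (da5:mps0:0:5:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
"""

for (dev, sense), count in sorted(sense_errors_by_device(sample_log).items()):
    print(f"{dev}: {sense} x{count}")

lba, blocks = decode_cdb10("2a 00 20 7f 17 08 00 00 08 00")
print(f"da7 WRITE(10): lba=0x{lba:x}, blocks={blocks}")
```

The decoded LBA for the da7 write matches the `Info: 0x207f1708` line. If the drives use 512-byte logical sectors (an assumption on my part), that address is only roughly 279 GB into a 5TB drive, which makes the "out of range" message look odd to me.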
I know this is a TON of stuff I'm asking, but I'm here to learn so hopefully you all can help. Thank you in advance!