I built my first FreeNAS server about two months ago with six 4TB WD Red drives in a RAIDZ2 array. A few days ago, a SMART extended test reported a 'read failure' on one of the disks, so I ran a zpool scrub, which repaired 256K of data. I then ran another SMART extended test, and it reported a 'read failure' at a different LBA. I'm now running a second zpool scrub, which has repaired another 256K so far. Should I take this as a sign that the disk is starting to go bad, or is this typical wear-and-tear behavior?
I also have three new 4TB WD Red drives that are currently being stress tested in a separate Linux machine. I was planning to build a second, eight-disk RAIDZ2 array with them once I get five more drives, but I'm in no rush to do that. Should I just use one of these disks to replace the failing one, let the array resilver, and RMA the disk with the errors?
I'm still new to all this so thanks for all your help. Also, the server uses ECC RAM if that is helpful. I am enclosing the output of smartctl and zpool status below:
cmd: smartctl -a /dev/da5
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       15
  3 Spin_Up_Time            0x0027   187   176   021    Pre-fail  Always       -       7641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       47
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       701
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       41
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1530
194 Temperature_Celsius     0x0022   119   118   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       7

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%       678         24681960
# 2  Extended offline    Completed: read failure       90%       676         24681960
# 3  Short offline       Completed: read failure       10%       666         24681960
# 4  Short offline       Completed without error       00%       656         -
# 5  Short offline       Completed without error       00%       630         -
# 6  Extended offline    Completed: read failure       90%       621         24682008
# 7  Short offline       Completed without error       00%       618         -
# 8  Short offline       Completed without error       00%       606         -
# 9  Short offline       Completed without error       00%       594         -
#10  Short offline       Completed without error       00%       582         -
#11  Short offline       Completed without error       00%       570         -
#12  Short offline       Completed without error       00%       557         -
#13  Short offline       Completed without error       00%       556         -
#14  Short offline       Completed without error       00%       544         -
#15  Short offline       Completed without error       00%       532         -
#16  Short offline       Completed without error       00%       520         -
#17  Short offline       Completed without error       00%       508         -
#18  Short offline       Completed without error       00%       496         -
#19  Short offline       Completed without error       00%       484         -
#20  Short offline       Completed without error       00%       472         -
#21  Short offline       Completed without error       00%       460         -
cmd: zpool status
Code:
  pool: tank1
 state: ONLINE
  scan: scrub in progress since Fri Dec 20 21:53:03 2013
        7.75T scanned out of 12.2T at 175M/s, 7h20m to go
        256K repaired, 63.70% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank1                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6177eafb-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/626456f8-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/634c34ad-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/64368d6a-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/6524b559-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/66130c69-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0  (repairing)
        logs
          gptid/349c2eca-4c2d-11e3-a43a-002590d72d3b    ONLINE       0     0     0

errors: No known data errors
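
To put the two failing LBAs from the self-test log in perspective, here is a rough Python sketch that pulls the 'read failure' rows out of the log and works out how far apart the reported LBAs are. The helper names (`failing_lbas`, `lba_span`) are made up for illustration, and the 512-byte logical sector size is an assumption, since the drive's sector size is not shown in the smartctl excerpt above.

```python
# Rows copied from the SMART self-test log above (failing rows plus one passing row).
SELF_TEST_LOG = """\
# 1  Short offline       Completed: read failure       10%       678         24681960
# 2  Extended offline    Completed: read failure       90%       676         24681960
# 3  Short offline       Completed: read failure       10%       666         24681960
# 4  Short offline       Completed without error       00%       656         -
# 6  Extended offline    Completed: read failure       90%       621         24682008
"""

# Assumption: LBAs are counted in 512-byte logical sectors.
ASSUMED_SECTOR_BYTES = 512


def failing_lbas(log_text):
    """Return the sorted, de-duplicated LBAs reported by 'read failure' rows."""
    lbas = set()
    for line in log_text.splitlines():
        if "read failure" not in line:
            continue
        last_field = line.split()[-1]  # LBA_of_first_error is the final column
        if last_field.isdigit():
            lbas.add(int(last_field))
    return sorted(lbas)


def lba_span(lbas):
    """Return the distance in sectors between the lowest and highest LBA."""
    return lbas[-1] - lbas[0] if lbas else 0


if __name__ == "__main__":
    lbas = failing_lbas(SELF_TEST_LOG)
    span = lba_span(lbas)
    print("failing LBAs:", lbas)
    print("span:", span, "sectors,", span * ASSUMED_SECTOR_BYTES, "bytes (assumed sector size)")
```

Under that assumption, the two distinct failing LBAs are 48 sectors (24576 bytes) apart, so both failures sit in the same small region of the disk.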