I noticed some error messages reported from my NAS volume over the weekend, when our backup system was unable to verify some files that had been copied. I also received this message from FreeNAS by email...
Code:
Checking status of zfs pools:
  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 2h48m with 0 errors on Wed Feb 15 05:49:48 2012
config:

        NAME        STATE     READ WRITE CKSUM
        backups     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     1
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     1

errors: No known data errors
This is the first time I have experienced an issue like this, and I am hoping someone can give me some pointers. I checked the SMART status of each drive in the array by running "smartctl -t short /dev/ada0" (once for each of the four drives); afterwards, "smartctl -H /dev/ada0" reported "SMART overall-health self-assessment test result: PASSED" for each drive.
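For reference, the per-drive health check can be scripted instead of typed four times. This is only a rough sketch: the function name `smart_health_summary` and the `SMARTCTL` override variable are made up for illustration, and it only looks at the overall `-H` verdict, not at individual SMART attributes.

```shell
#!/bin/sh
# Sketch: report the SMART overall-health verdict for each device given.
# SMARTCTL is a hypothetical override hook; it defaults to the real smartctl.
SMARTCTL="${SMARTCTL:-smartctl}"

# Print "<device>: PASSED" when smartctl -H reports PASSED, else "<device>: CHECK".
smart_health_summary() {
    for dev in "$@"; do
        if "$SMARTCTL" -H "$dev" 2>/dev/null | grep -q 'test result: PASSED'; then
            echo "$dev: PASSED"
        else
            echo "$dev: CHECK"
        fi
    done
}

# Example (as root on the NAS):
#   smart_health_summary /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3
```

A "CHECK" line just means the PASSED string was not found for that device, so it also covers the case where smartctl could not be run at all.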
I then ran a scrub with "zpool scrub backups", which completed after a few hours with this result...
Code:
[root@freenas] /var/log# zpool status -v
  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 3h20m with 0 errors on Tue Feb 21 12:30:29 2012
config:

        NAME        STATE     READ WRITE CKSUM
        backups     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     2  86K repaired
            ada1p2  ONLINE       0     0     2  86K repaired
            ada2p2  ONLINE       0     0     1  43K repaired
            ada3p2  ONLINE       0     0     3  129K repaired

errors: No known data errors
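To keep an eye on which members are accumulating checksum errors, the config table can be filtered down to just the rows with a non-zero CKSUM column. A minimal sketch, assuming the column layout shown in the output above (NAME, STATE, READ, WRITE, CKSUM); the function name `cksum_errors` is hypothetical.

```shell
#!/bin/sh
# Sketch: read `zpool status` output on stdin and print "<name> <cksum>"
# for every row of the config table whose CKSUM column is not 0.
cksum_errors() {
    awk '
        $1 == "NAME"    { in_table = 1; next }   # header row opens the table
        $1 == "errors:" { in_table = 0 }         # the errors line closes it
        in_table && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Example: zpool status backups | cksum_errors
```

Fed the second status output above, this lists ada0p2 2, ada1p2 2, ada2p2 1 and ada3p2 3, one per line.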
So it seems as though the SMART status of the drives is good and the errors have been repaired. However, the data on this machine is pretty important, so I really want to be sure. I'm curious whether I should just trust that everything is OK now or troubleshoot this issue further. Since all four drives in the pool show repairs, I have no idea how I would determine which drive is causing the problem. All four of these drives are only a couple of months old.
I think my biggest concern is that the report emailed to me over the weekend already showed a completed scrub ("scrub: scrub completed after 2h48m with 0 errors on Wed Feb 15 05:49:48 2012"). The fact that I ran another scrub today and errors were still being repaired concerns me.
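One way to tell whether new errors keep appearing would be to reset the counters with 'zpool clear' (as the action text in the status output suggests), scrub again, and look at the fresh numbers. The following is only a sketch under that assumption: `rescrub_and_report` and the `ZPOOL` override variable are made-up names, the 60-second polling interval is arbitrary, and clearing discards the current counts, so the existing status output should be saved first.

```shell
#!/bin/sh
# Sketch: reset the error counters, scrub again, and print the new status.
# ZPOOL is a hypothetical override hook; it defaults to the real zpool.
ZPOOL="${ZPOOL:-zpool}"

rescrub_and_report() {
    pool="$1"
    "$ZPOOL" clear "$pool" || return 1   # reset the READ/WRITE/CKSUM counters
    "$ZPOOL" scrub "$pool" || return 1   # start a scrub; the command returns at once
    # Poll until the status output no longer reports a scrub in progress.
    while "$ZPOOL" status "$pool" | grep -q 'scrub in progress'; do
        sleep 60
    done
    "$ZPOOL" status -v "$pool"
}

# Example: rescrub_and_report backups
```

If the counters stay at zero after a clean scrub, the earlier errors were probably a one-off; if they climb again on several drives at once, that points away from a single bad disk.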
Thanks in advance for any advice on how to troubleshoot this issue.
Roark Holz