Problems with RAID Volume, need advice please...

Status
Not open for further replies.

roarkh

Cadet
Joined
Feb 21, 2012
Messages
3
I noticed some error messages from my NAS volume over the weekend, when our backup system was unable to verify some files that had been copied. I also received this message from FreeNAS by email:

Code:
Checking status of zfs pools:
  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 2h48m with 0 errors on Wed Feb 15 05:49:48 2012
config:

	NAME        STATE     READ WRITE CKSUM
	backups     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada0p2  ONLINE       0     0     0
	    ada1p2  ONLINE       0     0     1
	    ada2p2  ONLINE       0     0     0
	    ada3p2  ONLINE       0     0     1

errors: No known data errors


This is the first time I have experienced an issue like this, and I am hoping that someone can give me some pointers. I ran a short SMART self-test on each of the four drives with "smartctl -t short /dev/ada0" (once per drive, changing the device name), and afterwards "smartctl -H /dev/ada0" reports "SMART overall-health self-assessment test result: PASSED" for each drive.

I then ran a scrub with "zpool scrub backups", which completed after a few hours with this result:

Code:
[root@freenas] /var/log# zpool status -v
  pool: backups
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 3h20m with 0 errors on Tue Feb 21 12:30:29 2012
config:

	NAME        STATE     READ WRITE CKSUM
	backups     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada0p2  ONLINE       0     0     2  86K repaired
	    ada1p2  ONLINE       0     0     2  86K repaired
	    ada2p2  ONLINE       0     0     1  43K repaired
	    ada3p2  ONLINE       0     0     3  129K repaired

errors: No known data errors


So it seems as though the SMART status of the drives is good and the errors have been repaired. However, the data on this machine is pretty important, so I really want to be sure. I'm curious whether I should just trust that everything is OK now or troubleshoot this issue further. Since all four members of the array show repairs, I have no idea how I would determine which drive is causing the problem. All four of these drives are only a couple of months old.
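One way to keep track of which devices are accumulating checksum errors from one scrub to the next is to pull the non-zero CKSUM rows out of the "zpool status" output. The following is only a minimal sketch, assuming a POSIX sh and awk are available; the function name cksum_errors is made up for illustration, and it only handles plain integer counts as shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` output on
# stdin and prints "device cksum_count" for every row whose CKSUM column
# (5th field) is a non-zero integer.
cksum_errors() {
    awk '$2 == "ONLINE" || $2 == "DEGRADED" || $2 == "FAULTED" {
        # Header lines such as "state: ONLINE" have no 5th field and are skipped.
        if ($5 ~ /^[0-9]+$/ && $5 > 0) print $1, $5
    }'
}

# Usage on the NAS itself (pool name taken from the post):
#   zpool status backups | cksum_errors
```

Saving this output after each scrub makes it easier to see whether the counts keep growing on every device or only on one.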

I think my biggest concern is that the report emailed to me over the weekend said this: "scrub: scrub completed after 2h48m with 0 errors on Wed Feb 15 05:49:48 2012". The fact that I ran another scrub today and it still found errors to repair concerns me.

Thanks in advance for any advice anyone can provide me on how to troubleshoot this issue.

Roark Holz
 

peterh

Patron
Joined
Oct 19, 2011
Messages
315
I would create a backup ASAP, then figure out whether it's the cables, the controller, the disks themselves, or the motherboard that is broken.
You have problems; don't continue without fixing the system.

If SMART (smartctl -a <dev>) does not show problems with the disk itself, you may have a cable or controller problem.
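As a rough sketch of that check, the SMART attribute table can be filtered down to the few counters that usually help tell a failing disk from a bad link. This assumes a POSIX sh and that the drives report the common smartctl attribute names; vendors differ, so the list is an assumption, and smart_hints is a made-up name. Reallocated or pending sectors tend to point at the disk itself, while a rising UDMA_CRC_Error_Count tends to point at the cable or controller. Errors spread across all four drives, as here, would more likely suggest a shared component than four bad disks.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and keeps only
# the attributes commonly used to separate disk faults from link faults.
# Attribute names are the usual smartctl ones and may differ per vendor.
smart_hints() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
}

# Usage (device names taken from the post):
#   for d in ada0 ada1 ada2 ada3; do
#       echo "== $d"
#       smartctl -A "/dev/$d" | smart_hints
#   done
```

The last column of each printed row is the raw counter; comparing it before and after a scrub shows whether it is still increasing.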
 