SOLVED Checksum after drive comes back online

Status
Not open for further replies.

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
Hello All,

Running FreeNAS 9.2.1.2 with 8 GB of RAM.

An alert indicated that the volume was degraded. In Volume Status, one of the drives in the volume was offline.

After powering down the server and re-seating the drive it came back online.

A new alert states: "The volume status is: Online. One or more drives experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."

In volume status the Checksum field is incrementing for that drive, currently at 3.45k.

Is this a resilvering process or is this an error count?

Thanks in advance for help.

~eric
 
dlavigne

Guest
Please post the output of zpool status within code tags.
 

elangley

dlavigne said: "Please post the output of zpool status within code tags."
Okay, here it is:
Code:
  pool: vol2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 8h5m with 0 errors on Sun Jan  4 08:06:41 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol2                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/e2a6e3f3-b6b6-11e3-aff2-000c299bb030  ONLINE       0     0  251K
            gptid/e396afa0-b6b6-11e3-aff2-000c299bb030  ONLINE       0     0     0

errors: No known data errors
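
The CKSUM column in zpool status is a per-device count of checksum errors, not resilver progress. As a rough sketch, the helper below lists only the pool members whose CKSUM count is non-zero. The function name cksum_errors is made up for illustration, and it assumes unwrapped output with the usual NAME STATE READ WRITE CKSUM columns.

```shell
# Print each pool member whose CKSUM column is non-zero.
# Reads `zpool status` text on stdin. Hypothetical helper: it assumes
# the device table has the five columns NAME STATE READ WRITE CKSUM
# and that long lines have not been wrapped by the terminal.
cksum_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }
        # A blank line marks the end of the device table.
        in_config && NF == 0 { in_config = 0 }
        # Inside the table, report rows with a non-zero fifth column.
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}
```

Typical use would be `zpool status vol2 | cksum_errors` from an sh session (if root's login shell is csh, start `sh` first, since shell functions are an sh feature). An empty result means no member is reporting checksum errors.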
 
dlavigne

Guest
And did you replace the bad drive using the instructions in the docs?
 

elangley

dlavigne said: "And did you replace the bad drive using the instructions in the docs?"

"After powering down the server and re-seating the drive it came back online." So I incorrectly saw no need to "replace" it.

However, now when looking at Volume Status I see the Replace button, and when clicked, the option to Replace disk da2 with Member disk "In-place" [da2].

Clicking that took the disk offline, after an error.
Then I used Wipe on the drive so that I could add it back in.
Then I removed the duplicate offline disk (the one with the checksum errors) from the volume.

The alert now reads: "The volume vol2 (ZFS) status is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. Wait for the resilver to complete."

Is there a place to see the status of resilvering?

Thank you,

~eric
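
One place to see resilver progress is the scan: section of zpool status, which reports a "% done" figure while a resilver is running. The sketch below pulls out just that percentage. The function name resilver_progress is made up for illustration, and the parsing assumes the "resilver in progress ... NN% done" wording of this ZFS version.

```shell
# Print the resilver completion percentage from `zpool status` text on stdin.
# Hypothetical helper: prints nothing if no resilver is in progress.
resilver_progress() {
    awk '
        # Only trust a "% done" line that follows a resilver banner.
        /resilver in progress/ { seen = 1 }
        seen && /% done/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) pct = $i
        }
        END { if (pct != "") print pct }
    '
}
```

For example, `zpool status vol2 | resilver_progress` could be run by hand, or in a loop such as `while sleep 60; do zpool status vol2 | resilver_progress; done`, from an sh session.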
 

ivec

Cadet
Joined
Apr 7, 2015
Messages
1
Thanks elangley,

This forum post was gold :) I have added more detail to what you provided. I just had the same issue. I did not replace the disk; I just wiped it and re-added it to the volume.

1) In the GUI, go to Storage, click the volume that has the bad disk, click "Volume Status", select the disk in question, and click "Replace". (The disk will stay online and the replace will fail; that is okay.)
2) SSH to the FreeNAS server as root and run this command: sysctl kern.geom.debugflags=0x10
3) Change to the directory that holds the ada* device nodes: cd /dev
4) Zero out the start of the drive: dd if=/dev/zero of=/dev/adaN bs=1G. Make sure you specify the right disk, adaN, where N is the number (0, 1, 2, ...) of the bad disk. I let it run for 2 minutes, then pressed Ctrl+C to break out; it took 45 seconds to stop.
5) Go back to Storage, click the volume that has the bad disk, click "Volume Status", select the disk in question, and click "Replace" again. (The wiped disk will be added and displayed as adaN; the old one will be displayed by its serial number.)
6) Remove the old (bad) HDD by clicking "Detach".
7) In the SSH session, run zpool status to monitor the resilver, and wait for it to complete.

[root@ech-nas01] /# zpool status
  pool: ech-esxi-vm
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jun 2 22:25:25 2015
        15.8G scanned out of 84.0G at 36.4M/s, 0h31m to go
        15.8G resilvered, 18.76% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        ech-esxi-vm                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/9b5402d9-0997-11e5-bd1a-0025904784a1  ONLINE       0     0     0  (resilvering)
            gptid/b18ed218-e556-11e4-a04e-0025904784a1  ONLINE       0     0     0

errors: No known data errors

8) Optionally, run a scrub on the pool. Scrub at least once a month, and make sure you have a scheduled monthly task for it.
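
Because the dd in step 4 destroys data on whichever disk it is pointed at, it can help to print the commands for steps 2 and 4 and read them before running anything. The sketch below is a dry run only: it echoes the commands and executes nothing. The function name print_wipe_cmds and the restriction to names like ada2 are assumptions made for illustration.

```shell
# Print (do not run) the commands from steps 2 and 4 for one disk.
# Hypothetical helper: the name and the adaN-only check are illustrative.
print_wipe_cmds() {
    disk=$1
    case "$disk" in
        ada[0-9]|ada[0-9][0-9]) ;;   # accept only whole-disk names such as ada2
        *) echo "usage: print_wipe_cmds adaN" >&2; return 1 ;;
    esac
    # Step 2: allow writes to a disk that is still in use.
    echo "sysctl kern.geom.debugflags=0x10"
    # Step 4: overwrite the disk from the start with zeros.
    echo "dd if=/dev/zero of=/dev/$disk bs=1G"
}
```

Running `print_wipe_cmds ada2` prints the sysctl line followed by `dd if=/dev/zero of=/dev/ada2 bs=1G`; a wildcard or mistyped name is rejected instead of being passed through to dd.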
 