Degraded volume didn't repair correctly.

Status
Not open for further replies.

nateniu (Cadet, joined Apr 4, 2013, 2 messages)
So I have a FreeNAS 8.3.1 setup with four 1TB hard drives formatted with ZFS. I put about 300GB of data on the volume and wanted to test a crash situation. I powered down the system, removed a hard drive and formatted it. I powered the system back up and put more data on the remaining 3 drives while the volume was degraded. Then I powered down the machine and put the formatted (blank, but the same) HD back in.

- FreeNAS listed the HD as UNAVAILABLE in the volume manager. I couldn't put the HD in offline status or use the replace feature. An error would pop up in the GUI and fade away too fast for me to read it.

- I took the hard drive offline with zpool offline etc.
- I then ran zpool replace volumename serialnumber /dev/ada2
Resilvering started once I went into the volume manager. Things looked great, but after it finished there were 3094 errors and the volume was still degraded.

- I then ran a scrub on the volume, and this is what I'm left with now:
Code:
  pool: MainVolume
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h53m with 3097 errors on Thu Apr  4 23:04:08 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        MainVolume                                      DEGRADED 41.1K     0 4.20K
          raidz1-0                                      DEGRADED  107K     0 13.8K
            gptid/72a3cc6e-8c59-11e2-ab9e-2c768aabdb20  ONLINE       0     0     0
            gptid/7303af6b-8c59-11e2-ab9e-2c768aabdb20  ONLINE       0     0     0
            replacing-2                                 DEGRADED     0     0     0
              15104135553137921540                      OFFLINE      0     0     0  was /dev/gptid/7367cca7-8c59-11e2-ab9e-2c768aabdb20
              ada2                                      ONLINE       0     0     0
            14065391872938856016                        UNAVAIL     50 8.26K     0  was /dev/gptid/741dec2e-8c59-11e2-ab9e-2c768aabdb20

errors: 3094 data errors, use '-v' for a list


15104135553137921540 was ada2, the drive I cleared on purpose.

Now ada3 is gone??? Where did I go wrong in this process, and is there a way to fix this at this point? I don't know what replacing-2 is, but things look to be in bad shape. However, the volume is still just degraded and I can access data on it with no problem.

Sorry, I'm in the n00b phase, but I'm stuck now.
-Nate
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
One of your disks is broken. See the disk with 50 read errors and 8.26K write errors. That disk is probably bad, or something else happened that you didn't mention (or maybe didn't even notice).
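The way that table is read above (a healthy member shows ONLINE with 0 in READ, WRITE and CKSUM) can be sketched as a small Python helper that flags every other row. This is only an illustration: `parse_count` and `problem_devices` are hypothetical names made up here, the parser assumes the column layout of the output posted above, and treating the "K" suffix as 1024 is an assumption (the displayed values are rounded anyway).

```python
import re

# Assumption: zpool abbreviates large counters with a binary suffix (1K = 1024).
# The displayed values are rounded, so results are only approximate.
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_count(text: str) -> float:
    """Turn a zpool error counter such as '50' or '8.26K' into a number."""
    match = re.fullmatch(r"([\d.]+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised counter: {text!r}")
    value, suffix = match.groups()
    return float(value) * _SUFFIX[suffix]


def problem_devices(status_text: str) -> list[tuple[str, str, float, float, float]]:
    """List (name, state, read, write, cksum) for every config row that is
    not ONLINE or has a non-zero READ/WRITE/CKSUM counter."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True   # header row: device rows follow
            continue
        if fields[:1] == ["errors:"]:
            break              # summary line: end of the device table
        if not in_config or len(fields) < 5:
            continue           # blank lines or text outside the table
        name, state, read, write, cksum = fields[:5]  # ignore "was /dev/..." notes
        counts = [parse_count(c) for c in (read, write, cksum)]
        if state != "ONLINE" or any(counts):
            rows.append((name, state, *counts))
    return rows


# The config section of the status output posted in this thread.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        MainVolume                                      DEGRADED 41.1K     0 4.20K
          raidz1-0                                      DEGRADED  107K     0 13.8K
            gptid/72a3cc6e-8c59-11e2-ab9e-2c768aabdb20  ONLINE       0     0     0
            gptid/7303af6b-8c59-11e2-ab9e-2c768aabdb20  ONLINE       0     0     0
            replacing-2                                 DEGRADED     0     0     0
              15104135553137921540                      OFFLINE      0     0     0  was /dev/gptid/7367cca7-8c59-11e2-ab9e-2c768aabdb20
              ada2                                      ONLINE       0     0     0
            14065391872938856016                        UNAVAIL     50 8.26K     0  was /dev/gptid/741dec2e-8c59-11e2-ab9e-2c768aabdb20

errors: 3094 data errors, use '-v' for a list
"""

for name, state, read, write, cksum in problem_devices(SAMPLE):
    print(f"{name:22} {state:9} read={read:g} write={write:g} cksum={cksum:g}")
```

On this sample the healthy gptid rows and ada2 are skipped, and the last flagged row is the UNAVAIL disk with its 50 read errors and roughly 8.26K write errors.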

In any case, I'd check SMART info and see what is going on. Something is wrong hardware-wise.

The proper way to do the test you were trying to do is to shut down the server, pull a disk and zero it out. Formatting a disk doesn't always erase the ZFS information on it. If the ZFS information remains, it can cause problems because FreeNAS thinks the disk is still part of a zpool and won't let you add it to a new zpool.
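Zeroing a whole disk is usually done with dd reading from /dev/zero; the Python sketch below shows the same idea: overwrite every byte, then read it back to confirm nothing non-zero is left. Zeroing the whole device rather than just its start matters because ZFS keeps copies of its labels at both the beginning and the end of each member disk. `zero_fill` and `is_zeroed` are hypothetical names, and the code assumes that seeking to the end of the opened path reports its size, which holds for regular files and Linux block devices but is unverified elsewhere (FreeNAS is FreeBSD-based, so check before relying on it). Try it on a scratch file first; pointing it at the wrong device destroys data.

```python
import os

CHUNK_SIZE = 1024 * 1024  # overwrite 1 MiB at a time


def zero_fill(path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Overwrite every byte of `path` with zeros and return the byte count.

    Assumption: seeking to the end reports the full size of `path`
    (true for regular files and Linux block devices; unverified elsewhere).
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as target:
        size = target.seek(0, os.SEEK_END)  # total length to overwrite
        target.seek(0)
        written = 0
        while written < size:
            n = min(chunk_size, size - written)
            target.write(zeros[:n])
            written += n
        target.flush()
        os.fsync(target.fileno())  # push the zeros out to the disk
    return written


def is_zeroed(path: str, chunk_size: int = CHUNK_SIZE) -> bool:
    """Read `path` back and confirm that no non-zero byte is left."""
    with open(path, "rb") as target:
        while chunk := target.read(chunk_size):
            if any(chunk):
                return False
    return True
```

Against a scratch file, `zero_fill(path)` should return the file's size and a following `is_zeroed(path)` should return True.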

If you can't figure out what's wrong with the hard drive, I'd delete the zpool, zero out all 4 drives, then make a new zpool and redo your test as I described above. Very few people are as thorough as you are (good job!), and quite a few make the mistake you made. But the real trick is to zero the disks.

Give that a go and see what happens.

Also, if you look, the status says corruption has occurred. If you run zpool status -v it'll list everything that's corrupt, and the list is probably pretty long. The corruption happened because one disk had been re-added to the pool while out of sync, so you had no redundancy left, and then a second disk had read and write errors with no additional redundancy to cover them. This is why I always recommend RAIDZ2. Cool, huh?
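The arithmetic behind that explanation: a RAIDZ vdev with p parity disks can rebuild a block as long as no more than p members fail to supply good data for it. The sketch below is a deliberately simplified model (`Disk` and `block_recoverable` are hypothetical names): it treats an out-of-sync disk and an erroring disk as equally unable to help with the same block, whereas real ZFS works block by block, so a partly resilvered disk can still serve the blocks it has already received.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    in_sync: bool = True   # False for a re-added disk that missed writes
    healthy: bool = True   # False for a disk returning read/write errors


def block_recoverable(disks: list[Disk], parity: int) -> bool:
    """True if at most `parity` members are unable to supply good data."""
    unusable = sum(1 for d in disks if not (d.in_sync and d.healthy))
    return unusable <= parity


# Simplified version of this thread: ada2 re-added out of sync, and a second
# disk (the one that later showed as UNAVAIL) throwing errors.
pool = [
    Disk("ada0"),
    Disk("ada1"),
    Disk("ada2", in_sync=False),
    Disk("ada3", healthy=False),
]

print("RAIDZ1:", block_recoverable(pool, parity=1))  # prints RAIDZ1: False
print("RAIDZ2:", block_recoverable(pool, parity=2))  # prints RAIDZ2: True
```

With two members unusable at once, a single parity disk is not enough, while a second parity disk would still have covered the gap.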
 

nateniu
Sorry for the late update, but zeroing out hard drives takes forever.

I followed your advice and the second trial worked just fine.

1. Built the volume
2. Put data on it (but used z2)
3. Powered down, removed a hard drive and erased it with zeros (one of 4)
4. Powered up and put more data on the volume while it was degraded
5. Powered down, put the zeroed hard drive back in and replaced the drive in the volume
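For a rough sense of the trade-off in choosing z2 on four disks (and of the five-disk setup wished for below), usable space is approximately the number of disks minus the parity disks, times the disk size, and the vdev survives as many whole-disk failures as it has parity disks. The helper below (`raidz_estimate`, a hypothetical name) ignores ZFS metadata, padding and the TB/TiB difference, so treat its numbers as upper bounds.

```python
def raidz_estimate(disks: int, disk_tb: float, parity: int) -> tuple[float, int]:
    """Return (approximate usable TB, whole-disk failures tolerated)."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ supports 1, 2 or 3 parity disks")
    if disks <= parity:
        raise ValueError("need more disks than parity disks")
    return (disks - parity) * disk_tb, parity


for disks in (4, 5):
    for parity in (1, 2):
        usable, tolerated = raidz_estimate(disks, 1.0, parity)
        print(f"{disks} x 1TB RAIDZ{parity}: ~{usable:g} TB usable, "
              f"survives {tolerated} failed disk(s)")
```

So four 1TB disks in RAIDZ2 give up about a third of what RAIDZ1 would offer (roughly 2 TB instead of 3 TB) in exchange for surviving a second failure; a fifth disk would bring RAIDZ2 back to about 3 TB.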

Back to a healthy volume. I did find the option to use zpool2; I'm not sure what that is yet, but I went for it anyway, so I'm going to look into that more.

I feel pretty confident, but I definitely want to look into some kind of sync with Amazon Glacier. I learned that sometimes you can get your data back and sometimes you can't; the backups of backups never end. Interestingly, I didn't get any read or write errors this round. SMART tests are set up for all drives, and all of them seem healthy. While zeroing out the drives I also used Western Digital's utilities, and they reported no errors.

Wish I had 5 drives in this system; I'd feel a little more comfortable, since I'm about to dump all my digital documents, music, movies, etc. into this. But this was a good exercise.
 