Scrubbing always resilvers 100% of drive

Status
Not open for further replies.

CheckitTwice

Cadet
Joined
Oct 19, 2013
Messages
5
I have a raidz2 pool (FreeNAS-8.2.0-BETA4-x64 (r11722)) with six 2 TB drives. A while ago a drive went bad, and I replaced it with no problems. A few weeks ago, 2 drives went bad in a day. I replaced them both successfully (whew!), but now I have this odd problem:

Whenever I scrub the volume, these 2 drives get 100% resilvered every time. If I run zpool status beforehand, it tells me all is well. The FreeNAS web GUI tells me the volume is healthy, and lists all the drives.

Yet if I do a scrub, those 2 drives get resilvered. I don't know if it's 100%, but when I look at the amount of resilvering, it approaches the size of the disk. Both drives always show the same amount of resilvering as it progresses through the scrub.

First, I'm worried that these 2 drives aren't really contributing to the redundancy of the system (otherwise why would they need to be resilvered?), and if I lose another drive, I'll lose all my data.

Secondly, all this resilvering is slowing down my system.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Why would you continue to run an OLD beta version? If you want to stay with 8.x, upgrade to 8.3.2.
 

CheckitTwice

Cadet
Joined
Oct 19, 2013
Messages
5
Thanks for your reply.
Is this the best time to upgrade, while I'm having problems with my drives? I am not in love with 8.x in particular. What would be the best version to upgrade to?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
CheckitTwice said: "Thanks for your reply. Is this the best time to upgrade, while I'm having problems with my drives? I am not in love with 8.x in particular. What would be the best version to upgrade to?"

I know what you are asking. The first instinct is not to add more complexity when things aren't going 100%. But the reality is that there isn't much risk for you because FreeNAS is set up like an appliance. I'd definitely upgrade to 8.3.2 or 9.1.1 for the newer ZFS code and go from there. I wouldn't upgrade the zpool version, though; you should be able to do your resilver and have everything work without the pool upgrade. I haven't upgraded my own pool since moving to 9.x, and I've been on 9.x for 2 months.

You are right though, I'd be worried that if you lose a disk you'll lose your pool. It is quite possible that you are in that situation. What are your hardware specs?
 

CheckitTwice

Cadet
Joined
Oct 19, 2013
Messages
5
I think I found the problem. I had 14 permanent errors in different files that were left over from when my 2 drives died. Every time I scrubbed (and resilvered), it eventually gave me these errors again. After reading some other threads, I decided to delete these 14 files. When I looked at the status after that, it gave me the <drive id><hex code> list instead of the files. I then scrubbed again and, again, got the resilvering. This time, though, the status at the end said the resilvering was complete. A subsequent scrub didn't cause any resilvering.

My take is that because of the errors, it didn't consider the resilvering complete, so every scrub tried to do the resilvering again, but with no success. Once the errors were removed, the resilvering was considered successful.

To my mind, this is fairly confusing. If the permanent errors keep things from being considered resilvered, it would be nice to give a message to that effect before it went about trying to resilver, like "please fix your permanent errors, these will keep the resilvering from taking". I probably resilvered those 2 drives 10 times before I figured this out.

By the way, I tried to install 9.1, and that was a total fail. I used the RAM drive image, and tried to boot. I watched it try to do DHCP via each and every one of my USB ports even though none of them is hooked up to a LAN. This was nice and slow, since I suppose it was waiting for a timeout from the DHCP server each time. I got other errors as well, and in the end, I had no LAN access. Apparently the hardware I have isn't compatible with the new release. But that's a subject for another thread. My old 8.2.0-BETA4 is still working fine.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I think your logic is flawed. What is more likely is that the zpool metadata corruption was causing a false impression that the resilver wasn't complete.

ZFS is designed to prevent corruption so long as you have sufficient redundancy. As soon as you lose that redundancy, all hell can break loose. More than likely your metadata was causing some kind of loop in the resilvering operation, so the resilver appeared to complete without really completing. On reboot ZFS realizes something is wrong and tries to resilver again.
 

CheckitTwice

Cadet
Joined
Oct 19, 2013
Messages
5
You could be right, but it sounds like we might be saying the same thing.

Once I deleted those "permanent error" files, and resilvered, all was well. Is that different than the metadata, and if so, how did deleting those files make a difference?

BTW, just got done reading the "raid5/raidz1 is dead" article linked in your signature, and it got me worrying. That article was kind of old, so I wonder if unrecoverable read error (URE) rates have gotten any better since then.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
CheckitTwice said: "You could be right, but it sounds like we might be saying the same thing. Once I deleted those "permanent error" files, and resilvered, all was well. Is that different than the metadata, and if so, how did deleting those files make a difference?"

By deleting the files you deleted the metadata associated with those files. The reality is that for most people, if zpool status shows permanent errors you should be focusing on those before resilvering. You didn't, hence your confusion.

CheckitTwice said: "BTW, just got done reading the raid5/raidz1 is dead article linked in your signature, and it got me worrying. That article was kind of old, so I wonder if the URE has gotten any better since then."

You can look at the UREs for most hard drives by checking out the manufacturer's website. But the short answer is "no". They aren't getting better as fast as disk space is growing. So things are still getting worse.
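The arithmetic behind that worry can be sketched in a few lines of Python. This is a rough model, not a measurement: it assumes a spec-sheet rate of one unrecoverable read error per 10^14 bits (a common consumer-drive figure, not something stated in this thread), treats errors as independent, and assumes the resilver has to read the four surviving 2 TB drives in full. The function name `p_at_least_one_ure` is made up for illustration.

```python
import math

TB = 10**12  # drive vendors quote decimal terabytes


def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Probability of hitting at least one URE while reading bytes_read bytes.

    Uses P = 1 - (1 - p)^bits, computed with log1p/expm1 so the tiny
    per-bit probability does not get lost to floating-point rounding.
    ure_per_bit is an ASSUMED spec-sheet value, not a measured one.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Assumed scenario: raidz2 of six 2 TB drives with two drives missing,
# so the four survivors are read with no redundancy left.
surviving_drives = 4
drive_size_tb = 2
p = p_at_least_one_ure(surviving_drives * drive_size_tb * TB)
print(f"Chance of at least one URE during the rebuild: {p:.1%}")
```

Under those assumptions the chance comes out a little under one in two. ZFS only reads allocated blocks during a resilver, so a pool that is not full would read less, and real drives often do better than their spec sheet, so treat the figure as a pessimistic bound rather than a prediction.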
 

CheckitTwice

Cadet
Joined
Oct 19, 2013
Messages
5
Thanks for the clarification. However, I didn't see those errors before resilvering. Could it be that these errors were due to the very thing that article was talking about, and occurred during my initial resilver attempt? I've got six (down to four for a while) 2 TB commercial-grade drives that were fuller than they should have been.

Cyber, you don't need to reply if you don't want. I appreciate your help and clarifications.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
CheckitTwice said: "Thanks for the clarification. However, I didn't see those errors before resilvering. Could it be that these errors were due to the very thing that article was talking about, and occurred during my initial resilver attempt? I've got six (down to four for a while) 2 TB commercial-grade drives that were fuller than they should have been."

It's not out of the realm of possibility. Unfortunately, we will probably never know.
 

N00b

Explorer
Joined
May 31, 2013
Messages
83
cyberjock said: "By deleting the files you deleted the metadata associated with those files. The reality is that for most people, if zpool status shows permanent errors you should be focusing on those before resilvering. You didn't, hence your confusion."

How does one remove permanent errors? I've had a couple of files that have gone bad. Is there a way to delete the file and move on if you really want to do that? I've had to scrub the whole pool again to clear the error. Is there another way to only scrub a part of the pool especially after a complete scrub?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Not really. The way to remove the errors is to delete the offending file. If you are getting metadata locations (which is that hex code you saw), then the only 2 ways I've seen to fix them are to delete and recreate the pool (you will need a backup) or to make a zvol, move all of your data to the zvol, then move it back. Moving your data will be time-consuming, naturally.

I'd highly recommend you fix the errors as FreeNAS will have the yellow warning light every time it finds the errors.
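To tell the two cases apart (named files you can delete versus hex object references), the errors section of `zpool status -v` can be parsed. The sketch below is illustrative only: `SAMPLE_STATUS` is made-up text in the usual layout rather than output from this pool, the function names are hypothetical, and it assumes a real file path never contains a `<0x...>` token.

```python
import re

# Made-up sample in the usual `zpool status -v` layout (not from this thread).
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/tank/media/video.mkv
        tank/docs:<0x1a2b>
        <0x45>:<0x6f>
"""

# Entries that ZFS can no longer map to a path are shown as <0x...> object IDs.
HEX_OBJECT = re.compile(r"<0x[0-9a-fA-F]+>")


def permanent_errors(status_text):
    """Return the entries listed under the 'Permanent errors' heading."""
    entries = []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            in_errors = False  # a new pool's report begins
        elif line.startswith("errors:"):
            in_errors = "Permanent errors" in line
        elif in_errors and stripped:
            entries.append(stripped)
    return entries


def classify(entries):
    """Split entries into plain file paths and hex object references."""
    files = [e for e in entries if not HEX_OBJECT.search(e)]
    objects = [e for e in entries if HEX_OBJECT.search(e)]
    return files, objects


files, objects = classify(permanent_errors(SAMPLE_STATUS))
print("Files that could be deleted or restored:", files)
print("Object references with no file name:", objects)
```

On a live system the text would come from running `zpool status -v` with the pool name; the entries in the first list are the ones that deleting or restoring a file can address, while the second list is the hex-code case described above.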
 