permanent errors in ZFS pool


cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If it's only one file, it's probably not a RAM problem but a failing hard drive (or drives). Bad RAM causes serious corruption across the pool, and pretty rapidly too.
 

krakoliezene

Dabbler
Joined
Sep 15, 2011
Messages
18
I discovered such a permanent error in my pool last week, without scrubbing or anything like it. Scrubbing (after reading this post) repaired 43.2 MB of checksum errors on ALL the disks...! The permanent error is still there though, pointing at a specific dataset. After I deleted the file, it shows a hex inode address instead. I tried to roll back to snapshots that were made a few weeks earlier, but without any result.
<jbear> mentioned that his permanent error was in a backup set made by rsync. This is exactly the situation I'm in. I've been backing up my Mac this way for quite some time now, effectively mimicking Time Machine, and I never had any problems. I upgraded to 9.3 Release just a week before these permanent errors showed up.
Could ZFS and rsync collide somehow? I know that rsync synchronises files at the block level. Any ideas?
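For reference, the rough sequence I've been going through looks something like this ("tank" and the dataset/snapshot names are just placeholders for mine):

    # show pool health plus the list of files with permanent (unrecoverable) errors
    zpool status -v tank

    # re-read every block and verify it against its checksum
    zpool scrub tank

    # roll the affected dataset back to an older snapshot
    # (-r also destroys any snapshots newer than the target)
    zfs rollback -r tank/mac-backup@before-errors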
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, ZFS and rsync can't collide like that. But it does raise the question of what your server is having problems with. Since your signature lists a board that doesn't support ECC RAM, I'd definitely point there first as the likely culprit.
 

krakoliezene

Dabbler
Joined
Sep 15, 2011
Messages
18
I've been using this board and this memory for the past 4 years under FreeNAS/ZFS without a glitch. So, yes, possible, but, no, not probable. The last big change was the upgrade to 9.3 Release, which went flawlessly as well, apart from some minor iSCSI issues. I haven't upgraded the pool yet; it is at version "5000" already, just not with the latest feature flags. I do have firmware 17 on the LSI 9211-8i, so 9.3 alerted me about that. But I've been using this firmware for more than a year with FreeNAS 9.2 and earlier, with no storage problems whatsoever, so I decided to leave things as they are. "If it ain't broke, don't fix it" is one of the first things they teach you :smile:. Could 9.3 be less forgiving with this combination? I mean, what made the FreeNAS team put that warning in 9.3?
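In case it matters, this is roughly how I'm checking the pool version / feature flags and the controller firmware ("tank" is a placeholder for my pool; sas2flash is LSI's flash utility, assuming it's installed):

    # pools whose supported feature flags are not all enabled are listed here
    zpool upgrade

    # feature flags supported by this FreeNAS release
    zpool upgrade -v

    # feature flags currently enabled/active on the pool
    zpool get all tank | grep feature@

    # adapter and firmware versions on the LSI 9211-8i
    sas2flash -listall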
 

krakoliezene

Dabbler
Joined
Sep 15, 2011
Messages
18
And apart from the root cause of this permanent error, how do I get rid of the message? Deleting the file does not make the message disappear. Scrubbing doesn't either. I understood from <jbear>'s experience that getting rid of the whole dataset could make things worse. Any ideas on this?
 

krakoliezene

Dabbler
Joined
Sep 15, 2011
Messages
18
Happily, after the second scrub there are no permanent data errors anymore. Everything is fine again. I think a power outage must have been the culprit. Or maybe 9.3 reboots by itself for some reason; I'll keep an eye on the uptime counter. And maybe I'll look for a motherboard that supports ECC memory, with more RAM. 16 GB is a bit thin for 36 TB raw. Thanks for your help!
 

chuck_b

Cadet
Joined
Jan 12, 2015
Messages
4
I am having a similar problem to krakoliezene, but worse. This is a relatively new system that I migrated to away from an aging Ubuntu box. I was given an ASRock Extreme4 Z77 with 24 GB of non-ECC memory. A Memtest86+ run is currently under way. Other than that, the system, NIC, controller, memory, etc. all meet FreeNAS recommendations. I also have firmware 17 on the LSI controller. This is a RAIDZ2 array built on six Seagate 4 TB drives, with weekly rsyncs to a pair of WD 4 TB drives in a RAID0 array and offsite backup of critical data via CrashPlan. I do not have an easy way to recreate the zpool without risking one of the drives in my RAID0 array failing and causing data loss. I have not been using snapshots. Individual files are either easily restored from one of the two backup locations or non-critical. I would like to fix the corruption and avoid recreating the zpool if at all possible.

About a week ago, I got an alert for a single corrupt file. I was able to see the corrupt file using "zpool status -v". I did not care about the file (actually it was a directory) and tried to delete it. It would not delete. I was able to rename it but not move it to a different file system. I ran a scrub and then had 10 corrupt files (2 files in the same directory structure, 8 inode entries). I tried a reboot and an update to the 1/5/2015 code. After the update, the boot volume complained it no longer saw its RAID1 pair and was HEALTHY/DEGRADED. I rebooted again and the boot volume was fine. I ran a scrub on the boot volume and I am no longer getting any messages about boot. I still received an email saying I had corrupt files on the storage zpool, but now the story goes from bad to worse...

Three days ago, running "zpool status -v" caused the system to stop all network traffic except ping. I power cycled the system and let it run over the weekend. Everything worked fine. This morning I wanted to try to address the corrupt-file emails I'd been getting over the weekend. When I ran "zpool status -v" over SSH, the system rebooted before it could finish showing me the full list of corrupt files. After the system came up I ran the command again and got the same reboot.

The system is currently running Memtest86+ and I will update with results. I don't understand how the corruption showed up for a single directory I wasn't able to delete, then a scrub showed another file in the same directory structure and a bunch of inodes (probably from my attempts to move/rename/delete the first directory), and now "zpool status -v" crashes or reboots my system. I have been very happy with FreeNAS, but now I am starting to question that decision if my only option for two corrupt files I don't care about is to wipe the data and start over. I knew I was taking a calculated risk of corrupt data with non-ECC memory, but this seems to be more than just non-ECC memory.

How do I troubleshoot this and get back to a stable FreeNAS system?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
chuck_b said:
I have been very happy with FreeNAS, but now I am starting to question that decision if my only option for two corrupt files I don't care about is to wipe the data and start over. I knew I was taking a calculated risk of corrupt data with non-ECC memory, but this seems to be more than just non-ECC memory.

You are seeing why some of us more experienced users are so absolute with our answers. People hate hearing "use the right hardware or don't do FreeNAS". The problem is that if you don't use the right hardware, it can backfire badly. In your case you may be looking at corruption caused by something like cosmic rays, which you'll never be able to validate because the event is long done. All you can do now is try to find a smoking gun (if you can) and fix that. But considering the hardware, and the fact that FreeBSD can't do the kind of hardware monitoring on that board that it can on a server board, you're kind of out in the cold. :(
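If you want to go hunting for that smoking gun, the basics would be something along these lines (the device names below are placeholders for your drives):

    # SMART attributes and error log for each disk behind the LSI controller
    smartctl -a /dev/da0

    # kernel messages from the mps driver (LSI 9211-8i) and the disks
    dmesg | grep -iE 'mps|da[0-9]'

    # per-vdev read/write/checksum counters, without the -v file listing that crashes the box
    zpool status tank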

I'm not sure what the fix is except to back up your data, destroy and recreate your pool, and restore the data. This kind of shuffle, with what is sometimes 20+ TB of data, is a total PITA (and for some people impossible), which is why we push "do it right or don't do it" so hard. If I could tell you today that dropping an extra $100 back when you built the system would have kept you out of this predicament, you'd probably drop the cash right now. Unfortunately too many people are learning too late that once you've done it wrong, the costs to do it right are pretty nasty. It's like being thrown into a pool and finding out only after you hit the water that you can't swim. It's a crappy situation to be in, to say the least.
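Strictly as a sketch of what that shuffle looks like at the command line (in FreeNAS you'd normally destroy and re-create the pool from the web UI, and the disk and path names here are placeholders):

    # copy everything you can still read to another volume
    rsync -aHvx /mnt/tank/ /mnt/backupvol/

    # destroy the damaged pool and build a fresh RAIDZ2
    zpool destroy tank
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5

    # copy the data back
    rsync -aHvx /mnt/backupvol/ /mnt/tank/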
 

chuck_b

Cadet
Joined
Jan 12, 2015
Messages
4
cyberjock-

Thank you for the very quick response. As I said, the non-ECC memory was a calculated risk based on being given a free motherboard/CPU/memory. I am well aware of the benefits of the enterprise-grade infrastructure I use daily at work, and I would definitely have gone with the recommended equipment if I were buying everything new. In fact, replacing the motherboard/CPU/memory is not out of the question for me if that is the only true solution. For a home system, I was willing to take the risk of cosmic rays during a scrub corrupting massive amounts of data and cosmic rays during normal operation corrupting individual files. Having this issue after three weeks of uptime, versus seven years and multiple drive swaps/expansions on an Ubuntu box running md/ext, is surprising.

I did not expect one or two corrupted files to damage the entire file system to the point that it is repairable only by destroying and recreating it, or that a simple command like "zpool status -v" would cause system hangs or reboots! So far, Memtest86+ has not found any issues after completing one pass. I will let it run a few more passes. If it were just a few corrupted files, I would easily chalk it up to non-ECC memory and acknowledge that everyone was right and I underestimated the frequency of cosmic rays impacting my data. What concerns me is that the only recommended recovery procedure is destroy-and-restore, and that a fairly innocuous command such as "zpool status -v" takes the system down.

I took the chance with non-ECC memory and got corrupt data. If I need to spend the money to correct that, well then stupidity has a price. Today that price looks like it will be $500+ in addition to the cost of a second set of backup disks if I don't want to completely rely on a RAID0 to preserve my data locally for a few days.

I am less concerned with the source of the original corruption and would like some assistance troubleshooting why I can't delete the files and move on, as well as why zpool is crashing. Having ECC memory might have prevented the current corruption (assuming it was caused by my non-ECC memory), but it doesn't explain "zpool status -v" crashing or why I can't clear the corrupted data from the existing pool.

Is the answer really as simple as "You didn't use ECC memory, so there is nothing we can do for you"? With ECC memory, could a power outage put me in a similar situation? How about a FreeNAS/ZFS bug? I do have a UPS with a shutdown timer, but I'm still nervous about not having any recourse besides wiping the file system to fix two corrupt files that I don't even care about. I am not trying to be difficult; I am really just trying to understand the recovery options for when things go badly. If wipe-and-restore is the only real answer, then I need to plan for a second local backup in addition to RAIDZ2 with local and offsite backups, which is something I hadn't anticipated for a home NAS. Would snapshots have mitigated the need for rebuilding the file system?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
chuck_b said:
Thank you for the very quick response. As I said, the non-ECC memory was a calculated risk based on being given a free motherboard/CPU/memory. I am well aware of the benefits of the enterprise-grade infrastructure I use daily at work, and I would definitely have gone with the recommended equipment if I were buying everything new. In fact, replacing the motherboard/CPU/memory is not out of the question for me if that is the only true solution. For a home system, I was willing to take the risk of cosmic rays during a scrub corrupting massive amounts of data and cosmic rays during normal operation corrupting individual files. Having this issue after three weeks of uptime, versus seven years and multiple drive swaps/expansions on an Ubuntu box running md/ext, is surprising.

Let me put your comment in context. Are you *sure* you didn't have any problems when you were using md/ext? I mean, ext has no way to actually identify corruption. So unless you went checking every byte of all of your data you really wouldn't know for sure, right?

That's the problem with going with ZFS (and the reason I went to ZFS). You cannot compare ZFS to alternatives because, well, there really aren't any alternatives. So your comparison, while I understand what you mean, doesn't really hold up as a fair one. You could have had corruption happening at regular intervals and you might never have known. In fact, some of the studies that say ZFS is so great claim that corruption is far more widespread than we realize and that we probably wouldn't be able to accept the reality of it. I still don't entirely buy those numbers, but I have seen some of it first-hand. ;)

chuck_b said:
I did not expect one or two corrupted files to damage the entire file system to the point that it is repairable only by destroying and recreating it, or that a simple command like "zpool status -v" would cause system hangs or reboots! So far, Memtest86+ has not found any issues after completing one pass. I will let it run a few more passes. If it were just a few corrupted files, I would easily chalk it up to non-ECC memory and acknowledge that everyone was right and I underestimated the frequency of cosmic rays impacting my data. What concerns me is that the only recommended recovery procedure is destroy-and-restore, and that a fairly innocuous command such as "zpool status -v" takes the system down.

If the ZFS metadata gets corrupted, the whole house of cards comes crashing down. Yes, it's totally possible for a zpool status to crash the box. There are dozens and dozens of users who couldn't even mount their pool because it was corrupted; they couldn't access their data at all anymore. If a crashing zpool status is the only problem you're having, count yourself lucky and get your data off the pool. ;)

chuck_b said:
I took the chance with non-ECC memory and got corrupt data. If I need to spend the money to correct that, well then stupidity has a price. Today that price looks like it will be $500+ in addition to the cost of a second set of backup disks if I don't want to completely rely on a RAID0 to preserve my data locally for a few days.

This is the harsh reality. It's also why we're so adamant about the whole "go ECC and server-grade or go home". The cost of getting out of the hole you might end up in (never mind the emotional cost of the lost data) makes it far cheaper (and faster) in the long run to do it right.


chuck_b said:
I am less concerned with the source of the original corruption and would like some assistance troubleshooting why I can't delete the files and move on, as well as why zpool is crashing. Having ECC memory might have prevented the current corruption (assuming it was caused by my non-ECC memory), but it doesn't explain "zpool status -v" crashing or why I can't clear the corrupted data from the existing pool.

Actually it might. If your metadata is corrupted beyond what the redundancy can repair, then ZFS has to process whatever garbage it reads back, and that garbage often causes system crashes. Unfortunately you'll find very few people here willing to troubleshoot this issue, because experience says it's a foregone conclusion that it's probably due to hardware. Even *if* someone wanted to dig deeper, they'd only be able to show what is wrong, not necessarily correct it, and it would take many, many hours to diagnose.

chuck_b said:
Is the answer really as simple as "You didn't use ECC memory, so there is nothing we can do for you"? With ECC memory, could a power outage put me in a similar situation? How about a FreeNAS/ZFS bug? I do have a UPS with a shutdown timer, but I'm still nervous about not having any recourse besides wiping the file system to fix two corrupt files that I don't even care about. I am not trying to be difficult; I am really just trying to understand the recovery options for when things go badly. If wipe-and-restore is the only real answer, then I need to plan for a second local backup in addition to RAIDZ2 with local and offsite backups, which is something I hadn't anticipated for a home NAS. Would snapshots have mitigated the need for rebuilding the file system?

Is the answer that simple? Well, you should *always* have a backup, because things can and do go wrong. If your SAS controller starts writing random garbage to all of the disks, it's possible to trash a zpool in an unrecoverable way. ZFS does not negate the need for a backup; ZFS only tells you whether the data on disk is correct or not. As a rule a power outage shouldn't put you in the same place, but on non-server-grade hardware that often turns out not to be true.

I'd be willing to bet money the only solution is to wipe and restore. :(

Snapshots wouldn't have mitigated this disaster. All they do is give you points in time you can roll back to. But once a pool is corrupted, the best way to deal with it is to remove the corruption. If it's just a file, you can simply delete it. But if it's metadata, then you might have no recourse except a nuke and repave.
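As a rough sketch of the file-only case (pool and path are placeholders; the entry typically drops off the "permanent errors" list once the bad blocks are gone and a scrub or two has completed):

    # the error points at a plain file, not metadata
    rm /mnt/tank/path/to/damaged-file    # or restore a known-good copy over it

    # reset the error counters and re-verify the pool
    zpool clear tank
    zpool scrub tank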
 

chuck_b

Cadet
Joined
Jan 12, 2015
Messages
4
I'm sure I had silent corruption with md/ext even with monthly consistency checks. However, that combination is well known to me and provided lots of flexibility. Certain things, like going from RAID5 to RAID6 or from four drives to five, are impossible with ZFS. I am eager to learn ZFS, and often the best way to learn is to troubleshoot when things go badly.

I have been relying on md/ext or ZFS with RAID5/6/Z1/Z2 to keep my data safe against hard drive failures, and on backup drives to restore the data after an unlikely but catastrophic failure. It seems like ZFS is more fragile, and RAID0 for my backup set does not give me the comfort level I want for a file system that may periodically need to be restored.

I guess the bottom-line question is: with ECC memory, how likely am I to get into a situation that trashes the zpool in an unrecoverable way? I can recover from this with negligible data loss from my current backup as long as one of my backup drives doesn't fail during the recovery, but my current backup solution is not as robust as I would like if this is likely to occur again. I want ZFS for the reliability it provides; I guess I just assumed (I know what that makes me) that a RAIDZ2 would also provide more recoverability in addition to reliability.

In the current state, where the system appears to work fine until I run "zpool status -v", is it possible to take a new rsync backup? Will I get a recoverable read error on the corrupted files and continue? Or should I consider the whole file system corrupt and just restore, losing any new data from the last couple of days (acceptable, but not preferred)?
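What I have in mind is roughly this (paths are placeholders; as far as I know rsync just reports files it can't read, carries on with the rest, and exits non-zero at the end):

    # exclude the paths flagged by "zpool status -v" so the bad objects are never touched
    rsync -aHvx --exclude='path/flagged/by/zpool-status/' /mnt/tank/ /mnt/backupvol/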

Thanks again for your patience with this newbie!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
chuck_b said:
I guess the bottom-line question is: with ECC memory, how likely am I to get into a situation that trashes the zpool in an unrecoverable way? I can recover from this with negligible data loss from my current backup as long as one of my backup drives doesn't fail during the recovery, but my current backup solution is not as robust as I would like if this is likely to occur again. I want ZFS for the reliability it provides; I guess I just assumed (I know what that makes me) that a RAIDZ2 would also provide more recoverability in addition to reliability.

I've answered this in my newbie presentation. ZFS is as robust as you are dedicated to keeping crappy hardware from corrupting it. ZFS is enterprise-class and expects to be run on *good* hardware. As soon as people break that assumption, anything is possible. But iXsystems has lots of customers with proper hardware, and we don't see pool loss there like we do in the forums... because they DO use proper hardware.

chuck_b said:
In the current state, where the system appears to work fine until I run "zpool status -v", is it possible to take a new rsync backup? Will I get a recoverable read error on the corrupted files and continue? Or should I consider the whole file system corrupt and just restore, losing any new data from the last couple of days (acceptable, but not preferred)?

Not a clue. You're outside the standard operating parameters of ZFS, so almost anything could happen: good, bad, or ugly. You're totally on your own, unfortunately.

Once ZFS has problems, it's a "throw your hands up and walk away" situation for me. There's no way to really know much, because you can't even tell me where in ZFS the corruption is.

Sorry.
 

chuck_b

Cadet
Joined
Jan 12, 2015
Messages
4
cyberjock-

Thank you very much for all of your help. Your newbie presentation was very helpful to me when I was building the system. I took the risk on non-ECC memory thinking that minor corruption would only impact a few files and did not understand how impactful *any* corruption could be to the entire file system.

Thanks again for all of your willingness to help and to explain things.
 