permanent errors in ZFS pool

Status
Not open for further replies.

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Hi,

zfs reports 5 permanent errors in the pool:

backup# zpool status -v
pool: store1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub repaired 0 in 17h35m with 0 errors on Tue Aug 20 05:45:46 2013
config:

NAME STATE READ WRITE CKSUM
store1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/7e3d623f-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/7eca7240-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/77cd1366-9556-11e2-858b-0019990d0a17 ONLINE 0 0 0

errors: Permanent errors have been detected in the following files:

store1/hosting-backups:<0x1880d1d>
store1/hosting-backups:<0x1ea6932>
store1/hosting-backups:<0x8adc36>
store1/hosting-backups:<0x2414e3e>
store1/hosting-backups:<0x2407a3e>
What are my options? If there's a faulty disk I would gladly replace it, but ZFS doesn't tell me which disk is causing this problem. And what's with the so-called "files":
store1/hosting-backups:<0x1880d1d>
store1/hosting-backups:<0x1ea6932>
store1/hosting-backups:<0x8adc36>
store1/hosting-backups:<0x2414e3e>
store1/hosting-backups:<0x2407a3e>

Those aren't filenames but hex codes. What's up with that? How can I delete those files, or repair the filesystem?

Thanks in advance!

Regards,
Joern
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's metadata. And the only way I know of to repair that kind of issue is to either delete the offending files (which doesn't seem to be an option in your case) or destroy and recreate the pool.

RAIDZ1 is not a reliable option in 2013; it "died" in 2009. See the link in my sig if you want more of an explanation. Your problem is likely the result of replacing one disk while a second disk had errors. Since you had no more redundancy, you lost data. Basically, exactly what that link says could happen did happen, and is likely to happen again in the future.

You'd be wise to destroy and recreate your pool with a RAIDZ2 configuration or better.
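Purely as an illustration of what that rebuild could look like from the command line (the ada0 through ada3 device names are placeholders, and a fourth disk is assumed since RAIDZ2 needs the extra parity drive; on FreeNAS you would normally do this through the GUI, which also sets up the GPT partitions for you):

Code:
# This destroys everything on store1 -- copy the data elsewhere first.
zpool destroy store1
# Recreate with double parity across four disks (placeholder device names).
zpool create -m /mnt/store1 store1 raidz2 ada0 ada1 ada2 ada3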
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Thanks for your reply. Deleting the offending files is in fact an option for me, as this fileserver is just an rsync storage for backups. I would gladly remove the offending files. The question is how? You say those "filenames" are metadata. How do I figure out which "real" files are affected? Is there a way to tell ZFS to clean the pool of all "broken" files? As we are using rsync to back up to this fileserver, those deleted files would be replaced automatically at the next rsync run.

By the way: I have not replaced any drive since the zpool was created. ZFS has never complained about any faulty drives and it still shows all drives as "ONLINE". If one of the drives really had some kind of problem, I would assume that ZFS would locate the faulty drive and kick it out of the pool before any data could get corrupted.

The warning regarding the data corruption showed up 3 days ago. But as you can see, the server has an uptime of 37 days and has never been shut down without a clean unmount:

backup# uptime
7:10PM up 37 days, 7:34, 1 user, load averages: 1.06, 1.98, 1.16

So I'm really curious how the data could have gotten damaged in the first place. If ZFS doesn't detect a faulty drive, I would assume that a RAIDZ2 would not help either, because as long as ZFS "trusts" the drives and does not kick the faulty ones out, it will not take advantage of the redundant data on any of the other drives, no matter how many redundant drives there are. Right?

Regards,
Joern
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Thanks for your reply. Deleting the offending files is in fact an option for me, as this fileserver is just an rsync storage for backups. I would gladly remove the offending files. The question is how? You say those "filenames" are metadata. How do I figure out which "real" files are affected? Is there a way to tell ZFS to clean the pool of all "broken" files? As we are using rsync to back up to this fileserver, those deleted files would be replaced automatically at the next rsync run.

That's the catch. I'm not aware of any way to determine which files/folders are the offending ones.

By the way: I have not replaced any drive since the zpool was created. ZFS has never complained about any faulty drives and it still shows all drives as "ONLINE". If one of the drives really had some kind of problem, I would assume that ZFS would locate the faulty drive and kick it out of the pool before any data could get corrupted.

Actually, it won't from what I've experienced. It'll keep the drive in the pool as long as it is "mountable". You'll just see lots and lots of errors with a zpool status.

The warning regarding the data corruption showed up 3 days ago. But as you can see, the server has an uptime of 37 days and has never been shut down without a clean unmount:

backup# uptime
7:10PM up 37 days, 7:34, 1 user, load averages: 1.06, 1.98, 1.16

So I'm really curious how the data could have gotten damaged in the first place. If ZFS doesn't detect a faulty drive, I would assume that a RAIDZ2 would not help either, because as long as ZFS "trusts" the drives and does not kick the faulty ones out, it will not take advantage of the redundant data on any of the other drives, no matter how many redundant drives there are. Right?

Regards,
Joern

Here's some guesswork...

Naturally you won't know that data is corrupt until you try to read it. More than likely a scrub is what found those errors. But if you've never replaced a drive, then you probably have other problems (or you have failing RAM).

Is your system using ECC RAM?

Are you virtualizing?

My understanding is that any and all reads include checks and corrections of any bad data in the corresponding "stripe". A scrub is nothing more than a read of all of your files in directory order. It's not alphabetical order, but rather the order in which the directories and files were created.

If you aren't using ECC RAM you might want to take the system offline for a RAM test. Assuming you have no failing disk, bad RAM can cause corruption that you can't fix.

Post the output of the following commands in CODE or as a file attachment:

smartctl -a -q noserial /dev/(yourdisks)

substituting the devices for your disks.
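For example, if the disks are ada0 through ada2, that would be:

Code:
smartctl -a -q noserial /dev/ada0
smartctl -a -q noserial /dev/ada1
smartctl -a -q noserial /dev/ada2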
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Thanks for your help so far. I attached the smartctl output as files ada0.txt, ada1.txt, ada2.txt.

You might be right about the RAM. It's in fact non-ECC RAM, so there might be an issue with that. I don't use virtualization, so we can rule that out, though.

But anyway, I don't understand why ZFS cannot locate the faulty disk. I mean, if there's a checksum mismatch, ZFS must "know" from which block device the faulty block came. So why wouldn't it mark the device in question as "faulty"? That's how every RAID implementation I'm aware of works. And from my previous experience with ZFS, that's how ZFS is supposed to work, too: whenever ZFS detected an inconsistency, it marked the affected drive as faulty and would let me replace it. What might be different in this case? And why is there no way to figure out which files are actually affected? I always assumed that one of the advantages of ZFS over "dumb" block-based RAID implementations is that, unlike "dumb RAIDs", ZFS knows very well which files are affected by which (faulty) block.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Holy smokes. 50C! Your hard drive should be kept below 40C for longer life. You really need to get those temps down!

Other than that, they all look fine. I can't say I'm surprised by errors, though. A forum user here had hard drives that got so hot during scrubs they'd start throwing errors like crazy; once they cooled off, they worked perfectly.

I'd definitely get those temps down ASAP, but your disks don't give any indication of problems.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Actually, 50C is well within the specified operating temperature range of 0C to 60C, according to http://www.flipkart.com/seagate-bar...nal-hard-drive-st2000dm001/p/itmdd2xqfstdmg2x

Anyway, what are my options now? Can I trust the drives? If the answer is "no", which ones should I replace? Is there really no way to repair the pool by removing all files which are inconsistent? Most other filesystems have some kind of filesystem check which provides a way to remove damaged files and rebuild the metadata. ZFS ought to be one of the most reliable filesystems, so I can't believe I would have to restore the complete 4 TB pool just because of 5 inconsistent files, without even knowing which hard drive might have caused the inconsistency in the first place. That's the opposite of what I expected from ZFS.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I'd be curious to see a 'zpool history store1'. ZFS 'never' trusts the drives. Any data the drives return is checksummed to verify it's correct before it's passed 'up the chain'. It's kind of strange that there'd be unrecoverable errors without any drive showing read/write/checksum problems in zpool status.

While 60C might be the 'max' according to hard drive manufacturers, actual statistics from a large number of hard drives have shown increased failure rates when drives run in or above the 40s. Most CPUs have a tjmax (max operating) temperature of 105C, I think. That being said, if your CPU was running at 80C or 90C, I'd still call it too hot. Even though it's under the max temp, it's still likely to have problems.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I see the zpool was created by hand with a missing 'fake' disk, then resilvered later. The 3 scrubs before the zpool had any redundancy weren't of much use. After the zpool gained redundancy on March 25 (ish), there were no subsequent scrubs until recently.

Any errors zfs reported would have been cleared by the zpool clear.

These are the relevant lines:

Code:
2013-03-20.20:07:20 zpool create -m /mnt/store1 -f store1 raidz ada0p2 ada2p2 /dev/md3
2013-03-20.20:09:15 zpool offline store1 md3
2013-03-21.08:53:19 zpool scrub store1
2013-03-22.00:00:47 zpool scrub store1
2013-03-23.17:22:56 zpool scrub store1
2013-03-25.15:15:57 zpool replace store1 12963516955869863845 gptid/77cd1366-9556-11e2-858b-0019990d0a17
2013-03-25.19:17:47 zpool offline store1 12963516955869863845
2013-03-25.19:17:58 zpool detach store1 12963516955869863845
2013-08-19.12:10:09 zpool scrub store1
2013-08-20.22:50:53 zpool clear store1

It's very possible the pool was reporting errors, and on which disks. The zpool clear command resets those counters to 0.

I assume the scrub on Aug 19th found errors that couldn't be fixed by the single-parity vdev. Bit rot on modern hard drives is quite bad relative to how much data they store. The 5 months between scrubs on a pool that can only correct a single error probably led to the problem.

I would strongly recommend a raidz-2 pool. You'd need an extra drive for that. I'd also recommend automated scrubs. They 'prove' that the data on disk is 'good'.
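FreeNAS can schedule scrubs from its web GUI; as a rough sketch of the same idea on a plain FreeBSD box, a weekly entry in /etc/crontab would look something like this (the Sunday 3 a.m. timing is just an example):

Code:
# /etc/crontab -- scrub store1 every Sunday at 03:00
0   3   *   *   0   root    /sbin/zpool scrub store1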
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Thank you for this analysis.

The inconsistencies turned up on August 18 in the daily FreeNAS report:

Checking status of zfs pools:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
store1 5.34T 2.29T 3.06T 42% 1.00x ONLINE /mnt

pool: store1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: resilvered 424G in 3h37m with 0 errors on Mon Mar 25 18:53:11 2013
config:

NAME STATE READ WRITE CKSUM
store1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/7e3d623f-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/7eca7240-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/77cd1366-9556-11e2-858b-0019990d0a17 ONLINE 0 0 0

errors: 5 data errors, use '-v' for a list

So it looks like those 5 errors had been found without any scrub.


What do you mean by

"It's very possible the pool was reporting errors, and on which disks. The zpool clear command resets those counters to 0."

Where would those errors have been reported? I just ran the "zpool clear" command this evening, after I found a forum post which suggested running it to reset the error counters in this kind of situation.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
I will try to create a new dataset and copy the content of the inconsistent dataset into it. If the new dataset turns out to be clean, I will destroy the inconsistent dataset and rename the new one to the old one's name.
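Something along these lines (the dataset name is taken from the zpool status output above; the temporary name and the rsync-based copy are just one way of doing it):

Code:
zfs create store1/hosting-backups-new
rsync -a /mnt/store1/hosting-backups/ /mnt/store1/hosting-backups-new/
zfs destroy store1/hosting-backups
zfs rename store1/hosting-backups-new store1/hosting-backups

If the damaged objects are regular files, the copy should fail with I/O errors on exactly those files, which would also answer the question of which files are affected.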
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I don't know. The 'zpool status' should have shown errors on the drive(s) which resulted in the metadata corruption.

My 3 suggestions: cool the drives better (under 40C is best); set up automated scrubs (with a raidz1 setup like yours, I'd probably do them weekly); and, if at all possible, recreate the pool as raidz2.

And as cyberjock says, confirm this isn't a virtualized setup. If it's not ECC RAM, I'd run an overnight memtest.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
My initial posting from 1:20 PM contains a copy-and-paste of "zpool status -v" from _before_ the "zpool clear". I can't see any errors on the drives there.

Thanks for the tip regarding the raidz2. I will consider that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, the drive is rated for that. But check out Google's white paper on hard drives. If you read through it (or skip to the section on temperatures), you'll find that above 40C failure rates skyrocket.

I would have posted it earlier, but I wasn't at the computer.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
Just an update: I copied all data from the corrupt dataset to a new dataset, destroyed the corrupted dataset and renamed the new one. When running "zpool status -v" I now get the following:

backup# zpool status -v
pool: store1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 17h35m with 0 errors on Tue Aug 20 05:45:46 2013
config:

NAME STATE READ WRITE CKSUM
store1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/7e3d623f-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/7eca7240-9190-11e2-b8f7-0019990d0a17 ONLINE 0 0 0
gptid/77cd1366-9556-11e2-858b-0019990d0a17 ONLINE 0 0 0

errors: Permanent errors have been detected in the following files:

<0x26>:<0x1880d1d>
<0x26>:<0x1ea6932>
<0x26>:<0x8adc36>
<0x26>:<0x2414e3e>
<0x26>:<0x2407a3e>
It looks like <0x26> refers to the destroyed dataset. As that dataset, and any files belonging to it, no longer exists, I assume it's safe to ignore those errors. But is there really no way of telling ZFS that those files (or the entire dataset, for that matter) are not relevant any more, so there's really no point in throwing those error messages?
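For what it's worth, one way to re-check would be another scrub followed by a status check; as far as I understand it, stale entries for destroyed datasets are usually dropped after one or two scrub passes and a zpool clear:

Code:
zpool scrub store1
# ...wait for the scrub to finish, then:
zpool status -v store1
zpool clear store1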
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm thinking that those errors aren't dataset errors, and I'd be hesitant to continue using a pool with those errors.
 

jbear

Dabbler
Joined
Jul 14, 2013
Messages
29
*sigh*, so the only option is copying everything to external disks, wiping and re-creating the pool, and copying it all back? I can't believe there's no way to fix the pool, especially since the block devices seem to be fine.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
AFAIK, yes. ZFS is supposed to be impervious to corruption as long as you ensure sufficient redundancy at all times. You let that last part slide, causing corruption. There is no easy way to fix it at this point. :(
 