How do files *ONLY IN A SNAPSHOT* get corrupted?

Joined
Oct 22, 2019
Messages
3,641
Disclaimer: My important data is not at risk. Everything explained in this post deals with "I don't care" temporary files. I used external USB drives, and couldn't care less if they were vaporized. But playing with USB drives does let you freely bump into "issues" and try to figure them out. This is purely to better understand ZFS data integrity and recovery.




Long story short, here's what happened.

TestPool
  • mirror vdev
    • USB drive A
    • USB drive B
Handful of datasets. Handful of snapshots.

Saving files, deleting files, taking snapshots. The usual.


Then one day, errors start spamming the zpool: checksum, read, and write, all on USB Drive B. Yikes!

A short SMART test quickly fails on USB Drive B. It's dying, it's failing, time to say goodbye. I go ahead and offline it, then outright detach it. The pool now consists of a single drive (stripe): USB Drive A. For good measure, short and long SMART tests pass on the remaining USB Drive A.
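
For reference, the removal went roughly like this (the device name below is a placeholder, not the real gptid):

Code:
zpool offline TestPool gptid/xxxx-usb-drive-b   # placeholder device id
zpool detach TestPool gptid/xxxx-usb-drive-b
zpool status TestPool                           # now a single-drive (stripe) pool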

(No big deal. Data's not important. I can always just attach another same-sized drive if I ever want it to become a mirror again.)




But here's where things get interesting...

A scrub on the pool (consisting only of USB Drive A as a stripe, remember) returns several hundred checksum errors, but only for files within one particular snapshot. (Even though these files exist across multiple snapshots and on the live filesystem itself.)

So I destroy this snapshot and run a scrub again. The scrub completes with no errors.
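
The sequence, roughly (dataset and snapshot names below are placeholders):

Code:
zpool status -v TestPool     # permanent errors listed as TestPool/dataset@snapname:/path/to/file
zfs destroy TestPool/dataset@snapname
zpool scrub TestPool
zpool status -v TestPool     # completes with no errors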




So now my paranoia kicks in, which is the topic title of this thread:
How is it possible for checksum errors to exist on files only in a certain snapshot, while the same files return no checksum errors in other snapshots, or even on the live filesystem? How is it that destroying this snapshot resolves the situation? Creating/destroying snapshots does not alter file data: a snapshot only adds references to existing blocks.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The data that differs in the snapshot, in comparison to the current live data, is kept in blocks. If there wasn't a snapshot, these blocks would have been freed. Since there is a snapshot, they weren't. These blocks differ from the blocks that make up the current version of the same file, because the file was updated after the snapshot.

If the blocks in the snapshot fail on media, only the version of the file in the snapshot is affected.
 
Joined
Jun 15, 2022
Messages
674
The snapshot data block(s) are corrupt. Or were, but you deleted them.
 
Joined
Oct 22, 2019
Messages
3,641
The snapshot data block(s) are corrupt. Or were, but you deleted them.
The data that differs in the snapshot, in comparison to the current live data, is kept in blocks.

Appreciate the responses, and this was my initial assumption. But I might not have clearly explained my confusion. :cool:



I'll reword my description:

Take, for example, the file "BigBuckBunny.mkv".

It's a large video file. Once saved to the dataset, it has never been modified.

I take a snapshot of the dataset named "@manual-2023-03-07".

The file (records) which @manual-2023-03-07 points to should be identical (the very same records) as those that the live filesystem points to.

A scrub reveals checksum errors, and one of the corrupted files in question is:
@manual-2023-03-07:BigBuckBunny.mkv

How could this file yield checksum errors in the snapshot, when its records "should" be identical to those in the live filesystem? (The file was never modified, which I assume means that there are no unique records that the snapshot points to.)

Unless this is just a "reporting quirk" of ZFS, whereby corruption in snapshot metadata presents as "corrupted" files?
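
For what it's worth, this is how I'd expect to check whether a snapshot references any unique records (the dataset name below is hypothetical):

Code:
zfs list -r -t snapshot -o name,used,refer TestPool/media   # 'used' = space unique to each snapshot
zfs diff TestPool/media@manual-2023-03-07 TestPool/media    # prints nothing if nothing changed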
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The file was never modified, which I assume means that there are no unique records that the snapshot points to.
ZFS disagrees with you.

Unless this is just a "reporting quirk" by ZFS, in which snapshot metadata corruption presents as "corrupted" files?
Depends what you consider metadata... maybe things like permissions or access times would have changed.

I think you're saying it's destroyed now, but it might have been interesting to do an md5 checksum of both versions, and maybe take a subsequent look into the details if the checksums were indeed different (which we would expect if ZFS isn't broken).
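
Something along these lines, via the hidden .zfs directory (the mountpoint and dataset name are assumptions; use md5sum on Linux instead of FreeBSD's md5):

Code:
md5 /mnt/TestPool/media/BigBuckBunny.mkv
md5 /mnt/TestPool/media/.zfs/snapshot/manual-2023-03-07/BigBuckBunny.mkv
# .zfs stays accessible even when the dataset has snapdir=hidden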
 
Joined
Oct 22, 2019
Messages
3,641
ZFS disagrees with you.
That's some next-level gaslighting by ZFS.

When I destroyed the snapshot, it freed up exactly 0 bytes of space. (Because it was essentially no different than the live filesystem.)

(This was revealed by using the -v flag in the zfs destroy command.)
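
For the curious, the dry-run form shows the same figure without destroying anything (the dataset name below is a placeholder):

Code:
zfs destroy -nv TestPool/media@manual-2023-03-07
# would destroy TestPool/media@manual-2023-03-07
# would reclaim 0B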


Depends what you consider metadata... maybe things like permissions or access times would have changed.
This did not change either. It was a "dump and go" dataset. Dump files. No editing, no modifying, no playing with permissions or ownership (after the initial dataset was created).


I think you're saying it's destroyed now, but it might have been interesting to do an md5 checksum of both versions, and maybe take a subsequent look into the details if the checksums were indeed different (which we would expect if ZFS isn't broken).
I regret doing the scrubs and destroying the "bad" snapshot that quickly. :frown: Because now I don't have anything to compare against. (Pool is healthy, scrub passes, no checksum errors. No "bad" snapshot left against which to run md5 on particular files: the live filesystem version vs. the snapshot version.)

If I could go back in time, I would have tried what you suggested.
 
Joined
Jun 15, 2022
Messages
674
With Microsoft Excel and Word files, opening and re-closing them causes Excel/Word to re-write the file with an updated header containing the last access time, then re-write the filesystem's last-modified time back to the previous value--that's rife with side effects. The backup system (and any sort of diff[erence] software) correctly reports that the file was changed, yet the timestamp has not changed, suggesting file corruption and/or a poison pill (like a virus or ransomware). However, the "virus" is Microsoft Corporation.

@Ericloewe posted a link to a great lecture entitled Zebras All The Way Down: a long video, but excellent at explaining why we don't just fix errors; we find the cause of the error and fix that.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The file (records) which @manual-2023-03-07 points to should be identical (the very same records) as those that the live filesystem points to.

A scrub reveals checksum errors, and one of the corrupted files in question is:
@manual-2023-03-07:BigBuckBunny.mkv

I wonder if that's true, though. Does ZFS report metadata attributable to a file as belonging to the file during a scrub? I don't know. But I would point out that unless you've got UNIX atime updates disabled for the dataset, every read access to the file results in an atime metadata update, and if that update is attributed to the file, which I suspect it might well be, then there's your variance in the "file" -- metadata.
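
If you want to rule that out, atime can be checked and turned off per dataset (the dataset name below is a placeholder):

Code:
zfs get atime TestPool/media
zfs set atime=off TestPool/media   # reads no longer trigger access-time metadata updates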
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
It's a large video file. Once saved to the dataset, it has never been modified.

I take a snapshot of the dataset named "@manual-2023-03-07".

The file (records) which @manual-2023-03-07 points to should be identical (the very same records) as those that the live filesystem points to.

Maybe it is not the data blocks or block pointer records that are broken; maybe it is the parent record of the snapshot that is broken. If that's the case, the validity of the data blocks becomes irrelevant.
 
Joined
Jun 15, 2022
Messages
674
Maybe it is not the data blocks or block pointer records that are broken; maybe it is the parent record of the snapshot that is broken. If that's the case, the validity of the data blocks becomes irrelevant.
If the parent broke, the error would be in the parent. However, that comment did cause me to think about this situation further.

The parent is static once a snapshot is taken. Conceptually,* a snapshot freezes the parent in place, and any updates to the parent are made to the snapshot file. When reading back, the parent is read until a snapshot exception is encountered, then the snapshot data is read. After that, reading resumes in the parent.

Deleting a snapshot removes all changes; conceptually, it's like deleting a file. Deleting a parent node replaces the parent blocks with the snapshot blocks, which is why it takes significantly longer than deleting a snapshot file.

Since a movie should remain unmodified (unless edited), the snapshot should contain no data. If instead something tries to change the parent (let's say ransomware), the changes are written to the snapshot file. To reverse the ransomware's encryption of the parent file, delete the snapshot.

Now, the snapshot container should be empty, and may have been empty; we don't know, because it no longer exists. However, there is (was) a container to hold the snapshot, and that container occupies some space on the disk. An error occurred in the container, which could mean an empty container was damaged; we don't know.

What we can say is that it was assumed the snapshot container contained data, because that's the most common use (and indeed the reason snapshot containers exist). That's the horse. However, empty snapshot containers can exist--it may be an edge case, but it does happen. That's the zebra.

*Now, it could be that in the implementation of the snapshot concept, especially considering that ZFS is copy-on-write, a write to the parent file causes the original parent block to be mapped into the snapshot container, while the newly written block lives in the parent file; the snapshot would then contain the original data and the parent file the most recent data. If the parent was unchanged, the snapshot container would still exist, though it would be empty. Therefore the empty container could sustain damage.
 
Joined
Oct 22, 2019
Messages
3,641
I broke my own rule about using these external drives (with unimportant, replaceable data) as a playground for ZFS: using these types of situations to learn more and take risks, without fear of losing anything important.

And yet here I immediately recovered back to a healthy "stripe" pool without using this opportunity to actually test things and log seemingly strange information. :frown:

Hopefully I can bump into such a "failure" again in the future.

I regret this haste of mine.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Two things:

By default, ZFS writes 2 copies of general metadata, so the checksum errors would have to affect BOTH copies in order for an unrecoverable error to be reported. (Exception: I think the ZFS dataset property "redundant_metadata" can change that from the default of "all" to "most".)

This applies EVEN to mirrors. So you would theoretically have FOUR copies of general metadata on a 2-way mirror: 2 per mirror sub-device.
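
You can check what your datasets are actually set to (the values shown are the documented default):

Code:
zfs get -r redundant_metadata TestPool
# NAME       PROPERTY            VALUE  SOURCE
# TestPool   redundant_metadata  all    default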


My own thought is that it was USB being annoying. You found some errors, but if you had exported the pool, powered down the USB disk drive, let it cool down, then tried again with a ZFS clear & scrub, the pool might have been fine.

I've found that USB hard disk drives in cheap enclosures get HOT. Hot enough to start reporting errors that don't really exist.
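
Something like this, before writing the drive off:

Code:
zpool export TestPool
# power off the enclosure, let it cool down, reconnect, then:
zpool import TestPool
zpool clear TestPool    # reset the error counters
zpool scrub TestPool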


By the way, people who think the 2 copies of general metadata do not exist per sub-device on a mirrored pool are wrong. The pool deals with mirroring, but the property "redundant_metadata" is at the dataset level. Different layers. Thus my assumption of 2 copies of metadata per sub-device in a mirrored pool.
 
Joined
Oct 22, 2019
Messages
3,641
Thanks for all the responses in here! Taking them into account, along with the possible culprits of this supposed paradox, here's an update:

I resilvered the mirror with an extra USB drive (these are two WD Easystore 4TB USB drives). The pool is less than half full.
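
(For reference, turning a single-drive pool back into a mirror is a simple attach; device names below are placeholders.)

Code:
zpool attach TestPool gptid/xxxx-drive-a gptid/xxxx-drive-c   # resilver starts automatically
zpool status TestPool                                         # watch resilver progress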

Aside from the snapshot that I destroyed, everything is back to normal, including a scrub that completes with no errors and no bad checksums.

No way to do a "postmortem", since the old (detached) dead drive is out of the picture entirely.

Gathering the responses in here, plus what I read across message boards and subreddits, I came across some other potential culprits.



First and foremost, external USB drives are terrible for ZFS. (Terrible in general, really.) As many of you noted in here.




Secondly, this pool was not used exclusively in TrueNAS Core, but also on a separate Arch Linux-based machine. Apparently, during a "partial upgrade state", such as with rolling-release distros, you can end up with a mismatch between the zfs kernel module and the zfs userspace tools. (This is not unique to rolling-release distros; it's also possible with Ubuntu LTS using the HWE kernel.)

This could potentially lead to unpredictable results when issuing zpool and zfs commands.
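
On OpenZFS, a mismatch like that is easy to spot, since zfs version prints both the userspace and kernel-module versions (the version numbers below are made up for illustration):

Code:
zfs version
# zfs-2.1.9-1
# zfs-kmod-2.1.5-1   <-- userland and kernel module out of sync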




With all of that said, next time something quirky like this happens, I'm going to slow down and gather as much information as possible. (I had been in a "partial upgrade state" not long ago on my Linux machine, yet I resolved it before bumping into this weird issue with my USB zpool.)

Disclaimer: To reiterate, none of this data is important. I'm only using spare USB drives to manage a zpool (mirror vdev) as a place to dump unimportant and replaceable files, as well as to toy around with ZFS in the command-line. If I lose the entire pool, I haven't really lost anything of significance.
 
Joined
Jun 15, 2022
Messages
674
First and foremost, external USB drives are terrible for ZFS. (Terrible in general, really.) As many of you noted in here.
Yes, ZFS has some unique requirements, but it comes down to running datacenter-level software on consumer-grade hardware.

Some USB stuff is hacked together in China (no offense intended; however, I know for a fact that many things are chabuduo) and both the hardware and firmware are dubious at best--though for the price they're asking, you're probably getting a lot of capability.

On the other hand, some USB hardware is really reliable. I have some cutting-edge USB drives (also from China) that are metal case and rock-solid (obtained at a really good price), so USB can be of great quality for its intended use (which is not ZFS).

The one thing to watch out for is "suddenly read-only." When USB flash memory has too many failures, the controller puts the device into read-only mode. This is bad. Read-only is kind of like "reverse osmosis," in that the flow goes only one way: "out." And generally not for very long--as in, hopefully you have enough time to read the whole drive so you can get the data off it. If this happens while your OS is writing to the drive, things can fail unceremoniously. You could even crash your server (your bare-metal server running your VM, too) during a read or write to USB if the USB drivers and/or OS aren't written well (and I'm not even going to mention Microsoft products here because there's no need).
 