Silent corruption with OpenZFS (ongoing discussion and testing)

Constantin · Nov 26, 2023

I just wanted to thank you all for bringing this bug to the forefront, the testing, etc. I have updated my tunables accordingly, so I sleep well while folk far smarter than I over at iXsystems, etc. start hunting for the root cause of the bug.

kiriak · Nov 26, 2023

Home user here.
I read this thread and I am confused If I have to do something.

I am on Core 13.0-U5.3 and from the posts here I think I have to apply the tunable change as above.
On the other hand if this so critical, why there is no update that includes this change or at least some bold recommendation on the TrueNAS site?
I am reluctant to change things by hand because I don't know for how long they are valid and if things change maybe I will not be aware of it.

I had occasionally corrupted photos and other files on the past, but never since I moved to a BTRFS Synology NAS and later to my current ZFS TrueNAS.
It would be an irony if I lose again files due to corruption given all the efforts and costs to maintain a proper NAS (with proper hardware, backups etc.) and the same time many friends of mine have their files on a FAT external HDD and have all their data intact (ok, until now, I know, the users that lost data and these that haven't yet).

winnielinnie · Nov 26, 2023

kiriak said:
I am on Core 13.0-U5.3 and from the posts here I think I have to apply the tunable change as above.
On the other hand if this so critical, why there is no update that includes this change or at least some bold recommendation on the TrueNAS site?
I am reluctant to change things by hand because I don't know for how long they are valid and if things change maybe I will not be aware of it.

The FreeBSD bug report is suggesting an announcement to inform users to change this variable. (Not sure if it will be the default.)

The tunable practically eliminates the risk outright, and can yield a performance boost under some circumstances. The only "drawback" is less efficient handling/hole-seeking on sparse ("holey") files. Yet it does not violate anything. In fact, prior to OpenZFS 2.1.5, this parameter was already set to 0 by default. From OpenZFS 2.1.5+, they set it to 1 by default. (So it's not as if you're doing anything drastic.)

(Personally, I don't care about how efficiently sparse files are handled: I don't use iSCSI or VMs, and as far as "really 'holey' files" go, inline ZFS compression makes that a non-issue.)

For TrueNAS Core, to set this parameter is fairly simple:

EDIT: Outside of the reproducer scripts, this bug can manifest in the wild from compiling large projects. Not only is this theoretical, but it has happened "in the wild". In fact, the original bug report on the Gentoo tracker is what kickstarted this bug hunt in the first place.

Bronek said:
This is also most similar to how such cases were occasionally caught in the real-life e.g. developers building a large project with object files, which are immediately read by the linker, after being written (asynchronously) by the compiler.

Yorick · Nov 26, 2023

Just to highlight the conditions under which this happens:

a file is being written to (typically it would be asynchronously - meaning the write is not completed at the time when writing process "thinks" it is)

at the same time when ZFS is still writing the data, the modified part of file is being read from. The same time means "hit a very specific time", measured in microseconds (that's millionth of a second), wide window. Admittedly, as a non-developer for ZFS project, I do not know if using HDD as opposed to SSD would extend that time frame.

if it is being read at this very specific moment, the reader will see zeros where the data being written is actually something else

if the reader then stores the incorrectly read zeroes somewhere else, that's where the data is being corrupted

That specific set of steps doesn't apply to any of the applications I use. I'll chillax until this has been fixed.

Constantin · Nov 26, 2023

it’s easy enough to set the tunable, then apply the system update to U6. That ensures a persistent fix and the reboot cements the change.

winnielinnie · Nov 26, 2023

Yorick said:
That specific set of steps doesn't apply to any of the applications I use.

Same here. But going forward, I like to play "better safe than sorry."

Seems that the most likely hit in the "real world" would be those who build large project files, as explained by Bronek:

Bronek said:
This is also most similar to how such cases were occasionally caught in the real-life e.g. developers building a large project with object files, which are immediately read by the linker, after being written (asynchronously) by the compiler.

In the meantime, I made my own rudimentary script to literally scan all files in my dataset(s), which will output a "report" of any filess that contain at least one contiguous 128K block of zeroes.

Most of the "hits" are expected. Some sqlite files, db files, tar files, iso files, and "store-only" zip archives. I did find one photo (1.2 MiB) whose last 128K are nothing but zeroes, but it's unrelated to this.

My hunch is that I will only find false positives, and not a single corrupted file. (I don't even expect it to find any, in fact.)

But it was fascinating to see 128K chunks of 0x00 in some unusual places, such as a JPEG image.

EDIT: Also unrelated, but interesting: I noticed that some software is lazily developed when saving files, just dropping seemingly meaningless chunks of 0x00, which exaggerates the filesize. Obviously in ZFS, these chunks get compressed down to nothing. (For example, "game save" files that are each ~5MiB easily compress down to ~200KiB.)

EDIT 2: Oh, this is interesting. I wonder what it means? No need to reply to this comment, just dumping an observation I made.

My script (that searches for 128K chunks of zeroes) found a batch of JPEG images that contain such. These JPEG files are small, around 200-300 KiB each. Yet they all contain a "tail" of 128K+ 0x00. This means that around half of the filesize is comprised of a contiguous block of zeroes.

Want to know what all these images have in common? They all had "effects" applied to them from an old iPad front-facing camera app.

Go figure! Not sure why, but it's fascinating to come across this unusual quirk.

Constantin · Nov 26, 2023

It's one of the aspects of storage having become so cheap. The days that games and other software came distributed across multiple floppies that the user had to swap amongst are over. I've read that some popular distributions like the latest Call of Duty go into hundreds of GB of downloads.

gdreade · Nov 26, 2023

winnielinnie said:
EDIT: Also unrelated, but interesting: I noticed that some software is lazily developed when saving files, just dropping seemingly meaningless chunks of 0x00, which exaggerates the filesize.

It's unlikely to be related to your image files, but you can also see files that have been initialized with long runs of zeros (ie: _not_ sparse) when a piece of software is using mmap(2). In this case, it's typically used (prior to the mmap) to ensure that the backing filesystem space is allocated in advance so that the kernel doesn't later try to write a memory page to backing mmap'd storage and find that it's run out of space. It also allows for ensuring that the mmap'd space is reasonably contiguous. I suspect, though, that such reservations become problematic on COW and compressed filesystems ...

Juan Manuel Palacios · Nov 26, 2023

winnielinnie said:
(…)
In the meantime, I made my own rudimentary script to literally scan all files in my dataset(s), which will output a "report" of any filess that contain at least one contiguous 128K block of zeroes.

Would you be willing to share your script? I'm unsure if my use-case for my NAS (TimeMachine-based backups, a few webdev-oriented jails, a few other general-purpose jails such as UniFi Controller, Plex, ZoneMinder, etc., and a pfSense VM) has caused any corruption, so I'd love to find out.

Thanks!

winnielinnie · Nov 26, 2023

Juan Manuel Palacios said:
Would you be willing to share your script?

It's a very rudimentary script. I included a lot of notes that are mandatory to read. It will find and append to a list all files that contain at least one 128K chunk of zeroes. (You will only find false positives. I'm like 99.999999% sure.)

It's meant to be run under a snapshot as the root user. (Instructions are within the script.)

You can save the script directly under /root/, so that as the root user, while cd'd into the target snapshot, you simply invoke:
sh ~/zero-chunk-finder.sh

View the raw script: https://pastebin.com/raw/UDuVsf75

Download link: https://pastebin.com/dl/UDuVsf75
( Save as the name zero-chunk-finder.sh )

For what it's worth, it found a total of 1,300 files in my entire pool. Most of them are DB, cache, and SQLite files (from backed-up home directories, Firefox, Chrome, etc), a handful of "store-only" ZIP and TAR archives, and then an unusually large number of MP4 videos and FLAC audio. (Perhaps the media format's header contains the contiguous zeroes?) I found a total of around 40 JPEG images, which showed no corruption. Yet a hexdump reveals that the tail of the file has a massive chunk of 0x00. (No idea why! Harmless, but strange.)

All in all, I doubt there is any corruption, and I doubt anyone was affected by this. (I would hope so.) Unless you were a particular use-case, such as building a large project on a fast system.

katbyte · Nov 26, 2023

Ericloewe said:
Under what conditions? With Epyc Gen 3, a fast NVMe pool, OpenZFS 2.1, Linux and Coreutils 8.x I've been getting nothing.

promox 8 install on a fast NVME gen4 mirror, whatever the default zfs config is when proxmox created it - on the GitHub thread I posted more detail about the pool/etc. active vm with half tb disk, lots of io thou mostly read.

SGr33n · Nov 27, 2023

Hi People! I was messing with data corruption since a week and now I found this bug. I was on TrueNAS Scale 2.12 which should have ZFS 2.12 as default version, but I dist-upgraded so I had 2.20. No fix reported here really worked on my system. Adding zfs_dmu_offset_next_sync to sysctl on TrueNAS Gui returned Sysctl 'zfs_dmu_offset_next_sync' does not exist in kernel. So I solved installing the 24.04 nightly build (The choice is between TrueNAS 22.12 - ZFS 2.12 and TrueNAS 24.04 - ZFS 2.21). Hoping to return soon on a stable (23.10) do you have other solutions? Is there a way to upgrade to ZFS 2.21 on TrueNAS Scale 23.10 without waiting for 23.2? Thanks!

Yorick · Nov 27, 2023

Instead of UI for kernel parameter, try setting it from CLI: https://www.truenas.com/community/t...ing-discussion-and-testing.114390/post-792880

Gcon · Nov 27, 2023

SGr33n said:
Adding zfs_dmu_offset_next_sync to sysctl on TrueNAS Gui returned Sysctl 'zfs_dmu_offset_next_sync' does not exist in kernel.

I initially did the same thing. On TrueNAS SCALE you add it in GUI under System Settings > Advanced, but *not* under Sysctl but rather in Init/Shutdown Scripts.

Description: Prevent ZFS silent corruption bug - NAS-125358
Type: Command
Command: echo 0 >> /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
When: Pre Init
Enabled ticked
Timeout: 10

Well that's what works for me. You can also run the command "echo 0 >> /sys/module/zfs/parameters/zfs_dmu_offset_next_sync" on the TrueNAS SCALE CLI so you don't have to reboot. If given the opportunity I'd reboot just to make sure that variable setting is working post reboot. Check with "cat /sys/module/zfs/parameters/zfs_dmu_offset_next_sync" should return "0".

SGr33n · Nov 27, 2023

Yorick said:
Instead of UI for kernel parameter, try setting it from CLI: https://www.truenas.com/community/t...ing-discussion-and-testing.114390/post-792880

Tnx for your support. Modifying /sys/module/zfs/parameters/zfs_dmu_offset_next_sync to have 0 value didn't work (returned to 1 on reboot). I didn't try

Code:

sudo midclt call system.advanced.update '{"kernel_extra_options": "zfs.zfs_dmu_offset_next_sync=0"}'

, I will try, but how to restore it to the default value once the 23.x branch will have ZFS 22.21? Is it

Code:

sudo midclt call system.advanced.update '{"kernel_extra_options": "zfs.zfs_dmu_offset_next_sync=1"}'

?

theFra985 · Nov 27, 2023

SGr33n said:
I will try, but how to restore it to the default value once the 23.x branch will have ZFS 22.21?

I think just sudo midclt call system.advanced.update '{"kernel_extra_options": ""}' and a reboot will be sufficient as the default value for that parameter seems to be empty.

Kris Moore · Nov 27, 2023

We've been following the progress over the holiday on the OpenZFS side, looks like a fix is being tested and we'll have a proper OpenZFS fix version here soon(ish). We're going to be reviewing this week and finalizing our update plans, but you can be assured we'll have fixes pushed out as soon as is reasonably safe to do so. If you want to monitor the TrueNAS tickets for reference, here they are:

Ticket for SCALE:

[NAS-125358] - iXsystems TrueNAS Jira

ixsystems.atlassian.net

Ticket for CORE:

[NAS-125356] - iXsystems TrueNAS Jira

ixsystems.atlassian.net

Etorix · Nov 27, 2023

Trying to digest #15526, there are two fixes:
#15566 closes the gap which made the bug more common when block cloning was introduced in 2.2
#15571 (hopefully?) corrects the bug which was already in 2.1.3 (and possibly much earlier), but harder to hit

This comment connects it with earlier issues which were thought to have been closed.
Many thanks to all involved in tracking and testing!

Glowtape · Nov 27, 2023

Would be nice if these two would land in time for the Christmas release.

--edit: How are file moves (not copies) affected by this? Does the BRT even do anything here, anyway?

winnielinnie · Nov 27, 2023

Glowtape said:
How are file moves (not copies) affected by this? Does the BRT even do anything here, anyway?

"Moves" on the same filesystem are basically "renames".

"Moves" across filesystems basically do a copy, followed by a deletion.

I don't think "moves" can even be theoretically affected, so such actions (even if done under times of heavy I/O or in bulk) should be fine.

Important Announcement for the TrueNAS Community.

Silent corruption with OpenZFS (ongoing discussion and testing)

Vampire Pig

Contributor

MVP

Wizard

Vampire Pig

MVP

Vampire Pig

Dabbler

Contributor

MVP

Dabbler

Cadet

Wizard

Explorer

Cadet

Cadet

SVP of Engineering

Wizard

Dabbler

MVP

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Silent corruption with OpenZFS (ongoing discussion and testing)"

Similar threads