Silent corruption with OpenZFS (ongoing discussion and testing)

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
OpenZFS 2.2.2 & 2.1.14 released with fixes for this data corruption issue! Now let's wait for iX to do its magic and release updates on the already announced mid-December timeline.
 
Joined
Oct 22, 2019
Messages
3,641
Heads up for those anticipating the return of block-cloning with OpenZFS 2.2.2:
Rob N. said:
The "write" part of the file change can be anything - write, clone, fill, etc. What may make a difference is the relative speed of those operations, and obviously a clone is much faster than a write. There may also be a second bug in cloning that contributed to this that we haven't found yet. This is part of the reason that cloning is still disabled in 2.2.2.
(Emphasis added.)

As much as I think block-cloning is a game-changer for ZFS, and as excited as I was for it, I have to agree with this "play it safe for now" approach.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
After applying the fix, is it a good idea to do a scrub ASAP or just stick to my usual schedule?
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
Scrubs will not help with this issue since, as far as ZFS is concerned, there's no on-disk corruption. ZFS writes exactly the bytes it's told to by the application (e.g., cp). Instead, the corruption occurs when ZFS tells an application the wrong thing at read time (saying there is a "hole" in a file when, in fact, there is data there).
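To make that concrete, here's a rough Python sketch of what a hole-aware copy (in the spirit of cp's sparse-file handling) does; the function name and paths are placeholders, and real tools obviously handle chunking, short writes, and errors. The copier asks the filesystem where the data is and writes exactly what it was told, so if ZFS misreports a hole over real data, the copy silently gets zeros while the original stays intact and checksums clean:
Code:
import os

def sparse_copy(src_path, dst_path):
    # Hypothetical helper, only to illustrate the read-time failure mode.
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                # "Where does the next data region start?"
                data_start = os.lseek(src, offset, os.SEEK_DATA)
            except OSError:
                break  # nothing but a hole from here to EOF
            # "... and where does it end (i.e., where is the next hole)?"
            data_end = os.lseek(src, data_start, os.SEEK_HOLE)

            # If ZFS wrongly answered "hole" over real data above, that
            # region is simply never read or written: the destination ends
            # up with zeros there, and both files scrub perfectly clean.
            os.lseek(src, data_start, os.SEEK_SET)
            buf = os.read(src, data_end - data_start)  # real tools read in chunks
            os.lseek(dst, data_start, os.SEEK_SET)
            os.write(dst, buf)
            offset = data_end
        os.ftruncate(dst, size)  # preserve a trailing hole, if any
    finally:
        os.close(src)
        os.close(dst)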
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Thanks
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Put otherwise: The bug does not corrupt data that is already in the pool; it's a "read bug" which returns wrong data, which may result in corrupted copies being stored—originals are safe.
Scrubs are of no use: The original is fine, but the copy was written with its own checksum and is "properly corrupted".
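Which also means the only reliable way to catch a bad copy after the fact is to compare it against its original, assuming the original is still around. A minimal sketch (the paths are just examples), since ZFS itself will never flag either file:
Code:
import hashlib

def file_digest(path, chunk_size=1 << 20):
    # Hash in chunks so large files don't have to fit in RAM.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Illustrative paths only: a mismatch means the copy is bad, even though a
# scrub reports zero checksum errors on both files.
if file_digest("/mnt/tank/data/file.bin") != file_digest("/mnt/tank/backup/file.bin"):
    print("copy differs from original -- recopy it")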
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
Hence the lack of checksum failures: if I understood correctly, when the bug hits, the data (even if it's corrupted data) is written correctly to its destination, and at a later time that destination checksums OK when read again and/or scrubbed… correct?
@HarambeLives here's my analysis of how the bug impacts copies.
 

tiberiusQ

Contributor
Joined
Jul 10, 2017
Messages
190
Put otherwise: The bug does not corrupt data that is already in the pool; it's a "read bug" which returns wrong data, which may result in corrupted copies being stored—originals are safe.
Scrubs are of no use: The original is fine, but the copy was written with its own checksum and is "properly corrupted".
What about ZFS replication to, e.g., another TrueNAS?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Not affected by this bug.
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
Not affected by this bug.
How come? Because if the bug is about incorrectly reporting holes, wouldn't it stand to reason it could also occur while reading the blocks that make up a snapshot?

After all, supplying the data for a file that a userland tool, say cp(1), wants to copy boils down to traversing that file's blocks, and the bug hits at exactly that point, when a given range of blocks is misread as a hole under a specific set of racy conditions… so why couldn't the same thing happen while traversing a snapshot's blocks for ZFS replication, provided the system is under the exact same racy conditions that trigger the problem in the former case?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
How come? Because if the bug is about incorrectly reporting holes, wouldn't it stand to reason it could also occur while reading the blocks that make up a snapshot?
Replication works at the block level. The bug is about user space programs operating at the file/vnode interface actively using "hole aware" system calls. ZFS replication is completely file agnostic.
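For anyone wondering what a "hole aware" call actually looks like, it's essentially this (the path is just an example); zfs send never issues anything like it, it just streams the snapshot's blocks:
Code:
import os

fd = os.open("somefile.bin", os.O_RDONLY)
try:
    size = os.fstat(fd).st_size
    # Ask the filesystem for the offset of the first hole at or after 0.
    # For a file with no holes this is simply the end of the file.
    first_hole = os.lseek(fd, 0, os.SEEK_HOLE)
    print(f"size={size}, first hole reported at offset {first_hole}")
finally:
    os.close(fd)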
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
Replication works at the block level. The bug is about user space programs operating at the file/vnode interface actively using "hole aware" system calls. ZFS replication is completely file agnostic.
Right, I understand that replication works at the block level. But, at the end of the day, when the ZFS layer is asked to supply the data for a file that a userland tool intends to copy, that translates into reading the blocks that make up the file.

So I guess what I'm wondering is at what layer this erroneous hole reporting comes into play, because, after all, and if I'm not sorely misunderstanding something, it is ZFS that's incorrectly reporting those holes.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
My understanding is that it occurs at the file system layer and not in ZFS' operation on blocks. Now that you keep asking, I wonder whether I might be wrong ...
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
My understanding is that it occurs at the file system layer and not in ZFS' operation on blocks. Now that you keep asking, I ponder if I might be wrong ...
Well, I don't pretend to be too knowledgeable about ZFS internals, even if I consider myself experienced enough with the filesystem to have pulled myself out of several rabbit… holes over the years (debugging pools a few times with zdb, migrating from GELI to ZFS native encryption via replication, etc.).

But, in any case, the fix for the issue talks about, if I'm understanding it correctly, blocks being in an inconsistent state, presumably while held in RAM (with the original blocks still consistent on the storage media). And that keeps taking me back to my argument: whether ZFS is supplying data to userland tools or to a replication stream, it ultimately boils down to reading storage-media blocks, which would arguably fail with the erroneous hole reporting under the right racy conditions.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
As I understand it, the conclusion was that the affected code was not called at all outside the handler for the lseek system call, and thus the bug could not manifest in any ZFS-internal context.

There's also an important factor to keep in mind: replication operates on a snapshot, not a live dataset. Given the need for a dirty dnode, that alone probably makes the timing impossible, even if the bug were relevant to replication, which just sends the new blocks and has no need to seek around inside files (nor does it really understand what a file is).
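That's also, loosely, why the sane workarounds before the fix amounted to staying off the hole-detection path: a plain sequential copy never asks the filesystem where the holes are, so there is nothing for a misreported hole to mislead, much like a replication stream that simply sends the snapshot's blocks. A quick sketch (names and paths are placeholders, not any particular tool's implementation):
Code:
import os

def plain_copy(src_path, dst_path, chunk_size=1 << 20):
    # Sequential byte-for-byte copy: it never calls lseek(SEEK_HOLE/SEEK_DATA),
    # it just read()s the data as it is. Hypothetical helper for illustration.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)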
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I was hoping for an earlier update with the current ZFS version ("hotfix"). Unraid, for example, already has ZFS 2.1.14.


It's worth understanding the TrueNAS fix cycle and how we approach these issues.

  1. We work with community to confirm the issue and how relevant it is.
  2. We try to publish any recommendations on mitigating or avoiding the problem. We did that on Nov 29th: https://www.truenas.com/community/threads/old-openzfs-issue-found-and-being-resolved.114556/
  3. We include a fix in the nightlies... for both internal and community testing. (We do not push these untested versions on anyone.)
  4. It takes almost a week to get through a QA cycle. Unlike unRaid, we have a significant lab and enterprise customers.
  5. We release a hot patch if necessary. This was done for CORE on Dec 7: https://www.truenas.com/community/threads/old-openzfs-issue-found-and-being-resolved.114556/
  6. For SCALE, we are releasing the fix with OpenZFS 2.2.2. This (SCALE 23.10.1) needs a full 2 week QA cycle. Unfortunately, we found another unrelated issue and delayed for a week. The plan is to release on Dec 19.

The bottom line is:

If you want a fast response, follow the recommendations. The nightlies should only be used if necessary or if there is no risk (e.g., you have your own QA system).

If you want tested and verified software, wait for the official versions. It's reasonable to argue that some users should wait another few weeks for community feedback on the new software. See the software status page.

We don't plan to issue official versions without some professional level of testing. The unintended consequences of bug fixes can be worse than the original bug. In this case, the original bug lasted 15 years without detection.

We hope this approach meets the vast majority of TrueNAS user needs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The original bug lasted 15 years without detection
This is a very important point to keep in mind. In fact, nobody seems to have come forward with any "now that you mention it, back in the day I saw this in the wild" stories.
The unfortunate side of this is that this bug will, for a while at least, be used by clueless users* as a scapegoat for everything from dying disks to PEBKACs.

* I mean no disrespect; the cluelessness derives largely from irresponsible guides pushing users into solutions they understand little about, leading them to make mistakes. Couple this with news articles of varying quality, and it's just the right set of conditions for Joe User, whose system is held together by a prayer and firmware, to see errors reported after a scrub and think to himself, "Aha! I hit the infamous bug!"
 