Silent corruption with OpenZFS (ongoing discussion and testing)

Joined
Oct 22, 2019
Messages
3,641
Happy Thanksgiving, everyone! Great timing, eh?

A "silent corruption" bug has been discovered recently in OpenZFS.

Here is the bug report for reference. (It's a long read, and the discussion and investigation are still ongoing):


From what I can grasp (and this might be inaccurate):
  • This most notably affects OpenZFS 2.2.0 with block cloning.
  • Block cloning was the original suspect, but there is a theory that it simply exploits an underlying bug.
  • Recent versions of coreutils (9.1+) are implicated; however, the fact that this supposedly also affects FreeBSD muddies the picture further.
  • This has been reproduced on different Linux distributions (including Arch Linux, by me), FreeBSD, and TrueNAS Core 13.0-U5.3, and possibly others.
  • This has been reproduced on OpenZFS 2.2.0, OpenZFS 2.1.11, and OpenZFS 2.1.13.
  • An upstream "fix" in OpenZFS 2.2.1 is to disable the Block Reference Table (BRT). (This is not a true "fix", only a safeguard in the meantime. See the example commands right after this list for checking whether your pool is using it.)



If you want to test this out yourself and share your results in here:

Tony Hutter's script (designed for OpenZFS 2.2.0 and Linux-based systems): https://gist.github.com/tonyhutter/d69f305508ae3b7ff6e9263b22031a84

@Ericloewe's modified version of this script to work with TrueNAS Core systems: https://www.truenas.com/community/threads/truenas-13-0-u6-is-now-available.114337/page-3

Write-up about what this issue is and is not: https://gist.github.com/rincebrain/e23b4a39aba3fadc04db18574d30dc73

Others have reproduced this issue on - at least:
  • TrueNAS Core 13.0-U5.3 with standard FreeBSD cp
  • TrueNAS Core 13.0-U6 with GNU cp 9.x
  • OpenZFS 2.1.x
  • OpenZFS 2.2.x, where it is very evident due to the combination of block cloning and GNU cp 9.x's eagerness to take advantage of it

I have reproduced silent corruption on Arch Linux with OpenZFS 2.2.0.

I could not reproduce silent corruption on TrueNAS Core 13.0-U6 with OpenZFS 2.1.13. (EDIT: Yes, I could; see Post #25.)

* I ran 16 instances of the script in parallel on spinning HDDs. Supposedly, it's more likely to occur on HDDs than on SSDs.
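
For reference, a one-liner to launch 16 instances in parallel (the same pattern others use later in this thread):

Code:
for i in {1..16}; do ./reproducer.sh & done; wait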



I won't be available to follow up for most of today, but we need a separate thread where users can post their test results and follow the ongoing bug report(s) on the OpenZFS GitHub. The announcement thread for the TrueNAS Core 13.0-U6 release would become too cluttered, so I started this new thread.

Disclaimer: I am but a simple person who wants 100% assurance that my data will not silently corrupt with ZFS, even under "rare" circumstances with unlikely I/O operations. Some of the stuff being investigated in the bug report by other users and developers is beyond my tiny brain's understanding.


Paging users who were in the "release announcement" thread:
@Ericloewe
@morganL
@Davvo
@Gcon
@Juan Manuel Palacios
@Etorix



EDIT: Information might rapidly change as more is discovered about this. Rather than continually edit this original post, it's better to read any updates below.


EDIT November 27, 2023: Tickets on iXsystems' TrueNAS Jira tracker, posted on page #7 by @Kris Moore :smile:
We've been following the progress over the holiday on the OpenZFS side, looks like a fix is being tested and we'll have a proper OpenZFS fix version here soon(ish). We're going to be reviewing this week and finalizing our update plans, but you can be assured we'll have fixes pushed out as soon as is reasonably safe to do so. If you want to monitor the TrueNAS tickets for reference, here they are:

Ticket for SCALE:
Ticket for CORE:
 
Last edited by a moderator:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sure, let's do a dedicated thread.

Block-cloning was an original suspect, but there's the theory that it simply exploits an underlying bug.
I think it's better to say that there appears to be a family of race conditions in particularly sensitive (i.e. confusing) sections of the code, some of which are apparently more likely to be tripped by block cloning.
Recent versions of coreutils 9.1+ are implicated (however, because this supposedly also affects FreeBSD 14.0 "cp" conflates matters further).
Specifically, recent coreutils seems to aggressively try to use reflinks and thus block cloning, if available. FreeBSD cp does not do this, though at least one reproduction on FreeBSD is claimed at this point, despite no block cloning being involved.
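
As a rough illustration of what that means in practice (assuming GNU coreutils 9.x on a Linux dataset; "bigfile" is just a placeholder name):

Code:
cp bigfile clone1                    # coreutils 9.x defaults to --reflink=auto, so on OpenZFS 2.2.x this may become a block clone
cp --reflink=never bigfile copy2     # forces an ordinary read/write copy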
This has been reproduced on (...) FreeBSD 14.0,
Where have you seen that? I haven't seen anything about this.
 
Joined
Oct 22, 2019
Messages
3,641
Where have you seen that? I haven't seen anything about this.
Good catch. I confused occurrences of "FreeBSD" with "FreeBSD 14.0" trying to read through the entire bug report discussion. This part is still unclear to me. (Edited original post.)

This part by Rob N. is ambiguously worded:
(this is not to say coreutils is at fault; that this happens on FreeBSD proves that. coreutils has changed when and why it tries to detect holes multiple times in the 9.x series, and narrowing that down may help us see what's happening a little easier).

Is he implying FreeBSD + coreutils 9.x can cause it to occur? Because with FreeBSD, there's the "base system" which we use by default, but one can also install GNU coreutils additionally.
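
For anyone unsure which cp they're actually invoking on Core/FreeBSD, a quick sketch (the g-prefixed names are how the coreutils package typically installs its tools):

Code:
which cp                              # /bin/cp is the FreeBSD base utility
cp --version 2>/dev/null | head -1    # prints a version line only for GNU coreutils cp
pkg info coreutils 2>/dev/null        # shows whether GNU coreutils is installed (usually as gcp, gls, ...)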
 
Joined
Oct 22, 2019
Messages
3,641
Just to give an idea of how badly this stung:
Rob N. said:
For whatever its worth, I softly lean towards "disable indefinitely", but I hate it. All other things being equal, its better to not have the feature than to have it out there damaging data, and reputation along with it.

But, I don't know how we establish confidence in it without a hell of a lot of time spent, and I know that that time is not easy for any of us to come by. But at that this point we probably have no choice; the damage is done, now its about how we recover.

Not to sound hyperbolic, but this is on the level of "Let's point and laugh at the BTRFS people for their history of data corruption!"

As for those who might have corrupted some data in OpenZFS 2.2.0, the "good news" is that it does not affect the original file (as it cannot). It would only affect the copies during a particularly narrow window of concurrent copy operations.

Outright disabling the BRT (which is the "fix" in OpenZFS 2.2.1) mitigates the issue. However, the fact that others were able to reproduce this on OpenZFS 2.1.x means it's not only block cloning; it's probably something that also involves coreutils 9.1+.
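
On Linux builds of OpenZFS 2.2.x that expose the module parameter (it was reportedly missing from some early 2.2.0 builds, as noted further down), disabling block cloning at runtime looks roughly like this:

Code:
echo 0 | sudo tee /sys/module/zfs/parameters/zfs_bclone_enabled
cat /sys/module/zfs/parameters/zfs_bclone_enabled   # should now print 0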
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is he implying FreeBSD + coreutils 9.x can cause it to occur? Because with FreeBSD, there's the "base system" which we use by default, but one can also install GNU coreutils additionally.
My interpretation is something like: "Since someone has seen it on FreeBSD, we know that this is not merely a coreutils 9.x bug".

Just to give an idea of how badly this stung:


Not to sound hyperbolic, but this is on the level of "Let's point and laugh at the BTRFS people for their history of data corruption!"

As for those who might have corrupted some data in OpenZFS 2.2.0, the "good news" is that it does not affect the original file (as it cannot). It would only affect the copies during a particularly narrow window of concurrent copy operations.

Outright disabling the BRT (which is the "fix" in OpenZFS 2.2.1) mitigates the issue. However, the fact that others were able to reproduce this on OpenZFS 2.1.x means it's not only block cloning; it's probably something that also involves coreutils 9.1+.
It's not good, but realistically the impact is relatively small - at least as far as anyone can tell; whatever race conditions still exist are surely far more subtle. Again, not good, but also not surprising: it's a lot of code for a lot of features.
The main difference is that nobody is advocating for "move on to the next big feature!" while people are losing their data (in btrfs' case, even to something as pedestrian as not being able to rebuild a mirror, as was the case for several years if their wiki is to be believed).
 
Joined
Oct 22, 2019
Messages
3,641
My eyes are getting weary, and soon I will have to leave to attend a family event.

I've looked up and down three different discussions on the GitHub, but could not find a definite answer. Can someone else try to confirm or find a clue?

Is it FreeBSD + coreutils 9.1+ that has reproduced this bug for those who reported it? Or does the risk also exist for FreeBSD's "base system" cp command?

I've tried to reproduce this with 16 and 32 parallel instances of the script. Every time I try on TrueNAS Core 13.0-U6 (FreeBSD 13 + OpenZFS 2.1.13 + no coreutils), I cannot reproduce it.

However, on Arch Linux (OpenZFS 2.2.0 + coreutils 9.4), I can reproduce the silent corruption about 50% of the time. (Sometimes it happens, sometimes it does not. Every test was with 16 parallel instances of the script.)


EDIT: My TrueNAS Core system is my stalwart. That I could not reproduce silent corruption on it (as far as I'm aware) brings me much relief. :smile: My Arch Linux system with OpenZFS 2.2.0 holds a mirror vdev pool; the data on it is not too important, but being able to reproduce silent corruption on it still makes me feel uneasy. :confused:


EDIT 2:
My interpretation is something like: "Since someone has seen it on FreeBSD, we know that this is not merely a coreutils 9.x bug".
That's the impression I got as well.
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is it FreeBSD + coreutils 9.1+ that has reproduced this bug for those who reported it? Or does the risk also exist for FreeBSD's "base system" cp command?
I've asked admnd over on Github to elaborate on their test setup. It feels gross if just copying a bunch of data around a lot in a not-super-tight loop can corrupt data, without any fancy stuff involved - as would be the case in FreeBSD/Core. The reproducer script seems to mix a bunch of reflinked cps with classic cps, presumably to exercise standard copies of reflinked copies of ...

I've been throwing the modified script at the TrueNAS 13 machine at work I previously mentioned, with little to show for it. I'll also test at home later. I've also tested a bit on my Ubuntu 22.04 workstation (OpenZFS 2.1.x) and a FreeNAS 11 box (yeah, still), with nothing so far.
 

vhaarr

Cadet
Joined
Nov 23, 2023
Messages
1
Can anyone please answer two questions to help out a newbie? :smile:

1. Is TrueNAS SCALE 23.10 affected by this bug? It seems to package zfs 2.2.0 rc4, unless I'm mistaken.
2. How can I disable bclone_enabled from the webui terminal in TrueNAS SCALE 23.10? It seems this toggle wasn't actually added to zfs until 2.2.0 rc5 or later.

Am I right that if I upgraded to SCALE 23.10 there's no way for me to even slightly mitigate this issue? I've read the thread and I understand that this feature (bclone) might've just exposed a deeper underlying issue, but if I can disable it then surely I'll survive until the next update.

Unfortunately I upgraded because of a samba bug in 22.12 that was proving extremely tedious, and I didn't read about this zfs bug until the upgrade was done.

Thank you
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
1. Is TrueNAS SCALE 23.10 affected by this bug? It seems to package zfs 2.2.0 rc4, unless I'm mistaken.
Very likely.
2. How can I disable bclone_enabled from the webui terminal in TrueNAS SCALE 23.10? It seems this toggle wasn't actually added to zfs until 2.2.0 rc5 or later.
If you did not upgrade your pool to enable the block cloning feature, that has the same effect. Note, however, that there seem to be other cases, though significantly rarer.
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
Oof, this is a rough one (and my empathy for anyone whose holiday just got suckier because of it). FWIW, I ran an initial test on my primary SCALE Cobia (23.10.0.1) system with a small, HDD-backed RAIDZ2 pool (flags upgraded a few days ago), and I could not repro it yet (with four parallel script runs):

Code:
$ ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & wait
[1] 35444
[2] 35445
[3] 35446
[4] 35447
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
[1]   Done                    ./reproducer.sh
[2]   Done                    ./reproducer.sh
[3]-  Done                    ./reproducer.sh
[4]+  Done                    ./reproducer.sh

$ zpool get bcloneratio,bclonesaved,bcloneused data-pool
NAME       PROPERTY     VALUE         SOURCE
data-pool  bcloneratio  2.99x         -
data-pool  bclonesaved  1.95G         -
data-pool  bcloneused   999M          -

I'll try again with more parallelism.

It's also worth noting that it doesn't look like I've actually triggered block cloning on any of my pools since upgrading the flags. (The bclonesaved and bcloneused properties were all 0 before I started running the test script.) So maybe I dodged a bullet there. (From latest GitHub issue updates, it sounds like this may be a latent issue, but made noticeably easier to trigger with block cloning in use.)
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
Ope, it repros on SCALE with 16 parallel threads. :(

Note the Binary files reproducer_44381_0 and reproducer_44381_990 differ error below.

Code:
$ for i in {1..16}; do ./reproducer.sh & done; wait
[1] 44380
[2] 44381
[3] 44382
[4] 44383
[5] 44384
[6] 44386
[7] 44389
[8] 44390
[9] 44393
[10] 44394
[11] 44396
[12] 44399
[13] 44400
[14] 44403
writing files
[15] 44406
writing files
[16] 44407
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
[1]   Done                    ./reproducer.sh
checking files
checking files
checking files
checking files
checking files
checking files
checking files
Binary files reproducer_44381_0 and reproducer_44381_990 differ
[2]   Done                    ./reproducer.sh
[4]   Done                    ./reproducer.sh
[5]   Done                    ./reproducer.sh
[6]   Done                    ./reproducer.sh
[7]   Done                    ./reproducer.sh
[10]   Done                    ./reproducer.sh
[12]   Done                    ./reproducer.sh
[16]+  Done                    ./reproducer.sh
checking files
checking files
checking files
checking files
checking files
[3]   Done                    ./reproducer.sh
[8]   Done                    ./reproducer.sh
[9]   Done                    ./reproducer.sh
[11]   Done                    ./reproducer.sh
[14]-  Done                    ./reproducer.sh
[15]+  Done                    ./reproducer.sh
[13]+  Done                    ./reproducer.sh

$ zpool get bcloneratio,bclonesaved,bcloneused data-pool
NAME       PROPERTY     VALUE         SOURCE
data-pool  bcloneratio  2.99x         -
data-pool  bclonesaved  7.78G         -
data-pool  bcloneused   3.90G         -
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
1. Is TrueNAS SCALE 23.10 affected by this bug? It seems to package zfs 2.2.0 rc4, unless I'm mistaken.
Possibly. I've been hammering on it with the script created by GitHub user tonyhutter [1] and have yet to get any mismatched files. I did have to comment out the check for /sys/module/zfs/parameters/zfs_bclone_enabled, as this didn't exist on either a fresh or upgraded install of 23.10 - but I am confirming via the zpool bclone* values that it's actually taking effect. I've run a 32x-wide iteration on this, and no luck getting anything to break. I've also been doing things remotely over SMB from Windows clients (which does hook into copy_file_range as well, so it triggers bclone), with the same results (or lack thereof, I suppose).

2. How can I disable bclone_enabled from the webui terminal in TrueNAS SCALE 23.10? It seems this toggle wasn't actually added to zfs until 2.2.0 rc5 or later.

At this point no, the feature can only be set to disabled at pool creation time. Once it has been enabled/activated it can't be reverted.

Code:
admin@cobia-fresh[~]$ sudo zpool set feature@block_cloning=disabled cobia-fresh
cannot set property for 'cobia-fresh': property 'feature@block_cloning' can only be set to 'disabled' at creation time


If you haven't upgraded your pool, as mentioned by @Ericloewe, then you can avoid that upgrade to mitigate your exposure; however, some user reports on the original GitHub issue mention problems with the ZFS 2.1.x branch as well.

[1] The script I'm using is included below for posterity.
Code:
#!/bin/bash
#
# Run this script multiple times in parallel inside your pool's mount
# to reproduce https://github.com/openzfs/zfs/issues/15526.  Like:
#
# ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & wait
#

#if [ $(cat /sys/module/zfs/parameters/zfs_bclone_enabled) != "1" ] ; then
#    echo "please set /sys/module/zfs/parameters/zfs_bclone_enabled = 1"
#    exit
#fi

prefix="reproducer_${BASHPID}_"
dd if=/dev/urandom of=${prefix}0 bs=1M count=1 status=none

echo "writing files"
end=1000
h=0
for i in `seq 1 2 $end` ; do
    let "j=$i+1"
    cp  ${prefix}$h ${prefix}$i
    cp --reflink=never ${prefix}$i ${prefix}$j
    let "h++"
done

echo "checking files"
for i in `seq 1 $end` ; do
    diff ${prefix}0 ${prefix}$i
done
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ope, it repros on SCALE with 16 parallel threads. :(

Note the Binary files reproducer_44381_0 and reproducer_44381_990 differ error below.

Code:
$ for i in {1..16}; do ./reproducer.sh & done; wait
[1] 44380
[2] 44381
[3] 44382
[4] 44383
[5] 44384
[6] 44386
[7] 44389
[8] 44390
[9] 44393
[10] 44394
[11] 44396
[12] 44399
[13] 44400
[14] 44403
writing files
[15] 44406
writing files
[16] 44407
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
[1]   Done                    ./reproducer.sh
checking files
checking files
checking files
checking files
checking files
checking files
checking files
Binary files reproducer_44381_0 and reproducer_44381_990 differ
[2]   Done                    ./reproducer.sh
[4]   Done                    ./reproducer.sh
[5]   Done                    ./reproducer.sh
[6]   Done                    ./reproducer.sh
[7]   Done                    ./reproducer.sh
[10]   Done                    ./reproducer.sh
[12]   Done                    ./reproducer.sh
[16]+  Done                    ./reproducer.sh
checking files
checking files
checking files
checking files
checking files
[3]   Done                    ./reproducer.sh
[8]   Done                    ./reproducer.sh
[9]   Done                    ./reproducer.sh
[11]   Done                    ./reproducer.sh
[14]-  Done                    ./reproducer.sh
[15]+  Done                    ./reproducer.sh
[13]+  Done                    ./reproducer.sh

$ zpool get bcloneratio,bclonesaved,bcloneused data-pool
NAME       PROPERTY     VALUE         SOURCE
data-pool  bcloneratio  2.99x         -
data-pool  bclonesaved  7.78G         -
data-pool  bcloneused   3.90G         -
Was this on the "Ivy" or "Kincaid" server in your signature? I ask because I've gone up to 32x and nothing.
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
Hi everyone, thank you for bringing me into this new thread!

I'm running TrueNAS Core 13.0-U5.1, and still haven't upgraded to U6 because I knew about this bug before it came out, and wanted to confirm whether or not U6 was impacted by it before pulling the trigger. But, according to the Github issue, someone did manage to reproduce it on a TrueNAS Core system that bundles ZFS 2.1.11-1, which could imply that 13.0-U5.1 is indeed also impacted (though I have zfs-kmod-v2023051000-zfs_0a06f128c, and that person reported zfs-kmod-v2023072100-zfs_0eb787a7e).
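
(For anyone wanting to check their own box, `zfs version` prints both the userland and kmod versions; the output below is just an example matching the versions quoted later in this thread.)

Code:
$ zfs version
zfs-2.1.11-1
zfs-kmod-v2023072100-zfs_0eb787a7e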

Still, when I first get a chance, most probably tomorrow, I'm going to take a shot at reproducing the problem on my system, using a single-disk / single-vdev scratch pool, and also a pool of two 2-way mirror vdevs. I'll report my findings here as soon as I have some data.

In any case, I posted a question in our previous U6 update thread about potential mitigations for this data corruption problem, taken right from the Github discussion, quote:

Is there any potential mitigation you'd recommend at this point? ZFS developers on that issue talk about setting zfs_dmu_offset_next_sync to 0 as their current best guess to avoid the problem entirely, and I'm guessing on FreeBSD that translates into setting the vfs.zfs.dmu_offset_next_sync sysctl variable to 0.

Is that correct? Is that something you recommend? Or is there a reason we may not want to mess with that sysctl variable?

Would anyone know anything about that?

Thank you!

(Cleared RGB tags for readability - Mod)
 
Last edited by a moderator:

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
Was this on the "Ivy" or "Kincaid" server in your signature? I ask because I've gone up to 32x and nothing.
Ah, sorry, I repro'd on "ivy". (Specs in my signature are up to date except that I've swapped out the RAM for 4 x 32 GB = 128 GB, still ECC.) Heading out to family stuff in a bit, but if there's any additional system information that would help, I can supply it later.

In case it helps, I've attached a zpool get all from before I started trying to repro (so no block cloning had actually been done on the pools at that point).
 

Attachments

  • ivy-zpool-get-all.txt
    16.4 KB

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm running TrueNAS Core 13.0-U5.1, and still haven't upgraded to U6 because I knew about this bug before it came out, and wanted to confirm whether or not U6 was impacted by it before pulling the trigger.
U5.3:
zfs-2.1.11-1 zfs-kmod-v2023072100-zfs_0eb787a7e

U6:
zfs-2.1.13-1 zfs-kmod-v2023100900-zfs_dd2649a68

Regarding the tunable, I can't say if there is a data safety risk to enabling it, but you're likely to see a performance regression: as I understand it, disabling it prevents SEEK_HOLE/SEEK_DATA from getting an accurate picture of the stretches of empty space in sparse files, so you may spend time reading and writing large runs of zeroes.
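
A rough illustration of that sparse-file trade-off (hypothetical paths; this just shows why broken hole detection means copying zeroes):

Code:
truncate -s 1G /mnt/tank/sparse.img                      # a file that is almost entirely holes
cp /mnt/tank/sparse.img /mnt/tank/sparse-copy.img
du -h /mnt/tank/sparse.img /mnt/tank/sparse-copy.img     # with working SEEK_HOLE both stay tiny on disk
# With dmu_offset_next_sync=0, holes in recently-written (not-yet-synced) files may be
# reported as data, so the copy can end up physically reading and writing the zeroes.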
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ah, sorry, I repro'd on "ivy". (Specs in my signature are up to date except that I've swapped out the RAM for 4 x 32 GB = 128 GB, still ECC.) Heading out to family stuff in a bit, but if there's any additional system information that would help, I can supply it later.

In case it helps, I've attached a zpool get all from before I started trying to repro (so no block cloning had actually been done on the pools at that point).
Here's a general question for the audience:

If you can repro this, are you on RAIDZ?
If you can't repro this, are you on mirrors?
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
One last bit of info: it's intermittently reproducible for me (on the same system, "ivy" from my sig). Some reproducer script batch runs yield no errors, others yield a lot, like this instance:

Code:
$ for i in {1..16}; do ./reproducer.sh & done; wait
[1] 198128
[2] 198129
[3] 198130
[4] 198131
[5] 198132
[6] 198135
[7] 198137
[8] 198139
[9] 198141
[10] 198144
[11] 198145
[12] 198147
[13] 198149
writing files
writing files
writing files
[14] 198154
[15] 198156
writing files
[16] 198158
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
checking files
Binary files reproducer_198130_0 and reproducer_198130_24 differ
Binary files reproducer_198130_0 and reproducer_198130_49 differ
Binary files reproducer_198130_0 and reproducer_198130_50 differ
Binary files reproducer_198130_0 and reproducer_198130_99 differ
Binary files reproducer_198130_0 and reproducer_198130_100 differ
Binary files reproducer_198130_0 and reproducer_198130_101 differ
Binary files reproducer_198130_0 and reproducer_198130_102 differ
Binary files reproducer_198130_0 and reproducer_198130_199 differ
Binary files reproducer_198130_0 and reproducer_198130_200 differ
Binary files reproducer_198130_0 and reproducer_198130_201 differ
Binary files reproducer_198130_0 and reproducer_198130_202 differ
Binary files reproducer_198130_0 and reproducer_198130_203 differ
Binary files reproducer_198130_0 and reproducer_198130_204 differ
Binary files reproducer_198130_0 and reproducer_198130_205 differ
Binary files reproducer_198130_0 and reproducer_198130_206 differ
checking files
Binary files reproducer_198130_0 and reproducer_198130_399 differ
Binary files reproducer_198130_0 and reproducer_198130_400 differ
Binary files reproducer_198130_0 and reproducer_198130_401 differ
Binary files reproducer_198130_0 and reproducer_198130_402 differ
Binary files reproducer_198130_0 and reproducer_198130_403 differ
Binary files reproducer_198130_0 and reproducer_198130_404 differ
Binary files reproducer_198130_0 and reproducer_198130_405 differ
Binary files reproducer_198130_0 and reproducer_198130_406 differ
Binary files reproducer_198130_0 and reproducer_198130_407 differ
Binary files reproducer_198130_0 and reproducer_198130_408 differ
Binary files reproducer_198130_0 and reproducer_198130_409 differ
Binary files reproducer_198130_0 and reproducer_198130_410 differ
Binary files reproducer_198130_0 and reproducer_198130_411 differ
Binary files reproducer_198130_0 and reproducer_198130_412 differ
Binary files reproducer_198130_0 and reproducer_198130_413 differ
Binary files reproducer_198130_0 and reproducer_198130_414 differ
Binary files reproducer_198130_0 and reproducer_198130_799 differ
Binary files reproducer_198130_0 and reproducer_198130_800 differ
Binary files reproducer_198130_0 and reproducer_198130_801 differ
Binary files reproducer_198130_0 and reproducer_198130_802 differ
Binary files reproducer_198130_0 and reproducer_198130_803 differ
Binary files reproducer_198130_0 and reproducer_198130_804 differ
Binary files reproducer_198130_0 and reproducer_198130_805 differ
Binary files reproducer_198130_0 and reproducer_198130_806 differ
Binary files reproducer_198130_0 and reproducer_198130_807 differ
Binary files reproducer_198130_0 and reproducer_198130_808 differ
Binary files reproducer_198130_0 and reproducer_198130_809 differ
Binary files reproducer_198130_0 and reproducer_198130_810 differ
Binary files reproducer_198130_0 and reproducer_198130_811 differ
Binary files reproducer_198130_0 and reproducer_198130_812 differ
Binary files reproducer_198130_0 and reproducer_198130_813 differ
Binary files reproducer_198130_0 and reproducer_198130_814 differ
Binary files reproducer_198130_0 and reproducer_198130_815 differ
Binary files reproducer_198130_0 and reproducer_198130_816 differ
Binary files reproducer_198130_0 and reproducer_198130_817 differ
Binary files reproducer_198130_0 and reproducer_198130_818 differ
Binary files reproducer_198130_0 and reproducer_198130_819 differ
Binary files reproducer_198130_0 and reproducer_198130_820 differ
Binary files reproducer_198130_0 and reproducer_198130_821 differ
Binary files reproducer_198130_0 and reproducer_198130_822 differ
Binary files reproducer_198130_0 and reproducer_198130_823 differ
Binary files reproducer_198130_0 and reproducer_198130_824 differ
Binary files reproducer_198130_0 and reproducer_198130_825 differ
Binary files reproducer_198130_0 and reproducer_198130_826 differ
Binary files reproducer_198130_0 and reproducer_198130_827 differ
Binary files reproducer_198130_0 and reproducer_198130_828 differ
Binary files reproducer_198130_0 and reproducer_198130_829 differ
Binary files reproducer_198130_0 and reproducer_198130_830 differ
checking files
[1]   Done                    ./reproducer.sh
[3]   Done                    ./reproducer.sh
[4]   Done                    ./reproducer.sh
[5]   Done                    ./reproducer.sh
[6]   Done                    ./reproducer.sh
[7]   Done                    ./reproducer.sh
[8]   Done                    ./reproducer.sh
[9]   Done                    ./reproducer.sh
[10]   Done                    ./reproducer.sh
[11]   Done                    ./reproducer.sh
[12]   Done                    ./reproducer.sh
[13]   Done                    ./reproducer.sh
[15]-  Done                    ./reproducer.sh
[16]+  Done                    ./reproducer.sh
[2]-  Done                    ./reproducer.sh
[14]+  Done                    ./reproducer.sh

Verifying that the script is in fact triggering block cloning:

Code:
$ zpool get bcloneratio,bclonesaved,bcloneused data-pool
NAME       PROPERTY     VALUE         SOURCE
data-pool  bcloneratio  2.99x         -
data-pool  bclonesaved  7.75G         -
data-pool  bcloneused   3.89G         -

And here is the exact script I'm using. (I had to remove an initial tunable check that I think only works on FreeBSD?)

Code:
#!/bin/bash
#
# Run this script multiple times in parallel inside your pool's mount
# to reproduce https://github.com/openzfs/zfs/issues/15526.  Like:
#
# ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & ./reproducer.sh & wait
#

prefix="reproducer_${BASHPID}_"
dd if=/dev/urandom of=${prefix}0 bs=1M count=1 status=none

echo "writing files"
end=1000
h=0
for i in `seq 1 2 $end` ; do
        let "j=$i+1"
        cp  ${prefix}$h ${prefix}$i
        cp --reflink=never ${prefix}$i ${prefix}$j
        let "h++"
done

echo "checking files"
for i in `seq 1 $end` ; do
        diff ${prefix}0 ${prefix}$i
done
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
Here's a general question for the audience:

If you can repro this, are you on RAIDZ?
If you can't repro this, are you on mirrors?
This repros for me on RAIDZ2, HDD pool (1 vdev, 5x 12 TB HDDs).

I can try on my small mirrored SSD pool as well.

Edit: So far, unable to repro on a single-vdev 3-drive mirrored SSD pool (normally two drives, but I have a third drive in there as I was in the middle of swapping out the drives for larger ones). Tried a few times with 16/32/64 threads.
 
Last edited:

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
U5.3:
zfs-2.1.11-1 zfs-kmod-v2023072100-zfs_0eb787a7e

U6:
zfs-2.1.13-1 zfs-kmod-v2023100900-zfs_dd2649a68

Regarding the tunable, I can't say if there is a data safety risk to enabling it, but you're likely to see a performance regression: as I understand it, disabling it prevents SEEK_HOLE/SEEK_DATA from getting an accurate picture of the stretches of empty space in sparse files, so you may spend time reading and writing large runs of zeroes.
Thank you for the info!

Actually, it'd be disabling the tunable because, if I understood correctly, the recommendation is to disable it, and it's already enabled by default:

Code:
-> sysctl vfs.zfs.dmu_offset_next_sync
vfs.zfs.dmu_offset_next_sync: 1


I've not yet customized it. If the only drawback is performance, I'd say it's well worth the added safety in the face of this silent corruption bug. Are there any other downsides we should consider before flipping this sysctl on TrueNAS Core?
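
If I do end up flipping it, here's a sketch of how I expect that would look on Core (the web UI path is from memory, so please correct me if it's wrong):

Code:
# Change it for the running system:
sysctl vfs.zfs.dmu_offset_next_sync=0
# To persist across reboots, add a Tunable in the web UI (System -> Tunables),
# type "sysctl", variable vfs.zfs.dmu_offset_next_sync, value 0.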
 