Still, I have been unable to reproduce the issue thus far.
Interestingly, he's able to reproduce it with only 4 parallel instances on TrueNAS Core 13.0-U5.3.
This got me thinking, and I decided to approach it from a different angle. (Sort of. Read below.)
TL;DR
I am able to somewhat consistently reproduce this on TrueNAS Core 13.0-U6.
No block-cloning involved whatsoever.
The Setup
This is what I did (a rough command sketch follows the list):
- Created a new unencrypted dataset (pool/playground)
- This unencrypted dataset has the default options, except for recordsize=1M
- Created a new 13.2-RELEASE basejail called "vanilla"
- Made a mount point for it so that /mnt/pool/playground is accessible inside the jail at /media/playground
- In the jail (all subsequent steps are in the jail, by the way), switched to the latest pkg repo
- Installed bash, nano, and coreutils (9.1)
- Ran the script with only 4 parallel instances while inside the mount point's path (/media/playground)
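For reference, the steps above translate roughly into the commands below. This is a sketch from memory rather than my actual shell history (I did the jail and mount-point parts through the TrueNAS UI), so treat the iocage invocations and the JAILROOT placeholder as illustrative:
Code:
# On the TrueNAS host: new dataset with default options except recordsize=1M
zfs create -o recordsize=1M pool/playground

# 13.2-RELEASE basejail named "vanilla" (roughly what the UI does via iocage)
iocage create -b -n vanilla -r 13.2-RELEASE
iocage start vanilla

# Expose the dataset inside the jail at /media/playground. I used the jail's
# "Mount Points" UI; a manual null-mount would look like this, with JAILROOT
# standing in for the jail's root directory:
mount_nullfs /mnt/pool/playground JAILROOT/media/playground

# Inside the jail, after pointing pkg at the "latest" branch instead of
# "quarterly": install the tools
pkg install bash nano coreutils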
The Results
Here are two sample runs:
Sample #1
Code:
for i in {1..4} ; do ./reproducer.sh & done; wait
[1] 19685
[2] 19686
[3] 19687
[4] 19688
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
Binary files reproducer_29316_0 and reproducer_29316_186 differ
Binary files reproducer_29316_0 and reproducer_29316_373 differ
Binary files reproducer_29316_0 and reproducer_29316_374 differ
Binary files reproducer_13948_0 and reproducer_13948_492 differ
Binary files reproducer_29316_0 and reproducer_29316_747 differ
Binary files reproducer_29316_0 and reproducer_29316_748 differ
Binary files reproducer_29316_0 and reproducer_29316_749 differ
Binary files reproducer_29316_0 and reproducer_29316_750 differ
Binary files reproducer_13948_0 and reproducer_13948_985 differ
Binary files reproducer_13948_0 and reproducer_13948_986 differ
[1] Done ./reproducer.sh
[3]- Done ./reproducer.sh
[4]+ Done ./reproducer.sh
[2]+ Done ./reproducer.sh
Sample #2
Code:
for i in {1..4} ; do ./reproducer.sh & done; wait
[1] 95060
[2] 95061
[3] 95062
[4] 95063
writing files
writing files
writing files
writing files
checking files
checking files
checking files
checking files
Binary files reproducer_29959_0 and reproducer_29959_822 differ
[1] Done ./reproducer.sh
[3]- Done ./reproducer.sh
[4]+ Done ./reproducer.sh
[2]+ Done ./reproducer.sh
What makes this even more insidious is that sometimes a run produces no corruption at all. (This makes it tricky to reproduce on demand.)
Additional Info and Evidence
@HoneyBadger: This is on a pool comprised of two mirror vdevs, spinning HDDs (WD Red Pluses, CMR).
System Info:
- TrueNAS Core 13.0-U6
- ZFS: 2.1.13
- coreutils: 9.1
- No block-cloning
- 32 GiB ECC RAM
- Intel Xeon E-2144G, 4 cores, 8 threads
- WD Red Plus HDDs
- 2 x two-way mirrors (a total of 4 spinners)
Because this ran in a jail, I had to modify the script so that it uses the correct paths for bash (/usr/local/bin/bash) and GNU coreutils' "cp" (/usr/local/bin/gcp).
Here is the modified version of the script I used, which works in a FreeBSD jail:
Code:
#!/usr/local/bin/bash
# Each parallel instance gets its own file prefix so runs don't collide.
prefix="reproducer_${BASHPID}_"

# Seed file: 1 MiB of random data.
dd if=/dev/urandom of=${prefix}0 bs=1M count=1 status=none

echo "writing files"
end=1000
h=0
# Copy chains: file $h -> $i -> $j, using GNU coreutils' cp from the pkg.
for i in `seq 1 2 $end` ; do
    let "j=$i+1"
    /usr/local/bin/gcp ${prefix}$h ${prefix}$i
    /usr/local/bin/gcp ${prefix}$i ${prefix}$j
    let "h++"
done

echo "checking files"
# Every copy should still be identical to the seed file.
for i in `seq 1 $end` ; do
    diff ${prefix}0 ${prefix}$i
done
What confuses me is that I could not reproduce this on the TrueNAS host using the FreeBSD base system's "cp". I'm not sure what that means, or if it's a red herring.
The cherry on top is the proof of data corruption during the copy operation. Behold:
Code:
hexdump reproducer_29316_186
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0100000
Code:
du -hs reproducer_29316_186
512B reproducer_29316_186
That is absolutely not the original urandom'd 1 MiB file. It's basically a barrage of zeros.
For reference, this is the original file:
Code:
du -hs reproducer_29316_0
1.0M reproducer_29316_0
A hexdump of the original likewise confirms it is full of random data, not zeros.
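As another way to see it side by side, FreeBSD's stat can print the apparent file size next to the blocks actually allocated (assuming I have the format flags right):
Code:
# %z = file size in bytes, %b = number of 512-byte blocks allocated, %N = name
stat -f "%z bytes, %b blocks allocated: %N" reproducer_29316_0 reproducer_29316_186
The corrupted copy should still report the full 1 MiB apparent size while allocating almost nothing on disk, which matches the du output above.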
So basically, like some others in the GitHub bug report and in here, I just copied a bunch of files that were silently corrupted, and I would never have known about it had I not been actively trying to reproduce the bug.
Now I'm asking myself,
"What's the likelihood this already happened to me?"
The above corruption was with 4 parallel instances of the script. I realize it's A LOT of I/O happening in such a small cluster, and such scenarios are unlikely for home users.
But I still have my concerns, as you can tell.
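On that note, the only quick spot-check I can think of (purely my own improvisation, not an official tool) is to strip the NUL bytes from a suspect file and count what's left:
Code:
# A file that was zeroed out like the one above prints 0 here;
# a healthy 1 MiB urandom file prints close to 1048576.
tr -d '\0' < reproducer_29316_186 | wc -c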
It's possible that this bug has existed for longer than suspected, and might have inadvertently resurfaced with OpenZFS 2.1.5, when the zfs_dmu_offset_next_sync tunable was enabled by default.
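For anyone who wants to check where they stand, that tunable is exposed as a sysctl on FreeBSD/TrueNAS Core. Please verify the exact name on your own system, but it should look something like this (the second command reverts to the pre-2.1.5 default until reboot):
Code:
# Show the current value (1 = the post-2.1.5 default)
sysctl vfs.zfs.dmu_offset_next_sync
# Temporarily set it back to 0 (not persistent across reboots)
sysctl vfs.zfs.dmu_offset_next_sync=0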
So this obviously isn't exclusive to block-cloning or OpenZFS 2.2.0.
Perhaps this "silent corruption" bug indeed has a common source? Maybe block-cloning "enhances" its potential occurrence? The emergency "fix" in the meantime is that block-cloning was disabled by default in version 2.2.1. But that doesn't address the bug on OpenZFS 2.1.x, where no block-cloning is involved at all.