Old OpenZFS Issue Found and Being Resolved

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello TrueNAS Community,

We want to inform the community about a recently identified data integrity issue in OpenZFS that can occur under a very specific workload. The TrueNAS community and iX have been discussing the issue here and following the discussion with upstream OpenZFS on their GitHub repo.

What Do We Know?
Under a high level of simultaneous read and write operations to the same files, a low-probability race condition can occur in which a read operation retrieves data that has not been completely written yet. The read operation is then presented with incorrect or null data. Despite the bug being present in OpenZFS for many years, this issue has not been found to impact any TrueNAS systems. The bug fix is scheduled to be included in OpenZFS 2.2.2 within the next week.

How Was This Identified?
This behavior was originally observed on a non-TrueNAS system accessing ZFS directly from a local filesystem during software compilation. Despite significant testing, iXsystems has been unable to reproduce this condition on TrueNAS CORE or SCALE over remote protocols such as SMB, NFS, or iSCSI.

What Versions Of TrueNAS Are Impacted?
While the initial reports of this issue suggested it may have been limited to the new OpenZFS 2.2.0 release included in TrueNAS SCALE 23.10, further research indicates that this bug exists in earlier versions of OpenZFS, used in both TrueNAS CORE and SCALE.

What Should I Do?
As iXsystems has not been able to replicate the behavior over SMB, NFS, or iSCSI, users who make use of TrueNAS for regular file or block storage operations are unlikely to be impacted. A fix for this bug is expected to be included in forthcoming TrueNAS SCALE 23.10.1 and TrueNAS CORE 13.0-U6.1.

In the interim, users who run applications directly on their TrueNAS machine (such as a SCALE App, or a CORE Jail/Plugin) that make extensive use of local copying functionality may wish to make the following change to an Advanced Setting in TrueNAS if concerned.

CORE: Using System -> Tunables, add
  • Variable: vfs.zfs.dmu_offset_next_sync
  • Value: 0
  • Type: sysctl
SCALE: Using System -> Init/Shutdown Scripts add
  • Type: Command
  • Command: echo 0 >> /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
  • When: Pre Init
  • Timeout: 10

Ensure the checkbox beside Enabled is set. On CORE, the tunable will take effect immediately; on SCALE, the Pre Init script only runs at boot, so to apply the change right away you can run the same command from System -> Shell. In both cases, this tunable will persist for future reboots.
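
To confirm the setting took effect, the current value can be read back. The following is a minimal sketch, not an official iXsystems tool: the function name `check_dmu_offset_next_sync` is hypothetical, and the default path is simply the SCALE parameter file from the command above. It accepts an alternate path as its first argument so it can be tried against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper: report whether the mitigation value (0) is in effect.
# Default path is the SCALE (Linux) parameter file named in the steps above.
check_dmu_offset_next_sync() {
    param_file="${1:-/sys/module/zfs/parameters/zfs_dmu_offset_next_sync}"

    # The file is absent if the ZFS module is not loaded or the parameter
    # does not exist in this OpenZFS version.
    if [ ! -r "$param_file" ]; then
        echo "unknown: cannot read $param_file"
        return 2
    fi

    value="$(cat "$param_file")"
    if [ "$value" = "0" ]; then
        echo "mitigated: zfs_dmu_offset_next_sync=0"
        return 0
    fi

    echo "not mitigated: zfs_dmu_offset_next_sync=$value"
    return 1
}
```

On SCALE, calling `check_dmu_offset_next_sync` with no argument from System -> Shell reads the live parameter. On CORE, the equivalent check would be `sysctl -n vfs.zfs.dmu_offset_next_sync`, which should print 0 once the tunable is applied.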

Where Can I Follow This Issue?
In addition to the JIRA tickets linked above, you can also watch this thread for updates. Once a fix is integrated into TrueNAS, an email announcement will also be sent to subscribed users, and TrueNAS systems set to check for updates will automatically download the new version.
 

marian78

Patron
Joined
Jun 30, 2011
Messages
210
What about old versions of FreeNAS (v9.10)? Can they use the "vfs.zfs.dmu_offset_next_sync" fix?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
The race condition is suspected to be present as far back as Solaris/Illumos ZFS. Applying the mitigation cannot hurt, but if you're still running FreeNAS you should consider upgrading.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I don't think that tunable exists in versions earlier than 2.1.something.
 

blank-user

Cadet
Joined
Dec 2, 2023
Messages
1
Ericloewe said: "I don't think that tunable exists in versions earlier than 2.1.something."
The introduction of the tunable dramatically increased the probability of the bug occurring, but the bug itself actually goes back as far as 2006, it seems. It was just very unlikely to occur (it is a very small window of time, on the order of microseconds). Recent versions of coreutils cp utilize sparse copying in a particular way that made the issue apparent.
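
Since the symptom is a copy containing incorrect or zeroed data, one way to spot-check a suspect copy is to compare it byte-for-byte with its source. This is only a rough sketch under assumptions: `verify_copy` is a hypothetical helper name, it only helps if the source file is still available and unchanged, and it is not the reproducer used in the upstream discussion.

```shell
#!/bin/sh
# Hypothetical helper: compare a copy against its source byte-for-byte.
# Returns 0 on a match, 1 on a difference, 2 if either file is unreadable.
verify_copy() {
    src="$1"
    dst="$2"

    if [ ! -r "$src" ] || [ ! -r "$dst" ]; then
        echo "error: cannot read $src or $dst"
        return 2
    fi

    # cmp -s is silent and reports the result through its exit status.
    if cmp -s "$src" "$dst"; then
        echo "match: $dst"
        return 0
    fi

    echo "MISMATCH: $dst differs from $src"
    return 1
}
```

For example, `verify_copy /mnt/pool/src/file.o /mnt/pool/dst/file.o` (paths are placeholders) would print a MISMATCH line if the copy picked up bad data.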

Further discussion here
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
A fix has been released now... coming soon to your local copy of SCALE and CORE. :)
 