Hello TrueNAS Community,
We want to inform the community about a data integrity issue recently identified in OpenZFS that can occur when running a very specific workload. The TrueNAS community and iX have been discussing the issue here and following the discussion with upstream OpenZFS in their GitHub repo.
What Do We Know?
Under a high level of simultaneous read and write operations to the same files, a low-probability race condition can occur in which a read operation retrieves data that has not been completely written yet; the read is then presented with incorrect or null data. Although the bug has been present in OpenZFS for many years, it has not been found to impact any TrueNAS systems. The fix is scheduled for inclusion in OpenZFS 2.2.2 within the next week.
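Upstream reports tie the failure mode to hole reporting: sparse-aware copy tools ask the filesystem where the holes are via lseek(SEEK_DATA/SEEK_HOLE), and a momentarily wrong answer makes them skip real data, leaving zeros in the destination. A rough, hedged illustration of that copy pattern (not of the race itself), assuming GNU coreutils on Linux:

```shell
set -eu
workdir=$(mktemp -d)
# Build a sparse source file: a little data, a ~1 MiB hole, more data.
printf 'AAAA' > "$workdir/src"
truncate -s 1048576 "$workdir/src"     # extend the file, creating a hole
printf 'BBBB' >> "$workdir/src"
# GNU cp probes for holes with lseek(SEEK_DATA/SEEK_HOLE). A filesystem
# that briefly misreports not-yet-synced data as a hole (the race fixed
# in OpenZFS 2.2.2) would cause a tool like this to write zeros instead.
cp --sparse=always "$workdir/src" "$workdir/dst"
cmp "$workdir/src" "$workdir/dst" && echo "copies match"
```

On a healthy filesystem the two files compare identical; the bug manifested as destination files that read back with zeroed regions despite an error-free copy.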
How Was This Identified?
This behavior was originally observed on a non-TrueNAS system accessing ZFS directly from a local filesystem during software compilation. Despite significant testing, iXsystems has been unable to reproduce this condition on TrueNAS CORE or SCALE over remote protocols such as SMB, NFS, or iSCSI.
What Versions Of TrueNAS Are Impacted?
While the initial reports of this issue suggested it may have been limited to the new OpenZFS 2.2.0 release included in TrueNAS SCALE 23.10, further research indicates that this bug exists in earlier versions of OpenZFS, used in both TrueNAS CORE and SCALE.
What Should I Do?
As iXsystems has not been able to replicate the behavior over SMB, NFS, or iSCSI, users who make use of TrueNAS for regular file or block storage operations are unlikely to be impacted. A fix for this bug is expected to be included in forthcoming TrueNAS SCALE 23.10.1 and TrueNAS CORE 13.0-U6.1.
In the interim, users who run applications directly on their TrueNAS machine (such as a SCALE App or a CORE Jail/Plugin) that make extensive use of local copying functionality may wish to apply the following Advanced Settings change in TrueNAS as a precaution.
CORE: Using System -> Tunables, add
- Variable: vfs.zfs.dmu_offset_next_sync
- Value: 0
- Type: sysctl

SCALE: Using System Settings -> Advanced -> Init/Shutdown Scripts, add
- Type: Command
- Command: echo 0 >> /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
- When: Pre Init
- Timeout: 10
Ensure the checkbox beside Enabled is set. On CORE, the tunable will take effect immediately; on SCALE, to apply it immediately without rebooting, run the same echo command from System -> Shell. In both cases, the setting will persist across future reboots.
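After applying the change, the value can be read back to confirm it stuck. A minimal sketch of that apply-then-verify pattern; the temporary file here is a stand-in for the real sysfs node so the sketch runs anywhere (on an actual SCALE system write to /sys/module/zfs/parameters/zfs_dmu_offset_next_sync as root, and on CORE check with `sysctl vfs.zfs.dmu_offset_next_sync`):

```shell
param=$(mktemp)          # stand-in for the sysfs parameter node
echo 0 > "$param"        # apply the setting, as the Init/Shutdown Script does
[ "$(cat "$param")" = "0" ] && echo "tunable set"
```

Reading the parameter back after a reboot is a quick way to confirm the Init/Shutdown Script (or CORE tunable) persisted as expected.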
Where Can I Follow This Issue?
In addition to the JIRA tickets linked above, you can also watch this thread for updates. Once a fix is integrated into TrueNAS, an email announcement will also be sent to subscribed users, and TrueNAS systems set to check for updates will automatically download the new version.