Update: WD Red SMR Drive Compatibility with ZFS

Status
Not open for further replies.

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Thanks to the FreeNAS community, we uncovered and reported on a ZFS compatibility issue with some capacities (6TB and under) of WD Red drives that use SMR (Shingled Magnetic Recording) technology. Most HDDs use CMR (Conventional Magnetic Recording) technology, which works well with ZFS. Below is an update on the findings and some technical advice.

Identifying SMR Drives

First, to help provide more clarity for its customers, Western Digital has rebranded its product line.

WD Red™ Pro drives are CMR-based and designed for higher-intensity workloads. These work well with ZFS, FreeNAS, and TrueNAS.
WD Red™ Plus is now used to identify WD drives based on CMR technology. These work well with ZFS, FreeNAS, and TrueNAS.
WD Red™ is now used to identify WD drives using SMR, or more specifically, DM-SMR (Device-Managed Shingled Magnetic Recording). These do not work well with ZFS and should be avoided to minimize risk.

WD Red drives manufactured before August 2020 may use CMR or SMR technology and must be identified based on size and product code. You may find 2, 3, 4, and 6TB WD Red models that are either the previous generation CMR models or current generation SMR models. Realizing this is a little difficult to track, Western Digital has provided a chart to identify all of their WD Red family drives with DM-SMR. Below are the rules to make a determination:

All WD Red Pro HDDs, WD Red Plus HDDs, and all higher capacity (greater than 6TB) WD Red HDD models use CMR technology.

Smaller WD Red drives (6TB and under) that use “EFRX” in their product codes are also CMR-based and have been renamed WD Red Plus.

WD Red drives (6TB and under) from their newest generation, released in late 2018, use DM-SMR. The 2TB, 3TB, 4TB, and 6TB WD Red DM-SMR drives can be identified by the letters “EFAX” in their product code. These drives should be avoided for use with ZFS wherever possible.
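The identification rules above can be expressed as a small lookup. This is a hedged sketch, not an official tool: the function name `classify_wd_red` is hypothetical, it only understands model numbers of the form WD<capacity>EF<xx> (such as WD40EFAX), and it assumes that all capacity digits except the last give the size in TB, which holds for common WD Red models (WD20, WD40, WD60, WD101). Anything it reports as "unknown" should be checked against Western Digital's chart.

```python
import re

# Hypothetical pattern for WD Red model numbers such as "WD40EFAX-68JH4N0".
# Assumption: all capacity digits except the last give the size in TB
# (WD20 -> 2TB, WD60 -> 6TB, WD101 -> 10TB).
_WD_RED_MODEL = re.compile(r"^WD(\d{2,3})EF([A-Z]{2})", re.IGNORECASE)


def classify_wd_red(model: str) -> str:
    """Return "CMR", "DM-SMR", or "unknown" for a WD Red model number."""
    match = _WD_RED_MODEL.match(model.strip())
    if match is None:
        # Not a WD<capacity>EF.. code (e.g. WD Red Pro models): use WD's chart.
        return "unknown"
    capacity_tb = int(match.group(1)[:-1])
    suffix = match.group(2).upper()
    if capacity_tb > 6:
        return "CMR"  # every WD Red model above 6TB is CMR
    if suffix == "RX":
        return "CMR"  # EFRX models, now sold as WD Red Plus
    if suffix == "AX" and capacity_tb in (2, 3, 4, 6):
        return "DM-SMR"  # EFAX models at 2, 3, 4, or 6TB
    return "unknown"


for model in ("WD40EFAX-68JH4N0", "WD40EFRX", "WD80EFAX"):
    print(model, classify_wd_red(model))
```

Note that capacity is checked before the suffix because larger EFAX models (for example WD80EFAX) are CMR, so the letters alone are not enough.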

There is an excellent SMR Community forum post (thanks to Yorick) that identifies SMR drives from Western Digital and other vendors. The latest TrueCommand release also identifies and alerts on all WD Red DM-SMR drives.

The new TrueNAS Minis use only WD Red Plus (CMR) HDDs, ranging from 2 to 14TB. Western Digital’s WD Red Plus hard drives are used due to their low power/acoustic footprint and cost-effectiveness. They are also a popular choice among FreeNAS community members building systems of up to 8 drives.

WD Red DM-SMR Drives and ZFS

iXsystems and Western Digital have been working to identify and resolve the ZFS compatibility issue with the WD Red DM-SMR drives. With the testing and analysis done so far, we can confirm:
  1. In normal operation, with a light file workload, there is little perceptible performance difference between CMR and DM-SMR drives.
  2. WD Red DM-SMR drives perform better when data is written in chunks larger than 64K. Sequential WRITES are also handled better. Smaller, random WRITES require a Read-Modify-Write sequence within the DM-SMR drive, which is significantly slower. Where resilvering is done with smaller WRITES, it will take much longer with DM-SMR drives. In contrast, CMR drives can perform well with WRITES of 4K or larger.

  3. WD Red DM-SMR drives may exhibit problems if there are large numbers of TRIM commands that are issued by the file system. The TRIM commands are acknowledged and accepted, but the queue may grow too large and overflow. Unlike a CMR drive, the DM-SMR drive does a lot of work to free up space with each TRIM command. Many TRIM commands may be issued when a large ZFS dataset is deleted.
  4. When the TRIM queue overflows, the drive may become unresponsive: it cannot handle normal I/O and returns IDNF (ID Not Found) errors. At this point, I/O to the drive should be stopped by disabling all services (NFS, SMB, iSCSI, etc.), keeping the drive powered, and allowing it to complete its TRIM queue and garbage collection. This process can take many hours or even days, but it is the safest way to ensure that data is not lost.
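To gauge how much of a workload falls on the slow Read-Modify-Write path described in item 2, the hypothetical helper below counts writes smaller than 64K. It assumes you already have a list of per-drive write sizes in bytes from your own tracing; neither the function nor the threshold constant is part of FreeNAS or TrueNAS.

```python
from typing import Sequence

# Illustrative threshold from item 2: DM-SMR drives handle writes of 64K and
# up well, while smaller writes trigger an internal Read-Modify-Write.
SMR_FRIENDLY_BYTES = 64 * 1024


def small_write_fraction(write_sizes: Sequence[int],
                         threshold: int = SMR_FRIENDLY_BYTES) -> float:
    """Return the fraction of writes strictly smaller than the threshold.

    A high fraction suggests the workload (or a resilver) will spend much of
    its time on the slow Read-Modify-Write path of a DM-SMR drive.
    """
    if not write_sizes:
        return 0.0
    small = sum(1 for size in write_sizes if size < threshold)
    return small / len(write_sizes)


print(small_write_fraction([4096, 131072, 8192, 65536]))
```

In the sample, the 4K and 8K writes count as small and the 64K and 128K writes do not, so half of the writes would hit the slow path.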

Mitigating the DM-SMR issues with ZFS

Both iXsystems and Western Digital treat data loss as a serious event. Given these findings, we cannot recommend the use of these WD Red DM-SMR drives in a FreeNAS or TrueNAS system. However, if you do find that you have these drives in an existing system and cannot replace them, there are some ways to potentially mitigate the DM-SMR issues:
  1. Disable TRIM on pools with the DM-SMR drives. Disabling TRIM reduces the risk of the drives entering the unresponsive state where I/O cannot be completed. It will have a negative impact on long-term drive performance but will enable the drives to operate more safely. On FreeNAS 11.3, this is done by adding the “Sysctl” tunable “vfs.zfs.trim.enabled=0” under Tunables. On TrueNAS 12.0, TRIM is disabled by default, but it can be enabled via the pool webUI in TrueNAS 12.0 U1. You can confirm from the CLI that “zpool get autotrim” returns the value “off”.
  2. If possible, use smaller VDEVs. Mirrors are best, and VDEVs with fewer than 4 drives are better. These measures increase I/O sizes and significantly reduce resilver times.
  3. Use a ZFS dataset record size and ZVOL block size large enough to force writes larger than 64K to each drive. If you have a ZFS RAIDZ VDEV with fewer than 5 data drives, use 256K or higher. If your VDEV has more data drives, use 512K or higher.
  4. Within ZFS, there is a parameter called “allocators” which roughly determines the parallelism of WRITES to a drive. Setting the sysctl “vfs.zfs.spa_allocators=1” via Tunables in the webUI reduces the randomness of WRITES, which improves the performance of the SMR drives.
  5. Upgrading to TrueNAS 12.0 with OpenZFS 2.0 may be beneficial. Algorithmic changes switch TRIM from sending immediate (synchronous) TRIM commands to sending background asynchronous TRIM commands, where smaller TRIMs may be skipped. AutoTRIM can be disabled in favor of manual or scheduled TRIM tasks, but these manual TRIM tasks may overwhelm a DM-SMR drive. In addition, TrueNAS 12.0 includes Asynchronous Copy-on-Write (CoW), which reduces the number of smaller WRITES in some access patterns. These changes may improve the performance of DM-SMR drives but have yet to be validated. In the meantime, it is recommended that TRIM be disabled.
It should be noted that the above recommendations have not been tested in a large population of systems to see whether they are sufficient to avoid future issues. They are provided as technical advice to assist in a transition away from these DM-SMR drives. iXsystems recommends that WD Red DM-SMR drives be avoided for use with ZFS wherever possible.
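Mitigations 1 and 3 can be checked with a short script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-drive estimate simply divides the record size by the number of data drives and ignores parity, padding, and compression, and the autotrim check applies to TrueNAS 12.0 (OpenZFS 2.0), whereas FreeNAS 11.3 uses the sysctl from item 1 instead. Keep in mind that a ZVOL's volblocksize can only be chosen when the ZVOL is created.

```python
import subprocess


def recommended_recordsize_kib(data_drives: int) -> int:
    """Minimum record size (KiB) suggested in mitigation item 3."""
    if data_drives < 1:
        raise ValueError("data_drives must be at least 1")
    return 256 if data_drives < 5 else 512


def per_drive_write_kib(recordsize_kib: int, data_drives: int) -> float:
    """Rough chunk each data drive receives for one full record.

    Ignores parity, padding and compression, so treat it as an estimate.
    """
    return recordsize_kib / data_drives


def autotrim_is_off(zpool_get_output: str) -> bool:
    """Parse `zpool get autotrim <pool>` output; True if the value is off."""
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # Data rows look like: NAME PROPERTY VALUE SOURCE (header is skipped)
        if len(fields) >= 3 and fields[1] == "autotrim":
            return fields[2] == "off"
    raise ValueError("no autotrim row found in output")


def read_autotrim(pool: str) -> str:
    """Run the CLI check from item 1 (needs zpool and an existing pool)."""
    result = subprocess.run(["zpool", "get", "autotrim", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout


for drives in (3, 4, 6, 8):
    size = recommended_recordsize_kib(drives)
    print(drives, "data drives:", size, "K record,",
          round(per_drive_write_kib(size, drives), 1), "K per drive")
```

On a real system, `autotrim_is_off(read_autotrim("tank"))` would confirm item 1 for a pool named tank (a placeholder name). The record size table shows why wider VDEVs need the larger setting: at the boundary (4 or 8 data drives) each drive receives about 64K.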

If you have DM-SMR drives…

Any existing systems with these DM-SMR drives should have a backup strategy, as should all systems with valuable data. If anyone experiences the IDNF errors with FreeNAS or TrueNAS CORE, please contact us and we will advise on how best to handle the situation.

Any FreeNAS Mini users with DM-SMR drives will be covered under the standard warranty and drives can be replaced. We are committed to the long-term stable operation of your systems. Please contact iXsystems support if you have any issues or concerns.

If you have any questions about what to use for your next TrueNAS system, please use the community forum or contact iXsystems.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
On TrueNAS 12.0, TRIM is disabled by default, but can be enabled via the pool webUI in TrueNAS 12.0 RELEASE.

Defaulting to TRIM disabled is a pretty major change for users with all-flash setups. Is there a data integrity reason for this being disabled out of the box?

For those who want it back on - where is this toggle in the webUI? I don't see it under the pool configuration even when exposing advanced options.
 

morganL

My mistake. I will correct the post. The UI update is planned for 12.0 U1.


The reason for being disabled is that the TRIM behavior changed significantly from 11.3. Without the ability to test with every type of SSD, we took the conservative approach of disabling it by default and requiring it to be re-enabled explicitly. It can be enabled in 12.0 RELEASE with a zpool command.
 

HoneyBadger

The reason for being disabled is that the TRIM behavior changed significantly from 11.3. Without ability to test with every type of SSD, we took the conservative approach of disabling by default and then re-enable explicitly. It can be enabled in 12.0 RELEASE with a zpool command.

I assume it's related to the use of queued TRIM where advertised by the device - and certain devices advertise it but don't handle it well, from a "treats it as non-queued and stalls I/O" to "loses random data" (Samsung, looking in your general direction on that last one.)

Fair measure keeping it disabled. Let me know if I can help, but my SSD stable is a little monotone right now (almost exclusively Intel DC series).
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,828
... I standardized around Intel SSDs for a reason. :) The only non-Intel SSD is for the L2ARC...
 