
Checking for TLER, ERC, etc. support on a drive

One of the problems with consumer-grade hard drives is that most of them will appear to hang when they run into a read error, while they internally retry the operation, possibly for a minute or more. For a desktop PC, where redundancy does not exist, this is the correct course of action, because an unreadable sector means loss of the data.

Enterprise class drives typically support the ability to limit the amount of time a drive wastes trying to recover data. Most of these drives are used in RAID arrays, and so in the event of a failure, the data can be recovered from parity. A drive encountering read errors cannot be allowed to hang for large amounts of time, because this stalls whatever the server is trying to do. So manufacturers include features to control the retries of failures.

For Western Digital, this is called TLER - Time-Limited Error Recovery.

For Seagate, it is called ERC - Error Recovery Control.

Samsung and Hitachi call it CCTL - Command Completion Time Limit.

Some people are confused and think that these features are only necessary for hardware RAID, or aren't useful for software RAID. It is absolutely true that this is a very important feature for hardware RAID, because a hardware RAID controller is probably configured to deem a "hung" hard drive as failed and to place it in an offline or recovery status, which has many negatives associated with it. So you absolutely do want TLER/ERC/etc for a hardware RAID setup.

But what about ZFS?

If you've got a ZFS pool, and your underlying disk device appears to hang for a minute, you probably stop serving up data. This is likely to be bad behaviour for a filer. Unlike a hardware RAID controller, ZFS will typically wait for the command to complete, and if it is trying to read many sectors, this could take a very long time. So TLER/ERC/etc are also desirable properties for a ZFS system.

We've been thrilled in recent years to see the addition of "NAS" class hard drives, which are essentially conventional consumer-grade hard drives that have firmware that defaults to supporting TLER/ERC.

You can verify that a drive has TLER/ERC turned on by probing it with smartctl.

Code:
# smartctl -l scterc /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p8 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled


That doesn't have it.

Code:
# smartctl -l scterc /dev/ada4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p8 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)


That does, and it's set to a typical 7 seconds (the raw value is in units of 100 ms, so 70 means 7.0 seconds). Further, the same command can be used to try to set ERC.

Code:
# smartctl -l scterc,80,80 /dev/ada4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p8 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control set to:
           Read:     80 (8.0 seconds)
          Write:     80 (8.0 seconds)


Some hard drives may not come with TLER/ERC enabled by default but can have it turned on regardless. If you try this, power cycle the drive afterwards and query it again to confirm that the setting persists. It's hard to test whether TLER/ERC is actually working without encountering a bad drive, however.
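
If you want to check a number of drives without eyeballing each report, something like the following might help. This is only a rough sketch: the function name scterc_state is made up for illustration, and it assumes the report looks like the smartctl 6.3 output shown above. Anything it doesn't recognize, including whatever a drive without the feature prints, is reported as "unknown".

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by `smartctl -l scterc <device>`.
# Reads the report on stdin and prints one of: enabled, disabled, unknown.
scterc_state() {
    awk '
        # A "Disabled" on either the Read or the Write line counts as disabled.
        /^ *(Read|Write): +Disabled/  { disabled = 1 }
        # A numeric value followed by "(N seconds)" counts as enabled.
        /^ *(Read|Write): +[0-9]+ \(/ { enabled = 1 }
        END {
            if (disabled)     print "disabled"
            else if (enabled) print "enabled"
            else              print "unknown"
        }'
}

# Example (the device name is a placeholder):
#   smartctl -l scterc /dev/ada4 | scterc_state
```

Run it once right after setting the value and again after a power cycle. If the second run says "disabled", the drive did not keep the setting.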

[2015-02-10] : I note that we just picked up some Samsung ST2000LM003 2.5" 2TB drives which appear to allow TLER to be set, but the setting appears to do nothing and isn't persistent. I happened to luck out in that a drive failed SMART testing with a bad sector and was therefore easily tested.
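
For drives where the setting does take effect but doesn't survive a power cycle, one possible workaround is to re-issue the command every time the system boots. The sketch below is an assumption on my part rather than something tested here: apply_scterc is a made-up name, the device names are placeholders, and it simply treats a non-zero exit status from smartctl as failure. It only requests the timeout, so you would still want to read the value back afterwards, and it won't help with a drive that accepts the setting and then ignores it.

```shell
#!/bin/sh
# Hypothetical boot-time helper: request the same SCT ERC timeout on several drives.
# SMARTCTL can be overridden; by default the smartctl found in PATH is used.
SMARTCTL=${SMARTCTL:-smartctl}

# apply_scterc VALUE DEVICE...
# VALUE is in units of 100 ms, so 70 means 7.0 seconds.
apply_scterc() {
    value=$1
    shift
    rc=0
    for dev in "$@"; do
        if "$SMARTCTL" -l "scterc,${value},${value}" "$dev" >/dev/null; then
            echo "$dev: requested ${value} (x100 ms) for read and write"
        else
            echo "$dev: smartctl reported an error" >&2
            rc=1
        fi
    done
    # Non-zero if any drive reported an error.
    return $rc
}

# Example (device names are placeholders):
#   apply_scterc 70 /dev/ada0 /dev/ada1
```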

I'll be pruning responses to this thread, but if you have useful information to share, I may update this post and credit you.
Author
jgreco