Checking for TLER, ERC, etc. support on a drive

NASbox · Oct 31, 2021

@jgreco very nice writeup... Congrats.

Based on your write up, IIUC the same model of external hard drive might contain different models internally - did I get that correctly?

So if I was to buy 3 (as an example) WDBBGB0040HBK-NESN which is an 8TB drive, they could contain 3 different drives???

I'm likely going to have to get mine from CostCo. Good news is if I can interrogate the drive model over the USB3, and I don't like it, I can just take it back.

Also can I plug these into TrueNAS and run badblocks?

I think the WDBBGB0040HBK-NESN is what CostCo is currently selling, so if anyone can share any info on the model it would be much appreciated.

jgreco · Oct 31, 2021

NASbox said:
Based on your write up, IIUC the same model of external hard drive might contain different models internally - did I get that correctly?

So if I was to buy 3 (as an example) WDBBGB0040HBK-NESN which is an 8TB drive, they could contain 3 different drives???

I'll charitably say that you're likely only limited to two different kinds of drives. The buy I did two years ago on Black Friday ended up with this assortment in one machine:

<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 9 lun 0 (pass2,da1)
<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 10 lun 0 (pass3,da2)
<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 11 lun 0 (pass4,da3)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 12 lun 0 (pass5,da4)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 13 lun 0 (pass6,da5)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 14 lun 0 (pass7,da6)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 15 lun 0 (pass8,da7)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 16 lun 0 (pass9,da8)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 0 lun 0 (pass10,da9)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 1 lun 0 (pass11,da10)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 2 lun 0 (pass12,da11)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 3 lun 0 (pass13,da12)
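For anyone who wants to tally a mixed batch like this rather than eyeball it, here is a minimal Python sketch. It assumes the listing is pasted in as text in the format shown above; the function name `tally_models` is just an illustration, not part of any tool.

```python
import re
from collections import Counter

# The listing from the post above, pasted verbatim.
DEVLIST = """\
<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 9 lun 0 (pass2,da1)
<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 10 lun 0 (pass3,da2)
<ATA WDC WD120EMFZ-11 0A81> at scbus3 target 11 lun 0 (pass4,da3)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 12 lun 0 (pass5,da4)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 13 lun 0 (pass6,da5)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 14 lun 0 (pass7,da6)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 15 lun 0 (pass8,da7)
<ATA WDC WD120EMAZ-11 0A81> at scbus3 target 16 lun 0 (pass9,da8)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 0 lun 0 (pass10,da9)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 1 lun 0 (pass11,da10)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 2 lun 0 (pass12,da11)
<ATA WDC WD120EMFZ-11 0A81> at scbus4 target 3 lun 0 (pass13,da12)
"""


def tally_models(devlist: str) -> Counter:
    """Count how many devices report each model string."""
    counts = Counter()
    for line in devlist.splitlines():
        match = re.match(r"<(.+?)>\s+at\s+scbus\d+", line.strip())
        if not match:
            continue  # skip anything that is not a device line
        fields = match.group(1).split()
        # Assumption: the last field is the firmware revision; drop it.
        counts[" ".join(fields[:-1])] += 1
    return counts


if __name__ == "__main__":
    for model, count in tally_models(DEVLIST).most_common():
        print(f"{count:2d} x {model}")
```

On the listing above this reports seven WD120EMFZ and five WD120EMAZ drives.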

The Reddit datahoarder guys have methods figured out to take a guess based on serial numbers that you can find on the box. I don't expect that to be super-reliable, but maybe better than nothing. I haven't checked your proposed part number but make certain that it isn't an SMR drive -- most 8TB's are now SMR. There's a guide in the Resources section that covers this.

winnielinnie · Oct 31, 2021

NASbox said:
Good news is if I can interrogate the drive model over the USB3, and I don't like it, I can just take it back.

With the external drives I shucked (8TB WDs, MyBook and Elements), smartctl over a USB connection correctly identified the internal drive (the "real" model, so to speak) every time, and even revealed the helium attribute (22) for He-filled drives.
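A rough Python sketch of that check, for anyone who wants to script it before deciding whether to return a drive. Assumptions: `smartctl -i -A` is run against the USB device (some USB bridges may need `-d sat`), the output uses the usual "Device Model:" line and attribute table layout, and the sample text below is illustrative rather than captured from a real drive. The helper names are made up for this example.

```python
import re
import subprocess
from typing import Optional

# Illustrative sample only; real smartctl output has many more lines.
SAMPLE = """\
Device Model:     WDC WD80EMAZ-00WJTA0
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       42
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
"""


def smartctl_output(device: str, device_type: Optional[str] = None) -> str:
    """Run smartctl -i -A on a device and return its text output."""
    cmd = ["smartctl", "-i", "-A"]
    if device_type:
        cmd += ["-d", device_type]  # e.g. "sat" for some USB bridges
    cmd.append(device)
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def internal_model(text: str) -> Optional[str]:
    """Return the model string the drive itself reports, if present."""
    match = re.search(r"^Device Model:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def helium_attribute(text: str) -> Optional[str]:
    """Return the name of SMART attribute 22 if the drive lists it."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "22":
            return fields[1]
    return None


if __name__ == "__main__":
    print(internal_model(SAMPLE), helium_attribute(SAMPLE))
```

In real use you would pass `smartctl_output("/dev/da1")` (device name is a placeholder) instead of `SAMPLE`. A missing attribute 22 only suggests the drive is not He-filled; it is not proof of anything about CMR vs. SMR.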

winnielinnie · Oct 31, 2021

jgreco said:
I haven't checked your proposed part number but make certain that it isn't an SMR drive -- most 8TB's are now SMR.

How is that possible?

No WD Blue nor Red exceed 6TB, and all of their Red Plus and Red Pro are CMR (regardless of capacity.)

From which excess stock would they shove into their plastic USB enclosures for their 8TB+ external drives that would originally have been SMR internal drives?

jgreco · Oct 31, 2021

winnielinnie said:
How is that possible?

No WD Blue nor Red exceed 6TB, and all of their Red Plus and Red Pro are CMR (regardless of capacity.)

From which excess stock would they shove into their plastic USB enclosures for their 8TB+ external drives that would originally have been SMR internal drives?

Well, I guess I don't really care, it's better to check the part and verify it. I know Seagate's 8TB ST8000DM004 is SMR along with some others. I also know that WD Red absolutely comes in capacities greater than 6TB; an easy example is

so even if they've decided to relabel NEW drives with "Plus" and "Pro" designations, that doesn't change existing stock. And there are also 10TB SMR drives out there, though you are not likely to have one.

I do not think it is a bad thing to advise people to check, both to detect issues with their current inventory and to educate for future purchases.

winnielinnie · Oct 31, 2021

jgreco said:
I also know that WD Red absolutely comes in capacities greater than 6TB; an easy example is

That was before "SMR gate".

Like you mentioned, they've since re-labeled such drives as "Red Plus" now. So even back then, an 8TB+ purchase of a WD drive is practically a given for CMR.

But I agree, might as well develop a habit of always checking and verifying. Who knows what tricks they might pull in the future.

I'm 4/4 with shucking WD 8TB externals and getting all CMR (two are He-filled and run cooler than their counterparts).

Alex_K · Feb 15, 2022

Wouldn't we want in RAID that timeout to be zero, as in error encountered - mark it unreadable and go on, let ZFS read missing data from other device / recompile from sums and write somewhere else? Why wait 8 seconds?

jgreco · Feb 15, 2022

Alex_K said:
Wouldn't we want in RAID that timeout to be zero,

That's not how the real world works though. It seems like a valid question until you think about what's going on a bit.

First off, there are two high level "timeout" things that are going on here. I want to be clear that these are interrelated in some ways but also independent.

The first is a transaction timeout (going by various names) when a RAID controller issues a disk read request. It has a queue for these, and (not going into gory detail) it basically has an idea of what time it issued a request, so that it can time out a request that is taking "too long". This lets the controller decide to fail over to the other drive, or parity, or whatever. That's kind of a high level function, and the controller is managing this for multiple queues to multiple drives over multiple channels.

The other is the drive itself. The controller on the drive has its own queue of transactions to run, see "NCQ", and may have a whole bunch of these stacked up at any given time, usually up to 31 transactions. But the thing is, if you enter the queue as the 31st transaction, and you can manage a maximum of 100 IOPS, even under ideal circumstances you're going to be waiting about a third of a second (31 / 100 = 0.31 s) for that answer to pop out. This is completely normal.
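The arithmetic behind that wait, as a tiny Python sketch. It assumes the simplest possible model: requests are serviced strictly in order at a fixed rate, which real NCQ reordering does not guarantee. The function name is hypothetical.

```python
def queue_wait_seconds(queue_position: int, iops: float) -> float:
    """Time until the request at queue_position completes, assuming
    in-order service at a constant iops (a deliberate simplification)."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return queue_position / iops


if __name__ == "__main__":
    # 31st request in the queue on a drive managing 100 IOPS.
    print(f"{queue_wait_seconds(31, 100):.2f} s")
```

With the figures from the post (31st in the queue, 100 IOPS) this gives 0.31 s, before any retries or seek problems are involved.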

But the real problem is when something does NOT go right. The drive cannot find the track, or the sector reads badly, and retries come into play. Not only does this screw with the transaction being processed, but it also screws with ones behind it in the queue. Plus, often, if one sector is bad, the ones near it may be bad too.

So we never want the timeout to be zero, because that would really mean we might never read anything from any drive, because physics means there's some latency in any read operation.

Where redundancy is available, yes, you can rapidly fail over to another disk or parity, but the individual devices have to have some guidance as to that abandonment of effort being desirable. That's what TLER/ERC is all about.

In practice, unless you are in "mission critical, space shuttle launch, data must flow" territory, an I/O retry of several seconds is the general consensus for acceptability. Which particular single digit number is acceptable depends on the vendor, but 8 is a common default. The flip side is that if there's a meta-issue, like a power brownout or vibration, you don't want to be too anxious to reject all your results.
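To see what a given drive is actually set to, `smartctl -l scterc /dev/da1` (device name is a placeholder) reports the SCT Error Recovery Control timers. Here is a minimal Python sketch that parses that report. Assumptions: the output layout matches the illustrative samples below, values are in units of 100 ms, and drives without ERC print a "not supported" message instead of the Read/Write lines. The function name `parse_scterc` is made up for this example.

```python
import re
from typing import Dict, Optional

# Illustrative samples, not captured from real drives.
SAMPLE_ENABLED = """\
SCT Error Recovery Control:
           Read:     80 (8.0 seconds)
          Write:     80 (8.0 seconds)
"""

SAMPLE_DISABLED = """\
SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
"""

SAMPLE_UNSUPPORTED = "SCT Error Recovery Control command not supported\n"


def parse_scterc(text: str) -> Optional[Dict[str, Optional[float]]]:
    """Return read/write ERC timeouts in seconds.

    A value of None means the timer is disabled; a return of None means
    no Read/Write lines were found (ERC probably unsupported).
    """
    result: Dict[str, Optional[float]] = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Read|Write):\s+(Disabled|(\d+))", line)
        if match:
            key = match.group(1).lower()
            if match.group(2) == "Disabled":
                result[key] = None
            else:
                result[key] = int(match.group(3)) / 10.0  # 100 ms units
    return result or None


if __name__ == "__main__":
    print(parse_scterc(SAMPLE_ENABLED))
```

If the timers come back disabled, `smartctl -l scterc,80,80 /dev/da1` asks the drive for 8.0 seconds on both reads and writes; whether the drive accepts and keeps that setting depends on the drive.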
