Example of SMR ugliness, even on reads

deafen · Explorer · Joined Jan 11, 2014 · Messages: 71
I've been unwinding the mistake I made buying a pile of 6TB WD Blues, and captured an interesting snapshot of just how badly these drives perform, even on read operations.

In the gstat output below, I'm replacing da14 with da1 (a 7200 RPM NL-SAS drive). Some points of note:
  • da10-da14 are SMR drives. da8 and da9 are PMR SATA drives.
  • The only drives that exhibit "delete" operations are the SMR drives. Delete ops are issued by the OS to tell the drive that certain blocks are no longer in use; I don't know why this matters for an HDD, or why only the SMR drives are getting these ops. Might be a good question for the zfs devs? Based on the associated latency, I assume the drive is actually doing writes for these ops. Seems unnecessary. (A sample gstat invocation for watching this is sketched after this list.)
  • Because of these slow delete ops (and destaging writes from cache, see below), read latency is awful, making ops stack up in the queue and causing %busy to stay very high. (They're typically all in the red.) Compare this to the %busy for the PMR drives, which don't have to do any housekeeping.
  • As far as the OS is concerned, write latency is negligible because it's all going to cache. But the extra work required to destage those writes slows down reads, so the real write latency is shifted into the read latency.
  • This is a resilvering operation, so these are all basically large, sequential ops, mostly 64K and 128K. Even so, the unpredictable behavior of the SMR drives keeps them throttled.
  • This is the first disk replacement resilver I've seen where the bottleneck is on the read side. Usually the rate is limited by the disk being written to. You can see that in this case, da1 can easily handle what's being thrown at it, and is spending most of its time starving on an empty queue, while the SMR disks struggle to read data fast enough. The PMR drives aren't even breaking a sweat.
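For anyone who wants to watch this on their own system, a gstat invocation along these lines will show the delete column. This is a sketch of the kind of command behind the screenshot below, not necessarily the exact flags I used:

    # FreeBSD: per-physical-provider stats (-p), refreshed every second,
    # with BIO_DELETE statistics enabled (-d). The d/s and ms/d columns
    # show delete operations per second and their latency.
    gstat -dp -I 1s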

[Attachment: Capture.PNG (gstat screenshot)]
 
Joined May 10, 2017 · Messages: 838
Interesting stuff, thanks for posting, but do you know what is making the small writes to the pool? I would expect a resilver to be read-only except on the drive being resilvered. Maybe this small amount of writes is enough to cause performance issues with SMR; I would expect no issues if it were really only reads.
 

deafen · Explorer · Joined Jan 11, 2014 · Messages: 71
... do you know what is making the small writes to the pool?

Good question. There's an ESXi datastore mounted from this pool, but none of the VMs are powered on. smbstatus shows one client with a mount, but no activity. Whatever it is, it looks like it happens about every 5 seconds (or maybe that's just the zfs txg sync?). Regardless, it's sporadic enough not to represent an ongoing problem, IMO.
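If it is the txg sync, the interval is easy to check; I believe this is the relevant sysctl, and it defaults to 5 seconds:

    # ZFS transaction group sync interval, in seconds. A small write
    # burst roughly every 5s is consistent with txg commits.
    sysctl vfs.zfs.txg.timeout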

[Attachment: Capture.PNG (gstat screenshot)]
 

deafen · Explorer · Joined Jan 11, 2014 · Messages: 71
I'll say this, though: when it doesn't have to deal with delete ops, it can be pretty damned fast. Still read-bound, though, and several times less efficient than PMR reads (compare latency and %busy between da10 (PMR) and da11 (SMR)).

[Attachment: Capture.PNG (gstat screenshot)]
 

HoneyBadger · actually does care · Administrator · Moderator · iXsystems · Joined Feb 6, 2014 · Messages: 5,110
Whatever it is, it looks like it happens about every 5 seconds (or maybe that's just the zfs txg sync?)
That's likely what it is. Just having a ZVOL mounted as a VMFS datastore will cause a small amount of "heartbeat/liveness checking" to hit it. ZFS will copy-on-write the heartbeat LBA across the disk, which hopefully keeps it from reshingling as frequently, but the obfuscation of the drive-managed SMR behavior means it will still offer inconsistent performance overall.
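If you want to confirm where those writes are landing, zpool iostat can break activity down per vdev; "tank" below is a stand-in for the actual pool name:

    # Per-vdev read/write ops and bandwidth every 5 seconds. The VMFS
    # heartbeat should show up as a trickle of small writes each interval.
    # "tank" is a placeholder for the actual pool name.
    zpool iostat -v tank 5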

I do appreciate the amount and detail of the data you're providing on these drives though. Certainly a case of making lemonade.

If you get the chance to free one of them from the array, try hitting it with an ATA_SECURE_ERASE and see if that "clears the deck" as far as reshingling/delete operations go. Essentially, these drives need to support the SMR equivalent of TRIM. ;)
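On FreeBSD the sequence is roughly the following, assuming the drive shows up as a direct-attached ATA device; the device node and password here are placeholders, and this destroys everything on the disk:

    # Set a temporary user password at security level high, then issue
    # the ATA SECURITY ERASE with that password. ada5 and ErasePW are
    # placeholders -- double-check the device node first, since this
    # wipes the drive. Drives behind some SAS HBAs/expanders may not
    # pass ATA security commands through.
    camcontrol security ada5 -U user -l high -s ErasePW
    camcontrol security ada5 -U user -e ErasePW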
 

Arwen · MVP · Joined May 17, 2014 · Messages: 3,600
In my opinion, SMR drives are not suited to some workloads, like zvols, small files, or write-heavy applications.

In my case, I don't care how long a backup to my Seagate 8TB Archive SMR drive takes, as long as it's reliable and ZFS lets me validate the backups via scrubs.
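The validation part is just the usual scrub cycle; "backup" here is whatever the pool is actually named:

    # Start a scrub of the backup pool, then check progress and any
    # checksum errors it turned up. "backup" is a placeholder name.
    zpool scrub backup
    zpool status backup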
 

deafen · Explorer · Joined Jan 11, 2014 · Messages: 71
If you get the chance to free one of them from the array, try hitting it with an ATA_SECURE_ERASE and see if that "clears the deck" as far as reshingling/delete operations go. Essentially, these drives need to support the SMR equivalent of TRIM. ;)

Now that I've gotten them all swapped for PMR drives (see my post about replacing all the drives at once), I plan to do just that. Since I'm planning on selling them to my coworkers, I wanted to try to "reset" them anyway. I'll get to that later this week.
 