Weirdly Slow `smartctl` Output -- Dodgy Drives?

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
This behavior has been annoying me ever since I built my new box a while ago; namely, getting HDD SMART status via smartctl is slow. I mean, really slow -- between one and two seconds for a full -a or -x report. The output pauses as the various sections are printed out. Moreover, while this is going on, access to the SATA bus is effectively stopped. Almost no I/O transactions are happening while something holds the SATA bus getting the SMART data out. So when smartd does its periodic check, volume access is pretty much stalled until it gets through all six spindles.

Previously, I thought it was just something peculiar about the motherboard (and no, I couldn't tell you how I came to think that). But then the other night I did a smartctl -x by hand on the boot drive, which is a small SSD from a different manufacturer, and the complete report blasted out instantly. So now I'm starting to wonder if there's something amiss with the HDDs.

All SMART reports are clean, and the pools are fine. Indeed, it just successfully completed a resilver when I staged a failed drive test and swapped one of them out with a cold spare.

Anyone ever seen this sort of thing before?
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
Yes, I am seeing this with "HDS722020ALA330 RSD HUA" disks in icy-box enclosures.
It is also pretty slow somewhere in the middle of the boot sequence (FreeNAS startup, not the BIOS), where it seems to walk slowly over all the disks a couple of times.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
Anyone ever seen this sort of thing before?
No. That is very strange behavior. That model, the Hitachi 5K4000, is very old, from around 2011, so it is possible that the drives are just performing poorly compared to modern drives without actually being defective. The other drive you list, the Hitachi 5K3000, is of a similar vintage.
Modern drives should spit that data out as fast as the console can scroll, which is almost instantly.
Have you run an array test to see what kind of performance you are getting from the pool? Something like jgreco's array tester might be informative:

solnet-array-test (for drive / array speed) non destructive test
https://forums.freenas.org/index.php?resources/solnet-array-test.1/
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
This behavior has been annoying me ever since I built my new box a while ago; namely, getting HDD SMART status via smartctl is slow. I mean, really slow -- between one and two seconds for a full -a or -x report.

I'm used to around 5 seconds for a SMART report.

Really slow would be 20-300 seconds and would be indicative of drive failure in progress.
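To put those rough figures in one place, here is a minimal Python sketch that buckets a measured report time. The function name and the labels are made up for illustration, and the 5-to-20-second band is an assumption, since the figures above don't say what to make of times in between.

```python
def classify_smart_report_time(seconds: float) -> str:
    """Bucket a smartctl wall-clock time using the rough figures quoted above.

    The labels and the exact cut-offs are illustrative assumptions,
    not vendor guidance.
    """
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    if seconds >= 20.0:
        # 20-300 seconds: described above as indicative of a failure in progress.
        return "suspect"
    if seconds > 5.0:
        # Assumption: the post does not characterise the 5-20 second gap.
        return "slow"
    # Up to around 5 seconds is what the post calls normal.
    return "typical"


for t in (0.2, 1.5, 5.0, 45.0):
    print(f"{t:>5.1f} s -> {classify_smart_report_time(t)}")
```

By this yardstick, the one-to-two-second reports from the original post land in the "typical" bucket.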

Pausing between sections is a result of the stats needing to be loaded off the hard disk's system tracks (usually tracks negative-1 on down; system areas typically take up between 1MB and 1GB of disk space). Older drives didn't have much RAM and may not have most of the needed active SMART stats areas cached and readily available. Also, there's no good reason to cache details such as previous SMART result status logs either, so even new drives are quite likely doing disk reads to get them.

Different manufacturers have different strategies for accessing the service area. I wouldn't have a hard time believing that the controller might be accessing this data off disk at a very low priority, similar to the disk reads done by SMART short/long tests in the background, as it might even be using the same code. I would expect that to be a more important consideration in older drives, which had less space for firmware, and a larger incentive to share/multipurpose code.

All of this aims in the general direction of "that's not shocking to me, at all."

It does look like recent drives have managed to trim SMART reporting times down to less than half a second. I guess that's nice. I'm seeing about 0.2 seconds on a bunch of ST6000DX000 and around 0.3 on WD80EMAZ. An older ST4000DM000 is around 0.5. There's a counterexample, though: a mid-2000s vintage Barracuda 7200.7 takes 0.14 seconds. But I have a whole pile of Hitachi/HGST drives showing up in the 4.5-5 second range too.
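For anyone who wants to compare their own drives, numbers like these could be collected with something along the lines of the sketch below. It is not necessarily how the figures above were measured. The helper names are hypothetical, the device paths are whatever you pass on the command line (e.g. /dev/ada0 on FreeBSD, which is an assumption about your system), and it assumes smartctl is on the PATH and that you run it with enough privileges.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout=600.0):
    """Run cmd, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=False: smartctl's exit status is a bitmask, so a nonzero
    # value does not by itself mean the command failed to run.
    subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=False,
    )
    return time.perf_counter() - start


def time_smart_reports(devices, smartctl="smartctl"):
    """Time a full `smartctl -x` report for each device path given."""
    return {dev: time_command([smartctl, "-x", dev]) for dev in devices}


if __name__ == "__main__":
    # Device paths come from the command line; none are assumed here.
    for dev, secs in time_smart_reports(sys.argv[1:]).items():
        print(f"{dev}: {secs:.2f} s")
```

Saved under a name of your choosing, it would be run as, say, `python3 smart_timing.py /dev/ada0 /dev/ada1`. Output is thrown away so that console scrolling doesn't pad the timings.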
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
It does look like recent drives have managed to trim SMART reporting times down to less than half a second. I guess that's nice. I'm seeing about 0.2 seconds on a bunch of ST6000DX000 and around 0.3 on WD80EMAZ. An older ST4000DM000 is around 0.5. There's a counterexample, though: a mid-2000s vintage Barracuda 7200.7 takes 0.14 seconds. But I have a whole pile of Hitachi/HGST drives showing up in the 4.5-5 second range too.
I think it is interesting that the Hitachi/HGST drives are slower than even the old Seagate. Either their firmware gives SMART queries a lower priority, or there is something in their mechanism that makes the data more time-consuming to access.
 