Check your S.M.A.R.T. Tests...

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
I may be looking to put some drives up for sale soon, so I used smartctl from the command line to check on their status. I coincidentally noticed that the last tests that ran automatically were nearly a year old - so I checked the configured tasks in the GUI, and no drives were selected.

I presume this happened during one of the previous upgrades. I've taken this system from 9.10 to the current 11.3, and most things have survived intact, but evidently not this...

While at it, I made sure that scrubbing tasks and snapshot settings looked ok still.

Just thought I'd mention it in case someone else has the same (silent) problem. I guess this means I wouldn't have been alerted on SMART errors until a drive got bad enough to actually go offline and degrade the pool.
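For anyone wanting to run the same check from the shell, something like the following lists the self-test history for a drive, so you can see when the last test actually ran. The device name (da0) is illustrative; adjust it to your system:

```shell
# Show the SMART self-test log for one drive; each entry includes the
# power-on hour at which the test ran. /dev/da0 is a placeholder.
smartctl -l selftest /dev/da0 2>/dev/null || echo "could not query /dev/da0"
```

Repeat per drive (or loop over `/dev/da*`) to spot drives with no recent entries.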

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Worth noting is that a replaced disk (even if da5 is replaced with da5 in terms of how the new disk is named) will not automatically be added (back) to the SMART tests which had the drive originally selected.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Worth noting is that a replaced disk (even if da5 is replaced with da5 in terms of how the new disk is named) will not automatically be added (back) to the SMART tests which had the drive originally selected.
Indeed, because the drives are identified in the config database by serial number, not just by device ID. But I wonder if this is still the case in 11.3, as I see that there's an option for "all" when you're scheduling SMART tests. What I don't know is whether that's stored internally as "all" (in which case this shouldn't continue to be a problem), or whether it just converts to a list of disks (in which case it will).

And OP, this is why running a reporting script (like this one: https://github.com/edgarsuit/FreeNAS-Report) on a regular basis can be helpful--you'll spot these issues much earlier.

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
And OP, this is why running a reporting script (like this one: https://github.com/edgarsuit/FreeNAS-Report) on a regular basis can be helpful--you'll spot these issues much earlier.
That looks sweet. FreeNAS should incorporate something like that into the product itself.

The script seems to get a bit confused about “Seek error health”, at least on my drives (WD Red and Seagate NAS). Clearly, how those values are reported, and therefore what thresholds apply, varies by manufacturer, but the script seems to take them literally and compare them against its own hardcoded thresholds (for green, warning or error)..? In any case it's very helpful, and if I can find the time I might poke around with it myself to see if I can improve it further.

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
A quick google later... Here’s how to read Seagate’s Seek Error Rate:

Short version: the raw value is a 48-bit number containing both the total number of seek errors (the first 16 bits) and the total number of seeks (the remaining 32 bits). So the components can be split out to get an exact percentage of seek errors. That is what the normalized value represents, as follows:

90 — <= 1 error per 1000 million seeks
80 — <= 1 error per 100 million
70 — <= 1 error per 10 million
60 — <= 1 error per million
50 — 10 errors per million
40 — 100 errors per million
30 — 1000 errors per million
20 — 10 errors per thousand

I think Seagate has set the “error” threshold at anything below 30.

Western Digital no doubt has different definitions of the raw and normalized values and thresholds.
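If I've read the layout above correctly, splitting the raw value is just a couple of bit shifts. Here's a minimal sketch in Python; the example raw value is hypothetical, and the log-based mapping to the normalized value is my own guess at the pattern in the table above, not anything Seagate documents:

```python
import math

def split_seek_error_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate Seek Error Rate raw value into
    (seek_errors, total_seeks): errors in the upper 16 bits,
    total seeks in the lower 32 bits."""
    seek_errors = (raw >> 32) & 0xFFFF
    total_seeks = raw & 0xFFFF_FFFF
    return seek_errors, total_seeks

def normalized_guess(seek_errors: int, total_seeks: int) -> int:
    """Assumption: the table above looks like -10 * log10(error rate),
    e.g. 1 error per 10^9 seeks -> 90, 10 errors per 10^6 -> 50."""
    if seek_errors == 0 or total_seeks == 0:
        return 100  # no errors recorded yet (assumed cap)
    return round(-10 * math.log10(seek_errors / total_seeks))

# Hypothetical raw value: 2 seek errors over 200 million seeks
raw = (2 << 32) | 200_000_000
errors, seeks = split_seek_error_raw(raw)
print(errors, seeks, normalized_guess(errors, seeks))  # 2 200000000 80
```

So a raw value that looks alarmingly huge in decimal can correspond to a perfectly healthy error rate once the two fields are separated.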

guermantes

Patron
Joined
Sep 27, 2017
Messages
213