Check your S.M.A.R.T. Tests...

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
I may be looking to put some drives up for sale soon, so I used smartctl from the command line to check on their status. I coincidentally noticed that the last tests that ran automatically were nearly a year old - so I checked the configured tasks in the GUI, and no drives were selected.

I presume this happened during one of the previous upgrades. I've taken this system from 9.10 to the current 11.3, and most things have survived intact, but evidently not this...

While at it, I made sure that scrubbing tasks and snapshot settings looked ok still.

Just thought I'd mention it in case someone else has the same (silent) problem. I guess this means I wouldn't have been alerted on SMART errors until a drive got bad enough to actually go offline and degrade the pool.
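For anyone wanting to run the same check from the shell, something like the following lists the self-test history for a drive, so you can see when the last test actually ran. The device name (da0) is illustrative; adjust it to your system:

```shell
# Show the SMART self-test log for one drive; each entry includes the
# power-on hour at which the test ran. /dev/da0 is a placeholder.
smartctl -l selftest /dev/da0 2>/dev/null || echo "could not query /dev/da0"
```

Repeat per drive (or loop over `/dev/da*`) to spot drives with no recent entries.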

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Worth noting is that a replaced disk (even if da5 is replaced with da5 in terms of how the new disk is named) will not automatically be added (back) to the SMART tests which had the drive originally selected.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Worth noting is that a replaced disk (even if da5 is replaced with da5 in terms of how the new disk is named) will not automatically be added (back) to the SMART tests which had the drive originally selected.
Indeed, because the drives are identified in the config database by serial number, not just by device ID. But I wonder if this is still the case in 11.3, as I see that there's an option for "all" when you're scheduling SMART tests. What I don't know is whether that's stored internally as "all" (in which case this shouldn't continue to be a problem), or whether it just converts to a list of disks (in which case it will).

And OP, this is why running a reporting script (like this one: https://github.com/edgarsuit/FreeNAS-Report) on a regular basis can be helpful--you'll spot these issues much earlier.

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
And OP, this is why running a reporting script (like this one: https://github.com/edgarsuit/FreeNAS-Report) on a regular basis can be helpful--you'll spot these issues much earlier.
That looks sweet. FreeNAS should incorporate something like that into the product itself.

The script seems to get a bit confused about “Seek error health”, at least on my drives (WD Red and Seagate NAS). Clearly, how those values are reported, and therefore what thresholds apply, varies by manufacturer, but the script seems to take them literally and compare them against its own hardcoded thresholds (for green, warning or error)..? In any case it's very helpful, and if I can find the time I might poke around with it myself to see if I can improve it further.

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
A quick google later... Here’s how to read Seagate’s Seek Error Rate:

Short version: the raw value is a 48-bit number containing both the total number of seek errors (the first 16 bits) and the total number of seeks (the remaining 32 bits). So the components can be split out to get an exact percentage of seek errors. That is what the normalized value represents, as follows:

90 — <= 1 error per 1000 million seeks
80 — <= 1 error per 100 million
70 — <= 1 error per 10 million
60 — <= 1 error per million
50 — 10 errors per million
40 — 100 errors per million
30 — 1000 errors per million
20 — 10 errors per thousand

I think Seagate has set the “error” threshold at anything below 30.

Western Digital no doubt has different definitions of the raw and normalized values and thresholds.
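If I've read the layout above correctly, splitting the raw value is just a couple of bit shifts. Here's a minimal sketch in Python; the example raw value is hypothetical, and the log-based mapping to the normalized value is my own guess at the pattern in the table above, not anything Seagate documents:

```python
import math

def split_seek_error_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate Seek Error Rate raw value into
    (seek_errors, total_seeks): errors in the upper 16 bits,
    total seeks in the lower 32 bits."""
    seek_errors = (raw >> 32) & 0xFFFF
    total_seeks = raw & 0xFFFF_FFFF
    return seek_errors, total_seeks

def normalized_guess(seek_errors: int, total_seeks: int) -> int:
    """Assumption: the table above looks like -10 * log10(error rate),
    e.g. 1 error per 10^9 seeks -> 90, 10 errors per 10^6 -> 50."""
    if seek_errors == 0 or total_seeks == 0:
        return 100  # no errors recorded yet (assumed cap)
    return round(-10 * math.log10(seek_errors / total_seeks))

# Hypothetical raw value: 2 seek errors over 200 million seeks
raw = (2 << 32) | 200_000_000
errors, seeks = split_seek_error_raw(raw)
print(errors, seeks, normalized_guess(errors, seeks))  # 2 200000000 80
```

So a raw value that looks alarmingly huge in decimal can correspond to a perfectly healthy error rate once the two fields are separated.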

guermantes

Patron
Joined
Sep 27, 2017
Messages
213