Several Questions About SMART Tests (in TrueNAS v13)

Macaroni323

Explorer
Joined
Oct 8, 2015
Messages
60
How can you read the SMART test status to see if a test is active currently on a drive on the webGUI?
... On the shell (command line)?
... What do you look for?

Is the SMART "Offline" test considered the "BadBlock" test (mentioned in some other posts) that aggressively writes and reads to/from each sector of a drive?
... Is the "Offline" test destructive of data?
... Is this test a good alternative to the "extended" SMART test in the drive vendor's maintenance application (Western Digital Dashboard)?

Which SMART tests (if any) actually move data to test a sector, and then replace it?
... Is the data replaced in that same sector, or moved to another sector?

Are drives functional in the pool during their LONG and SHORT or CONVEYANCE tests (I assume yes since they're not "offline")?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
How can you read the SMART test status to see if a test is active currently on a drive on the webGUI?
... On the shell (command line)?
... What do you look for?
Use smartctl -a /dev/ada0 and look near the start of the SMART data section for "Self-test execution status". In my example, note that the test has 90% remaining:
Code:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (    0) seconds.
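
For anyone who wants to check this from a script rather than by eye, here is a minimal sketch that pulls the self-test status out of saved `smartctl -a` text. The function name `selftest_status` and the sample string are made up for illustration; the sample simply reuses the lines from the output above, and the parsing assumes the layout shown there (status line followed by indented continuation lines).

```python
import re

# Sample text copied from the smartctl output shown above.
SAMPLE = """\
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (    0) seconds.
"""

def selftest_status(smartctl_output):
    """Return (code, message, percent_remaining) or None if no status line is found.

    percent_remaining is None when the output does not report a running test.
    """
    lines = smartctl_output.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"Self-test execution status:\s*\(\s*(\d+)\)\s*(.*)", line)
        if not m:
            continue
        code, message = int(m.group(1)), m.group(2).strip()
        percent = None
        # The status text continues on the indented lines that follow.
        for cont in lines[i + 1:]:
            if not cont.startswith((" ", "\t")):
                break
            pm = re.search(r"(\d+)% of test remaining", cont)
            if pm:
                percent = int(pm.group(1))
        return code, message, percent
    return None

if __name__ == "__main__":
    print(selftest_status(SAMPLE))
```

On a live system the text could come from something like `subprocess.run(["smartctl", "-a", "/dev/ada0"], capture_output=True, text=True).stdout`, assuming the script runs with enough privileges to query the drive.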


Is the SMART "Offline" test considered the "BadBlock" test (mentioned in some other posts) that aggressively writes and reads to/from each sector of a drive?
... Is the "Offline" test destructive of data?
... Is this test a good alternative to the "extended" SMART test in the drive vendor's maintenance application (Western Digital Dashboard)?
The SMART Offline test is not BadBlocks. The SMART Short/Extended/Offline tests are all read-only tests and perform the absolute minimum testing to validate the drive is operational. I have not looked at a WD application in over a decade so I have no idea if there is any value to it, but I wouldn't hold my breath. If it were valuable then I think people would be talking about it here.

Which SMART tests (if any) actually move data to test a sector, and then replace it?
... Is the data replaced in that same sector, or moved to another sector?
None. However, during a read operation, if the drive has difficulty reading the data but can still read it and write/read it again, you will see the Pending Sectors count increase (exactly what each manufacturer does varies, but this gives you a fair idea of what is happening). If this keeps happening, the drive electronics will eventually decide the sector is a problem, relocate the data, and map the failed sector out. I've never seen or heard of this happening during a SMART self-test.
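
To keep an eye on the counters mentioned here, a rough sketch like the following can read them from the attribute table that `smartctl -A` prints. It assumes the usual ten-column ATA attribute layout and that the drive reports reallocated sectors as attribute 5 and pending sectors as attribute 197, which is common but vendor-dependent. The function name and the sample values are hypothetical.

```python
# Hypothetical sample in the usual `smartctl -A` table layout; the values are invented.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
"""

def sector_counters(attr_output):
    """Return the raw reallocated and pending sector counts found in the table."""
    wanted = {"5": "reallocated", "197": "pending"}  # assumed attribute IDs
    counters = {}
    for line in attr_output.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
        if len(fields) >= 10 and fields[0] in wanted and fields[9].isdigit():
            counters[wanted[fields[0]]] = int(fields[9])
    return counters

if __name__ == "__main__":
    print(sector_counters(SAMPLE_ATTRS))
```

A pending count that keeps climbing between runs is the pattern described above; a single snapshot says less than the trend.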

Are drives functional in the pool during their LONG and SHORT or CONVEYANCE tests (I assume yes since they're not "offline")?
Yes, and if you request or write data, that has priority over the SMART self-test. This means that if a SMART Long test normally takes 12 hours to complete with no data requests, the same test on a busy server could take a few hours longer. This is fully intentional.
 
Joined
Oct 22, 2019
Messages
3,641
How can you read the SMART test status to see if a test is active currently on a drive on the webGUI?
The TrueNAS GUI tries to approximate the test progress. If you click on the "Tasks" clipboard icon, you'll see the progress of a currently running SMART self-test. Keep in mind that this is not a true progress meter, since it can only guesstimate when the test will complete. (It may even occasionally poll the SMART data to update in 10% increments.)
 

Macaroni323

Thanks Joe and Winnie...

After issuing "smartctl -a /dev/ada1", it looks like these lines are previous SMART run times (4TB WD Red):

Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 463) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
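
As smartctl labels them, these figures are the drive's recommended polling times, that is, its own estimate of how long each test takes, rather than a record of earlier runs. A small sketch for turning them into hours and minutes follows. The function names are made up for illustration, and the sample reuses the lines quoted above.

```python
import re

# Sample text copied from the lines quoted above.
SAMPLE_POLLING = """\
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 463) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
"""

def polling_minutes(smartctl_output):
    """Map each test type to the drive's recommended polling time in minutes."""
    pattern = re.compile(
        r"(Short|Extended|Conveyance) self-test routine\s+"
        r"recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
    )
    return {name.lower(): int(mins) for name, mins in pattern.findall(smartctl_output)}

def as_hours_minutes(minutes):
    """Format a minute count as 'Xh YYm'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"

if __name__ == "__main__":
    for test, mins in polling_minutes(SAMPLE_POLLING).items():
        print(test, as_hours_minutes(mins))
```

For the 4TB WD Red above, the extended test estimate of 463 minutes works out to 7h 43m on an otherwise idle drive.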
 

Macaroni323

It is "odd" that the smartctl -a /dev/adaX shows "Extended Offline" and "Short Offline". I was assuming that the drives are taken offline. Hence the question above.
 

Macaroni323

There was discussion on a "badblock" test... Anyone know how to perform that?

It is possible that I could get SMART to register possible "badblocks" by doing a drive wipe (I have a new drive installed and would like to burn it in before applying it to the data pool).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Macaroni323

Ahhh... Thanks Dan. That's the same way the WD application functions: it takes the disk offline and runs the test. It might be easier to pull the disk and run it on a separate machine (as I was doing previously, though I thought maybe there was a better way) rather than trying to run it from SSH.

I checked "Uncle Fester" for badblocks and didn't see it at first. Now I see it is there; the search just didn't seem to find it.
 

joeschmuck

Once BadBlocks is complete, the drive has no data on it because it is a destructive test.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
The problem with SMART tests is that they are a bit like an electrocardiogram. In other words: the absence of a SMART error does not mean anything in terms of how long the drive will continue to live. It can literally die on you the next minute.

A ZFS scrub is way more thorough and I have had a number of cases over the last years where SMART reported no issues, while ZFS had thousands of errors. And in all cases the hard drive was replaced by the manufacturer without any hesitation.

My totally unscientific conclusion, and I would be interested in comments from the more senior guys here, is to still run SMART tests daily/weekly, but to put my trust solely in ZFS scrubs.
 

joeschmuck

Christoph said it. When SMART was developed, the intent was to predict drive failure within the next 24 hours or less. It cannot predict some failures, but it's better than nothing. If there were a better way, I'm sure it would have been done by now, or maybe it's coming?

As for the big difference between SMART and scrubs: a SMART Long test reads the entire disk surface, while a scrub reads only the locations that hold data. If you had a pool with a small amount of data, a scrub would finish very quickly but would not have tested much of the drive.

But data corruption is where Scrubs shine, that is what they are for, to verify your data is bit for bit correct, and fix it if it can.
 

danb35

ChrisRJ said: "A ZFS scrub is way more thorough"
I think it's more accurate to say that they're just testing different things. For just one example, a long SMART test tests every block on the disk, though it tests them only internally to the disk. A scrub tests only those blocks that contain data, but it tests them end-to-end, involving the cables, controller, and everything else in the data path. Neither will tell you definitively that your disk is fine, but both will alert you to problems--and to different problems--on the disk.
 