SMART Test Failed - How to see any details?

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Maybe I just need more coffee, but where on earth do I view the actual results?

Storage > Pools > Status shows all drives online and good

Disks > da10 > Dropdown > SMART Test Results doesn't seem to show me any results. Am I missing something here? This is what I see:

[Attached image: 1665843469356.png]


So I went into CLI and figured I could maybe see the results there, but still no

Code:
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p14 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PAG1
Compliance:           SPC-4
User Capacity:        7,865,536,647,168 bytes [7.86 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 1 protection
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca260bd5228
Serial number:        001649PB3R5V        VLKB3R5V
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Oct 15 09:18:21 2022 CDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 45940:36
Manufactured in week 49 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  37
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1940
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 21759842658025472

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       23         0        23   18569405    1390483.328           0
write:         0       13         0        13   12180425     902791.191           0
verify:        0        0         0         0      84772          1.273           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Failed in segment -->       3   45931                 - [0x1 0x5d 0xfd]
# 2  Background short  Completed                   -   45907                 - [-   -    -]
# 3  Background short  Completed                   -   45883                 - [-   -    -]
# 4  Background short  Completed                   -   45859                 - [-   -    -]
# 5  Background short  Completed                   -   45835                 - [-   -    -]
# 6  Background short  Completed                   -   45811                 - [-   -    -]
# 7  Background short  Completed                   -   45787                 - [-   -    -]
# 8  Background short  Completed                   -   45763                 - [-   -    -]
# 9  Background short  Completed                   -   45739                 - [-   -    -]
#10  Background short  Completed                   -   45715                 - [-   -    -]
#11  Background short  Completed                   -   45691                 - [-   -    -]
#12  Background short  Completed                   -   45667                 - [-   -    -]
#13  Background short  Completed                   -   45643                 - [-   -    -]
#14  Background short  Completed                   -   45619                 - [-   -    -]
#15  Background short  Completed                   -   45595                 - [-   -    -]
#16  Background short  Completed                   -   45571                 - [-   -    -]
#17  Background short  Completed                   -   45547                 - [-   -    -]
#18  Background short  Completed                   -   45523                 - [-   -    -]
#19  Background short  Completed                   -   45499                 - [-   -    -]
#20  Background short  Completed                   -   45475                 - [-   -    -]

Long (extended) Self-test duration: 63865 seconds [1064.4 minutes]


I can see there that it failed on segment 3, but what the heck is segment 3? Is there a way to drill down further?
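
One way to drill a little further is to decode the bracketed [SK ASC ASQ] triple on the failed line, which is standard SCSI sense data. The following is a minimal sketch, not an authoritative decoder: the function and table names are made up for illustration, the tables are deliberately tiny, and the meaning of the last byte (0xfd) on this particular drive is not known here and may be vendor-specific.

```python
import re

# Standard SCSI sense key names (only the common ones are listed).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}

# Additional sense code families this sketch recognises (hypothetical, minimal table).
ASC_FAMILIES = {
    0x5D: "Failure prediction / impending failure family",
}

# Matches the "[0x1 0x5d 0xfd]" part of a smartctl SCSI self-test log line.
SENSE_RE = re.compile(
    r"\[(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\]"
)


def decode_sense(log_line):
    """Return (sense key name, ASC description, raw ASCQ), or None if no sense data."""
    match = SENSE_RE.search(log_line)
    if match is None:
        return None  # e.g. "[-   -    -]" on a passing test
    sk, asc, ascq = (int(value, 16) for value in match.groups())
    return (
        SENSE_KEYS.get(sk, f"unknown sense key {sk:#x}"),
        ASC_FAMILIES.get(asc, f"ASC {asc:#x} not in this table"),
        ascq,  # left undecoded on purpose
    )


failed = "# 1  Background short  Failed in segment -->       3   45931                 - [0x1 0x5d 0xfd]"
print(decode_sense(failed))
```

For the failed entry above this reports a "Recovered Error" sense key with an additional sense code in the failure-prediction family, which suggests a predictive warning rather than a hard media error. Treat that reading as a hint, not a diagnosis.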
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You should run a SMART Long/Extended test from the command line using smartctl -t long /dev/da10 and let it run for its full duration of almost 18 hours, then check the SMART information again. It may pass, but then again it may fail. Should it fail, I'd replace the hard drive.
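
As a rough sketch of that suggestion, the snippet below builds the smartctl command and converts the drive's reported test duration into hours. The helper names are hypothetical, start_long_test is defined but never called here (it needs root and smartmontools), and 63865 seconds is the "Long (extended) Self-test duration" from the output posted earlier in the thread.

```python
import subprocess


def long_test_command(device):
    """Build the smartctl invocation that starts an extended self-test."""
    return ["smartctl", "-t", "long", device]


def hours_from_seconds(seconds):
    """Convert the drive's reported self-test duration to hours."""
    return seconds / 3600


def start_long_test(device):
    """Start the test in the background on the drive. Not called in this sketch."""
    return subprocess.run(
        long_test_command(device), capture_output=True, text=True, check=False
    )


# 63865 seconds comes from the smartctl output for da10.
print(f"Expect roughly {hours_from_seconds(63865):.1f} hours")
```

That works out to about 17.7 hours, which matches the "almost 18 hours" estimate.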

Remember that a failed SMART test is generally telling you that your drive is failing, in advance of a terrible failure.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Question is though, how do I see the information? For all I know "Segment 3" could be something completely mundane

Or will running a long test give me that information?

For a NAS based OS, it's sure tough to see hard drive health!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I do not know what Segment 3 specifically addresses but a Short test does a check of the electronics in the drive and then commences a media test in specific areas of the disk. Most drives I've dealt with will tell you the LBA that failed first and abort the test. So I'm not sure if Segment 3 means some part of the drive electronics has failed or if it's a specific area on the media that failed.

Running a Long test will also perform the drive electronics test and then perform a complete media test from the inside outward which is why it will take almost 18 hours to complete, provided it passes. You can do a check of the drive as often as you like using smartctl -a for the drive and check the status of the test. If it fails in less than a minute then maybe it's the electronics. If it fails after that then it must be a media failure.
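
Checking the status repeatedly can be scripted by parsing the "SMART Self-test log" section of the smartctl -a output. This is only a sketch under assumptions: the regular expression is fitted to the SAS log format shown earlier in this thread, the function names are invented for illustration, and entries that do not match (for example on SATA drives, which use a different layout) are silently skipped.

```python
import re

# Fitted to lines such as:
# "# 1  Background short  Failed in segment -->       3   45931   - [0x1 0x5d 0xfd]"
ENTRY_RE = re.compile(
    r"^#\s*(\d+)\s+((?:Background|Foreground) \w+)\s+(.*?)\s+(\d+|-)\s+(\S+)\s+(\S+)\s+\["
)


def parse_selftest_log(text):
    """Return one dict per self-test entry, in the order smartctl lists them."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue  # header lines, blank lines, unexpected formats
        num, test, status, segment, hours, _lba = match.groups()
        entries.append({
            "num": int(num),
            "test": test,
            "status": status,
            "segment": None if segment == "-" else int(segment),
            "hours": int(hours) if hours.isdigit() else None,
        })
    return entries


def verdict(entry):
    """Classify an entry as 'passed', 'failed', or 'other' (running, aborted, ...)."""
    status = entry["status"].lower()
    if "failed" in status:
        return "failed"
    if status == "completed":
        return "passed"
    return "other"


SAMPLE = """\
# 1  Background short  Failed in segment -->       3   45931                 - [0x1 0x5d 0xfd]
# 2  Background short  Completed                   -   45907                 - [-   -    -]
"""

for entry in parse_selftest_log(SAMPLE):
    print(entry["num"], entry["test"], verdict(entry), entry["hours"])
```

Entry #1 is the most recent test, so a wrapper script would only need to look at the first parsed entry after each poll.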

Regardless, any failure is a bad one; nothing is mundane.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
I kicked off a long test, I do hope that actually gives me the information. I have an identical disk in another system, I wonder if segment 3 would line up with line 3 on the SMART data

[Attached image: 1665846907758.png]


Hopefully at some point TrueNAS can get with the program and actually include useful information. It's crazy this is not just listed in the UI. Even dropping to CLI doesn't give us the information we need

It would be like taking your car to the mechanic and he tells you your engine needs replacing, but refuses to say which part is the problem and you just have to trust him it needs to be replaced
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Hopefully at some point TrueNAS can get with the program and actually include useful information. It's crazy this is not just listed in the UI. Even dropping to CLI doesn't give us the information we need
Actually, TrueNAS did tell you that you had a problem and informed you of the test which failed and the drive impacted: the Short test on da10. You are expecting too much of a piece of software. From there it's the responsibility of the administrator of the system to troubleshoot the problem further, and I gave you a short explanation of what the Short test is. Also, when you look at the data the drive provides, it's pretty limited data. That limitation comes from the manufacturer of the drive, not TrueNAS.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
How am I expecting too much of an operating system designed to interact with hard drives? It's pretty basic functionality to just give me SMART information. Countless applications in Windows and Linux will give it to you, like above with the screenshot from HD Sentinel. I get it, we all love TrueNAS, but this is a pathetic excuse for diagnostics on a NAS OS. If a feature is bad or missing, we can and should call it out

It's up to me to troubleshoot the problem, yeah. But the pool is healthy and the drive shows no errors in the pool stats. TrueNAS just threw out that there is a SMART error and then doesn't tell you what it saw or why it determined it was an issue. How do you troubleshoot an issue with no error code?

The data is only limited in TrueNAS, all other applications (HD Sentinel, HD Tune Pro, StableBit Scanner, etc) give me a nice full SMART table with all the attributes. TrueNAS does not.

The fact I have to go into the CLI to even see slightly more (albeit useless) information is weird too, especially when there is a page called "SMART Test Results" and then it doesn't even give me the results

The page itself is even broken "Background shor" - What the heck? I guess there is a character limit, but damn! Looks like hell and gives no information. May as well just ditch the page

[Attached image: 1665856282449.png]
 
Joined
Oct 22, 2019
Messages
3,641
The page itself is even broken "Background shor" - What the heck? I guess there is a character limit, but damn! Looks like hell and gives no information. May as well just ditch the page


I'm just going to leave this right here...
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
100% with you there!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Even dropping to CLI doesn't give us the information we need
Why do you "need" more information than "the drive failed its self-test"? Yes, other tools give more information. smartmontools doesn't give much detail with SAS drives; it does more with SATA. But you know that the drive failed its short self-test. You can (and I'd concur with @joeschmuck in recommending you do) run a long self-test and see what results that gives. Surely that's ample information to decide what to do with the drive.
But the pool is healthy and the drive shows no errors in the pool stats.
So what? I've never understood why people think these two things are even remotely related--and it's surely not just you; you aren't the first by a long shot and certainly won't be the last. But it can't (or at least shouldn't) be surprising that a drive self-test would test (and fail on) a sector without data--and therefore ZFS wouldn't show any issue at all.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The data is only limited in TrueNAS, all other applications (HD Sentinel, HD Tune Pro, StableBit Scanner, etc) give me a nice full SMART table with all the attributes. TrueNAS does not.
I'll tell you, I want to say some unkind words but as a moderator here it's my job to try to keep the peace. Instead I will just lay out a few facts.

1. TrueNAS is not a hard drive diagnostic program, it's a NAS program. The distribution has built-in diagnostic tools that a user can use to troubleshoot the problem.
2. The programs you listed are all drive diagnostic programs, not an OS. Additionally, the diagnostic program in your screenshot tells you nothing more about the failure. It clearly indicates multiple self-test failures with an error count of "3"; there were no other failing indications at this time.
3. I don't know of any OS that provides that kind of data upon failure. Just because I don't know doesn't mean it doesn't exist, so I'm open to learn which OS does.
4. Windows OS is a very mature OS (even if you may not like it, and many do not). How does Windows inform the user that a hard drive failure is pending or occurred?
a: Upon bootstrap it may tell you that you have corrupt data and you have the option to try to repair it.
b: The system tells you that the file you are looking for is corrupt.
c: The system just reboots on you. (there are others)

I have never seen the Windows OS provide the user any detailed hard drive troubleshooting data. If you want that kind of data you must install a third party piece of software.

5. TrueNAS provides the user all the data required to diagnose the problem. You had a SMART Short Test failure on drive da10. This does not mean that you have corrupt data, as previously explained by @danb35, but it does mean that more than likely the surface scan failed. So now you know you have this failure and which drive it was on. You have the options of replacing the failing drive or running a SMART Extended/Long test on the drive to verify the failure and confirm that replacing the drive is the best course of action.

I hope you can take away from this posting that TrueNAS is not a drive diagnostic program, it's just a very good (my opinion) and free NAS program.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Wow. Just wow.

I'll tell you, I want to say some unkind words but as a moderator here it's my job to try to keep the peace. Instead I will just lay out a few facts.

You're a joke man, calm it down.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Haha oh boy, if only I could look at the damn SMART information myself! Would have made this a 10 second task

[Attached image: 1665929642815.png]
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Why do you "need" more information than "the drive failed its self-test"? Yes, other tools give more information. smartmontools doesn't give much detail with SAS drives; it does more with SATA. But you know that the drive failed its short self-test. You can (and I'd concur with @joeschmuck in recommending you do) run a long self-test and see what results that gives. Surely that's ample information to decide what to do with the drive.

So what? I've never understood why people think these two things are even remotely related--and it's surely not just you; you aren't the first by a long shot and certainly won't be the last. But it can't (or at least shouldn't) be surprising that a drive self-test would test (and fail on) a sector without data--and therefore ZFS wouldn't show any issue at all.

The fact that a tool built into a NAS operating system aimed at enterprise gives more information with SATA disks than SAS disks seems like an issue

Long test and it shows passed! Now I'm in a situation where I guess the drive is okay. If only it could just TELL ME WHATS WRONG or show me the SMART information

I've never seen people defend having limited information before. TrueNAS fanboys are a thing I suppose?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
The fact that a tool built into a NAS operating system aimed at enterprise gives more information with SATA disks than SAS disks seems like an issue

Long test and it shows passed! Now I'm in a situation where I guess the drive is okay. If only it could just TELL ME WHATS WRONG or show me the SMART information

I've never seen people defend having limited information before. TrueNAS fanboys are a thing I suppose?
Your attitude gets in the way of people answering your questions and you learning. You're the one who needs to cool it.

You can easily get all the information that SMART has by using smartctl.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
My attitude? Maybe want to re-read the thread

You know you're in a joke of a place when the moderator of the forum doesn't like that I don't like how TrueNAS works (Or doesn't, rather)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The problem I had was that you are expecting that TrueNAS should include Hard Drive Diagnostic Programs and then you named off some third party programs and the one screen shot you shared had the same diagnostic data as the smartctl data output. It wasn't because you just didn't like TrueNAS, a lot of people don't like it or dislike some of the features and feel they could be improved. There is a good diagnostic program included already and you used it and it told you that you had a Short test failure. You did run a Long/Extended test which passed.

What does this mean? To me it means I would run another Long test to verify the test passes and then run a Short test. If the Long test passes then I'd say the hard drive media is good. If the Short test fails then it's possible you have a hardware issue. Maybe it's the armature, maybe it's a RAM test that fails, I don't know, but it's a failure regardless and I'd recommend replacing the drive, but that is your decision.

You're a joke man, calm it down.
You know you're in a joke of a place when the moderator of the forum doesn't like that I don't like how TrueNAS works (Or doesn't, rather)
I don't know if you are having a bad day (possible) or you don't care if you project a bad attitude. This thread started out good but on posting #5 you started throwing TrueNAS under the bus and it escalated from there. I'm at fault for putting a little fuel on that fire because I feel TrueNAS does have the proper tools to diagnose a hard drive problem, especially the problem you experienced, but you apparently do not. If you truly feel that TrueNAS needs some improvements then I recommend you submit a recommendation to the developers. Maybe the developers will make the change(s) you submit.

Good luck
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
For those interested, this is what I was complaining about. Instantly seen by any other SMART monitoring software. So hard!

[Attached image: 1666975048609.png]


I get that you guys will defend TrueNAS no matter the issue, but I stand by that it's crazy that it doesn't list this.

Annoyingly, if I'd known it was that error with such a low count, I'd have left it in the array!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm a little perplexed, what program is in the screen capture? I'd like to possibly give it a try on my TrueNAS system. Also, could you post the output of smartctl -x /dev/dax as I'd like to see what is reporting the 13 write errors. That title alone in the screen capture does not tell me what SMART data created it.
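
One possible source for a count of 13 is the "Error counter log" in the smartctl output posted at the top of the thread, where the write row lists 13 corrected errors. Whether the screenshot's figure comes from that counter is a guess. The sketch below, with hypothetical names, pulls the seven columns of each row out of that text so the numbers can be compared directly.

```python
from typing import NamedTuple


class CounterRow(NamedTuple):
    ecc_fast: int
    ecc_delayed: int
    rereads_rewrites: int
    total_corrected: int
    algorithm_invocations: int
    gigabytes_processed: float
    total_uncorrected: int


def parse_error_counters(text):
    """Map 'read' / 'write' / 'verify' to the columns of the SCSI error counter log."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.strip().partition(":")
        fields = rest.split()
        if not sep or label not in ("read", "write", "verify") or len(fields) != 7:
            continue  # not an error counter row
        rows[label] = CounterRow(
            int(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]),
            int(fields[4]), float(fields[5]), int(fields[6]),
        )
    return rows


# Rows copied from the smartctl output for da10 earlier in the thread.
SAMPLE = """\
read:          0       23         0        23   18569405    1390483.328           0
write:         0       13         0        13   12180425     902791.191           0
verify:        0        0         0         0      84772          1.273           0
"""

write = parse_error_counters(SAMPLE)["write"]
print(write.total_corrected, write.total_uncorrected)
```

In that log all 13 write errors were corrected by delayed ECC and the uncorrected count is 0.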

And I will agree with you that TrueNAS should have informed you if there was a drive failure, especially 13 incidences.
 