Reading S.M.A.R.T results

Snake3y3s · Jan 12, 2021

Small question/clarification.

I was reading through this resource (a really great one) on SMART and what to look out for.

My only question is about reading the SMART information for the drive that gets output by smartctl -a /dev/ada0

What am I reading in the chart? Am I looking at the VALUE or the RAW_VALUE? Which one is indicative of what?

Heracles · Jan 12, 2021

Hi,

You need to look at the RAW_VALUE. For example, for attribute No. 5 (Reallocated_Sector_Ct), you should have a RAW_VALUE of 0.
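
To make the difference between the columns concrete, here is a minimal Python sketch that splits one row of the smartctl -a attribute table into its fields. The names `SmartAttribute` and `parse_attribute_row` are made up for illustration, and the sketch assumes the ten-column ATA layout that smartctl prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). VALUE, WORST and THRESH are normalized figures scaled by the vendor, where lower is worse and an attribute counts as failed once VALUE drops to THRESH or below. RAW_VALUE is the underlying counter, and its meaning is vendor-specific.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized, vendor-scaled (lower is worse)
    worst: int   # lowest normalized value recorded so far
    thresh: int  # failure threshold for the normalized value
    raw: str     # raw counter, kept as text because some drives append extra info


def parse_attribute_row(line: str) -> SmartAttribute:
    # Split into at most 10 fields so everything after the 9th column stays in RAW_VALUE.
    parts = line.split(None, 9)
    if len(parts) < 10:
        raise ValueError(f"not an attribute row: {line!r}")
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=parts[9].strip(),
    )


row = "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0"
attr = parse_attribute_row(row)
# The attribute is considered failed only when the normalized VALUE is at or below THRESH.
below_threshold = attr.value <= attr.thresh
print(attr.name, "VALUE", attr.value, "THRESH", attr.thresh, "RAW", attr.raw)
```

For a counter such as attribute 5, the raw figure is the one to watch: the normalized VALUE can stay at 200 for a long time while the raw count of reallocated sectors starts to climb.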

Snake3y3s · Jan 12, 2021

Thank you... but then I am confused :(

I did a long test and these are the results:

Code:

/$ smartctl -a /dev/ada3
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N3EYE1Y3
LU WWN Device Id: 5 0014ee 2b9970486
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan  9 13:12:38 2021 CAT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 113)    The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline
data collection:         (40980) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 411) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4
  3 Spin_Up_Time            0x0027   181   178   021    Pre-fail  Always       -       5908
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       304
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   001   001   000    Old_age   Always       -       167103
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7132
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       304
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       38
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       372
194 Temperature_Celsius     0x0022   114   110   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%      7130         3646336664
# 2  Short offline       Completed without error       00%      7111         -
# 3  Extended offline    Completed without error       00%      7105         -
# 4  Extended offline    Interrupted (host reset)      90%      6907         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Everything that was indicated in the article seems fine, but the drive is still showing 100% busy (when looking at "Drive Busy" in the reports), and right at the end the test log says read failure.

I have turned the NAS off for now until I can get a replacement drive, just in case. But I need to know whether the drive is indeed going to fail, in case I need to get something off the NAS before I can replace the drive.

Heracles · Jan 12, 2021

For a Western Digital HD to have that many seek errors usually means that the drive is not in good shape anymore... The drive is still clear on metrics No. 5, 197 and 198... So the drive is having a hard time positioning its heads at the right place on the disk (seek errors). Because of that, it must keep trying again and again before getting it right.
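
As a rough illustration of that check, the sketch below scans saved smartctl -a output for attributes 5, 197 and 198 and for failed entries in the self-test log. The helper names (`watched_raw_values`, `failed_self_tests`) and the regular expressions are my own assumptions about the text layout posted above; they are not part of smartmontools and may need adjusting for other drives or smartctl versions.

```python
import re

WATCHED_IDS = {5, 197, 198}  # reallocated, pending and offline-uncorrectable sectors

# Assumed row layout: ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)
# Assumed self-test log layout: "# N", description, status, remaining %, hours, LBA.
TEST_ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$")


def watched_raw_values(text):
    """Return {attribute name: raw value} for the watched attribute IDs."""
    result = {}
    for line in text.splitlines():
        m = ATTR_ROW.match(line)
        if m and int(m.group(1)) in WATCHED_IDS:
            result[m.group(2)] = int(m.group(3))
    return result


def failed_self_tests(text):
    """Return (description, status, LBA of first error) for tests whose status mentions a failure."""
    failures = []
    for line in text.splitlines():
        m = TEST_ROW.match(line)
        if m and "failure" in m.group(3):
            failures.append((m.group(2), m.group(3), m.group(6)))
    return failures


# A few lines copied from the output posted earlier in this thread.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   001   001   000    Old_age   Always       -       167103
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
# 1  Extended offline    Completed: read failure       10%      7130         3646336664
# 2  Short offline       Completed without error       00%      7111         -
"""

print(watched_raw_values(sample))
print(failed_self_tests(sample))
```

On the lines from this drive, the three watched counters come back as zero while the most recent extended test is reported as a read failure, which is the mixed picture described above.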

I do not have WD drives myself, so do not know much more about how they behave. What I do know is that this metric is not measured the same way in Seagate drives like the IronWolf I use here.

Snake3y3s said:
have turned the NAS off for now until i can get a replacement drive, just in case,

Know that this is not always the best idea. Letting a drive cool down and stay parked for a long time may lead to it never restarting again... Redundancy, cold spares and backups are what will protect you better against such a case.

Here, I ended up with a failed SMART test before Christmas. I replaced the drive with my cold spare, so the pool is back to healthy. Any drive can fail and redundancy will keep my server up. Should a second drive fail, I will still be OK as long as it is in a different mirror. Should a complete mirror fail, then I will have to re-transfer my data from the snapshots I sent to my 2 other servers. Should I end up with a degraded pool and no more spares on site, I would keep Thanatos online all the time to ensure the latest snapshots are available onsite. Once the pool is back to healthy and spares are in, Thanatos would go back to sleep.

In all cases, do your backups now if you do not already have any. It is never too soon to make a backup, but people often try to make them too late.

Snake3y3s · Jan 12, 2021

I need to go get another drive to be able to get backups (yes I know... always have backups... but cash is king and I'm a beggar)

So... that does leave me with another conundrum. Do you suggest I go out and get a new replacement drive for the NAS (when I get paid), or should I get a 10TB external (costs about the same as a WD RED) and do the backup now, then wait till next month to get a replacement drive?

Heracles · Jan 13, 2021

This comes down to a situation that is too personal for me to answer. You know the value of this data; health and lives are more important than anything else... It is up to you to make such a personal decision.
