SAS drive SMART question

Kilokk

So I've been looking around the forum (and Google) and I've seen some old threads relating to this, but I'm unsure if they're still relevant. From what I've read, SMART data isn't very accurate on SAS drives. Is that still the case?

Basically, my setup is a TrueNAS SCALE VM under ESXi with 4 cores allocated to it. My issue is that 4 of the 5 drives failed a SMART long test run from the GUI. One drive is faulty, but the other 3 all show a similar error in the SMART data, for example:
# 1 Background long Failed in segment --> 6 65535 6417431600 [0x3 0x16 0x0]
Since all of the failures are in the same segment (6), does that indicate an issue with my server rather than the drives, or is it a coincidence and they're all on their way out?
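The bracketed triple at the end of each failed entry is SCSI sense data (sense key, ASC, ASCQ). Below is a minimal Python sketch for decoding those entries. It maps only the codes that appear in this thread (sense key 0x3 is MEDIUM ERROR, ASC/ASCQ 0x11/0x00 is UNRECOVERED READ ERROR, 0x16/0x00 is DATA SYNCHRONIZATION MARK ERROR), and the regex assumes the exact column layout that smartctl 7.3 printed in the outputs below. Decoded this way, the four failing drives do share segment 6, but each reports a medium error at a different LBA, and not all with the same ASC. Because a background self-test is carried out by the drive itself, that pattern may be worth weighing before blaming the server.

```python
import re

# Only the codes seen in this thread are mapped; anything else is shown raw.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x16, 0x00): "DATA SYNCHRONIZATION MARK ERROR",
}

# Matches the tail of a failed self-test entry as printed by smartctl, e.g.
# "Failed in segment -->       6   65535        6417431600 [0x3 0x16 0x0]"
FAILED_ENTRY = re.compile(
    r"Failed in segment -->\s+(?P<segment>\d+)\s+(?P<hours>\d+)\s+(?P<lba>\d+)\s+"
    r"\[(?P<sk>0x[0-9a-fA-F]+)\s+(?P<asc>0x[0-9a-fA-F]+)\s+(?P<ascq>0x[0-9a-fA-F]+)\]"
)


def decode_failed_entry(line):
    """Decode a failed self-test log line; return None for entries that did not fail."""
    m = FAILED_ENTRY.search(line)
    if m is None:
        return None
    sk = int(m["sk"], 16)
    asc, ascq = int(m["asc"], 16), int(m["ascq"], 16)
    return {
        "segment": int(m["segment"]),
        "lba": int(m["lba"]),
        "sense_key": SENSE_KEYS.get(sk, f"sense key {sk:#x}"),
        "asc": ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#x}/{ascq:#x}"),
    }


if __name__ == "__main__":
    # Most recent self-test entry of each failing drive, copied from the outputs below.
    latest = {
        "WMC1F1678693": "# 1  Background long   Failed in segment -->       6   65535        6856968932 [0x3 0x11 0x0]",
        "sdd": "# 1  Background long   Failed in segment -->       6   65535        6417431600 [0x3 0x16 0x0]",
        "sde": "# 1  Background long   Failed in segment -->       6   65535         789824151 [0x3 0x11 0x0]",
        "sdf": "# 1  Background long   Failed in segment -->       6   65535         491530903 [0x3 0x11 0x0]",
    }
    for drive, line in latest.items():
        info = decode_failed_entry(line)
        print(f"{drive}: segment {info['segment']}, LBA {info['lba']}, "
              f"{info['sense_key']} / {info['asc']}")
```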

Full results of the SMART tests:

Code:
=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VRA9
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f01e055a4
Serial number:        WMC1F1678693
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Feb 22 21:50:49 2024 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        40 C

Accumulated power on time, hours:minutes 81963:00
Manufactured in week 48 of year 2013
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  59
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  560
Elements in grown defect list: 53

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    3017697    12304     24877   3030001      12340     338644.706          35
write:   6776911   115439    115543   6892350     115443     113104.558           0
verify:        7        0         0         7          0          1.175           0

Non-medium error count:     2265

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       6   65535        6856968932 [0x3 0x11 0x0]
# 2  Background long   Aborted (device reset ?)    -   65535                 - [-   -    -]
# 3  Background long   Failed in segment -->       6   65535        6856968932 [0x3 0x11 0x0]
# 4  Background short  Completed                   -   65535                 - [-   -    -]
# 5  Background short  Aborted (by user command)   -      58                 - [-   -    -]

Long (extended) Self-test duration: 31120 seconds [8.6 hours]

admin@truenas[~]$ sudo smartctl -a /dev/sdc |more
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.63-production+truenas] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VRA9
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f0129b61c
Serial number:        WMC1F1527782
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Feb 22 21:51:00 2024 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     39 C
Drive Trip Temperature:        40 C

Accumulated power on time, hours:minutes 81970:58
Manufactured in week 49 of year 2013
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  56
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  531
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    6493553    32327     35315   6525880      32327     332248.186           0
write:   8674701    32322     32351   8707023      32322     114587.892           0
verify:        0        0         0         0          0          1.220           0

Non-medium error count:     2353

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   65535                 - [-   -    -]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Aborted (by user command)   -      67                 - [-   -    -]

Long (extended) Self-test duration: 31120 seconds [8.6 hours]

admin@truenas[~]$ sudo smartctl -a /dev/sdd |more
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.63-production+truenas] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VRA9
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f01e139d0
Serial number:        WMC1F1742086
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Feb 22 21:51:14 2024 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        40 C

Accumulated power on time, hours:minutes 81957:12
Manufactured in week 50 of year 2013
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  55
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  428
Elements in grown defect list: 2

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    5135024    54621     99850   5189645      54649     329897.037          24
write:   8025664   206852    206960   8232516     206852     120589.987           0
verify:        0        0         0         0          0          0.000           0

Non-medium error count:     2404

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       6   65535        6417431600 [0x3 0x16 0x0]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Aborted (by user command)   -      52                 - [-   -    -]

Long (extended) Self-test duration: 31120 seconds [8.6 hours]

admin@truenas[~]$ sudo smartctl -a /dev/sde |more
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.63-production+truenas] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VRA9
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f01e05540
Serial number:        WMC1F1663425
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Feb 22 21:51:31 2024 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        40 C

Accumulated power on time, hours:minutes 81702:42
Manufactured in week 48 of year 2013
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  87
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  73
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    5095331    49657     70531   5144988      49662     344067.051           5
write:   7298915   507957    507971   7806872     507957     117777.114           0
verify:        2        0         0         2          0          0.000           0

Non-medium error count:     2216

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       6   65535         789824151 [0x3 0x11 0x0]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Aborted (by user command)   -      72                 - [-   -    -]

Long (extended) Self-test duration: 31120 seconds [8.6 hours]

admin@truenas[~]$ sudo smartctl -a /dev/sdf |more
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.63-production+truenas] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VRA9
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f01e04e0c
Serial number:        WMC1F1680307
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Feb 22 21:52:08 2024 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     39 C
Drive Trip Temperature:        40 C

Accumulated power on time, hours:minutes 81963:56
Manufactured in week 48 of year 2013
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  58
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  577
Elements in grown defect list: 6

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    4036385    77202    301945   4113587      77317     322281.851         113
write:   7615701   517023    517040   8132724     517025     121439.462           4
verify:        5      164      3202       169        168          0.000           0

Non-medium error count:     1912

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       6   65535         491530903 [0x3 0x11 0x0]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Aborted (by user command)   -      59                 - [-   -    -]

Long (extended) Self-test duration: 31120 seconds [8.6 hours]

As you can see, these are old, worn-out drives. They were a freebie from a friend who got them with a server he purchased. He said they were untested and I should assume they wouldn't even spin up, but if they did, I could use them to get my server up and running. The plan is to buy better drives one or two at a time to replace these with something newer. The last one is showing as faulted under the storage dashboard after a scrub, so clearly that's the first to replace. Assuming the SMART information is correct and 4/5 are bad, are any of them worse than the others? Nothing that isn't backed up is on the drives, just a few things to test a Plex server, so if 2 drives fail before I get new drives, then so be it. I'd just rather avoid that if at all possible.
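To compare the drives side by side, two counters that are commonly watched for media wear are the grown defect list and the total uncorrected errors. The sketch below copies those values from the five outputs above (the first output has no device name in the post, so it is labelled by serial number) and orders the drives worst-first. The ordering rule, uncorrected errors first with grown defects as a tie-breaker, is an arbitrary assumption for illustration, not a failure prediction. Note that all five drives report "SMART Health Status: OK", so that flag alone does not separate them.

```python
from dataclasses import dataclass


@dataclass
class DriveReport:
    label: str               # device name if shown in the post, else serial number
    grown_defects: int       # "Elements in grown defect list"
    read_uncorrected: int    # "Total uncorrected errors", read row
    write_uncorrected: int   # "Total uncorrected errors", write row
    long_test_passed: bool   # most recent background long self-test

    @property
    def total_uncorrected(self) -> int:
        return self.read_uncorrected + self.write_uncorrected


# Values copied from the smartctl outputs in this post.
REPORTS = [
    DriveReport("WMC1F1678693", 53, 35, 0, False),
    DriveReport("sdc", 0, 0, 0, True),
    DriveReport("sdd", 2, 24, 0, False),
    DriveReport("sde", 0, 5, 0, False),
    DriveReport("sdf", 6, 113, 4, False),
]


def rank(reports):
    """Worst-first: uncorrected errors, then grown defects (an illustrative heuristic)."""
    return sorted(reports, key=lambda r: (r.total_uncorrected, r.grown_defects), reverse=True)


if __name__ == "__main__":
    for r in rank(REPORTS):
        status = "passed" if r.long_test_passed else "FAILED"
        print(f"{r.label:<14} uncorrected={r.total_uncorrected:<4} "
              f"grown_defects={r.grown_defects:<3} long test {status}")
```

With these numbers the order comes out as sdf, WMC1F1678693, sdd, sde, sdc, which puts first the drive the pool has already faulted. WMC1F1678693 has by far the most grown defects (53), so weighting that counter more heavily would move it to the top.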

FWIW, I also installed Scrutiny from TrueCharts, and it shows all the drives, including the faulted one, as passing. I'm assuming that isn't accurate.

Relevant Hardware
  • Asus 2U server ESC4000 G3
  • Intel Xeon E5-2620 v4
  • 16GB ECC RAM
  • 5x 4TB WD4001FYYG-01SL3 in RAIDZ1 connected to a PERC H330 and passed through to TrueNAS.
 

chuck32

I have no experience with SMART data being inaccurate on SAS drives; I haven't heard that before.

The drives are 10 years old, and the power-on hours equate to over 9.3 years. You are getting errors.
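The age arithmetic, and the 65535 shown in every recent self-test entry, can be checked quickly. This sketch assumes the self-test log stores its timestamp in a 16-bit field that saturates at 65535 hours, which is consistent with every drive here being past about 81,700 power-on hours while much older entries still show values such as 58.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year, leap years included

# "Accumulated power on time" (whole hours) from the five smartctl outputs in the first post.
POWER_ON_HOURS = {
    "WMC1F1678693": 81963,
    "sdc": 81970,
    "sdd": 81957,
    "sde": 81702,
    "sdf": 81963,
}

# Assumption: the self-test log timestamp is a 16-bit field, so it tops out at 0xFFFF.
SELF_TEST_HOURS_MAX = 0xFFFF


def to_years(hours):
    return hours / HOURS_PER_YEAR


def self_test_timestamp(hours):
    """Value a saturating 16-bit self-test timestamp would show at this power-on time."""
    return min(hours, SELF_TEST_HOURS_MAX)


if __name__ == "__main__":
    for drive, hours in POWER_ON_HOURS.items():
        print(f"{drive}: {hours} h is about {to_years(hours):.2f} years; "
              f"self-test log would show {self_test_timestamp(hours)}")
```

All five drives land between roughly 9.32 and 9.35 years of power-on time.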

Honestly, even if the data on it is not mission critical, this is where I would abort that adventure and just purchase new drives. This will not be worth the headache in the long run in my opinion.
 

ChrisRJ

Kilokk said:
  • 5x 4TB WD4001FYYG-01SL3 in RAIDZ1 connected to a PERC H330 and passed through to TrueNAS.
Did you pass through the HBA (which firmware?) in its entirety, or the individual drives? The latter is known not to work.

Given the higher flexibility, I would consider using mirrors over RAIDZ1.

Also, please check out the recommended readings in my signature. They will likely cover all your questions.

Lastly, what is your use-case and overall hardware setup (in all its details)?
 

Kilokk

ChrisRJ said:
Did you pass through the HBA (which firmware?) in its entirety, or the individual drives? The latter is known not to work.

Given the higher flexibility, I would consider using mirrors over RAIDZ1.

Also, please check out the recommended readings in my signature. They will likely cover all your questions.

Lastly, what is your use-case and overall hardware setup (in all its details)?
I’m not sure what firmware it’s actually on, but I passed the controller through to the VM in ESXi. Based on my (admittedly minimal) research, this card is in IT mode by default, but is there a way to check?
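On the firmware question, one indirect check from the TrueNAS shell is to see which kernel driver owns the PCI controller behind each disk. The sketch below walks Linux sysfs from /sys/block/<disk>/device up to the nearest PCI device and reads its driver link; the function name controller_driver is hypothetical. How to read the result is an assumption, not something established in this thread: a SAS3008-based card such as the H330 running LSI IT firmware is normally bound to mpt3sas, Dell's stock PERC firmware to megaraid_sas, and a VMware virtual controller driver such as vmw_pvscsi would suggest the disks are reaching the VM through ESXi rather than through the passed-through card.

```python
import os
import re
from pathlib import Path
from typing import Optional

# PCI device directories in sysfs are named domain:bus:slot.function, e.g. "0000:03:00.0".
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def controller_driver(disk: str, sysfs: str = "/sys") -> Optional[str]:
    """Return the driver bound to the PCI controller behind a block device such as 'sdc'."""
    device = Path(os.path.realpath(Path(sysfs) / "block" / disk / "device"))
    # Walk from the SCSI device toward the root; the first PCI-named directory is the controller.
    for node in [device, *device.parents]:
        if PCI_ADDR.match(node.name):
            link = node / "driver"
            return Path(os.path.realpath(link)).name if link.exists() else None
    return None


if __name__ == "__main__":
    block = Path("/sys/block")
    if block.is_dir():
        for disk in sorted(p.name for p in block.iterdir() if p.name.startswith("sd")):
            print(disk, controller_driver(disk))
```

This only identifies the driver, not the firmware version itself; it is a quick way to confirm that TrueNAS sees the physical controller rather than a virtual one.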

Wouldn’t mirrors effectively cut my total storage in half? From what I saw when researching which layout to use, if I populated all 8 bays on my server I would have the capacity of 4 of them. I was strongly considering just YOLOing it and setting my pool up as a stripe because I didn’t want to lose the capacity of 1 drive. I had to hold my nose to do it as is. The only thing that would happen if every single drive failed is that I’d have to redownload things that are readily and easily available.
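The capacity trade-off comes down to simple arithmetic. The sketch below counts usable space in drive-equivalents for the layouts discussed in this thread, assuming a single RAIDZ1 vdev and 2-way mirrors, and ignoring ZFS metadata, padding and any free space kept in reserve, so real usable capacity will be lower for every layout.

```python
DRIVE_TB = 4  # WD4001FYYG-01SL3


def usable_drives(layout: str, n: int) -> int:
    """Drive-equivalents available for data (raw arithmetic only, no ZFS overhead)."""
    if layout == "stripe":
        return n  # no redundancy: any single drive failure loses the pool
    if layout == "raidz1":
        return n - 1  # single vdev: one drive's worth of parity
    if layout == "mirror":
        return n // 2  # 2-way mirror vdevs; an odd drive is left unused
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for n in (5, 8):
        for layout in ("stripe", "raidz1", "mirror"):
            d = usable_drives(layout, n)
            print(f"{n} drives, {layout:<6}: {d} drives = {d * DRIVE_TB} TB")
```

For 8 bays this gives 4 drives (16 TB) with mirrors, 7 (28 TB) with RAIDZ1 and all 8 (32 TB) as a stripe.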

I think I’m spending the weekend reading up on all of that! Thank you very much!

My use-case is basically just to learn the platform. I’m also using it to have Plex running somewhere where I don’t have my PC on at all times so my wife can just watch whatever if I’m not home. I am also throwing Time Machine backups from my old and barely used MacBook on there.

As far as the hardware setup goes, I’m not sure what else is relevant, but I will try to provide as much detail as possible. The server itself is old enterprise hardware. What’s available to the TrueNAS VM is in the OP, but I’ll give the full hardware of the server, if that’s what you mean. If you need or want to know anything else, let me know!


  • Asus 2U server ESC4000 G3 in a stock configuration
  • 2x Intel Xeon E5-2620 v4
  • 64GB ECC RAM
  • 5x 4TB WD4001FYYG-01SL3 in RAIDZ1 connected to a PERC H330 which is passed through to TrueNAS.
  • 2 SSDs connected to PCIe slots on the board, running my VMs, including TrueNAS itself
  • All running under ESXi
 

Kilokk

chuck32 said:
I have no experience with SMART data being inaccurate on SAS drives; I haven't heard that before.

The drives are 10 years old, and the power-on hours equate to over 9.3 years. You are getting errors.

Honestly, even if the data on it is not mission critical, this is where I would abort that adventure and just purchase new drives. This will not be worth the headache in the long run, in my opinion.
Everything I’ve read here says that SAS drives don’t report SMART data the same way SATA drives do, so I took that to mean it can be a bit off or incomplete.

And the problem is ultimately funding. Drives are cheaper now, but my budget is still going to be for used, old enterprise drives. Capacity is what I’m going for; reliability is secondary for me. I’ll be miffed if I lose data, but nothing on the server will be hard to redownload. Nothing precious or irreplaceable is going on there until I can actually afford new drives and put them in place, and even then I’ll have all of it backed up elsewhere.
 

chuck32

Kilokk said:
Wouldn’t mirrors effectively cut my total storage in half? From what I saw when researching which layout to use, if I populated all 8 bays on my server I would have the capacity of 4 of them. I was strongly considering just YOLOing it and setting my pool up as a stripe because I didn’t want to lose the capacity of 1 drive.
Yes, mirrors cost 50 % of raw capacity. As Chris pointed out, they are more flexible (if you need more capacity, you can just add another mirror vdev instead of replacing all your RAIDZ drives), and you also gain performance because the vdevs are striped. The question is: do you need that extra performance/flexibility?
If you were considering YOLO, RAIDZ1 is definitely viable for you, since you know the risks involved.
 

nabsltd

Kilokk said:
Everything I’ve read here says that SAS drives don’t report SMART data the same way SATA drives do, so I took that to mean it can be a bit off or incomplete.
Technically, SAS drives in SAS mode do not support SMART at all, since SMART is part of the ATA standard, and SAS drives don't report it when in SAS mode. smartctl uses the correct (SCSI) interface to query SAS drives and formats the output to look like SMART, but it's not really SMART.

OTOH, if you connect an SAS drive to an SATA-only data connection, the SAS drive will fall back to SATA mode and will then respond to SMART API queries.

You can see this with a tool like HDSentinel: for a drive with true SMART support, it shows each SMART attribute ID number with the data in both raw (vendor-specific) and interpreted form. It does not do this for SAS drives, because SAS doesn't use the same "query ID number X" API as SMART and returns values that are not vendor-specific but are instead standard for the SAS protocol.
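The outputs in the first post show what that looks like in practice: instead of an ATA attribute table with ID numbers, each SAS drive reports an "Error counter log" whose read, write and verify rows have fixed columns. Below is a minimal parser for those rows, assuming smartctl 7.3's text layout as pasted in this thread. The column names are illustrative labels taken from the table header, not identifiers used by smartctl.

```python
from typing import Dict

# Labels for the seven numeric columns, in the order of the table header.
COLUMNS = (
    "ecc_fast",
    "ecc_delayed",
    "rereads_rewrites",
    "total_corrected",
    "correction_algorithm_invocations",
    "gigabytes_processed",
    "total_uncorrected",
)

# The error counter rows of sdf, copied from the first post.
SAMPLE = """\
read:    4036385    77202    301945   4113587      77317     322281.851         113
write:   7615701   517023    517040   8132724     517025     121439.462           4
verify:        5      164      3202       169        168          0.000           0
"""


def parse_error_counter_log(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the read/write/verify rows of smartctl's SCSI 'Error counter log' table."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] in ("read:", "write:", "verify:"):
            # Gigabytes processed is a decimal; every other column is an integer count.
            values = [float(v) if "." in v else int(v) for v in fields[1:]]
            result[fields[0].rstrip(":")] = dict(zip(COLUMNS, values))
    return result


if __name__ == "__main__":
    for row, counters in parse_error_counter_log(SAMPLE).items():
        print(f"{row}: {counters['total_uncorrected']} uncorrected, "
              f"{counters['gigabytes_processed']} GB processed")
```

Header lines and the rest of the smartctl output are skipped, so the whole `smartctl -a` text for a drive can be passed in unchanged.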
 