SMART Test on NVME SSD unavailable

Patrick_3000

Contributor
Joined
Apr 28, 2021
Messages
167
On my primary SCALE server, I have three pools, including two data pools plus the boot pool. They are: (1) a 3x10TB HDD mirror for datasets, (2) a 2x4TB NVME SSD mirror for VM ZVOLs, and (3) a 1x32GB SATA Dom SSD for boot.

I'm able to select any of the three HDDs or the SATA Dom SSD for running SMART tests through the web UI, but the web UI does not give the option of selecting either of the NVME SSDs for SMART tests.

I've searched and seen others discussing this in other threads, but found no definitive answer on whether SCALE officially lacks support for SMART tests on NVMe SSDs. Does anyone know anything about this? And if NVMe SSDs cannot be chosen for SMART tests, is there a workaround, perhaps at the CLI over ssh?

In case it's relevant, my CPU is a Ryzen 5 Pro 4650G, with 128 GB ECC RAM, and an ASROCK Rack x570d4u-2L2T motherboard.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well, the issue is that smartmontools can only support what the manufacturers provide. You can open the Shell window and enter smartctl -x /dev/ada1, where ada1 is the drive ident; in SCALE it would be 'sda', for example. This will show you the available data. Below is some data I received from a buddy:

Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21D280GA
Serial Number:                      PHM274350029098AGN
Firmware Version:                   E2010480
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Aug 16 00:18:56 2023 BST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x0a):         Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    2,320,860 [1.18 TB]
Data Units Written:                 295,724,912 [151 TB]
Host Read Commands:                 47,814,770
Host Write Commands:                2,885,920,626
Controller Busy Time:               1,270
Power Cycles:                       190
Power On Hours:                     27,950
Unsafe Shutdowns:                   59
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          1    10       -  0xc00c      -            0     -     -



But to answer your question: no, SMART testing is not supported on the NVMe drives I have seen data for. If you try to run a test, the result is basically "Okay" and then it exits. As far as we know right now, the NVMe module does not support it. You can use Multi-Report (see link below) to monitor the NVMe drives; it will provide the following:
Device ID, Serial Number, Model Number, Capacity, SMART Status, Critical Warning, Current Temp, Power On Time, Wear Level, Media Errors, and Total Data Written. These are the only relevant data I could pull out of what smartctl is able to report. I have no idea what TrueNAS will support with respect to NVMe, only because I do not have an NVMe drive to test with.
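
As a rough illustration of pulling those fields out of the text report, here is a minimal Python sketch. It assumes the plain-text layout shown in the output above; the function name parse_nvme_health and the trimmed SAMPLE string are made up for this example and are not part of Multi-Report or smartmontools.

```python
def parse_nvme_health(smartctl_text: str) -> dict:
    """Collect 'Key: value' pairs from the SMART/Health Information section."""
    fields = {}
    in_section = False
    for line in smartctl_text.splitlines():
        if line.startswith("SMART/Health Information"):
            in_section = True
            continue
        if in_section:
            if not line.strip():
                break  # a blank line ends the section
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


# Trimmed copy of the report quoted above (hypothetical variable name).
SAMPLE = """\
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Percentage Used:                    0%
Data Units Written:                 295,724,912 [151 TB]
Power On Hours:                     27,950
Media and Data Integrity Errors:    0

Error Information (NVMe Log 0x01, 16 of 64 entries)
"""

health = parse_nvme_health(SAMPLE)
print(health["Temperature"])     # → 39 Celsius
print(health["Power On Hours"])  # → 27,950
```

The values stay as strings on purpose, since the report mixes units, percent signs, and thousands separators; a real script would convert each field as needed.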

Hope this helps explain things.
 

Patrick_3000

Contributor
Joined
Apr 28, 2021
Messages
167
Interesting, because when I try that smartctl command in a terminal over ssh and specify either of the NVME drives, SMART appears to provide some useful information (see below). Therefore, it seems that SMART has the ability to monitor NVME drives at least to some extent, but it's not accessible through SCALE's web UI, at least in Bluefin, which I'm using. Maybe when Cobia comes out in a few weeks, it will be different.

Here is part of the output for one of the NVME drives:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 22 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 2%
Data Units Read: 14,051,591 [7.19 TB]
Data Units Written: 26,247,096 [13.4 TB]
Host Read Commands: 86,317,513
Host Write Commands: 251,249,589
Controller Busy Time: 798
Power Cycles: 94
Power On Hours: 5,723
Unsafe Shutdowns: 48
Media and Data Integrity Errors: 0
Error Information Log Entries: 157
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 22 Celsius
Temperature Sensor 2: 28 Celsius
Temperature Sensor 8: 22 Celsius

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 157 0 0x6014 0x4005 0x028 0 0 -
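
For anyone wondering how the bracketed TB figures relate to the raw counters: NVMe reports Data Units in thousands of 512-byte blocks, so the conversion can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name data_units_to_tb is made up for the example.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes each.
BYTES_PER_DATA_UNIT = 1000 * 512


def data_units_to_tb(units: int) -> float:
    """Convert a Data Units counter to decimal terabytes (10**12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 1e12


read_tb = data_units_to_tb(14_051_591)     # Data Units Read from the output above
written_tb = data_units_to_tb(26_247_096)  # Data Units Written from the output above
print(f"read {read_tb:.2f} TB, written {written_tb:.1f} TB")
# → read 7.19 TB, written 13.4 TB
```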
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Your original question was:
"I'm able to select any of the three HDDs or the SATA Dom SSD for running SMART tests through the web UI, but the web UI does not give the option of selecting either of the NVME SSDs for SMART tests."
I just told you that you cannot run a SMART test on the NVMe. This is not a TrueNAS limitation. Try to run a SMART test from any other operating system and it will not run. At most it will tell you it opened the drive, which means it can talk to the drive, but that isn't very useful for this purpose.
You also wrote:
"Therefore, it seems that SMART has the ability to monitor NVME drives at least to some extent, but it's not accessible through SCALE's web UI, at least in Bluefin, which I'm using. Maybe when Cobia comes out in a few weeks, it will be different."
Monitoring some provided data is not the same as running a SMART test. These are two completely separate operations that use the same smartmontools software, which is why a test would not be an available option. The data you provided above is the output of a SMART status request, in which some data is available, but none of it relates to performing a SMART test. We could split hairs and say that the drive performs its own power-on SMART test of basic functions, and that this is what the Overall SMART Health result reflects. That is not the same as a full-blown test that reads all the memory locations. However, Critical Warning monitors the entire NVMe for proper operation; you can think of it as an always-running SMART test if you like, though it's not exactly the same thing. Any critical problems with the NVMe will be reported via this value. A normal SMART test performs the basic operation checks and then a read scan of the media: a short test reads a few locations on the disk, while a long test reads the entire disk surface, or all the media locations.
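
To make the Critical Warning value a bit more concrete, here is a small Python sketch that decodes it bit by bit. The bit meanings are my reading of the SMART/Health log page in the NVMe specification and should be checked against the spec revision your drive implements; the names CRITICAL_WARNING_BITS and decode_critical_warning are made up for this example.

```python
# Bit meanings as understood from the NVMe SMART/Health log page (an assumption
# to verify against the spec); later revisions define additional bits.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return a description for every warning bit that is set."""
    return [text for bit, text in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


print(decode_critical_warning(int("0x00", 16)))  # → []
print(decode_critical_warning(0x05))
# → ['available spare below threshold', 'NVM subsystem reliability degraded']
```

A value of 0x00, as in both reports in this thread, means none of these conditions is currently flagged.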

In TrueNAS I would expect there to be reports of drive temperature (you should be able to see it in the GUI now, either by navigating to the proper GUI webpage or by going directly to the NAS ip_address/reportsdashboard/disk), Critical Warning, and Overall SMART Health. What I hope is also monitored is Media and Data Integrity Errors and Error Information Log Entries. I have not heard of the last two being reported here in the forums, but I don't see all the forum postings.
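
If you want to watch those values yourself in the meantime, smartctl can emit JSON (the -j option), which is easier to script against than the text report. The Python sketch below is only an illustration: the key names are what I recall smartctl using for the NVMe health log and should be treated as an assumption, nvme_alerts is a made-up helper, and the sample JSON was hand-built from values posted in this thread rather than captured from a real run.

```python
import json


def nvme_alerts(report: dict) -> list[str]:
    """List the health values that look worth a closer look."""
    # Key names below are assumed from smartctl's JSON output; verify on your system.
    log = report.get("nvme_smart_health_information_log", {})
    alerts = []
    if log.get("critical_warning", 0) != 0:
        alerts.append(f"critical warning 0x{log['critical_warning']:02x}")
    if log.get("available_spare", 100) < log.get("available_spare_threshold", 0):
        alerts.append("available spare below threshold")
    if log.get("media_errors", 0) > 0:
        alerts.append(f"{log['media_errors']} media errors")
    if log.get("num_err_log_entries", 0) > 0:
        alerts.append(f"{log['num_err_log_entries']} error log entries")
    return alerts


# Hand-built sample using numbers posted earlier in the thread.
SAMPLE_JSON = """
{"nvme_smart_health_information_log": {
    "critical_warning": 0, "temperature": 22,
    "available_spare": 100, "available_spare_threshold": 5,
    "media_errors": 0, "num_err_log_entries": 157}}
"""

print(nvme_alerts(json.loads(SAMPLE_JSON)))  # → ['157 error log entries']
```

Error log entries are not necessarily a sign of a failing drive, so treat that last alert as a prompt to look at the error log, not as a verdict.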

If you cannot see the drive temperature in the GUI, I would be interested in knowing that and I would propose you submit a bug report for that problem as the NVMe drive temp should be in the GUI (my opinion only).

I hope what I typed is easy to understand. I wanted to get into more details but that would be for another day.
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
"I'm able to select any of the three HDDs or the SATA Dom SSD for running SMART tests through the web UI, but the web UI does not give the option of selecting either of the NVME SSDs for SMART tests."
smartctl 7.4 added support for NVMe self-tests, so as long as SCALE (or CORE) ships something earlier than that, it is unlikely that the GUI will support them, since smartctl is what the GUI uses to run tests on any other sort of drive. Of course, this also depends on the drive itself supporting self-tests, which is optional as of NVMe 1.3.
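
A quick way to check which side of that line a given system falls on is to read the version from the first line of smartctl's output. Here is a minimal Python sketch; the 7.4 threshold is simply the version named in this post, and smartctl_version and NVME_SELFTEST_MIN are made-up names for the example.

```python
import re

# Minimum version for NVMe self-tests as stated in this thread (not verified here).
NVME_SELFTEST_MIN = (7, 4)


def smartctl_version(banner: str) -> tuple[int, int]:
    """Extract (major, minor) from a banner line such as 'smartctl 7.2 2020-12-30 ...'."""
    match = re.match(r"smartctl (\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized smartctl banner: {banner!r}")
    return int(match.group(1)), int(match.group(2))


# Banner copied from the output posted earlier in the thread.
banner = "smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)"
version = smartctl_version(banner)
print(version, version >= NVME_SELFTEST_MIN)  # → (7, 2) False
```

Comparing tuples avoids the classic mistake of comparing version strings, where "7.10" would sort before "7.4".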
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Thank you for that information; I have to eat my words. I downloaded version 7.4 and kicked off a SMART test on an NVMe. Time to update the script to support it. I really appreciate the posting.

So the current version of SCALE does not have the new smartmontools version, but I'm sure it will in the future.
 

Patrick_3000

Contributor
Joined
Apr 28, 2021
Messages
167
Thanks, everyone. That provides clarification. SCALE Cobia is in Beta and is supposed to be finalized before the end of 2023. I did a quick search to see if I could easily determine whether it supports smartctl 7.4 but didn't find anything. I guess I could install the Beta version in a VM and check, but it's not a big enough deal to me that I'm going to spend the time doing that. I'll wait until it's in production and see.
 

Patrick_3000

Contributor
Joined
Apr 28, 2021
Messages
167
I installed the Beta version of SCALE Cobia in a VM because I was curious. It turns out that Cobia, at least the current Beta, ships smartctl 7.3: newer than the 7.2 on Bluefin, but not the 7.4 that is apparently required for NVMe self-test support. Maybe 7.4 will make it into the final version. I'm not sophisticated enough to understand development on GitHub, but perhaps dak180 is already on the case.
 