NVMe Disks & SMART

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I use a script from @Spearfoot to email me a SMART report every day for all my disks (SATA & SAS). However, it doesn't recognize NVMe.
I have two Optane devices, and unless I go into the terminal manually I can see nothing about them.

I have modified the script to include NVMe drives, but I can only test it on my two Optanes.

Can anyone with an NVMe drive directly accessible via TrueNAS please send me the output of smartctl -a /dev/nvme(x) for different NVMe drives, so I can see whether this might work?
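For anyone with several NVMe devices, something like the following might save some typing. It is only a rough sketch: the `SMARTCTL` variable, the `dump_nvme` helper name and the `/dev/nvme[0-9]*` glob are assumptions for illustration, and namespace nodes such as nvme0ns1 are skipped so each controller is reported once.

```shell
#!/bin/sh
# Sketch: print "smartctl -a" output for each NVMe controller node.
# SMARTCTL, dump_nvme and the device glob are assumptions, not part of any script in this thread.
SMARTCTL="${SMARTCTL:-smartctl}"

dump_nvme() {
    for dev in "$@"; do
        [ -e "$dev" ] || continue          # unmatched glob or missing node
        case "${dev##*/}" in
            nvme*n*) continue ;;           # skip namespace nodes (nvme0ns1, nvme0n1)
        esac
        echo "===== $dev ====="
        "$SMARTCTL" -a "$dev"
    done
}

dump_nvme /dev/nvme[0-9]*
```

Redirecting the output to a file (for example `sh dump.sh > nvme_out.txt`, both names being placeholders) gives something easy to paste here. Remember to mask serial numbers first.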

Thanks

Sean
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Code:
root@freenas[~]# nvmecontrol devlist   
 nvme0: INTEL MEMPEK1J032GA
    nvme0ns1 (27905MB)
 nvme1: Samsung SSD 970 EVO Plus 1TB
    nvme1ns1 (953869MB)
 nvme2: Samsung SSD 970 EVO Plus 1TB
    nvme2ns1 (953869MB)
root@freenas[~]# smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL MEMPEK1J032GA
Serial Number:                      ****************
Firmware Version:                   K4110400
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          29,260,513,280 [29.2 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            5cd2e4 d5ebd00100
Local Time is:                      Wed Aug 18 14:11:21 2021 CEST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0046):     Wr_Unc DS_Mngmt Timestmp
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     2.80W       -        -    0  0  0  0  1000000   30000
 1 +     2.20W       -        -    0  1  0  1  1000000   30000
 2 +     1.80W       -        -    0  2  0  2  1000000   30000
 3 -   0.0080W       -        -    0  0  0  0  1150000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        52 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    1%
Data Units Read:                    3,310 [1.69 GB]
Data Units Written:                 9,235,495 [4.72 TB]
Host Read Commands:                 88,873
Host Write Commands:                125,792,414
Controller Busy Time:               0
Power Cycles:                       14
Power On Hours:                     10,692
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

root@freenas[~]# smartctl -a /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      ****************
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            519,938,818,048 [519 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 57019d11cf
Local Time is:                      Wed Aug 18 14:11:23 2021 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%
Data Units Read:                    10,472,704 [5.36 TB]
Data Units Written:                 143,000,335 [73.2 TB]
Host Read Commands:                 187,930,328
Host Write Commands:                2,642,888,506
Controller Busy Time:               49,085
Power Cycles:                       11
Power On Hours:                     8,726
Unsafe Shutdowns:                   7
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius
Temperature Sensor 2:               49 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

root@freenas[~]# smartctl -a /dev/nvme2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      ****************
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            519,413,784,576 [519 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 57019d11d0
Local Time is:                      Wed Aug 18 14:11:24 2021 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%
Data Units Read:                    11,662,900 [5.97 TB]
Data Units Written:                 142,320,767 [72.8 TB]
Host Read Commands:                 203,710,455
Host Write Commands:                2,576,840,674
Controller Busy Time:               49,772
Power Cycles:                       13
Power On Hours:                     8,729
Unsafe Shutdowns:                   6
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               49 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Thank you - that ought to work - it seems to have all the information I use in roughly the same place with the same key words
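Since the fields sit under the same labels on these drives, matching on the label text should carry over between models. A minimal sketch of that idea, assuming the `smartctl -a` layout posted above; the sample text is pasted from this thread rather than read from a live device, and `parse_nvme` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: pull a few NVMe health fields out of "smartctl -a" text by label.
# parse_nvme is a hypothetical helper; it reads smartctl output on stdin.
parse_nvme() {
    # Split each line at the first "label: value" colon plus padding.
    awk -F': *' '
        /^Model Number:/    { model = $2 }
        /^Temperature:/     { split($2, t, " "); temp = t[1] }   # "43 Celsius" -> 43
        /^Available Spare:/ { spare = $2 }                       # not the Threshold line
        /^Percentage Used:/ { used = $2 }
        END { printf "%s|%s|%s|%s\n", model, temp, spare, used }
    '
}

sample='Model Number:                       Samsung SSD 970 EVO Plus 1TB
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%'

printf '%s\n' "$sample" | parse_nvme
```

Anchoring each pattern at the start of the line keeps "Available Spare:" from also picking up "Available Spare Threshold:".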

And I noticed I was using two figures wrongly, but that ought to be an easy fix, and based on a sample of 2 users with 3 different devices it's looking good.

Sean

Code:
smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21D280GA                              *** I use this
Serial Number:                      **************************                               *** I use this
Firmware Version:                   E2010325
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2                 *** I use this
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Aug 18 13:19:08 2021 BST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%                                                         *** I use this
Available Spare Threshold:          0%
Percentage Used:                    0%                                                          *** I use this
Data Units Read:                    752,734 [385 GB]                                     *** I use this (wrongly - needs fixing)
Data Units Written:                 51,416,073 [26.3 TB]                               *** I use this (wrongly - needs fixing)
Host Read Commands:                 26,493,432
Host Write Commands:                425,453,768
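On the two figures marked as needing a fix: in the NVMe SMART/Health log, one "data unit" is 1,000 units of 512 bytes, which is why 752,734 units is reported as 385 GB. A minimal sketch of the conversion using the numbers from the output above; `units_to_bytes` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: convert an NVMe "Data Units" counter to bytes.
# One data unit = 1000 * 512 bytes (NVMe SMART/Health log definition).
units_to_bytes() {
    # Strip thousands separators, then multiply; %.0f avoids exponent notation.
    printf '%s\n' "$1" | tr -d ',' | awk '{ printf "%.0f\n", $1 * 512000 }'
}

units_to_bytes "752,734"      # Data Units Read from the output above
units_to_bytes "51,416,073"   # Data Units Written from the output above
```

The first call gives 385399808000 bytes, which lines up with the bracketed 385 GB that smartctl already prints, so taking the bracketed value directly is probably the simpler route for a report.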
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
I use a script from @Spearfoot to email me a SMART report every day for all my disks (SATA & SAS). However, it doesn't recognize NVMe.
I have two Optane devices, and unless I go into the terminal manually I can see nothing about them.
How does it compare to my modified script?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Oh Poo - have you already done this?

Lol
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I can't get yours to work. It just says bad interpreter: /bin/bash^M. I am probably missing something obvious and very stupid

Code:
-rwxrwxrwx+ 1 root wheel uarch   336 Apr 18 14:23 churn_Archive_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   313 Apr 18 14:23 churn_Chaos_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   327 Apr 18 14:23 churn_Common_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   345 Apr 18 14:23 churn_Download_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   336 Apr 18 14:23 churn_Install_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   372 Apr 18 14:23 churn_KnightGroup_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   318 Apr 18 14:23 churn_Media_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   309 Apr 18 14:23 churn_Plex_dataset.sh*
-rwxrwxrwx+ 1 root wheel uarch   362 Apr 18 14:23 churn_Plex-Music_dataset.sh*
drwxrwxrwx+ 2 root wheel uarch    27 Aug 18 03:00 configs/
drwxrwxrwx+ 2 root wheel uarch    16 Aug 18 00:00 data/
-rwxrwxrwx+ 1 root wheel uarch 12006 Jun 10  2020 disk_burn.sh*
-rwxrwxrwx+ 1 root wheel uarch 12011 May  9 20:35 disk_test.sh*
-rwxrwxrwx+ 1 root wheel uarch  4771 Mar 14 10:54 get_hdd_temp.sh*
-rwxrwxrwx+ 1 root wheel uarch   114 Nov 18  2020 mount_at_start.sh*
-rwxrwxrwx+ 1 root wheel uarch  2185 Aug 18 15:33 nvme_out*
-rwxrwxrwx+ 1 root wheel uarch 70549 Aug 18 15:56 report.sh*
-rwxrwxrwx+ 1 root wheel uarch   894 Jun  5 19:58 rotate_temp_log.sh*
-rwxrwxrwx+ 1 root wheel uarch  1657 May 13  2020 set_hdd_erc.sh*
-rwxrwxrwx+ 1 root wheel uarch 12296 Aug 18 15:47 smart_report.sh*
-rwxrwxrwx+ 1 root wheel uarch  9616 Mar 14 02:29 spindowntimer.sh*
-rwxrwxrwx+ 1 root wheel uarch   176 Jun 17  2020 temp_log.sh*
-rwxrwxrwx+ 1 root wheel uarch  3073 May 13 02:09 ups_report.sh*
-rwxrwxrwx+ 1 root wheel uarch  5429 Nov 18  2020 zpool_report.sh*
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
There is no /bin/bash on FreeBSD. Change it to /bin/sh or, if the script relies on bashisms, to /usr/local/bin/bash ...
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Still doesn't work - now comes up with
Code:
./report.sh: line 3: $'\r': command not found
./report.sh: line 5: $'\r': command not found
./report.sh: line 10: $'\r': command not found
./report.sh: line 13: $'\r': command not found
./report.sh: line 16: $'\r': command not found
./report.sh: line 70: $'\r': command not found
./report.sh: line 71: $'\r': command not found
./report.sh: line 73: syntax error near unexpected token `$'{\r''
'/report.sh: line 73: `function rpConfig () {


I did download v1.3 accidentally (don't ask) - which did work (and looks pretty) - but 1.7 just fails and I do not know enough to know why
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Looks like bash specific code - did you try #!/usr/local/bin/bash?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yes

#!/usr/local/bin/bash
# shellcheck disable=SC1004,SC2236

#set -euxo pipefail

###### ZPool, SMART, and UPS Status Report with TrueNAS Config Backup
### Original Script By: joeschmuck
### Modified By: bidelu0hm, melp, fohlsso2, onlinepcwizard, ninpucho, isentropik, dak180
### Last Edited By: dak180

### At a minimum, enter email address and set defaultFile to 0 in the config file.
### Feel free to edit other user parameters as needed.

### Current Version: v1.7
### https://github.com/dak180/FreeNAS-Report
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
OK - sorry, no idea.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Thanks anyway - hopefully @dak180 will come along at some point.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Another question - the @Spearfoot script is nearly done with one small issue that is confusing me. Scripting is NOT my strong point


And this is what I am doing - I have not tried to remove the [ or the ] yet

Code:
for drive in $NVME_list; do
    (
    devid=$(basename "$drive")
    "$smartctl" -a "$drive" | \
    awk -v device=$devid \ '
    /Serial Number:/{serial=$3}
    /Temperature:/{temp=$2} \
    /Available Spare:/{avail_spare=$3} \
    /Percentage Used:/{perc_used=$3} \
    /Data Units Read:/{data_read=$5} \
    /Data Units Read:/{data_read_unit=$6} \
    /Data Units Written:/{data_written=$5} \
    /Data Units Written:/{data_written_unit=$6} \
    END {
    data_read="${data_read} ${data_read_unit}"
    data_written="${data_written} ${data_written_unit}"
    printf "|%-6s|%-24s|%-4s|%-9s%|%-4s|%-11s|%-12s|\n",device,serial,temp,avail_spare,perc_used,data_read,data_written;
    }'
    ) >> "$logfile"
  done
  (
   echo "+------+------------------------+----+---------+----+-----------+------------+"
  ) >> "$logfile"
fi


data_read does contain the value: [363
data_read_unit does contain the value: GB]
What I need to do is simply concatenate the two values (possibly adding a space) and then remove the [ and the ] from the result.
I have tried to do this just before the printf, but nothing I have tried works.

What I actually get displayed by the printf in the appropriate locations is:
+------+------------------------+----+---------+----+-----------+------------+
|nvme0 |******************** |33 |100% |0% |${data_read} ${data_read_unit}|${data_written} ${data_written_unit}|
|nvme1 |******************** |37 |100% |0% |${data_read} ${data_read_unit}|${data_written} ${data_written_unit}|
+------+------------------------+----+---------+----+-----------+------------+
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
You could try stripping out the special characters ('[' and ']') with sed. Here's an example of sed use from elsewhere in the script (this one trims trailing whitespace):
Code:
dfamily=$("$smartctl" -i "$drive" | grep "Model Family" | awk '{print $3, $4, $5, $6, $7}' | sed -e 's/[[:space:]]*$//')
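Another option is to do the whole thing inside awk. `${var}` is shell syntax and awk does not expand it inside a string, which would explain the literal `${data_read} ${data_read_unit}` showing up in the table. awk concatenates by simply placing values next to each other, and gsub can remove the brackets. A minimal sketch, assuming the "Data Units Read" line format posted earlier in the thread and fed from a pasted sample line rather than a live drive; `join_units` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: join "[385" and "GB]" into "385 GB" inside awk.
# join_units is a hypothetical helper; it reads smartctl output on stdin.
join_units() {
    awk '
        /Data Units Read:/ {
            data_read = $5 " " $6           # concatenation: juxtapose the pieces
            gsub(/\[|\]/, "", data_read)    # drop the [ and ] characters
        }
        END { printf "|%-11s|\n", data_read }
    '
}

line='Data Units Read:                    752,734 [385 GB]'
printf '%s\n' "$line" | join_units
```

The same two lines would apply to data_written. The posted printf format also contains `%-9s%|`, where the `%` before the `|` looks like a stray character that is probably worth removing.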
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
There is no /bin/bash on FreeBSD.
Code:
$ which bash
/bin/bash

There may not be on FreeBSD but there is on TrueNAS.
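A quick way to check this on any given box, as a small sketch (`find_bash` is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: report where bash is installed, if it is on PATH at all.
find_bash() {
    if bash_path=$(command -v bash); then
        echo "bash found at: $bash_path"
    else
        echo "bash not found on PATH"
    fi
}

find_bash
```

A script whose first line is `#!/usr/bin/env bash` asks env to look bash up on PATH, so the same file can run where bash is /bin/bash and where it is /usr/local/bin/bash.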

I did download v1.3 accidentally (don't ask) - which did work (and looks pretty) - but 1.7 just fails and I do not know enough to know why
What exactly did you download and how did you invoke the script?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I followed the link you gave - copied the text into a report.sh on my TN box which has appropriate permissions and ran with ./report.sh
I have just repeated the process - and the file now runs (maybe I had some invisible characters in there??) but produces a "Please specify a config file location"

I have edited the User-Definable Parameters to add an email address and set defaultFile="0" and even a backupLocation - but have not reached that part of the script yet

Apparently I am meant to do ./report.sh -c report.config and I eventually get an email

Just noticed - No SAS drives in this report. The script just ignores them!
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
I have just repeated the process - and the file now runs (maybe I had some invisible characters in there??) but produces a "Please specify a config file location"
Copy and paste, depending on how you do it, may not preserve line endings correctly.
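The `^M` and `$'\r'` in the earlier error messages are carriage returns, so the pasted file most likely had Windows (CRLF) line endings. A minimal sketch for checking a script and stripping them; the files here are throwaway placeholders created in a temporary directory just for the demonstration:

```shell
#!/bin/sh
# Sketch: detect and strip carriage returns (CRLF line endings) from a script.
# crlf.sh and lf.sh are placeholder names inside a temporary directory.
tmp=$(mktemp -d)
printf '#!/bin/sh\r\necho hello\r\n' > "$tmp/crlf.sh"

cr=$(printf '\r')                    # a literal carriage return character
if grep -q "$cr" "$tmp/crlf.sh"; then
    echo "carriage returns found"
fi

# tr -d deletes every CR; write to a new file rather than editing in place.
tr -d '\r' < "$tmp/crlf.sh" > "$tmp/lf.sh"
sh "$tmp/lf.sh"
```

For a real file, the same `tr -d '\r'` step applies to report.sh: write the result to a new name, check it, then move it over the original.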

Apparently I am meant to do ./report.sh -c report.config and I eventually get an email
That is correct.

Just noticed - No SAS drives in this report. The script just ignores them!
This does not surprise me, as I have no SAS drives to test against. If you want to help with this, open an issue on GitHub and I will go over the info I will need.
 