HDD Identifiers following reboot

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I have started graphing my HDD temps. I'm not interested in SSD temps, just the HDDs. The issue is that every time I reboot the NAS (for an OS upgrade, say) the disk identifiers change. For example, sdr used to be the hottest HDD, but sdr is now an Intel SSD.

So every time I reboot I have to work through the disks, noting which ones are HDDs, and select them again.
I am using a custom script from @sretalla (that I can barely read or understand) that posts temps directly to an InfluxDB database at the moment. I think I could use Home Assistant, but that would suffer from the same problem.

Anyone got any ideas?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Not sure that much can be done about that, short of modifying the script to find the serial of the disk and use that as the "name" to log it.
 

sretalla

I'll ask you to replace the get_one_hd_temp with this and test it for me as my test rig is not up right now.

Code:
sub get_one_hd_temp
{
    my $disk_dev = shift;
    # -a (rather than -A) so the output includes the serial number
    my @hdcommand = ($smartctlCmd, '-a', "/dev/$disk_dev");
    my $temp;
    my $serial;

    # Go through the smartctl output line by line
    foreach (run_command(@hdcommand)) {
        chomp;
        #print $_;
        if (/Temperature_Celsius|Airflow_Temperature_Cel/) { $temp = (split)[9]; }
        if (/Serial/) { $serial = (split)[2]; }
    }
    #print $temp;
    # Log under the serial instead of the device name, so the name survives reboots
    if ($use_influx == 1 && $influx_disks == 1) { log_to_influx("DiskTemp", $serial, $temp); }
    return $temp;
}


Basics...

Changed to use -a instead of -A for smartctl, which gives the serial in the output.

Added a line in the check (which goes through the output line by line) to match on Serial and store that.

Changed the log to influx to use serial instead of disk_dev.
 

NugentS

I'll mail you the output, but only the NVMe drives are being logged now.
 

sretalla

Just to check, how quickly did you pick it up? I changed one line, which would explain what you're seeing:

if (/Serial/) { $serial = (split)[2]; }

There was originally a 9 where there is now a 2...
 

NugentS

Yup, nailed it. I picked it up before the edit.
I'll let that run for a bit, but at first glance it looks positive.

It is still picking up 2 SSDs: the ancient SanDisk devices that don't give out good SMART stats. That's not an issue, as they are both due for replacement when I get around to it. One is showing 8% wear left, so it's due for retirement. I can also easily filter them out anyway.

No, it's 7% now and the other is 35% (both going down quite quickly). But that's because the system dataset is on the boot pool (I wonder why I did that).

Thank you
 

sretalla

Just to clarify what's going on there...

It looks for "Serial" in the line and, if found, takes element [2] of a split on whitespace (the third field, since counting starts at zero) and assigns it to $serial. With 9 in that position there was nothing there, so all the errors you were seeing were the result of that: calling the log_to_influx function with no value in the "name" position.
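
A minimal sketch of why the index matters, assuming a serial line shaped like the "Serial Number:" line smartctl prints (the serial is one that appears later in this thread). The script's bare split works on $_; the sketch passes the string explicitly, which behaves the same way.

```perl
use strict;
use warnings;

# Sample line, assumed to match the shape of smartctl's serial line.
my $line = "Serial Number:    ZL0022F7";

# Split on runs of whitespace, as the bare split in the script does on $_.
my @fields = split ' ', $line;

# Fields are numbered from zero: [0] = "Serial", [1] = "Number:", [2] = the serial.
my $serial_ok  = $fields[2];
my $serial_bad = $fields[9];   # there is no tenth field on this line, so this is undef

print "index 2: $serial_ok\n";
print "index 9: ", (defined $serial_bad ? $serial_bad : "(undefined)"), "\n";
```

Index 9 is the right position on the temperature attribute lines, which have many more columns, which is presumably how it ended up on the serial line by mistake.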
 

sretalla

It is still picking up 2 SSDs: the ancient SanDisk devices that don't give out good SMART stats. That's not an issue, as they are both due for replacement when I get around to it. One is showing 8% wear left, so it's due for retirement. I can also easily filter them out anyway.
You can edit this line to contain something that appears in the sfdisk -l output for that disk type:

next if (/Verbatim|Kingston|Elements|Enclosure|Virtual|KINGSTON|mapper/);

So for example, my boot drive is a portable SSD which doesn't give good smartctl output, so finding it from sfdisk -l :
Code:
Disk /dev/sdb: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: PSSD T7         
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes
Disklabel type: gpt
Disk identifier: 006DC377-CC46-453E-B101-A35A993B0B71

Device        Start       End   Sectors   Size Type
/dev/sdb1        40      2087      2048     1M BIOS boot
/dev/sdb2      2088   1050663   1048576   512M EFI System
/dev/sdb3  34605096 976773134 942168039 449.3G Solaris /usr & Apple ZFS
/dev/sdb4   1050664  34605095  33554432    16G Linux swap


I see that the model contains "PSSD", so I would edit the line to look like this:

next if (/Verbatim|Kingston|Elements|Enclosure|Virtual|KINGSTON|mapper|PSSD/);
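
A small sketch of how that exclusion pattern behaves, assuming the script applies it to each line of sfdisk -l output in a loop. The model lines other than "PSSD T7" are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical "Disk model:" lines; only the PSSD T7 one comes from the post above.
my @model_lines = (
    "Disk model: PSSD T7",
    "Disk model: EXAMPLE HDD 12TB",
    "Disk model: KINGSTON EXAMPLE",
);

my @kept;
foreach (@model_lines) {
    # Same pattern as in the post: skip any line that matches an excluded name.
    next if (/Verbatim|Kingston|Elements|Enclosure|Virtual|KINGSTON|mapper|PSSD/);
    push @kept, $_;
}

print "$_\n" for @kept;
```

The match is case-sensitive, which is why both "Kingston" and "KINGSTON" appear in the pattern.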
 

NugentS

Actually, I have one more request. Would it be possible to replace the serial numbers with another description?
e.g. replace "NewNAS152739404637" with "BootPoolSSD1"

This would need manual editing. If you felt like templating it, I would complete it.

I imagine extending the if ($use_influx == 1 && $influx_disks == 1) { log_to_influx("DiskTemp", $serial, $temp);}

to something like
Code:
if ($use_influx == 1 && $influx_disks == 1) {
    if ($serial == "NewNAS152739404637") { $serial == "NewDescriptor" }
   ...
   ...
   ...
   log_to_influx("DiskTemp", $serial, $temp);}


Could you sanity check my syntax please?
 

sretalla

if ($use_influx == 1 && $influx_disks == 1) {
    if ($serial == "NewNAS152739404637") { $serial == "NewDescriptor" }
    ...
    ...
    ...
    log_to_influx("DiskTemp", $serial, $temp);}
Almost there...

if ($serial == "NewNAS152739404637") { $serial == "NewDescriptor" }

Would need to be if ($serial eq "NewNAS152739404637") { $serial = "NewDescriptor";}

Although that's rather inelegant, it will probably get the job done. I would personally prefer setting up an array (or, more naturally in Perl, a hash) at the top of the script with the serial-to-name mappings and getting the name from a function that uses it. Probably not saving enough CPU cycles to matter though.

eq checks for string equality, == checks for numerical equality, and = assigns a value.

A semicolon ends each statement (Perl allows you to omit it before a closing brace, but it is safer to include it).
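
One way the lookup-table idea could look, as a rough sketch rather than the script's actual code. The names %disk_names and friendly_name are hypothetical, and the keys are assumed to be serials exactly as smartctl reports them.

```perl
use strict;
use warnings;

# Hypothetical lookup table: serial number => friendly name.
my %disk_names = (
    "152739404637" => "BootPoolSSD1",
);

# Hypothetical helper: return the friendly name if one is defined,
# otherwise fall back to the serial itself.
sub friendly_name {
    my $serial = shift;
    return exists $disk_names{$serial} ? $disk_names{$serial} : $serial;
}

print friendly_name("152739404637"), "\n";   # a mapped serial
print friendly_name("ZL0022F7"), "\n";       # an unmapped serial passes through
```

With this in place, the chain of if statements would reduce to a single call such as log_to_influx("DiskTemp", friendly_name($serial), $temp), and adding a disk means adding one line to the table.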
 

NugentS

Definitely doesn't like that. It's complaining that
Argument "ZL0022F7" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 166.
Argument "BootPoolSSD1" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 167.
Argument "BootPoolSSD2" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 168.
Argument "Exos12-1" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 169.
Argument "Exos12-2" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 170.
Argument "Exos12-3" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 171.
Argument "Exos12-4" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl line 172.
Argument "Exos12-5" isn't numeric in numeric eq (==) at ./influxtemps_SCALE.pl l
Code is
Code:
sub get_one_hd_temp
{
    my $disk_dev = shift;
    my @hdcommand = ($smartctlCmd, '-a', "/dev/$disk_dev");
    my $temp;
    my $serial;

    foreach (run_command(@hdcommand)) {
        chomp;
        #print $_;
        if (/Temperature_Celsius|Airflow_Temperature_Cel/) { $temp = (split)[9]; }
        if (/Serial/) { $serial = (split)[2]; }
    }
        #print $temp;
        if ($use_influx == 1 && $influx_disks == 1) {
            if ($serial == "NewNAS152739404637") { $serial = "BootPoolSSD1"; }
            if ($serial == "NewNAS152739404858") { $serial = "BootPoolSSD2"; }
            if ($serial == "NewNASZHZ5R0PJ") { $serial = "Exos12-1"; }
            if ($serial == "NewNASZTN027WZ") { $serial = "Exos12-2"; }
            if ($serial == "NewNASZHZ550AJ") { $serial = "Exos12-3"; }
            if ($serial == "NewNASZHZ52J66") { $serial = "Exos12-4"; }
            if ($serial == "NewNASZHZ5RY0M") { $serial = "Exos12-5"; }
            if ($serial == "NewNASZHZ5RRJS") { $serial = "Exos12-6"; }
            if ($serial == "NewNASZL2EFZAC") { $serial = "Exos12-7"; }
            if ($serial == "NewNASZL0022F7") { $serial = "Exos12-8"; }
            if ($serial == "NewNAS92X0A0L9FJDH") { $serial = "Tosh18-1"; }
            if ($serial == "NewNAS92X0A0LAFJDH") { $serial = "Tosh18-2"; }
            log_to_influx("DiskTemp", $serial, $temp);}
    return $temp;
}


Not sure why it's complaining - the values aren't numeric in the first place
 

sretalla

I edited my post... check the part about eq, == and =
 

sretalla

if ($serial == "NewNAS152739404637") { $serial = "BootPoolSSD1"; }
Just to be clear about that also... NewNAS is prepended by the Influx function, so you won't actually find it in any of the strings in the temperature check function.

So, to complete one example with all the fixes:

if ($serial eq "152739404637") { $serial = "BootPoolSSD1"; }
 

NugentS

I was scratching my head
 

sretalla

And getting ahead of you:

Add a control variable in the section with all the other settings:

my $influx_no_servername = 0;

That's on line 16 in my version.

Then change log_to_influx to:

Code:
sub log_to_influx
{
    # $type should be SensorTemp, FanSpeed, FanDuty or DiskTemp; $name should identify the item (da0, Fan 1, Temp...)
    my ($type, $name, $value) = @_;
    (my $name_nospaces = $name) =~ s/\s//g;
    my $data;
    if ($influx_no_servername == 1) {
        # Log the component name on its own
        $data = "$type,component=$name_nospaces value=$value";
    }
    else {
        # Default: prefix the component name with the server name
        $data = "$type,component=$influxdb_hostname$name_nospaces value=$value";
    }
    my $auth = "Authorization: Token $influx_token";
    my $payload = "-XPOST \"$influxdb_url\" -d \"$data\" --header \"$auth\"";
    my @influxcommand = ('curl', '-i', $payload);
    #print join(' ', @influxcommand), "\n";
    my @output = run_command(@influxcommand);
}


You would then (obviously) set it to 1 to put "no servername" into effect.

Maybe it would just be simpler to set $influxdb_hostname to "" instead.
 

sretalla

I note that I'm editing my posts a lot... if you're reading them from the email notifications, please remember to check the forum directly before using any code.
 

NugentS

[Attached image: 1687267386426.png]

Working. I'll modify the names to point to the chassis slots.

Again - Thank you
 

NugentS

At the risk of being a nuisance.

I have reverted a previous change (not because it doesn't work) and am now logging all disk temps, including SSDs. Because I can now identify which is which, I am only graphing the ones I want at this time (a selection that will survive a reboot), but I can graph other values later should I want to.

What doesn't work, for obvious reasons (the info is not available in the same way and it's a different function), is getting serial numbers for the NVMe drives.
smartctl -A/a /dev/nvme1 does not show the serial number but smartctl -i /dev/nvme1 does

Code:
root@NewNAS[/mnt/BigPool/SMB/NewNAS-Scripts/sretella]# smartctl -i /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21D280GA
Serial Number:                      PHM274360038280AGN
Firmware Version:                   E2010480
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          280,065,171,456 [280 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Jun 21 00:24:26 2023 BST


Code:
root@NewNAS[/mnt/BigPool/SMB/NewNAS-Scripts/sretella]# smartctl -A /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    2,319,890 [1.18 TB]
Data Units Written:                 275,018,941 [140 TB]
Host Read Commands:                 47,779,331
Host Write Commands:                2,688,637,646
Controller Busy Time:               1,183
Power Cycles:                       183
Power On Hours:                     26,608
Unsafe Shutdowns:                   57
Media and Data Integrity Errors:    0
Error Information Log Entries:      0


I have amended the sub get_one_nvme_temp to the following (which doesn't work)
Code:
sub get_one_nvme_temp
{
    my $disk_dev = shift;
    my @nvmecommand = ($smartctlCmd, '-A', "/dev/$disk_dev");
    my @nvmecommand2 = ($smartctlCmd, '-i', "/dev/$disk_dev");
    my $temp;
    my $serial;
    my @result;
    my @result2;
    my $pattern = qr/Temperature\:\s*(\d+)\sCelsius/;
    my $pattern2 = qr/Serial Number\:\s*(\d+)\sCelsius/;                    #Yes I know this is wrong - but removing the \sCelsius/ breaks it further. Also - I suspect the space in "Serial Number" is likley an issue

    @result = join("\n", run_command(@nvmecommand)) =~ m/$pattern/g;
    if (@result) {
        $temp = $result[0];     
        print $temp;
    }
    @result2 = join("\n", run_command(@nvmecommand2)) =~ m/$pattern2/g;
    if (@result2) {
        $serial = $result[0];     
        print $serial;
    }
    if ($use_influx == 1 && $influx_disks == 1) {
        if ($serial eq "nvme0") { $serial = "SLOG-1"; }                        #Not Working
        if ($serial eq "nvme1") { $serial = "SLOG-2"; }                        #Not Working
        log_to_influx("DiskTemp", $serial, $temp);}
    
    return $temp;
}

I'll send the whole code over in a PM.

@sretalla if you don't want to do this or don't have time, please say so. I do not want to be a nuisance.
 

sretalla

As mentioned in DM, I'll have time to look at it in a few days.

No problem.
 

sretalla

It makes sense that doing it this way doesn't work, as the pattern matching is a very different way of grabbing the wanted text compared with going line by line as in the HDD one.

What I would do is change it to use -a (which contains both -A and -i) and grab two bits from it with the pattern query, running the command only once. The code you put together should also work if you use a regex like this for pattern2 (and read the serial from $result2[0] rather than $result[0]):

Serial Number\:\s*(\S+)\s

(This says: find text that starts with "Serial Number:", followed by any amount of whitespace, then one or more non-space characters (the ones we want, hence the parentheses), then a single whitespace character (here, the end of the line).)
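
A rough sketch of the single-command approach, assuming the joined output of smartctl -a contains lines like the two copied from the outputs posted above. The sample string stands in for the real command output, and the variable names are illustrative only. It also reads the serial from its own match result; the posted code reads $result[0] for the serial, where $result2[0] looks intended.

```perl
use strict;
use warnings;

# Stand-in for join("\n", run_command(...)) on `smartctl -a /dev/nvme1`;
# both lines are copied from the outputs posted earlier in the thread.
my $output = join("\n",
    "Serial Number:                      PHM274360038280AGN",
    "Temperature:                        39 Celsius",
) . "\n";

my $temp_pattern   = qr/Temperature:\s*(\d+)\sCelsius/;
my $serial_pattern = qr/Serial Number:\s*(\S+)\s/;

my ($temp, $serial);

# A match in list context returns the captured groups (empty list on failure).
my @temp_result = $output =~ m/$temp_pattern/;
$temp = $temp_result[0] if @temp_result;

my @serial_result = $output =~ m/$serial_pattern/;
$serial = $serial_result[0] if @serial_result;   # taken from the serial match, not the temperature one

print "serial=$serial temp=$temp\n";
```

Because the serial is captured with \S+ rather than \d+, it also copes with serials that contain letters, as this one does.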
 