
Temperature monitoring

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
It might have something to do with text-parsing double-digit drive numbers, but that's speculation.
The drives are ada0-ada6 and da0-da7, so there are no double-digit drive numbers (if by that you mean that the N in adaN is two digits). My suspicion is that there are two unrelated things going on:
  • The temps of ada0 and ada1 are being parsed incorrectly (perhaps reported incorrectly), which ultimately leads to the error message I'm seeing
  • The script is only scanning /dev/adaN for temperatures, not /dev/daN
The latter appears pretty clear--temps-simple.sh is only giving temperatures for ada devices, and the output of temps-rrd-format.sh appears to be doing the same. For temps-simple.sh, I can just change the first (non-comment) line to:
Code:
for i in $(/sbin/sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, /da/)) print $i }' );

and it works fine, except that ada0 and ada1 still give odd output. It looks like the same change could be made to temps-rrd-format.sh.
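
For anyone who would rather not rely on the regex, here is a rough sketch of the same device selection using a shell `case` pattern. The function name `list_disks` is made up for illustration and is not part of the scripts; it takes the `kern.disks` string as an argument so it can be tried without calling sysctl.

```shell
# Hypothetical helper (not part of freenas-temperature-graphing).
# Prints every adaN or daN name found in a kern.disks-style string, one per line.
list_disks() {
    for dev in $1; do            # unquoted on purpose: split on whitespace
        case "$dev" in
            ada[0-9]*|da[0-9]*) echo "$dev" ;;
        esac
    done
}

# On FreeBSD this would be called as:
#   list_disks "$(/sbin/sysctl -n kern.disks)"
```

Anything that is not an ada or da device (a cd0 entry, for example) is simply skipped.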

As to the first issue, it looks like the SATA DOMs have two attributes they're reporting (194 and 231) with the label of "Temperature_Celsius". Here's the SMART output of one of them:
Code:
[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# smartctl -a /dev/ada0 
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingSpec KDM-SA.71-016GMJ
Serial Number:    985021000159
LU WWN Device Id: 5 000000 000000000
Firmware Version: 1.094.12
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA >3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Apr 19 13:03:23 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245)    Self-test routine in progress...
                    50% of test remaining.
Total time to complete Offline 
data collection:         (   32) seconds.
Offline data collection
capabilities:             (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (   1) minutes.
SCT capabilities:           (0x0039)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0013   100   100   050    Pre-fail  Always       -       0
  7 Unknown_SSD_Attribute   0x000b   100   100   050    Pre-fail  Always       -       0
  8 Unknown_SSD_Attribute   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       9943
 10 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       12
167 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       3
169 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       131073
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0012   001   001   000    Old_age   Always       -       4295229441
175 Program_Fail_Count_Chip 0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   020    Pre-fail  Always       -       128
187 Reported_Uncorrect      0x0032   000   000   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   075   075   030    Old_age   Always       -       25 (0 60 0 30 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0033   100   100   005    Pre-fail  Always       -       0
240 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       27787834
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       496520585

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress 00%      9943         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# 
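
Given that output, one way around the ambiguity would be to select the temperature by attribute ID instead of by name. This is only a sketch, assuming the usual smartctl attribute table layout (ID in the first column, raw value starting in the tenth). `smart_temp` is a hypothetical name, and I don't know that the repo's scripts do it this way.

```shell
# Hypothetical helper: read a smartctl attribute table on stdin and print
# the raw value of attribute 194 only, ignoring any other attribute that
# happens to carry the Temperature_Celsius label (231 on these DOMs).
smart_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Intended use on a real system:
#   smartctl -A /dev/ada0 | smart_temp
```

Fed the two Temperature_Celsius rows above, this prints the raw value from the 194 row and never looks at 231.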
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Thanks for the detailed SMART output. I've updated the scripts to be more specific about which line they pull from the output, and check "daX" devices in addition to "adaX" devices. I've also added some handling for null data being returned from USB devices since we'll now be trying to check them. The updated files are on github (https://github.com/seren/freenas-temperature-graphing.git)
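
As a rough illustration of the null-data handling: a reading that is empty or not a plain integer can be replaced with `U`, which `rrdtool update` accepts as an unknown value. The helper name `temp_or_unknown` is an assumption for this sketch, and the actual change in the repo may look different.

```shell
# Hypothetical helper: echo the reading if it is a non-negative integer,
# otherwise echo "U" (rrdtool's marker for an unknown value).
temp_or_unknown() {
    case "${1:-}" in
        ''|*[!0-9]*) echo "U" ;;   # empty, or contains a non-digit
        *)           echo "$1" ;;
    esac
}
```

A USB stick that returns nothing for its temperature then contributes `U` to the update string instead of leaving a hole in it.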
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
I just did a git pull, and I'm not seeing any difference in the output or the error messages. Github doesn't show any commits since 4/18. Did I miss something?
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Oops, I think I committed locally and forgot to push to github. Could you try again?
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
That seems to have done the trick. temps-simple is still only reporting the adaN devices (and not the daN ones), but temps-rrd-format looks like it's got all of them. I'll turn the cron job back on and see what happens.

Edit: OK, now running into a different issue. I get this error from the cron job:
Code:
ERROR: /mnt/tank/scripts/freenas-temperature-graphing/temps-1min.rrd: found extra data on update argument: 32:32:25:25:34:33:33:33


Code:
[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# ./temps-rrd-format.sh 
49:50:52:51:45:44:44:44:35:36:35:35:35:35:32:32:25:25:34:33:33:32
[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# ./temps-simple.sh 
ada0 - 25C  
ada1 - 25C  
ada2 - 34C  
ada3 - 33C  
ada4 - 33C  
ada5 - 33C  
dev.cpu.0.temperature: 58.0C
dev.cpu.1.temperature: 57.0C
dev.cpu.2.temperature: 63.0C
dev.cpu.3.temperature: 63.0C
dev.cpu.4.temperature: 53.0C
dev.cpu.5.temperature: 53.0C
dev.cpu.6.temperature: 57.0C
dev.cpu.7.temperature: 57.0C
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
I just had the idea that maybe there was something with the existing RRD databases not allowing the number of entries now being generated, so I deleted both .rrd files and re-enabled the cron jobs. No dice--I'm still getting the same error as noted above.
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
I think the problem was that the rrdtool database creation code hadn't been updated to handle "da" devices either. I thought that this was too small and simple to worry about duplication of code, but obviously I was wrong. :) I've updated the creation part in rrd.sh, and added a check that the device actually returns a SMART temperature. Can you wipe your rrd file, pull the latest changes, and try again?
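
The "found extra data on update argument" error fits that explanation: as far as I know, `rrdtool update` expects one value per data source, and passing more values than the file was created with produces this message. A minimal sketch of a pre-flight check follows. Both helper names are hypothetical, and the expected count has to be supplied by the caller.

```shell
# Hypothetical helpers, not part of the repo.

# Print how many colon-separated values an update string contains.
count_values() {
    echo "$1" | awk -F: '{ print NF }'
}

# Succeed only if the update string has exactly the expected number of values.
# Usage: values_match "49:50:52" 3
values_match() {
    [ "$(count_values "$1")" -eq "$2" ]
}
```

Running a check like this before the update makes a mismatch between the creation code and the collection code show up as a clear count difference rather than an rrdtool error.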
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
That seems to be working now, thanks! Any chance of adding the stats below the graph, the way FreeNAS does it on the reporting screen?

Just about time to move to the new hardware; I'll have to see how it works there.

Edit: Though it's running without errors, the CPU temp graph caps out at 50 C, and mine seems to be running higher than that most of the time. Any chance of increasing the limit, or making the limit dynamic?
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
OK, I'm getting an error again on the new hardware, no doubt due to too many CPUs:
Code:
/mnt/tank/scripts/freenas-temperature-graphing/rrd-graph.sh: line 119: LINECOLORS[$i]: unbound variable

System reports 32 CPUs (2x 8-core CPUs with hyperthreading).
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Yup, that's way more than I anticipated (my system has 2 cores and 4 drives). I've added a larger color palette and added some logic to reuse colors if there are more devices than colors. Let me know if that works.
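
The color reuse boils down to wrapping the device index around the palette size. Here is a small sketch of the idea in plain sh; `pick_color` and the example colors are made up for illustration, and the real script uses a bash array (`LINECOLORS`) instead.

```shell
# Hypothetical helper: pick a color for a zero-based device index,
# wrapping around when there are more devices than colors.
# Usage: pick_color INDEX COLOR...   (at least one color is required)
pick_color() {
    idx=$1
    shift                         # remaining arguments are the palette
    shift $(( idx % $# ))         # skip ahead to the wrapped position
    echo "$1"
}
```

With a bash array the same wrap-around can be written as `${LINECOLORS[$(( i % ${#LINECOLORS[@]} ))]}`, which stays within bounds even with 32 CPUs.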
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
Yes, that seems to work. On the scaling issue, removing the --rigid parameter in rrd-graph.sh seems to do the trick.
[Attached images: temps-1min-cpus.png (two attachments)]
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Nice. There are some variables at the beginning of the rrd-graph.sh script which define the max and min ranges for the temperature graph. They're useful when the script is generating a graph for each individual drive and/or CPU so that you get a consistent scale. That code is commented out at the moment (individual graphs didn't seem very useful), so removing the --rigid parameter is a more useful and flexible option.
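
To make the trade-off concrete, here is a sketch of how the scale options could be assembled. `--lower-limit`, `--upper-limit` and `--rigid` are real `rrdtool graph` options; the function name and the example limits are assumptions, not what rrd-graph.sh actually contains.

```shell
# Hypothetical helper: build the y-axis options for rrdtool graph.
# $1 = lower limit, $2 = upper limit,
# $3 = "rigid" to keep the axis fixed even when data falls outside the limits.
graph_scale_opts() {
    opts="--lower-limit $1 --upper-limit $2"
    if [ "${3:-}" = "rigid" ]; then
        opts="$opts --rigid"
    fi
    echo "$opts"
}
```

Without `--rigid`, rrdtool treats the limits as a starting range and expands the axis when values exceed them, which is why dropping the flag lets CPU temperatures above 50 C show up.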

-s
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
Hmmm... I've now moved my FreeNAS install to the new hardware (will update my sig shortly), installed 9.10, etc. CPU temps are reporting just fine, but almost all the drive temps are missing. The drives now are two SSDs on ada0 and ada1, and 18 regular drives on da0-da17. The graph shows only the SSDs:
[Attached image: temps-1min-drives.png]

(ada0 isn't showing because it reports its temp as 128 C, well off the chart). The temps-simple.sh script does report all the drives (yeah, I know they're running a little warm):
Code:
[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# ./temps-simple.sh 
dev.cpu.31.temperature: 47.0C
dev.cpu.30.temperature: 48.0C
dev.cpu.29.temperature: 48.0C
dev.cpu.28.temperature: 48.0C
dev.cpu.27.temperature: 48.0C
dev.cpu.26.temperature: 49.0C
dev.cpu.25.temperature: 47.0C
dev.cpu.24.temperature: 47.0C
dev.cpu.23.temperature: 46.0C
dev.cpu.22.temperature: 46.0C
dev.cpu.21.temperature: 44.0C
dev.cpu.20.temperature: 44.0C
dev.cpu.19.temperature: 48.0C
dev.cpu.18.temperature: 48.0C
dev.cpu.17.temperature: 46.0C
dev.cpu.16.temperature: 46.0C
dev.cpu.15.temperature: 43.0C
dev.cpu.14.temperature: 43.0C
dev.cpu.13.temperature: 50.0C
dev.cpu.12.temperature: 50.0C
dev.cpu.11.temperature: 46.0C
dev.cpu.10.temperature: 46.0C
dev.cpu.9.temperature: 51.0C
dev.cpu.8.temperature: 51.0C
dev.cpu.7.temperature: 47.0C
dev.cpu.6.temperature: 47.0C
dev.cpu.5.temperature: 46.0C
dev.cpu.4.temperature: 46.0C
dev.cpu.3.temperature: 45.0C
dev.cpu.2.temperature: 46.0C
dev.cpu.1.temperature: 42.0C
dev.cpu.0.temperature: 42.0C
ada0 - 128C  
ada1 - 34C  
da0 - 37C  
da1 - 37C  
da2 - 37C  
da3 - 36C  
da4 - 35C  
da5 - 34C  
da6 - 41C  
da7 - 41C  
da8 - 40C  
da9 - 38C  
da10 - 38C  
da11 - 37C  
da12 - 41C  
da13 - 42C  
da14 - 42C  
da15 - 42C  
da16 - 40C  
da17 - 37C  
[root@freenas2] /mnt/tank/scripts/freenas-temperature-graphing# 
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Hmm, can you post the smartctl -a output of one of the missing drives? Maybe the string matching is failing...
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
Sure. The first 12 drives (da0 - da11) were in the old server and were working fine on your script there. Here's the SMART output of one of them:
Code:
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.3-RELEASE amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N2493116
LU WWN Device Id: 5 0014ee 659d14129
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 27 15:11:02 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (40560) seconds.
Offline data collection
capabilities:             (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     ( 407) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.
SCT capabilities:           (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   181   021    Pre-fail  Always       -       5908
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14000
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   114   111   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13986         -
# 2  Short offline       Completed without error       00%     13962         -
# 3  Short offline       Completed without error       00%     13938         -
# 4  Short offline       Completed without error       00%     13914         -
# 5  Extended offline    Completed without error       00%     13900         -
# 6  Short offline       Completed without error       00%     13890         -
# 7  Short offline       Completed without error       00%     13866         -
# 8  Short offline       Completed without error       00%     13842         -
# 9  Short offline       Completed without error       00%     13818         -
#10  Short offline       Completed without error       00%     13794         -
#11  Short offline       Completed without error       00%     13771         -
#12  Short offline       Completed without error       00%     13746         -
#13  Extended offline    Completed without error       00%     13732         -
#14  Short offline       Completed without error       00%     13722         -
#15  Short offline       Completed without error       00%     13698         -
#16  Short offline       Completed without error       00%     13674         -
#17  Short offline       Completed without error       00%     13650         -
#18  Short offline       Completed without error       00%     13626         -
#19  Short offline       Completed without error       00%     13602         -
#20  Short offline       Completed without error       00%     13578         -
#21  Extended offline    Completed without error       00%     13564         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I did a fresh git clone into a new directory, and judging from the size of the newly-created RRD files (1.2M each), they're being created with space to store the new values. And it does appear that the data is being stored:
Code:
[root@freenas2] /mnt/ssdpool/scripts/freenas-temperature-graphing# rrdtool fetch temps-1min.rrd LAST -s -120
                          cpu0                cpu1                cpu2                cpu3                cpu4                cpu5                cpu6                cpu7                cpu8                cpu9               cpu10               cpu11               cpu12               cpu13               cpu14               cpu15               cpu16               cpu17               cpu18               cpu19               cpu20               cpu21               cpu22               cpu23               cpu24               cpu25               cpu26               cpu27               cpu28               cpu29               cpu30               cpu31                ada0                ada1                 da0                 da1                 da2                 da3                 da4                 da5                 da6                 da7                 da8                 da9                da10                da11                da12                da13                da14                da15                da16                da17

1461784680: 4.1049997400e+01 4.1049997400e+01 4.7000000000e+01 4.6049997400e+01 4.4049997400e+01 4.5149992200e+01 4.6000000000e+01 4.6950002600e+01 4.6950002600e+01 4.7000000000e+01 4.4000000000e+01 4.4950002600e+01 4.8149992200e+01 4.9249987000e+01 4.2049997400e+01 4.2000000000e+01 4.6000000000e+01 4.7000000000e+01 4.7199989600e+01 4.8149992200e+01 4.3099994800e+01 4.4049997400e+01 4.5950002600e+01 4.5950002600e+01 4.6049997400e+01 4.6049997400e+01 4.8900005200e+01 4.8950002600e+01 4.5000000000e+01 4.5000000000e+01 5.0000000000e+01 5.1000000000e+01 nan 3.4000000000e+01 3.7000000000e+01 3.7000000000e+01 3.6049997400e+01 3.6000000000e+01 3.5000000000e+01 3.4000000000e+01 4.0000000000e+01 4.1000000000e+01 4.0000000000e+01 3.8000000000e+01 3.8000000000e+01 3.7000000000e+01 4.1000000000e+01 4.2000000000e+01 4.2000000000e+01 4.1000000000e+01 4.0000000000e+01 3.7000000000e+01
1461784740: 4.1000000000e+01 4.1000000000e+01 4.7951916767e+01 4.7903833533e+01 4.4951916767e+01 4.5951916767e+01 4.6951916767e+01 4.7000000000e+01 4.9855750300e+01 4.8903833533e+01 4.4000000000e+01 4.4048083233e+01 4.8951916767e+01 4.9000000000e+01 4.2951916767e+01 4.2951916767e+01 4.8855750300e+01 4.9855750300e+01 4.7951916767e+01 4.7048083233e+01 4.3951916767e+01 4.4000000000e+01 4.8855750300e+01 4.7903833533e+01 4.6951916767e+01 4.6951916767e+01 5.1855750300e+01 5.1855750300e+01 4.5951916767e+01 4.5951916767e+01 4.7144249700e+01 4.6240416167e+01 nan 3.4000000000e+01 3.7951916767e+01 3.7000000000e+01 3.6951916767e+01 3.6000000000e+01 3.5000000000e+01 3.4000000000e+01 4.0000000000e+01 4.1000000000e+01 4.0000000000e+01 3.8000000000e+01 3.8000000000e+01 3.7000000000e+01 4.1000000000e+01 4.2000000000e+01 4.2000000000e+01 4.1000000000e+01 4.0000000000e+01 3.7000000000e+01
1461784800: nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 

Seren

Junior Member
Joined
Feb 18, 2016
Messages
21
Hmm, then I guess it's probably in the graphing script.

Could you add a line to the rrd-graph.sh script around line 20 with the contents: "set -x" (no quotes), and change line 80 from " --rigid > /dev/null" to " --rigid"? Then run it manually and post the output? I can also make those changes in a separate git branch that you can check out if that's easier, but you seem pretty technically competent. :)
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
Sure, it's attached since it's too big to post inline.
 


Jacopx

Senior Member
Joined
Feb 19, 2016
Messages
365
Sorry for my ignorance, but I haven't understood how to view and use these scripts! :(
 

danb35

Wizened Sage
Joined
Aug 16, 2011
Messages
11,739
Put the scripts in a share on your server. Set up cron jobs as indicated in the crontab.txt file, adjusting the pathname to match your shared folder. The graphs will be generated in that share, and you can view them on any computer that has that share open. They don't appear in the web GUI.
 

i3luefire

Member
Joined
Jan 4, 2014
Messages
69
This seems pretty freaking awesome, thanks. But I get this error:
Code:
./rrd.sh: line 24: $1: unbound variable
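
That message is what bash prints when a script runs with `set -u` (nounset) and refers to `$1` without having been given a first argument, so the likely cause is that rrd.sh was started without the parameter it expects. What that parameter should be is not stated in this thread; the crontab.txt file is the place to check. A minimal sketch of a friendlier guard, with a hypothetical function name:

```shell
set -u   # treat unset variables as errors, as the script appears to do

# Hypothetical guard: fail with a hint instead of "unbound variable".
first_arg_or_fail() {
    if [ -z "${1:-}" ]; then
        echo "error: a first argument is required" >&2
        return 1
    fi
    echo "$1"
}
```

The `${1:-}` expansion substitutes an empty string when `$1` is unset, so the check itself does not trip over `set -u`.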
 