Dear all,
Just a bit of history.
Just a few days ago I assembled and installed my first DIY NAS. I took a leftover motherboard with a CPU, bought an HBA and hard drives, and now it works. I wanted it to be nearly silent, since it lives in the room where I work. Like many of us, I run SATA HDDs on an HBA in IT mode.
From the very beginning I worried about the health of my HDDs. For my build I chose the Fractal Design Define R5, a fairly large and good case, but I simply forgot to order good system fans. Because of this I had to start with the two fans that came with the case. I installed them at the front of the case to cool the HDDs. So far I have had no issues with HDD temperatures, but I still worry about them.
I looked at the built-in TrueNAS HDD temperature monitoring and found it unsuitable for monitoring temperature in the long run. If temperatures become too high, I may get an email notifying me about cooked/fried HDDs. I made a brave experiment and stopped one of the fans while the system was under moderate load. It took just three minutes for the HDDs to warm up by 7°C, to a value I was not comfortable with. In other words, a fan failure in such a system is deadly.
So I started to look for a more appropriate means to monitor HDD temperatures and react accordingly. My very first attempt was a script built on a mix of smartctl -a, grep and awk. There are tons of such examples on the Internet. Unfortunately, this construction is quite heavy. It takes resources! Even with the boot pool on an SSD, it takes a good deal of resources to run smartctl -a for the 8 HDDs I have in the system. As a next step I switched to smartctl -A, a slightly more efficient variant, which is supposed to query only the S.M.A.R.T. log and nothing more. Unfortunately, it is still too heavy. Also, smartctl wakes the drives from IDLE.
I know it is bad for drives to start and stop all the time. That is why I do not allow the drives to spin down, but do allow them to go to IDLE. This way I still get good savings in power consumption without putting too much stress on the mechanical parts.
I spent 3 days looking for a lightweight temperature monitor without any success. I wanted something simple that reports HDD temperatures and lets me react, preferably within a scripted environment. During my searches I found a few references to camcontrol. At first it looked like very dangerous black magic, but it turned out to be capable of sending raw SATA commands to a drive. As a result, I found a way to send the SMART READ DATA command (feature D0h of the ATA SMART command) to my drives and get the 512-byte S.M.A.R.T. log back. Since /bin/bash and friends cannot process binary data with acceptable overhead, I chose Perl as my scripting environment. Please meet disk_temp.pl. It is still a work in progress, but it already contains all the necessary moving pieces.
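The twelve hex bytes passed to camcontrol can be read field by field. The following is a minimal sketch, assuming the standard SAT layout of the ATA PASS-THROUGH (12) CDB; the subroutine name build_smart_read_cdb is made up for illustration and is not part of disk_temp.pl. It only builds and prints the command line, it does not touch any drive.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build the 12-byte ATA PASS-THROUGH (12) CDB as a hex string for "camcontrol cmd -c".
# Field meanings follow the SAT layout as understood here (an assumption of this sketch).
sub build_smart_read_cdb {
    my @cdb = (
        0xA1,                     # opcode: ATA PASS-THROUGH (12)
        4 << 1,                   # protocol 4 = PIO Data-In
        (1 << 3) | (1 << 2) | 2,  # T_DIR=1 (from device), BYT_BLOK=1, T_LENGTH=2 (sector count)
        0xD0,                     # FEATURES: SMART READ DATA
        0x01,                     # SECTOR COUNT: one 512-byte block
        0x00,                     # LBA low
        0x4F,                     # LBA mid:  SMART signature
        0xC2,                     # LBA high: SMART signature
        0x00,                     # DEVICE
        0xB0,                     # COMMAND: SMART
        0x00,                     # reserved
        0x00,                     # CONTROL
    );
    return join(" ", map { sprintf("%02X", $_) } @cdb);
}

my $cdb = build_smart_read_cdb();
# Print the command instead of running it, so the sketch is safe on any machine.
print "camcontrol cmd da1 -v -c \"$cdb\" -i 512 -\n";
```

The printed byte string is the same one used in the script, so the sketch is only a way to document what each byte means.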
I am not very familiar with Perl, so some parts could probably be rewritten in a more efficient way. However, it works, at least in my environment. It generates an acceptable load on my system and does not interfere with the system's main activities. This version just prints HDD temperatures, while in my environment I use the provided subroutines to monitor the HDDs and shut down the system if the temperature goes above 45°C. Of course, this is only acceptable for home users, but this is who I am.
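The shutdown logic itself is not part of the posted script. A minimal sketch of such a check, assuming temperatures are collected per drive in the same form that Parse_SMART_Temperature returns (attribute ID => value, -1 when not found), could look like this. The subroutine drives_over_limit, the sample readings and the commented-out shutdown call are illustrative assumptions, not the author's actual code.

```perl
use strict;
use warnings;

my $LimitC = 45;   # shutdown threshold mentioned in the post

# Return the drives whose hottest reported sensor is above the limit.
# A value of -1 means "attribute not found" and is ignored.
sub drives_over_limit {
    my ($limit, %temps_by_drive) = @_;
    my @hot;
    foreach my $drive (sort keys %temps_by_drive) {
        my @valid = grep { $_ != -1 } values %{ $temps_by_drive{$drive} };
        next unless @valid;                      # nothing reported for this drive
        my ($max) = sort { $b <=> $a } @valid;   # hottest sensor
        push @hot, $drive if $max > $limit;
    }
    return @hot;
}

# Made-up sample readings, keyed by device and attribute ID
my %sample = (
    da1 => { 190 => 38, 194 => 39, 231 => -1 },
    da2 => { 190 => 46, 194 => 47, 231 => -1 },
    da3 => { 190 => -1, 194 => -1, 231 => -1 },
);

my @hot = drives_over_limit($LimitC, %sample);
if (@hot) {
    print "Over ${LimitC}C: @hot\n";
    # Assumption: on a FreeBSD-based system one could power off here, e.g.
    # system("shutdown", "-p", "now");
    # It is left commented out so the sketch never shuts anything down.
}
```

Keeping the decision in a pure subroutine makes it easy to try different limits on recorded readings before wiring in a real shutdown.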
Of course, I do not guarantee that the script will work in other environments. It is just several subroutines that could help to build a system with a more agile reaction than what bare TrueNAS provides.
I will be happy if some part of this code makes some SOHO admin's life easier.
P.S. I do not handle errors here. Error handling would make this code too complicated and is not relevant for demonstration purposes.
P.P.S. Make sure to change the line below to the list of drives appropriate for your system.
my $drives = "da1 da2 da3 da4 da5 da6 da7 da8";
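Instead of typing the list by hand, it could be derived from the system. This is a minimal sketch, assuming the FreeBSD sysctl kern.disks is available and that all monitored drives appear as daN devices; the subroutine da_drives_from_list is a made-up name, and the sample string stands in for real sysctl output. Drives that should not be monitored (da0 was left out of the list above) would still have to be removed by hand.

```perl
use strict;
use warnings;

# Pick daN devices out of a "kern.disks"-style string and sort them by number.
sub da_drives_from_list {
    my ($disks) = @_;
    my @da = grep { /^da\d+$/ } split(" ", $disks);
    my @sorted = sort { substr($a, 2) <=> substr($b, 2) } @da;
    return @sorted;
}

# Sample input; on a live system this could come from: my $disks = `sysctl -n kern.disks`;
my $sample_disks = "da8 da7 da6 da5 da4 da3 da2 da1 da0 ada0 cd0\n";

my $drives = join(" ", da_drives_from_list($sample_disks));
print "$drives\n";
```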
P.P.P.S. Since I now know how to talk to the drives through SATL ATA PASS-THROUGH commands, it might be wise to develop a small Perl/Python binary extension to reduce the monitoring overhead further, to a negligible value.
P.P.P.P.S. In a similar way it is possible to extract any S.M.A.R.T. attribute from the S.M.A.R.T. log. Actually, camcontrol is a very agile instrument. It is possible to do virtually everything with HDDs using it.
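As a minimal sketch of that idea, the subroutine below walks all 30 attribute slots of a 512-byte S.M.A.R.T. data block and returns every attribute it finds, including the 6-byte raw value. It assumes the usual 12-byte entry layout (ID, 2 flag bytes, current value, worst value, 6 raw bytes, 1 reserved byte) starting at offset 2; parse_smart_attributes is a made-up name, and the input here is synthetic data rather than output from a real drive.

```perl
use strict;
use warnings;

# Walk the 30 attribute slots of a 512-byte SMART data block.
# Returns a hash: attribute ID => { value, worst, raw }.
sub parse_smart_attributes {
    my @smart = @_;
    my %attr;
    for (my $idx = 2; $idx < 362; $idx += 12) {
        my $id = $smart[$idx];
        next if $id == 0;                 # empty slot
        my $raw = 0;
        # 6-byte raw value, little-endian, bytes 5..10 of the entry
        for my $i (reverse 5 .. 10) {
            $raw = $raw * 256 + $smart[$idx + $i];
        }
        $attr{$id} = {
            value => $smart[$idx + 3],    # normalized current value
            worst => $smart[$idx + 4],    # normalized worst value
            raw   => $raw,
        };
    }
    return %attr;
}

# Synthetic 512-byte block with two made-up attributes
my @block = (0) x 512;
@block[2 .. 13]  = (9,   0x32, 0x00, 99, 99, 0x10, 0x27, 0, 0, 0, 0, 0);  # ID 9, raw 0x2710
@block[14 .. 25] = (194, 0x22, 0x00, 36, 45, 36,   0,    0, 0, 0, 0, 0);  # ID 194, raw 36

my %attr = parse_smart_attributes(@block);
foreach my $id (sort { $a <=> $b } keys %attr) {
    printf("%3d value=%d worst=%d raw=%d\n", $id, @{ $attr{$id} }{qw(value worst raw)});
}
```

With a real drive, the array returned by Query_SMART_Log could be passed in place of the synthetic block. How the raw value should be interpreted is vendor specific.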
--
With best regards,
Puzavi
Code:
#!/usr/local/bin/perl

# Initial experiments
#camcontrol cmd da1 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da2 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da3 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da4 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da5 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da6 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da7 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"
#camcontrol cmd da8 -v -c "A1 08 0E D0 01 00 4F C2 00 B0 00 00" -i 512 "i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1"

use strict;
use warnings;
use bigint;

### Parameters ###
my $drives = "da1 da2 da3 da4 da5 da6 da7 da8";

my $AttrAirTempID = 190; # Air temperature inside HDD body, value, or 100 - value
my $AttrHDATempID = 194; # Temperature as reported by HDD built-in sensor
my $AttrTempID    = 231; # Yet another unknown temperature

################################################################################
# Parse S.M.A.R.T. data from HDD/SSD for Temperature attributes
# Input:
#   @smart - 512B of smart log in form of Perl array
# Output:
#   %Temp  - hash table with parsed temperatures
sub Parse_SMART_Temperature {
    ############################################################################
    # S.M.A.R.T. attributes are fixed-size data structures, 12 bytes long. To
    # find the necessary structure we walk through all the records, looking for
    # the necessary one. We are interested in records with ID 190, 194 and 231,
    # representing:
    #   190 - air temperature inside HDD body, value, or 100 - value
    #   194 - temperature as reported by HDD built-in sensor
    #   231 - yet another temperature
    my @smart = @_;
    my $idx;

    # Return hash placeholder, if attribute is not found, -1 will remain as temperature value
    my %Temp = ("$AttrAirTempID" => -1, "$AttrHDATempID" => -1, "$AttrTempID" => -1);

    # 30 attribute slots of 12 bytes each, starting at offset 2
    for ($idx = 2; $idx < 361; $idx = $idx + 12) {
        my $AttrID   = $smart[$idx + 0];
        my $Flags    = $smart[$idx + 1] + ($smart[$idx + 2] << 8);
        my $Current  = $smart[$idx + 3];   # normalized current value
        my $Worst    = $smart[$idx + 4];   # normalized worst value
        my $Reserved = $smart[$idx + 11];  # reserved byte; thresholds are not part of this record

        if ($AttrID == $AttrAirTempID) {
            $Temp{"$AttrAirTempID"} = 100 - $Current;
        } elsif ($AttrID == $AttrHDATempID) {
            # Uses the normalized value, which works on my drives; some drives may
            # report the temperature in the raw value (bytes 5..10) instead
            $Temp{"$AttrHDATempID"} = $Current;
        } elsif ($AttrID == $AttrTempID) {
            # I have no such HDD, so I have no idea how to handle this attribute
        }
    }
    return %Temp;
}

################################################################################
# Query HDD/SSD S.M.A.R.T. data log
# Input:
#   $DeviceID - device to query suitable for camcontrol cmd: da1 => /dev/da1 will be queried
# Output:
#   @smart - 512B Perl array with S.M.A.R.T. data
sub Query_SMART_Log {
    my $camc_prefix = "camcontrol cmd ";
    my $camc_suffix = " -v -c \"A1 08 0E D0 01 00 4F C2 00 B0 00 00\" -i 512 -";
    my $camc_device = $_[0];

    my $camc_cmd = $camc_prefix.$camc_device.$camc_suffix;
    my $result = `$camc_cmd`;   # raw 512-byte response on stdout

    return unpack("C*", $result);
}

################################################################################
# Query device model and device serial number
# Input:
#   $DeviceID - device to query suitable for smartctl -a /dev/$DeviceID: da1 => /dev/da1 will be queried
# Output:
#   None, models and serials are stored in %Models and %Serials hash tables
my %Models;
my %Serials;

sub Query_Models_Serials {
    my $smartctl_prefix = "smartctl -a /dev/";
    my $smartctl_suffix = " | grep 'Device Model:\\|Serial Number:' | awk '{print \$3}'";
    my $smartctl_device = $_[0];

    my $smartctl_cmd = $smartctl_prefix.$smartctl_device.$smartctl_suffix;
    my $result = `$smartctl_cmd`;

    ($Models{$smartctl_device}, $Serials{$smartctl_device}) = split("\n", $result);
}

################################################################################
# Main and initialization
my @DriveList = split(" ", $drives);

# smartctl is called only once per drive, at startup
foreach my $DeviceID (@DriveList) {
    Query_Models_Serials($DeviceID);
}

# Main reporting loop
for ( ; ; ) {
    printf("\033c");   # reset the terminal before each pass
    foreach my $DeviceID (@DriveList) {
        my @smart = Query_SMART_Log($DeviceID);
        my %Temp = Parse_SMART_Temperature(@smart);
        my $reported = 0;

        printf("/dev/%s: %s %s ", $DeviceID, $Models{$DeviceID}, $Serials{$DeviceID});
        if ($Temp{$AttrAirTempID} != -1) {
            printf("AIR: %d ", $Temp{$AttrAirTempID});
            $reported = 1;
        }
        if ($Temp{$AttrHDATempID} != -1) {
            printf("HDA: %d ", $Temp{$AttrHDATempID});
            $reported = 1;
        }
        if ($reported == 0) {
            printf("UNDETECTED");
        }
        printf("\n");
        sleep(1);
    }
}