Script: Hybrid CPU & HD Fan Zone Controller

Joined
Jan 17, 2014
Messages
4
Next question: why do I get different CPU temps depending on how I ask sysctl for them?

Code:
[root@freenas] ~# sysctl -a dev.cpu | grep temperature && sysctl -a | grep temperature
dev.cpu.3.temperature: 30.0C
dev.cpu.2.temperature: 30.0C
dev.cpu.1.temperature: 25.0C
dev.cpu.0.temperature: 25.0C
hw.acpi.thermal.tz1.temperature: 29.8C
hw.acpi.thermal.tz0.temperature: 27.8C
dev.cpu.3.temperature: 40.0C
dev.cpu.2.temperature: 40.0C
dev.cpu.1.temperature: 30.0C
dev.cpu.0.temperature: 30.0C


The temps I get using sysctl -a dev.cpu are always about 10°C lower for cores 2 and 3.
 
Joined
Jan 17, 2014
Messages
4
And I want to suggest an improvement:

I noticed that the ipmi command to set the HD duty cycle gets invoked every 3 minutes even though the duty cycle has not changed.
Code:
[root@freenas] ~# ./hybrid_fan_controller.pl
2016-10-24 17:19:10: Setting fan mode to 1 (full)
2016-10-24 17:19:15: CPU Temp: 29.0
2016-10-24 17:19:15: CPU Temp: 29.0 <= 35, CPU Fan going low.
2016-10-24 17:19:15: CPU Fan: low
2016-10-24 17:19:15: CPU Fan changing... (low)
2016-10-24 17:19:15: Setting Zone 0 duty cycle to 30%

2016-10-24 17:19:16: Maximum HD Temperature: 32
2016-10-24 17:19:16: Drives are cool enough, going to 30%
2016-10-24 17:19:16: Setting Zone 1 duty cycle to 30%

2016-10-24 17:22:17: Maximum HD Temperature: 32
2016-10-24 17:22:17: Setting Zone 1 duty cycle to 30%

2016-10-24 17:25:18: Maximum HD Temperature: 32
2016-10-24 17:25:18: Setting Zone 1 duty cycle to 30%

2016-10-24 17:28:18: Maximum HD Temperature: 32
2016-10-24 17:28:18: Setting Zone 1 duty cycle to 30%


I would suggest modifying line 248, if( !$override_hd_fan_level, to incorporate something similar to line 205, if( $old_cpu_fan_level ne $cpu_fan_level ), but of course using $hd_fan_duty and (to be added) $old_hd_fan_duty.

If the $hd_fan_duty stays the same, you suppress the debug-message (lines 504, 509, 515, 520) but you still send the ipmi command. I extracted the above using $debug=2.

With kind regards,
blue_bandana

--------- edit --------- here are my changes -----------
I marked the changed lines with #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<, I modified / added 3 lines in total.

Code:
sub main
{
   # need to go to Full mode so we have unfettered control of Fans
   set_fan_mode("full");
 
   my $cpu_fan_level = "";
   my $old_cpu_fan_level = "";
   my $override_hd_fan_level = 0;
   my $last_hd_check_time = 0;
   my $hd_fan_duty = 0;
   my $old_hd_fan_duty = 0;	#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

 
   while()
   {
	 $old_cpu_fan_level = $cpu_fan_level;
	 $cpu_fan_level = control_cpu_fan( $old_cpu_fan_level );
	
	 if( $old_cpu_fan_level ne $cpu_fan_level )
	 {
	   $last_fan_level_change_time = time;
	 }

	 if( $cpu_fan_level eq "high" )
	 {
	 
	   if( $hd_fans_cool_cpu && !$override_hd_fan_level && ($last_cpu_temp >= $cpu_hd_override_temp || $last_cpu_temp == 0) )
	   {
		 #override hd fan zone level, once we override we won't backoff until the cpu drops to below "high"
		 $override_hd_fan_level = 1;
		 dprint( 0, "CPU Temp: $last_cpu_temp >= $cpu_hd_override_temp, Overriding HD fan zone to $hd_fan_duty_high%\n" );
		 set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty_high );
		
		 $last_fan_level_change_time = time;
	   }
	 }
	 elsif( $override_hd_fan_level )
	 {
	   #restore hd fan zone level;
	   $override_hd_fan_level = 0;
	   dprint( 0, "Restoring HD fan zone to $hd_fan_duty%\n" );
	   set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty ); 
	 
	   $last_fan_level_change_time = time;
	 }

	 # periodically determine hd fan zone level
	
	 my $check_time = time;
	 if( $check_time - $last_hd_check_time > $hd_polling_interval )
	 {
	   $last_hd_check_time = $check_time;
 
	   # we refresh the hd_list from camcontrol devlist
	   # every time, because if you're adding/removing HDs we want
	   # to start checking their temps too!
	   @hd_list = get_hd_list();
	 
	   my $hd_temp = get_hd_temp();
	   $old_hd_fan_duty = $hd_fan_duty;   #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	   $hd_fan_duty = calculate_hd_fan_duty_cycle( $hd_temp, $hd_fan_duty );
	 
	   if( !$override_hd_fan_level && ( $old_hd_fan_duty != $hd_fan_duty ))  #<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	   {
		 set_fan_zone_duty_cycle( $hd_fan_zone, $hd_fan_duty );

		 $last_fan_level_change_time = time; # with the check above, this only resets when the duty cycle actually changes.
	   }
	 }
	
	 # verify_fan_speed_levels function is fairly complicated	
	 verify_fan_speed_levels(  $cpu_fan_level, $override_hd_fan_level ? $hd_fan_duty_high : $hd_fan_duty );
	
		 
	 # CPU temps can go from cool to hot in 2 seconds! so we only ever sleep for 1 second.
	 sleep 1;

   } # inf loop
}


------ edit comment ----
Also "ne" is a string comparison, but hd_duty is numeric, you can use !=
 
Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Different cores have different thermal characteristics.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Part of the issue is dealing with failure. I.e., if the fan duty fails to set (e.g. because the BMC is resetting), I need to retry. Rather than setting and then testing the value, I'm just setting it every time. At the moment, if a set is missed, the crash logic will catch the high or low case and do a reset, and it could very easily miss the next set too, since a reset takes multiple minutes.

Also, "ne" is a string comparison, but hd_duty is numeric, so you can use !=
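One possible middle ground between the two approaches, as a rough sketch only: send the duty cycle when it changes, and also re-send it once a refresh interval has elapsed, so that a set lost during a BMC reset gets repeated eventually. The helper name should_send_duty and the 600-second refresh value are assumptions for illustration; neither is part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of hybrid_fan_controller.pl.
# Returns true when the duty cycle should be (re)sent to the BMC:
#   - the value changed since the last send, or
#   - more than $refresh_interval seconds have passed since the last send.
sub should_send_duty
{
    my ( $old_duty, $new_duty, $last_send_time, $now, $refresh_interval ) = @_;

    return 1 if $old_duty != $new_duty;                       # numeric compare, not "ne"
    return 1 if $now - $last_send_time > $refresh_interval;   # periodic retry
    return 0;
}

# Example: an unchanged 30% duty polled every 180 s, with an assumed 600 s refresh.
my $last_send = 0;
for my $now ( 180, 360, 540, 720 )
{
    if( should_send_duty( 30, 30, $last_send, $now, 600 ) )
    {
        print "t=$now: re-sending duty cycle\n";
        $last_send = $now;
    }
    else
    {
        print "t=$now: unchanged, skipping ipmitool call\n";
    }
}
```

This trades a few redundant ipmitool calls for a bounded delay before a missed set is repeated; it does not verify that the BMC actually applied the value.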
 
Joined
Jan 17, 2014
Messages
4
Different cores have different thermal characteristics.
But why does core 3, for example, report a temp 10°C higher merely milliseconds after the first call? Even worse, if I change the order of the calls, core 3 (at least on my board) still reads lower when queried with sysctl -a dev.cpu.

Edit: My question does not belong in this thread, so please ignore it. Back on topic.
 
Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Sensor calibration errors, and/or thermal paste not applied uniformly, and/or die and/or IHS surfaces that aren't the best.
 

yonkoc

Explorer
Joined
Oct 26, 2011
Messages
52
It's behaving as if the duty cycle commands aren't working.

First thing, do you have 4 pin PWM fans? I assume you do since the fan speeds do go up when it sets the fan mode to full.

Have there been any system event logs posted in IPMI?

This is the code that sets the fan zone duty cycle (which is what doesn't seem to be working)

Code:
# zone,dutycycle%
sub set_fan_zone_duty_cycle
{
	my ( $zone, $duty ) = @_;
	
	if( $zone < 0 || $zone > 1 )
	{
		bail_with_fans_full( "Illegal Fan Zone" );
	}

	if( $duty < 0 || $duty > 100 )
	{
		dprint( 0, "illegal duty cycle, assuming 100%\n");
		$duty = 100;
	}
		
	dprint( 1, "Setting Zone $zone duty cycle to $duty%\n");

	`$ipmitool raw 0x30 0x70 0x66 0x01 $zone $duty`;
	
	return;
}


So, without running the fan controller, firstly, set the fan mode to Full

ipmitool raw 0x30 0x45 0x01 1

And verify that your fans spin up

ipmitool sdr | grep FAN

Then try setting fan zone 0 to 50%

ipmitool raw 0x30 0x70 0x66 0x01 0 50

again, try

ipmitool sdr | grep FAN

FAN4 should slow down to about half speed (1000RPM?)

It might take a few seconds (10?) for it to finish slowing down.

What happens when you try?

Also, you can try increasing $debug to a higher value to get some more info, but it really does seem like it's doing everything except actually setting the duty cycle.

Well, well, well, what wonders a firmware update can do.... :) I'm happy to report to anyone with an X10SL7-F that the stock BMC firmware ver 1.42, which mine at least came with, is garbage, and going to the latest 3.27 allows the script to work properly. :) Sorry I was unable to respond earlier.

So, yes. Script works, sort of... I have a question too, about the massage fan speed *= 0.8

Why is it needed, and why 0.8? Can we comment it out? I believe the fans won't spin below their minimum designated speed, and the BMC or the script gets the math wrong somewhere, or it could be the BMC's interpretation of the code. Or the code is wrong. Or maybe it's the correlation between the 3 settings for the CPU and the 4 settings for the HDD. I just don't know.

Example. CPU max fan speed is set to 2000 in the script. I set the Med value to the percentages below:
Duty set at 15%, I expect 300 RPM, instead I get 1000 RPM. Which is 50%.
Duty set at 20%, I expect 400 RPM, instead I get 1000 RPM. Which is 50%.
Duty set at 25%, I expect 500 RPM, instead I get 1100 RPM. Which is 55%.
Duty set at 30%, I expect 600 RPM, instead I get 1100 RPM. Which is 55%.
Duty set at 35%, I expect 700 RPM, instead I get 1200 RPM. Which is 60%.
Duty set at 40%, I expect 800 RPM, instead I get 1300 RPM. Which is 65%.
Duty set at 45%, I expect 900 RPM, instead I get 1300 RPM. Which is 65%.
Duty set at 50%, I expect 1000 RPM, instead I get 1400 RPM. Which is 70%.
Duty set at 55%, I expect 1100 RPM, instead I get 1400 RPM. Which is 70%.
Duty set at 60%, I expect 1200 RPM, instead I get 1500 RPM. Which is 75%.
Duty set at 65%, I expect 1300 RPM, instead I get 1500 RPM. Which is 75%.
Duty set at 70%, I expect 1400 RPM, instead I get 1600 RPM. Which is 80%.
And so on. I did not have time to run the full tests.


Example. HDD max fan speed is set to 6300 in the script. I set the Med_high value to the percentages below, and set the same value for the low and med levels (maybe that's why I'm getting these results, but I did not do that for the CPU):
Duty set at 25%, I expect 1575 RPM, instead I get 2700 RPM. Which is 43%.
Duty set at 30%, I expect 1900 RPM, instead I get 3000 RPM. Which is 47.5%.
Duty set at 40%, I expect 2500 RPM, instead I get 3600 RPM. Which is 57%.
Duty set at 50%, I expect 3150 RPM, instead I get 4100 RPM. Which is 65%.
Duty set at 60%, I expect 3800 RPM, instead I get 4700 RPM. Which is 75%.
Duty set at 70%, I expect 4400 RPM, instead I get 5200 RPM. Which is 82.5%. BMC resets all the time due to 5200>5040
Duty set at 80%, I expect 5040 RPM, instead I get 5600 RPM. Which is 89%. BMC resets all the time due to 5600>5040
Duty set at 90%, I expect 5670 RPM, instead I get 6000 RPM. Which is 95%. BMC resets all the time due to 6000>5040

Can anyone who is better at math and reading code take a look?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
What matters is the speed your fans reach at 100% duty cycle. Duty cycle is not the same as percentage of max speed.

You should set the Max Speed parameter to the specified max speed of your fan. The script then assumes that if the fan is below 80% of max speed when it expects max speed, then it's not at max speed, and if it's above 80% of max speed when it expects low speed, then it's not at low speed.

I think I see the problem... which is that you should leave high at 100% and low at 30%.

My assumption when writing the BMC logic was that high is equivalent to max speed... which is 100% duty cycle. It never occurred to me that someone wouldn't want to use their full fan power if they needed it ;)


---

Put another way, max_fan_speed is not a setting for what max fan speed you would like; it's supposed to be the fan's actual max speed.
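To make the 80% rule concrete with the numbers reported above, here is a minimal sketch. The helper fan_speed_matches_level is hypothetical and only mirrors the description in this post, not the script's actual code; it also assumes that any level the script does not expect to be max speed is checked like the "low" case.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the 80% rule described above.
# $level is "high" or "low"; returns 1 if the measured RPM is consistent
# with that level, 0 otherwise.
sub fan_speed_matches_level
{
    my ( $rpm, $max_fan_speed, $level ) = @_;
    my $threshold = $max_fan_speed * 0.8;

    return $rpm >= $threshold ? 1 : 0 if $level eq "high";   # expected max speed
    return $rpm <= $threshold ? 1 : 0 if $level eq "low";    # expected a lower speed
    die "unknown level: $level\n";
}

# HDD zone from the report: max speed 6300 RPM, so the threshold is 6300 * 0.8.
# 4700 RPM was measured at 60% duty and 5200 RPM at 70% duty.
my $max = 6300;
for my $rpm ( 4700, 5200 )
{
    my $ok = fan_speed_matches_level( $rpm, $max, "low" );
    print "$rpm RPM vs threshold ", $max * 0.8, ": ", ( $ok ? "ok" : "mismatch" ), "\n";
}
```

Under these assumptions, any non-max duty setting that drives the fan above 80% of max_fan_speed looks like a fault to the check, which would match the repeated BMC resets reported at 70% duty and above.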
 
podo

Dabbler
Joined
Sep 5, 2016
Messages
10
Hi all,

This looks like a VERY useful tool! Do you think it will work on a Supermicro X11 board?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Think so
 

podo

Dabbler
Joined
Sep 5, 2016
Messages
10
Hi all,
it seems to be working on the X11!
Now I am playing around a little with the thresholds etc. Is there a way to completely turn off some fans when they are not needed? If I set the fans to 0%, something kicks in and sets them to 100%. In my situation the CPU fan could be completely off most of the time. Any idea how to do this?
Thanks again for the script! :)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Nope. The BMC will panic and surge the fans to 100% if they stop rotating.

So you can't really go below 20%. And some fans aren't rated to keep spinning below 30%.
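If you want the configuration to be safe against accidentally low values, a clamp along these lines is one option. This is only a sketch: clamp_duty is a hypothetical helper, and the 20% floor is an assumption taken from the figure above (use 30% for fans that stall earlier).

```perl
use strict;
use warnings;

# Hypothetical helper: keep a requested duty cycle inside [$min_duty, 100]
# so the fan is never asked for a value low enough to stall.
sub clamp_duty
{
    my ( $duty, $min_duty ) = @_;

    return $min_duty if $duty < $min_duty;
    return 100       if $duty > 100;
    return $duty;
}

print clamp_duty( 0, 20 ), "\n";    # a request for 0% is raised to the floor
print clamp_duty( 45, 20 ), "\n";   # in-range values pass through unchanged
```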
 

podo

Dabbler
Joined
Sep 5, 2016
Messages
10
OK, thanks. I thought it would be the BMC thresholds kicking in. I did set the lower thresholds to 0, but as you say, the fans cannot go below 20-30%; they drop to 0 then :(
Another question: is it possible to control more than just 2 fan headers with one script? Or should I just start the script several times, with adapted definitions of the fan zones?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The BMC can only control zones zero and one.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The BMC will panic and surge the fans to 100% if they stop rotating.
Well, it won't panic as in "kernel panic", which is a very real possibility since it's running Linux.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
This is true. It's more of a "oh shit, the fans have stopped!"

Edit: doesn't quite have the same impact after the forum filter gets to it.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Does it mean, that the headers FAN1-X will always go together - same fan speed for all connected fans ?
Technically, the PWM output on all is going to be identical. The fans themselves may respond differently - though the same model should behave essentially the same.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Does it mean, that the headers FAN1-X will always go together - same fan speed for all connected fans ?

Yes, and the other zone is fan A.

And as Eric said, different fans have different responses to the PWM signals.
 