Script: Hybrid CPU & HD Fan Zone Controller

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
Which motherboard do you have?

Looks like there is a space in the name of your fan headers, so "FAN A" instead of "FANA" in my script.

Also, can you verify this command works

ipmitool raw 0x30 0x70 0x66 0x01 0 50

That should set zone zero to 50%

[root@freenas] ~# ipmitool raw 0x30 0x70 0x66 0x01 0 50

Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x70 rsp=0xcc): Invalid data field in request
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
A pity. Which motherboard do you have?

Unfortunately, if the fan duty raw command doesn't work and there's no alternative, the best you can do is switch between fan mode optimal/standard and full.

Worth asking SuperMicro if there are any IPMI raw commands to control the fan duty cycle on your system.
 

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
That would work, as I only need full when it gets hot... which isn't that often in the UK! I have an X9SCI-LN4F.
 

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
I did email Supermicro and the reply is below:

Using IPMItool or IPMICFG you can use the below command:

"Ipmicfg -raw 0x30 0x45 0x01 0x0#" In place of the # you can use 0, 1 or 2. This will set the fan mode to 0 – standard, 1 – Full, 2 – Optimal.

To read out the current setting you can use: "Ipmicfg -raw 0x30 0x45 0x00" The output will be 00, 01 or 02, referring to standard, full and optimal respectively.
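
A minimal Perl sketch of how those two raw commands could be wrapped, assuming ipmitool accepts the same bytes as Ipmicfg (the helper names fan_mode_set_command and parse_fan_mode are hypothetical, not part of the script):

```perl
use strict;
use warnings;

# Mode codes as given in the Supermicro reply: 0 = standard, 1 = full, 2 = optimal.
my %MODE_CODE = ( standard => 0, full => 1, optimal => 2 );
my %CODE_MODE = reverse %MODE_CODE;

# Build the raw command that sets a fan mode (hypothetical helper).
sub fan_mode_set_command {
    my ($mode) = @_;
    die "unknown fan mode '$mode'\n" unless exists $MODE_CODE{$mode};
    return sprintf( "ipmitool raw 0x30 0x45 0x01 0x%02x", $MODE_CODE{$mode} );
}

# Turn the reply of "raw 0x30 0x45 0x00" (e.g. " 01\n") into a mode name.
sub parse_fan_mode {
    my ($output) = @_;
    return undef unless defined $output && $output =~ /([0-9a-fA-F]{2})/;
    return $CODE_MODE{ hex($1) };    # undef for any code not listed above
}

print fan_mode_set_command("full"), "\n";
print parse_fan_mode(" 02\n"), "\n";
```

Keeping the command building separate from the execution makes it possible to print and check the command before letting it touch the BMC.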
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
I've been meaning to write a replacement get_cpu_temp function which'd work on your board. There is already a set_fan_mode function.

Most of the logic will be no good since you don't have dual zones or the ability to set the duty cycle, but it should be fairly easy to gut the main script to control your temps.

I.e., Optimal when it's cool, Standard when it's heating and Full when it's hot.
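
As a rough sketch of that three-step idea in Perl: the thresholds below are placeholders rather than tested values, fan_mode_for_temp is a hypothetical name, and treating a missing (zero) reading as "hot" is an assumption made to fail safe.

```perl
use strict;
use warnings;

# Placeholder thresholds in degrees C; tune for your own system.
my $WARM_TEMP = 45;
my $HOT_TEMP  = 55;

# Map a CPU temperature to one of the three fan modes.
sub fan_mode_for_temp {
    my ($cpu_temp) = @_;

    # A reading of 0 (or undef) means the temperature could not be read: fail safe.
    return "full"     if !$cpu_temp || $cpu_temp >= $HOT_TEMP;
    return "standard" if $cpu_temp >= $WARM_TEMP;
    return "optimal";
}

printf "%d C -> %s\n", $_, fan_mode_for_temp($_) for ( 30, 48, 60 );
```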
 

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
Thanks I appreciate it. At least it will work on my other X10 board.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
I've updated the script. It now reads the temperature from the kernel, rather than via IPMI. This is faster, more reliable, will work with pretty much all systems (I think), and functions while the BMC is rebooting.

This basically means that IPMI is *not* used to read the cpu or hd temp, but only to read the fan speeds, and set fan speeds/modes. Which is a much nicer implementation I think.

Because of this change to the way the CPU temp is read, I found the temperature sensing was more responsive, so I was able to increase the cpu/hd override temperature to 62C (this works well in my system). After a bit of further testing, I then changed the other CPU thresholds to 35/45/55C. Due to the responsiveness, it's not as important to catch the temperature climbing early.

The net result is that the fan controller is even more quiet, yet seems to respond better. Ie, I can now run on low with 1 thread running at 100%, and just barely tickling medium with 2. It takes over 50% utilisation for the temps to slowly climb to the point where the HD fans are needed.

This is the new get_cpu_temp_sysctl function:
Code:
sub get_cpu_temp_sysctl
{
	# significantly more efficient to filter to dev.cpu than to just grep the whole lot!
	my $core_temps = `sysctl -a dev.cpu | egrep -E \"dev.cpu\.[0-9]+\.temperature\" | awk '{print \$2}' | sed 's/.\$//'`;
	chomp($core_temps);

	dprint(3,"core_temps:\n$core_temps\n");

	my @core_temps_list = split(" ", $core_temps);
	
	dprint_list( 4, "core_temps_list", @core_temps_list );

	my $max_core_temp = 0;
	
	foreach my $core_temp (@core_temps_list)
	{
		if( $core_temp )
		{
			dprint( 2, "core_temp = $core_temp C\n");
			
			$max_core_temp = $core_temp if $core_temp > $max_core_temp;
		}
	}

	dprint(1, "CPU Temp: $max_core_temp\n");

	$last_cpu_temp = $max_core_temp; #possible that this is 0 if there was a fault reading the core temps

	return $max_core_temp;
}


@skyline65 I think there are enough useful functions in the script that you should be able to mix and match them to build a fan controller which works in your X9 system. Basically, the main loop should be fine, but you want to set the fan_mode ("optimal", "standard" or "full") rather than the duty cycle. It's important to have a verify_fan_speed function, because if the fan mode should be full and it's not, then you need to reset the BMC, and if it should be standard and it's full, then you should reset the BMC too. Most of the logic is fairly simple to understand, and all of the core functions which actually use IPMI should work and be quite useful in a modified version.
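
A minimal sketch of that check for a mode-only board, assuming the fan's full-speed RPM is known and that anything at or above 80% of it counts as running at full. The 0.8 tolerance and the name fan_mode_mismatch are illustrative assumptions, not taken from the script.

```perl
use strict;
use warnings;

# True when the measured fan speed contradicts the requested mode.
# $max_speed is the fan's RPM at 100%; 0.8 is an assumed tolerance.
sub fan_mode_mismatch {
    my ( $mode, $speed, $max_speed ) = @_;
    my $near_full = $speed >= $max_speed * 0.8;
    return $mode eq "full" ? !$near_full : $near_full;
}

# Example: a 2000 RPM fan that still reads 2000 RPM in "standard" mode.
print fan_mode_mismatch( "standard", 2000, 2000 ) ? "reset BMC\n" : "ok\n";
print fan_mode_mismatch( "full",     1950, 2000 ) ? "reset BMC\n" : "ok\n";
```

In a modified main loop this would run a little while after each mode change, so the fans have time to settle before the comparison.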
 

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
Thanks for this... as a total noob to the code side I may struggle a lot o_O, but rather than asking you to do all the hard work I will give it a go!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
Just use the dprint function. Add an "exit;" command after each line as you debug it, and just run it until you get the line right ;)

Once you get it working once through... remove the exit line ;)

for example, just add this at the top of the main loop

...

my $test_temp = get_cpu_temp_sysctl();
dprint( 0, "test_temp : $test_temp\n" );
exit;

...

And then if you run it you see that the cpu temp works...

next

set_fan_mode("full");
sleep 10;
set_fan_mode("standard");

etc...
 

skyline65

Explorer
Joined
Jul 18, 2014
Messages
95
OK... looks like I have some studying to do! :) But as with most things, once you have a cheat sheet and break it all down it is fairly logical.
 

mrd

Dabbler
Joined
Jan 9, 2016
Messages
21
Hi there, as promised I went over the code. Don't have any *bsd system on-hand to run it so if I make an invalid assumption below please correct me.
First off, I didn't see any egregious mistakes and the logic seems fine. Besides a few minor code style issues, what I would have done differently is:

1. move config stuff to a separate cfg file
1a. perhaps add a function/different script that on first-run fills that cfg file based on user input/some system test

2.
Code:
elsif( $fan_speed > 10000 || $fan_speed < 0 )
  {
  dprint( 0, "$fan_name Fan speed: $fan_speed RPM, is nonsensical\n");
  $fan_speed = -1;
  }

Don't assume 10k is nonsensical. Get its upper non-critical from ipmi at the start of the script. You can also set a hardcoded value if you want, but a value closer to 30k (do faster fans even exist?) would make more sense I think.

3. move as many sanity checks as possible to the top (such as illegal fan-zone/duty cycle). You can still have them in functions as needed ofc.

4. Move raw command bytes to a platform specific constant (e.g. x10_raw_bytes) and then check for the platform we're running on at start-up.

5. try a bmc warm reset and if it doesn't work/isn't supported do a cold reset

6. add an option to crosscheck sysctl returned values with ipmi reported ones (a sort of sanity check)

7. add the code to github :)

On the whole these are all nitpicks though. Thanks for sharing this script with us and keep up the good work!
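
Point 5 above (warm reset first, cold reset as a fallback) could look roughly like the Perl sketch below. It assumes ipmitool exits non-zero when a warm reset is rejected or unsupported, which has not been verified on these boards; reset_bmc_warm_then_cold and the injected runner are hypothetical names.

```perl
use strict;
use warnings;

# $run is a code ref that executes a command and returns its exit status
# (0 = success). Injecting it keeps the logic testable without a BMC.
sub reset_bmc_warm_then_cold {
    my ($run) = @_;
    return "warm" if $run->("ipmitool bmc reset warm") == 0;
    return "cold" if $run->("ipmitool bmc reset cold") == 0;
    return "failed";
}

# Dry run: pretend the warm reset is unsupported and the cold reset works.
my $fake = sub { return $_[0] =~ /warm/ ? 1 : 0 };
print reset_bmc_warm_then_cold($fake), "\n";

# Real use would pass a runner built on system():
# my $result = reset_bmc_warm_then_cold( sub { system( $_[0] ) } );
```

A warm reset that returns success without actually curing the stuck fan state would not be caught by this; that would still need a fan speed check afterwards.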
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
This is really neat, I'm going to play around with this over the weekend. Do I need to reset any of the ipmitool sensor min/max thresholds for this to work, or should it override those?

Depending on your fans, you might need to adjust the fan thresholds to prevent the BMC flipping out when fans spin down. Basically, if your fan thresholds are set correctly, you shouldn't generate any event log entries when fans spin down. If you do get entries, then you need to modify the thresholds. Temperature sensors shouldn't need adjusting.

BTW, you can set 0 RPM thresholds to disable the critical readings. For example, my Noctua 120mm fans spin as low as 300rpm, so I use:

ipmitool sensor thresh "FANA" lower 000 000 200
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
Thanks.

Regarding the config, although that might make sense, it would also complicate things for simple users. The benefit, of course, is that the main script could stay un-edited. You'd still need to be able to configure where the config file is kept ;)

And it would involve me doing a lot of work for no gain... since it's working in my production environment now ;)

Where I would like to improve the script is with regards to logging critical temperature events, BMC resets etc... and emailing those events. This doesn't happen currently.

Regarding 10K, the BMC sporadically returns a value, something like 27K iirc, sometime after a reset. The important thing is to filter out that value. If your fans can in fact reach 10K, then yes, that value should be changed, and perhaps a value of 25K would be better. Do fans exist that reach that? The default upper non-critical is 28K I believe, and that is above this spurious value, IIRC. I'd have to retest the BMC reset logic to retrigger the spurious reading which prompted the code. It doesn't happen every time :)

One of the problems is I don't actually remember the exact value that was a problem... was it 27800 RPM?

It is a good idea to move the sanity checks, and in fact one of my ideas for improvements was to derive and calculate a few of the settings which can be derived... for example, you can derive the fan_zone from the fan_header...

Re the raw bytes... as far as I know, the raw bytes are the same for X10/X11 etc... where they are applicable. What changes between revisions is the modes that are available. The change would be to add support for different platforms (ie not supermicro), and that may be handled better by changing which function is called, rather than which set of raw bytes are sent.

ie, store the function in a variable. Of course... I have no idea how to detect the platform :)

Re the BMC reset code. This script relies on previous testing/experience from others, where the cold reset was used. Warm might work... I wouldn't know... the case where it's needed is very hard to trigger... I've only seen it occur... I think twice in all my testing. As such, I can't easily test if the warm reset would solve it, and then I would also need to add another layer to the reset mode to "try harder" with a cold reset. At the moment, the reset code is for when a rare bad situation happens... and the cold reset does fix it.

The latest version of this script is using sysctl values for CPU temps, not the ipmitool values. I don't think there is any value in cross-checking against IPMI when the values are coming from the kernel. I've never seen the kernel values being invalid, and they're read orders of magnitude faster than with the original IPMI method.

And re github... yes... maybe one day :)

Thanks again for taking a look
 

yonkoc

Explorer
Joined
Oct 26, 2011
Messages
52
Okay,

First of all, thank you for the enormous amount of work that you've put into this. Folks like you make me smile all the time!

I've got an X10SL7-F in a 826TQ chassis and have got some questions.

Info:
Followed the directions from this article to hook up the 3 chassis fans via this PWM fan splitter cable so they are on the FANA header, CPU is on FAN4. My chassis fans are rated for 7000RPM. I'm pretty sure the CPU fan is nowhere near the chassis fan RPMs. I can't imagine the fans going over 9-10k, the blades would probably detach and fly through the front of the chassis or the chassis would fly out the rails.

Screen Shot 2016-10-22 at 10.19.28 PM.png


Questions:
1. So, can anybody tell me what on earth are the values of the Low NR, Low CT, High CT, High NR? I know what the abbreviations stand for but are those RPMs??? Are those combined for the 3 fans? In other words, 25400 / 3 = 8467 RPM for High CT and 25500 / 3 = 8500 RPM for High NR? But then following this logic isn't Low NR of 133 and Low CT of 200 a bit low?
2. Why is FAN4 (CPU) reporting the same values like the FANA fans?
3. When you save the script initiating the hybrid_fan_controller.pl I assume it is saved as .sh?
4. I have an encrypted pool consisting of 11x 2TB HDDs, the 12th chassis slot is the SSD for the FreeNAS OS. I just migrated from a 32gb flash drive to an SSD as the boot drive and everything works fine. Now I'd like to re-use the 32gb flash drive as the storage location for the fan controller. Because the pool is encrypted I have to unlock it every time the server boots up but I am afraid I might forget to do that. And if the script resides on the encrypted pool it will not run as it will not be visible. So, as i was reading Cyberjock's awesome, amazing and informative ZFS manual it says that you can add VDevs to an existing zpool but if one of the vdevs fails ALL data is trashed. I certainly don't want to add a single 32gb flash drive as a vdev so that when it fails it takes all my data with it. I assume I need to add the flash drive as another volume (zpool), correct? That way if it dies, I just replace it with another drive.

Thanks again for all of your hard work.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
1. They are RPM for one fan. Servers have small fans with very high rotating speed to have good airflow and good static pressure at the same time (at the cost of noise, they are very loud...).

2. Because they are the default settings I guess.

4. Yes, you can create another pool on the flash drive. Be warned that flash drives are very unreliable so don't use it for anything you can't afford to lose.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
Okay,

First of all, thank you for the enormous amount of work that you've put into this. Folks like you make me smile all the time!

Thanks :)

I've got an X10SL7-F in a 826TQ chassis and have got some questions.

Info:
Followed the directions from this article to hook up the 3 chassis fans via this PWM fan splitter cable so they are on the FANA header, CPU is on FAN4. My chassis fans are rated for 7000RPM. I'm pretty sure the CPU fan is nowhere near the chassis fan RPMs. I can't imagine the fans going over 9-10k, the blades would probably detach and fly through the front of the chassis or the chassis would fly out the rails.

View attachment 14315

Questions:
1. So, can anybody tell me what on earth are the values of the Low NR, Low CT, High CT, High NR? I know what the abbreviations stand for but are those RPMs??? Are those combined for the 3 fans? In other words, 25400 / 3 = 8467 RPM for High CT and 25500 / 3 = 8500 RPM for High NR? But then following this logic isn't Low NR of 133 and Low CT of 200 a bit low?

They are the defaults and are in RPM. The lower values need to be lower than your fan will reach. So for example, my Noctua 120mm fans have a min speed of 300 rpm. So I have the lower non critical set to 200 and the lower critical and lower non recoverable set to zero.

The important thing is that these thresholds are set below your fan's lower limits, because if the fan triggers these thresholds while being modulated by the fan controller script, the BMC will take over and surge the fans to 100% to 'recover' from a stall, and write an entry to the system event log.

2. Why is FAN4 (CPU) reporting the same values like the FANA fans?

It's not, 2700 is not 1100. The other values are the default (possibly wrong) thresholds.

3. When you save the script initiating the hybrid_fan_controller.pl I assume it is saved as .sh?

Actually, I use no extension. The shebang (#!) tells the Unix system which interpreter to run it with (bash in this case), so the extension is not necessary.

4. I have an encrypted pool consisting of 11x 2TB HDDs, the 12th chassis slot is the SSD for the FreeNAS OS. I just migrated from a 32gb flash drive to an SSD as the boot drive and everything works fine. Now I'd like to re-use the 32gb flash drive as the storage location for the fan controller. Because the pool is encrypted I have to unlock it every time the server boots up but I am afraid I might forget to do that. And if the script resides on the encrypted pool it will not run as it will not be visible. So, as i was reading Cyberjock's awesome, amazing and informative ZFS manual it says that you can add VDevs to an existing zpool but if one of the vdevs fails ALL data is trashed. I certainly don't want to add a single 32gb flash drive as a vdev so that when it fails it takes all my data with it. I assume I need to add the flash drive as another volume (zpool), correct? That way if it dies, I just replace it with another drive.

Thanks again for all of your hard work.

Just put the script in root's home folder which is on the boot drive. Ie /root/

The script is designed so that if it fails to launch, your system will be at 100% fan on anyway, which is safe.

(The first thing the script does is set fan mode to Full, and this setting becomes the default across reboots. After that it temporarily overrides the fan zone duty cycles, but it leaves the fans in Full mode, always. Thus before the controller loads the fans will spin up to 100%)
 

yonkoc

Explorer
Joined
Oct 26, 2011
Messages
52
Well,

I've started testing the script. Again, I'm on X10SL7-F with all 3 chassis fans on FANA and the CPU on FAN4. I have updated my IPMI fan thresholds as below:

FAN4 | 2000.000 | RPM | ok | 400.000 | 500.000 | 600.000 | 2200.000 | 2300.000 | 2400.000
FANA | 6300.000 | RPM | ok | 1600.000 | 1700.000 | 1800.000 | 6600.000 | 6700.000 | 6800.000
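
The two lines above look like `ipmitool sensor` output, whose columns are name, reading, unit, status, then lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical and upper non-recoverable. A small Perl sketch for pulling those fields apart, for example to read a fan's thresholds at start-up instead of hardcoding them (parse_sensor_line is a hypothetical helper, not part of the script):

```perl
use strict;
use warnings;

# Column order of "ipmitool sensor": name, reading, unit, status,
# then LNR, LCR, LNC, UNC, UCR, UNR.
sub parse_sensor_line {
    my ($line) = @_;

    # Split on "|" and trim whitespace around each field.
    my @f = map { s/^\s+|\s+$//gr } split /\|/, $line;
    return undef unless @f == 10;

    my %s;
    @s{qw(name reading unit status lnr lcr lnc unc ucr unr)} = @f;
    return \%s;
}

my $fan = parse_sensor_line(
    "FANA | 6300.000 | RPM | ok | 1600.000 | 1700.000 | 1800.000 | 6600.000 | 6700.000 | 6800.000"
);
printf "%s: %d RPM, lower non-critical %d, upper non-critical %d\n",
    $fan->{name}, $fan->{reading}, $fan->{lnc}, $fan->{unc};
```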

Script is running from root.
The only changes in the script I made were:

hd_fans_cool_cpu to 0 from 1
cpu_max_fan_speed = 2300 from default (fan is stock for E3-1230v3, at 100% it spins at 2000, occasionally going to 2100, rarely to 2200)
hd_max_fan_speed = 6700 from default (my fans are the San Ace 80, technically they are rated for 6300 but I've seen them go to 6600. So 6700 was chosen.)
cpu_fan_header to FAN4 from FAN1

The moment the script starts running it goes to Full mode, I can hear the rack pulling away from the wall ;). Which is expected. But then it does none of what it says it is doing. It simply stays at full mode.
Can you advise? Would you happen to know your firmware versions? Maybe that would be one of the things I need to update. Mine are:
Firmware Revision : 01.42
BIOS Version : 2.00

2016-10-23 22:11:34: CPU Temp: 37.0 dropped below 45, CPU Fan going med.
2016-10-23 22:11:37: Maximum HD Temperature: 38
2016-10-23 22:11:37: Drives are too hot, going to 100%
2016-10-23 22:13:06: CPU Temp: 35.0 <= 35, CPU Fan going low.
2016-10-23 22:13:18: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:13:35: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:13:40: Resetting BMC
2016-10-23 22:14:27: CPU Temp: 48.0 >= 45, CPU Fan going med.
2016-10-23 22:14:37: Maximum HD Temperature: 36
2016-10-23 22:14:37: Drives are warming, going to 50%
2016-10-23 22:16:29: CPU Temp: 35.0 <= 35, CPU Fan going low.
2016-10-23 22:16:41: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:16:58: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:17:03: Resetting BMC
2016-10-23 22:17:48: Maximum HD Temperature: 35
2016-10-23 22:17:48: Drives are cool enough, going to 30%
2016-10-23 22:18:01: CPU Fan speed: 12800 RPM, is nonsensical
2016-10-23 22:18:01: HD Fan speed: 12800 RPM, is nonsensical
2016-10-23 22:18:12: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:18:12: HD fan speed should be low, but 6300 > 5360.
2016-10-23 22:18:29: CPU fan speed should be low, but 2000 > 1840.
2016-10-23 22:18:29: HD fan speed should be low, but 6300 > 5360.
2016-10-23 22:18:34: Resetting BMC
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,367
It's behaving as if the duty cycle commands aren't working.

First thing, do you have 4 pin PWM fans? I assume you do since the fan speeds do go up when it sets the fan mode to full.

Have there been any system event logs posted in IPMI?

This is the code that sets the fan zone duty cycle (which is what doesn't seem to be working)

Code:
# zone,dutycycle%
sub set_fan_zone_duty_cycle
{
	my ( $zone, $duty ) = @_;
	
	if( $zone < 0 || $zone > 1 )
	{
		bail_with_fans_full( "Illegal Fan Zone" );
	}

	if( $duty < 0 || $duty > 100 )
	{
		dprint( 0, "illegal duty cycle, assuming 100%\n");
		$duty = 100;
	}
		
	dprint( 1, "Setting Zone $zone duty cycle to $duty%\n");

	`$ipmitool raw 0x30 0x70 0x66 0x01 $zone $duty`;
	
	return;
}


So, without running the fan controller, firstly, set the fan mode to Full

ipmitool raw 0x30 0x45 0x01 1

And verify that your fans spin up

ipmitool sdr | grep FAN

Then try setting fan zone 0 to 50%

ipmitool raw 0x30 0x70 0x66 0x01 0 50

again, try

ipmitool sdr | grep FAN

FAN4 should slow down to about half speed (1000RPM?)

It might take a few seconds (10?) for it to finish slowing down.

What happens when you try?

Also, you can try increasing the $debug value to provide some more info, but it really does seem like it's doing everything except actually setting the duty cycle.
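
One way to make this kind of failure visible is to check the exit status of the raw command instead of discarding it. The Perl sketch below assumes ipmitool exits non-zero when the BMC rejects the request (as with the "Invalid data field in request" error earlier in the thread); run_checked and set_zone_duty_checked are hypothetical names.

```perl
use strict;
use warnings;

# Run a command, capture stdout and stderr, and report whether it succeeded.
sub run_checked {
    my ($cmd) = @_;
    my $output = `$cmd 2>&1`;
    my $ok     = ( $? == 0 ) ? 1 : 0;
    return ( $ok, $output );
}

# Same raw bytes as set_fan_zone_duty_cycle, but a rejection is reported.
sub set_zone_duty_checked {
    my ( $zone, $duty ) = @_;
    my ( $ok, $out ) = run_checked("ipmitool raw 0x30 0x70 0x66 0x01 $zone $duty");
    warn "Zone $zone duty cycle $duty% rejected: $out" unless $ok;
    return $ok;
}

# Usage on a real system:
# set_zone_duty_checked( 0, 50 ) or print "duty cycle control not available\n";
```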
 
Joined
Jan 17, 2014
Messages
4
Did I get this right:

The "# massage fan speeds" step, with its *= 0.8, is there so that a slight dip in fan speed does not invoke the error handling?

So if I get 3100 rpm as full speed I enter $cpu_max_fan_speed = 3100 and the massage function lowers this to 2480, right?

I am not supposed to subtract "a bit" myself, correct?

Otherwise I could comment out the "massage" and subtract the "a bit" myself. May I suggest renaming "massage fan speeds" to "convert max fan speed to error checking fan speed".
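
The log posted earlier in the thread is at least consistent with that reading: with cpu_max_fan_speed = 2300 and hd_max_fan_speed = 6700, the checks were made against 1840 and 5360, which is 0.8 times each value. A quick Perl check of the arithmetic (the 0.8 factor comes from the question above; the variable names are only illustrative):

```perl
use strict;
use warnings;

my $factor = 0.8;    # the "massage" factor quoted in the question

# 3100 is the example from the question; 2300 and 6700 are the values
# from the earlier log, which checked against 1840 and 5360.
for my $max ( 3100, 2300, 6700 ) {
    printf "max %d RPM -> check threshold %d RPM\n", $max, $max * $factor;
}
```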
 