Fan Scripts for Supermicro Boards Using PID Logic

Fan Scripts for Supermicro Boards Using PID Logic 2020-08-20, previous one was missing a file

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419

Have you checked you have the latest BMC firmware? I've heard of earlier versions of the firmware not supporting the more advanced fan control instructions properly.

@Glorious1
# Zone 0 - CPU/System fans, headers with number (e.g., FAN1, FAN2, etc.)
# Zone 1 - Peripheral fans, headers with letter (e.g., FANA, FANB, etc.)

BTW, the assertion above is not necessarily true. For example, on the Xeon D X10SDV boards (rev 2), FAN1-3 are in Zone 0 and FAN4 is in Zone 1. There are only 4 fan headers on these boards.

Not sure if this affects your implementation or not.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Thanks lmannyr. Perhaps the CPU log didn't print because you disabled the line
print_interim_CPU | tee -a $CPU_LOG >/dev/null ?

I spent hours going through your data and the code. It doesn't make much sense. At first, high CPU temperature correctly led to high CPU duty cycle, but it seems RPMs did not respond at all for any fans.

Then, after the first reset, the CPU fan A looked like it was responding to CPU duty, but ZONE_PER (fans 1-4) RPMs went very high, although drive temps were below the setpoint, and the PER duty cycle was only 30. But it doesn't look like CPU duty was driving PER fans, because CPU duty had been high for several cycles with no effect on fans 1-4.

I can't find anything in the code that would cause that. I haven't heard similar problems from anyone else. (???) Did you have another fan control script running at the same time?

I'll go through the code again and see if I missed something.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Have you checked you have the latest BMC firmware? I've heard of earlier versions of the firmware not supporting the more advanced fan control instructions properly.
It seems pretty clear from the log that the BMC reset was triggered by the script intentionally, if a script can have intention.

Re the different fan zone configuration, no, I wasn't aware of that, and my script would not work right with that configuration. I wonder how common boards with that configuration are? Guess I have to think of a new approach for specifying zones/headers.
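
One possible approach to specifying zones and headers per board is a small lookup in the script's configuration. The sketch below is only an illustration, not part of the published scripts: the variable names ZONE0_HEADERS and ZONE1_HEADERS and the function zone_for_header are hypothetical, and the values follow the X10SDV (rev 2) layout Stux described.

```shell
#!/bin/sh
# Hypothetical per-board settings (names are not from the published scripts).
# Values follow the X10SDV (rev 2) layout: FAN1-3 in zone 0, FAN4 in zone 1.
ZONE0_HEADERS="FAN1 FAN2 FAN3"
ZONE1_HEADERS="FAN4"

# Print the zone number for a header name; return 1 if the header is unknown.
zone_for_header() {
   for hdr in $ZONE0_HEADERS; do
      if [ "$hdr" = "$1" ]; then echo 0; return 0; fi
   done
   for hdr in $ZONE1_HEADERS; do
      if [ "$hdr" = "$1" ]; then echo 1; return 0; fi
   done
   return 1
}

# Example lookups, including a header this board does not have.
for name in FAN1 FAN4 FANA; do
   if zone=$(zone_for_header "$name"); then
      echo "$name is in zone $zone"
   else
      echo "$name is not configured on this board"
   fi
done
```

On a board with the more common layout, only the two header lists would need to change.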
 

giacombum

Explorer
Joined
Jul 18, 2017
Messages
61
I'm reading all the posts about the script because I want to use it on my Supermicro X11SL-F in a Fractal Node 804. The case came with 3 fans with two-pin connectors, so I've connected them to the case cables and to a PSU SATA port, and they're driven by the rear potentiometer switch on the case. I've also bought two Noctua fans, and then of course there is the CPU fan. Reading the Supermicro quick reference guide, I think I've misunderstood which connector is the right one for the CPU fan: I read 'Connector Fan1-Fan4, FanA -> System/CPU Fan Headers', so I connected the CPU fan to the FANA connector. Now I've read that the CPU fan has to be on connector 1, 2, 3 or 4. Is that right? The CPU fan does speed up and slow down on a very short cycle (every 8-10 seconds), so maybe the connection is wrong...

Also, can I connect the case fans (not all of them, just two, because of the number of headers) and drive them with the script? And in that case, would they be powered by the SATA connection or by the motherboard?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Hi giacombum,

If you're not using a fan control script, you should just follow what it says in the Supermicro guide or manual. I just went and checked the manual for that board. Apparently (someone correct me if I'm wrong), there is only one fan zone with 5 headers (A and 1-4). If so, it shouldn't make a difference what header you connect the CPU fan to.

Perhaps the rapid cycling in RPMs is due to some task that the processor handles at a regular interval like that?

You can't use 2- or 3-pin fans with the scripts; you need 4-pin fans, and of course the headers have to have 4 pins.
 

giacombum

Explorer
Joined
Jul 18, 2017
Messages
61
OK, thanks. I'll try to use your script for the CPU fan and the two additional fans that I've mounted.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
OK, @giacombum. Assuming you have only one fan zone, you would use spinpid.sh. It will show in the log only the RPM for a fan connected to FAN1. If you have different kinds of fans, RPMs may vary, but they should all change proportionally.
 

giacombum

Explorer
Joined
Jul 18, 2017
Messages
61
OK, @giacombum. Assuming you have only one fan zone, you would use spinpid.sh. It will show in the log only the RPM for a fan connected to FAN1. If you have different kinds of fans, RPMs may vary, but they should all change proportionally.
Thank you. Just to understand: the number of fan zones is set by the motherboard, so it doesn't matter that in my case (Fractal Design Node 804) the CPU and HDDs are separated by a partition (in practice the case is split into two parts, one for the motherboard with RAM and CPU, the other for the HDDs)? Or in that case should I consider it two fan zones?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Yes, the number of fan zones is determined entirely by the motherboard. Even if your fans are physically separated, you can't control them separately if you have only one fan zone.
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
I also noticed quite some CPU load by the script polling the IPMI CPU temperature every 2 seconds.
Thus I changed the CPU querying to use sysctl instead. Here is my modified CPU_check_adjust function:

Code:
##############################################
# function CPU_check_adjust
# Get CPU temp.  Calculate a new DUTY_CPU.
# Send to function adjust_fans.
##############################################
function CPU_check_adjust {
   DUTY_CPU_LAST=$DUTY_CPU

#  CPU_TEMP=$($IPMITOOL sdr | grep "CPU Temp" | grep -Eo '[0-9]{2,5}')
#  following may be more efficient:
#   CPU_TEMP=$($IPMITOOL sensor get "CPU Temp" | awk '/Sensor Reading/ {print $4}')

   # Get number of CPU cores to check
   CORES=$(sysctl hw.ncpu | cut -c10)
   CORES=$((CORES - 1))

   # Find hottest CPU core
   MAX_CORE_TEMP=0
   for CORE in $(seq 0 $CORES)
   do
      CORE_TEMP="$(sysctl dev.cpu.${CORE}.temperature | cut -c24-25)"
      if [[ $CORE_TEMP -gt $MAX_CORE_TEMP ]]; then MAX_CORE_TEMP=$CORE_TEMP; fi
   done
   CPU_TEMP=$MAX_CORE_TEMP

   # This will break if settings have non-integers
   DUTY_CPU=$(( (CPU_TEMP-CPU_REF)*CPU_SCALE+DUTY_CPU_MIN ))

   # Don't allow duty cycle outside min-max
   if [[ $DUTY_CPU -gt $DUTY_CPU_MAX ]]; then DUTY_CPU=$DUTY_CPU_MAX; fi
   if [[ $DUTY_CPU -lt $DUTY_CPU_MIN ]]; then DUTY_CPU=$DUTY_CPU_MIN; fi

   adjust_fans $ZONE_CPU $DUTY_CPU $DUTY_CPU_LAST

   sleep $CPU_T
   print_interim_CPU | tee -a $CPU_LOG >/dev/null
}

The code is not very portable because of the cut calls, and I expect it will have issues with CPUs that have 10 or more cores. I should probably have used awk instead, but...
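
As a rough sketch of the awk approach mentioned here: splitting on the colon instead of counting characters keeps working when the key gets longer, as it does for core 10 and up. The function name temp_from_sysctl_line is hypothetical, and the sample lines assume the usual "dev.cpu.N.temperature: 45.0C" output format rather than being captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper: extract the integer temperature from a line such as
# "dev.cpu.0.temperature: 45.0C" (assumed sysctl output format).
temp_from_sysctl_line() {
   # Split on ": " and let awk's int() keep the leading number of "45.0C".
   printf '%s\n' "$1" | awk -F': ' '{ print int($2) }'
}

# Sample lines for illustration only, not captured from a real system.
for line in "dev.cpu.0.temperature: 45.0C" "dev.cpu.10.temperature: 52.0C"; do
   echo "$(temp_from_sysctl_line "$line") <- $line"
done
```

With the fixed character positions used by cut, the second sample line would be read incorrectly because the key is one character longer.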
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I also noticed quite some CPU load by the script polling the IPMI CPU temperature every 2 seconds.
Thus I changed the CPU querying to use sysctl instead.

Thanks for reporting. I'm curious how you tied the CPU usage to the IPMI call. Also how much difference the change made in CPU usage.

Also, in terms of implementing it, I wonder if it's that critical to check all the cores and take the max. They are all pretty close in my experience, and it's all relative. Would it be that bad to just take core 1?
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
Thanks for reporting. I'm curious how you tied the CPU usage to the IPMI call. Also how much difference the change made in CPU usage.

Well, access to the BMC's keyboard controller style (KCS) system interface is listed in top.
Code:
  PID  JID USERNAME  THR PRI NICE  SIZE   RES STATE    C   TIME    CPU COMMAND
  933    0 root        1 -16    -    0K   16K ipmire   5  69:30  0.29% ipmi0: kcs

Now, with only the duty cycles being set, it is very low (< 2%).
Before, with the additional polling of the CPU temperature every 2 seconds, it was between 12 and 15 percent on my E3 1230v3.

Also, in terms of implementing it, I wonder if it's that critical to check all the cores and take the max. They are all pretty close in my experience, and it's all relative. Would it be that bad to just take core 1?
Yeah, in the machines we see around here all the cores typically have close to the same temperature. This could be different for multi-socket systems, though. If you have 2 CPUs and the system is cache- and/or NUMA-aware, it will try to keep thread execution within those boundaries. Then you could see one hot CPU and one cold CPU for workloads small enough to fit on one CPU. Now, if you were to query just the cold CPU for temperatures to assess the cooling requirements...
But to be honest, I just did it because it was an easy thing to do. If you are concerned about the performance of the loop: don't bother. I briefly thought about unrolling it for my 8 cores, but it's really not worth the effort.
I did, however, make a small improvement by moving the query for the number of cores to the setup section.
Code:
   # Get number of CPU cores to check
   CORES=$(sysctl hw.ncpu | cut -c10)
   CORES=$((CORES - 1))

There is really no need to do this every 2 seconds, over and over again. The number of CPU cores is static and can be fetched just once.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Thanks. You obviously know a lot more about this stuff than I do, so maybe I'll just take your word for it and put that code in if it's OK with you.

But I don't get any ipmi or kcs process showing in top. Even in htop, when I filter for 'ipm' or 'kcs', I get nothing. Did you have any special top options to get that to show?
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
I wasn't really aware of that, but you are probably right. I'm running top in my tmux monitoring session as
top -jiSCzt. After a quick glance at the man page, I'd assume the -S parameter could be relevant for displaying the IPMI process:
-S Show system processes in the display. Normally, system processes such as the pager and the swapper are not shown. This option makes them visible.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I wasn't really aware of that, but you are probably right. I'm running top in my tmux monitoring session as
top -jiSCzt. After a quick glance at the man page, I'd assume the -S parameter could be relevant for displaying the IPMI process:
OK, good to know. I set my script to read CPU temps every 2 seconds. In htop (where you do something similar to -S using the settings menu) ipmi0 kcs is normally at 0.9 - 1.4%. Since I have 8 cores, I have 800% to play with, so that doesn't seem like a lot.

Processes like autosnap (when it runs) and the Django web interface (continuously) use more CPU than ipmi0 kcs, even when nothing else is going on. I've got a wimpy Atom processor; I don't get why the powerful processors are more taxed by the IPMI check than mine is. Maybe it's because I have 8 cores?

I just switched it to the sysctl method, still every 2 seconds. Now there is a process intr, which is completely off the radar when using the IPMI method. It is ranging from 1 to 5%. Apparently intr can be a lot of things, but here it seems pretty clear it represents the sysctl process.

So my tentative conclusion would be that sysctl is worse rather than better. Where am I going wrong?
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
I cannot explain the difference. Maybe it has to do with the older IPMI kernel module I use? I'm still on 9.10.2-U6 which is based on FreeBSD 10.3. Maybe the IPMI kernel module has been improved with FreeBSD 11?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I cannot explain the difference. Maybe it has to do with the older IPMI kernel module I use? I'm still on 9.10.2-U6 which is based on FreeBSD 10.3. Maybe the IPMI kernel module has been improved with FreeBSD 11?
I just updated the post, you beat me to it.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Actually I take it back after watching top and htop longer and switching back and forth between the IPMI and sysctl method. That intr process seems to come and go on its own, independently. I don't know where the sysctl activity would be showing up, but it must not be much. So I'm sold. Thanks.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Glorious1 updated Fan Scripts for Supermicro Boards Using PID Logic with a new update entry:

More/better efficiency update

I had earlier switched to the best solution recommended by @Stux for reading CPU temperature in spincheck.sh. Now, with refined, complete bash code suggested by @bestboy, the apparently more efficient method has been incorporated into all 3 of the scripts that read temperature.

If you want the gory details, instead of reading CPU temp from the IPMI, we are now reading it from sysctl. We use the hottest of up to 10 cores as CPU temperature. I used awk instead of cut to get the actual...

 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
Great! I'm glad I could contribute.

BTW: I found a way to make the script more portable:
CORES=$(sysctl hw.ncpu | cut -c10)
won't work correctly as soon as the number of CPU cores reaches 2 or more digits. Even though that's fine for you and me - we just have 8 cores - this is going to be a problem for some users. A hexacore CPU with hyper-threading will already break the script.

This can be fixed by replacing the cut call with the -n switch of sysctl. This will make sysctl return just the value of the given key.
CORES=$(sysctl -n hw.ncpu)
should work for any number of CPU cores.
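
Putting the suggestions in this thread together, a setup-plus-loop sketch might look like the following. The function names read_core_temp and hottest_core_temp are hypothetical, the code assumes the dev.cpu.N.temperature sysctl is available and prints values like "45.0C", and the live call is guarded so the block does nothing on systems without it.

```shell
#!/bin/sh
# Read one core's temperature as an integer, e.g. "45.0C" -> 45.
# Assumes dev.cpu.N.temperature exists (FreeBSD with CPU temperature support).
read_core_temp() {
   raw=$(sysctl -n "dev.cpu.$1.temperature") || return 1
   echo "${raw%%.*}"      # drop everything from the first "." onward
}

# Print the hottest temperature among cores 0 .. (count - 1).
hottest_core_temp() {
   count=$1
   core=0
   hottest=0
   while [ "$core" -lt "$count" ]; do
      temp=$(read_core_temp "$core") || return 1
      if [ "$temp" -gt "$hottest" ]; then hottest=$temp; fi
      core=$((core + 1))
   done
   echo "$hottest"
}

# Setup section: fetch the core count once; -n prints only the value.
if sysctl -n dev.cpu.0.temperature >/dev/null 2>&1; then
   CORES=$(sysctl -n hw.ncpu)
   echo "Hottest of $CORES cores: $(hottest_core_temp "$CORES")"
fi
```

Unlike the earlier snippet, CORES here holds the core count itself rather than the count minus one, which suits the while loop.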
 