Script to control fan speed in response to hard drive temperatures

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
hi,

Some digging has perhaps produced another piece of the puzzle toward better control.
I was trying to verify that the raw command line that works on X9/X10 boards also works on an X11. So far, no definitive success.
Yet I found that when running
Code:
ipmitool -v sensor

The sensor ID is also displayed with what appears to be its raw reference value, e.g. FAN1 (0x41).

Could anyone with an X9/X10 board check which values are used there?

Here is my output:
Code:

Sensor ID              : FAN1 (0x41)
 Entity ID             : 29.1
 Sensor Type (Threshold)  : Fan
 Sensor Reading        : 600 (+/- 0) RPM
 Status                : Lower Non-Critical
 Lower Non-Recoverable : 300.000
 Lower Critical        : 500.000
 Lower Non-Critical    : 700.000
 Upper Non-Critical    : 25300.000
 Upper Critical        : 25400.000
 Upper Non-Recoverable : 25500.000
 Positive Hysteresis   : 100.000
 Negative Hysteresis   : 100.000
 Assertion Events      :
 Assertions Enabled    : lcr- lnr- ucr+ unr+
 Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FAN2 (0x42)
 Entity ID             : 29.2
 Sensor Type (Threshold)  : Fan
 Sensor Reading        :  Unable to read sensor: Device Not Present


Sensor ID              : FAN3 (0x43)
 Entity ID             : 29.3
 Sensor Type (Threshold)  : Fan
 Sensor Reading        :  Unable to read sensor: Device Not Present


Sensor ID              : FAN4 (0x44)
 Entity ID             : 29.4
 Sensor Type (Threshold)  : Fan
 Sensor Reading        : 700 (+/- 0) RPM
 Status                : Lower Non-Critical
 Lower Non-Recoverable : 300.000
 Lower Critical        : 500.000
 Lower Non-Critical    : 700.000
 Upper Non-Critical    : 25300.000
 Upper Critical        : 25400.000
 Upper Non-Recoverable : 25500.000
 Positive Hysteresis   : 100.000
 Negative Hysteresis   : 100.000
 Assertion Events      :
 Assertions Enabled    : lcr- lnr- ucr+ unr+
 Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FANA (0x45)
 Entity ID             : 29.5
 Sensor Type (Threshold)  : Fan
 Sensor Reading        : 700 (+/- 0) RPM
 Status                : Lower Non-Critical
 Lower Non-Recoverable : 300.000
 Lower Critical        : 500.000
 Lower Non-Critical    : 700.000
 Upper Non-Critical    : 25300.000
 Upper Critical        : 25400.000
 Upper Non-Recoverable : 25500.000
 Positive Hysteresis   : 100.000
 Negative Hysteresis   : 100.000
 Assertion Events      :
 Assertions Enabled    : lcr- lnr- ucr+ unr+
 Deassertions Enabled  : lcr- lnr- ucr+ unr+
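
A minimal Python sketch for pulling the sensor names, hex IDs and readings out of `ipmitool -v sensor` output like the above, so values from different boards can be compared side by side. It assumes the "Sensor ID : NAME (0xNN)" and "Sensor Reading : N ..." line layout shown here; `parse_sensors` is a hypothetical helper, not part of ipmitool.

```python
import re

# Header lines look like "Sensor ID              : FAN1 (0x41)".
SENSOR_ID_RE = re.compile(r"^\s*Sensor ID\s*:\s*(.+?)\s*\((0x[0-9A-Fa-f]+)\)\s*$")
# Reading lines look like " Sensor Reading        : 600 (+/- 0) RPM".
READING_RE = re.compile(r"^\s*Sensor Reading\s*:\s*(\d+)")


def parse_sensors(text):
    """Map sensor name -> (sensor number, reading or None).

    The reading is None when ipmitool could not read the sensor,
    e.g. "Unable to read sensor: Device Not Present".
    """
    sensors = {}
    current = None
    for line in text.splitlines():
        header = SENSOR_ID_RE.match(line)
        if header:
            current = header.group(1)
            sensors[current] = (int(header.group(2), 16), None)
            continue
        reading = READING_RE.match(line)
        if reading and current is not None:
            number, _ = sensors[current]
            sensors[current] = (number, int(reading.group(1)))
    return sensors
```

The text passed in would be the captured stdout of `ipmitool -v sensor`.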

 

hugovsky

Guru
Joined
Dec 12, 2011
Messages
567
X10SL7-F:

Code:
Sensor ID              : FAN1 (0x41)
Entity ID             : 29.1
Sensor Type (Threshold)  : Fan
Sensor Reading        :  Unable to read sensor: Device Not Present


Sensor ID              : FAN2 (0x42)
Entity ID             : 29.2
Sensor Type (Threshold)  : Fan
Sensor Reading        : 500 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 100.000
Lower Critical        : 200.000
Lower Non-Critical    : 300.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FAN3 (0x43)
Entity ID             : 29.3
Sensor Type (Threshold)  : Fan
Sensor Reading        : 400 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 100.000
Lower Critical        : 200.000
Lower Non-Critical    : 300.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FAN4 (0x44)
Entity ID             : 29.4
Sensor Type (Threshold)  : Fan
Sensor Reading        : 1100 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 100.000
Lower Critical        : 200.000
Lower Non-Critical    : 300.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FANA (0x45)
Entity ID             : 29.5
Sensor Type (Threshold)  : Fan
Sensor Reading        : 1400 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 300.000
Lower Critical        : 500.000
Lower Non-Critical    : 700.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+


A1SAi-2750-F:

Code:
Sensor ID              : FAN1 (0x41)
Entity ID             : 29.1
Sensor Type (Threshold)  : Fan
Sensor Reading        : 1200 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 200.000
Lower Critical        : 300.000
Lower Non-Critical    : 400.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FAN2 (0x42)
Entity ID             : 29.2
Sensor Type (Threshold)  : Fan
Sensor Reading        : 1100 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 200.000
Lower Critical        : 300.000
Lower Non-Critical    : 400.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+

Sensor ID              : FAN3 (0x43)
Entity ID             : 29.3
Sensor Type (Threshold)  : Fan
Sensor Reading        : 1100 (+/- 0) RPM
Status                : ok
Lower Non-Recoverable : 200.000
Lower Critical        : 300.000
Lower Non-Critical    : 400.000
Upper Non-Critical    : 25300.000
Upper Critical        : 25400.000
Upper Non-Recoverable : 25500.000
Positive Hysteresis   : 100.000
Negative Hysteresis   : 100.000
Assertion Events      :
Assertions Enabled    : lcr- lnr- ucr+ unr+
Deassertions Enabled  : lcr- lnr- ucr+ unr+
 

Dice

Thanks @hugovsky.
To my eye, this looks somewhat promising, even if it is way too early to jump to conclusions.

@Kevin Horton: Is that a typo at line 51 of your script (a missing space in the raw command)?
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
Most of the time the fan goes to the expected speed after using ipmitool raw commands to set the duty cycle, as reported by "ipmitool sdr | grep FAN", but I've had two instances of the fan speed getting stuck on 100% duty cycle (i.e. full speed). In both cases I had to use "ipmitool bmc reset warm" to regain control of the fan speed.

If anyone intends to use these commands in some sort of automatic fan control, it may be wise to have the code check that the fan isn't stuck on full speed, and have it reset the BMC if necessary.

If I execute the following on my X10SL7-F:

Code:
ipmitool bmc reset warm

I get:

Code:
MC reset command failed: Invalid
 

Dice

@Z300M, a vague hunch from today's reading suggests this might be due to an older IPMI version. If you're on IPMI 2.0 it should be fine.
 

Z300M

OK, yes. I updated the IPMI firmware to the latest version, and at least that command now works. But I can't seem to change how the fans behave: the 120mm fans stay in the 1400rpm to 1600rpm range no matter which fan mode I choose.

The reason I was looking into the fan speed issue again is that yesterday a couple of fans kept dropping to 400rpm, jumping up to 1500rpm, and falling back to 400rpm. I lowered the lower thresholds, hoping to stop this "cycling", but now the fans are running at top speed again.

The ipmitool raw commands that are claimed to alter the fan duty cycle don't seem to do anything either.

The fans are ENERMAX T.B. Silence UCTB12P 120mm PWM Function Case Fans.
 

Dice

@Z300M, did you try power-cycling the IPMI after the changes? Fans tend to get stuck between changes, as described in this thread.



Cheers
 

Z300M

Thanks. Yes, that does the trick: the 120mm fans are now running at 400-500rpm, and the CPU fan at 1100rpm instead of 2100, all with fan mode set to optimal.

How do the fan mode settings interact with the raw settings to change the duty cycle? Does changing the fan mode setting override any duty cycle settings?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I just found this thread and have been experimenting with it. Interesting. Running the command below with each hex value from the table produced the RPMs shown there. My fans are supposedly rated from about 300 to ~1200 RPM; the board is a Supermicro A1SRi-2758F.
Code:
sudo ipmitool raw 0x30 0x70 0x66 1 0 0x<Hex>

Duty (%)   Hex   RPM
10         0A    300
20         14    400
30         1E    500
40         28    600/700
50         32    800
60         3C    900
70         46    1000/1100
80         50    1100/1200
90         5A    1200/1300
100        64    1300
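
Since the duty byte is simply the percentage written in hex (30% is 0x1E, 100% is 0x64), the command can be generated rather than looked up. A minimal sketch, assuming the command layout above; treating the `0` argument as a fan-zone selector is an assumption, and `fan_duty_command`/`set_fan_duty` are hypothetical helper names.

```python
import subprocess


def fan_duty_command(percent, zone=0):
    """Build the argv for `ipmitool raw 0x30 0x70 0x66 1 <zone> <duty>`.

    The duty byte is the percentage written in hex, e.g. 30 -> 0x1E.
    Treating the "0" argument as a fan zone is an assumption.
    """
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        raise ValueError("duty cycle must be an integer from 0 to 100")
    return ["ipmitool", "raw", "0x30", "0x70", "0x66", "1", str(zone), f"0x{percent:02X}"]


def set_fan_duty(percent, zone=0):
    """Send the command; requires ipmitool and sufficient privileges."""
    subprocess.run(fan_duty_command(percent, zone), check=True)
```

Building the argument list separately from running it keeps the hex conversion testable without touching the BMC.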

I also played around with the raw fan mode command. As near as I can tell, I only have two settings that way, 0 and 1, although the IPMI GUI offers 3 fan modes. I also found that once I set it, I had to reset the BMC to get control back, so I don't like controlling the mode that way.

Regarding the question about how fan mode and raw speed control interact, it seems they don't: whichever one you set last takes effect and ignores whatever was set before.
 

Dice

@Glorious1, cool find. This raises further hope that these commands are not specific to X9/X10 boards and are reasonably safe to use on X11 as well.
 
Joined
Dec 2, 2015
Messages
730
These are IPMI commands, so I suspect the motherboard family makes no difference as long as the same IPMI version is available. My X10SL7-F has an AST2400 BMC running IPMI v2, which seems to be common on Supermicro motherboards. I would think the biggest risk, if the commands are not compatible with your board, is that you may have to reset the BMC.
 

Glorious1

I experimented a bit more with fan mode on my A1SRi-2758F by setting the mode in the IPMI GUI and reading the mode with
Code:
ipmitool raw 0x30 0x45 0  

The raw values of the modes are not intuitive:
00 - Standard
04 - Heavy IO
01 - Full speed
So I can set all three modes with raw commands using ipmitool raw 0x30 0x45 1 <VALUE>, substituting one of the values above. I'm not sure whether these correspond to the values on X9/X10 boards.
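
The mode codes above can be kept in one table so a script can both read and set the mode. A sketch assuming the A1SRi-2758F codes reported here (00, 01, 04) and that the get command prints the mode as a hex byte such as ` 04`; other boards may use different codes, and the helper names are hypothetical.

```python
# Mode codes reported for the A1SRi-2758F; other boards may differ.
FAN_MODES = {0x00: "Standard", 0x01: "Full speed", 0x04: "Heavy IO"}
MODE_CODES = {name: code for code, name in FAN_MODES.items()}


def get_mode_command():
    """argv that reads the current fan mode."""
    return ["ipmitool", "raw", "0x30", "0x45", "0"]


def set_mode_command(mode_name):
    """argv that sets the fan mode by name, e.g. "Heavy IO"."""
    if mode_name not in MODE_CODES:
        raise ValueError(f"unknown fan mode: {mode_name!r}")
    return ["ipmitool", "raw", "0x30", "0x45", "1", f"0x{MODE_CODES[mode_name]:02x}"]


def parse_mode_reply(reply):
    """Translate the hex byte printed by the get command (e.g. " 04")."""
    code = int(reply.split()[0], 16)
    return FAN_MODES.get(code, f"unknown (0x{code:02x})")
```

Unknown codes such as 02 are reported rather than rejected, since other boards apparently have more modes.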
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
02 doesn't map to anything?
 

Glorious1

No, it comes back with some sort of error indicating an invalid value. I've heard some Supermicro boards have five fan modes. Mine has three: 0, 1, and 4. Maybe 2 and 3 are the other two modes.
 

Bidule0hm

That's weird. 0, 1, 2, 4 would be logical: 1, 2 and 4 are the single-bit values (0001, 0010, 0100), and 0 has no bits set (0000).
 

Glorious1

So you guys obviously know this control stuff pretty well with your talk of PID and such. I'm wondering, for a script that controls fans in response to drive temperature, is it better to use fan mode or fan speed? My thoughts:
  • Fan mode has the advantage, I think, that within the mode, fan speed will respond autonomously to changes in temperatures on the board. That response is weak, in my experience, but better than nothing. Disadvantages might be that there are only three modes (in my case) and you have the chance of the mode getting stuck, requiring a BMC warm reset and maybe a more complicated script.
  • Fan speed is cool (pun intended) because you have a lot of fine control and you know what you're getting. However, the temps on the board will be completely ignored, as I understand it. Or, I guess it's possible to read those temps also and incorporate them into the script?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sure, you can read them using ipmitool, for instance.
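
For example, a script could parse the temperature rows of `ipmitool sdr`. A minimal sketch, assuming the usual pipe-separated layout (`CPU Temp | 40 degrees C | ok`); check it against your own board's output. `parse_sdr_temperatures` is a hypothetical helper.

```python
def parse_sdr_temperatures(text):
    """Return {sensor_name: degrees_C} from lines like 'CPU Temp | 40 degrees C | ok'.

    Rows whose reading is not in degrees C (fans, 'no reading', etc.) are skipped.
    """
    temps = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue
        name, reading = fields[0], fields[1]
        if reading.endswith("degrees C"):
            value = reading[: -len("degrees C")].strip()
            try:
                temps[name] = float(value)
            except ValueError:
                continue
    return temps
```

The board temperatures found this way could then be combined with drive temperatures when choosing a fan speed.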
 
Joined
Dec 2, 2015
Messages
730
Glorious1 said: "I'm wondering, for a script that controls fans in response to drive temperature, is it better to use fan mode or fan speed?"
It depends on what problem you are trying to solve. In my case, I'm using a Fractal Design Node 804 case, which has the hard drives in a separate bay with its own fans. There is very poor correlation between hard drive temperatures and the CPU temperature or the motherboard temperature sensor, so the BMC has no way to control the case fans in the hard drive bay to manage HD temperature. I could have left them at a constant, fairly high speed all the time. But testing suggested that my workflow would only require high speed on those fans for a few minutes every day. They could idle the rest of the time, using less power and making less noise, if I could control them in response to HD temperatures.

You mention the risk of the fan mode getting stuck, requiring the BMC to be reset. I've had that happen three times in three months when commanding fan speed via raw commands too, so I think that is a risk no matter how you control the fans. I'm considering adding a check to my script that resets the BMC if the HD temps are low but the fan speed is abnormally high.
 

Glorious1

That's good to know, thanks. Any idea how the script would decide the fans are stuck? Do they ever get stuck at anything other than full speed? What if you stayed at 90% or less?

My drives and board are cooled by the same fans. The problem I have is that the BMC (or whatever is in charge) is relatively insensitive to temperature changes. If I set it to Standard mode, everything is peachy unless a scrub or long SMART test starts, or the room is hot. I could leave it on Heavy IO, but then the fans usually run faster than necessary. In both cases drive temperature varies a lot depending on use. Even CPU temperature varies from the high 40s to near 70.

I've been reading up on this PID stuff. It doesn't seem that complicated to just do PI, which sources say is often enough. Please let me know if I'm making sense here.

I would calculate the error as PV - SP (the process variable, in this case the maximum drive temperature, minus the set point). That's backwards from the standard convention (SP - PV), but it's more intuitive for a positive error to call for a positive change in fan speed.

The P term is easy-peasy: P = Kp x current error. Kp is a tuning parameter, let's say an increase in fan speed of 6 percentage points per degree of error.

For I, I was thinking of using the errors from the 4 most recent readings: add them up and multiply by Ki, maybe 3 percentage points per degree. Apparently Kp should normally be higher than Ki. And of course it would have to be tuned.

Then just add P+I and make that change to fan speed.
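
The PI scheme described above could be sketched in Python as follows. Two choices here are mine, not from the post: P + I is added to a fixed base duty (the conventional positional PI form) rather than to the previous fan speed, because adding P + I to the current speed every cycle would integrate the output a second time; and the result is clamped to a duty-cycle range. The gains (6 and 3 points per degree), the 4-reading window, the 30% base and the 10-100% limits are placeholders that would need tuning. Because only the last 4 errors are summed, the I term cannot grow without bound, which also limits windup.

```python
from collections import deque


class WindowedPI:
    """PI fan controller sketch (placeholder gains and limits).

    error = PV - SP, so a drive hotter than the set point gives a positive error.
    P = kp * current error; I = ki * (sum of the last `window` errors).
    """

    def __init__(self, setpoint, kp=6.0, ki=3.0, window=4, base_duty=30.0,
                 min_duty=10.0, max_duty=100.0):
        self.setpoint = setpoint
        self.kp = kp
        self.ki = ki
        self.errors = deque(maxlen=window)  # older errors drop off automatically
        self.base_duty = base_duty
        self.min_duty = min_duty
        self.max_duty = max_duty

    def update(self, max_drive_temp):
        """Return the duty cycle (%) to command for this reading."""
        error = max_drive_temp - self.setpoint
        self.errors.append(error)
        p = self.kp * error
        i = self.ki * sum(self.errors)
        duty = self.base_duty + p + i
        return max(self.min_duty, min(self.max_duty, duty))
```

Each cycle, the returned percentage (rounded to an integer) could be sent with the raw duty-cycle command discussed earlier in the thread.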

One problem I read about is integral windup: at boot the error is very high and the I term badly overcorrects. So it would be necessary to wait some time after boot before the script takes over. And then there's the risk of the script failing and frying your drives.

It might be over my head, and I would have to do it in bash, the only scripting language I know anything about (and that's not saying much).
 
Joined
Dec 2, 2015
Messages
730
For me, whenever the BMC locks up, the fans are stuck at max speed. I have no idea what triggers the condition, so I won't speculate whether never commanding more than 90% speed would avoid it. Looking at my logs of HD temps, the script should not have been commanding max speed when the BMC got stuck. For example, I had my fourth stuck-fan event this morning. The HD temperature was 39°, which should have commanded 900 rpm on the fans. But the logs show the temperatures suddenly plunging, and I later found the fans racing at max speed.

I took advantage of this morning's event to add code to reset the BMC if necessary, and test it. The control loop in my script is very simple:

if HD temp is 38° or lower, command 400 rpm,
if HD temp is 39°, command 900 rpm,
if HD temp is 40° or warmer, command max rpm.

I added a condition to reset the BMC if the temperature is 37° or cooler and the fan speed is greater than 600 rpm. That worked. I then added a mirror-image condition to reset the BMC if the temperature is above 40° and the fan speed is less than 1200 rpm. I have never had the fan speed get stuck at low speed, but I only have four data points, so I cannot be sure this won't happen.
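
The three-step schedule and the two reset conditions can be written as small, testable functions. A sketch assuming drive temperatures in °C and fan speeds in rpm as read from the BMC; `max_rpm` stands in for "max rpm", and the function names are hypothetical. The reset itself uses the `ipmitool bmc reset warm` command discussed earlier in the thread.

```python
import subprocess


def commanded_rpm(hd_temp, max_rpm):
    """Three-step schedule: <=38 C -> 400 rpm, 39 C -> 900 rpm, >=40 C -> max."""
    if hd_temp <= 38:
        return 400
    if hd_temp < 40:
        return 900
    return max_rpm


def bmc_looks_stuck(hd_temp, fan_rpm):
    """True when fan speed contradicts drive temperature (the two reset rules)."""
    stuck_fast = hd_temp <= 37 and fan_rpm > 600
    stuck_slow = hd_temp > 40 and fan_rpm < 1200
    return stuck_fast or stuck_slow


def reset_bmc():
    """Warm-reset the BMC to regain control of the fans."""
    subprocess.run(["ipmitool", "bmc", "reset", "warm"], check=True)
```

In the control loop, the script would read the temperatures and fan speed, call `reset_bmc()` when `bmc_looks_stuck(...)` is true, and then command `commanded_rpm(...)`.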
 