Script to control fan speed in response to hard drive temperatures

Joined
Dec 2, 2015
Messages
730
Did you adjust the IPMI fan thresholds too?
I bought the same mix of Noctua fans you have, and used your recommended thresholds (thanks for those). I saw the rpm of one of the fans go right to zero. The Lower Non-recoverable and Lower Critical thresholds were both set to zero. I suspect that having the fan actually stall triggered those thresholds. But, I cannot explain why the duty cycle should have been low enough to cause the fans to stall.

Note: unlike your setup, I've got the three HD fans on separate fan headers: FAN A, FAN B and FAN C. This allows me to see the rpm of the individual fans.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
When I was debugging this issue myself, I had my fan loop (which runs every second) read back the fan duty cycles and log them. I also had it check them against what it thought they should be.

This helped me to spot when it went crazy, if not why.
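
A readback check along these lines could be sketched as follows. This is only an illustration, not Stux's actual code. The helper names `duty_from_raw` and `duty_matches` and the 2-point tolerance are made up here. It assumes the Supermicro raw command quoted later in this thread (`ipmitool raw 0x30 0x70 0x66 0 <zone>`) and relies on ipmitool printing the reply as a hex byte.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing the commanded duty cycle with
# what the BMC reports. Names and tolerance are assumptions.

# Convert the hex byte printed by `ipmitool raw` (e.g. " 32") to decimal.
duty_from_raw() {
    local hex
    hex=$(printf '%s' "$1" | tr -d '[:space:]')
    echo $(( 0x$hex ))
}

# Succeed if the duty read back is within TOL points of the expected value.
duty_matches() {
    local expected=$1 actual=$2 tol=${3:-2} diff
    diff=$(( expected - actual ))
    if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
    [ "$diff" -le "$tol" ]
}

# Example use inside a control loop (needs ipmitool and a Supermicro BMC):
#   actual=$(duty_from_raw "$(ipmitool raw 0x30 0x70 0x66 0 0)")
#   duty_matches "$expected" "$actual" || \
#       echo "$(date '+%T') duty mismatch: wanted $expected, BMC reports $actual"
```

Logging every mismatch with a timestamp shows when the BMC overrode the script, even if it does not show why.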
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I bought the same mix of Noctua fans you have, and used your recommended thresholds (thanks for those). I saw the rpm of one of the fans go right to zero. The Lower Non-recoverable and Lower Critical thresholds were both set to zero. I suspect that having the fan actually stall triggered those thresholds. But, I cannot explain why the duty cycle should have been low enough to cause the fans to stall.

Note: unlike your setup, I've got the three HD fans on separate fan headers: FAN A, FAN B and FAN C. This allows me to see the rpm of the individual fans.
Curious. I've never had the fans stall or go up to 100%. Did you check in the script log to see if it really decided on a 20% duty, and the drive temps were not below the setpoint?
 
Joined
Dec 2, 2015
Messages
730
When I was debugging this issue myself, I had my fan loop (which runs every second) read back the fan duty cycles and log them. I also had it check them against what it thought they should be.

This helped me to spot when it went crazy, if not why.
I'll try something like this if it starts acting up again. Thanks.
 
Joined
Dec 2, 2015
Messages
730
Curious. I've never had the fans stall or go up to 100%. Did you check in the script log to see if it really decided on a 20% duty, and the drive temps were not below the setpoint?
Yeah, the logs didn't suggest the script would have commanded a low duty cycle, and the drive temps were always close to the target, i.e. they may have been slightly low at times, but not by nearly enough of an error to call for a huge duty cycle reduction.
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
PID Fan Control Script with Detailed Logs
EDIT 2: Updated spinpid.sh to include a setting variable (FAN_MIN) to specify minimum fan duty cycle (%). This was suggested by @Kevin Horton and can be adjusted to prevent fan stalling at too low a speed. 2017-01-01
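
For readers who have not opened the script, a minimum-duty clamp can be as simple as the sketch below. It is not copied from spinpid.sh. Only the `FAN_MIN` name comes from the script; the function name `clamp_duty`, the fixed upper limit of 100, and the default of 25 are placeholders to replace with whatever keeps your fans from stalling.

```bash
#!/usr/bin/env bash
# Illustrative only: keep a computed duty cycle between FAN_MIN and 100.
FAN_MIN=${FAN_MIN:-25}   # assumed placeholder value, in percent

# clamp_duty DUTY [MIN] [MAX]
clamp_duty() {
    local duty=$1 min=${2:-$FAN_MIN} max=${3:-100}
    if [ "$duty" -lt "$min" ]; then duty=$min; fi
    if [ "$duty" -gt "$max" ]; then duty=$max; fi
    echo "$duty"
}

clamp_duty 6     # below the floor, so FAN_MIN is printed
clamp_duty 48    # already in range, printed unchanged
```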

EDIT: I found when the processor was under heavy load, its temperature spiked. Of course the script only responds to drive temperature. So, like @Stux did for X10 boards, I modified the script substantially to respond to both drive and CPU temperature. My board has only one fan zone, so it is not split as in Stux's script. The relationship is a bit complicated, but I have tested for days with a lot of scrubs and processor-intensive tasks, and it has worked very well. When the drives need cooling, PID is still used. When the CPU needs more attention, the fans simply ramp up with temperature, and there are two settings to regulate that. New scripts attached.
****************************************

Since being inspired by the ideas and facts presented in this thread, especially the tricks that Kevin discovered to control fans, I’ve spent way too much time working on this. My motivation is that the PWM fan control logic in my motherboard is lousy. Much of the time, setting the mode to Standard is inadequate and HeavyIO is overkill. Either way, the temperatures fluctuate a lot, so I don’t see switching between them in a script as an ideal solution.

Instead, adjusting fan duty cycle to maintain a steady temperature seems to be the way to go. There are a couple of problems though:
  1. BMC won’t give up control (EDIT: @PigLover found the solution to this: leaving fan mode on Full. That is incorporated into the script now). It’s hard to develop control logic that works when the BMC independently decides to change duty cycle. This may happen when you set a duty cycle near the upper/lower end, >80% or <30%. Limiting your own settings between those figures doesn’t help much, since the logic also doesn’t work right if you hit a control limit. I’ve taken two approaches to dealing with this:
    • When setting the duty cycle ≥ 50%, first set the mode to HeavyIO, and when setting <50%, first set mode to Standard. This has nothing to do with regulating temperature, it is just less likely the BMC will feel the need to make adjustments in those ranges.
    • Control the temperature so well that fans rarely, if ever, will get near the extremes of duty cycle.
  2. Temperature readings are way too coarse for fine control. Using the temperature of the hottest drive (Tmax) is a good strategy in principle, but you only have 1 C resolution. Let’s say your setpoint (SP) is 36 C. You can successfully keep Tmax in the range 35-37, but you will never do better than that, and it will be very hard to keep it from oscillating. The solution I’ve found is using the mean temperature of all drives (Tmean). Even with slight temperature change, chances are that one of the drives will cross the degree threshold and its temperature reading will change. Using Tmean (with at least 1 decimal place) gives you much more and earlier information about temperature trends in your drives.
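
To make the Tmax versus Tmean point concrete, here is a minimal sketch that computes both from a list of whole-degree readings. The function name `temp_stats` is invented for this example, and it assumes you have already collected the temperatures by whatever means your script uses.

```bash
#!/usr/bin/env bash
# Print "Tmax Tmean" for a list of whole-degree drive temperatures.
temp_stats() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '
        NR == 1 || $1 > max { max = $1 }
        { sum += $1 }
        END { printf "%d %.2f\n", max, sum / NR }'
}

temp_stats 35 35 36 35 31 32 32   # prints: 36 33.71
temp_stats 35 35 36 36 31 32 32   # one drive warms by 1 C: 36 33.86
```

With seven drives, a single drive crossing a degree boundary moves Tmean by about 0.14 C while Tmax stays at 36, which is why Tmean gives the controller earlier warning.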
First, a few basics in case there are any fan newbies. There are two fan settings you can control: mode and duty cycle. You can read those plus RPM.
  1. Mode. There are 3-5 modes, depending on the board. Mine has Standard, HeavyIO, and Full. Full is full speed all the time (unless you change it with an ipmitool raw command). Standard and HeavyIO have some degree of speed control based on board temperatures. HeavyIO is a higher speed at a given temperature. The details don’t seem to be well documented, but that doesn't matter, since it works very poorly anyway.
  2. Duty cycle and RPM. Duty cycle is a percentage of full power applied to the fans. It is correlated with RPM (actual fan speed), but you can’t set RPM directly. And when you read RPM, it is rounded to the nearest 100.
I read whatever I could understand on PID control. Turns out, as BiduleOhm said, it doesn’t have to be as complicated as they make it out to be. Here’s how it works. Based on temperature error from your setpoint, you calculate three corrections (P, I, and D) to adjust the duty cycle. Each has a tuning constant: Kp, Ki, and Kd. Note that T (time between control cycles) is sometimes included in the calculation of I and D. It makes it a bit more complicated, but I think then your tuning constants don’t break if you change T. However, I’ve never changed T to see what happens.

First you calculate the current error as ERRc = Tmean – SP. Most sources say to calculate error the other way, SP-Tmean, but it makes more sense to have error be positive when temps are too high and you need to increase fan speed.
  • P is for proportional. This is a correction proportional to the current error. Just multiply by the tuning constant. So the formula for P is simply Kp * ERRc.
  • I is for integral. This is a correction for cumulative error. Every cycle, you add the current error times T (ERRc * T) to the cumulative error (ERR), then multiply the result by a tuning constant. So ERR = ERRc*T + ERR, then I = Ki * ERR. My understanding is this helps correct offset, where the temperature stays a bit below or above SP.
  • D is for derivative. That is change in current error with respect to time, or basically the slope of the error line. In practice it’s even simpler: you just subtract the previous error (ERRp) from ERRc, divide by T, and multiply by another tuning constant. So D = Kd * ((ERRc – ERRp) / T). This really does two things. When you start a scrub or the sun hits your NAS, you get a large positive error. The bigger the increase in error, the bigger D is. In this case, D and P are additive, aggressively increasing duty cycle. But then when the temps are cooling fast, coming back down to SP, P is still positive and D is negative, so D counters P and puts the brakes on the fans, reducing overshoot and subsequent oscillation.
Most of what I’ve read says that, in the great majority of cases, D is not needed and you can just use PI. So I worked with PI for a long time, trying all kinds of tuning, and always had oscillation. By then I was using Tmean instead of Tmax, and I put the derivative term in and it was MUCH more stable. On the other hand, I have not seen any sign of offset, and the I term does more harm than good, so I ended up setting Ki to 0.
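
The three formulas above can be sketched in a few lines of bash, with awk doing the floating-point math. This is an illustration of the formulas as written in this post, not an excerpt from spinpid.sh. The function name `pid_step`, its argument order, and the sample numbers are all assumptions.

```bash
#!/usr/bin/env bash
# pid_step Kp Ki Kd T ERRc ERRp ERR
# Prints: P I D new_cumulative_error
pid_step() {
    awk -v kp="$1" -v ki="$2" -v kd="$3" -v t="$4" \
        -v errc="$5" -v errp="$6" -v err="$7" 'BEGIN {
        p   = kp * errc                 # P = Kp * ERRc
        err = errc * t + err            # ERR = ERRc*T + ERR
        i   = ki * err                  # I = Ki * ERR
        d   = kd * (errc - errp) / t    # D = Kd * ((ERRc - ERRp) / T)
        printf "%.2f %.2f %.2f %.2f\n", p, i, d, err
    }'
}

# Warming: error grew from 0.25 to 0.50, so P and D both push duty up.
pid_step 4 0 40 5 0.5 0.25 0    # prints: 2.00 0.00 2.00 2.50
# Cooling: error fell from 0.50 to 0.25, so D opposes P.
pid_step 4 0 40 5 0.25 0.5 0    # prints: 1.00 0.00 -2.00 1.25
```

The Curr/New column in the log further down is consistent with the three terms being summed, rounded, and added to the current duty cycle.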

Here are a few graphs showing some preliminary trials. First using Tmax as the process variable, SP is 36. Kp is too high.
View attachment 11763

Tried adding the I term. Many experiments I won't bore you with. This one uses Tmean and still shows bad oscillation. It shows cumulative error too. No improvement.
View attachment 11764

And here is the script and tuning I ended up with, showing Tmax, Tmean, and duty cycle. As a test, this starts with a large error. See how Tmean comes down to SP without overshooting. Then there are minor corrections through the day, and you can see the fans increase as the sun comes in during the afternoon.
View attachment 11765


Here is what the log looks like. The stuff on the right (starting with ERRc) can be turned off; I just use it for diagnosis and tuning. This log begins with starting the script. It gets equilibrated about 30 minutes in.
Code:
Saturday, May 14
		  da0	 da1	 da2	 da3	 ada0	ada1	ada2	Tmax Tmean  RPM MODE	 Fan% Curr/New
15:01:03  Spin 35 Spin 35 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.71  800 Standard 49/51  ERRc= 0.14; P=  0.58; I= 0.00; D=  1.15
15:06:05  Spin 35 Spin 35 Spin 36 Spin 36 Spin 31 Spin 32 Spin 32 ^36  33.86  800 HeavyIO  51/53  ERRc= 0.29; P=  1.15; I= 0.00; D=  1.14
15:11:06  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  800 HeavyIO  53/58  ERRc= 0.57; P=  2.29; I= 0.00; D=  2.28
15:16:07  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  900 HeavyIO  58/60  ERRc= 0.57; P=  2.29; I= 0.00; D=  0.00
15:21:08  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  900 HeavyIO  60/62  ERRc= 0.57; P=  2.29; I= 0.00; D=  0.00
15:26:09  Spin 35 Spin 35 Spin 36 Spin 35 Spin 32 Spin 33 Spin 32 ^36  34.00  900 HeavyIO  62/63  ERRc= 0.43; P=  1.72; I= 0.00; D= -1.13
15:31:10  Spin 35 Spin 34 Spin 36 Spin 35 Spin 32 Spin 32 Spin 32 ^36  33.71 1000 HeavyIO  63/61  ERRc= 0.14; P=  0.58; I= 0.00; D= -2.28
15:36:12  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  61/60  ERRc= 0.00; P=  0.00; I= 0.00; D= -1.14
15:41:13  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:46:14  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:51:15  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:56:16  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00

This is MUCH better than the control built into the boards. Tmean normally stays within 0.3 C of SP unless there is a disturbance, then within 0.5 C. It is damn near perfect. I guess SuperMicro doesn’t do something like this because (a) they’re more interested in protecting the board than the drives, and the board is less sensitive to temperature variation, (b) accessing drive temperatures depends on the OS, and (c) it requires tuning.

Tuning
PID tuning advice on the internet generally does not work well for controlling drive temperatures in my experience.
  1. First run the script spincheck.sh (logs detailed data only, no control) and get familiar with your temperature and fan variations without any intervention.
  2. Now in the settings of spinpid.sh, choose a setpoint that is an actual observed Tmean, given the number of drives you have. It should be the Tmean associated with the Tmax that you want.
  3. Set Ki=0 and leave it there. You probably will never need it.
  4. Start with Kp low. Use a value that results in a rounded correction=1 when error is the lowest value you observe other than 0 (i.e., when ERRc is minimal, Kp ~= 1 / ERRc). However, if you have few drives and thus coarser temperature monitoring, you may need a larger Kp. I would not go below 4.
  5. Set Kd at about Kp*10
  6. Get Tmean within ~0.3 degree of SP before starting script. At this stage you don’t want to test a large error, you want an equilibrated system.
  7. Start script and run for a few hours or so. If Tmean oscillates (best to graph it), you probably need to reduce Kd. If no oscillation but response is too slow, raise Kd.
  8. Stop script and get Tmean at least 1 C off SP. Restart. If there is overshoot and it goes through some cycles, you may need to reduce Kd.
  9. If you have problems, examine P and D in the log and see which is messing you up. Most likely Kd needs tuning. You can try raising Kp, though too high and changes become too aggressive and you get overshoot and oscillation. You can even try using Ki. If you use Ki, make it small, ~ 0.1 or less.
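
Steps 4 and 5 amount to a small calculation, sketched here under stated assumptions. The function name `starting_gains` is invented. It reads "Kp ~= 1 / ERRc" literally, applies the floor of 4 from step 4, and takes Kd as ten times Kp. Treat the result as a first guess to refine with steps 6 to 9, not as a tuned answer.

```bash
#!/usr/bin/env bash
# starting_gains MIN_ERR  ->  prints "Kp Kd"
# MIN_ERR is the smallest non-zero |ERRc| seen in the spincheck.sh log.
starting_gains() {
    awk -v e="$1" 'BEGIN {
        if (e <= 0) exit 1          # a zero or negative error gives no guidance
        kp = 1 / e                  # correction of about 1 at the smallest error
        if (kp < 4) kp = 4          # the post advises not going below 4
        printf "%.1f %.1f\n", kp, kp * 10
    }'
}

starting_gains 0.1    # prints: 10.0 100.0
starting_gains 0.5    # prints: 4.0 40.0
```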
Scripts

There are two bash scripts attached (change extension to .sh after you download) (EDIT: scripts updated 2016-07-02). spincheck.sh logs data only, it does not control anything. spinpid.sh logs and controls. Both scripts log:
  • disk status (spinning or standby)
  • disk temperature (Celsius) if spinning
  • max and mean disk temperature
  • fan rpm and mode
  • current fan duty cycle (plus new one for spinpid.sh)
  • optional diagnostic variables
The scripts include disks on motherboard as well as on an HBA. They get a list of devices from camcontrol devlist. I edit that to delete my SanDisk flash drives from the list; you may have to change that for your system. Suggestions in the script.

The mode code is primarily based on my board. If you have different modes you may need to make some minor tweaks. I have no idea if any of this is applicable to non-Supermicro boards.

As usual, you are responsible for anything you do on your system. This works for me, but for all I know it could make your box catch fire or cause a zombie apocalypse. I suggest you monitor closely at first.

After testing, if you decide to use it, you can run it as a post-init script (Tasks in the GUI) so it starts automatically after booting. In this case, to avoid ‘windup’ (a large error when starting with possibly cold drives), you may want to add a ‘sleep 1200’ before the main loop. Then it will wait 20 minutes for the drives to warm up before doing anything.


I ran your script and got this error right away with the spincheck script:

9: Syntax error: redirection unexpected (expecting word)

Using a SM X10SL7-F

I have my CPU hooked up to FAN A and the drives to FAN 1 FAN 2 Fan 4. Is this correct?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I ran your script and got this error right away with the spincheck script:

9: Syntax error: redirection unexpected (expecting word)

Using a SM X10SL7-F

I have my CPU hooked up to FAN A and the drives to FAN 1 FAN 2 Fan 4. Is this correct?
I'm not sure what that error means. Lines 8 and 9 of the script are:
Code:
LOG=/mnt/Ark/Jim/spincheck.log
exec > >(tee -i $LOG) 2>&1

You need to change the value of LOG to reflect a location on your system. If you haven't done that, it's possible that's what's causing the error.

Regarding the fan hookups, I'm hoping @Kevin Horton or @Stux will weigh in. They have boards more similar to yours and are familiar with dealing with fan zones.
 
Joined
Dec 2, 2015
Messages
730
In theory, you should hook up the CPU fan to one of the numbered fan headers (FAN1, FAN2, etc). These are in zone 0. The lettered ones (FANA, FANB, etc) are in zone 1, which Supermicro considers as the peripheral zone. Depending on the number of fans you have, you may need some fan splitters.

But, sometimes theory and practice diverge without problem. No matter what you do, you need to test to confirm that the script is working as expected, and that the cooling is adequate for the worst case ambient temperature and system load.
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
I'm not sure what that error means. Lines 8 and 9 of the script are:
Code:
LOG=/mnt/Ark/Jim/spincheck.log
exec > >(tee -i $LOG) 2>&1

You need to change the value of LOG to reflect a location on your system. If you haven't done that, it's possible that's what's causing the error.

Regarding the fan hookups, I'm hoping @Kevin Horton or @Stux will weigh in. They have boards more similar to yours and are familiar with dealing with fan zones.

Figured it out. I was running it as "sh filename.sh". I ran it as "bash filename.sh" and it is working now. Thanks.
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
In theory, you should hook up the CPU fan to one of the numbered fan headers (FAN1, FAN2, etc). These are in zone 0. The lettered ones (FANA, FANB, etc) are in zone 1, which Supermicro considers as the peripheral zone. Depending on the number of fans you have, you may need some fan splitters.

But, sometimes theory and practice diverge without problem. No matter what you do, you need to test to confirm that the script is working as expected, and that the cooling is adequate for the worst case ambient temperature and system load.


My X10SL7-F has FAN1-4 and FANA. There isn't a FAN B, FAN C, etc..

Sounds butt backwards to put the CPU on FAN 1 (wasting 2-4) and split the 4 CASE FANS (4 wire) from FAN A.

Am I missing something?
 
Joined
Dec 2, 2015
Messages
730
I see two scenarios:

  1. You want to use your own script(s) to control all fans. In this case you can hook the fans up to whichever headers you wish. The scripts will specify which fan zones to control.
  2. You want to use your own script to control hard drive fans, but have the BMC control the CPU fan in response to CPU temperature. In this case you'll need to do some testing to see which fan header(s) (if any) will be controlled by the BMC if you have a script controlling the other fan zone.
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
Kevin,

How did you do it? Your signature says you're running the same board as I am. Or is that outdated now?
 
Joined
Dec 2, 2015
Messages
730
I managed to screw up one of the mounting posts when I removed my stock CPU fan while RMA'ing my board. I then screwed up again when I bought a replacement CPU fan, and got one that wasn't PWM. My current fan runs at full speed all the time.

So, my situation is probably not directly relevant. You'll need to do some testing, then you can be the expert :)
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Figured it out. I was running it as "sh filename.sh". I ran it as "bash filename.sh" and it is working now. Thanks.
Just "filename.sh" should be sufficient. There's a flag at the top that it is a bash script.
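
For context on the error above: the `exec > >(tee -i $LOG) 2>&1` line uses process substitution, which is a bash feature, and FreeBSD's `sh` is not bash, so starting the script with `sh` fails at that line. A guard like the sketch below, placed ahead of the first bash-only line, would turn the cryptic syntax error into a readable message. The function name `is_bash` is hypothetical and this is not part of the posted scripts.

```bash
#!/usr/bin/env bash
# Hypothetical guard. bash sets BASH_VERSION; a plain POSIX sh leaves it
# unset, so an empty value means the script was started with another shell.
is_bash() {
    [ -n "$1" ]
}

# Intended use near the top of the script:
#   if ! is_bash "${BASH_VERSION:-}"; then
#       echo "This script needs bash. Try: bash $0" >&2
#       exit 1
#   fi
```

Note that running the file by name relies on the `#!` line at the top, so the file has to be executable (`chmod +x`) and called with a path such as `./filename.sh`.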
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
Couple questions...

I ran "spincheck.sh" for 24 hours. In that time I closed the terminal. It's still logging, so it's still running. How do I shut off spincheck.sh?

The X10 board is set in IPMI to "FULL". Should this be set differently, or does it not matter once "spinpid.sh" is started?
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
This is what I have so far running spinpid. It doesn't seem to be working. I left the setpoint alone. I did change the interval to every 2 minutes, and I changed Kd to 50. Where the mode changes, that is me messing with IPMI to see if it would help.

IPMI says FAN A (CPU) is running at 2000RPM while FAN 1, 2, and 4 are at 400. Suggestions?


Code:
		 da0  da1  da2  da3  da4  da5  da6  da7  ada0 ada1 Tmax Tmean  ERRc	  P	 I	  D CPU Driver  RPM MODE	Curr/New Adjustments
00:28:36  *36  *35  *32  *34  *34  *32  *30  *33  *36  *33  ^36  33.50 -0.07  -0.28  0.00  -1.40  35 Drives  800 Full	50/48	
00:30:37  *36  *35  *32  *34  *35  *32  *30  *34  *37  *33  ^37  33.80  0.23   0.92  0.00   6.00  32 CPU	 700 Full	 6/25	20 20
00:32:39  *35  *35  *32  *35  *35  *32  *30  *34  *37  *34  ^37  33.90  0.33   1.32  0.00   2.00  30 Drives  400 Full	25/28	
00:34:40  *35  *34  *32  *35  *36  *33  *31  *34  *37  *34  ^37  34.10  0.53   2.12  0.00   4.00  29 CPU	 400 Full	 6/25	20 20
00:36:41  *35  *34  *33  *35  *36  *33  *31  *34  *37  *34  ^37  34.20  0.63   2.52  0.00   2.00  30 Drives  400 Full	25/30	
00:38:43  *34  *33  *33  *36  *36  *34  *31  *35  *37  *34  ^37  34.30  0.73   2.92  0.00   2.00  29 CPU	 400 Full	 6/25	20 20
00:40:44  *34  *33  *34  *36  *37  *34  *32  *35  *37  *34  ^37  34.60  1.03   4.12  0.00   6.00  31 Drives  400 Full	25/35	
00:42:45  *34  *33  *34  *36  *37  *34  *32  *35  *37  *34  ^37  34.60  1.03   4.12  0.00   0.00  29 CPU	 500 Full	 6/25	20 20
00:44:47  *34  *33  *34  *36  *37  *34  *32  *35  *37  *34  ^37  34.60  1.03   4.12  0.00   0.00  29 Drives  400 Full	25/29	
00:46:48  *33  *33  *34  *37  *37  *35  *32  *35  *37  *34  ^37  34.70  1.13   4.52  0.00   2.00  29 CPU	 400 Full	 6/25	20 20
00:48:49  *33  *33  *35  *37  *37  *35  *32  *36  *37  *35  ^37  35.00  1.43   5.72  0.00   6.00  34 Drives  400 Full	25/37	
00:50:51  *33  *33  *35  *37  *37  *35  *32  *36  *37  *35  ^37  35.00  1.43   5.72  0.00   0.00  32 CPU	 700 Optimal  6/25	20 20
00:52:52  *33  *33  *35  *37  *37  *35  *32  *36  *37  *35  ^37  35.00  1.43   5.72  0.00   0.00  32 Drives  400 Optimal 25/31	
00:54:54  *33  *33  *35  *37  *37  *35  *32  *36  *37  *35  ^37  35.00  1.43   5.72  0.00   0.00  35 CPU	 400 Optimal  6/25	20 20
00:56:55  *33  *33  *35  *37  *37  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00   4.00  33 Drives  400 Optimal 25/36	
00:58:56  *33  *33  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.30  1.73   6.92  0.00   2.00  34 CPU	 500 Optimal  6/25	20 20
01:00:58  *33  *33  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.30  1.73   6.92  0.00   0.00  33 Drives  400 Optimal 25/32	
01:02:59  *33  *32  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00  -2.00  34 CPU	 400 Optimal  6/25	20 20
01:05:00  *33  *32  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00   0.00  34 Drives  400 Optimal 25/32	
01:07:02  *33  *32  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00   0.00  34 CPU	 400 Optimal  6/25	20 20
01:09:03  *33  *32  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00   0.00  34 CPU	 400 Optimal  6/25	20 20
01:11:04  *33  *32  *35  *37  *38  *36  *32  *36  *38  *35  ^38  35.20  1.63   6.52  0.00   0.00  34 Drives  400 Optimal 25/32	
01:13:08  *33  *32  *35  *37  *38  *36  *32  *37  *38  *35  ^38  35.30  1.73   6.92  0.00   2.00  35 CPU	 400 Optimal  6/25	20 20
01:15:10  *32  *32  *35  *37  *38  *36  *32  *37  *38  *35  ^38  35.20  1.63   6.52  0.00  -2.00  40 CPU	 400 Optimal  6/25	20 20
01:17:11  *32  *32  *35  *37  *38  *36  *32  *37  *38  *35  ^38  35.20  1.63   6.52  0.00   0.00  32 Drives  400 Optimal 25/32	
01:19:13  *32  *32  *35  *37  *39  *36  *32  *37  *38  *35  ^39  35.30  1.73   6.92  0.00   2.00  34 CPU	 400 Optimal  6/25	20 20
01:21:14  *32  *32  *35  *37  *39  *36  *32  *37  *38  *35  ^39  35.30  1.73   6.92  0.00   0.00  32 Drives  400 Optimal 25/32	
01:23:16  *32  *32  *35  *37  *39  *36  *32  *37  *38  *35  ^39  35.30  1.73   6.92  0.00   0.00  34 CPU	 400 Optimal  6/25	20 20
01:25:17  *32  *32  *35  *37  *39  *36  *32  *37  *38  *35  ^39  35.30  1.73   6.92  0.00   0.00  33 Drives  400 Optimal 25/32	
01:27:18  *32  *32  *35  *37  *39  *37  *32  *37  *38  *35  ^39  35.40  1.83   7.32  0.00   2.00  34 CPU	 400 Optimal  6/25	20 20
01:29:20  *32  *32  *35  *37  *39  *37  *32  *37  *38  *35  ^39  35.40  1.83   7.32  0.00   0.00  32 Drives  400 Optimal 25/32	
01:31:21  *32  *32  *35  *38  *39  *37  *33  *37  *38  *35  ^39  35.60  2.03   8.12  0.00   4.00  34 CPU	 400 Optimal  6/25	20 20
01:33:23  *32  *32  *35  *38  *39  *37  *33  *37  *38  *35  ^39  35.60  2.03   8.12  0.00   0.00  32 Drives 1200 Full	25/33	
01:35:24  *32  *32  *35  *38  *39  *37  *33  *37  *38  *35  ^39  35.60  2.03   8.12  0.00   0.00  32 CPU	 500 Full	 6/25	20 20
01:37:26  *32  *32  *35  *38  *39  *37  *33  *37  *38  *35  ^39  35.60  2.03   8.12  0.00   0.00  30 Drives  400 Full	25/33	
01:39:27  *32  *32  *35  *38  *39  *37  *33  *37  *38  *35  ^39  35.60  2.03   8.12  0.00   0.00  33 CPU	 500 Full	 6/25	20 20
01:41:28  *32  *32  *36  *38  *39  *37  *33  *37  *38  *35  ^39  35.70  2.13   8.52  0.00   2.00  30 Drives  400 Full	25/36	
01:43:30  *32  *32  *36  *38  *39  *37  *33  *37  *39  *36  ^39  35.90  2.33   9.32  0.00   4.00  31 CPU	 500 Full	 6/25	20 20
01:45:31  *32  *32  *36  *38  *39  *37  *33  *37  *39  *36  ^39  35.90  2.33   9.32  0.00   0.00  31 Drives 1200 Full	25/34	
01:47:33  *32  *32  *36  *38  *39  *37  *33  *37  *39  *36  ^39  35.90  2.33   9.32  0.00   0.00  30 CPU	 500 Full	 6/25	20

 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Couple questions...

I ran "spincheck.sh" for 24 hours. In that time I closed the terminal. It's still logging, so it's still running. How do I shut off spincheck.sh?

The X10 board is set in IPMI to "FULL". Should this be set differently, or does it not matter once "spinpid.sh" is started?
Shutting down a script like spincheck.sh. In my experience, if you just start a script in a normal SSH session (not a startup task or tmux session), it will quit when the SSH session ends. How do you know it was still running?

You definitely don't want more than one instance of spinpid.sh running at the same time, as they will be playing tug of war.

If you are still in the SSH session with the script (you see the script output updated live on the console), all you need to stop it is Control-C (maybe Alt-C or something on Windows). If you are not in that session, and it is still running (you can see continual activity in the log), you need to find the process IDs and kill them.
Code:
ps -aux | grep spinpid  # or spincheck
#  See the process IDs related to the script, then kill them with
sudo kill -9 <pid1> <pid2> . . . 

When running spinpid.sh, the fan mode should stay on FULL.

No, looks like the script isn't working quite right on your system. I only have fan zone 0, you have two zones I think. You need to look at and begin understanding all the ipmitool raw commands in the script. Stop the script, and try the commands live in an SSH session and observe the effect.

Here's roughly how those commands work:
  1. 0x30 tells it you are issuing a command
  2. The following one or two numbers say what you are commanding: 0x45 for fan mode, 0x70 0x66 for fan duty cycle
  3. Then 0 to read the value, or 1 to set a value
  4. If you are setting, there is another number at the end for the value you are setting.
  5. If you are reading or setting duty cycle, there is yet another number for zone (0 or 1 if you have two zones), right after the get/set number
Code:
# read fan mode (0 means read)
ipmitool raw 0x30 0x45 0

# set fan mode to FULL (first 1 means set, second is value for FULL)
ipmitool raw 0x30 0x45 1 1

# read duty cycle (first 0 means read, second is zone 0)
ipmitool raw 0x30 0x70 0x66 0 0

# set duty cycle (the 1 means set, 0 is zone 0, then duty cycle 30%)
ipmitool raw 0x30 0x70 0x66 1 0 30

# If you have multiple zones you can monitor speed of
# multiple fans. This is set to read speed of FAN1,
# but you can easily modify it to read speed of FANA.
RPM=$(ipmitool sdr | grep "FAN1" | grep -Eo '[0-9]{2,5}')
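
To experiment with zones without typing the raw bytes each time, a small wrapper like this sketch can help. The function name `set_duty_cmd` and the range checks are assumptions. It only prints the command, so nothing is sent to the BMC until you choose to run the output yourself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the "set duty cycle" command for one zone.
# set_duty_cmd ZONE DUTY   (ZONE is 0 or 1, DUTY is 0-100 percent)
set_duty_cmd() {
    local zone=$1 duty=$2
    case $zone in
        0|1) ;;
        *) echo "zone must be 0 or 1" >&2; return 1 ;;
    esac
    if [ "$duty" -lt 0 ] || [ "$duty" -gt 100 ]; then
        echo "duty must be between 0 and 100" >&2
        return 1
    fi
    # 0x30 0x70 0x66 = duty cycle, 1 = set, then zone, then value
    echo "ipmitool raw 0x30 0x70 0x66 1 $zone $duty"
}

set_duty_cmd 0 30    # prints: ipmitool raw 0x30 0x70 0x66 1 0 30
set_duty_cmd 1 45    # prints: ipmitool raw 0x30 0x70 0x66 1 1 45
```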

Here's more info:
https://forums.servethehome.com/index.php?resources/supermicro-x9-x10-x11-fan-speed-control.20/
(ironically, the author of that post cites this thread as the source of a lot of his information! But he summarizes it nicely.)
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
With this knowledge, and the script as a model, it wouldn't be very hard to modify it to regulate 2 fan zones independently (spinpid2.sh). In fact, it would simplify it in some respects, because currently there is some logic each time through the loop to determine whether to regulate fan speed based on CPU or drive temps. That would no longer be needed. Just set the duty cycles independently.

I would do it but don't have a system to test it on.

As Kevin noted, you could also reverse the fan zones, using zone 0 (FAN1, FAN2, etc.) for drive cooling and zone 1 (FANA) for CPU.
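
As a rough idea of what the CPU side of a dual-zone script might look like, here is a sketch of a simple temperature ramp. Nothing in it comes from an existing script. The function name `cpu_duty` and all four parameters are hypothetical stand-ins for the "two settings" mentioned earlier, and the numbers are for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical CPU ramp for the CPU fan zone.
# cpu_duty TEMP REF SCALE MIN
#   TEMP   current CPU temperature (C)
#   REF    temperature at which the ramp starts
#   SCALE  duty points added per degree above REF
#   MIN    duty used at or below REF
cpu_duty() {
    local temp=$1 ref=$2 scale=$3 min=$4 duty
    if [ "$temp" -le "$ref" ]; then
        duty=$min
    else
        duty=$(( min + (temp - ref) * scale ))
    fi
    if [ "$duty" -gt 100 ]; then duty=100; fi
    echo "$duty"
}

cpu_duty 35 40 3 25    # prints: 25
cpu_duty 50 40 3 25    # prints: 55
```

The drive zone would keep using the PID result, and each value would be sent with its own zone number in the `0x70 0x66` command, so neither zone needs to know what the other is doing.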
 

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
Glorious1,

I took one class in basic programming in my first year of college 20 years ago. I enjoyed it but didn't do much more with programming, aside from building computers and installing software. I like to tinker. Programming is a whole other type of tinkering. I'm willing to learn. I can get around in the terminal with the help of Google. But since I don't do it every day, it does take me some time to figure things out. These scripts do seem simple if one understands the lingo and syntax.

I appreciate the time you put into the long reply. That is a great start. I'm surprised no one here has gotten this script to work on an X10SL7-F board. It seems to be a very common board (or not). Thanks for the tips. I'll look over your notes above and try to alter the original script.

If you have the patience, I'll be more than happy to test for you. ;-)

Are you suggesting to have two separate scripts? Or just have both zones controlled in one script? Does it matter?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
No, both zones would be controlled in one script. Maybe I'll tinker with it in the next few days and send you something to try. The existing script has all the operations so all the complicated stuff is figured out already, just a matter of copying pasting, and some minor edits.

Yes, more people have boards with dual zones than ones like mine. I think @Stux may have a script that deals with dual zone control, but I don't think it uses PID. The PID logic works very well for the drive zone.
 