having issues with my hardware - looking for suggestions

Joined
Mar 5, 2022
Messages
224
I just noticed that one of my drives had an error. I've replaced it and am resilvering now...
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
da14 is way too hot! And has exceeded 60C. She's burning up!

All of your problems could be from poor cooling. Take a hard look into that. You have already had comments about cooling.
 
Joined
Mar 5, 2022
Messages
224
What sort of case is this in?
HBAs do suffer from heat issues, as they are designed for enterprise-type installations with lots of airflow. You might take a look at possible overheating on the LSI card and add an additional fan to keep it cool.
I have three drives (one on each 4-bay), a case fan, and of course the power supply and CPU. I've ordered a pair of fans that will fit in the slot below the LSI card (it's designed for a graphics card, but the LSI card has holes in the outside metal, so it should work). FWIW, the LSI card is the top-most card on the motherboard. Until I get the fan assembly, I plan to leave the case side off.

PXL_20231216_204823208.jpg
PXL_20231216_204816395.jpg
 
Joined
Mar 5, 2022
Messages
224
da14 is way too hot! And has exceeded 60C. She's burning up!

All of your problems could be from poor cooling. Take a hard look into that. You have already had comments about cooling.
Where did you see the drive temperature?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
|da14 ! |WWY0ACWY |54* | 3996| 50| 0| 0| 0| 0| 0| 395656636| N/A| 8590065666| 1|
In the output of smart_report. This was the first one
190 Airflow_Temperature_Cel 0x0022 046 025 000 Old_age Always - 54 (Min/Max 33/64)
And here as well: a 64C max temp while the power was applied. This value will go away when you power off the drive. Your drive might also report a lifetime high temp; to check, issue the command
Code:
smartctl -x /dev/da14
and then search for "temp" to see if it does. You can post the entire output here as well if you can't find it, and we will tell you if it's there.
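If you would rather not scroll through the whole report, something along these lines should narrow it down. This is only a rough sketch: the helper names `temp_lines` and `attr190_minmax` are made up for illustration, and the second one assumes the raw value is laid out like the `54 (Min/Max 33/64)` line quoted above, which not every drive does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of smartmontools or the FreeNAS scripts.

# Print every line of a smartctl report that mentions temperature,
# ignoring case. Reads the report from stdin.
temp_lines() {
    grep -i 'temp'
}

# Pull current/min/max out of an attribute 190 line whose raw value
# looks like "54 (Min/Max 33/64)". Assumes that exact layout.
# Reads the attribute table from stdin.
attr190_minmax() {
    awk '$1 == 190 && match($0, /Min\/Max [0-9]+\/[0-9]+/) {
        # Skip the 8 characters of "Min/Max " to keep only "33/64".
        split(substr($0, RSTART + 8, RLENGTH - 8), mm, "/")
        print "current=" $10, "min=" mm[1], "max=" mm[2]
    }'
}

# Usage (needs root and a real device):
#   smartctl -x /dev/da14 | temp_lines
#   smartctl -A /dev/da14 | attr190_minmax
```

Run against the attribute line quoted above, the second helper should pick out 54 as the current value and 33/64 as the min/max since power-on.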

If you are going to run a script that monitors your hard drives, you should understand how to read the results or just not run the script at all. Not trying to be mean, just pointing out you should be able to read and understand the results. @dak180 is doing a fine job updating and maintaining this script and it works very well. I don't know of any complaints. If you need help reading it, just ask. Many people will offer assistance.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I've moved it to another PCIE slot that is higher in the case and is more open to air flow. I've also removed the case cover for now.
That is usually a very bad idea. Removing the case cover reduces the airflow and, with it, the cooling. If you put a strong household fan next to the case so that it blows into the case, this may be different. Check the temperatures to see whether your approach makes things better or worse.
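One way to make that check less of a guess is to save the drive temperatures before and after the change and compare them. A minimal sketch follows, assuming the readings are saved as text with one `name: NNC` line per drive (the layout a script such as get_hdd_temp.sh prints); `temp_delta` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved temperature listings.
# Each file is expected to hold lines such as "  da14:   32C [2.00TB] ...".
# Lines that do not look like a drive entry (CPU lines, headers) are ignored.
# Assumes the first file contains at least one line.
temp_delta() {
    awk '
        $1 ~ /^[a-z]+[0-9]+:$/ && $2 ~ /^[0-9]+C$/ {
            name = $1; sub(/:$/, "", name)
            temp = $2; sub(/C$/, "", temp)
            # While reading the first file, only remember the reading.
            if (FNR == NR) { before[name] = temp; next }
            # Second file: report drives that appear in both listings.
            if (name in before)
                printf "%s %dC -> %dC (%+d)\n", name, before[name], temp, temp - before[name]
        }
    ' "$1" "$2"
}

# Usage: save a reading, change the cooling, wait, save another, compare.
#   ./get_hdd_temp.sh > before.txt
#   ./get_hdd_temp.sh > after.txt
#   temp_delta before.txt after.txt
```

Give the system enough time under similar load between the two readings, otherwise the difference says more about the workload than about the airflow.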

With the temperatures that @joeschmuck has spotted, my strong recommendation would be to switch off the NAS immediately. Otherwise you are running a relatively high risk of losing all your data.
 
Joined
Mar 5, 2022
Messages
224
I replaced the suspect HDD and resilvered. I ran:
Code:
get_hdd_temp.sh
from FreeNAS-scripts-master and the temperatures are much more reasonable:
Code:
sudo ./get_hdd_temp.sh
=== CPU (8) ===
CPU  0:   44C
CPU  1:   44C
CPU  2:   44C
CPU  3:   44C
CPU  4:   44C
CPU  5:   44C
CPU  6:   44C
CPU  7:   44C

=== DRIVES ===
   da0:   33C [120GB]  XXXXXX2589 Phison Driven SSDs (PNY CS900 120GB SSD)
   da1:   33C [120GB]  XXXXXX2587 Phison Driven SSDs (PNY CS900 120GB SSD)
   da2:   30C [2.00TB] XXXXXX32NL9
   da3:   32C [2.00TB] XXXXXX0YV
  da14:   32C [2.00TB] XXXXXX7Y17
   da4:   32C [2.00TB] XXXXXXDP0Z
   da5:   31C [2.00TB] XXXXXXNYJC
   da6:   31C [2.00TB] XXXXXX61E7
   da7:   30C [2.00TB] XXXXXXJUKF
   da8:   31C [2.00TB] XXXXXXL9CVT
   da9:   26C [2.04TB] XXXXXXS30B
  da10:   26C [2.04TB] XXXXXXS30B
  da11:   30C [2.04TB] XXXXXX30B
  da12:   24C [2.04TB] XXXXXXS30B
  da13:   64C [10.0TB] XXXXXXACWY

The pools have been solid for over 24 hours and I have not seen any errors on the console either. Is it possible that the overheating HDD caused the controller card to appear to overheat?
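With this many drives it is easy to miss an outlier in the list, so a small filter over that output can help. This is just a sketch: `hot_drives` is a made-up name, it assumes the `name: NNC` line layout shown in the listing above, and the 50C default is an arbitrary assumption rather than a limit from any drive vendor.

```shell
#!/bin/sh
# Hypothetical helper: list drives at or above a temperature limit.
# Reads get_hdd_temp.sh-style lines ("  da13:   64C [10.0TB] ...") from stdin.
# The limit is the first argument; the 50C default is an assumption.
hot_drives() {
    limit="${1:-50}"
    awk -v limit="$limit" '
        # Only drive lines match; CPU lines and section headers are skipped.
        $1 ~ /^[a-z]+[0-9]+:$/ && $2 ~ /^[0-9]+C$/ {
            name = $1; sub(/:$/, "", name)
            temp = $2; sub(/C$/, "", temp)
            if (temp + 0 >= limit + 0) print name, temp "C"
        }'
}

# Usage:
#   sudo ./get_hdd_temp.sh | hot_drives 45
```

An empty result means nothing reached the limit; any drive that does show up is worth a closer look with smartctl.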
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
da13 is way too hot. At 64C, why is it so much higher than the rest of the drives? This is your first obvious problem to fix. It certainly seems like there is very little airflow across this drive.

I just noticed that your airflow is definitely wrong. The rear case fan is pulling air into the case, correct?

For most common cases, air gets pulled into the front of the case, flows across the hard drives, into the rest of the computer case, and then out through the exhaust and power supply fans. If you have 4 fans sucking in air, you should have 4 fans blowing it out; however, that is not always needed, depending on the case.

The HBA likely started overheating due to poor airflow. Some HBAs require a huge volume of air and were designed to be in high flow cases.
 
Joined
Mar 5, 2022
Messages
224
da13 is way too hot. At 64C, why is it so much higher than the rest of the drives?
LOL I forgot to mention that da13 is an external USB drive. I have to believe that the temperature reading is wrong.

The rear case fan is pulling air into the case, correct?
I believe that the fans all push the air from the back to front. The fans in the hot swap cases are pushing air out and the case fan is pulling air in. I'll double check when I install the slot fan for the controller card.

I like your suggestion for the airflow to go front to back (quite the opposite of my configuration.) When I install the new fan, I'll turn the fans around.

Now that the drive has been replaced, I'm not getting any warnings of any kind.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
The fans in the front of your case appear to be pulling air in. "Typically" a fan draws air in through the front (the open-blade side) and pushes it out the rear (the side where the fan motor is mounted). I will not say that you cannot have a reverse-running fan, but you should confirm the direction by feeling for the airflow.
 
Joined
Mar 5, 2022
Messages
224
This has boiled down to a 10+ year old motherboard beginning to fail. I'm starting a new thread asking for suggestions.
 