Frequent Kernel Panics - HBA problem?

Status
Not open for further replies.

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
The machine can stay online for a day unless I tax the storage, use AFP for Time Machine, etc.
It seems to me that it has something to do with the storage not working, but I need help troubleshooting before spending money on a new HBA.
I did manage to take a couple of pics of the screen and it seems the problem is similar with every crash. (As far as I've seen.)
Booting from 2 * 30 GB Kingston USB sticks in a mirror.
I have 6 * 4 TB Seagate drives in a RAIDZ2. They are 4 newer IronWolf and 2 older Seagate drives that I haven't dared to replace yet, as I'm sure it will crash while resilvering.
I get just over 150 MB/s from each drive when scrubbing but it has never managed to get through a full scrub without crashing and starting over. (Data is about 3.5 TB.)
Can it be because the PERC H310 gets too warm?
It is a bit tight between the cards but I have a fan blowing over them. I also tried keeping the cabinet open but it made no difference.
I started a scrub when I started writing this post but it crashed twice and then I stopped the scrub.
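
As a rough sanity check on the numbers above, the sketch below estimates how long one clean scrub pass might take. It is an approximation under stated assumptions, not a measurement: it assumes all six drives sustain 150 MB/s at the same time, and it shows the result both with and without a RAIDZ2 parity overhead because it is not clear whether the 3.5 TB figure already includes parity. The function name `scrub_hours` and the parity scaling are illustrative only.

```python
def scrub_hours(data_tb, drives, mb_per_s_per_drive, parity_drives=0):
    """Rough scrub duration in hours.

    Assumption: data_tb is the amount of data in TB (decimal). If
    parity_drives > 0, the amount read from disk is scaled up by
    drives / (drives - parity_drives) to account for parity blocks.
    """
    bytes_to_read = data_tb * 1e12
    if parity_drives:
        bytes_to_read *= drives / (drives - parity_drives)
    # Assumption: every drive streams at the same rate in parallel.
    aggregate_bytes_per_s = drives * mb_per_s_per_drive * 1e6
    return bytes_to_read / aggregate_bytes_per_s / 3600


# Figures from the post: 3.5 TB of data, 6 drives, about 150 MB/s each.
print(round(scrub_hours(3.5, 6, 150), 2))                   # parity ignored
print(round(scrub_hours(3.5, 6, 150, parity_drives=2), 2))  # RAIDZ2 parity included
```

Under these assumptions a full pass would take somewhere between one and two hours, so a crash during every scrub means the machine fails within a fairly short stretch of sustained load.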

What kinds of logs/info can I provide if someone is willing to help out a bit?
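
Typical starting points on a FreeBSD-based system like this are the pool status, the attached device list, SMART data for each drive, and the kernel message buffer. The sketch below gathers them in one go. It assumes it is run as root from a shell on the NAS; `/dev/da0` is only an assumed example device name (check the `camcontrol devlist` output for the real ones), and the `collect` helper is hypothetical, not part of FreeNAS.

```python
import shutil
import subprocess

# Assumption: /dev/da0 is just an example; repeat the smartctl entry per drive.
DIAGNOSTIC_COMMANDS = [
    ["zpool", "status", "-v"],       # pool health and per-device error counters
    ["camcontrol", "devlist"],       # drives as seen through the HBA
    ["smartctl", "-a", "/dev/da0"],  # SMART attributes for one drive
    ["dmesg"],                       # kernel messages, including driver errors
]


def collect(commands, timeout_s=60):
    """Run each command; return {command line: output or a note on why it failed}."""
    report = {}
    for cmd in commands:
        key = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            report[key] = "(not installed on this machine)"
            continue
        try:
            done = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep error text next to normal output
                universal_newlines=True,
                errors="replace",
                timeout=timeout_s,
            )
            report[key] = done.stdout
        except (subprocess.TimeoutExpired, OSError) as exc:
            report[key] = "(failed: {})".format(exc)
    return report


if __name__ == "__main__":
    for name, output in collect(DIAGNOSTIC_COMMANDS).items():
        print("===== {} =====\n{}".format(name, output))
```

Redirecting the printed output to a file gives one text attachment for a forum post, which tends to be easier to read than photos of the console.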

System: Stable 11.1-U4
Motherboard: Gigabyte GA-X58A-UD7 rev.1 , built-in Realtek LAN disabled in BIOS
CPU: Xeon X5680 clocked down to 3 GHz
Memory: 48 GB, 6 * 8 GB unreg.unbuff. ECC (It's supposed to work and machine is running fine but not reporting ECC so unsure if it's actually functional.)
HBA: DELL PERC H310 flashed to IT mode following this guide https://bit.ly/2BSaNTT
NICs: Intel 2 * 1 Gb card, Intel X520-DA1 10Gb (direct to workstation for video editing etc.)

e81087ad9a2399aa211f9a1e10f85a79.jpg

a7913dcd1e456182f97ba5e8a65aca9e.jpg
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
The screens came out super small and unreadable. It could be the card is overheating. Try some better thermal compound and see where that gets you.
 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
kdragon75 said:
The screens came out super small and unreadable. It could be the card is overheating. Try some better thermal compound and see where that gets you.

If you right click and open in a new tab you will be able to read the screens.
What image hosting service should I use for good and easy functioning?

I will re-seat the heatsink and try to improve air flow.


Sent from my iPhone using Tapatalk
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Robertr said:
System: Stable 11.1-U4
Motherboard: Gigabyte GA-58A-UD7 rev.1 , built-in Realtek LAN disabled in BIOS
CPU: Xeon X5680 clocked down to 3 GHz
Memory: 48 GB, 6 * 8 GB unreg.unbuff. ECC (It's supposed to work and machine is running fine but not reporting ECC so unsure if it's actually functional.)

I am assuming you mean you have a "GA-X58A-UD7" motherboard? If so, from the quick specs I saw; it does not appear to support ECC Ram: https://www.gigabyte.com/Motherboard/GA-X58A-UD7-rev-10#sp

  1. 6 x 1.5V DDR3 DIMM sockets supporting up to 24 GB of system memory(Note 1)
  2. Dual/3 channel memory architecture
  3. Support for DDR3 2200/1333/1066/800 MHz memory modules
  4. Support for non-ECC memory modules
  5. Support for Extreme Memory Profile (XMP) memory modules

I did not see anything listed for a "GA-58A-UD7"

So I would possibly mention a couple potential issues:
  • Is this thing even really able to run a Xeon X5680 CPU?
  • How did you get 48 GB of RAM on a system that only supports 24GB?
  • AFAIK there is no ECC support on this MB

Apologies in advance if I am way off on my assumptions.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
Mirfster said:
I am assuming you mean you have a "GA-X58A-UD7" motherboard? If so, from the quick specs I saw; it does not appear to support ECC Ram: https://www.gigabyte.com/Motherboard/GA-X58A-UD7-rev-10#sp

I did not see anything listed for a "GA-58A-UD7"

So I would possibly mention a couple potential issues:
  • Is this thing even really able to run a Xeon X5680 CPU?
  • How did you get 48 GB of RAM on a system that only supports 24GB?
  • AFAIK there is no ECC support on this MB

Apologies in advance if I am way off on my assumptions.

Yes, X58A..
The board doesn’t have -official- support for >24GB, ECC or Xeons but I have been running it with this CPU for years without any problems.
Others have reported that ECC is working for them and it’s running fine and stable with this RAM now.


 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
With new quality thermal paste [emoji6]

Maybe I'm missing something but when I do that it's still just a tiny thumb.
Weird. I’ll repost them from my phone using Tapatalk.
New pics are now in original post.

And yes, I have good thermal paste. [emoji846]


 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Code:
Dump aborted due to IO failure
Yeah, the IO controller driver took a crap; have fun using it to write to disk, haha!

Are you using any type of backplane or expander by chance? Also it never hurts to run an overnight memtest.
 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
kdragon75 said:
Code:
Dump aborted due to IO failure
Yeah, the IO controller driver took a crap; have fun using it to write to disk, haha!

Are you using any type of backplane or expander by chance? Also it never hurts to run an overnight memtest.

Could it be a problem between the CPU and card? (The board is old.)
Or is it most likely the card?

No backplane used. Drives connected straight to the card. Maybe I could try changing the cables if nothing else works..
I will do a thorough memtest run.


 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Robertr said:
Yes, X58A..
The board doesn’t have -official- support for >24GB, ECC or Xeons but I have been running it with this CPU for years without any problems.
Others have reported that ECC is working for them and it’s running fine and stable with this RAM now.

First off, thanks for your honesty. So I will do similar in return...

Congrats to you for even getting the system running with what seems like components that are well outside of manufacturer specifications. However, with that being said, I personally would run (not walk) away from this scenario if you value any data that is stored on this system. There are just too many possible issues that can arise due to the very nature of this setup.

You already have the Xeon CPU and ECC Ram, so the reasonable choice would be to acquire a motherboard that actually is rated for them and take advantage of what you already own.

Please give some thought to what you are doing and understand that if you continue on this route, not many contributors are going to give you the time of day (and some may just outright give you a "tongue lashing").

I will hop off my "soap box" now and will simply wish you the best of luck.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Robertr said:
Or is it most likely the card?

Yeah, my money is on the card. I wouldn't expect a bad cable to crash the driver like that, as the controller should just log a ton of errors and eventually just drop the drive.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Agreed, it's likely the card. I had one go bad on me a while back. Errors on all connected drives.

Re: the CPU/board combo. I follow the same philosophy as the OP: if it works, it works. (For a home lab, not a commercial situation in my case.) HOWEVER, every time you update the SW there could be an issue. As long as he knows the risks (and one can size that risk with some research) then ok. There is a reason the old adage, "if it ain't broke, don't fix it," applies so often. I've been bitten a few times myself on that one. :)
 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
I totally understand the argument to use “proper” hardware and I’ve been contemplating getting a “new” server motherboard. This machine was repurposed when I built a new Hackintosh based on an HP Z800 board and a couple of Xeon X5690.
The GA-X58A-UD7 is actually a great board but the SATA controllers seemed to be becoming a bit wonky and I wanted more CPU power.

Right now I’m running memtest on the FreeNAS machine and as you can see, the ECC seems to be working as intended.
I will let it run until tomorrow.
Then I will try to improve the cooling of the PERC H310 card and if that doesn’t work I will complain to the ones I bought it from and see if they can help me out with a new card or something. (Bought on eBay from USA.)

c1a98d548f518f8a3ef52b649d97208c.jpg

a9530db5c4d93175b4bc727e2b8c545b.jpg



 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
Memtest has been running for a total of about 25 hours now without any errors.
I won’t have time to fiddle with the card until the day after tomorrow.
a2997493dc7133ad4296abc87a0d9929.jpg



 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
Everything seems to be working fine now.
I just completed a scrub. (This was impossible before.)
I reseated the heat sink with a low viscosity, high performing thermal paste. (The same I use on the CPUs.)
I even mounted a small fan I had lying around, right on the heat sink.
There is also the fan blowing over the cards but that I had before, too.
In the process I moved the card to another slot. I will see how it turns out and whether I have to move it again to fit everything the way I want.

Anyways, all good for now!
d4898e2c695f85d4b0f87fa39bebc737.jpg
bdb38703950e4307cf8ee837f036a256.jpg
ab38a943adce869a9e23413967ae7dfa.jpg



 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Yeah, that card was built for a server chassis where there is lots of directed high pressure air flow. Not well suited for a tower as you are seeing.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Doesn't help that the card is sandwiched in between what look like other high power (temp) cards.

Great move on tying on the small fan! I too have done that in one system and it is rock solid. :)
 

Robertr

Dabbler
Joined
Sep 22, 2017
Messages
31
It seems to be working fine with some heavy load like scrubbing or shovelling files but...
Having AFP sharing on and making Time Machine backups from the Macs makes it crash.
I will search in the appropriate sections of the forum but if anyone has some pointers they are more than welcome.

I also put in another 140mm intake fan blowing fresh air straight onto the cards, and I opened an expansion card slot cover so that air can flow past the card and get out.
I'm hoping this will improve air circulation and cooling even more.

As soon as I have the time I will connect a monitor again and see if I can catch the output when it crashes with AFP Time Machine.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Thanks for reporting back. I'm glad to hear it's been so much more stable.
 

miguellee

Dabbler
Joined
Apr 7, 2018
Messages
13
Robertr said:
Everything seems to be working fine now.
I just completed a scrub. (This was impossible before.)
I reseated the heat sink with a low viscosity, high performing thermal paste. (The same I use on the CPUs.)
I even mounted a small fan I had lying around, right on the heat sink.
There is also the fan blowing over the cards but that I had before, too.
In the process I moved the card to another slot. I will see how it turns out and whether I have to move it again to fit everything the way I want.

Anyways, all good for now!

May I know which heat sink you are using? I can't find a heat sink with suitable dimensions. I am only using a 90mm fan blowing directly onto my LSI 9211-8i. Thanks

Sent from my HTC U11 using Tapatalk
 