SOLVED Is my server fried? Need troubleshooting help

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Yesterday I found my server off. Maybe there was a power blip, I don't know. It's on a surge protector, but no UPS. And now:

I can connect to IPMI via web browser.
I click Power On button.
I hear the server start and lights come on on the board.
But IPMI continues to indicate "Host is currently off" and this alert pops up: "Performing power action failed. Please check The feature connector cables."
I can't access via SSH or the FreeNAS WebGUI. I suspect it's not even booting as I can't hear any disk access.
The only way to shut it down is to press and hold the chassis power button.
I tried turning off the power supply and unplugging the server for a few minutes. I also restarted the Ethernet switch. It made no difference. I don't know where to start with troubleshooting this.
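One way to cross-check what the BMC itself reports, independent of the web interface, is to query the chassis power state over the network. The following is only a rough sketch: it assumes `ipmitool` is installed on another machine and that IPMI over LAN is enabled on the board. The function names, host, and credentials are placeholders for illustration, not values from this system.

```python
import subprocess


def parse_power_status(output: str) -> str:
    """Map ipmitool's 'chassis power status' text to 'on', 'off', or 'unknown'."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return "on"
    if text.endswith("is off"):
        return "off"
    return "unknown"


def chassis_power_status(host: str, user: str, password: str, timeout: int = 15) -> str:
    """Ask the BMC for the chassis power state; return 'unknown' on any failure."""
    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", "status",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ipmitool is not installed, or the BMC did not answer in time.
        return "unknown"
    if result.returncode != 0:
        return "unknown"
    return parse_power_status(result.stdout)
```

Calling `chassis_power_status("192.0.2.10", "ADMIN", "secret")` with your own BMC address and login would show whether the BMC still reports the host as off even though the fans and board lights are on. A mismatch like that points at power sensing or cabling rather than the operating system.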
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
It is the system in my signature. I have no idea what AVR is or "the AVR bug", and googling left me not much wiser. I found a list of about 100 bugs.
If there is a bug in some component of the motherboard, why would it take 5 years after building to show up? I have not updated the motherboard firmware or anything.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
All C2000 SoCs are on borrowed time. Try to have Supermicro replace it for free.

why would it take 5 years after building to show up?
Has it been on continuously? If so, you got really lucky; they drop like flies around the 2-3 year mark.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Yes, almost continuously. I just had to replace a RAM module and another drive not long ago too, so it's starting to cost money to maintain. I sent Supermicro an email and am hoping for the best.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Well, I've got some good news and some news that is not necessarily good. The good news is, Supermicro finally got back to me and invited me to submit a C2000 RMA request.

The other news is, when we went to take out the motherboard, we found the power connector's retaining clip was not engaged, and over the 5 years since installation the connector had gradually pulled out a little on one side. We reseated the connector and the server started up just fine.

Yes, it's good that it works now with no trouble or expense. Other than being a bit embarrassing, the not-so-good part of this is that I still have a board that is "on borrowed time" and it will live to die another day. Maybe when the C2000 RMA program is over.

EDIT: Wait, they suggested I RMA it anyway. Wow!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's gonna die and Intel is footing much of the bill. Might as well get it over with, as far as they're concerned.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Well, I'm really flummoxed now. Supermicro sent the board back. They just updated the BIOS and IPMI firmware and ran a bunch of tests, with no mention of addressing the C2000 problem or any hardware fix. I wrote back and they said the necessary C2000 fixes have been applied. I'm still trying to figure out what they meant by that.

But worse is that I can't get past the Supermicro "initializing . . ." screen. Since they replaced the IPMI firmware, it took me a while to even find it on the network, get the password sorted, etc. And the Java KVM doesn't work anymore (it always sucked anyway), but they now have an iKVM over HTML5 that is much easier. So all that's great, except the server doesn't get close to booting. All the connections are solid, and they obviously tested the hell out of it before returning it.

I guess I'll start unplugging nonessential things like the pool drives and the PCIe card and see if it makes a difference? I really have no clue.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
After googling more and trying again, I found that the POST code in the lower right of the screen is important. Mine says B4. Most of what I can find says that is a BIOS firmware failure. But they just updated the BIOS and tested it, and everything was fine. I'm trying to get a response from Supermicro.

Regarding the C2000 issue, after several queries they tell me "The CPU had potential failure. We have applied all the HW and firmware fixes on this board according to Intel instruction during the RMA process." The RMA repair report, though, mentions nothing about hardware. <sigh> Why does everything about computers have to be so cryptic, almost intentionally so?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
OK, I panicked needlessly. Supermicro said a B4 code usually indicates a memory issue. I just had to reseat one of the RAM modules, although it seemed perfectly seated before. That, and I had to redo a whole bunch of settings that got wiped by the firmware updates.
 