SOLVED Dead ASRock C2750D4I and poor customer service?

nickt

Contributor
Joined
Feb 27, 2015
Messages
131
*** EDIT 24 October 2016 ***
My problem is resolved now: the board was toast, suffering from what I now understand to be a well-known problem where patting the watchdog causes the BMC flash memory to wear out, preventing the board from powering up. If you've got this board, and it's still operational, take a look at my "how to" guide on disabling the watchdog to avoid your board dying too. This is obviously a firmware bug on the board; at this point, ASRock has not acknowledged the issue. I'll update my post if and when a firmware fix is issued.
*** EDIT 25 October 2016 ***
William from ASRock tells me that the BMC team at their global HQ is aware of this issue and is working on a firmware fix. No ETA for the new firmware at this point.
*** EDIT 19 February 2017 ***
And ASRock has finally provided a firmware fix. You should be able to use the watchdog as you please with BMC firmware 00.30.00 (and above). Thanks to Dale for calling it out and verifying.
*** /EDIT ***

Hi everyone,

I'm fairly sure that the ASRock C2750D4I motherboard in my FreeNAS box has died for no apparent reason after 15 months of flawless service, but it's hard to be sure as the diagnostics / IPMI aren't giving much away. I'm hoping someone might be able to help me decide for sure.

I'm also very disappointed in ASRock Rack's seemingly non-existent customer service. I've tried a few times now to contact technical support (http://event.asrockrack.com/tsd.asp and sales email address) and I've had absolutely no response (two weeks later) other than a robot confirming that I sent a support request. Is this a common experience? I had expected somewhat better...

So... almost without warning (see comment on CPU temperature alarms), my FreeNAS server stopped, and won't power on. After pressing the power button, the power supply starts up, but then cuts out after ~3.5 seconds. There's no output to the VGA and no POST. Subsequent presses of the power button do nothing: I have to remove the power cable from the power supply before attempting again. IPMI works fine, but provides very little information on what might be wrong - there's nothing useful in the logs at all.

I have tried the following:

* Substituted power supply for another known working unit
* Confirmed that my server power supply is able to power another PC without issue
* Progressively removed all USB devices and SATA drives
* Progressively removed all case connections (fans, LEDs etc), with the exception of the power button
* Removed RAM and tried one at a time in the A1 slot
* Attempted with no RAM inserted at all
* Reset CMOS by removing power and battery and holding the power button for more than 30 seconds

In all cases, symptoms are identical: power supply fires up for about 3.5 seconds then cuts out. No VGA output, no POST.

IPMI shows very little; the only clue I have is a bunch of CPU temperature alarms issued just before the server failed. I’ve never had CPU temperature alarms before; there was no load on the CPU (or IO) at the time. The server always runs very cool and is in a cool environment.

I'd really appreciate it if anyone has suggestions for further diagnosis I could try. I'm pretty sure the motherboard is toast, but I can't completely rule out the RAM as the problem, as I don't have another ECC-capable motherboard to test it in.

My server configuration is as follows:

Motherboard: ASRock C2750D4I
RAM: 2x 8 GB Crucial DDR3L 1600 (PC3L 12800) ECC Unbuffered CT2KIT102472BD160B
HDDs: 6x 3TB Western Digital Green WD30EZRX
Case: Fractal Design Node 304
Power supply: A1-3000 420W PSU

BIOS version 2.80
BMC version 00.23.00
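
For anyone wanting to check their own board against the fix mentioned in the edits at the top of this post (BMC firmware 00.30.00 and above), the comparison could be sketched roughly as below. This is only an illustration: the function names are made up for this example, and it assumes the BMC version is a dotted string in the same form as the one listed above.

```python
# First BMC firmware version reported to contain the fix,
# per the 19 February 2017 edit at the top of this post.
FIXED_BMC_VERSION = "00.30.00"


def parse_bmc_version(version):
    """Turn a dotted version string such as '00.23.00' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def watchdog_is_safe(bmc_version):
    """Hypothetical helper: True if the version is at or above the fixed one."""
    return parse_bmc_version(bmc_version) >= parse_bmc_version(FIXED_BMC_VERSION)


print(watchdog_is_safe("00.23.00"))  # the version on my board
print(watchdog_is_safe("00.30.00"))  # the version reported as fixed
```

Comparing tuples of integers rather than raw strings avoids surprises if a future version has a different number of digits in one field.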
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Of course, the FreeNAS forum isn't the proper place to troubleshoot your problem; however, I still think we can offer some advice.

So far it appears that you have done just about everything except replace the CPU and motherboard. Do you have a speaker connected to the motherboard SPKR connection points? You stated that there is no POST, which to me implies that you do have a speaker connected and can hear any tones. If you don't, then connect a speaker and see if there are any beeps. Report the number of beeps that you hear and look it up in the user manual for your motherboard, if the beep codes are listed.
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
Also, I believe the ASRock has a digital display directly on the motherboard that can show two-character codes (letters and numbers). It is there specifically for diagnosing error codes and problems. Do you see any codes on it while the board is powered for those 3-5 seconds?
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
Also try this email address. This guy has always been phenomenal in dealings and his name almost always comes up in relation to ASRock US support.

william@asrockamerica.com
 

John Doe

Guru
Joined
Aug 16, 2011
Messages
635
@nickt Don't waste more time. Here is what happened and what you have to do.
Most ASRock 2550 and 2750 boards purchased over a year ago have been dying in the last two to three weeks.
The internet is full of people reporting this: https://www.reddit.com/r/freenas/comments/4x1kh1/asrock_c2550d4i_sudden_death/

Fortunately, the motherboard carries a three-year warranty, so it's only going to cost you the shipping to ASRock and some inconvenience.
To do so, go to http://event.asrock.com/usrma/ and create an account. After that, submit an RMA.
ASRock is fully aware of this, so you should receive an RMA number in about 24 hours.
After that follow their instructions and ship the motherboard.

By the way, my (new) 2750 board is coming today from the ASRock RMA and my co-worker's 2550 is coming Saturday.
Both died in a window of 24 hours and were purchased one year apart.
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
I had noticed more and more reports lately. Fhaksor posting that thread.

I hope the Mini XL wasn't affected by this. That is not going to be good for iXsystems' image.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
It would be nice to know if this really is a poorly designed motherboard or poor components. We may need to stop recommending these boards if there is some proof of the flaw.
 

nickt

Contributor
Joined
Feb 27, 2015
Messages
131
Hi all,

Thanks so much for the many helpful responses - it's a great community that underpins FreeNAS. The overall prognosis, however, isn't looking great...

@joeschmuck - fair call - I realise my post is not strictly FreeNAS related, but I thought it would be good to post anyway given the popularity of this board in FreeNAS builds - both to benefit from the wisdom of others who have more experience / knowledge than me and to alert others to what may prove to be a widespread problem.

I don't have a speaker connected to the board as I have not been able to locate SPKR terminals on the board. A zoomed in image of the board is available on ASRock's product page - let me know if you know of one / can see one - maybe my domestic blindness is getting the better of me...

@nojohnny101 - no digital display on this model, unfortunately. That would be nice...

I had actually reached out to william@askrockamerica.com in my attempts to get support, but he has not replied to me. I had seen others on the FreeNAS forum comment about William's great help - I don't think I said anything to offend him!

@John Doe - I think you're absolutely right. Looking through those reddit threads is a little scary - they describe my issue perfectly. The RMA link you provided is similar to the one I used, although it appears to be US-specific (I am in Australia - the one I used, http://event.asrockrack.com/tsd.asp, appears to be global). But I'll try it anyway.

If this is a general issue then we are all the poorer for it - the board has been perfect for my needs - I particularly appreciate the low power draw for an always on server (generally around 50 - 60 W at the wall with 6 WD green drives).

Thanks again for your help so far - I'll keep the thread up to date as I go.

Nick
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
On your board the speaker header is marked as "SPEAKER1" (see attached screen capture from the ASRock website, far left on the image).
Capture.JPG
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Mass die-off of ASRock C2750/C2550 boards? I'll admit I haven't kept up with the forums in the last few days, but it's the first time I'm hearing that.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,358
Is that because of the alleged bug in the ASRock BIOS that causes the serial config flash to die from excessive wear?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I was just about to ask that.

It's no "alleged" bug. I forget exactly which models were affected, but it definitely was a thing: someone clever discovered that every time the watchdog process did a normal communication with the BIOS, it would hard-reflash things. So depending on exact configs and whatnot, you were reflashing things, more or less frequently, that were only meant to be flashed like twice in their life. lol.
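
To put rough numbers on why that matters, here's a back-of-the-envelope sketch. Both inputs are assumptions for illustration only, not measurements from these boards: a flash part rated for 100,000 erase/write cycles, and one write per watchdog communication at the intervals shown.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_worn_out(endurance_cycles, seconds_between_writes):
    """Days until the assumed write endurance is used up at a steady write rate."""
    return endurance_cycles * seconds_between_writes / SECONDS_PER_DAY


# Assumed endurance rating; real parts vary and the actual chip is unknown here.
ASSUMED_ENDURANCE = 100_000

for interval in (10, 60, 300):  # hypothetical seconds between writes
    days = days_until_worn_out(ASSUMED_ENDURANCE, interval)
    print(f"one write every {interval} s: about {days:.0f} days")
```

With those assumed figures, even one write every five minutes would use up the rated cycles in a bit under a year, so a board failing after a year or so of continuous service wouldn't be surprising.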
 

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
This happened to mine about 4 months ago. I thought the new PSU had killed it, since there were a lot of reports on Newegg of it being crappy, and within a few weeks of installing it a fan had started grinding and my backplane had died. I tried for about a week and couldn't get anything out of the board other than the IPMI working.

If I had known about the three-year warranty and the RMA, I could have saved myself about $800+, because I just sold mine on eBay for $40 (listed dead or alive, it was instantly getting bids, now I know why lol). I ended up upgrading to a SuperMicro board and started out with a 32 GB stick of RAM, and recently added another 32 GB stick. The guy that bought it asked if I still had the invoice and said he was going to pay to have ASRock fix it. I didn't even check to see if it was under warranty; I just assumed that it had a one-year warranty, and it was purchased in December of 2014. All in all it's not a loss, since the RAM for the C2750 is ridiculously expensive if you want to max it out at 64 GB.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Holy crap, how does a bug like that even happen, much less get through quality control?!

Must've been in a recent update, otherwise I'm sure we'd have heard of this at least a year ago.

Anyone have a source with a summary of what's known, so far? This is easily the most insane bug I've seen all year.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Yeah, I've seen that...

I'm going to look for high-resolution images of the C2750D4I boards to see if I can add some meaningful numbers to the hardware side of things.

In any case, the working theory seems to be that the BMC firmware is responding to OS watchdog pings by flashing some EEPROM, presumably the one that stores its config, because:
  • BIOS traditionally uses battery-backed RAM for current settings (battery's gotta be there for the RTC anyway)
  • A number of additional issues have been popping up with the BMC (fan control failing, etc.), apparently because the BMC's kernel panics when the EEPROM writes fail
  • The BMC is hooked very deeply into the CPU and its Management Engine. System firmware probably has an initialization that requires speaking to the BMC before POST even begins - no functional BMC, no boot. (This check is obviously omitted on boards without a BMC, like all consumer-grade boards)
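
To make the theory concrete, here is a toy simulation of the difference between persisting settings on every watchdog ping and persisting them only when they change. It is a sketch under stated assumptions, not ASRock's firmware: the class, the handler names, and the `watchdog_timeout_s` setting are all hypothetical, and nothing here is known about how the real BMC code (or any eventual fix) works.

```python
class CountingFlash:
    """Stand-in for a config EEPROM/flash that counts how often it is written."""

    def __init__(self):
        self.stored = None
        self.write_count = 0

    def write(self, config):
        self.stored = dict(config)
        self.write_count += 1


def handle_ping_always_write(flash, config):
    """The suspected behaviour: persist the config on every watchdog ping."""
    flash.write(config)


def handle_ping_write_on_change(flash, config):
    """A gentler alternative: persist only when the config actually differs."""
    if flash.stored != config:
        flash.write(config)


def simulate(handler, pings):
    """Send `pings` identical watchdog pings and return the number of writes."""
    flash = CountingFlash()
    config = {"watchdog_timeout_s": 60}  # hypothetical, unchanging setting
    for _ in range(pings):
        handler(flash, config)
    return flash.write_count


print(simulate(handle_ping_always_write, 1000))     # 1000 writes
print(simulate(handle_ping_write_on_change, 1000))  # 1 write
```

The point of the comparison is simply that an unchanging setting has no reason to generate more than one write, however often the OS pats the watchdog.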
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
C2550D4I-1(L).png

Observations so far:
The EEPROMs for the BMC and system firmware are both socketed.
BMC firmware is stored on an EEPROM that goes in that fancy-ass socket, marked in green; system firmware is stored on the unimpressive 8-pin DIP package IC marked in blue.

The other three marked ICs may be of interest, but I can't find high-resolution images of them to get their model numbers.
I kinda suspect that the one marked in red is battery-backed SRAM for BIOS settings and the orange one the RTC controller. That would make yellow the most likely candidate for BMC settings EEPROM.

I'd appreciate any high-resolution images of these ICs and surrounding areas that any C2750D4I/C2550D4I owners might be able to take. Thanks!
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
The other major consideration in deciding whether the boards are repairable is whether ASRock can fix the issue in firmware, and actually has such a fix available, so that a repaired board does not develop the same problem again. And of course whether the part is a standard one or can be obtained from ASRock.

Is their warranty the same worldwide, or does it depend where one bought the board?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Repairs are not for the faint of heart, most likely. The socketed stuff generally only has the program, not the settings, which are supposedly being overwritten way too often. Which leaves a small fine pitch IC surrounded by absolutely tiny passives...
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
Probably doable if it isn't a section of some large BGA chip. But certainly not worth doing if they can't stop it happening again!
 