SOLVED Dead ASRock C2750D4I and poor customer service?

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Well, the bug is so egregious that a fix shouldn't be too hard. I do wonder how it happened in the first place...
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,523
I'm also using the same motherboard and here is my feedback.
I've been following this issue on the ASRock C2x50 but I couldn't reproduce what is described in the FreeNAS bug 16190.

My C2750 has been up for some time now (a few months) and I should have experienced the same issue (with one write/second to the flash).

So I'm wondering:
- are some hardware (or BIOS) versions unaffected?
- is this issue only happening with specific versions of FreeNAS (i.e. >= 9.3)?
- in the BIOS the watchdog can be disabled but is it the watchdog causing the issue? I couldn't find out...
 

Ericloewe

The problem only happens when the watchdog is active. That means that the BMC is actively expecting the OS to ping it to announce that it's still alive.
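
To make the "ping it to announce that it's still alive" part concrete, here is a minimal sketch of a generic watchdog timer. It is not the BMC firmware or the IPMI driver; the `Watchdog` class, its method names and the timeout are all made up for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Watchdog:
    """Toy watchdog: once armed, it expects a ping within timeout_s seconds."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_ping = None  # None means the watchdog has not been armed yet

    def ping(self, now: float) -> None:
        """Called by the OS side to say it is still alive (also arms the timer)."""
        self.last_ping = now

    def expired(self, now: float) -> bool:
        """True if the watchdog is armed and the last ping is too old."""
        if self.last_ping is None:
            return False
        return now - self.last_ping > self.timeout_s


if __name__ == "__main__":
    wd = Watchdog(timeout_s=10)   # hypothetical timeout
    wd.ping(now=0)
    print(wd.expired(now=5))      # still within the timeout
    print(wd.expired(now=11))     # OS went quiet for too long
```

A disabled watchdog corresponds to the unarmed state here: nothing is expected from the OS, so nothing happens on that side.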

Pitfrr said: "are some hardware (or BIOS) versions unaffected?"
Neither, this is most likely a BMC firmware bug. Probably a recent one, otherwise we'd have seen this much earlier.

Pitfrr said: "is this issue only happening with specific versions of FreeNAS (i.e. >= 9.3)?"
Unlikely, since the driver is not specific to ASRock boards.
 

nickt

Contributor
Joined
Feb 27, 2015
Messages
131
Hi all,

A few updates...

@joeschmuck - my domestic blindness is more advanced than I had realised... thanks for that. So I connected a speaker (on my board, there are only blank solder pads), and got nothing. Common wisdom seems to suggest using pins 1 and 4, but I tried every combination of pins possible. Nothing at all. The board is powered for such a short time (3.5 seconds) that I wonder whether it even gets to the point of producing beeps before the power cuts. If we go with the prevailing theory, perhaps the BIOS is so cooked that it can't do beep patterns...?

@Ericloewe - here are some close ups of my board, using the same coloured markings as you. The good old 8 pin DIL package is covered up with a sticker, and given that I hope to RMA the board, I'm not going to pull it off.

board-1.jpg board-2.jpg

Mass die off? It certainly can't be universal to the product - the outcry would have been much louder / sooner, but the consistency of reports turning up right now with very similar sounding symptoms and board lifetimes suggests at least a serious and consistent batch issue.

@Pitfrr good questions, I'm sure. In my case, versions are:
- BIOS version: 2.80
- BMC version: 00.23.00
- FreeNAS: FreeNAS-9.3-STABLE-201602031011

You might notice that these are all at least one version behind. I prefer to wait a month or two before applying updates. Board firmware I don't touch unless the release notes suggest a very good reason. In this case, the ASRock release notes don't suggest they've addressed the issue we suspect (other than a nebulous "improve product performance"):

upload_2016-9-5_10-7-23.png

upload_2016-9-5_10-7-33.png
 

nickt

Oh. And ASRock support. Not so good.

William did come back to me (although not until I posted here). He's been quite helpful, asked some good questions and has certainly given the impression that he wants to help. But because I don't live in the US, he can't arrange an RMA for me (although he acknowledges that one is obviously required). He tells me that he has been emailing his colleagues in Taiwan, but they continue to ignore me (and him, evidently).

So you good folk located in the US appear to be well looked after.

The rest of us poor suckers appear to be of little interest to ASRock global support. A cautionary tale...?
 

nickt

One other thought. Reviewing the bug report that seems to link with the fault we are speculating over (https://bugs.freenas.org/issues/16190), I note that it is IPMI / BMC that is implicated. Now I could be misunderstanding the problem here, but IPMI works on my board (not that it has anything particularly interesting to report from its rather sparse logs...)
 

Ericloewe

nickt said: "One other thought. Reviewing the bug report that seems to link with the fault we are speculating over (https://bugs.freenas.org/issues/16190), I note that it is IPMI / BMC that is implicated. Now I could be misunderstanding the problem here, but IPMI works on my board (not that it has anything particularly interesting to report from its rather sparse logs...)"
What happens if you try to change some settings?

Thanks for the pics, by the way. I'll look up the part numbers in the morning.
 

nickt

OK - that's a good question - I'll try when I get home. I guess it doesn't matter so much what settings - just anything that would write to non-volatile storage...?
 

Ericloewe

nickt said: "OK - that's a good question - I'll try when I get home. I guess it doesn't matter so much what settings - just anything that would write to non-volatile storage...?"
Right. New user, new password or some such thing should be informative.
 

Ericloewe

Ok, so:
The SPI flash has 4 KB sectors and is rated for 100,000+ write/erase cycles. At one write per minute, that's on the order of two to three months (100,000 minutes is about 69 days) for it to wear out.
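
As a rough check on that estimate, here is a back-of-the-envelope sketch. It assumes every write lands on the same sector and that the sector fails exactly at its rated cycle count; neither is known for sure from this thread, and real parts often outlast their rating. The function name and the two rates are only illustrative (one write per minute and one write per second are both mentioned in this thread).

```python
RATED_CYCLES = 100_000  # write/erase rating quoted for the SPI flash


def days_to_wear_out(writes_per_minute: float, rated_cycles: int = RATED_CYCLES) -> float:
    """Days until a single sector reaches its rated write/erase count."""
    minutes = rated_cycles / writes_per_minute
    return minutes / (60 * 24)


if __name__ == "__main__":
    # Assumed write rates, in writes per minute.
    for label, rate in (("one write per minute", 1), ("one write per second", 60)):
        print(f"{label}: {days_to_wear_out(rate):.1f} days")
```

The result scales inversely with the write rate, so the actual interval the BMC uses matters a lot more than the exact cycle rating.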

If it's only writing to a single page (or a handful of pages), it may even be possible to use spare capacity to avoid replacing these chips: since most of the chip would still be in good order, remapping the bad pages to new ones (after the bug is fixed) would allow the chips to stay in place.
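
The remapping idea could look something like the sketch below. This is purely hypothetical: nothing here reflects how ASRock's BMC firmware actually lays out its flash, and `SectorRemapper`, its methods and the sector counts are invented for illustration. It only shows the general technique of keeping a logical-to-physical table and pointing a worn-out sector at a spare.

```python
class SectorRemapper:
    """Toy logical-to-physical sector map with a pool of spare sectors."""

    def __init__(self, total_sectors: int, spare_sectors: int):
        if not 0 <= spare_sectors < total_sectors:
            raise ValueError("spare_sectors must be in [0, total_sectors)")
        usable = total_sectors - spare_sectors
        # Initially every logical sector maps to the same physical sector.
        self.mapping = {n: n for n in range(usable)}
        # The tail of the chip is held back as spare capacity.
        self.spares = list(range(usable, total_sectors))

    def physical(self, logical: int) -> int:
        """Physical sector currently backing a logical sector."""
        return self.mapping[logical]

    def retire(self, logical: int) -> int:
        """Move a worn-out logical sector onto the next spare; return the new physical sector."""
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.mapping[logical] = self.spares.pop(0)
        return self.mapping[logical]


if __name__ == "__main__":
    flash = SectorRemapper(total_sectors=16, spare_sectors=2)  # made-up sizes
    print(flash.physical(3))  # before remapping
    flash.retire(3)           # sector 3 is worn out, move it to a spare
    print(flash.physical(3))  # after remapping
```

In a real device the table itself would also have to live somewhere non-volatile, which is part of why this would need a firmware update rather than a quick tweak.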
 

Ericloewe

I was wondering where the hell the SRAM for the BIOS settings was. Turns out, Intel integrated it into the PCH on the Skylake platform (probably earlier platforms too) - so it could very well be inside the SoC, hence why the battery is so close to the SoC.
 

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
This definitely isn't related to FreeBSD since I was using Linux when mine died.
Ericloewe said: "Right. New user, new password or some such thing should be informative."

It should still be writeable: the guy who bought my board asked what the old username and password were. I told him, and I would assume he changed them.

Sent from my AOSP on dragon using Tapatalk
 

Ericloewe

brando56894 said: "This definitely isn't related to FreeBSD since I was using Linux when mine died. It should still be writeable, the guy that bought my board asked what the old username and password was, I told him and I would assume he changed it."
If so, that indicates that a software-only fix might be possible, at the cost of a few KB.
 

nickt

OK... this is weird. As suggested by @Ericloewe, I added a new user to IPMI, just to see what might happen. At first, nothing much interesting - the new user took, and I was able to power cycle and successfully login with the new user. So it obviously worked fine.

In doing that power cycle, the board still did its 3.5 seconds of life thing.

But here's where it gets interesting: a couple of power cycles later, and the board stayed on! A tour of logs / sensors in IPMI suggested things were running (IPMI confirmed that the power state was "on"), but there were some abnormalities (quite a few of the sensors weren't running: some voltage, CPU / MB temperature). OK. Another power cycle, and the board stayed on again, and this time all the sensors are good.

I haven't attempted to boot into FreeNAS yet - as part of earlier fault diagnosis I reset CMOS, so it just boots into the BIOS screen (which I can navigate just fine). Later tonight, I might run a memory test and then attempt to boot FreeNAS (although I won't be risking any of my data drives).

I still consider the board well and truly toasted - there's no way I'd trust it for daily service anymore. But this latest observation certainly provides a few more clues as to what might be going on here. For me, it seems to support the current theory.

For general interest, here is a grab of some sensor warnings in IPMI (these still seem to be appearing). Also a lone entry in the system log in BIOS - no idea what it means. Let me know if you'd like to see anything else.

Screen Shot 2016-09-06 at 11.12.03 AM.png File 6-09-2016, 11 59 21 AM.jpeg

Oh and ASRock support? William is awesome - he keeps emailing me and trying to help. Global support? Dial tone. Radio silence. Nada.
 

nickt

Righto. That was short lived. I left everything powered down for a few hours, came back to it just now and it's all dead again. 3.5 s of power, then no more. I've tried retracing my steps - created a new user, power cycled a couple of times (the new user persisted OK). But this time, nothing happens. It's staying dead.

Nothing much different in IPMI, except that the system log now includes a bunch of I2C errors at the warning level.

Screen Shot 2016-09-06 at 8.27.02 PM.png

Oh well...
 

Ericloewe

So, the BIOS is complaining of an ME (Management Engine) problem, which is consistent with the BMC failing. What I don't quite understand are the I2C errors - the flash device uses SPI, so maybe the firmware team didn't quite understand the difference between SPI and I2C?
 

nickt

Ha! Who knows. I've found the logging messages on this board to be rather vague and inconsistent...
 

nickt

Another quick update. I was able to resurrect the board once more by making further changes in IPMI - this time it was to do with the logging policy. I was able to keep it up long enough to run a full pass of memtest86, which generated no errors, so I can say with some confidence that PSU, RAM and CPU are all OK. I was also able to start FreeNAS, although there's not much to see there as I haven't connected (and won't connect) my data drives. But I kept it running successfully for a few hours.

Interestingly, I was able to keep cold / warm rebooting the machine successfully for some time after - so long as the machine wasn't powered down too long. When I left it powered down overnight, it went back to its failed state.

ASRockRack Taiwan have finally contacted me (after considerable hassling by William). 3 days later, I am still waiting to hear back from them, having now supplied the necessary information for an RMA.

I'll also report back in a few days' time on Newegg (who sold me the board). That's a whole different story, and I'm wanting to give them every chance to help me before I say what I really think...

Nick
 

brando56894

Interesting that you were able to revive it, if only for a little while.
