Issues with X11SSM-F 3.3VSB Setting

tealcomp · Mar 24, 2017

Hi All:

I have been working to put together a new FreeNAS server for about 2 months :) While I have had the server up and running, I have been having some nagging "issues" with the box that leave me uncomfortable with its stability and hence reliability.
Today, I noticed that the IPMI 3.3VSB (Standby, I presume?) sensor is in a "Low Critical" state; indeed, the voltage being put out is around 2.4V, which is definitely low. Based on the log, this started happening on 03/17. Here is the output of the ipmitool sensor command:

Code:

jupiter# ipmitool sensor | grep VSB
5VSB    | 5.000 | Volts | ok | 4.246 | 4.376 | 4.480 | 5.390 | 5.546 | 5.598
3.3VSB  | 2.432 | Volts | nr | 2.789 | 2.891 | 2.959 | 3.554 | 3.656 | 3.690
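
For anyone reading that output later, here is a minimal sketch of how a line could be checked against its own thresholds. It assumes the ten columns are name, reading, unit, status, then the lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical and upper non-recoverable thresholds, and it treats a reading at or below a lower threshold (or at or above an upper one) as crossing it. The function names are made up for illustration and are not part of ipmitool.

```python
from typing import List, Optional


def to_float(field: str) -> Optional[float]:
    """Convert a column to float; non-numeric fields such as 'na' become None."""
    try:
        return float(field)
    except ValueError:
        return None


def classify_sensor_line(line: str) -> str:
    """Classify one pipe-separated sensor line against its thresholds.

    Assumed column order: name | reading | unit | status |
    lnr | lcr | lnc | unc | ucr | unr
    """
    parts: List[str] = [p.strip() for p in line.split("|")]
    if len(parts) != 10:
        raise ValueError(f"expected 10 columns, got {len(parts)}")

    reading = to_float(parts[1])
    if reading is None:
        return "no reading"
    lnr, lcr, lnc, unc, ucr, unr = (to_float(p) for p in parts[4:])

    # Check the most severe thresholds first, lower side then upper side.
    if lnr is not None and reading <= lnr:
        return "lower non-recoverable"
    if unr is not None and reading >= unr:
        return "upper non-recoverable"
    if lcr is not None and reading <= lcr:
        return "lower critical"
    if ucr is not None and reading >= ucr:
        return "upper critical"
    if lnc is not None and reading <= lnc:
        return "lower non-critical"
    if unc is not None and reading >= unc:
        return "upper non-critical"
    return "ok"
```

Under those assumptions, the 3.3VSB reading of 2.432 sits below even the first threshold column (2.789), which matches the "nr" status ipmitool printed, while the 5VSB reading falls inside all of its thresholds.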

I have already replaced the PSU (due to a CAM read error I have been experiencing in another post I am still working on).

I have this issue with both the Seasonic 760 Platinum and now an EVGA P2 850. So, I am going to go out on a limb and say that the PSU is probably not the source of my problem.

Any thoughts on this one? I have rebooted the server, but I have not yet reset the IPMI; would that be the next diagnostic step to take or perhaps upgrade the BIOS? Below are the particulars; note I have removed the identifiable information.

Code:

Firmware Revision : 01.13
Firmware Build Time : 01/15/2016
BIOS Version : 1.0b
BIOS Build Time : 12/29/2015
IP address :
BMC MAC address :
System LAN1 MAC address :
System LAN2 MAC address :

Thanks in advance for your assistance and input.

-Dan

Ericloewe · Mar 25, 2017

It cannot be the PSU: an ATX PSU supplies only a 5V standby rail, not a 3.3V one.

Contact Supermicro for support. The regulator providing that rail may be defective.

tealcomp · Mar 25, 2017

Thanks Eric, that is on my list of to-dos. I didn't think it was the PSU, but I had another problem that most likely is being caused by the PSU. I was reading the manual trying to figure out what the 3.3V SB rail is actually powering. Do you know offhand? And the better question: is it considered a critical indicator?


Ericloewe · Mar 25, 2017

tealcomp said:
I was reading the manual trying to figure out what the 3.3V SB rail is actually powering. Do you know offhand? And the better question: is it considered a critical indicator?

Something related to IPMI.

tealcomp · Mar 25, 2017

Ericloewe said:
Something related to IPMI.

Might explain why email notification is not happening eh.


tealcomp · Mar 25, 2017

Eric:

Would you mind giving me your input on this other "possibly" related issue I have been experiencing? I have done an in-depth analysis of the problem, and given the random nature of the issue, I am wondering if it is possible the motherboard is also the source of this problem?

tealcomp · Mar 25, 2017

Ericloewe said:
Something related to IPMI.

I have opened a support issue with SM.

Ericloewe · Mar 25, 2017

tealcomp said:
Eric:

Would you mind giving me your input on this other "possibly" related issue I have been experiencing? I have done an in-depth analysis of the problem, and given the random nature of the issue, I am wondering if it is possible the motherboard is also the source of this problem?

The motherboard possibly, but it's almost certainly unrelated.

tealcomp · Mar 25, 2017

Ericloewe said:
The motherboard possibly, but it's almost certainly unrelated.

OK, fair enough. Could you expand on what you think it might be? If you say unrelated, I would imagine you have thoughts about the cause. You have read the history, I presume; I have gone out of my way to rule out anything remotely possible. I find it interesting that in RAIDZ2 the drives seem to be far more stable than in RAIDZ3. I am just at a loss (so far) to explain the cause. I searched the forum with a fine-toothed comb, checked Google for external references as well, and really have tried the solutions others have said worked for them.

Thanks in Advance!

-Dan

tealcomp · Mar 27, 2017

tealcomp said:
I have opened a support issue with SM.

After working with SM Support, updating the IPMI FW as well as the SM BIOS and validating the problem with the low voltage rail occurs outside of FN as well, SM has recommended I RMA this board. So, I will work on doing that and I suppose ordering a duplicate board while they diagnose this current one. Before all is said and done I will have enough parts to build a 2nd new server :)

tealcomp · Mar 28, 2017

tealcomp said:
After working with SM Support, updating the IPMI FW as well as the SM BIOS and validating the problem with the low voltage rail occurs outside of FN as well, SM has recommended I RMA this board. So, I will work on doing that and I suppose ordering a duplicate board while they diagnose this current one. Before all is said and done I will have enough parts to build a 2nd new server :)

New motherboard on the way from the Egg. RMA issued from SM, just need to find a box to shove it in and send to them. Fingers crossed, new board will arrive this THU. Stay tuned :)

tealcomp · Apr 2, 2017

Hi All:

The saga continues :) I have installed the replacement motherboard and the 3.3V standby issue is not present, so that part is resolved. Unfortunately, the CAM issues continue to plague me. I thought I was on to something when I started offloading the drives to the on-board connectors, but last night one of the drives I had moved from the HBA to on-board SATA threw a CAM error similar to what I saw with that drive while it was on the HBA. At this point, I believe the issues lie with the Reds. I say this with confidence because I have literally replaced everything else :) The sad part of the news for me is that they are passing their extended (long) SMART tests, so I doubt I will be able to convince WD to replace them.
Any thoughts or suggestions?

-Dan

Ericloewe · Apr 2, 2017

Could be marginal cables.

tealcomp · Apr 2, 2017

Ericloewe said:
Could be marginal cables.

Eric:

Yes, I considered that; I have replaced the HBA cables once with cables directly from SM. They did help, but it wasn't a complete solution. Also, I have used probably 4 different kinds of cables, and again the drives are the central component that seems to be the issue. I find it difficult to believe that so many different kinds of cables are "marginal", keeping in mind I have had issues with the HBA connections as well as the on-board SATA connectors, so there again, not even the same kinds of cables. I truly believe the issue is with the Reds. Also, I found an article where someone had a very similar problem, and until they rid themselves of the Reds, they had the same problem I am having. The funny thing is, the corrections in the Z3 pool are minor; we are talking 116K corrections against nearly 8TB of data. Like everyone else, however, I don't like unexplained errors; they drive me bonkers. I just completed another set of extended tests and once again there are zero indications of any SMART errors. If you run a smartctl -x /dev/da? here, you get far more detail, and the root cause of the problem shows up as a READ FPDMA QUEUED message.

-Dan
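
To compare drives, one option is to count how often that command name shows up in each drive's saved smartctl -x output. Below is a minimal sketch, assuming the output has been saved as plain text and that the command name appears verbatim on each relevant line. count_commands and DEFAULT_COMMANDS are hypothetical names, and WRITE FPDMA QUEUED is included only as an assumed companion to the READ case mentioned above.

```python
from collections import Counter
from typing import Iterable, Sequence

# READ FPDMA QUEUED is the message named in the post; WRITE FPDMA QUEUED is
# an assumed companion command added here for illustration only.
DEFAULT_COMMANDS = ("READ FPDMA QUEUED", "WRITE FPDMA QUEUED")


def count_commands(lines: Iterable[str],
                   commands: Sequence[str] = DEFAULT_COMMANDS) -> Counter:
    """Count lines that mention each command name (plain substring match)."""
    counts: Counter = Counter()
    for line in lines:
        for command in commands:
            if command in line:
                counts[command] += 1
    return counts
```

Because it only does substring matching, it works on any iterable of lines, for example an open file holding one drive's saved output. A noticeably higher count on some drives than on others, regardless of which port or cable they are on, would point toward the drives rather than the cabling.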
