ewellinger
Hello!
I need some advice about a build I've been working on that has been hitting A LOT of different issues, and I wanted to get the community's input on how best to proceed without just throwing money and time at the problem. TL;DR: I've RMA'd my motherboard 3 times and I'm still hitting issues.
The basic components of the build are as follows:
- Motherboard: Supermicro X11SPH-nCTF
- CPU: Intel Xeon Silver 4210 / 4214
- Memory: 8x 32GB Crucial DDR4 2933MHz PC4-23400 ECC
- Chassis: Supermicro CSE-826BE16-R920LPB
At this point I'm actually on my fourth (!!) motherboard, as I hit a variety of issues with the previous 3 that required RMA replacements. The motherboard is new, but the CPU(s) are refurbished chips from ServerMonkey. The RAM is also refurbished, but I'm pretty confident that the sticks are all working, as I was able to do 4 memory test passes with no errors on the full set of 8 sticks. The basic trajectory of this cursed build is as follows:
- Motherboard 1:
  - System would not power on at all. The BMC light would turn on, but for the life of me I couldn't get it to boot up. I used a multimeter and verified that the chassis PSU was working as expected, so that wasn't the issue.
  - Wasn't sure whether it was the CPU or motherboard, so I RMA'd the motherboard.
- Motherboard 2:
  - System turned on, yay!
  - But I was hitting memory issues where it was not identifying one of the DIMMs. At this point my suspicions turned to the CPU. I cleaned the CPU contacts with some isopropyl alcohol and tried reseating it numerous times. The memory issue went away but then reappeared shortly into a memory test. Proceeded with RMA'ing the CPU and getting a different 4210 chip.
  - The new chip worked and I was able to test all the memory sticks for a full 4 runs over ~48 hours.
  - Unfortunately the NIC was bad, and booting into TrueNAS I was unable to see it at all. Tried updating the BIOS and booting into a different OS but no dice; the adaptor wasn't recognized at all.
  - Looking back, I should have just added a network adaptor card, since this was the closest I got to a working system.
- Motherboard 3:
  - The next motherboard immediately had memory issues. I was seeing messages like "Memory Training Failure" when booting with just a single DIMM of memory. I tried moving the DIMM around (based on a request from Supermicro support) and was able to boot with just the DIMMC1 slot filled. At this point I was able to update the BIOS, but that didn't help.
  - RMA'd a 3rd time. At this point I was informed that any future issues would only result in a repair and not a replacement. I also heard that a tech from Supermicro would personally QA the board before shipping it out.
- Motherboard 4:
  - Motherboard 4 initially booted with just DIMMA1 occupied. Added in the rest of the memory sticks and then DIMMD1 couldn't be found. Cleaned the CPU, reseated it, and tried just DIMMA1 again, and now it couldn't find that one either.
  - At this point I'm convinced the CPU is bad because I know this board was inspected before being sent (the BIOS was updated to a recent version that none of the others were running).
  - In the hope of getting something working I ordered a Xeon Silver 4214 CPU and a different heatsink (the officially tested Supermicro one as opposed to a Dynatron B5).
  - This arrived yesterday and I tried progressively adding more memory to the system. No matter what I do, DIMMD1 is consistently not found. I have not yet tried reseating it (generally didn't have a lot of time yesterday).
  - On the plus side it looks like the network adaptor is working, so that's something, I suppose.
One option I was considering was just writing off DIMMD1 and working with the remaining DIMM slots. Not sure if this is a good long-term approach though, since I don't know if that leaves the board more prone to failure.
I'm pretty sure I didn't bend any pins in the CPU socket with the multiple reseatings of the first CPU, but I guess I can't be sure of that. It's worth noting that this is not my first build, so I'm not a complete n00b regarding these types of things, but this is my first foray into the more enterprise-geared components.
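In case it helps anyone compare what the OS sees between reseats, here is a rough sketch of how the "which slot went missing" check could be scripted. It assumes the text comes from `dmidecode -t memory` and that each slot shows up as a "Memory Device" block with `Size:` and `Locator:` lines. The function names and the sample text are made up for illustration and are not output from this board.

```python
import re

# Illustrative sample only (NOT captured from the X11SPH-nCTF): two
# "Memory Device" blocks in the general shape `dmidecode -t memory` prints.
SAMPLE = """\
Handle 0x0020, DMI type 17, 40 bytes
Memory Device
    Size: 32 GB
    Locator: DIMMA1
    Bank Locator: P0_Node0_Channel0_Dimm0

Handle 0x0026, DMI type 17, 40 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMMD1
    Bank Locator: P0_Node0_Channel3_Dimm0
"""


def parse_dimm_slots(dmidecode_text):
    """Map each slot locator (e.g. 'DIMMA1') to the size string reported for it."""
    slots = {}
    # Blocks are separated by blank lines; only "Memory Device" blocks describe slots.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        # Anchor at line start so "Bank Locator:" and "Volatile Size:" are not matched.
        size = re.search(r"^\s*Size:\s*(.+)$", block, re.MULTILINE)
        locator = re.search(r"^\s*Locator:\s*(.+)$", block, re.MULTILINE)
        if size and locator:
            slots[locator.group(1).strip()] = size.group(1).strip()
    return slots


def missing_slots(slots):
    """Return the slot names that report no module installed."""
    return sorted(
        name for name, size in slots.items()
        if size.lower().startswith("no module")
    )


print(missing_slots(parse_dimm_slots(SAMPLE)))  # prints ['DIMMD1']
```

On a real system the text would come from running `dmidecode -t memory` as root and passing its output to `parse_dimm_slots`. Saving the result after each boot would make it easy to see whether DIMMD1 is the only slot that ever drops out.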