Finding bad memory chip location on Supermicro X10DRL-i motherboard

VictorR

Contributor
Joined
Dec 9, 2015
Messages
143
45 Drives Q30
SuperMicro X10DRL-i
2x E5-2620 v3 CPU
2x 120GB SSD Boot Drive
256GB RAM
2x LSI 9201 HBA
30x WD Re 4TB drives
3x X540T2BLK Intel X540 DA2

I had a recurring memory error message:
fullsizeoutput_370.jpeg

After spending days trying to find the physical location of "Bank 7" on the X10DRL-i, I got lucky when the chip failed outright, revealing its location on the motherboard.

fullsizeoutput_371.jpeg

X10_layout.png

I ordered a replacement (before realizing these Samsung chips have a lifetime warranty) and installed it.
A week later, I got this (actually, I think I remember seeing it occasionally before; the last time I was at this office was 6 months ago):

freenas.local kernel log messages:
MCA: Bank 12, Status 0x8c00004f000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 24
MCA: CPU 20 COR (1) MS channel 3 memory error
MCA: Address 0x2bd4b358c0
MCA: Misc 0x910800200021e8c
-- End of security output --
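
For anyone reading the log above, the Status value can be unpacked by hand. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout from Intel's documentation (valid bit 63, uncorrected bit 61, corrected count in bits 52:38, MCA error code in bits 15:0, memory-controller codes of the form 0000 0000 1MMM CCCC). The function name and dictionary keys are made up for illustration; they are not part of FreeNAS or mcelog.

```python
# Decode the MCA status word quoted in the kernel log.
# Bit positions assume the architectural IA32_MCi_STATUS layout.

STATUS = 0x8C00004F000800C3

# Memory-transaction types in a memory-controller error code (the MMM field).
MMM_NAMES = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_mca_status(status: int) -> dict:
    """Return the architectural fields of an MCi_STATUS value."""
    mcacod = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mcacod,
    }
    # Memory-controller errors have bit 7 set and bits 15:8 clear.
    if (mcacod & 0xFF80) == 0x0080:
        info["memory_op"] = MMM_NAMES.get((mcacod >> 4) & 0x7, "reserved")
        channel = mcacod & 0xF
        # A channel field of 0xF means "channel not specified".
        info["channel"] = None if channel == 0xF else channel
    return info


if __name__ == "__main__":
    for key, value in decode_mca_status(STATUS).items():
        print(f"{key}: {value}")
```

With the value from the log this reports a valid, corrected (not uncorrected) scrub error with a count of 1 on channel 3, which agrees with the "COR (1) MS channel 3 memory error" line. Note that the MCA "Bank" is a machine-check register bank inside the CPU, so by itself it does not name a DIMM slot.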

memory error.png

My problem now is finding the location of the second chip at "Bank 12".

Several web searches later, it seems that mcelog is a useful tool. But I cannot correlate the address in the kernel log message with the addresses in mcelog, and I can't find anything useful in the ipmitool or mcelog output.

root@freenas:~ # mcelog
Hardware event. This is not a software error.
MCE 0
CPU 20 BANK 12 TSC 4741ac06ff4a0
MISC 910800200021e8c ADDR 2bd4b358c0
TIME 1564951165 Sun Aug 4 13:39:25 2019
MCG status:
MemCtrl: Corrected patrol scrub error
STATUS 8c00004f000800c3 MCGSTATUS 0
MCGCAP 7000c16 APICID 18 SOCKETID 0
CPUID Vendor Intel Family 6 Model 63
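
As a cross-check, the kernel log and the mcelog record can be compared field by field. The sketch below is only an illustration: the two text constants are copied from the output in this post, the helper names are hypothetical, and it assumes mcelog prints APICID in hexadecimal while the kernel prints the APIC ID in decimal (which is consistent with 0x18 = 24).

```python
import re

KERNEL_LOG = """\
MCA: Bank 12, Status 0x8c00004f000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 24
MCA: CPU 20 COR (1) MS channel 3 memory error
MCA: Address 0x2bd4b358c0
MCA: Misc 0x910800200021e8c
"""

MCELOG = """\
CPU 20 BANK 12 TSC 4741ac06ff4a0
MISC 910800200021e8c ADDR 2bd4b358c0
STATUS 8c00004f000800c3 MCGSTATUS 0
MCGCAP 7000c16 APICID 18 SOCKETID 0
"""


def _find(pattern: str, text: str, base: int) -> int:
    """Return the first captured group of pattern as an integer."""
    return int(re.search(pattern, text).group(1), base)


def kernel_fields(text: str) -> dict:
    return {
        "cpu": _find(r"CPU (\d+)", text, 10),
        "bank": _find(r"Bank (\d+)", text, 10),
        "status": _find(r"Bank \d+, Status (0x[0-9a-f]+)", text, 16),
        "addr": _find(r"Address (0x[0-9a-f]+)", text, 16),
        "misc": _find(r"Misc (0x[0-9a-f]+)", text, 16),
        "apic_id": _find(r"APIC ID (\d+)", text, 10),  # decimal in the kernel log
    }


def mcelog_fields(text: str) -> dict:
    return {
        "cpu": _find(r"\bCPU (\d+)", text, 10),
        "bank": _find(r"\bBANK (\d+)", text, 10),
        "status": _find(r"\bSTATUS ([0-9a-f]+)", text, 16),
        "addr": _find(r"\bADDR ([0-9a-f]+)", text, 16),
        "misc": _find(r"\bMISC ([0-9a-f]+)", text, 16),
        "apic_id": _find(r"\bAPICID ([0-9a-f]+)", text, 16),  # assumed hexadecimal
    }


if __name__ == "__main__":
    k, m = kernel_fields(KERNEL_LOG), mcelog_fields(MCELOG)
    for key in k:
        print(f"{key}: kernel={k[key]:#x} mcelog={m[key]:#x} match={k[key] == m[key]}")
```

Under those assumptions every field matches, so both tools appear to be describing the same event, including the same physical address. Neither one translates that address into a DIMM slot name.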

root@freenas:/net # dmidecode -t memory
# dmidecode 3.1
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x002A, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4

Handle 0x002B, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 725BDB74
Asset Tag: P1-DIMMA1_AssetTag (date:15/21)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x002C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 4104DC40
Asset Tag: P1-DIMMB1_AssetTag (date:15/33)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x002D, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 41995ECB
Asset Tag: P1-DIMMC1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x002E, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958BE
Asset Tag: P1-DIMMD1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x002F, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4

Handle 0x0030, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMME1
Bank Locator: P1_Node1_Channel0_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 730233AD
Asset Tag: P2-DIMME1_AssetTag (date:15/37)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x0031, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMF1
Bank Locator: P1_Node1_Channel1_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 725BDB75
Asset Tag: P2-DIMMF1_AssetTag (date:15/21)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x0032, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMG1
Bank Locator: P1_Node1_Channel2_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958E6
Asset Tag: P2-DIMMG1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x0033, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMH1
Bank Locator: P1_Node1_Channel3_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958FA
Asset Tag: P2-DIMMH1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
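
The dmidecode output above can at least narrow the search. Here is a rough Python sketch that pairs each "Locator" with its "Bank Locator" and lists the slots on a given channel. It assumes (this is not confirmed anywhere in this thread) that the channel number in the MCA message uses the same zero-based numbering as the SMBIOS Bank Locator strings. The excerpt is trimmed from the output above, and the function names are hypothetical.

```python
import re

# Locator / Bank Locator pairs copied from the dmidecode output in this post.
DMIDECODE_EXCERPT = """\
Locator: P1-DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Locator: P1-DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Locator: P1-DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Locator: P1-DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Locator: P2-DIMME1
Bank Locator: P1_Node1_Channel0_Dimm0
Locator: P2-DIMMF1
Bank Locator: P1_Node1_Channel1_Dimm0
Locator: P2-DIMMG1
Bank Locator: P1_Node1_Channel2_Dimm0
Locator: P2-DIMMH1
Bank Locator: P1_Node1_Channel3_Dimm0
"""


def parse_slots(text: str) -> list:
    """Pair each Locator line with the Bank Locator line that follows it."""
    slots = []
    locator = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Locator:"):
            locator = line.split(":", 1)[1].strip()
        elif line.startswith("Bank Locator:") and locator is not None:
            match = re.search(r"Node(\d+)_Channel(\d+)_Dimm(\d+)", line)
            if match:
                node, channel, dimm = map(int, match.groups())
                slots.append(
                    {"locator": locator, "node": node, "channel": channel, "dimm": dimm}
                )
            locator = None
    return slots


def slots_on_channel(text: str, channel: int) -> list:
    """Return the silkscreen names of all slots on the given channel."""
    return [s["locator"] for s in parse_slots(text) if s["channel"] == channel]


if __name__ == "__main__":
    # "MS channel 3 memory error" -> candidate slots, one per socket.
    print(slots_on_channel(DMIDECODE_EXCERPT, 3))
```

For channel 3 this leaves two candidates, one per CPU socket (P1-DIMMD1 and P2-DIMMH1). The sketch deliberately does not choose between them, since the socket would still need to be confirmed by other means, for example by swapping the two modules and watching which channel or CPU the error follows.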
 

VictorR

I'm trying to run Memtest86 on the box. For some reason, the program's main splash screen does not respond when launched: I cannot scroll through the options.