VictorR, Contributor (joined Dec 9, 2015; 143 messages)
45 Drives Q30
SuperMicro X10DRL-i
2x E5-2620 v3 CPU
2x 120GB SSD Boot Drive
256GB RAM
2x LSI 9201 HBA
30x WD Re 4TB drives
3x Intel X540-T2 (X540T2BLK)
I had a recurring memory error message:

After spending days trying to find the physical location of "Bank 7" on the X10DRL-i, I got lucky when the module failed outright, revealing its location on the motherboard.


I ordered a replacement (before realizing these Samsung modules have a lifetime warranty) and installed it.
A week later, I got this (actually, I think I remember seeing it occasionally before; the last time I was at this office was 6 months ago):
freenas.local kernel log messages:
MCA: Bank 12, Status 0x8c00004f000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 24
MCA: CPU 20 COR (1) MS channel 3 memory error
MCA: Address 0x2bd4b358c0
MCA: Misc 0x910800200021e8c
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 24
MCA: CPU 20 COR (1) MS channel 3 memory error
MCA: Address 0x2bd4b358c0
MCA: Misc 0x910800200021e8c
-- End of security output --

My problem now is finding the location of the second module, the one reported at "Bank 12".
Several web searches later, it seems that mcelog is a useful tool, but I cannot correlate the address in the kernel log message with the addresses in mcelog, and I can't find anything useful with ipmitool or mcelog.
root@freenas:~ # mcelog
Hardware event. This is not a software error.
MCE 0
CPU 20 BANK 12 TSC 4741ac06ff4a0
MISC 910800200021e8c ADDR 2bd4b358c0
TIME 1564951165 Sun Aug 4 13:39:25 2019
MCG status:
MemCtrl: Corrected patrol scrub error
STATUS 8c00004f000800c3 MCGSTATUS 0
MCGCAP 7000c16 APICID 18 SOCKETID 0
CPUID Vendor Intel Family 6 Model 63
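The kernel log and the mcelog output above report the same registers (STATUS 0x8c00004f000800c3, ADDR 0x2bd4b358c0, MISC 0x910800200021e8c). A minimal Python sketch that decodes only the architecturally defined fields shows where the log's "channel 3" comes from: the low 16 bits, 0x00c3, are a memory controller compound error code for a scrubbing error on channel 3. The bit layouts are assumed from the machine-check chapter of the Intel SDM; the function names are hypothetical, and the model-specific MSCOD field is returned raw rather than interpreted.

```python
# Sketch: decode the architectural MCA fields from the logged registers.
# Layouts assumed from the Intel SDM Vol. 3B machine-check chapter;
# the model-specific MSCOD field is returned raw, not interpreted.

# MMM field of the memory controller compound error code
MEM_TXN = {0: "generic", 1: "memory read", 2: "memory write",
           3: "address/command", 4: "memory scrubbing"}
# Address mode, MCi_MISC bits 8:6
ADDR_MODE = {0: "segment offset", 1: "linear", 2: "physical",
             3: "memory address", 7: "generic"}


def bit(value: int, n: int) -> bool:
    """True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> dict:
    """Split an IA32_MCi_STATUS value into its architectural fields."""
    code = status & 0xFFFF  # bits 15:0, MCA error code
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mscod": (status >> 16) & 0xFFFF,            # bits 31:16, model specific
    }
    # Memory controller compound code: 000F 0000 1MMM CCCC (F bit ignored)
    if (code & 0xEF80) == 0x0080:
        info["type"] = MEM_TXN.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 1111 = unspecified
    return info


def decode_misc(misc: int, addr: int) -> dict:
    """Decode MCi_MISC address fields and mask ADDR to its valid bits."""
    lsb = misc & 0x3F           # bits 5:0, least significant valid address bit
    mode = (misc >> 6) & 0x7    # bits 8:6, address mode
    return {
        "addr_lsb": lsb,
        "addr_mode": ADDR_MODE.get(mode, "reserved"),
        "addr_base": addr & ~((1 << lsb) - 1),  # bits below lsb are not meaningful
    }


if __name__ == "__main__":
    print(decode_status(0x8C00004F000800C3))
    m = decode_misc(0x910800200021E8C, 0x2BD4B358C0)
    print(m["addr_lsb"], m["addr_mode"], hex(m["addr_base"]))
```

On these values it reports a valid, corrected error (count 1, matching "COR (1)" in the kernel log), type "memory scrubbing", channel 3, and a physical address valid from bit 12 upward, i.e. 0x2bd4b35000 at 4 KiB granularity. The MISC address fields are only defined when MCG_CAP bit 24 (software error recovery) is set, which it is in MCGCAP 0x7000c16. The decode says nothing about which socket's channel 3 this is, or how the BIOS maps that address onto a particular DIMM.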
root@freenas:/net # dmidecode -t memory
# dmidecode 3.1
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.
Handle 0x002A, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x002B, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 725BDB74
Asset Tag: P1-DIMMA1_AssetTag (date:15/21)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x002C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 4104DC40
Asset Tag: P1-DIMMB1_AssetTag (date:15/33)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x002D, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 41995ECB
Asset Tag: P1-DIMMC1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x002E, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002A
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P1-DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958BE
Asset Tag: P1-DIMMD1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x002F, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x0030, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMME1
Bank Locator: P1_Node1_Channel0_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 730233AD
Asset Tag: P2-DIMME1_AssetTag (date:15/37)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0031, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMF1
Bank Locator: P1_Node1_Channel1_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 725BDB75
Asset Tag: P2-DIMMF1_AssetTag (date:15/21)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0032, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMG1
Bank Locator: P1_Node1_Channel2_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958E6
Asset Tag: P2-DIMMG1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0033, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x002F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 32 GB
Form Factor: DIMM
Set: None
Locator: P2-DIMMH1
Bank Locator: P1_Node1_Channel3_Dimm0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MT/s
Manufacturer: Samsung
Serial Number: 419958FA
Asset Tag: P2-DIMMH1_AssetTag (date:15/44)
Part Number: M386A4G40DM0-CPB
Rank: 4
Configured Clock Speed: 1866 MT/s
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
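The kernel log says "channel 3". Assuming that number uses the same 0-3 numbering as the ChannelN part of dmidecode's Bank Locator strings (an assumption; nothing in this output confirms the BIOS numbers them the same way), a short Python sketch can list the slots on that channel from the dmidecode output above. The function name dimms_on_channel and the trimmed SAMPLE text are illustrative only.

```python
from __future__ import annotations

import re

# Locator / Bank Locator pairs copied from the dmidecode -t memory output above.
SAMPLE = """\
Locator: P1-DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Locator: P1-DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Locator: P1-DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Locator: P1-DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Locator: P2-DIMME1
Bank Locator: P1_Node1_Channel0_Dimm0
Locator: P2-DIMMF1
Bank Locator: P1_Node1_Channel1_Dimm0
Locator: P2-DIMMG1
Bank Locator: P1_Node1_Channel2_Dimm0
Locator: P2-DIMMH1
Bank Locator: P1_Node1_Channel3_Dimm0
"""


def dimms_on_channel(dmidecode_text: str, channel: int) -> list[tuple[str, str]]:
    """Return (Locator, Bank Locator) pairs whose Bank Locator names `channel`."""
    matches = []
    locator = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()  # real dmidecode output indents fields with a tab
        if line.startswith("Locator:"):
            locator = line.split(":", 1)[1].strip()
        elif line.startswith("Bank Locator:"):
            bank = line.split(":", 1)[1].strip()
            found = re.search(r"Channel(\d+)_", bank)
            if found and int(found.group(1)) == channel:
                matches.append((locator, bank))
    return matches


if __name__ == "__main__":
    for slot, bank in dimms_on_channel(SAMPLE, 3):
        print(slot, bank)
```

On this board that narrows the search to two slots, P1-DIMMD1 and P2-DIMMH1, one per CPU. The full text of `dmidecode -t memory` can be passed in place of SAMPLE. mcelog reports SOCKETID 0, which, if it identifies the CPU whose memory controller logged the error, would point to P1-DIMMD1; that link is not confirmed by this output.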