Hi,
My system:
CPU: Intel Pentium G4560
Motherboard: P10S-M WS
RAM: 4x8 GB Crucial DDR4 2400 MHz ECC
HDD: 1x RAIDZ1, 5x4 TB
4x WDC_WD40EZRZ
1x ST4000VN008
Since the update, I have been getting random shutdowns/reboots, each followed by an alert from mdadm reporting a failure.
Code:
[ 60.778788] md/raid1:md127: not clean -- starting background reconstruction
[ 60.785911] md/raid1:md127: active with 2 out of 2 mirrors
[ 60.791524] md127: detected capacity change from 0 to 4188160
[ 60.797437] md: resync of RAID array md127
[ 61.330877] md/raid1:md126: not clean -- starting background reconstruction
[ 61.337927] md/raid1:md126: active with 2 out of 2 mirrors
[ 61.343583] md126: detected capacity change from 0 to 4188160
[ 61.349555] md: resync of RAID array md126
[ 62.154979] Adding 2094076k swap on /dev/mapper/md127. Priority:-3 extents:1 across:2094076k FS
[ 63.350202] Adding 2094076k swap on /dev/mapper/md126. Priority:-4 extents:1 across:2094076k FS
[ 63.535512] md/raid1:md126: Disk failure on sdd1, disabling device.
md/raid1:md126: Operation continuing on 1 devices.
[ 63.535526] md: md126: resync interrupted.
[ 63.659522] md: resync of RAID array md126
[ 63.663915] md: md126: resync done.
[ 63.839877] md126: detected capacity change from 4188160 to 0
[ 63.845749] md: md126 stopped.
[ 65.607439] md/raid1:md127: Disk failure on sdb1, disabling device.
md/raid1:md127: Operation continuing on 1 devices.
[ 65.607477] md: md127: resync interrupted.
[ 65.707521] md: resync of RAID array md127
[ 65.711910] md: md127: resync done.
[ 65.861259] md127: detected capacity change from 4188160 to 0
Code:
This is an automatically generated mail message from mdadm
running on truenas

A Fail event had been detected on md device /dev/md127.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sda1[1] sdb1[0](F)
      2094080 blocks super 1.2 [2/1] [_U]
      [======>..............] resync = 32.8% (688960/2094080) finish=0.1min speed=172240K/sec

unused devices: <none>
Code:
This is an automatically generated mail message from mdadm
running on truenas

A Fail event had been detected on md device /dev/md126.

It could be related to component device /dev/sdd1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sdd1[1](F) sdc1[0]
      2094080 blocks super 1.2 [2/1] [U_]
      [===>.................] resync = 15.1% (317184/2094080) finish=0.1min speed=158592K/sec

md127 : active raid1 sda1[1] sdb1[0]
      2094080 blocks super 1.2 [2/2] [UU]
      [===>.................] resync = 17.7% (371968/2094080) finish=0.1min speed=185984K/sec

unused devices: <none>
SMART offline tests on the affected disks do not show any errors.
dmesg is also being spammed with corrected ECC hardware errors:
Code:
[10877.749344] {81}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[10877.757969] {81}[Hardware Error]: It has been corrected by h/w and requires no further action
[10877.766720] {81}[Hardware Error]: event severity: corrected
[10877.772429] {81}[Hardware Error]: Error 0, type: corrected
[10877.778157] {81}[Hardware Error]: fru_text: CorrectedErr
[10877.783705] {81}[Hardware Error]: section_type: memory error
[10877.789694] {81}[Hardware Error]: node: 1 device: 1
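To see whether the corrected errors always point at the same location, something like the following could tally them from saved dmesg output. This is only a rough sketch: the function name `tally_memory_errors` is made up for illustration, it simply counts every `[Hardware Error]` line that reports a `node`/`device` pair, and how those numbers map to a physical DIMM slot on this board is an assumption I have not verified.

```python
import re
from collections import Counter

# Matches the "node: N device: M" line that closes each error record above.
LOCATION_RE = re.compile(
    r"\[Hardware Error\]:\s*node:\s*(\d+)\s+device:\s*(\d+)"
)


def tally_memory_errors(dmesg_text):
    """Count [Hardware Error] location lines per (node, device) pair.

    Hypothetical helper: it does not check section_type, so it counts
    every record that reports a node and a device.
    """
    counts = Counter()
    for match in LOCATION_RE.finditer(dmesg_text):
        node, device = int(match.group(1)), int(match.group(2))
        counts[(node, device)] += 1
    return counts


# Small demo using the location line from the log above.
sample = "[10877.789694] {81}[Hardware Error]: node: 1 device: 1\n"
for (node, device), n in tally_memory_errors(sample).most_common():
    print(f"node {node} device {device}: {n} record(s)")
```

If nearly all records land on one node/device pair, that would at least narrow the suspicion to a single module or slot rather than the memory controller as a whole.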
Is this an indication of a failing DIMM?
If you have any idea what is happening, I would greatly appreciate it!
Thank you!