FreeNAS Build with 10GBe and Ryzen

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Platform First Error Handling (PFEH)
They literally call it that? That's Newspeak if I've ever seen it, but I'm glad it can be disabled.

Also, it's fascinating to see action on this front.
3000 "ECC Correctable Errors" (=single-bit) and about 100 of CPU errors
Are you sure these aren't uncorrectable ECC errors in disguise?

It is possible that they're not: 3*10^3 errors out of something like 1.6*10^10 cells is about 2*10^-7, so let's round up and call it a 10^-6 chance of a given cell having an error (this is a dubious statement, but bear with me). To get an uncorrectable error, you need two errors in the same 72-bit word. To further draw the ire of any statisticians reading this, let's assume you have one error and are looking for a second one in the remaining 71 bits, so the probability of not having an uncorrectable error is (1-10^-6)^71 = 0.9999290025. That figure is integrated over the hours the test ran: you need more than spatial proximity, you also need temporal proximity, so an uncorrectable error is even less likely. Sure, you have 3000 trials, but that's about an order of magnitude too few to get a decent probability of seeing that uncorrectable error.
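For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same back-of-the-envelope estimate. The inputs are the rough figures quoted above; the variable names are made up for illustration, and the independence assumption is just as dubious here as it is in the prose.

```python
# Back-of-the-envelope check of the estimate above (inputs are the post's rough figures).
observed_errors = 3_000          # correctable errors reported
cells = 1.6e10                   # rough cell count quoted in the post
word_bits = 72                   # 64 data bits + 8 ECC bits per word

p_raw = observed_errors / cells  # raw ratio, roughly 1.9e-7 per cell
p_cell = 1e-6                    # the post's rounded-up per-cell error chance

# Given one flipped bit, chance that none of the other 71 bits in the word flipped.
p_no_second = (1 - p_cell) ** (word_bits - 1)

# Chance that at least one of the 3000 single-bit errors had a partner in its word,
# ignoring the temporal-proximity requirement (so this overestimates).
p_any_uncorrectable = 1 - p_no_second ** observed_errors

print(f"raw per-cell rate: {p_raw:.3e}")
print(f"P(no second error in word): {p_no_second:.10f}")
print(f"P(at least one double error in {observed_errors} trials): {p_any_uncorrectable:.3f}")
```

With these inputs the last figure comes out at roughly 0.19, and that is before accounting for the two flips having to coincide in time.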
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
202
PFEH setting in BIOS
1588917161830.png


Screenshot taken after 1m33sec with memory overclocked / undervolted
1588916811096.png


Screenshot taken almost 2 h after ending the run with memory overclocked / undervolted:
1588916848800.png


And screenshot in Linux with memory overclocked / undervolted (during memtester run in the background):
1588916886090.png


And last night, I also managed to replicate how Asrock Rack / AMD proved it (so a third method!) by injecting errors in Linux. I'll post more details on that later...
 

b3081a

Cadet
Joined
Apr 15, 2020
Messages
5
And screenshot in Linux with memory overclocked / undervolted
Just curious, what would this look like in FreeBSD/FreeNAS on AMD systems at the moment? Is it simply ignored, showing nothing, or can we at least get some sort of logs from it?
 

Mastakilla

That is something I was wondering myself as well, but didn't get to yet.

FreeNAS doesn't have a tool to stress memory from within the OS (as far as I know). They advise running Memtest86 (outside the OS, that is). For my overclocking method I need something like that.
As far as I could find, you can't install additional tools in FreeNAS (like the FreeBSD version of "memtester") except in a jail, and to create a jail you need to create a pool. I currently have two installations: a FreeNAS installation with my HDD pool that I'm preparing for "production", on which I certainly won't do any overclocking / ECC testing, and a pool-less test FreeNAS installation on a USB stick for things like ECC testing. So I'm a bit stuck there...

I know Diversity did some TrueNAS testing already where errors were shown in the OS though... I suppose FreeNAS should be similar?
 

Ericloewe

It should be pretty similar. If it's not working, it's probably not too difficult to fix, given that it works on Linux.
 

Mastakilla

3rd working method
This is how Asrock Rack and AMD tested it:


The BIOS
Everything default except
  • "Platform First Error Handling" was changed from the default "Enabled" to "Disabled"
    1588981853166.png
  • "Disable Memory Error Injection", strangely enough, was (accidentally) left at the default "True". I think Asrock Rack fell for their own double-negation confusion ;) I haven't tried it yet with it set to "False". I also haven't retried Memtest86 error injection with these settings...
    1588981838792.png

The OS
I used a fresh install of "Fedora-Server-dvd-x86_64-32-1.6.iso" for this. I might have selected a few additional package groups during the install, not sure if it will make a difference to the below instructions.
Code:
[root@localhost mce-inject-master]# cat /etc/fedora-release
Fedora release 32 (Thirty Two)
[root@localhost ~]# uname -r
5.6.8-300.fc32.x86_64


Installing / configuring additional packages / tools
edac-utils
Code:
[root@localhost ~]# yum install edac-utils
Fedora 32 openh264 (From Cisco) - x86_64                                                4.8 kB/s | 5.1 kB     00:01
Fedora Modular 32 - x86_64                                                              2.2 MB/s | 4.9 MB     00:02
Fedora Modular 32 - x86_64 - Updates                                                    881 kB/s | 1.4 MB     00:01
Fedora 32 - x86_64 - Updates                                                            4.1 MB/s | 7.8 MB     00:01
Fedora 32 - x86_64                                                                      4.3 MB/s |  70 MB     00:16
Dependencies resolved.
========================================================================================================================
 Package                      Architecture             Version                           Repository                Size
========================================================================================================================
Installing:
edac-utils                   x86_64                   0.16-22.fc32                      fedora                    49 k
Installing dependencies:
sysfsutils                   x86_64                   2.1.0-28.fc32                     fedora                    44 k

Transaction Summary
========================================================================================================================
Install  2 Packages

Total download size: 93 k
Installed size: 238 k
Is this ok [y/N]: y
Downloading Packages:
(1/2): edac-utils-0.16-22.fc32.x86_64.rpm                                               406 kB/s |  49 kB     00:00
(2/2): sysfsutils-2.1.0-28.fc32.x86_64.rpm                                              337 kB/s |  44 kB     00:00
------------------------------------------------------------------------------------------------------------------------
Total                                                                                   115 kB/s |  93 kB     00:00
warning: /var/cache/dnf/fedora-558931b5e76b51a7/packages/edac-utils-0.16-22.fc32.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 12c944d0: NOKEY
Fedora 32 - x86_64                                                                      1.6 MB/s | 1.6 kB     00:00
Importing GPG key 0x12C944D0:
Userid     : "Fedora (32) <fedora-32-primary@fedoraproject.org>"
Fingerprint: 97A1 AE57 C3A2 372C CA3A 4ABA 6C13 026D 12C9 44D0
From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-32-x86_64
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                1/1
  Installing       : sysfsutils-2.1.0-28.fc32.x86_64                                                                1/2
  Installing       : edac-utils-0.16-22.fc32.x86_64                                                                 2/2
  Running scriptlet: edac-utils-0.16-22.fc32.x86_64                                                                 2/2
  Verifying        : edac-utils-0.16-22.fc32.x86_64                                                                 1/2
  Verifying        : sysfsutils-2.1.0-28.fc32.x86_64                                                                2/2

Installed:
  edac-utils-0.16-22.fc32.x86_64                             sysfsutils-2.1.0-28.fc32.x86_64

Complete!
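
Besides the edac-utils tools, the kernel's EDAC driver also exposes its counters directly in sysfs. Below is a minimal Python sketch for reading them. It assumes the usual Linux EDAC layout (/sys/devices/system/edac/mc/mcN/ce_count and ue_count) and that an EDAC driver is loaded for the memory controller; the function name is made up for illustration and none of this is taken from the thread.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (correctable, uncorrectable)} from EDAC sysfs counters.

    Assumes the standard layout: <root>/mcN/ce_count and <root>/mcN/ue_count.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # EDAC not available (driver not loaded, or not Linux)
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: correctable={ce} uncorrectable={ue}")
```

Running it before and after a stress or injection run and comparing the counters is one way to see whether errors are actually reaching the OS.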

Bison
Code:
[root@localhost mce-inject-master]# yum install bison
Last metadata expiration check: 0:06:26 ago on Fri 08 May 2020 12:45:14 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package                                               Architecture                                           Version                                                           Repository                                              Size
=============================================================================================================================================================================================================================================
Installing:
bison                                                 x86_64                                                 3.5-2.fc32                                                        fedora                                                 818 k
Installing dependencies:
m4                                                    x86_64                                                 1.4.18-12.fc32                                                    fedora                                                 218 k

Transaction Summary
=============================================================================================================================================================================================================================================
Install  2 Packages

Total download size: 1.0 M
Installed size: 3.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): m4-1.4.18-12.fc32.x86_64.rpm                                                                                                                                                                          946 kB/s | 218 kB     00:00
(2/2): bison-3.5-2.fc32.x86_64.rpm                                                                                                                                                                           2.0 MB/s | 818 kB     00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                        1.2 MB/s | 1.0 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                     1/1
  Installing       : m4-1.4.18-12.fc32.x86_64                                                                                                                                                                                            1/2
  Installing       : bison-3.5-2.fc32.x86_64                                                                                                                                                                                             2/2
  Running scriptlet: bison-3.5-2.fc32.x86_64                                                                                                                                                                                             2/2
  Verifying        : bison-3.5-2.fc32.x86_64                                                                                                                                                                                             1/2
  Verifying        : m4-1.4.18-12.fc32.x86_64                                                                                                                                                                                            2/2

Installed:
  bison-3.5-2.fc32.x86_64                                                                                              m4-1.4.18-12.fc32.x86_64

Complete!

Flex
Code:
[root@localhost mce-inject-master]# yum install flex
Last metadata expiration check: 0:06:41 ago on Fri 08 May 2020 12:45:14 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package                                               Architecture                                            Version                                                         Repository                                               Size
=============================================================================================================================================================================================================================================
Installing:
flex                                                  x86_64                                                  2.6.4-4.fc32                                                    fedora                                                  318 k

Transaction Summary
=============================================================================================================================================================================================================================================
Install  1 Package

Total download size: 318 k
Installed size: 927 k
Is this ok [y/N]: y
Downloading Packages:
flex-2.6.4-4.fc32.x86_64.rpm                                                                                                                                                                                 1.5 MB/s | 318 kB     00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                        482 kB/s | 318 kB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                     1/1
  Installing       : flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1
  Running scriptlet: flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1
  Verifying        : flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1

Installed:
  flex-2.6.4-4.fc32.x86_64

Complete!

Rasdaemon
Code:
[root@localhost mce-inject-master]# yum install rasdaemon
Last metadata expiration check: 0:02:51 ago on Fri 08 May 2020 12:52:01 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package                                                       Architecture                                       Version                                                           Repository                                          Size
=============================================================================================================================================================================================================================================
Installing:
rasdaemon                                                     x86_64                                             0.6.4-1.fc32                                                      fedora                                             117 k
Installing dependencies:
perl-DBD-SQLite                                               x86_64                                             1.64-4.fc32                                                       fedora                                             196 k
perl-DBI                                                      x86_64                                             1.643-2.fc32                                                      fedora                                             707 k
perl-Math-BigInt                                              noarch                                             1:1.9998.18-2.fc32                                                fedora                                             190 k
perl-Math-Complex                                             noarch                                             1.59-452.fc32                                                     fedora                                              56 k

Transaction Summary
=============================================================================================================================================================================================================================================
Install  5 Packages

Total download size: 1.2 M
Installed size: 3.5 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): perl-Math-BigInt-1.9998.18-2.fc32.noarch.rpm                                                                                                                                                          497 kB/s | 190 kB     00:00
(2/5): perl-DBD-SQLite-1.64-4.fc32.x86_64.rpm                                                                                                                                                                498 kB/s | 196 kB     00:00
(3/5): perl-Math-Complex-1.59-452.fc32.noarch.rpm                                                                                                                                                            1.3 MB/s |  56 kB     00:00
(4/5): rasdaemon-0.6.4-1.fc32.x86_64.rpm                                                                                                                                                                     1.1 MB/s | 117 kB     00:00
(5/5): perl-DBI-1.643-2.fc32.x86_64.rpm                                                                                                                                                                      1.1 MB/s | 707 kB     00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                        1.1 MB/s | 1.2 MB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                     1/1
  Installing       : perl-Math-Complex-1.59-452.fc32.noarch                                                                                                                                                                              1/5
  Installing       : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch                                                                                                                                                                          2/5
  Installing       : perl-DBI-1.643-2.fc32.x86_64                                                                                                                                                                                        3/5
  Installing       : perl-DBD-SQLite-1.64-4.fc32.x86_64                                                                                                                                                                                  4/5
  Installing       : rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5
  Running scriptlet: rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5
  Verifying        : perl-DBD-SQLite-1.64-4.fc32.x86_64                                                                                                                                                                                  1/5
  Verifying        : perl-DBI-1.643-2.fc32.x86_64                                                                                                                                                                                        2/5
  Verifying        : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch                                                                                                                                                                          3/5
  Verifying        : perl-Math-Complex-1.59-452.fc32.noarch                                                                                                                                                                              4/5
  Verifying        : rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5

Installed:
  perl-DBD-SQLite-1.64-4.fc32.x86_64             perl-DBI-1.643-2.fc32.x86_64             perl-Math-BigInt-1:1.9998.18-2.fc32.noarch             perl-Math-Complex-1.59-452.fc32.noarch             rasdaemon-0.6.4-1.fc32.x86_64

Complete!

[root@localhost machinecheck0]# rasdaemon -e
rasdaemon: ras:mc_event event enabled
rasdaemon: ras:aer_event event enabled
rasdaemon: mce:mce_record event enabled
rasdaemon: Can't write to set_event
rasdaemon: devlink:devlink_health_report event enabled
rasdaemon: block:block_rq_complete event enabled
[root@localhost machinecheck0]# systemctl start rasdaemon
[root@localhost machinecheck0]# systemctl enable rasdaemon
Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.
[root@localhost machinecheck0]# systemctl status rasdaemon.service
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/usr/lib/systemd/system/rasdaemon.service; enabled; vendor preset: disabled)
     Active: active (running) since Fri 2020-05-08 00:57:46 CEST; 23s ago
   Main PID: 33914 (rasdaemon)
      Tasks: 1 (limit: 38389)
     Memory: 7.1M
        CPU: 10ms
     CGroup: /system.slice/rasdaemon.service
             └─33914 /usr/sbin/rasdaemon -f -r

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: diskerror_eventstore: 0x564510eb9918
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: register inserted at db
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1360) ras:mc_event with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1357) ras:aer_event with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (114) mce:mce_record with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1441) net:net_dev_xmit_timeout with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1449) devlink:devlink_health_report with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1154) block:block_rq_complete with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: Calling ras_mc_event_opendb()
May 08 00:57:46 localhost.localdomain rasdaemon[33914]:            <...>-36    [005]     0.000095: block_rq_complete:    2020-05-08 00:57:45 +0200
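
Once rasdaemon is running, recorded memory-controller events can be summarized. The Python sketch below rests on several assumptions not confirmed in this thread: that the Fedora build of rasdaemon stores events in an SQLite database at /var/lib/rasdaemon/ras-mc_event.db, and that the table is named mc_event with err_type and err_count columns. Check your own system before relying on it; the bundled ras-mc-ctl tool (for example "ras-mc-ctl --errors") is the supported way to view the same data.

```python
import sqlite3


def summarize_mc_events(db_path="/var/lib/rasdaemon/ras-mc_event.db"):
    """Sum err_count per err_type from rasdaemon's mc_event table.

    The database path, table name and column names are assumptions about
    rasdaemon's SQLite schema; adjust them if your build differs.
    """
    # Open read-only so a wrong path does not create an empty database file.
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = con.execute(
            "SELECT err_type, SUM(err_count) FROM mc_event GROUP BY err_type"
        ).fetchall()
    finally:
        con.close()
    return dict(rows)
```

Reading the default path normally requires root. The function returns a dictionary mapping each error type string to its total count.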
 

Mastakilla

Development Tools (for make)
Code:
[root@localhost mce-inject-master]# yum groupinstall "Development Tools"
Last metadata expiration check: 0:07:50 ago on Fri 08 May 2020 12:52:01 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
 Package                                                                 Architecture                              Version                                                                  Repository                                  Size
=============================================================================================================================================================================================================================================
Installing group/module packages:
 diffstat                                                                x86_64                                    1.63-2.fc32                                                              fedora                                      43 k
...
 xorg-x11-server-utils                                                   x86_64                                    7.7-34.fc32                                                              fedora                                     188 k
Installing weak dependencies:
 kernel-devel                                                            x86_64                                    5.6.8-300.fc32                                                           updates                                     13 M
Installing Groups:
 Development Tools

Transaction Summary
=============================================================================================================================================================================================================================================
Install  79 Packages

Total download size: 124 M
Installed size: 448 M
Is this ok [y/N]: y
Downloading Packages:
(1/79): git-2.26.2-1.fc32.x86_64.rpm                                                                                                                                                                         787 kB/s | 126 kB     00:00
...
(79/79): xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch.rpm                                                                                                                                              2.6 MB/s | 1.0 MB     00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                        6.9 MB/s | 124 MB     00:18
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                     1/1
  Installing       : urw-base35-fonts-common-20170801-14.fc32.noarch                                                                                                                                                                    1/79
...
  Running scriptlet: diffstat-1.63-2.fc32.x86_64                                                                                                                                                                                       79/79
  Verifying        : cpp-10.0.1-0.14.fc32.x86_64                                                                                                                                                                                        1/79
...
  Verifying        : xorg-x11-server-utils-7.7-34.fc32.x86_64                                                                                                                                                                          79/79

Installed:
  adobe-mappings-cmap-20171205-7.fc32.noarch                    adobe-mappings-cmap-deprecated-20171205-7.fc32.noarch  adobe-mappings-pdf-20180407-5.fc32.noarch                   binutils-2.34-2.fc32.x86_64
  binutils-gold-2.34-2.fc32.x86_64                              boost-filesystem-1.69.0-15.fc32.x86_64                 boost-system-1.69.0-15.fc32.x86_64                          boost-thread-1.69.0-15.fc32.x86_64
  cpp-10.0.1-0.14.fc32.x86_64                                   diffstat-1.63-2.fc32.x86_64                            doxygen-1:1.8.17-2.fc32.x86_64                              dyninst-10.1.0-5.fc32.x86_64
  gcc-10.0.1-0.14.fc32.x86_64                                   gd-2.3.0-1.fc32.x86_64                                 git-2.26.2-1.fc32.x86_64                                    git-core-2.26.2-1.fc32.x86_64
  git-core-doc-2.26.2-1.fc32.noarch                             glibc-devel-2.31-2.fc32.x86_64                         glibc-headers-2.31-2.fc32.x86_64                            google-droid-sans-fonts-20200215-3.fc32.noarch
  graphviz-2.42.4-1.fc32.x86_64                                 gtk2-2.24.32-7.fc32.x86_64                             gts-0.7.6-37.20121130.fc32.x86_64                           guile22-2.2.6-4.fc32.x86_64
  isl-0.16.1-10.fc32.x86_64                                     jbig2dec-libs-0.17-4.fc32.x86_64                       kernel-devel-5.6.8-300.fc32.x86_64                          kernel-headers-5.6.7-300.fc32.x86_64
  lasi-1.1.3-2.fc32.x86_64                                      libXaw-1.0.13-14.fc32.x86_64                           libXmu-1.1.3-3.fc32.x86_64                                  libXpm-3.5.13-2.fc32.x86_64
  libXt-1.2.0-1.fc32.x86_64                                     libfontenc-1.1.3-12.fc32.x86_64                        libgs-9.52-1.fc32.x86_64                                    libidn-1.35-7.fc32.x86_64
  libijs-0.35-11.fc32.x86_64                                    libimagequant-2.12.6-2.fc32.x86_64                     libmcpp-2.7.2-25.fc32.x86_64                                libmpc-1.1.0-8.fc32.x86_64
  libpaper-1.1.24-26.fc32.x86_64                                libraqm-0.7.0-5.fc32.x86_64                            librsvg2-2.48.4-1.fc32.x86_64                               libserf-1.3.9-15.fc32.x86_64
  libwebp-1.1.0-2.fc32.x86_64                                   libxcrypt-devel-4.4.16-3.fc32.x86_64                   make-1:4.2.1-16.fc32.x86_64                                 mcpp-2.7.2-25.fc32.x86_64
  netpbm-10.90.00-1.fc32.x86_64                                 openjpeg2-2.3.1-6.fc32.x86_64                          patch-2.7.6-12.fc32.x86_64                                  patchutils-0.3.4-15.fc32.x86_64
  perl-Error-1:0.17029-1.fc32.noarch                            perl-Git-2.26.2-1.fc32.noarch                          perl-TermReadKey-2.38-6.fc32.x86_64                         subversion-1.12.2-7.fc32.x86_64
  subversion-libs-1.12.2-7.fc32.x86_64                          systemtap-4.3-0.20200211git91ffb97ad335.fc32.x86_64    systemtap-client-4.3-0.20200211git91ffb97ad335.fc32.x86_64  systemtap-devel-4.3-0.20200211git91ffb97ad335.fc32.x86_64
  systemtap-runtime-4.3-0.20200211git91ffb97ad335.fc32.x86_64   tbb-2020.2-1.fc32.x86_64                               urw-base35-bookman-fonts-20170801-14.fc32.noarch            urw-base35-c059-fonts-20170801-14.fc32.noarch
  urw-base35-d050000l-fonts-20170801-14.fc32.noarch             urw-base35-fonts-20170801-14.fc32.noarch               urw-base35-fonts-common-20170801-14.fc32.noarch             urw-base35-gothic-fonts-20170801-14.fc32.noarch
  urw-base35-nimbus-mono-ps-fonts-20170801-14.fc32.noarch       urw-base35-nimbus-roman-fonts-20170801-14.fc32.noarch  urw-base35-nimbus-sans-fonts-20170801-14.fc32.noarch        urw-base35-p052-fonts-20170801-14.fc32.noarch
  urw-base35-standard-symbols-ps-fonts-20170801-14.fc32.noarch  urw-base35-z003-fonts-20170801-14.fc32.noarch          utf8proc-2.4.0-3.fc32.x86_64                                xapian-core-libs-1.4.14-1.fc32.x86_64
  xorg-x11-font-utils-1:7.5-44.fc32.x86_64                      xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch     xorg-x11-server-utils-7.7-34.fc32.x86_64

Complete!

mce-inject
Code:
[root@localhost ~]# wget https://github.com/andikleen/mce-inject/archive/master.zip
--2020-05-08 00:49:09--  https://github.com/andikleen/mce-inject/archive/master.zip
Resolving github.com (github.com)... 140.82.118.3
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/andikleen/mce-inject/zip/master [following]
--2020-05-08 00:49:09--  https://codeload.github.com/andikleen/mce-inject/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.9
Connecting to codeload.github.com (codeload.github.com)|140.82.114.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip                        [ <=>                                              ]  13.21K  --.-KB/s    in 0.09s

2020-05-08 00:49:10 (139 KB/s) - ‘master.zip’ saved [13530]

[root@localhost ~]# unzip master.zip
Archive:  master.zip
4cbe46321b4a81365ff3aafafe63967264dbfec5
   creating: mce-inject-master/
  inflating: mce-inject-master/Makefile
  inflating: mce-inject-master/README
  inflating: mce-inject-master/inject.h
  inflating: mce-inject-master/mce-inject.8
  inflating: mce-inject-master/mce-inject.c
  inflating: mce-inject-master/mce.h
  inflating: mce-inject-master/mce.lex
  inflating: mce-inject-master/mce.y
  inflating: mce-inject-master/parser.h
   creating: mce-inject-master/test/
  inflating: mce-inject-master/test/corrected
  inflating: mce-inject-master/test/fatal
  inflating: mce-inject-master/test/uncorrected
  inflating: mce-inject-master/util.c
  inflating: mce-inject-master/util.h
[root@localhost ~]# cd mce-inject-master/
[root@localhost mce-inject-master]# ls -la
total 48
drwxr-xr-x. 3 root root  189 Jan 19  2013 .
drwxr-xr-x. 3 root root   49 May  8 00:49 ..
-rw-r--r--. 1 root root  193 Jan 19  2013 inject.h
-rw-r--r--. 1 root root  904 Jan 19  2013 Makefile
-rw-r--r--. 1 root root 3863 Jan 19  2013 mce.h
-rw-r--r--. 1 root root 3793 Jan 19  2013 mce-inject.8
-rw-r--r--. 1 root root 6506 Jan 19  2013 mce-inject.c
-rw-r--r--. 1 root root 3487 Jan 19  2013 mce.lex
-rw-r--r--. 1 root root 3822 Jan 19  2013 mce.y
-rw-r--r--. 1 root root  385 Jan 19  2013 parser.h
-rw-r--r--. 1 root root 1460 Jan 19  2013 README
drwxr-xr-x. 2 root root   55 Jan 19  2013 test
-rw-r--r--. 1 root root  364 Jan 19  2013 util.c
-rw-r--r--. 1 root root  290 Jan 19  2013 util.h

[root@localhost mce-inject-master]# make
bison -d mce.y
flex mce.lex
cc -MM -DDEPS_RUN -I. mce-inject.c util.c mce.tab.c lex.yy.c > .depend.X && \
        mv .depend.X .depend
cc -Os -g -Wall   -c -o mce-inject.o mce-inject.c
cc -Os -g -Wall   -c -o mce.tab.o mce.tab.c
cc -Os -g -Wall   -c -o lex.yy.o lex.yy.c
cc -Os -g -Wall   -c -o util.o util.c
cc -pthread  mce-inject.o mce.tab.o lex.yy.o util.o   -o mce-inject
[root@localhost mce-inject-master]# ls -la
total 400
drwxr-xr-x. 3 root root  4096 May  8 01:01 .
drwxr-xr-x. 3 root root    49 May  8 00:49 ..
-rw-r--r--. 1 root root    45 May  8 00:54 correct
-rw-r--r--. 1 root root   185 May  8 01:01 .depend
-rw-r--r--. 1 root root   193 Jan 19  2013 inject.h
-rw-r--r--. 1 root root 47534 May  8 01:01 lex.yy.c
-rw-r--r--. 1 root root 73320 May  8 01:01 lex.yy.o
-rw-r--r--. 1 root root   904 Jan 19  2013 Makefile
-rw-r--r--. 1 root root  3863 Jan 19  2013 mce.h
-rwxr-xr-x. 1 root root 84584 May  8 01:01 mce-inject
-rw-r--r--. 1 root root  3793 Jan 19  2013 mce-inject.8
-rw-r--r--. 1 root root  6506 Jan 19  2013 mce-inject.c
-rw-r--r--. 1 root root 38960 May  8 01:01 mce-inject.o
-rw-r--r--. 1 root root  3487 Jan 19  2013 mce.lex
-rw-r--r--. 1 root root 56619 May  8 01:01 mce.tab.c
-rw-r--r--. 1 root root  2922 May  8 01:01 mce.tab.h
-rw-r--r--. 1 root root 25552 May  8 01:01 mce.tab.o
-rw-r--r--. 1 root root  3822 Jan 19  2013 mce.y
-rw-r--r--. 1 root root   385 Jan 19  2013 parser.h
-rw-r--r--. 1 root root  1460 Jan 19  2013 README
drwxr-xr-x. 2 root root    55 Jan 19  2013 test
-rw-r--r--. 1 root root   364 Jan 19  2013 util.c
-rw-r--r--. 1 root root   290 Jan 19  2013 util.h
-rw-r--r--. 1 root root  8128 May  8 01:01 util.o
[root@localhost mce-inject-master]# modprobe mce_inject
[root@localhost mce-inject-master]# vi correct
[root@localhost mce-inject-master]# cat correct
CPU 1 BANK 2
STATUS corrected
RIP 0x12341234
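
For anyone repeating this, the input file can also be written without opening vi. This is only a minimal sketch that assumes the same three-line format as the file above; the function name and its arguments are made up for illustration and are not part of mce-inject.

```shell
# write_mce_corrected FILE CPU BANK
# Writes an mce-inject input file describing one corrected error, using the
# same three-line format as the 'correct' file shown above.
# (Hypothetical helper, not part of mce-inject itself.)
write_mce_corrected() {
    file=$1
    cpu=$2
    bank=$3
    cat > "$file" <<EOF
CPU $cpu BANK $bank
STATUS corrected
RIP 0x12341234
EOF
}

# Example: recreate the file used in this post.
# write_mce_corrected correct 1 2
```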

Prevent the machine from crashing
Code:
[root@localhost mce-inject-master]# cd /sys/devices/system/machinecheck/machinecheck0
[root@localhost machinecheck0]# cat tolerant
1
[root@localhost machinecheck0]# vi tolerant
[root@localhost machinecheck0]# cat tolerant
3
[root@localhost machinecheck0]# 
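
The same change can be scripted so that the old value is easy to put back afterwards. A rough sketch, assuming the sysfs path shown above; the helper name is hypothetical, and since the tolerant file does not exist on every kernel, it checks for the file first.

```shell
# set_mce_tolerant LEVEL [DIR]
# Writes LEVEL to the 'tolerant' file and prints the previous value so it can
# be restored later. DIR defaults to the sysfs directory used above.
# Returns 1 if the file is missing (the knob does not exist on every kernel).
# (Hypothetical helper for illustration only.)
set_mce_tolerant() {
    level=$1
    dir=${2:-/sys/devices/system/machinecheck/machinecheck0}
    if [ ! -e "$dir/tolerant" ]; then
        echo "no tolerant file in $dir" >&2
        return 1
    fi
    old=$(cat "$dir/tolerant")
    echo "$level" > "$dir/tolerant"
    echo "$old"
}

# Example (as root):
#   prev=$(set_mce_tolerant 3)
#   ./mce-inject correct
#   set_mce_tolerant "$prev" > /dev/null
```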

Check edac status
Code:
[root@localhost ~]# ls /sys/devices/system/edac/mc
mc0  power  subsystem  uevent
[root@localhost ~]# find /lib/modules/$(uname -r) -name '*edac*'
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/amd64_edac_mod.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/e752x_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/edac_mce_amd.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i10nm_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3000_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3200_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5000_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5100_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5400_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7300_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7core_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i82975x_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/ie31200_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/pnd2_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/sb_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/skx_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/x38_edac.ko.xz
[root@localhost ~]# edac-util -rfull
mc0:csrow2:mc#0csrow#2channel#0:CE:0
mc0:csrow2:mc#0csrow#2channel#1:CE:0
mc0:csrow3:mc#0csrow#3channel#0:CE:0
mc0:csrow3:mc#0csrow#3channel#1:CE:0
mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0
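
To compare the counters before and after a test run, the per-channel lines can be added up. A small sketch, assuming every line ends in ":CE:count" or ":UE:count" as in the output above; the function name is my own.

```shell
# edac_totals: reads `edac-util -rfull` output on stdin and prints the summed
# corrected (CE) and uncorrected (UE) counts.
# Assumes each line ends in ":<CE|UE>:<count>", as in the output above.
edac_totals() {
    awk -F: '
        NF >= 2 && $(NF-1) == "CE" { ce += $NF }
        NF >= 2 && $(NF-1) == "UE" { ue += $NF }
        END { printf "CE=%d UE=%d\n", ce, ue }
    '
}

# Example: edac-util -rfull | edac_totals
```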

Inject the error and observe the result
Code:
[root@localhost mce-inject-master]# modprobe mce_inject
[root@localhost mce-inject-master]# ./mce-inject correct
[root@localhost mce-inject-master]#
Message from syslogd@localhost at May  8 01:02:10 ...
 kernel:[Hardware Error]: Corrected error, no action required.

Message from syslogd@localhost at May  8 01:02:10 ...
 kernel:[Hardware Error]: CPU:1 (17:71:0) MC2_STATUS[-|CE|-|-|-|-|-|-|-|-]: 0x9000000000000000

Message from syslogd@localhost at May  8 01:02:10 ...
 kernel:[Hardware Error]: IPID: 0x0000000000000000

Message from syslogd@localhost at May  8 01:02:10 ...
 kernel:[Hardware Error]: L2 Cache Ext. Error Code: 0, L2M Tag Multiple-Way-Hit error.

Message from syslogd@localhost at May  8 01:02:10 ...
 kernel:[Hardware Error]: cache level: RESV, tx: INSN

[root@localhost mce-inject-master]# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No devlink errors.
Disk errors summary:
        0:0 has 1 errors
MCE records summary:
        1 Corrected error, no action required. errors
[root@localhost mce-inject-master]#
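
The MCx_STATUS value in these messages is a bit field, and the kernel already decodes it into the bracketed flags. For checking a raw value by hand, here is a rough sketch. The bit positions are my reading of the Linux kernel's definitions for AMD systems, so treat them as an assumption, and the function name is made up.

```shell
# mca_status_flags HEX
# Prints the names of the flag bits set in a 64-bit MCA status value.
# Bit positions are assumed from the Linux kernel's AMD definitions.
# All flags listed here sit in the upper 32 bits, so only that half is
# converted; this avoids 64-bit constants that some shells cannot parse.
mca_status_flags() {
    hex=${1#0x}
    hex=$(printf '%16s' "$hex" | tr ' ' '0')    # left-pad to 16 hex digits
    hi_hex=$(printf '%s' "$hex" | cut -c1-8)    # bits 63..32
    hi=$((0x$hi_hex))
    out=""
    for pair in 63:Val 62:Over 61:UC 60:En 59:MiscV 58:AddrV 57:PCC \
                53:SyndV 46:CECC 45:UECC 44:Deferred 43:Poison; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( (hi >> (bit - 32)) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$name"
        fi
    done
    echo "$out"
}

# Example with the injected error above:
#   mca_status_flags 0x9000000000000000    # prints: Val En
```

UC is not set for the injected value, which matches the "CE" shown by the kernel.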
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Finally some real progress on this matter! Thanks to Diversity for getting my hopes up again, cause I almost gave up on this...

Always here to help! Getting ready to start poking at memory banks again when my watchmaker's magnifying glasses arrive :)
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
202
Hi everyone,

It's been a while, but I have another update...

I'll start with a summary and then post all the "evidence". This all concerns the platform ASRock Rack X470D4U2-2T + AMD Ryzen 3x00 (Zen 2 cores) + ECC memory (see my signature for more specifics), using the latest stable BIOS. I am using the "overclock the memory until it is barely stable" method, as described in earlier posts.
  1. Memory error injection on Linux using mce-inject, as described a few posts earlier, does not inject memory errors at the platform level, only at the OS level. It is therefore not suitable for testing whether the IPMI / BMC properly handles memory error detection. We discovered this because the "Platform First Error Handling" toggle in the BIOS has no effect on this method.
  2. ECC correction works!
    • Already confirmed / proven earlier in this thread.
    • When using default BIOS settings.
  3. (Corrected) single-bit ECC memory error detection by "the OS" works (if correctly implemented)!
    • Already confirmed / proven earlier in this thread.
    • But only when setting "Platform First Error Handling" to disabled in the BIOS.
    • Works on, for example:
      • Memtest86 v8.4 or higher
      • Linux kernel 5.6 or higher
      • TrueNAS 12.0 beta 1 (not on FreeNAS 11.3)
  4. (Uncorrected) multi-bit ECC memory error detection by "the OS" works (if correctly implemented)!
    • This is a new discovery.
    • But only when setting "Platform First Error Handling" to disabled in the BIOS.
    • Works on, for example:
      • Memtest86 (unreleased version - fixes will be included in next release)
      • Linux kernel 5.7 (probably also on 5.6, but I didn't try it)
      • Not sure about TrueNAS 12.0 beta 1. I haven't been able to trigger or recognize it yet.
  5. IPMI / BMC is unable to detect any kind of memory error
    • Confirmed once more.
    • Even when setting "Platform First Error Handling" to enabled in the BIOS.
    • Asrock Rack is (hopefully) still working on getting this fixed?
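
As a quick pre-check for point 3 on Linux, something like the sketch below can confirm the kernel is at least 5.6 before looking for EDAC counters. The helper name is made up, and it only compares the major and minor numbers of the release string.

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds if RELEASE (default: output of `uname -r`) is MAJOR.MINOR or newer.
# Only the first two numeric components are compared.
# (Hypothetical helper for illustration only.)
kernel_at_least() {
    want_major=$1
    want_minor=$2
    rel=${3:-$(uname -r)}
    major=${rel%%.*}            # text before the first dot
    rest=${rel#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example:
#   kernel_at_least 5 6 && [ -d /sys/devices/system/edac/mc/mc0 ] &&
#       echo "kernel is new enough and an EDAC memory controller is registered"
```

Keep in mind that "Platform First Error Handling" still has to be disabled in the BIOS; this check cannot see that setting.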
(Uncorrected) multi-bit ECC memory error detection by "the OS"
Memtest86
After I notified Passmark that Linux is able to detect (uncorrected) multi-bit ECC memory errors and Memtest86 v8.4 isn't, they asked me to send the log files. They then provided me with a new version (still unreleased so far) that fixes the issue and properly detects (uncorrected) multi-bit ECC memory errors!
Sorry, forgot to take a screenshot of this one. I do still have the log file. Here is the summary of the report:

Test Start Time: 2020-05-25 08:19:11
Elapsed Time: 1:47:46
Memory Range Tested: 0x0 - 80F380000 (33011MB)
CPU Selection Mode: Parallel (All CPUs)
ECC Polling: Enabled
# Tests Passed: 7/19 (36%)
Lowest Error Address: 0x489128C48 (18577MB)
Highest Error Address: 0x73D8367A0 (29656MB)
Bits in Error Mask: 00000000FDDFFFFF
Bits in Error: 30
Max Contiguous Errors: 2
ECC Correctable Errors: 2689
ECC Uncorrectable Errors: 244
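
As a sanity check on the report, the "Bits in Error" value looks like it is simply the number of set bits in the error mask. That is my assumption about how Memtest86 derives it, not something Passmark confirmed. A small sketch to count them, with a made-up function name:

```shell
# popcount_hex HEX
# Prints the number of 1 bits in a hexadecimal string (optional 0x prefix).
# Works digit by digit, so the length of the value does not matter.
popcount_hex() {
    digits=${1#0x}
    count=0
    while [ -n "$digits" ]; do
        rest=${digits#?}
        d=${digits%"$rest"}     # first remaining character
        case $d in
            0) ;;
            1|2|4|8) count=$((count + 1)) ;;
            3|5|6|9|[aAcC]) count=$((count + 2)) ;;
            7|[bBdDeE]) count=$((count + 3)) ;;
            [fF]) count=$((count + 4)) ;;
            *) echo "not a hex digit: $d" >&2; return 1 ;;
        esac
        digits=$rest
    done
    echo "$count"
}

# Example with the mask from the report:
#   popcount_hex 00000000FDDFFFFF    # prints: 30
```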

Linux
I may have triggered these earlier already, but I didn't notice them until recently.

[root@localhost ~]# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
mc0: csrow2: mc#0csrow#2channel#1: 3 Corrected Errors
mc0: csrow3: 1 Uncorrected Errors
mc0: csrow3: mc#0csrow#3channel#0: 3 Corrected Errors
mc0: csrow3: mc#0csrow#3channel#1: 0 Corrected Errors
[root@localhost ~]# ras-mc-ctl --summary
Memory controller events summary:
Corrected on DIMM Label(s): 'mc#0csrow#2channel#1' location: 0:2:1:-1 errors: 3
Corrected on DIMM Label(s): 'mc#0csrow#3channel#0' location: 0:3:0:-1 errors: 3
Fatal on DIMM Label(s): 'mc#0csrow#3channel#0' location: 0:3:0:-1 errors: 1

No PCIe AER errors.

No Extlog errors.

No devlink errors.
Disk errors summary:
0:0 has 17 errors
0:2048 has 147 errors
0:2816 has 4 errors
MCE records summary:
12 Corrected error, no action required. errors
1 Deferred error, no action required. errors
2 Uncorrected, software containable error. errors
[root@localhost ~]#


Code:
[root@localhost ~]# cat /var/log/messages
...
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8
May 20 00:08:59 localhost kernel: mce_notify_irq: 1 callbacks suppressed

May 20 00:08:59 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:08:59 localhost kernel: [Hardware Error]: Corrected error, no action required.
May 20 00:08:59 localhost kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
May 20 00:08:59 localhost kernel: [Hardware Error]: Error Addr: 0x00000003080ccb40
May 20 00:08:59 localhost kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xf79c00000b800003
May 20 00:08:59 localhost kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
May 20 00:08:59 localhost kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x0)
May 20 00:08:59 localhost kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

May 20 00:08:59 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:08:59 localhost kernel: [Hardware Error]: Corrected error, no action required.
May 20 00:08:59 localhost kernel: [Hardware Error]: CPU:0 (17:71:0) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
May 20 00:08:59 localhost kernel: [Hardware Error]: Error Addr: 0x00000003095cc100
May 20 00:08:59 localhost kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x510600800a800302
May 20 00:08:59 localhost kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
May 20 00:08:59 localhost kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#2channel#1 (csrow:2 channel:1 page:0x0 offset:0x0 grain:64 syndrome:0x80)
May 20 00:08:59 localhost kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:08:59 localhost rasdaemon[995]:           <...>-661   [000]     0.000066: mce_record:           2020-04-01 19:34:33 +0200 Unified Memory Controller (bank=17), status= dc2040000000011b, Corrected error, no action required., mci=Error_overflow CECC, mca= DRAM ECC error.
May 20 00:08:59 localhost rasdaemon[995]: Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=0,csrow=3, cpu_type= AMD Family 17h Zen1, cpu= 0, socketid= 0, misc= d01a0f7c01000000, addr= 3080ccb40, synd= f79c00000b800003, ipid= 9600050f00, mcgstatus=0, mcgcap= 11c, apicid= 0
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: mc_event store: 0x55aaea8a4418
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:08:59 localhost rasdaemon[995]:           <...>-661   [000]     0.000066: mc_event:             2020-04-01 19:34:33 +0200 1 Corrected error: Cannot decode normalized address on mc#0csrow#3channel#0 (mc: 0 location: 3:0 grain: 6)
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:08:59 localhost rasdaemon[995]:           <...>-661   [000]     0.000066: mce_record:           2020-04-01 19:34:33 +0200 Unified Memory Controller (bank=18), status= dc2040000000011b, Corrected error, no action required., mci=Error_overflow CECC, mca= DRAM ECC error.
May 20 00:08:59 localhost rasdaemon[995]: Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=1,csrow=2, cpu_type= AMD Family 17h Zen1, cpu= 0, socketid= 0, misc= d01a01d301000000, addr= 3095cc100, synd= 510600800a800302, ipid= 9600150f00, mcgstatus=0, mcgcap= 11c, apicid= 0
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: mc_event store: 0x55aaea8a4418
May 20 00:08:59 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:08:59 localhost rasdaemon[995]:           <...>-661   [000]     0.000066: mc_event:             2020-04-01 19:34:33 +0200 1 Corrected error: Cannot decode normalized address on mc#0csrow#2channel#1 (mc: 0 location: 2:1 grain: 6 syndrome: 0x00000080)
May 20 00:08:59 localhost abrt-server[1611]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:08:59 localhost abrt-server[1614]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:08:59 localhost abrt-server[1618]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:08:59 localhost systemd[1]: Started dbus-:1.3-org.freedesktop.problems@2.service.
May 20 00:08:59 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.freedesktop.problems@2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:09:00 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Found oopses: 1
May 20 00:09:00 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Creating problem directories
May 20 00:09:00 localhost abrt-notification[1657]: System encountered a non-fatal error in ??()
May 20 00:09:01 localhost abrt-dump-journal-oops[1036]: Reported 1 kernel oopses to Abrt
May 20 00:11:12 localhost systemd[1]: dbus-:1.3-org.freedesktop.problems@2.service: Succeeded.
May 20 00:11:12 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.freedesktop.problems@2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:12:15 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:12:15 localhost kernel: [Hardware Error]: Corrected error, no action required.
May 20 00:12:15 localhost kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
May 20 00:12:15 localhost kernel: [Hardware Error]: Error Addr: 0x0000000301a4ef80
May 20 00:12:15 localhost kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xf79c00000b800003
May 20 00:12:15 localhost kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
May 20 00:12:15 localhost kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x0)
May 20 00:12:15 localhost kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
May 20 00:12:15 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8
May 20 00:12:15 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:12:15 localhost rasdaemon[995]:           <...>-661   [000]     0.000086: mce_record:           2020-04-01 19:37:49 +0200 Unified Memory Controller (bank=17), status= dc2040000000011b, Corrected error, no action required., mci=Error_overflow CECC, mca= DRAM ECC error.
May 20 00:12:15 localhost rasdaemon[995]: Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=0,csrow=3, cpu_type= AMD Family 17h Zen1, cpu= 0, socketid= 0, misc= d01b0fff01000000, addr= 301a4ef80, synd= f79c00000b800003, ipid= 9600050f00, mcgstatus=0, mcgcap= 11c, apicid= 0
May 20 00:12:15 localhost rasdaemon[995]: rasdaemon: mc_event store: 0x55aaea8a4418
May 20 00:12:15 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:12:15 localhost rasdaemon[995]:           <...>-661   [000]     0.000086: mc_event:             2020-04-01 19:37:49 +0200 1 Corrected error: Cannot decode normalized address on mc#0csrow#3channel#0 (mc: 0 location: 3:0 grain: 6)
May 20 00:12:15 localhost abrt-server[1674]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:12:15 localhost systemd[1]: Started dbus-:1.3-org.freedesktop.problems@3.service.
May 20 00:12:15 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.freedesktop.problems@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:12:17 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Found oopses: 1
May 20 00:12:17 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Creating problem directories
May 20 00:12:17 localhost abrt-notification[1710]: System encountered a non-fatal error in ??()
May 20 00:12:18 localhost abrt-dump-journal-oops[1036]: Reported 1 kernel oopses to Abrt
May 20 00:12:59 localhost systemd[1]: Starting Cleanup of Temporary Directories...
May 20 00:12:59 localhost systemd-tmpfiles[1712]: /usr/lib/tmpfiles.d/BackupPC.conf:1: Line references path below legacy directory /var/run/, updating /var/run/BackupPC → /run/BackupPC; please update the tmpfiles.d/ drop-in file accordingly.
May 20 00:12:59 localhost systemd-tmpfiles[1712]: /etc/tmpfiles.d/tpm2-tss-fapi.conf:3: Line references path below legacy directory /var/run/, updating /var/run/tpm2-tss/eventlog → /run/tpm2-tss/eventlog; please update the tmpfiles.d/ drop-in file accordingly.
May 20 00:12:59 localhost systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
May 20 00:12:59 localhost systemd[1]: Finished Cleanup of Temporary Directories.
May 20 00:12:59 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:12:59 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8

May 20 00:14:26 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:14:26 localhost kernel: [Hardware Error]: Corrected error, no action required.
May 20 00:14:26 localhost kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
May 20 00:14:26 localhost kernel: [Hardware Error]: Error Addr: 0x0000000395164300
May 20 00:14:26 localhost kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xf79c00000b800003
May 20 00:14:26 localhost kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
May 20 00:14:26 localhost kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x0)
May 20 00:14:26 localhost kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

May 20 00:14:26 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:14:26 localhost kernel: [Hardware Error]: Corrected error, no action required.
May 20 00:14:26 localhost kernel: [Hardware Error]: CPU:0 (17:71:0) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
May 20 00:14:26 localhost kernel: [Hardware Error]: Error Addr: 0x000000030088c100
May 20 00:14:26 localhost kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x510600800a800302
May 20 00:14:26 localhost kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
May 20 00:14:26 localhost kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#2channel#1 (csrow:2 channel:1 page:0x0 offset:0x0 grain:64 syndrome:0x80)
May 20 00:14:26 localhost kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:14:26 localhost rasdaemon[995]:           <...>-661   [000]     0.000099: mce_record:           2020-04-01 19:40:01 +0200 Unified Memory Controller (bank=17), status= dc2040000000011b, Corrected error, no action required., mci=Error_overflow CECC, mca= DRAM ECC error.
May 20 00:14:26 localhost rasdaemon[995]: Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=0,csrow=3, cpu_type= AMD Family 17h Zen1, cpu= 0, socketid= 0, misc= d01b0fff01000000, addr= 395164300, synd= f79c00000b800003, ipid= 9600050f00, mcgstatus=0, mcgcap= 11c, apicid= 0
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: mc_event store: 0x55aaea8a4418
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:14:26 localhost rasdaemon[995]:           <...>-661   [000]     0.000099: mc_event:             2020-04-01 19:40:01 +0200 1 Corrected error: Cannot decode normalized address on mc#0csrow#3channel#0 (mc: 0 location: 3:0 grain: 6)
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:14:26 localhost rasdaemon[995]:           <...>-661   [000]     0.000099: mce_record:           2020-04-01 19:40:01 +0200 Unified Memory Controller (bank=18), status= dc2040000000011b, Corrected error, no action required., mci=Error_overflow CECC, mca= DRAM ECC error.
May 20 00:14:26 localhost rasdaemon[995]: Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=1,csrow=2, cpu_type= AMD Family 17h Zen1, cpu= 0, socketid= 0, misc= d01a033c01000000, addr= 30088c100, synd= 510600800a800302, ipid= 9600150f00, mcgstatus=0, mcgcap= 11c, apicid= 0
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: mc_event store: 0x55aaea8a4418
May 20 00:14:26 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:14:26 localhost rasdaemon[995]:           <...>-661   [000]     0.000099: mc_event:             2020-04-01 19:40:01 +0200 1 Corrected error: Cannot decode normalized address on mc#0csrow#2channel#1 (mc: 0 location: 2:1 grain: 6 syndrome: 0x00000080)
May 20 00:14:26 localhost abrt-server[1729]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:14:26 localhost abrt-server[1732]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:14:26 localhost abrt-server[1735]: Not saving repeating crash in '/boot/vmlinuz-5.6.8-300.fc32.x86_64'
May 20 00:14:28 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Found oopses: 1
May 20 00:14:28 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Creating problem directories
May 20 00:14:28 localhost abrt-notification[1772]: System encountered a non-fatal error in ??()
May 20 00:14:28 localhost systemd[1]: dbus-:1.3-org.freedesktop.problems@3.service: Succeeded.
May 20 00:14:28 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.freedesktop.problems@3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:14:29 localhost abrt-dump-journal-oops[1036]: Reported 1 kernel oopses to Abrt
May 20 00:17:03 localhost rasdaemon[995]: rasdaemon: mce_record store: 0x55aaea8a19e8

May 20 00:17:03 localhost kernel: mce: Uncorrected hardware memory error in user-access at 621211640
May 20 00:17:03 localhost kernel: mce: [Hardware Error]: Machine check events logged
May 20 00:17:03 localhost kernel: [Hardware Error]: Uncorrected, software restartable error.
May 20 00:17:03 localhost kernel: [Hardware Error]: CPU:9 (17:71:0) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|UECC|-|Poison|-]: 0xbc002800000c0135
May 20 00:17:03 localhost kernel: [Hardware Error]: Error Addr: 0x0000000621211640
May 20 00:17:03 localhost kernel: [Hardware Error]: IPID: 0x000000b000000000
May 20 00:17:03 localhost kernel: [Hardware Error]: Load Store Unit Ext. Error Code: 12, DC Data error type 1 and poison consumption.
May 20 00:17:03 localhost kernel: [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
May 20 00:17:03 localhost kernel: Memory failure: 0x621211: Sending SIGBUS to memtester:1666 due to hardware memory corruption
May 20 00:17:03 localhost kernel: Memory failure: 0x621211: recovery action for dirty LRU page: Recovered
May 20 00:17:03 localhost audit[1666]: ANOM_ABEND auid=0 uid=0 gid=0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=1666 comm="memtester" exe="/usr/bin/memtester" sig=7 res=1
May 20 00:17:03 localhost rasdaemon[995]: rasdaemon: register inserted at db
May 20 00:17:03 localhost rasdaemon[995]:           <...>-213   [009]     0.000114: mce_record:           2020-04-01 19:42:37 +0200 Load Store Unit (bank=0), status= bc002800000c0135, Uncorrected, software containable error., mci=UECC Poison consumed, mca= DC data error type 1 (poison consumption).
May 20 00:17:03 localhost rasdaemon[995]: Memory Error 'mem-tx: data read, tx: data, level: L1', cpu_type= AMD Family 17h Zen1, cpu= 9, socketid= 0, ip= 401e81, cs= 33, misc= d01a000000000000, addr= 621211640, ipid= b000000000, mcgstatus=7 RIPV EIPV MCIP, mcgcap= 11c, apicid= 9
May 20 00:17:03 localhost audit: BPF prog-id=44 op=LOAD
May 20 00:17:03 localhost audit: BPF prog-id=45 op=LOAD
May 20 00:17:03 localhost audit: BPF prog-id=46 op=LOAD
May 20 00:17:03 localhost systemd[1]: Started Process Core Dump (PID 1790/UID 0).
May 20 00:17:03 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1790-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:17:03 localhost systemd[1]: Started dbus-:1.3-org.freedesktop.problems@4.service.
May 20 00:17:03 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-:1.3-org.freedesktop.problems@4 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:17:04 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Found oopses: 1
May 20 00:17:04 localhost abrt-dump-journal-oops[1036]: abrt-dump-journal-oops: Creating problem directories
May 20 00:17:05 localhost abrt-dump-journal-oops[1036]: Reported 1 kernel oopses to Abrt
May 20 00:17:06 localhost abrt-notification[1833]: System encountered a non-fatal error in ??()
May 20 00:17:07 localhost systemd-coredump[1792]: Core file was truncated to 2147483648 bytes.
May 20 00:17:08 localhost abrt-dump-journal-core[1035]: Failed to obtain all required information from journald
May 20 00:17:12 localhost systemd-coredump[1792]: Process 1666 (memtester) of user 0 dumped core.#012#012Stack trace of thread 1666:#012#0  0x0000000000401e81 compare_regions (/usr/bin/memtester + 0x1e81)
May 20 00:17:12 localhost systemd[1]: systemd-coredump@1-1790-0.service: Succeeded.
May 20 00:17:12 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-1790-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 20 00:17:12 localhost systemd[1]: systemd-coredump@1-1790-0.service: Consumed 1.976s CPU time.
May 20 00:17:12 localhost audit: BPF prog-id=46 op=UNLOAD
May 20 00:17:12 localhost audit: BPF prog-id=45 op=UNLOAD
May 20 00:17:12 localhost audit: BPF prog-id=44 op=UNLOAD
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2020-05-20-00:17:04-1036-0'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2020-05-20-00:14:28-1036-0'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2020-05-20-00:12:17-1036-0'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2020-05-20-00:09:00-1036-0'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'oops-2020-05-20-00:03:33-1036-0'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ras-2020-05-20-00:03:31-995'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ras-2020-05-20-00:17:03-995'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ras-2020-05-20-00:12:15-995'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ras-2020-05-20-00:14:26-995'
May 20 00:17:17 localhost abrtd[1003]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ras-2020-05-20-00:08:59-995'
May 20 00:17:17 localhost abrt-server[1844]: Error: No segments found in coredump './coredump'
May 20 00:17:17 localhost abrt-server[1844]: Can't open file 'core_backtrace' for reading: No such file or directory
May 20 00:17:17 localhost abrt-notification[1889]: Process 1666 (memtester) crashed in ??()
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
202
FreeNAS / TrueNAS testing
I've also done some brief testing in FreeNAS / TrueNAS. For this I created a Fedora 32 virtual machine inside FreeNAS / TrueNAS, allocated 20GB of the 32GB of RAM to the VM, and then ran "memtester 18gb" in the Fedora VM to stress the memory. Below are the results:
  • FreeNAS 11.3 U3.2 (and probably earlier versions as well) does not detect anything at all. It just crashes after a while (probably when an uncorrected error occurs). I couldn't find anything in the logs.
  • TrueNAS 12.0 beta 1 properly detects the corrected errors and shows the following on the console and in /var/log/messages:
Code:
Jul  7 13:08:50 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:08:50 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:08:50 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:08:50 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:08:50 data MCA: Address 0x400000326059a00
Jul  7 13:08:50 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:08:50 data MCA: Bank 18, Status 0x9c2040000000011b
Jul  7 13:08:50 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:08:50 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:08:50 data MCA: CPU 0 COR GCACHE LG RD error
Jul  7 13:08:50 data MCA: Address 0x40000031dc09ae0
Jul  7 13:08:50 data MCA: Misc 0xd01a0ffc01000000
Jul  7 13:08:54 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:08:54 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:08:54 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:08:54 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:08:54 data MCA: Address 0x40000032772a880
Jul  7 13:08:54 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:08:54 data MCA: Bank 18, Status 0x9c2040000000011b
Jul  7 13:08:54 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:08:54 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:08:54 data MCA: CPU 0 COR GCACHE LG RD error
Jul  7 13:08:54 data MCA: Address 0x400000323044240
Jul  7 13:08:54 data MCA: Misc 0xd01a0ffc01000000
Jul  7 13:08:56 data kernel: ix1: link state changed to UP
Jul  7 13:09:51 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:09:51 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:09:51 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:09:51 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:09:51 data MCA: Address 0x4000003254bf4c0
Jul  7 13:09:51 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:09:51 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:09:51 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:09:51 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:09:51 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:09:51 data MCA: Address 0x4000003254b8240
Jul  7 13:09:51 data MCA: Misc 0xd01a0ffd01000000
Jul  7 13:12:52 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:12:52 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:12:52 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:12:52 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:12:52 data MCA: Address 0x4000003242494c0
Jul  7 13:12:52 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:12:52 data MCA: Bank 18, Status 0x9c2040000000011b
Jul  7 13:12:52 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:12:52 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:12:52 data MCA: CPU 0 COR GCACHE LG RD error
Jul  7 13:12:52 data MCA: Address 0x40000031dc09ac0
Jul  7 13:12:52 data MCA: Misc 0xd01a0ffc01000000
Jul  7 13:13:03 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:13:03 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:13:03 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:13:03 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:13:03 data MCA: Address 0x400000275f39e00
Jul  7 13:13:03 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:13:20 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:13:20 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:13:20 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:13:20 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:13:20 data MCA: Address 0x40000026edd1e00
Jul  7 13:13:20 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:13:20 data MCA: Bank 18, Status 0x9c2040000000011b
Jul  7 13:13:20 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:13:20 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:13:20 data MCA: CPU 0 COR GCACHE LG RD error
Jul  7 13:13:20 data MCA: Address 0x4000002c7adc4c0
Jul  7 13:13:20 data MCA: Misc 0xd01a0ffc01000000
Jul  7 13:14:17 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:14:17 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:17 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:17 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:17 data MCA: Address 0x4000002b303e9c0
Jul  7 13:14:17 data MCA: Misc 0xd01a0fac01000000
Jul  7 13:14:17 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:14:17 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:17 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:17 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:17 data MCA: Address 0x4000002c2c024c0
Jul  7 13:14:17 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:14:44 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:14:44 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:44 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:44 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:44 data MCA: Address 0x400000293281500
Jul  7 13:14:44 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:14:44 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:14:44 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:44 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:44 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:44 data MCA: Address 0x400000327290240
Jul  7 13:14:44 data MCA: Misc 0xd01a0ffb01000000
Jul  7 13:14:56 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:14:56 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:56 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:56 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:56 data MCA: Address 0x40000023a430300
Jul  7 13:14:56 data MCA: Misc 0xd01a0f3a01000000
Jul  7 13:14:56 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:14:56 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:14:56 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:14:56 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:14:56 data MCA: Address 0x4000002ab5afb00
Jul  7 13:14:56 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:15:08 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:15:08 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:08 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:08 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:08 data MCA: Address 0x400000217793440
Jul  7 13:15:08 data MCA: Misc 0xd01a0f9301000000
Jul  7 13:15:08 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:15:08 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:08 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:08 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:08 data MCA: Address 0x40000029ef7f880
Jul  7 13:15:08 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:15:23 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:15:23 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:23 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:23 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:23 data MCA: Address 0x4000002186c7440
Jul  7 13:15:23 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:15:23 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:15:23 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:23 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:23 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:23 data MCA: Address 0x400000327290240
Jul  7 13:15:23 data MCA: Misc 0xd01a0ff701000000
Jul  7 13:15:38 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:15:38 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:38 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:38 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:38 data MCA: Address 0x400000264398f40
Jul  7 13:15:38 data MCA: Misc 0xd01a0e2301000000
Jul  7 13:15:38 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:15:38 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:15:38 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:15:38 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:15:38 data MCA: Address 0x40000029aba7880
Jul  7 13:15:38 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:16:05 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:16:05 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:16:05 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:16:05 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:16:05 data MCA: Address 0x400000218d41440
Jul  7 13:16:05 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:16:05 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:16:05 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:16:05 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:16:05 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:16:05 data MCA: Address 0x4000002c11e64c0
Jul  7 13:16:05 data MCA: Misc 0xd01a0fed01000000
Jul  7 13:24:40 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:24:40 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:24:40 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:24:40 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:24:40 data MCA: Address 0x4000002beace500
Jul  7 13:24:40 data MCA: Misc 0xd01a085001000000
Jul  7 13:24:40 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:24:40 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:24:40 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:24:40 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:24:40 data MCA: Address 0x4000002c43a04c0
Jul  7 13:24:40 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:36:00 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:36:00 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:36:00 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:36:00 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:36:00 data MCA: Address 0x4000003242494e0
Jul  7 13:36:00 data MCA: Misc 0xd01a08ab01000000
Jul  7 13:36:00 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:36:00 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:36:00 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:36:00 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:36:00 data MCA: Address 0x4000002cd0a6200
Jul  7 13:36:00 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:38:35 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:38:35 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:38:35 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:38:35 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:38:35 data MCA: Address 0x4000002b50ea9c0
Jul  7 13:38:35 data MCA: Misc 0xd01a0e2301000000
Jul  7 13:38:35 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:38:35 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:38:35 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:38:35 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:38:35 data MCA: Address 0x40000032517d6c0
Jul  7 13:38:35 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:39:18 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:39:18 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:39:18 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:39:18 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:39:18 data MCA: Address 0x4000002c4bfd800
Jul  7 13:39:18 data MCA: Misc 0xd01a0c5201000000
Jul  7 13:39:18 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:39:18 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:39:18 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:39:18 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:39:18 data MCA: Address 0x4000002c16784c0
Jul  7 13:39:18 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:40:35 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:40:35 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:40:35 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:40:35 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:40:35 data MCA: Address 0x400000292f99500
Jul  7 13:40:35 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:40:35 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:40:35 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:40:35 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:40:35 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:40:35 data MCA: Address 0x4000002c6f2c4c0
Jul  7 13:40:35 data MCA: Misc 0xd01a0fe401000000
Jul  7 13:41:45 data MCA: Bank 17, Status 0xdc2040000000011b
Jul  7 13:41:45 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:41:45 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:41:45 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:41:45 data MCA: Address 0x40000027d3c2cc0
Jul  7 13:41:45 data MCA: Misc 0xd01b0fff01000000
Jul  7 13:41:45 data MCA: Bank 18, Status 0xdc2040000000011b
Jul  7 13:41:45 data MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Jul  7 13:41:45 data MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
Jul  7 13:41:45 data MCA: CPU 0 COR OVER GCACHE LG RD error
Jul  7 13:41:45 data MCA: Address 0x4000002c535a4c0
Jul  7 13:41:45 data MCA: Misc 0xd01a0fd801000000
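
As a rough cross-check, the status words in the log above can be decoded by hand. The following is a minimal Python sketch, not the FreeBSD or mcelog implementation. The top flag bits (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) are the architectural x86 MCi_STATUS bits; the CECC/UECC positions (bits 46 and 45) are an assumption based on AMD's Family 17h documentation, and the function and table names are made up for illustration.

```python
# Sketch: decode an x86 machine-check status word (MCi_STATUS).
# All names here are hypothetical and chosen for illustration only.

ARCH_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten before it was read
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register contents are valid
    58: "ADDRV",  # ADDR register contents are valid
    57: "PCC",    # processor context may be corrupt
}
# ASSUMPTION: AMD-specific bits, taken from AMD's Family 17h documentation.
AMD_FLAGS = {46: "CECC", 45: "UECC"}

# Cache-hierarchy error code layout: 0000 0001 RRRR TTLL (low 16 bits).
LEVELS = ["L0", "L1", "L2", "LG"]
TYPES = {0: "I", 1: "D", 2: "G"}
REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_status(status: int) -> dict:
    """Return the flags set in a status word and a short error description."""
    all_flags = {**ARCH_FLAGS, **AMD_FLAGS}
    flags = [name for bit, name in all_flags.items() if (status >> bit) & 1]

    code = status & 0xFFFF
    description = None
    if (code & 0xFF00) == 0x0100:
        request = REQUESTS.get((code >> 4) & 0xF, "?")
        kind = TYPES.get((code >> 2) & 0x3, "?")
        level = LEVELS[code & 0x3]
        description = f"{kind}CACHE {level} {request}"

    return {
        "flags": flags,
        "corrected": "VAL" in flags and "UC" not in flags,
        "description": description,
    }


if __name__ == "__main__":
    # The two status values that appear in the log above.
    for status in (0xDC2040000000011B, 0x9C2040000000011B):
        print(hex(status), decode_status(status))
```

Under these assumptions both status values decode as corrected ECC errors described as "GCACHE LG RD". The only difference between them is the OVER flag, which lines up with the "COR OVER" versus "COR" wording in the FreeBSD output.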


I tried to decode them using mcelog, but it seems the CPU isn't currently supported. I am in contact with the maintainer of the mcelog code to get this fixed...
Code:
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 17 TSC 5be1c86174
MISC d01b0fff01000000 ADDR 4000003085c6b40
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 17 TSC 5bf3eabd20
MISC d01b0fff01000000 ADDR 40000031f3bab00
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 17 TSC 5c1e4b68ec
MISC d01b0fff01000000 ADDR 400000305340b40
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 17 TSC 5d1e06bb8c
MISC d01a0ffd01000000 ADDR 40000032212fe80
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 4
CPU 0 BANK 18 TSC 5d1e06fb40
MISC d01b0fff01000000 ADDR 400000321c6e8c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 5
CPU 0 BANK 17 TSC 5db84458c8
MISC d01a0ffa01000000 ADDR 400000321cb62c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 6
CPU 0 BANK 18 TSC 5db8448fc4
MISC d01b0fff01000000 ADDR 400000326096240
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 7
CPU 0 BANK 17 TSC 6525f2fee0
MISC d01b0fff01000000 ADDR 400000326059a00
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 8
CPU 0 BANK 18 TSC 6525f33840
MISC d01a0ffc01000000 ADDR 40000031dc09ae0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 9
CPU 0 BANK 17 TSC 68c46a10b8
MISC d01b0fff01000000 ADDR 40000032772a880
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 10
CPU 0 BANK 18 TSC 68c46a4670
MISC d01a0ffc01000000 ADDR 400000323044240
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 11
CPU 0 BANK 17 TSC 97f381ab20
MISC d01b0fff01000000 ADDR 4000003254bf4c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 12
CPU 0 BANK 18 TSC 97f381e0fc
MISC d01a0ffd01000000 ADDR 4000003254b8240
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 13
CPU 0 BANK 17 TSC 12f2bbd5708
MISC d01b0fff01000000 ADDR 4000003242494c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 14
CPU 0 BANK 18 TSC 12f2bbd8dbc
MISC d01a0ffc01000000 ADDR 40000031dc09ac0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 15
CPU 0 BANK 17 TSC 138ab97bb58
MISC d01b0fff01000000 ADDR 400000275f39e00
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 16
CPU 0 BANK 17 TSC 1476695ee60
MISC d01b0fff01000000 ADDR 40000026edd1e00
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 17
CPU 0 BANK 18 TSC 147669621d8
MISC d01a0ffc01000000 ADDR 4000002c7adc4c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 18
CPU 0 BANK 17 TSC 17684f22960
MISC d01a0fac01000000 ADDR 4000002b303e9c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 19
CPU 0 BANK 18 TSC 17684f25bdc
MISC d01b0fff01000000 ADDR 4000002c2c024c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 20
CPU 0 BANK 17 TSC 18d341a1658
MISC d01b0fff01000000 ADDR 400000293281500
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 21
CPU 0 BANK 18 TSC 18d341a4fdc
MISC d01a0ffb01000000 ADDR 400000327290240
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 22
CPU 0 BANK 17 TSC 196d9d34680
MISC d01a0f3a01000000 ADDR 40000023a430300
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 23
CPU 0 BANK 18 TSC 196d9d389b8
MISC d01b0fff01000000 ADDR 4000002ab5afb00
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 24
CPU 0 BANK 17 TSC 1a11a4a49c4
MISC d01a0f9301000000 ADDR 400000217793440
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 25
CPU 0 BANK 18 TSC 1a11a4a8348
MISC d01b0fff01000000 ADDR 40000029ef7f880
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 26
CPU 0 BANK 17 TSC 1ae230113e8
MISC d01b0fff01000000 ADDR 4000002186c7440
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 27
CPU 0 BANK 18 TSC 1ae23014dd8
MISC d01a0ff701000000 ADDR 400000327290240
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 28
CPU 0 BANK 17 TSC 1b9faf273e0
MISC d01a0e2301000000 ADDR 400000264398f40
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 29
CPU 0 BANK 18 TSC 1b9faf2b034
MISC d01b0fff01000000 ADDR 40000029aba7880
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 30
CPU 0 BANK 17 TSC 1d0fa35ac30
MISC d01b0fff01000000 ADDR 400000218d41440
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 31
CPU 0 BANK 18 TSC 1d0fa35e83c
MISC d01a0fed01000000 ADDR 4000002c11e64c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 32
CPU 0 BANK 17 TSC 3802329c0a8
MISC d01a085001000000 ADDR 4000002beace500
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
MCE 33
CPU 0 BANK 18 TSC 3802329f588
MISC d01b0fff01000000 ADDR 4000002c43a04c0
TIME 1594121663 Tue Jul  7 13:34:23 2020
STATUS dc2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0

  • I don't think I have been able to trigger uncorrected errors on TrueNAS 12.0 beta 1 yet. The corrected errors have the same status code on FreeBSD as on Linux, and I couldn't find any error in FreeBSD with the status code of the uncorrected errors seen in Linux. I suspect that an uncorrected error is what makes the system crash after a while, but I couldn't find anything about this in the log files. I'm not sure whether this is a bug. Please let me know if I should report it...
  • TrueNAS 12.0 beta 1 doesn't seem to send any email when MCA errors occur. If I remember correctly, this is a "known bug" (please correct me if I'm wrong). This is very troublesome and I hope it gets fixed soon!
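
Until alerts work, the MCA lines in /var/log/messages could be counted with a small script. This is only a sketch under stated assumptions: it expects the "MCA: Bank N, Status 0x..." line format shown in the log above, and it treats a record as uncorrected when bit 61 (the architectural UC flag) is set in the status word. The names `summarize` and `BANK_RE` are hypothetical.

```python
# Sketch: count corrected / uncorrected MCA records per bank.
# ASSUMPTION: log lines look like "... MCA: Bank 17, Status 0xdc2040000000011b".
import re
from collections import Counter

BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
UC_BIT = 1 << 61  # architectural "uncorrected" flag in MCi_STATUS


def summarize(lines):
    """Return (corrected, uncorrected) Counters keyed by bank number."""
    corrected, uncorrected = Counter(), Counter()
    for line in lines:
        match = BANK_RE.search(line)
        if not match:
            continue  # not the first line of an MCA record
        bank = int(match.group(1))
        status = int(match.group(2), 16)
        if status & UC_BIT:
            uncorrected[bank] += 1
        else:
            corrected[bank] += 1
    return corrected, uncorrected


if __name__ == "__main__":
    # A few lines copied from the log above, used here as sample input.
    sample = [
        "Jul  7 13:08:50 data MCA: Bank 17, Status 0xdc2040000000011b",
        "Jul  7 13:08:50 data MCA: CPU 0 COR OVER GCACHE LG RD error",
        "Jul  7 13:08:50 data MCA: Bank 18, Status 0x9c2040000000011b",
        "Jul  7 13:08:54 data MCA: Bank 17, Status 0xdc2040000000011b",
    ]
    corrected, uncorrected = summarize(sample)
    print("corrected per bank:", dict(corrected))
    print("uncorrected per bank:", dict(uncorrected))
```

To run it against the real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/messages"))`. How the counts are then turned into a notification is left open here.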
 

b3081a

Cadet
Joined
Apr 15, 2020
Messages
5
There seems to be a new 3.37 beta BIOS that contains AGESA 1.0.0.6, and in it the Platform First Error Handling option has been removed. Has anyone tested whether ECC works with that version?
 