I assume my memory is bad?

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
I had huge problems getting my server to boot correctly today. There was also a problem 28 days ago when I updated TrueNAS.
I got this email both times:

TrueNAS @ freenas.local

New alerts:
* Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1)).

Current alerts:
* freenas.local had an unscheduled system reboot.
The operating system successfully came back online at Thu Jan 14 09:34:23 2021.

* Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1)).

This is my only server and I have zero experience fault-finding ECC memory. I assume that one stick is bad?
Now that I am fully booted, should I wait until the next time I need to reboot before taking action? Or should I take action immediately: power down the server, remove the stick and start it again?

Thank you
Mark
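The alert text itself names the slot. As a rough sketch, assuming every correctable-ECC alert follows the "(@DIMMxx(CPUy))" pattern seen in the email above (the function names are hypothetical), a few lines of Python can pull out the slot and CPU so that repeated alerts can be tallied per slot:

```python
import re
from collections import Counter

# Assumed pattern, based only on the alert text in the email above:
# "Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1))."
ALERT_RE = re.compile(r"Correctable ECC \(@(?P<dimm>DIMM\w+)\((?P<cpu>CPU\d+)\)\)")


def parse_ecc_alert(line):
    """Return (dimm, cpu) for a correctable-ECC alert line, or None."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return match.group("dimm"), match.group("cpu")


def tally_slots(lines):
    """Count correctable-ECC alerts per (dimm, cpu) slot, ignoring other alerts."""
    counts = Counter()
    for line in lines:
        parsed = parse_ecc_alert(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


alerts = [
    "* Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1)).",
    "* freenas.local had an unscheduled system reboot.",
    "* Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1)).",
]
print(tally_slots(alerts))
```

This only shows that both emails point at the same slot. A slot-specific error can still come from the board rather than the stick, which is why a memory test is still worth running.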
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
ethereal said:
or should I take action immediately

Every time you end up with difficulties like this, update your backups. If your situation degrades into an even worse problem, your backups can save your life.

So keep the system running, update your backups, and then we will look at that hardware problem.

ethereal

What checks would you recommend?

Heracles

Because the suspect is RAM, memtest is probably the first test to run. Create a bootable USB key with memtest and let it run for at least a few passes, and for a few days if you can. See whether it detects any problems.

ethereal

Both emails said "Memory #0x08 Asserted Correctable ECC (@DIMM1B(CPU1))", which is DIMM1B. Couldn't I just remove that stick?

Heracles

Do as you wish... I gave you the resources you need to do more testing and reach a more precise diagnosis. If you are happy with what you have, the system is yours... As for me, I would run the test not only to confirm that RAM is indeed the problem, but also to check that there are no more problems in the RAM than what has surfaced so far.

ethereal

Okay, I'll run memtest.

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Just out of curiosity, did your problems start after you did the TrueNAS update? A corrupted boot image can cause strange errors.

While I know it is not impossible for any particular component to fail, I have never had memory that had been working well suddenly fail. However, I have had two motherboards fail, and they gave weird, difficult-to-diagnose problems as they were going bad. I have also had one power supply fail, which was also difficult to diagnose because the problems were intermittent.

As @Heracles suggested, memtest is a good place to start. I would disconnect all disks and unnecessary peripherals when running the tests, just to minimize possible variables. If the memory fails, I would move the sticks around and run the test again. If a stick is actually bad, the problem should move with the stick.

Good luck.
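The "does the problem move with the stick" check can be written down as a small decision rule. A minimal sketch, assuming each memtest run is recorded as (stick label, slot, passed); the stick labels "A" and "B" and the slot DIMM2A are hypothetical examples, and the rule is a simplification rather than a complete diagnosis:

```python
def locate_fault(runs):
    """Guess whether memtest failures follow a stick or a slot.

    runs: list of (stick, slot, passed) tuples, one per memtest run.
    Returns "none", "stick", "slot" or "inconclusive".
    """
    failures = [(stick, slot) for stick, slot, passed in runs if not passed]
    if not failures:
        return "none"
    failing_sticks = {stick for stick, _ in failures}
    failing_slots = {slot for _, slot in failures}
    # One stick failing in several different slots: the fault moved with the stick.
    if len(failing_sticks) == 1 and len(failing_slots) > 1:
        return "stick"
    # Several different sticks failing in the same slot: the fault stayed with
    # the slot, which points at the board rather than the RAM.
    if len(failing_slots) == 1 and len(failing_sticks) > 1:
        return "slot"
    # A single failure that has not been swapped yet, or a mixed pattern.
    return "inconclusive"


# Hypothetical runs: stick "A" fails in DIMM1B and again after moving to DIMM2A,
# while stick "B" passes in DIMM1B.
runs = [
    ("A", "DIMM1B", False),
    ("A", "DIMM2A", False),
    ("B", "DIMM1B", True),
]
print(locate_fault(runs))
```

A single failed run always comes back "inconclusive" here, which matches the advice above: move the sticks around and test again before blaming either the stick or the board.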
 

ethereal

The problem was probably there before the update; it was during that reboot that it showed up.

I removed the two sticks, DIMM1A and DIMM1B, and the server booted with no problems at all.

ethereal

I tried to enable IPMI, but when I tried to connect in the web browser it didn't work, so I had to disable IPMI to get it to work.
It has been at the top of my list to ask here how to make it work.

pschatz100

ethereal said:
I tried to enable IPMI, but when I tried to connect in the web browser it didn't work, so I had to disable IPMI to get it to work.
It has been at the top of my list to ask here how to make it work.

It looks like you need to spend some time with your user manuals. There will be one for the motherboard and one specifically for IPMI. If you don't have them, they should be available from the Supermicro website.

ethereal said:
The problem was probably there before the update; it was during that reboot that it showed up.
I removed the two sticks, DIMM1A and DIMM1B, and the server booted with no problems at all.

That doesn't really confirm that the memory is bad. It could still be a motherboard issue. You can try putting one stick back into DIMM1A and see if the system boots, then try putting the other stick into DIMM1A and see if it boots. If both sticks work, the problem is likely elsewhere. Also, look in the BIOS and make certain the sticks are being recognized properly.
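One way to act on the "make certain the sticks are being recognized" advice is to compare what is physically installed with the total the BIOS or the OS reports (for example, the FreeBSD hw.physmem sysctl on TrueNAS CORE). A minimal sketch; the function name, the 1 GiB slack and the stick sizes below are hypothetical, not taken from this server:

```python
GIB = 1024 ** 3


def unrecognized_gib(stick_sizes_gib, reported_bytes, slack_gib=1.0):
    """Return how many GiB appear to be missing, or 0.0 if none.

    stick_sizes_gib: sizes of the sticks physically installed, in GiB.
    reported_bytes: total memory reported by the BIOS or the OS, in bytes.
    slack_gib: allowance for memory the firmware or OS reserves, since the
        reported figure is often a little below the installed total.
    """
    installed = float(sum(stick_sizes_gib))
    reported = reported_bytes / GIB
    shortfall = installed - reported
    return shortfall if shortfall > slack_gib else 0.0


# Hypothetical example: four 8 GiB sticks installed, but only 24 GiB reported.
print(unrecognized_gib([8, 8, 8, 8], 24 * GIB))
```

A shortfall of roughly one stick's size suggests that a stick, or the slot it sits in, is not being detected; it does not say which of the two is at fault.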
 

ethereal

I don't like your attitude. I have Huntington's disease, and there are things that I did before that I cannot do now. Before I installed FreeNAS, I lurked the forums and read every page of the FreeNAS manual. I also read every page of my Supermicro manual, including the IPMI manual.

And for your information, back in 2014 I had IPMI working, but not long after that the PSU damaged my board and IPMI stopped working. So it has been 6 years since I worked with IPMI. My old motherboard died a couple of months ago, so I was trying IPMI again.

With my disease I no longer have patience, stamina or concentration.
I wasn't looking for somebody to do the work for me.

And I already said that I would run memtest. I don't have a spare monitor, so on Monday I will connect a monitor to my server and run memtest.

ethereal

The reason I just removed the known faulty RAM was that my workstations back up to FreeNAS daily. It is also a media server that we use 24/7.
So I wanted to keep it running while I organised a monitor and a memtest stick.

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I would recommend running Memtest.

ethereal

I did try MemTest Deluxe on Monday, but I couldn't get it to boot.
What I plan to do is buy a motherboard and use it as a test bench to run memory tests on all the memory. I know that doesn't test my existing motherboard as well. But because of my disease, almost all of the work gets done by my son and wife. My wife doesn't care about computers and has not worked on them before. My son is 11 and is very keen; he has been building himself a gaming machine from leftover parts, but he doesn't have much experience and therefore doesn't have a great feel for putting things together.

So, if they are up for it:
1. Install the new board and run memtest on all my existing RAM; keep all the RAM that passes.
2. Remove the existing motherboard, replace it with the new one, and keep some RAM back for testing the existing motherboard.
3. Run memtest on the existing motherboard in case it is faulty as well as the RAM.

VioletDragon

MemTest Deluxe? Never heard of it. You should be looking at MemTest86: https://www.memtest86.com/

Why buy another motherboard? I would test DIMM1B with Memtest; there's no need to buy another board. I would also look at the logs in IPMI.
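Once IPMI is reachable, "ipmitool sel elist" prints the BMC's System Event Log as pipe-separated fields. The sketch below counts asserted correctable-ECC entries per sensor. The sample lines are illustrative placeholders, assumed to follow the usual "id | date | time | sensor | event | direction" layout; they are not output from this board, and Supermicro firmware may word events differently:

```python
from collections import Counter

# Illustrative lines only, assumed to match the usual `ipmitool sel elist`
# layout: id | date | time | sensor | event | direction.
SAMPLE_SEL = """\
   1 | 01/01/2021 | 00:00:01 | Memory #0x08 | Correctable ECC | Asserted
   2 | 01/01/2021 | 00:00:02 | Memory #0x08 | Correctable ECC | Asserted
   3 | 01/01/2021 | 00:00:03 | Power Unit #0xff | Power off/down | Asserted
"""


def count_correctable_ecc(sel_text):
    """Count asserted correctable-ECC events per sensor name."""
    counts = Counter()
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 6:
            continue  # skip blank or unexpected lines
        sensor, event, direction = fields[3], fields[4], fields[5]
        # startswith() so "Uncorrectable ECC" is not counted as correctable.
        if event.startswith("Correctable ECC") and direction == "Asserted":
            counts[sensor] += 1
    return counts


print(count_correctable_ecc(SAMPLE_SEL))
```

On a real system you would feed it the text captured from the ipmitool command instead of the sample string.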
 

ethereal

My IPMI is not set up. I wanted a new board so I would have a spare if my other one failed. The last time, it took over a week to get a replacement, flash the BIOS, etc.

MemTest Deluxe
$14: bootable CD/USB version, delivered via email
The Deluxe package includes the Windows native Pro version. It adds a 64-bit version of MemTest that runs directly from a bootable CD or USB drive without loading your OS first. This version can be run on any PC that supports CSM/legacy BIOS boot, and does not require any sort of installation. Plus, since it does not load an OS, it can directly access and test all of your RAM. This is a great disk for computer technicians. It also uses the rate that memory is checked as a basic speed benchmark. This can be useful if you are trying different BIOS settings. Not only will MemTest tell you if your RAM is still stable, but it will also indicate if the tweaks you have made improve RAM performance.
The Deluxe version is delivered electronically via email. We provide instructions for writing it to CD or a USB stick.

I tested the USB boot disk on a couple of computers, but not on a server.
Supermicro X9SCL+-F: $49 from eBay

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
If you don't have the patience to do the tests and setup, that's fine. If you are willing to spend money, that's fine as well.
Go find someone who can do that work for you, as you seem to care about your data.

ethereal

I don't care about my data? Based on what?

And I did try the memory test, but I couldn't get the server to boot.

I also explained that I have Huntington's disease.