SOLVED how about ecc detection/correction/reporting?

Status
Not open for further replies.

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Not sure if I understood it correctly, but as far as I can tell there is no email report when an ECC event happens. If I am mistaken then great, and sorry for wasting your time. If this is indeed still treated as low priority, then why on earth not implement it?

In case this will never be implemented under the guise of being a low-priority issue, can we at least have a cron job that does exactly that?

Please forgive me, but I am not experienced enough yet, so if someone could point me to a cron job, that would be awesome.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
ECC uncorrectable shows up on the console window; I know of that much.
I meant getting an email report whenever a corrected error is detected and whenever an uncorrected error is detected.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
In my Supermicro system I have no idea how to check on an ECC error, if one ever occurred, or whether it was corrected or not. Email would be a nice touch but I doubt that exists; if it does, I have not seen it. But then again I would not say I'm super savvy when it comes to ECC checking.
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
TBH, not sure what/where that would get logged within TrueNAS without a lot of research or dropping in a support Q; something you can do ;)

On that note: at least on a system with IPMI (in my case, Supermicro boxes), that is where events severe enough to warrant action get logged, and the IPMI can be configured to send email alerts. The operating system CAN flag a corrected ECC error without there being an actual hardware issue; those kinds of issues happen frequently on non-ECC systems and no one bats an eyelash. An uncorrectable ECC error generally points to a module failure, but it doesn't automatically mean a module has failed or that corruption should be suspected.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
In my Supermicro system I have no idea how to check on an ECC error
If you have IPMI, it's in the IPMI event log. If not, I'm not sure where you'd find it.
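For reference, a minimal sketch of pulling ECC entries out of that event log from a script. It assumes `ipmitool` is installed and that `ipmitool sel elist` prints one pipe-separated line per event; the exact event wording differs between vendors, so the "ecc" substring match is an assumption, and the function names are made up for illustration.

```python
import subprocess


def parse_sel_line(line):
    """Split one pipe-separated SEL line into a list of stripped fields."""
    return [field.strip() for field in line.split("|")]


def ecc_events(sel_text):
    """Return the SEL entries (as field lists) whose text mentions ECC.

    Assumption: the BMC describes memory errors with the word "ECC".
    """
    events = []
    for line in sel_text.splitlines():
        if "ecc" in line.lower():
            events.append(parse_sel_line(line))
    return events


def read_sel():
    """Run `ipmitool sel elist` and return its text output.

    Needs ipmitool on the PATH and usually root privileges.
    """
    result = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `ecc_events(read_sel())` on the NAS itself would list whatever ECC-related entries the BMC has recorded. If the board's IPMI does not log memory events at all, the list simply stays empty.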
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
If you have IPMI, it's in the IPMI event log. If not, I'm not sure where you'd find it.
Thanks, I'll have to look for it. I really like IPMI, very nice.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
... I really like IPMI, very nice.
I like IPMI a lot but I have mine configured not to be available on other ethernet ports and the dedicated IPMI ethernet port stays devoid of ethernet cables unless there is a good reason... and once that time passes, said cable is disconnected again. The IPMI is incredibly powerful and for my use case I see no reason to keep it connected continuously.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Speaking of network access, I just wish I could remote into my home system from work, but there is a firewall in my way, one I have no influence over. Sucks to be me. :frown:
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
If you have good cell coverage at your workplace, one option is to bypass the corporate network with a MiFi and then remote into your home machine.... naturally, using your, not corporate, hardware. Keep your stuff air-gapped from the corporate stuff and everyone should be able to co-exist happily. However, policies do vary by employer so depending on how sensitive your work site is to this, I'd inquire first rather than being walked out by security.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Speaking of network access, I just wish I could remote into my home system from work, but there is a firewall in my way, one I have no influence over. Sucks to be me. :frown:

You could set up your router at home to run OpenWrt with OpenVPN or WireGuard, or if that is not an option you could set up a Hamachi VPN network (or perhaps an open source alternative) on one of your home systems (could be a VM) that is always on.

Then from work you could set up an OpenVPN/WireGuard client to connect into your home network, or set up a Hamachi (or alternative) VPN client to join your previously created network.

That should allow you to remote desktop into your home system and use a browser from there to access your IPMI interface.

Perhaps you could even configure the OpenVPN server on your home TrueNAS and use an OpenVPN client to become part of your home network. Then you should be able to use a browser on your work system to get into IPMI.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Is there a FreeBSD equivalent of Linux's edac-utils, edac-ctl and edac-util?
If so, implementing ECC error email reports should be really easy for seasoned developers.
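To show what I mean on the Linux side: the EDAC driver exposes per-memory-controller counters in sysfs, and the tools above report on those. A rough sketch of reading them directly, assuming the usual `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count` files are present (this only works on Linux with an EDAC driver loaded; the function name is made up).

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}} from EDAC sysfs.

    Controllers whose counter files are missing or unreadable are skipped.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts
```

Something equivalent on FreeBSD is what I am looking for.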
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
ECC uncorrectable shows up on the console window; I know of that much.
This suggests there is functionality in FreeBSD that is able to detect corrected and uncorrected ECC errors as they happen. That is good news. Does anyone know what functionality that is? Then making a scheduled task that runs once every hour and emails when it finds something should not be impossible.

What would be better, of course, is if this were baked into TrueNAS from the get-go, just like the other types of email reporting.

Does anyone know why this is still not being done? What reason could there be not to implement this 100% critical feature?
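A rough sketch of what such a scheduled task could look like, under some loud assumptions: that the kernel writes its machine-check reports to a log file as lines containing `MCA:` (or the word ECC), that a local SMTP server accepts mail, and that the function names and addresses here are placeholders I made up. I have not verified the log format on every board.

```python
import re
import smtplib
from email.message import EmailMessage

# Assumption: machine-check / ECC reports show up as log lines that contain
# "MCA:" or the standalone word "ECC". Adjust the pattern to your hardware.
ECC_PATTERN = re.compile(r"\bMCA:|\bECC\b", re.IGNORECASE)


def find_ecc_lines(log_text):
    """Return every log line that matches the assumed ECC pattern."""
    return [line for line in log_text.splitlines() if ECC_PATTERN.search(line)]


def build_report(lines, sender, recipient):
    """Build an email message listing the matching log lines."""
    msg = EmailMessage()
    msg["Subject"] = f"ECC report: {len(lines)} matching log line(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg, host="localhost"):
    """Send the message through an SMTP server (assumed to be on localhost)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

An hourly job would read the log file, call `find_ecc_lines` on its contents, and call `send_report` only when the list is not empty. The sketch does not remember what it already reported, so as written it would mail the same lines again every hour until the log rotates.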
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
I like IPMI a lot but I have mine configured not to be available on other ethernet ports and the dedicated IPMI ethernet port stays devoid of ethernet cables unless there is a good reason... and once that time passes, said cable is disconnected again. The IPMI is incredibly powerful and for my use case I see no reason to keep it connected continuously.

Could physically connect it but leave the port in a shutdown state (or tagged to a bunk VLAN); or leave out/disable firewall rules allowing traffic in/out for that segment.

Much better options than just having it disconnected, as the luxury of going TO a datacenter these days is not what it used to be. That's of course if you colo; if it's at your office, that's another story, and sometimes the same. Seems every few days my HR is sending out emails about an exposure risk.
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
This suggests there is functionality in FreeBSD that is able to detect corrected and uncorrected ECC errors as they happen. That is good news. Does anyone know what functionality that is? Then making a scheduled task that runs once every hour and emails when it finds something should not be impossible.

What would be better, of course, is if this were baked into TrueNAS from the get-go, just like the other types of email reporting.

Does anyone know why this is still not being done? What reason could there be not to implement this 100% critical feature?

Consider that most RAM errors surface because the hardware reports them, not so much because the OS detects them directly (short of a true hosing of the system). The BIOS has "patrol scrub / read" settings that initiate some basic RAM testing by probing unused blocks with data and reading it back. That doesn't happen at the OS level. Likely, the error message I noted at the console was the OS complaining via IPMI that the BIOS found something. MS products don't really handle that. VMware reads IPMI data, as I suspect True/FreeNAS does.
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
If you have good cell coverage at your workplace, one option is to bypass the corporate network with a MiFi and then remote into your home machine.... naturally, using your, not corporate, hardware. Keep your stuff air-gapped from the corporate stuff and everyone should be able to co-exist happily. However, policies do vary by employer so depending on how sensitive your work site is to this, I'd inquire first rather than being walked out by security.

Yeah, that would very likely get me fired.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Linux and Windows are perfectly able to report on ECC errors. The question remains whether FreeBSD can as well.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Some IPMI implementations are crippled and will not log or report ECC errors, while Linux and Windows on those boards can. I know of the ASRock Rack X470D4U and X470D4U-2T, which have crippled IPMI, and AMD is the one not wanting to have ASRock Rack help them fix it.

I know because I was involved in the effort to get it fixed.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Linux and Windows are perfectly able to report on ECC errors. The question remains whether FreeBSD can as well.
Fundamentally it can. I know because it did for some of my Supermicro servers. I don't know if the others just did not experience ECC events or if for some servers it would not report at all. I always took that feature for granted.
 

diversity

Contributor
Joined
Dec 4, 2018
Messages
128
Sweet. So it seems this has been possible in FreeBSD all this time. It is very disappointing that email notifications for ECC errors are not implemented by default.

 