Given that ECC functionality depends on several components working well together (e.g. CPU, motherboard, memory), there are many things that can go wrong, resulting in a lack of ECC support that the user can detect.
However, it's also far too easy to get a false positive for ECC functionality. This might well mean that a large percentage of the ECC 'enabled' systems in the industry are not actually protected, either because ECC never worked from the start (DOA) or because it stopped working over time.
What I am worried about is the lack of a reliable way of testing ECC functionality in the industry. By functionality I mean ECC error correction, error reporting and, most particularly, error injection.
Without ECC error injection, testing ECC correction is only possible for a small subset of power users who are able to deliberately cause memory errors in order to see whether ECC reporting and correcting are functional. However, this is not something that can be automated as part of a scheduled health check.
I consider ECC reporting (and a way to test that it is still working) a requirement, so that memory that is about to go bad can be replaced preemptively.
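To make the reporting half of this concrete, here is a minimal sketch of what a scheduled check could look like. It assumes a Linux host with an EDAC driver loaded, where the kernel exposes corrected and uncorrected error counters per memory controller under `/sys/devices/system/edac/mc/`. That is an assumption on my part: FreeBSD-based systems do not have this sysfs tree, and the function names `read_edac_counts` and `summarize` are made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} for each EDAC memory controller.

    Assumes the Linux EDAC sysfs layout (mc0, mc1, ... each containing
    ce_count and ue_count). Returns an empty dict if the tree is absent.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed counter: skip this controller.
            continue
        counts[mc.name] = (ce, ue)
    return counts


def summarize(counts):
    """Turn the counters into a one-line status for a health-check report."""
    if not counts:
        return "UNKNOWN: no EDAC memory controllers found, ECC reporting is unverified"
    corrected = sum(ce for ce, _ in counts.values())
    uncorrected = sum(ue for _, ue in counts.values())
    if uncorrected:
        return f"ALERT: {uncorrected} uncorrected memory error(s) reported"
    if corrected:
        return f"WARN: {corrected} corrected memory error(s) reported"
    return "OK: no errors reported (this does not prove ECC is working)"


if __name__ == "__main__":
    print(summarize(read_edac_counts()))
```

Note that a script like this only reads whatever the reporting path delivers. A count of zero is indistinguishable from ECC silently not working, which is exactly why a way to inject a test error matters.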
I am asking for the opinion of the community, and most notably the senior technicians @ixsystems, regarding this stance, because I am quite stuck right now and do not dare to proceed with a mission-critical project.