
Monitor and email an alert for ECC Memory errors on OS-level

My scripts assumed that /bin/bash was your configured shell. I've now updated the resource so that it invokes the scripts with /bin/bash explicitly, so they also work when your user has a different shell configured.
  • I've made a minor change to the cron script so that it stores which MCA messages have already been processed
  • I've added an init script that makes sure no MCA messages are missed when cron jobs are skipped during downtime
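As a rough illustration of the idea of storing which MCA messages were processed, here is a minimal sketch. It is not the actual script from this resource: the function name, the state file argument and the 'MCA:' search pattern are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical sketch, not the resource's real script.
# Assumptions: MCA lines contain the text 'MCA:', and a plain-text state
# file is an acceptable place to remember lines that were already reported.

# Print the MCA lines in log file $1 that are not yet listed in state
# file $2, then append them to the state file so later runs skip them.
report_new_mca() {
    local log_file=$1 state_file=$2 new_lines
    [[ -f "$log_file" ]] || return 0
    new_lines=$(awk -v state="$state_file" '
        # Load every previously reported line (a missing file loads nothing).
        BEGIN { while ((getline line < state) > 0) seen[line] = 1 }
        # Keep only MCA lines that have not been seen before.
        /MCA:/ && !seen[$0]
    ' "$log_file")
    [[ -n "$new_lines" ]] || return 0
    # Print the new lines and record them in one step.
    printf '%s\n' "$new_lines" | tee -a "$state_file"
}

# Example call (paths are placeholders):
#   report_new_mca /var/log/messages /root/mca_processed.state
```

Because the check is against recorded lines rather than a fixed time window, the same function could be called from both a cron job and an init script without reporting a message twice. The state file grows over time, so a real script would probably need to prune it.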
I've identified one more possible improvement, and an important one (you could even call it a bug).

The script only properly works if FreeNAS/TrueNAS remains online...

For example:
If MCA messages are logged at 16:43, the system crashes at 16:49, and FreeNAS/TrueNAS only comes back online at 17:03, then the 16:51 cron job that would have searched the 16:4x time frame never runs, and the MCA messages that occurred will NOT be reported!

I hope to get this resolved soon, but I'm not yet sure whether it is possible or doable.

resolved
I've updated /root/bin/email_mca_log_messages.bash. The added checks

[[ -f /var/log/messages.0.bz2 ]] &&
and
[[ -f /var/log/messages ]] &&
make sure that the log files actually exist before the script tries to read them.
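To show how such guards might be used, here is a minimal sketch. It is not the actual content of /root/bin/email_mca_log_messages.bash: the function name and the 'MCA:' search pattern are assumptions, and the file paths are passed as arguments only to keep the example self-contained.

```shell
#!/bin/bash
# Hypothetical sketch of guarding log reads with a file-existence test.
# Assumption: MCA lines contain the text 'MCA:'.

# Print MCA lines from the rotated (bzip2-compressed) log $1 and the
# current log $2, silently skipping any file that does not exist.
collect_mca_lines() {
    local rotated_log=$1 current_log=$2
    # The command after && only runs when the file is present.
    [[ -f "$rotated_log" ]] && bzcat "$rotated_log" | grep 'MCA:'
    [[ -f "$current_log" ]] && grep 'MCA:' "$current_log"
    # A missing file or an empty search result is not an error here.
    return 0
}

# Example call:
#   collect_mca_lines /var/log/messages.0.bz2 /var/log/messages
```

Without the guards, bzcat and grep would print "No such file or directory" errors whenever a log has not been rotated yet or has just been rotated away.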
I've made some small modifications to /root/bin/email_mca_log_messages.bash so that it does an 'exit 0' at the end. This should prevent the system from treating the command as failed and writing the error below to /var/log/middlewared.log:
Code:
[2020/12/13 16:31:00] (ERROR) middlewared.job.run():373 - Job <bound method accepts.<locals>.wrap.<locals>.nf of <middlewared.plugins.cron.CronJobService object at 0x81cb735e0>> failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 361, in run
    await self.future
  File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 399, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *([self] + args))
  File "/usr/local/lib/python3.8/site-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 977, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/cron.py", line 282, in run
    raise CallError(f'CronTask "{cron_cmd}" exited with {cp.returncode} (non-zero) exit status.')
middlewared.service_exception.CallError: [EFAULT] CronTask "/root/bin/email_mca_log_messages.bash > /dev/null" exited with 1 (non-zero) exit status.
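One plausible way a script like this ends with exit status 1 is that its last command is a search that finds nothing, since a script's exit status is that of its last command. The sketch below only illustrates that general shell behaviour and the effect of a trailing 'exit 0'. It is an assumption about the cause, not a diagnosis of the actual script, and the function names are made up for the example.

```shell
#!/bin/bash
# Hypothetical illustration, not taken from the resource's script.

# A script body whose last command is a grep without a match:
# grep returns 1, so the whole script exits with 1.
run_without_exit() {
    bash -c 'printf "%s\n" "nothing to report" | grep -q "MCA:"'
}

# The same body followed by an explicit exit 0: the script now
# exits with 0 regardless of what grep found.
run_with_exit() {
    bash -c 'printf "%s\n" "nothing to report" | grep -q "MCA:"; exit 0'
}

run_without_exit || echo "without exit 0: status $?"
run_with_exit && echo "with exit 0: status $?"
```

The trade-off is that a trailing 'exit 0' also hides genuine failures earlier in the script, so the cron task will always look successful to the middleware.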