A new message with 11.1-RC1 and now 11.1, 11.1-U1, 11.1-U2, too

zemun2 · Jan 24, 2018

picklefish said:
ctrl W (for write) I believe it shows at the bottom of the nano screen the different commands

These are all the options I get.

Redcoat · Jan 24, 2018

Ctrl-X for Exit, then Y to save; it'll prompt you for the save and the correct file name

picklefish · Jan 24, 2018

zemun2 said:
These are all the options I get.

Sorry, I was mixing up editors. Ctrl-O for write, then Ctrl-X to exit. (write = save)

zemun2 · Jan 24, 2018

Awesome thank you sir

Mamdoh · Jan 24, 2018

zemun2 said:
How do I save the freenas_health.sh?

Thanks

I'm assuming you're using nano. If yes, then:

Control + o: to write to the file
Enter: to accept the file name
Control x: to exit.

zemun2 · Jan 24, 2018

I did ^O and now I get these options.

danb35 · Jan 24, 2018

zemun2 said:
I did ^O and now I get these options.

See in the white bar where it's asking for the file name to write, and the correct file name is filled in? Press Enter.

zemun2 · Jan 24, 2018

danb35 said:
See in the white bar where it's asking for the file name to write, and the correct file name is filled in? Press Enter.

That worked. For some reason I thought I needed to select one of the options on the bottom.

Thank you all

NASbox · Jan 24, 2018

Chris Moore said:
No, the configuration has always reloaded at midnight; that has been normal for a long time.
The consul version is a new thing but it is only indicating that there is a newer version of the software available and not an issue.

Thanks for this... I didn't realize that this was normal behavior or the implications of there being a newer release.

Chris Moore said:
If the GUI is not alerting you to an error, why are you convinced that there is an error?

If the code shown here:

https://forums.freenas.org/index.php?threads/a-new-message-with-11-1-rc1.58844/page-3#post-433317

is the code responsible for detecting errors, then my question is: what about this?

Code:

/usr/local/bin/midclt call notifier.get_alerts > /tmp/.alert-health
if [ $? -ne 0 ] ; then
   exit 1
fi

What is /usr/local/bin/midclt call notifier.get_alerts doing?

I assume that sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) is spawning a bunch of processes, but I have no idea what is actually happening. (Have there been any changes to the reporting code? Could a poorly reported error give a blank line instead of a proper message?)

If the supplied patch is working, then it appears that /tmp/.alert-health contains nothing but one or more empty lines. As long as midclt returning blank lines and a zero return code means no error was detected, I concur with you that there is no error message. If whatever scripts midclt executes are emitting a blank line instead of an error message, then there is something to worry about.

Having said all that, I'm inclined to agree with you that it is unlikely there is an error.
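
One way to test the blank-line theory is to count what is actually in the captured output. This is only a sketch: count_lines is a made-up helper name, and the sample file is invented for illustration rather than real midclt output.

```shell
#!/bin/sh
# Hypothetical helper: report how many lines of a captured file are blank
# (empty or whitespace only) and how many carry text.
count_lines() {
  file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  blank=$(grep -c '^[[:space:]]*$' "$file" || true)
  total=$(wc -l < "$file" | tr -d ' ')
  echo "blank=$blank nonblank=$((total - blank))"
}

# Invented sample: one empty line followed by one OK line.
sample=$(mktemp)
printf '\nOK: all good\n' > "$sample"
count_lines "$sample"   # prints: blank=1 nonblank=1
rm -f "$sample"
```

On a live system the same helper could be pointed at a copy of /tmp/.alert-health taken before the script removes it.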

toolchain · Jan 24, 2018

picklefish said:
Thanks @Mamdoh I will try this in an hour, appreciate the fix.

Edit: Seems like the only thing missing were the lines
if [ -z $line ] ; then
  continue
fi

Gone for now, thanks

Just building on this thought process, but I just wanted to neuter the script and be done with it.

So in this section of the code where it sets have_alert=1

Code:

fi
  echo "$line"
  have_alert=1
done < /tmp/.alert-health
rm /tmp/.alert-health

I replaced the 1 with a 0 and it stopped spamming dead in its tracks ¯\_(ツ)_/¯

Code:

fi
  echo "$line"
  have_alert=0
done < /tmp/.alert-health
rm /tmp/.alert-health
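
For anyone tempted by this shortcut, the effect can be replayed away from the NAS. The sketch below re-implements the loop from freenas_health.sh as a hypothetical run_check function (not part of FreeNAS, and with the quoting tidied) and feeds it an invented alert line; the second argument is the value assigned when an alert is seen.

```shell
#!/bin/sh
# Hypothetical replay of the freenas_health.sh loop.
#   $1 = file holding captured alert output
#   $2 = value assigned when an alert line is seen (1 = original, 0 = edited)
# Returns 0 (check passes) when have_alert is still 0 at the end.
run_check() {
  input="$1"
  on_alert="$2"
  have_alert=0
  while read -r line
  do
    if [ -z "$line" ] ; then
      continue
    fi
    if echo "$line" | grep -q "^OK" ; then
      continue
    fi
    have_alert=$on_alert
  done < "$input"
  [ "$have_alert" -eq 0 ]
}

# Invented alert line, for illustration only.
sample=$(mktemp)
printf 'CRITICAL: made-up alert\n' > "$sample"
if run_check "$sample" 1 ; then echo "original: pass" ; else echo "original: fail" ; fi
if run_check "$sample" 0 ; then echo "edited: pass" ; else echo "edited: fail" ; fi
rm -f "$sample"
```

With 0 the check passes even when a real alert line is present, so the edit quiets the log at the cost of hiding genuine alerts from this check.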

JustinClift · Jan 24, 2018

Looks like a fix is in the works:

https://github.com/freenas/freenas/pull/711

Not sure when it'll get into a released update though.

danb35 · Jan 24, 2018

JustinClift said:
Not sure when it'll get into a released update though.

It's due out with -U2, but I don't know the ETA on that.

ykhodo · Jan 24, 2018

Mamdoh said:
I just applied a proper fix. Full credit goes to @David Steinberger (see link). Steps I followed:

Step one:
cd /usr/local/etc/consul-checks/
nano freenas_health.sh

Edited contents to (go through the edit line by line and make sure it's correct; simple paste can delete some escape characters, resulting in multiple commands in one line):

Code:
#!/bin/sh

PATH="${PATH}:/usr/local/bin:/usr/local/sbin"
export PATH

/usr/local/bin/midclt call notifier.get_alerts > /tmp/.alert-health
if [ $? -ne 0 ] ; then
   exit 1
fi

have_alert=0

while read line
do
  if [ -z $line ] ; then
    continue
  fi
  echo $line | grep -q "^OK"
  if [ $? -eq 0 ] ; then
    continue
  fi
  echo "$line"
  have_alert=1
done < /tmp/.alert-health
rm /tmp/.alert-health

if [ $have_alert -eq 0 ] ; then
  echo "No Alerts"
  exit 0
else
  exit 1
fi

Step two: (not needed if you haven’t edited /usr/local/etc/consul.d/freenas.json)
cd /usr/local/etc/consul.d
nano freenas.json

I changed the interval back to 120s.

Code:
"interval": "120s"

Step three:
Executed the following in shell:

service consul stop
rm -rf /var/db/system/consul
service consul start

Alternatively, just wget the file and overwrite your old one:

Code:

cd /usr/local/etc/consul-checks
mv freenas_health.sh freenas_health.sh.bak
wget https://raw.githubusercontent.com/davidsteinberger/freenas/8bf0718c1c3a09f8189c5150c0e22f3dbc4e77e9/src/freenas/usr/local/etc/consul-checks/freenas_health.sh
# grant execute to the file
chmod +x freenas_health.sh
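
Whichever way the file gets there, it may be worth parsing it before restarting consul, since a paste that collapses lines can leave an unterminated if. A minimal sketch using sh -n (parse only, nothing is executed); check_syntax is a hypothetical helper name and the two files are throwaway examples, not the real script.

```shell
#!/bin/sh
# Hypothetical helper: parse a script without running it.
check_syntax() {
  if sh -n "$1" 2>/dev/null ; then
    echo "syntax ok"
  else
    echo "syntax error"
    return 1
  fi
}

good=$(mktemp)
bad=$(mktemp)
# Properly split across lines.
printf 'if [ -z "$x" ] ; then\n  echo empty\nfi\n' > "$good"
# Collapsed onto one line: "fi" becomes an argument to echo, so the if never closes.
printf 'if [ -z "$x" ] ; then echo empty fi\n' > "$bad"

check_syntax "$good"          # prints: syntax ok
check_syntax "$bad" || true   # prints: syntax error
rm -f "$good" "$bad"
```

A clean parse does not prove the logic is right, only that the shell can read the file.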

I-Tech · Jan 25, 2018

Mamdoh said:
I just applied a proper fix. Full credit goes to @David Steinberger (see link). Steps I followed:

Step one:
cd /usr/local/etc/consul-checks/
nano freenas_health.sh

Edited contents to (go through the edit line by line and make sure it's correct; simple paste can delete some escape characters, resulting in multiple commands in one line):

Code:
#!/bin/sh

PATH="${PATH}:/usr/local/bin:/usr/local/sbin"
export PATH

/usr/local/bin/midclt call notifier.get_alerts > /tmp/.alert-health
if [ $? -ne 0 ] ; then
   exit 1
fi

have_alert=0

while read line
do
  if [ -z $line ] ; then
    continue
  fi
  echo $line | grep -q "^OK"
  if [ $? -eq 0 ] ; then
    continue
  fi
  echo "$line"
  have_alert=1
done < /tmp/.alert-health
rm /tmp/.alert-health

if [ $have_alert -eq 0 ] ; then
  echo "No Alerts"
  exit 0
else
  exit 1
fi

Step two: (not needed if you haven’t edited /usr/local/etc/consul.d/freenas.json)
cd /usr/local/etc/consul.d
nano freenas.json

I changed the interval back to 120s.

Code:
"interval": "120s"

Step three:
Executed the following in shell:

service consul stop
rm -rf /var/db/system/consul
service consul start

ok.. so.. did this..
two questions:
1- removing the /var/db/system/consul folder .. well.. that disappears and doesn't come back.. right?
2- edits are not persistent through reboots, right?

Redcoat · Jan 25, 2018

1. Yes, assuming it was there in the first place (-f suppresses any output).
2. Right.
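
To illustrate point 1: rm with -f prints nothing and returns success when the target does not exist, so the command on its own cannot tell you whether the directory was ever there. A small sketch using a throwaway path (not the real consul directory):

```shell
#!/bin/sh
# Work inside a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
missing="$scratch/does-not-exist"

# -f: no error message and a zero exit status for a missing target.
rm -rf "$missing"
echo "exit status: $?"   # prints: exit status: 0

rmdir "$scratch"
```

Running ls -d on the path before the rm is the simpler way to find out whether there is anything to delete.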

I-Tech · Jan 25, 2018

Redcoat said:
1. Yes, assuming it was there in the first place (-f suppresses any output).
2. Right.

was kinda concerned that the /var/db/system/consul folder didn't get re-created by anything ..
hope it wasn't important :p

NASbox · Jan 26, 2018

I applied the patch, but I think something must have happened (I accidentally turned /usr/local/etc/consul-checks/freenas_health.sh into a null file for about 5 minutes when I was preparing a script to reinstall the patch) because I kept getting warning messages:

Code:

Jan 26 09:31:07 freenas daemon[16739]: 2018/01/26 09:31:07 [WARN] agent: Check 'service:nas-health' is now warning
Jan 26 09:33:07 freenas daemon[16739]: 2018/01/26 09:33:07 [WARN] agent: Check 'service:nas-health' is now warning
Jan 26 09:35:07 freenas daemon[16739]: 2018/01/26 09:35:07 [WARN] agent: Check 'service:nas-health' is now warning

I thought I should follow up this point that I made earlier regarding:

Chris Moore said:
It doesn't say what the issue is, because there isn't really an issue. It's an erroneous entry.

I'm currently experiencing a latent error that is not being handled properly.

I decided to apply a debug patch to: /usr/local/etc/consul-checks/freenas_health.sh and capture /tmp/.alert-health:

Code:

#!/bin/sh

PATH="${PATH}:/usr/local/bin:/usr/local/sbin"
export PATH

/usr/local/bin/midclt call notifier.get_alerts > /CUSTOM/temp-alert-health-debug
/usr/local/bin/midclt call notifier.get_alerts > /tmp/.alert-health
if [ $? -ne 0 ] ; then
   exit 1
fi

have_alert=0

while read line
do
  if [ -z $line ] ; then
    continue
  fi
  echo $line | grep -q "^OK"
  if [ $? -eq 0 ] ; then
    continue
  fi
  echo "$line"
  have_alert=1
done < /tmp/.alert-health
rm /tmp/.alert-health

if [ $have_alert -eq 0 ] ; then
  echo "No Alerts"
  exit 0
else
  exit 1
fi

and I got the following:

Code:

[ENOMETHOD] Method "get_alerts" not found in "notifier"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/notifier.py", line 66, in __getattr__
    return object.__getattribute__(self, attr)
AttributeError: 'NotifierService' object has no attribute 'get_alerts'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 881, in _method_lookup
    methodobj = getattr(serviceobj, method_name)
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/notifier.py", line 68, in __getattr__
    return getattr(_n, attr)
AttributeError: 'notifier' object has no attribute 'get_alerts'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 150, in call_method
    result = await self.middleware.call_method(self, message)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 109, in __next__
    return self.gen.send(None)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 889, in call_method
    serviceobj, methodobj = self._method_lookup(message['method'])
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 883, in _method_lookup
    raise CallError(f'Method "{method_name}" not found in "{service}"', CallError.ENOMETHOD)
middlewared.service_exception.CallError: [ENOMETHOD] Method "get_alerts" not found in "notifier"

I also noticed that /var/db/system/consul

Code:

#>ls -la /var/db/system/consul
ls: /var/db/system/consul: No such file or directory

did not get regenerated after

Code:

  service consul stop
  rm -rf /var/db/system/consul
  service consul start

I rebooted the system and the problem cleared. (It should be noted that /var/db/system/consul is still non-existent after a reboot.)

Any idea what the python errors are about? Does this give any clues about the ongoing issue(s)?

My thinking is that if this is a predictable error, then it should be properly handled with try/except, and if not, there should be an improvement to /usr/local/etc/consul-checks/freenas_health.sh (maybe grep for Traceback) to capture the error for troubleshooting.

EDIT: added /usr/local/bin/midclt call notifier.get_alerts > /CUSTOM/temp-alert-health-debug and removed cp statement that would have cleared $?.
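
As a rough idea of what the grep suggestion could look like, here is a sketch that separates "the call itself blew up" from "there is a genuine alert". It assumes failures look like the capture above (a line starting with Traceback or with a bracketed code such as [ENOMETHOD]); classify_output is a hypothetical helper, not something FreeNAS ships.

```shell
#!/bin/sh
# Hypothetical helper: classify captured midclt output as
#   error - a Python traceback or a bracketed [E...] code is present
#   alert - at least one non-blank line that does not start with OK
#   ok    - nothing but blank or OK lines
classify_output() {
  file="$1"
  if grep -q -e '^Traceback' -e '^\[E[A-Z]*]' "$file" ; then
    echo "error"
  elif grep -v -e '^[[:space:]]*$' -e '^OK' "$file" | grep -q . ; then
    echo "alert"
  else
    echo "ok"
  fi
}

# First two lines of the capture shown above.
sample=$(mktemp)
printf '[ENOMETHOD] Method "get_alerts" not found in "notifier"\nTraceback (most recent call last):\n' > "$sample"
classify_output "$sample"   # prints: error
rm -f "$sample"
```

A health script could then log the "error" case separately instead of reporting it as if it were an alert.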

Redcoat · Jan 26, 2018

NASbox said:
I rebooted the system and the problem cleared. (It should be noted that /var/db/system/consul is still non-existent after a reboot.)

Yes, we talked about this above. We need someone on 11.1-U1 who did not attempt this fix (or at least did not run "rm -rf /var/db/system/consul") to tell us if the file exists at all in a "clean" 11.1-U1. My present assumption is that it does not.

My attempts to use this "fix" did not result in cessation of the messages BTW.

rick.n.thailand · Jan 26, 2018

Redcoat said:
Yes, we talked about this above. We need someone on 11.1-U1 who did not attempt this fix (or at least did not run "rm -rf /var/db/system/consul") to tell us if the file exists at all in a "clean" 11.1-U1. My present assumption is that it does not.

My attempts to use this "fix" did not result in cessation of the messages BTW.

Hi,

I did a fresh install of 11.1-U1 and cannot find /var/db/system/consul, nor can I find a "system" folder at all. I have not applied any changes or modified the system in any way yet.

Code:

root@freenas:/var/db # la
./                      hyperv/                 nut/        samba4/         syslog-ng.persist
../                     ipf/                    pbi/        services.db     zfsd/
collectd/               locate.database         pkg/        sss/            zoneinfo
dhclient.leases.vmx0    netdata/                ports/      sss_mc/
entropy/                ntp/                    portsnap/   sudo/
freebsd-update/         ntpd.leap-seconds.list  samba/      syslog-ng.ctl=
root@freenas:/var/db #

I-Tech · Jan 26, 2018

Redcoat said:
Yes, we talked about this above. We need someone on 11.1-U1 who did not attempt this fix (or at least did not run "rm -rf /var/db/system/consul") to tell us if the file exists at all in a "clean" 11.1-U1. My present assumption is that it does not.

My attempts to use this "fix" did not result in cessation of the messages BTW.

I just rebooted in 11.1 release.. deleted 11.1-U1.. (with the applied fix).. re-updated to 11.1-U1
.. and the /var/db/system/consul folder is (still) not there..
(which was the main reason I performed these steps)
