A new message with 11.1-RC1 and now 11.1, 11.1-U1, 11.1-U2, too

danb35 · Jan 21, 2018

GreyMatters said:
So you are suggesting to update anyway?

I would say that log spam shouldn't really enter into the decision of whether to upgrade or not.

Chris Moore · Jan 21, 2018

GreyMatters said:
So you are suggesting to update anyway?

I did update and I don't plan to roll back. I don't care that much about some messages that are basically erroneous errors.

J Crnac · Jan 22, 2018

Chris Moore said:
Really, you think some extra entries in the log is a reason to reject an update that fixes so many other things?

Hi Chris,
most likely you are right saying that log spam is no sufficient reason to roll back. I use FreeNAS for 3 years (with growing enthusiasm), but having unstoppable warning messages is a kind of irritating. Furthermore (if I understand it right reading the bug-report) the Devs judge this bug differently in its significance (or the need to remove it).
Given the fact, that I had no problems with the previous version - I decided to roll back and wait (yeah, I know lot of bugs were removed in the actual version-but they have been no issue - at least for me). In the end it´s all about security of my data .... and I´m concerned
Lastly, using common sense - why should "we" ignore an alert?
Thanks for your comment
JC

Chris Moore · Jan 22, 2018

J Crnac said:
Hi Chris,
most likely you are right saying that log spam is no sufficient reason to roll back. I use FreeNas for 3 years (with growing enthusiasm), but having unstoppable warning messages is a kind of irritating. Furthermore (if I understand it right reading the bug-report) the Devs judge this bug differently in its significance (or the need to remove it).
Given the fact, that I had no problems with the previous version - I decided to roll back and wait (yeah, I know lot of bugs were removed in the actual version-but they have been no issue - at least for me). In the end it´s all about security of my data .... and I´m concerned
Lastly, using common sense - why should "we" ignore an alert?
Thanks for your comment
JC

What is the alert about? It is a false alarm.

Sent from my SAMSUNG-SGH-I537 using Tapatalk

danb35 · Jan 22, 2018

J Crnac said:
the Devs judge this bug differently in its significance (or the need to remove it).

...and correctly so. It's a minor issue. It doesn't affect performance, reliability, stability, or functionality. Yes, it's an annoyance. Yes, it clutters the log and makes it harder to find important stuff. And yes, I wish the devs would place a higher priority on minimizing spam, both to the logs and to the console, as this has been a recurring issue for years. But it harms nothing.

J Crnac said:
In the end it´s all about security of my data .... and I´m concerned

Then you'd probably be better off just turning off the option to show log messages in the GUI. Nothing about that message should make you concerned about your data.

J Crnac said:
Lastly, using common sense - why should "we" ignore an alert?

There is no alert. There's a message in the system log file that you're interpreting as an alert.

NASbox · Jan 22, 2018

danb35 said:
Seriously? Roll back just for log spam? You're never going to be happy with FreeNAS, then--log spam (and even console spam), in various forms, has been an issue for a long time, with many versions, and the devs clearly don't care. It's annoying, to be sure, but it doesn't hurt anything.

This one is bad enough to make my logs totally useless... I've got 200K of this and almost nothing else.

danb35 · Jan 22, 2018

NASbox said:
This one is bad enough to make my logs totally useless.

grep -v is your friend.

NASbox · Jan 22, 2018

danb35 said:
grep -v is your friend.

Yup... if there is something to grep... log keeps getting rotate out... I believe the default is a 200K ring buffer is it not? Or does it just archive after 200K? I think I'm going to try that patch mentioned above.

I've got to say it's a pretty lousy message in that it doesn't say what the issue is.

danb35 · Jan 22, 2018

NASbox said:
Or does it just archive after 200K?

I'm not seeing 200KB rollovers, but it does archive the last ten logs--messages.0.bz2 to messages.9.bz2.

Chris Moore · Jan 22, 2018

NASbox said:
Yup... if there is something to grep... log keeps getting rotate out... I believe the default is a 200K ring buffer is it not? Or does it just archive after 200K? I think I'm going to try that patch mentioned above.

I've got to say it's a pretty lousy message in that it doesn't say what the issue is.

It doesn't say what the issue is, because there isn't really an issue. It's an erroneous entry.

Sent from my SAMSUNG-SGH-I537 using Tapatalk

NASbox · Jan 22, 2018

Chris Moore said:
It doesn't say what the issue is, because there isn't really an issue. It's an erroneous entry.

I think you may be jumping to conclusions on this.... maybe you are right, maybe not.

There is no way of knowing based on the information available so far. Under 11.0U4 I have had real warnings and "bug warnings" and in both cases I get the same message. Two cases come to mind.

I attached/used and then detached/removed a single volume pool BACKUP01 on /dev/ada3 (using the GUI), and sometime later I notice a warning (in the GUI) can't connect to device /dev/ada3, and that the log was full of these messages.
I also about the pool being above 80% full (it was)-so it was a valid warning-I later noticed the log full of these messages.

The message didn't provide any helpful info as to what the real problem was. If I had not seen the messages in GUI, I wouldn't have had a clue what the warning was about.

The point is that there may be have been a bug introduced in some code (which may or may not be significant) that is causing that error to be issued.

Ideally instead of this:

2018/01/22 19:16:27 [WARN] agent: Check 'service:nas-health' is now warning

the message would be something like

2018/01/22 19:16:27 [WARN] agent: DESCRIPTIVE MESSAGE HERE

or

2018/01/22 19:16:27 [WARN] agent: Check 'service:nas-health' is now warning - ERROR CODE= SOMETHING

In either case it should point back to the condition and/or code responsible for giving rise to the message.

As an aside I did a quick grep of the older logs and I noticed these messages...

Code:

Jan 18 00:00:00 freenas newsyslog[70622]: logfile turned over due to size>200K
Jan 18 00:00:00 freenas syslog-ng[29910]: Configuration reload request received, reloading configuration;
Jan 18 13:14:34 freenas daemon[3691]: ==> Newer Consul version available: 1.0.2 (currently running: 1.0.0)
Jan 19 00:00:00 freenas syslog-ng[29910]: Configuration reload request received, reloading configuration;
Jan 19 16:30:10 freenas daemon[3691]: ==> Newer Consul version available: 1.0.2 (currently running: 1.0.0)

Do they have anything to do with the issue?

NASbox · Jan 22, 2018

Mamdoh said:
@hervon: Thanks for confirming the emergence of this issue with 11.1-U1.
@Redcoat: I see. Somehow stopping a service completely, even if it's for a few weeks, makes me a bit nervous; especially that I don't know the inners of the software > Regular user! I'm not even sure if it's a big issue to have this message appear every two minutes in my log or if I should worry about it. Input from the FreeNAS experts would be much appreciated.

However, I kinda chose a halfway solution for now. I edited /usr/local/etc/consul.d/freenas.json and changed the Interval field from 120s to 1800s.

Code:
{ "disable_update_check": true, "service": { "name": "nas-health", "tags": ["primary"], "address": "", "port": 80, "enableTagOverride": false, "check": [ { "id": "freenas_health", "name": "FreeNAS health check", "args": [ "/usr/local/etc/consul-checks/freenas_health.sh" ], "interval": "1800s" } ] } }

Then executed the following:
service consul stop rm -rf /var/db/system/consul service consul start

Thanks for this... I followed the directions and just wanted to make sure I haven't created a "real" problem. Can someone with experience tell me if these messages are a problem?

Code:

Jan 22 19:12:20 freenas daemon[3691]:	 2018/01/22 19:12:20 [WARN] agent: Check 'service:nas-health' is now warning
Jan 22 19:14:24 freenas daemon[3691]:	 2018/01/22 19:14:24 [WARN] agent: Check 'service:nas-health' is now warning
Jan 22 19:16:27 freenas daemon[3691]:	 2018/01/22 19:16:27 [WARN] agent: Check 'service:nas-health' is now warning
Jan 22 19:18:31 freenas daemon[3691]:	 2018/01/22 19:18:31 [WARN] agent: Check 'service:nas-health' is now warning
--------------------------- DO I NEED TO WORRY ABOUT ANY OF THIS BELOW
Jan 22 19:20:25 freenas daemon[3691]:	 2018/01/22 19:20:25 [WARN] serf: Shutdown without a Leave
Jan 22 19:20:25 freenas daemon[3691]:	 2018/01/22 19:20:25 [WARN] serf: Shutdown without a Leave
Jan 22 19:20:26 freenas daemon[3691]:	 2018/01/22 19:20:26 [WARN] agent: Timeout stopping HTTP server 127.0.0.1:8500 (tcp)
Jan 22 19:20:27 freenas daemon[74214]: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jan 22 19:20:27 freenas daemon[74214]: bootstrap = true: do not enable unless necessary
Jan 22 19:20:27 freenas daemon[74214]: ==> Starting Consul agent...
Jan 22 19:20:27 freenas daemon[74214]: ==> Consul agent running!
Jan 22 19:20:27 freenas daemon[74214]:			Version: 'v1.0.0'
Jan 22 19:20:27 freenas daemon[74214]:			Node ID: 'a10------fb7'
Jan 22 19:20:27 freenas daemon[74214]:		  Node name: '----'
Jan 22 19:20:27 freenas daemon[74214]:		 Datacenter: 'dc1' (Segment: '<all>')
Jan 22 19:20:27 freenas daemon[74214]:			 Server: true (Bootstrap: true)
Jan 22 19:20:27 freenas daemon[74214]:		Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Jan 22 19:20:27 freenas daemon[74214]:	   Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Jan 22 19:20:27 freenas daemon[74214]:			Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
Jan 22 19:20:27 freenas daemon[74214]:
Jan 22 19:20:27 freenas daemon[74214]: ==> Log data will now stream in as it occurs:
Jan 22 19:20:27 freenas daemon[74214]:
Jan 22 19:20:27 freenas daemon[74214]:	 2018/01/22 19:20:27 [WARN] serf: Failed to re-join any previously known node
Jan 22 19:20:27 freenas daemon[74214]:	 2018/01/22 19:20:27 [WARN] serf: Failed to re-join any previously known node
Jan 22 19:20:34 freenas daemon[74214]:	 2018/01/22 19:20:34 [WARN] raft: Heartbeat timeout from "" reached, starting election
Jan 22 19:21:04 freenas daemon[74214]:	 2018/01/22 19:21:04 [WARN] consul.kvs: Rejecting lock of consul-alerts/leader due to lock-delay until 2018-01-22 19:21:19.87925621 -0500 EST m=+52.559008210
Jan 22 19:21:09 freenas daemon[74214]:	 2018/01/22 19:21:09 [WARN] consul.kvs: Rejecting lock of consul-alerts/leader due to lock-delay until 2018-01-22 19:21:19.87925621 -0500 EST m=+52.559008210
Jan 22 19:21:14 freenas daemon[74214]:	 2018/01/22 19:21:14 [WARN] consul.kvs: Rejecting lock of consul-alerts/leader due to lock-delay until 2018-01-22 19:21:19.87925621 -0500 EST m=+52.559008210
--------------------------- ONLY @ MESSAGES / HOUR - A GOOD COMPROMISE
Jan 22 19:38:06 freenas daemon[74214]:	 2018/01/22 19:38:06 [WARN] agent: Check 'nas-health' is now warning
Jan 22 20:08:09 freenas daemon[74214]:	 2018/01/22 20:08:09 [WARN] agent: Check 'nas-health' is now warning
Jan 22 20:38:13 freenas daemon[74214]:	 2018/01/22 20:38:13 [WARN] agent: Check 'nas-health' is now warning
Jan 22 21:08:17 freenas daemon[74214]:	 2018/01/22 21:08:17 [WARN] agent: Check 'nas-health' is now warning
Jan 22 21:38:20 freenas daemon[74214]:	 2018/01/22 21:38:20 [WARN] agent: Check 'nas-health' is now warning

This seems like a nice work around as long as the solution is not ultimately worse than the problem.

zemun2 · Jan 23, 2018

How do I roll back?

Thanks

Redcoat · Jan 23, 2018

Reboot and you will go back to 11.0-U4

zemun2 · Jan 23, 2018

Rebooted 3 times already and no luck. Thought i was missing a step.

Chris Moore · Jan 23, 2018

NASbox said:
I think you may be jumping to conclusions on this.... maybe you are right, maybe not.

There is no way of knowing based on the information available so far. Under 11.0U4 I have had real warnings and "bug warnings" and in both cases I get the same message. Two cases come to mind.

I attached/used and then detached/removed a single volume pool BACKUP01 on /dev/ada3 (using the GUI), and sometime later I notice a warning (in the GUI) can't connect to device /dev/ada3, and that the log was full of these messages.

I also about the pool being above 80% full (it was)-so it was a valid warning-I later noticed the log full of these messages.

The message didn't provide any helpful info as to what the real problem was. If I had not seen the messages in GUI, I wouldn't have had a clue what the warning was about.

The point is that there may be have been a bug introduced in some code (which may or may not be significant) that is causing that error to be issued.

Ideally instead of this:

2018/01/22 19:16:27 [WARN] agent: Check 'service:nas-health' is now warning

the message would be something like

2018/01/22 19:16:27 [WARN] agent: DESCRIPTIVE MESSAGE HERE

or

2018/01/22 19:16:27 [WARN] agent: Check 'service:nas-health' is now warning - ERROR CODE= SOMETHING

In either case it should point back to the condition and/or code responsible for giving rise to the message.

As an aside I did a quick grep of the older logs and I noticed these messages...
Code:
Jan 18 00:00:00 freenas newsyslog[70622]: logfile turned over due to size>200K Jan 18 00:00:00 freenas syslog-ng[29910]: Configuration reload request received, reloading configuration; Jan 18 13:14:34 freenas daemon[3691]: ==> Newer Consul version available: 1.0.2 (currently running: 1.0.0) Jan 19 00:00:00 freenas syslog-ng[29910]: Configuration reload request received, reloading configuration; Jan 19 16:30:10 freenas daemon[3691]: ==> Newer Consul version available: 1.0.2 (currently running: 1.0.0)
Do they have anything to do with the issue?

No, the configuration has always reloaded at midnight, that is normal for a long time.
The consul version is a new thing but it is only indicating that there is a newer version of the software available and not an issue.
If the GUI is not alerting you to an error, why are you convinced that there is an error?

Mamdoh · Jan 23, 2018

I just applied a proper fix. Full credit goes to @David Steinberger (see link). Steps I followed:

Step one:

Code:

cd /usr/local/etc/consul-checks/
nano freenas_health.sh

Edited contents to (go through the edit line by line and make sure it's correct; simple paste can delete some escape characters, resulting in multiple commands in one line):

Code:

#!/bin/sh																														 
																																	
PATH="${PATH}:/usr/local/bin:/usr/local/sbin"																					 
export PATH																														
																																	
/usr/local/bin/midclt call notifier.get_alerts > /tmp/.alert-health																
if [ $? -ne 0 ] ; then																											
   exit 1																														 
fi																																
																																	
have_alert=0																														
																																	
while read line																													
do																																
  if [ -z $line ] ; then																											
	continue																														
  fi																																
  echo $line | grep -q "^OK"																										
  if [ $? -eq 0 ] ; then																											
	continue																														
  fi																																
  echo "$line"																													
  have_alert=1																													
done < /tmp/.alert-health																										 
rm /tmp/.alert-health																											 
																																	
if [ $have_alert -eq 0 ] ; then																									
  echo "No Alerts"																												
  exit 0																															
else																																
  exit 1																															
fi

Step two: (not needed if you haven’t edited usr/local/etc/consul.d/freenas.json)

Code:

cd /usr/local/etc/consul.d
nano freenas.json

I changed the the interval back to 120s.

Code:

"interval": "120s"

Step three:
Executed the following in shell:

Code:

service consul stop
rm -rf /var/db/system/consul
service consul start

picklefish · Jan 23, 2018

Thanks @Mamdoh I will try this in an hour, appreciate the fix.

Edit: Seems like the only thing missing were the lines

Code:

  if [ -z $line ] ; then																											
   continue																														
  fi

Gone for now, thanks

zemun2 · Jan 24, 2018

How do I save the freenas_health.sh?

Thanks

picklefish · Jan 24, 2018

zemun2 said:
How do I save the freenas_health.sh?

Thanks

ctrl W (for write) I believe it shows at the bottom of the nano screen the different commands

Important Announcement for the TrueNAS Community.

A new message with 11.1-RC1 and now 11.1, 11.1-U1, 11.1-U2, too

Hall of Famer

Hall of Famer

Cadet

Hall of Famer

Hall of Famer

Guru

Hall of Famer

Guru

Hall of Famer

Hall of Famer

Guru

Guru

Explorer

MVP

Explorer

Hall of Famer

Dabbler

Explorer

Explorer

Explorer

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "A new message with 11.1-RC1 and now 11.1, 11.1-U1, 11.1-U2, too"

Similar threads