How to monitor your FreeNAS-9.3 with Icinga/Nagios/NRPE

SebbaG (Dabbler, joined Oct 12, 2014, 25 messages)
First of all, sorry to put this into the wrong subsection, but I do not have enough posts yet to put this into the HOW-TO area. If people find it useful, maybe a moderator can move it there.

The purpose of this tutorial is to explain how you can set up your Icinga host (I'm using Icinga2) to monitor your FreeNAS server with the help of NRPE. This how-to was created using the FreeNAS-9.3-STABLE-201412090314 build, but it might work with other versions as well.

1. Collect the necessary packages
In order to install nrpe2 successfully you need the following packages. You can download them from -> http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/
  • nrpe2.tbz
  • gettext.tbz
  • nagios-plugins.tbz
  • libiconv.tbz
as well as from -> http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/perl5/
  • perl5-5.16.3_6.tbz
2. Copy them to the FreeNAS box and install them

scp *.tbz root@nasbox:/tmp

root@nasbox:$ cd /tmp
root@nasbox:$ pkg_add libiconv.tbz
root@nasbox:$ pkg_add gettext.tbz
root@nasbox:$ pkg_add perl5-5.16.3_6.tbz

root@nasbox:$ pkg_add nagios-plugins.tbz
root@nasbox:$ pkg_add nrpe2.tbz


3. Edit rc.conf and sudoers

root@nasbox:$ mount -uw /

root@nasbox:$ vi /conf/base/etc/rc.conf
---> add the following:
# Start nrpe2
nrpe2_enable="YES"

root@nasbox:$ vi /conf/base/etc/sudoers
---> add the following:
nagios ALL=NOPASSWD:/usr/local/libexec/nagios/

(Editing the sudoers file is only necessary if you want to run specific scripts with sudo permissions. In my case I wanted to do so, but this has some security implications!)

4. Create the NRPE config

root@nasbox:$ cp /usr/local/etc/nrpe.cfg.sample /etc/nrpe.cfg
---> or use my sample nrpe.cfg posted at the end of this how-to

You have to edit at least the following: allowed_hosts=%youricingahostip%
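If you would rather script this edit than do it in vi, a small Python helper can set allowed_hosts. This is only a sketch: it assumes the file uses plain key=value lines like the sample nrpe.cfg, it assumes Python 3.6+ is available somewhere (the NAS's own Python may be older, so you can run it elsewhere and copy the file back), and set_nrpe_option is a made-up name, not part of NRPE.

```python
import re


# Hypothetical helper, not part of NRPE or FreeNAS.
def set_nrpe_option(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with `key=value` set.

    Replaces the first existing line for `key` (also if it is commented out
    with '#'); otherwise appends a new line at the end.
    """
    pattern = re.compile(r"^#?[ \t]*" + re.escape(key) + r"=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(cfg_text):
        # A function as replacement avoids backslash/group escaping in `value`.
        return pattern.sub(lambda _m: new_line, cfg_text, count=1)
    return cfg_text.rstrip("\n") + "\n" + new_line + "\n"
```

For example, set_nrpe_option(open("/etc/nrpe.cfg").read(), "allowed_hosts", "%youricingahostip%") returns the updated text, which you then write back to the file.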

5. Add a startup script in the GUI

In the FreeNAS GUI, go to Tasks -> Init/Shutdown Scripts and add:
Command: /usr/local/sbin/nrpe2 -c /etc/nrpe.cfg -d

6. Customize the Icinga2 config

Customize your Icinga2 host configs, e.g.:

object Host "freenas" {
  import "generic-host"
  address = "192.168.0.50"
  check_command = "hostalive"
}

object CheckCommand "check_nrpe_nossl" {
  import "plugin-check-command"
  import "ipv4-or-ipv6"
  command = [ PluginDir + "/check_nrpe" ]
  arguments = {
    "-H" = "$nrpe_address$"
    "-c" = "$nrpe_command$"
    "-n" = ""
  }
}

object Service "Disk /" {
  host_name = "freenas"
  check_command = "check_nrpe_nossl"
  vars.nrpe_command = "check_root"
  vars.nrpe_address = "192.168.0.50"
}

In my case NRPE communication was not possible with SSL, which is why I had to use the -n option (do not use SSL)!
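Before wiring the service into Icinga2, it can help to run the same check by hand from the Icinga host. The sketch below builds a check_nrpe call equivalent to the check_nrpe_nossl CheckCommand and runs it. PLUGIN_DIR is an assumption (set it to whatever PluginDir is on your Icinga host), nrpe_argv and run_nrpe are hypothetical helper names, and Python 3.7+ is assumed for subprocess.run(capture_output=...).

```python
import subprocess
from typing import List, Tuple

# Assumption: plugin directory of a typical Debian/Ubuntu Icinga2 host.
# Use whatever PluginDir is set to in your Icinga2 constants.
PLUGIN_DIR = "/usr/lib/nagios/plugins"


def nrpe_argv(address: str, command: str, no_ssl: bool = True) -> List[str]:
    """Build a check_nrpe command line equivalent to check_nrpe_nossl."""
    argv = [PLUGIN_DIR + "/check_nrpe", "-H", address, "-c", command]
    if no_ssl:
        argv.append("-n")  # -n: do not use SSL
    return argv


def run_nrpe(address: str, command: str) -> Tuple[int, str]:
    """Run check_nrpe once and return (exit code, plugin output)."""
    proc = subprocess.run(nrpe_argv(address, command),
                          capture_output=True, text=True, timeout=30)
    return proc.returncode, proc.stdout.strip()
```

run_nrpe("192.168.0.50", "check_root") returns the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and its status line; if that fails, fix NRPE first before debugging the Icinga2 objects.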

Hope this helps others who want to achieve similar things.
PS: You should have at least a basic knowledge of FreeNAS and Icinga before doing this!

------------------------------------------------------------------------------------------------------------------
Appendix A: nrpe.cfg
[...]
command[check_users]=/usr/local/libexec/nagios/check_users -w 5 -c 10
command[check_load]=/usr/local/libexec/nagios/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /
command[check_var]=/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /var
command[check_nas-data]=/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /mnt/%yourpoolname%
command[check_zombie_procs]=/usr/local/libexec/nagios/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/libexec/nagios/check_procs -w 150 -c 200
command[check_free_mem]=/usr/local/libexec/nagios/check_mem.pl -f -w 7 -c 2
command[check_used_mem]=/usr/local/libexec/nagios/check_mem.pl -u -w 95 -c 100
command[check_hdd1]=sudo /usr/local/libexec/nagios/check_smart_own /dev/da1
command[check_hdd2]=sudo /usr/local/libexec/nagios/check_smart_own /dev/da2
command[check_hdd3]=sudo /usr/local/libexec/nagios/check_smart_own /dev/da3
command[check_hdd4]=sudo /usr/local/libexec/nagios/check_smart_own /dev/da4
[...]
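The nrpe_command var in Icinga (for example check_root) has to match one of these command[...] names exactly, so it can be handy to list what a config actually defines. A Python sketch, assuming one definition per line as in the excerpt above; parse_nrpe_commands is a hypothetical helper.

```python
import re
from typing import Dict

# One definition per line: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.+)$")


# Hypothetical helper, not part of NRPE.
def parse_nrpe_commands(cfg_text: str) -> Dict[str, str]:
    """Map each command[name] defined in an nrpe.cfg to its command line."""
    commands: Dict[str, str] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands
```

For example, "check_root" in parse_nrpe_commands(open("/etc/nrpe.cfg").read()) tells you whether the Disk / service has a matching definition.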

Appendix B: check_smart_own script:
#!/bin/sh
# Map "smartctl -H" output to a Nagios/NRPE status (plain sh: on FreeBSD,
# bash is a port and lives in /usr/local/bin, not /bin).
smart_status=$(/usr/local/sbin/smartctl -H "$1" | grep health | awk '{print $6}')
case "$smart_status" in
  (PASSED)
    echo "OK - $1 status is Healthy"
    exit 0
    ;;
  ("")
    echo "WARNING - $1 status is EMPTY -> check script"
    exit 1
    ;;
  (FAILED*)
    # smartctl prints "FAILED!" for a failing drive
    echo "CRITICAL - $1 smart-test FAILED -> check HDD"
    exit 2
    ;;
  (*)
    echo "UNKNOWN - $1 status UNKNOWN -> check manually"
    exit 3
    ;;
esac
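The decision logic of check_smart_own depends on the exact wording of smartctl's output, so it can be worth testing against sample text. Below is a Python sketch of the same mapping; classify_smart is a hypothetical name. It assumes the ATA-style "SMART overall-health self-assessment test result:" line (other drive types may word their health line differently and would land in the WARNING branch), and it treats any status starting with FAILED as CRITICAL, since smartctl prints "FAILED!" for a failing drive.

```python
from typing import Tuple

# Nagios plugin exit codes, the same ones check_smart_own returns.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


# Hypothetical helper mirroring the shell script's logic.
def classify_smart(device: str, smartctl_output: str) -> Tuple[int, str]:
    """Map `smartctl -H` output to (exit code, message).

    Like `grep health | awk '{print $6}'`, it looks at the 6th
    whitespace-separated field of the first line containing "health".
    """
    status = ""
    for line in smartctl_output.splitlines():
        if "health" in line:
            fields = line.split()
            status = fields[5] if len(fields) > 5 else ""
            break
    if status == "PASSED":
        return OK, f"OK - {device} status is Healthy"
    if status == "":
        return WARNING, f"WARNING - {device} status is EMPTY -> check script"
    if status.startswith("FAILED"):
        return CRITICAL, f"CRITICAL - {device} smart-test FAILED -> check HDD"
    return UNKNOWN, f"UNKNOWN - {device} status UNKNOWN -> check manually"
```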

Henning Kessler (Contributor, joined Feb 10, 2015, 143 messages)

Jacopx (Patron, joined Feb 19, 2016, 367 messages)
I'm trying to install it, but when I run:
Code:
root@Icinga:/tmp # pkg install http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/libiconv.tbz
Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
pkg: No packages available to install matching 'http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/libiconv.tbz' have been found in the repositories


If I try:
Code:
root@Icinga:/tmp # pkg fetch http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/libiconv.tbz

Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
pkg: No packages matching 'http://mirror.neolabs.kz/pub/FreeBSD/ports/amd64/packages-9-stable/Latest/libiconv.tbz' have been found in the repositories


That's something I don't understand! :/

Adrian (Contributor, joined Jun 29, 2011, 166 messages)
...
Thanks for sharing this. There is also a Nagios plugin from iXsystems which is meant for their TrueNAS line, but it works really well for FreeNAS too ;-)
...
Checking for alerts (-t alerts) seems to work, but checking replications (-t repl) seems to report as many replication errors as there are replications. Nagios monitoring of FreeNAS alerts is very good to have.
Code:
hpnas alerts: OK - No problem alerts
hpnas repl: WARNING - There are 1 replication errors [tank/remote/hpnas]. Go to Storage > Replication Tasks > View Replication Tasks in TrueNAS for more details.
freenas alerts: OK - No problem alerts
freenas repl: WARNING - There are 1 replication errors [tank/remote/freenas]. Go to Storage > Replication Tasks > View Replication Tasks in TrueNAS for more details.
freenasxl alerts: OK - No problem alerts
freenasxl repl: WARNING - There are 9 replication errors [tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl tank/remote/freenasxl]. Go to Storage > Replication Tasks > View Replication Tasks in TrueNAS for more details.

Under FreeBSD
You need Python and py27-requests.
Download check_truenas.py (https://exchange.nagios.org/directo...AS/Check-TrueNAS-Health-2FReplication/details) and copy it to /usr/local/libexec/nagios/check_truenas.py.
Sample Nagios configuration extracts
Code:
==> /usr/local/etc/nagios/resource.cfg <==
...
# FreeNAS root logins
$USER3$=root
$USER4$=CENSORED

==> /usr/local/etc/nagios/objects/commands.cfg <==
...
# 'check_truenas_alerts' command definition
define command{
		command_name	check_truenas_alerts
		command_line	$USER1$/check_truenas.py -H $HOSTADDRESS$ -u $USER3$ -p $USER4$ -n -t alerts ; alerts or repl
		}

==> /usr/local/etc/nagios/services.cfg <==
...
define service {
		use							 local-service
		host_name					   freenas,freenasxl,hpnas ; all freenas
		service_description			 FREENAS_ALERTS
		check_command				   check_truenas_alerts
}
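check_truenas.py logs in to the FreeNAS REST API with the root credentials from resource.cfg. To see roughly what the plugin sees, the request can be sketched with the Python standard library. The /api/v1.0/system/alert/ path and HTTP basic auth are assumptions (the 'system/alert' resource is the one discussed later in this thread); alert_request and fetch_alerts are hypothetical names, not functions from check_truenas.py.

```python
import base64
import json
import urllib.request


def alert_request(host: str, user: str, password: str,
                  https: bool = False) -> urllib.request.Request:
    """Build a GET request for the alert list, using HTTP basic auth."""
    scheme = "https" if https else "http"
    # Assumption: v1.0 REST path for the 'system/alert' resource.
    url = f"{scheme}://{host}/api/v1.0/system/alert/"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def fetch_alerts(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Printing fetch_alerts(alert_request("freenas", "root", "...")) shows the raw JSON, which is where differences between FreeNAS versions become visible.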

Postscript.
This one-line change to check_truenas.py seems to fix replication reporting. It's my first Python script change, so go easy on me.
Code:
[aw1@titus ~]$ diff -u  /usr/local/libexec/nagios/check_truenas.py .
--- /usr/local/libexec/nagios/check_truenas.py  2017-03-09 00:44:00.405158300 +0000
+++ ./check_truenas.py  2017-03-09 03:18:06.631050000 +0000
@@ -74,6 +74,7 @@
		 try:
			 for repl in repls:
				 if repl['repl_status'] != 'Succeeded' \
+					and not repl['repl_status'].startswith('Up to date') \
					 and not repl['repl_status'].startswith('Sending'):
					 errors = errors + 1
					 msg = msg + repl['repl_zfs'] + ' ';
(Joined Jan 8, 2017, 27 messages)
Dear Adrian,

Thank you very much for making the Nagios plugin more useful for FreeNAS! As far as I can see, there is one more case leading to false errors: if you create a lot of new replications which take a while to run, or in the time between rebooting the server and the completion of all existing replications, the GUI column "Status" remains empty and the column "Last snapshot sent to remote side" shows "Not ran since boot". That is perfectly all right and not an error, I think. However, the Python script then fails with "UNKNOWN - Error when contacting TrueNAS server: (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'startswith'",), <traceback object at 0x1dd14d0>) ".

Unfortunately, my Python skills are not sufficient to fix this. Would you be so kind as to point out a fix for this issue?

Regards,

Michael
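The AttributeError means repl['repl_status'] is None for replications that have not run since boot, and None has no startswith(). One way to handle it is to skip a missing status before the string checks. Below is a Python sketch of just the counting rule: it keeps the conditions from the patched loop in the diff above and adds the None check. replication_errors is a hypothetical name, this is not the code in check_truenas.py, and treating a missing status as healthy is an assumption.

```python
from typing import Iterable, List, Mapping


# Hypothetical helper, not part of check_truenas.py.
def replication_errors(repls: Iterable[Mapping]) -> List[str]:
    """Return repl_zfs of replications whose repl_status looks like an error.

    'Succeeded', 'Up to date...' and 'Sending...' are fine (as in the
    patched loop); a missing status (None, empty GUI column) is skipped.
    """
    errors = []
    for repl in repls:
        status = repl.get("repl_status")
        if status is None:
            continue  # assumption: "Not ran since boot" is not an error
        if status == "Succeeded" or status.startswith(("Up to date", "Sending")):
            continue
        errors.append(repl["repl_zfs"])
    return errors
```

In check_truenas.py itself, the equivalent change would presumably be to make the existing if-condition skip entries whose repl_status is None.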

Adrian (Contributor, joined Jun 29, 2011, 166 messages)
That one line is the only line of Python that I have ever written, and I have not read much either :(
You need to attract somebody who knows python.

Afrojoe (Dabbler, joined Mar 25, 2017, 13 messages)
There's also this plugin pack from Synodic. It does everything I want to check...

[Screenshot: nagios_freenas.png]

Henning Kessler (Contributor, joined Feb 10, 2015, 143 messages)
Hmm, I updated to 11.2 and it seems that all the old Nagios/Icinga2 plugins stopped working. Does someone have a working alternative Nagios/Icinga plugin?

Regards

Henning

PCC (Cadet, joined Nov 7, 2018, 6 messages)
I too ran into that issue when updating my FreeNAS servers from 11.1 to 11.2. check_truenas.py for Nagios seems to be broken and spits out: UNKNOWN - Error when contacting TrueNAS server: (<type 'exceptions.TypeError'>, TypeError('string indices must be integers',), <traceback object at 0x7f925bd44d40>)

Any info on that Synodic one would be cool, as I have been unable to find it on Google.

PCC (Cadet, joined Nov 7, 2018, 6 messages)
Found the issue with the check_truenas.py plugin:
When querying the NAS for 'system/alert', version 11.2 returns a lot more information than just the alerts. So when the script runs its for loop ("for alert in alerts"), it only picks up the first part of the output:

Pre-11.2, querying for alerts returned this:
[{u'timestamp': 1548106869, u'message': u'Device: /dev/ada2, Temperature 44 Celsius reached critical limit of 30 Celsius (Min/Max ??/44)', u'id': u'05ca2194e6970a049651a0f25abc0dfe', u'dismissed': False, u'level': u'CRIT'}, {u'timestamp': 1548106869, u'message': u'Device: /dev/ada1, Temperature 45 Celsius reached critical limit of 30 Celsius (Min/Max ??/45)', u'id': u'001e3a7867e4b858db85f5fe06d6265f', u'dismissed': False, u'level': u'CRIT'}, {u'timestamp': 1548106869, u'message': u'Device: /dev/ada0, Temperature 40 Celsius reached critical limit of 30 Celsius (Min/Max ??/40)', u'id': u'e337e647486440436f41d5db46a1bc95', u'dismissed': False, u'level': u'CRIT'}]

The current version, 11.2, returns:
{u'meta': {u'previous': None, u'total_count': 1, u'offset': 0, u'limit': 20, u'next': None}, u'objects': [{u'timestamp': 1547964316, u'message': u'Device: /dev/ada0, Temperature 50 Celsius reached critical limit of 50 Celsius (Min/Max 42/50!)', u'id': u'A;SMART;["Device: /dev/ada0, Temperature 50 Celsius reached critical limit of 50 Celsius (Min/Max 42/50!)", null]', u'dismissed': False, u'level': u'CRITICAL'}]}

So when I printed "alert" inside the "for alert in alerts" loop, it returned "meta" and not the actual alert, so the script never gets to the actual alert array at the end of that output.
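The TypeError follows from this: iterating over a Python dict yields its keys, so alert becomes the string 'meta', and indexing that string with a key such as 'level' raises "string indices must be integers". A sketch of a helper that accepts both response shapes shown above; extract_alerts is a hypothetical name, not a function from check_truenas.py.

```python
from typing import Any, List


# Hypothetical helper, not part of check_truenas.py.
def extract_alerts(payload: Any) -> List[dict]:
    """Return the list of alert dicts from a system/alert response.

    Pre-11.2 the response is a bare list of alerts; 11.2 wraps that list
    in a dict with 'meta' and 'objects' keys.
    """
    if isinstance(payload, dict):
        return list(payload.get("objects", []))
    return list(payload)
```

With this, the loop would become "for alert in extract_alerts(alerts)" and work against both versions.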

Adrian (Contributor, joined Jun 29, 2011, 166 messages)
I seem to recall that check_truenas.py was written by Josh Paetzel, who I fear has left iXsystems. Fixing it for 11.2 is probably down to us.

PCC (Cadet, joined Nov 7, 2018, 6 messages)
I've been trying to work this out for a while now. My current workaround/hack, which might work, is to cut off the "{meta.....}, u'objects':" part and keep the array section at the end; I believe Mr. Paetzel's Python should then be able to read the array.

The other option I could think of is if the FreeNAS API lets you query for just the alerts without all that extra information in front, but I don't think that's possible.

PCC (Cadet, joined Nov 7, 2018, 6 messages)
Okay, update... I have found the fix and updated the script to support FreeNAS 11.2:
- When looping through the alerts, I had to change the loop to use alerts['objects']
- And changed "CRIT" and "WARN" to "CRITICAL" and "WARNING" respectively
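The level-name change can also be handled by accepting both spellings, so one script works before and after 11.2. A Python sketch of turning an alert list into a Nagios exit code and message; it is not PCC's attached script, nagios_state is a hypothetical name, and ignoring dismissed alerts and levels other than CRIT/CRITICAL and WARN/WARNING is an assumption.

```python
from typing import Iterable, Mapping, Tuple

# Accept both pre-11.2 ('CRIT', 'WARN') and 11.2 ('CRITICAL', 'WARNING') names.
LEVEL_TO_STATE = {"CRIT": 2, "CRITICAL": 2, "WARN": 1, "WARNING": 1}


# Hypothetical helper, not part of check_truenas.py.
def nagios_state(alerts: Iterable[Mapping]) -> Tuple[int, str]:
    """Return (exit code, summary) for a list of alert dicts."""
    worst = 0
    messages = []
    for alert in alerts:
        if alert.get("dismissed"):
            continue  # dismissed alerts are ignored
        state = LEVEL_TO_STATE.get(alert.get("level", ""), 0)  # other levels -> 0
        if state:
            worst = max(worst, state)
            messages.append(alert.get("message", ""))
    if worst == 0:
        return 0, "OK - No problem alerts"
    label = "CRITICAL" if worst == 2 else "WARNING"
    return worst, f"{label} - " + "; ".join(messages)
```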

Attached is the file for people to use. Change the ".txt" extension to ".py"; the uploader does not accept .py scripts.

Attachments

  • check_truenas.txt (5.3 KB)