Guide to setting up Grafana + InfluxDB Metrics

Jack828

Dabbler
Joined
Nov 11, 2020
Messages
16
Hey folks!
Like some others on the forums, my dashboard's metrics graphs don't work, even on a first-time install of v12! So, to satiate my need for data, I decided to use Grafana.

I've been setting up Grafana + InfluxDB on my TrueNAS 12 box and have struggled to find a start-to-finish guide on how to do it, including how to correctly parse the Graphite metrics from the NAS.

I know the community plugin exists, but why use that when you can do it yourself? :grin:

This'll also serve as my local scratchpad for how I'm still setting it up, so please excuse the rough state at the moment. I'm also hoping I can edit my OP to amend the information in it - if not, I'll have to think of something else, and possibly re-post the guide in its entirety after you lot give me some good feedback.

Without further ado, let's get started!
Fair warning: we'll be using both the GUI and the command line, so this guide does assume you're able to SSH in (using key-based auth, hopefully!). The rest of the steps will be very hand-holdy.

0. Preamble
In this post I'll be showing lots of code blocks. In those, the lines that start with moneta% are run on the TrueNAS host, and the ones starting with root@grafana:/ # are run within the jail.

1. The Jail
We'll need somewhere for all our work to live. Instead of cluttering up the system with many jails, one for each service, I like to group logically similar processes together into one jail for ease of use.

Before we create it, we'll need to check the latest release we should be using:

Code:
moneta% sudo iocage fetch
[0] 11.2-RELEASE (EOL)
[1] 11.3-RELEASE (EOL)
[2] 11.4-RELEASE
[3] 12.0-RELEASE (EOL)
[4] 12.1-RELEASE
[5] 12.2-RELEASE

Type the number of the desired RELEASE
Press [Enter] to fetch the default selection: (Not a RELEASE)
Type EXIT to quit: ^C
Aborted!


We can see that the latest is 12.2-RELEASE so our jail creation command will be:

Code:
moneta% sudo iocage create -n grafana -r 12.2-RELEASE boot=on vnet=on allow_tun=1 ip4_addr="vnet0|192.168.0.100/24"  allow_sysvipc=1 bpf=yes


Once that's done, go ahead and log into the shell for it:
Code:
moneta% jls
   JID  IP Address      Hostname                      Path
    18                  grafana                       /mnt/tank/iocage/jails/grafana/root
moneta% sudo jexec 18 tcsh
root@grafana:/ #


2. Processes
Now, let's set up our metrics processes. You'll need to install the influxdb and grafana7* packages.
You can also take the opportunity to install your editor of choice - I'll use nano, because it's more lightweight than importing my vim config.

*note: grafana7 is the latest version at the time of writing this

Code:
root@grafana:/ # pkg install -y nano influxdb grafana7


Now enable and start the InfluxDB and Grafana services:
Code:
root@grafana:/ # sysrc influxd_enable="YES"
root@grafana:/ # sysrc grafana_enable="YES"
root@grafana:/ # service influxd start
root@grafana:/ # service grafana start


Let's verify our InfluxDB is working correctly, and also get our database ready for metrics.
Code:
root@grafana:/ # influx
Connected to http://localhost:8086 version 1.8.0
InfluxDB shell version: 1.8.0
> create database graphite
> show databases
name: databases
name
----
_internal
graphite


Now we need to modify our InfluxDB config - it lives here: /usr/local/etc/influxd.conf. Open that file, and head to the [[graphite]] section.

I recommend having a good read of the file - you might find it does something else you like!
Anyway, for Graphite metrics, I'd recommend matching the config to the one shown below:
Code:
###
### [[graphite]]
###
### Controls one or many listeners for Graphite data.
###

[[graphite]]
  # Determines whether the graphite endpoint is enabled.
  enabled = true
  database = "graphite"
  # retention-policy = ""
  bind-address = ":2003"
  protocol = "tcp"
  consistency-level = "one"

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.

  # Flush if this many points get buffered
  batch-size = 5000

  # number of batches that may be pending in memory
  batch-pending = 10

  # Flush at least this often even if we haven't hit buffer limit
  batch-timeout = "1s"

  # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
  # udp-read-buffer = 0

  ### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
  # separator = "."

  ### Default tags that will be added to all metrics.  These can be overridden at the template level
  ### or by tags extracted from metric
  # tags = ["region=us-east", "zone=1c"]

  ### Each template line requires a template pattern.  It can have an optional
  ### filter before the template and separated by spaces.  It can also have optional extra
  ### tags following the template.  Multiple tags should be separated by commas and no spaces
  ### similar to the line protocol format.  There can be only one default template.
  templates = [
    # Default template
    # "server.*",
  ]


After editing that config file, you'll need to restart the service with service influxd restart for it to pick up the changes.

Now, we need to tell our TrueNAS where to dump its metrics data. In the GUI, head to System -> Reporting, and add your grafana jail's IP address into Remote Graphite Server Hostname. Optionally, you can change it so it reports CPU usage in %.
truenas-system-reporting.png


Hit save and head back to the jail command line. We're going to check to see if it's recording any metrics.
Code:
root@grafana:/ # influx
Connected to http://localhost:8086 version 1.8.0
InfluxDB shell version: 1.8.0
> use graphite
> show series
key
----
servers.moneta_local.aggregation.cpu-average.cpu.idle
servers.moneta_local.aggregation.cpu-average.cpu.interrupt
servers.moneta_local.aggregation.cpu-average.cpu.nice
servers.moneta_local.aggregation.cpu-average.cpu.system
...omitted for brevity...


If you see all the series, great! Let's move on.

3. Grafana

Head to http://jail.ip.address:3000/datasources and let's set up Grafana.
Click Add data source and choose InfluxDB.

Set the URL to be http://localhost:8086 and your database to be graphite (as we set up earlier).
influx-setup.png

influx-database.png


Smash Save & Test and you should see a beautiful green banner letting you know all is good.
influxdb-working.png



4. Templates
Now that this is done - if you're happy setting up multiple queries on long measurements, then we're all done! Head to Grafana and set up the dashboard as you like!
However, we can utilise the powerful templates feature of InfluxDB's Graphite parser to make this data a little friendlier.

Head back to the [[graphite]] section of the config /usr/local/etc/influxd.conf.

So this is where it gets a little complicated to explain - you saw the huge number of different metrics being collected when you ran show series in InfluxDB. For now, we'll focus on the disktemp metrics series.

In my case, the series in InfluxDB is servers.moneta_local.disktemp.da0.temperature, with a value key holding the actual temperature (in Celsius) for my da0 disk.

This isn't necessarily a bad way of storing the metric - but to show this in Grafana, you'll need as many queries as you have disks! If you ever add more disks, you'll need to amend all your graphs with the additional queries.

So, let's fix this by filtering the series down to only the disktemp ones and parsing them into a single series. We want the resultant series to be filterable by disk, with appropriately named fields.

In the templates option in the config, each line is made up of a filter, then a template for aggregating the metric, and optionally some extra tags.
We want to group the metrics in the disktemp.* series together, and tag them to associate which disk is what.
So our filter is simple - all servers, all hosts, all and only disktemp - *.*.disktemp.*.
Now, we can correct the template - ignore servers, tag the host, tag the measurement, tag the disk, and tag the field - .host.measurement.disk.field.

All together, our template for disk temperature aggregation becomes "*.*.disktemp.* .host.measurement.disk.field". Chuck this into your config:
Code:
[[graphite]]
...omitted for brevity...
  templates = [
    "*.*.disktemp.* .host.measurement.disk.field",
    # Default template
    # "server.*",
  ]
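
If it helps to see what that template line actually does, here's a toy sketch in Python - to be clear, this is not InfluxDB's real parser, just an illustration of how the dotted series from my setup gets split into measurement, tags, and field:

```python
# Toy illustration of InfluxDB's Graphite template mapping - NOT the real parser.
# Template line: "*.*.disktemp.*  .host.measurement.disk.field"
#                 ^ filter          ^ template

def apply_template(series: str, template: str) -> dict:
    """Map each dotted node of a series onto the matching template keyword."""
    result = {"tags": {}}
    for node, part in zip(series.split("."), template.split(".")):
        if not part:
            continue  # an empty position means "ignore this node"
        if part in ("measurement", "field"):
            result[part] = node
        else:
            result["tags"][part] = node
    return result

print(apply_template(
    "servers.moneta_local.disktemp.da0.temperature",
    ".host.measurement.disk.field",
))
# {'tags': {'host': 'moneta_local', 'disk': 'da0'},
#  'measurement': 'disktemp', 'field': 'temperature'}
```

Run against my da0 series, it gives measurement disktemp, tags host and disk, and field temperature - exactly the split you'll see InfluxDB produce below.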


And restart InfluxDB:
Code:
root@grafana:/ # service influxd restart


Wait a few moments for your NAS to report the disktemp metrics - mine takes 15-20s.
Let's now have a look in the database to see if it's correctly parsed our new metric:
Code:
root@grafana:/ # influx
Connected to http://localhost:8086 version 1.8.0
InfluxDB shell version: 1.8.0
> use graphite
> select * from disktemp group by * order by desc limit 1
name: disktemp
tags: disk=da0, host=moneta_local
time                temperature
----                -----------
1606067385000000000 28

name: disktemp
tags: disk=da1, host=moneta_local
time                temperature
----                -----------
1606067385000000000 29
...omitted for brevity...


You'll now see that all the disktemp metrics are aggregated into a single series - split up by the disk, with a field called temperature!

Head to Grafana now and create a new panel - below is the default query that Grafana shows you for your InfluxDB data source.
1606068856548.png


Note how we have an option in the FROM clause to select a measurement - we already set this in our template above using the measurement keyword - in our case, this is disktemp.
We also need to change the field selected in the SELECT clause to be the field we set in the template - in our case, this is temperature.

At this point, your graph will have roared into life, but it isn't showing each disk, just a mean of all the measurements.
So first off, we'll change the points to a continuous line by adjusting the GROUP BY's fill function to be linear.

Now we'll change the GROUP BY a little more to separate by the tag we added in our template - disk - you'll need to add tag(disk).

And as if by magic, you should (hopefully) see a graph containing a line for each disk's temperature.
1606069604411.png


You can see the legend has now changed from disktemp.mean to one for each tagged item in the series!
Let's make it a little less noisy - in the ALIAS BY clause of the query, tell Grafana to use the disk tag - $tag_disk.

Your final graph and query should look like this:
1606069804036.png


Here's a couple more templates for you - try to figure out the correct graph query for each. Don't worry, Grafana should give you hints when you're entering the data.
Code:
    "*.*.cputemp.* .host.measurement.cpu.field",
    "*.*.uptime.* .host.measurement.field",
    "*.*.load.* .host.measurement..term.field",
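
For the first of those, assuming your CPU temperature series look like my disk ones did (e.g. servers.moneta_local.cputemp.0.temperature, with 0 being a core number - check show series for your actual names), a quick Python sketch of the mapping:

```python
# Hypothetical cputemp series - check `show series` in influx for your real names.
series = "servers.moneta_local.cputemp.0.temperature"
template = ".host.measurement.cpu.field"  # the template half of "*.*.cputemp.* ..."

# Pair each dotted node with its template keyword, skipping empty positions.
mapped = {part: node
          for node, part in zip(series.split("."), template.split("."))
          if part}
print(mapped)
# {'host': 'moneta_local', 'measurement': 'cputemp', 'cpu': '0', 'field': 'temperature'}
```

So in Grafana you'd pick measurement cputemp, field temperature, and group/alias by the cpu tag - the same drill as disktemp.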


5. Fin

Here's the final templates config I have for Influx: [Work in Progress!]

Code:
  templates = [
    "*.*.cputemp.* .host.measurement.cpu.field",
    "*.*.uptime.* .host.measurement.field",
    "*.*.load.* .host.measurement..term.field",
    "*.if_octets.* .host.interface.measurement.field..",
    "*.if_packets.* .host.interface.measurement.field..",
    "*.if_errors.* .host.interface.measurement.field..",
  ]


Thanks for reading, and I hope you've successfully set up a beautiful metrics dashboard. If I've misquoted, misunderstood, or been just plain wrong, please let me know!


6. Further Reading / Sources
 

Attachments

  • 1606071158552.png

f00b4r00

Dabbler
Joined
Jan 5, 2020
Messages
41
Personally, I grew tired of trying to coax the Graphite output into suitable templates, so in the end I added four little lines to enable native collectd output to a remote InfluxDB configured with a collectd endpoint.

In /etc/local/collectd.conf one simply needs:
Code:
LoadPlugin network
<Plugin network>
    Server "host" "port"
</Plugin>


Done. Of course, to make this stick across GUI config changes, the relevant file is /usr/local/lib/python3.7/site-packages/middlewared/etc_files/local/collectd.conf (on 11.3-U5), which will then need to be updated after each upgrade.

I can't quite explain why the ixsystems devs chose to favor Graphite over native collectd remote logging, but heh, it's easy enough to patch.

Now I can apply my standard collectd dashboards without having to pull hair ;-)
 

Pistus

Cadet
Joined
Aug 27, 2014
Messages
2
Thanks for the guide.
Struggling with the data flow from the TrueNAS system to the influx jail.
Server 192.168.0.50
Influx jail 192.168.0.62

Got this error:
Jan 2 14:55:48 TrueNAS 1 2021-01-02T14:55:48.446285+01:00 TrueNAS.workgroup collectd 11440 - - write_graphite plugin: Connecting to 192.168.0.62:2003 via tcp failed. The last error was: failed to connect to remote host: Connection refused

Any suggestions how to troubleshoot?

I'm running TrueNAS-12.0-STABLE
 

Jack828

Dabbler
Joined
Nov 11, 2020
Messages
16
Personally I grew tired of trying to coax the graphite output into suitable templates, so in the end I ended up adding 4 little lines to enable native collectd output to remote influxdb configured with a collectd endpoint.

In /etc/local/collectd.conf one simply needs:
Code:
LoadPlugin network
<Plugin network>
    Server "host" "port"
</Plugin>

Oh dang, nice find! I might add that too - does it keep the Graphite reporting working as well?

I can't quite explain why the ixsystems devs chose to favor Graphite over native collectd remote logging, but heh, it's easy enough to patch.

Tell me about it!!

Got this error:
Jan 2 14:55:48 TrueNAS 1 2021-01-02T14:55:48.446285+01:00 TrueNAS.workgroup collectd 11440 - - write_graphite plugin: Connecting to 192.168.0.62:2003 via tcp failed. The last error was: failed to connect to remote host: Connection refused

Any suggestions how to troubleshoot?

I can only assume it's because the port is incorrect? InfluxDB is usually on 8086 iirc.
 

f00b4r00

Dabbler
Joined
Jan 5, 2020
Messages
41
Oh dang, nice find! I might add that as well - does it keep the Graphite reporting working as well?
Yes, you can have both. Collectd doesn't care.

I can only assume because the port is incorrect? InfluxDB is usually on 8086 iirc.
InfluxDB's Graphite endpoint listens on 2003 (Graphite's port) by default. The error may either be transient or suggest a configuration mismatch on the Influx server.
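
A quick way to tell the two apart, from any machine that can reach the jail, is simply to try a TCP connection to port 2003 - nc would do, but here's a tiny Python helper (the host/port values below are just the example from the question above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("192.168.0.62", 2003)
# False would mean the graphite listener isn't reachable: influxd not running,
# enabled = false in the [[graphite]] section, or a firewall in the way.
```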
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
moneta% sudo jexec 18 tcsh
With iocage, the iocage console command is preferred, as it actually gives you root's login environment. And it doesn't need a number. iocage console grafana.
 

aschaapherder

Dabbler
Joined
Sep 11, 2015
Messages
14
Thanks, a very nice starting point!

One remark: you mention that grafana7 is the current version but your code says
Code:
root@grafana:/ # pkg install -y nano influxdb grafana

This does not cause an error because the package actually exists (version 1.something) but it should be
Code:
root@grafana:/ # pkg install -y nano influxdb grafana7

If you already installed the wrong one, simply
Code:
root@grafana:/ # pkg delete grafana

and then
Code:
root@grafana:/ # pkg install grafana7


Still struggling with the disktemp example but that is for me to read up on.
 

aschaapherder

Dabbler
Joined
Sep 11, 2015
Messages
14
Got this error:
Jan 2 14:55:48 TrueNAS 1 2021-01-02T14:55:48.446285+01:00 TrueNAS.workgroup collectd 11440 - - write_graphite plugin: Connecting to 192.168.0.62:2003 via tcp failed. The last error was: failed to connect to remote host: Connection refused

Any suggestions how to troubleshoot?

I'm running TrueNAS-12.0-STABLE

Check if influxd is running
Code:
root@grafana:~ # service influxd status
influxd is running as pid 1497.

I initially had the same issue: there are a few more lines in the [[graphite]] section of /usr/local/etc/influxd.conf than shown above, and I uncommented too much. As a result, influxd did not start anymore.
 

sindreruud

Dabbler
Joined
Sep 21, 2017
Messages
24
Thanks for the guide.
Struggling with the dataflow from the TrueNAS system to the jail with influx.
Server 192.168.0.50
Influx jail 192.168.0.62

Got this error:
Jan 2 14:55:48 TrueNAS 1 2021-01-02T14:55:48.446285+01:00 TrueNAS.workgroup collectd 11440 - - write_graphite plugin: Connecting to 192.168.0.62:2003 via tcp failed. The last error was: failed to connect to remote host: Connection refused

Any suggestions how to troubleshoot?

I'm running TrueNAS-12.0-STABLE

Well, I just spent 30 minutes with the same error as you, and I just found out why it happened - to me at least (spoiler: I'm an idiot).

I just uncommented the lines in the /usr/local/etc/influxd.conf as directed in the OP. However, I ended up with this, as it is the default config:
Code:
[[graphite]]
  # Determines whether the graphite endpoint is enabled.
  enabled = false


So make sure you don't just uncomment it, but also set it to enabled = true.

It's completely obvious, but I missed it, so I thought I'd make a post about it in case anyone else missed it too.
 

awat87

Cadet
Joined
Mar 9, 2021
Messages
8
Hello,

First of all, many thanks for your work. It's all looking pretty good for me so far - there's just one thing I'm despairing over ...

The CPU temperature is not displayed correctly, even though TrueNAS itself reports the temperature correctly.
 

Attachments

  • 2021-03-25_17-15-36.png
  • 2021-03-25_17-17-21.png

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Use math(), divide by 10 and subtract 273.15 ...

The measurements are in tenths of a degree above absolute zero (decikelvins).
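
Worked through in Python, for anyone double-checking their panel (the raw value 3011 is just a made-up example reading):

```python
def raw_to_celsius(raw: float) -> float:
    """Convert a cputemp reading in tenths of a kelvin to degrees Celsius."""
    return raw / 10 - 273.15

# A hypothetical raw reading of 3011 is 301.1 K, i.e. about 27.95 degC.
print(raw_to_celsius(3011))
```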
 

f00b4r00

Dabbler
Joined
Jan 5, 2020
Messages
41

NumberSix

Contributor
Joined
Apr 9, 2021
Messages
188
Hi
Things seem to fail quite early on in the process for me. When I reach:
Code:
root@grafana:/ # service influxd start
root@grafana:/ # service grafana start
neither succeeds; both respond with:
Code:
service influxd start
influxd does not exist in /etc/rc.d or the local startup directories (/usr/local/etc/rc.d), or is not executable

- and the same with 'service grafana start'. Everything prior to this point went as expected.

Note I'm a beginner with Linux-like commands, so there may be an easy solution here that I just don't know. I tried 'locate' to find where 'influxd' and 'grafana' live, but that approach didn't help.

Anyone?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
@NumberSix
Code:
service -e
pkg info
inside the jail, please.
 

NumberSix

Contributor
Joined
Apr 9, 2021
Messages
188
Thank you so much Patrick.
I will try this as soon as I get back home (& report back on the results). I am so keen to get these packages up and running, so I am very appreciative of your support. Thank you.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Your jails are separate FreeBSD virtual machines. Enable SSH in them and log in directly into the jail for maintenance tasks.
 

ctowle

Dabbler
Joined
Apr 18, 2012
Messages
14
I am running TrueNAS 12.0-U2.1 and have everything installed as above, but when I run the show series command, I am not seeing any data :(. I made sure I ran use graphite first. It seems the data isn't being pushed to the DB for some reason. I triple-checked my influxd.conf file too. Has anyone seen this before? I am currently upgrading to U4.1 to see if that resolves anything. Does it have to do with allow.sysvipc being deprecated?


EDIT2: for those that need to read it twice: set enabled = true in influxd.conf. It defaults to false.
 

kbrvfx

Dabbler
Joined
Dec 6, 2020
Messages
28
Hi! I have set up the config, but it seems I have this problem - how do I resolve it? write_graphite plugin: Connecting to 192.168.x.x:2003 via tcp failed. The last error was: failed to connect to remote host: Connection refused

influxd.conf is configured similar to the suggested configuration.
Data source is working.
Remote Graphite Server Hostname set to Grafana's IP


No data is shown after entering these commands:
Code:
influx
> use <your-database-name>
> show series

I thought the posts above might help me resolve this issue, but after reading them and trying the tweaks that fixed their problems, I'm at my wit's end.
 