TrueNAS-12.0-U8.1 SNMP Extend trouble

I have run into some bizarre behavior on one of my TrueNAS systems. I'm not yet sure whether this is purely a net-snmp problem (and should be reported there), but it seems isolated to this single TrueNAS system, and I will explain why I suspect TrueNAS/FreeBSD is involved somehow.

The troubled system:
=====
Running TrueNAS-12.0-U8.1. Physical host.
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
32 GB ECC RAM
2-NIC LAGG configured with 3 VLAN subinterfaces in addition to the primary lagg0 mgmt/NAS traffic. All SNMP traffic goes over the lagg0 interface, from polling systems in the same VLAN. All IPv4/IPv6 routing is set to go via the lagg0 interface. The VLAN interfaces just provide local legs into other subnets for file shares.

Basic SNMP works great on this system, until I try to extend it with a specific Python routine from the LibreNMS project that reports ZFS statistics [https://docs.librenms.org/Extensions/Applications/#zfs]. This SNMP extend is supposed to reply via the NET-SNMP-EXTEND-MIB::nsExtendOutput2Table OID tree.

The trouble is that when this specific extend script is added to the snmpd config, this OID tree no longer replies. Non-extended OIDs such as SNMPv2-MIB reply just fine. Only the NET-SNMP-EXTEND-MIB::nsExtendOutput2Table OID tree has trouble, and only when it references this specific reporting script.

Other extend scripts, such as the LibreNMS freebsd-nfsserver stats Python routine [https://docs.librenms.org/Extensions/Applications/#freebsd-nfs-server], work absolutely fine. If I add this specific ZFS statistics extension, though, the entire extend OID tree stops responding.

Here's where things get weird, though. I am running another TrueNAS as a staging/test VM (exact same TrueNAS revision, same subnet/VLAN) and the ZFS stats extend WORKS on that system. I add the requisite "extend zfs /usr/local/snmp/zfs-freebsd" statement to the snmpd config, and it replies just as expected.
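
To compare the two hosts outside of snmpd, the extend script can be run by hand on each one and its exit status, output size, and run time recorded. This is a minimal sketch, assuming the script path from the extend statement above. The helper name `probe_extend` and the 10-second timeout are illustrative choices of mine, not anything from LibreNMS or net-snmp, and a manual run uses the shell's environment, which may differ from the one snmpd provides.

```python
import subprocess
import time


def probe_extend(command, timeout=10.0):
    """Run an extend command by hand and summarize its result.

    `command` is a list such as ["/usr/local/snmp/zfs-freebsd"].
    The timeout value is an arbitrary choice for this sketch.
    """
    started = time.monotonic()
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr as bytes
        timeout=timeout,
        check=False,          # report a non-zero exit code instead of raising
    )
    elapsed = time.monotonic() - started
    return {
        "exit_code": result.returncode,
        "stdout_bytes": len(result.stdout),
        "stdout_lines": len(result.stdout.splitlines()),
        "stderr_bytes": len(result.stderr),
        "seconds": round(elapsed, 3),
    }


# Example call on a TrueNAS host (path taken from the snmpd config line):
#   print(probe_extend(["/usr/local/snmp/zfs-freebsd"]))
```

Running this on both the physical host and the VM, and for both the zfs and nfs-server scripts, would show whether the script itself behaves differently on the troubled host or whether the difference only appears once snmpd sends the reply.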

So I have two "identical" TrueNAS hosts in terms of revision, VLAN, and snmpd configuration. One works fine and the other does not.

Filesystem locations and contents of the local routines are identical. Permissions are identical. Configurations of snmpd are identical. One host works, the other does not.

This is why I'm leaning toward some component of TrueNAS/FreeBSD being the underlying cause. I have run tcpdump on each host to see what happens on the network when polling succeeds and when it fails. The SNMP requests reach the troubled host just fine, and the host replies, but the replies do not seem to actually leave the host. When I run tcpdump on the TrueNAS system I can see both the requests and the responses, but when I run tcpdump on the polling host I only see the requests going out. No replies get back to the polling system.

It's almost as if the lagg is not physically transmitting the replies onto the wire: the kernel sees the traffic being sent, at least from tcpdump's perspective, but nothing on the other end of the wire ever receives it. It's as if something about this one particular Python reporting routine causes the SNMP replies to leave the system corrupted or misrouted.
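
One thing I have not ruled out is reply size. This is only a guess: if the zfs extend produces a much larger reply than the other extends, the UDP response may exceed the interface MTU and leave the host as IP fragments, which might be handled differently on the physical lagg than on the VM's vNIC. Below is a rough sketch of the arithmetic, assuming a 1500-byte MTU and IP headers without options. The function names are made up for illustration, and the SNMP encoding overhead on top of the script output is not modeled.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only, no extension headers
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu=1500, ipv6=False):
    """Largest UDP payload that fits in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def needs_fragmentation(payload_bytes, mtu=1500, ipv6=False):
    """True if a UDP payload of this size cannot fit in a single packet."""
    return payload_bytes > max_udp_payload(mtu, ipv6)


# With a 1500-byte MTU the limits are 1472 bytes (IPv4) and 1452 bytes (IPv6).
```

If the SNMP response lengths that tcpdump shows on the TrueNAS side are above these limits for the zfs extend and below them for the extends that work, that would point at fragment handling rather than at the script's content.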

The differences between the two TrueNAS systems are:
=====
(not responding) physical host -vs- VM instance (working)
(not responding) lagg -vs- single vNIC (working)

This behavior really has me stumped. This specific SNMP polling is the only thing on the physical host that exhibits this weird behavior. Shares and other services are all working without issue.

I would really appreciate any ideas, thoughts, or suggestions as to what else to check or try.
 