TrueNAS Core as OpenVPN client loses connection after a while

rockybulwinkle

Dabbler
Joined
Aug 2, 2021
Messages
25
Here is my setup:
I have a VPN server set up on an OpenWRT router on my home network, following these instructions: https://openwrt.org/docs/guide-user/services/vpn/openvpn/server

My home IP is dynamic, so I use an API my registrar provides (gandi) on my OpenWRT router to update a subdomain to keep pointing to my home address.

I have two TrueNAS servers, one is running Scale 22.02.0.1, the other Core 12.0-U8. Scale is my main server at home, Core is a replica off site. Core connects through OpenVPN to my router to have backups pushed to it.

This is almost working flawlessly, except my Core disconnects usually within 24 hours of coming up, and then I have to call someone to power cycle it.

Here are the settings I have in my OpenVPN client config. Note that there are no further options beneath <tls-crypt-v2>, just the key.
[Attached image: 1654975069314.png (screenshot of the OpenVPN client settings)]


I see the following in my messages log, repeating over and over again. This log is from *just* before Core was power cycled, while it was failing to connect to my network. I replaced my home IP with xxx.xxx.xxx.xxx. Are there other logs I should check? Note that atlas is the name of my Core server.

Code:
Jun  7 18:02:06 atlas 1 2022-06-07T18:02:06.597214-07:00 atlas.lan openvpn_client 1355 - - Restart pause, 300 second(s)
Jun  7 18:07:06 atlas 1 2022-06-07T18:07:06.703711-07:00 atlas.lan openvpn_client 1355 - - Outgoing Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
Jun  7 18:07:06 atlas 1 2022-06-07T18:07:06.703751-07:00 atlas.lan openvpn_client 1355 - - Outgoing Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
Jun  7 18:07:06 atlas 1 2022-06-07T18:07:06.703758-07:00 atlas.lan openvpn_client 1355 - - Incoming Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
Jun  7 18:07:06 atlas 1 2022-06-07T18:07:06.703765-07:00 atlas.lan openvpn_client 1355 - - Incoming Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
Jun  7 18:07:17 atlas 1 2022-06-07T18:07:17.506162-07:00 atlas.lan openvpn_client 1355 - - TCP/UDP: Preserving recently used remote address: [AF_INET]xxx.xxx.xxx.xxx:1194
Jun  7 18:07:17 atlas 1 2022-06-07T18:07:17.506188-07:00 atlas.lan openvpn_client 1355 - - Socket Buffers: R=[42080->42080] S=[9216->9216]
Jun  7 18:07:17 atlas 1 2022-06-07T18:07:17.506203-07:00 atlas.lan openvpn_client 1355 - - UDP link local: (not bound)
Jun  7 18:07:17 atlas 1 2022-06-07T18:07:17.506209-07:00 atlas.lan openvpn_client 1355 - - UDP link remote: [AF_INET]xxx.xxx.xxx.xxx:1194
Jun  7 18:08:17 atlas 1 2022-06-07T18:08:17.768606-07:00 atlas.lan openvpn_client 1355 - - [UNDEF] Inactivity timeout (--ping-restart), restarting
Jun  7 18:08:17 atlas 1 2022-06-07T18:08:17.768683-07:00 atlas.lan openvpn_client 1355 - - SIGUSR1[soft,ping-restart] received, process restarting
Jun  7 18:08:17 atlas 1 2022-06-07T18:08:17.768703-07:00 atlas.lan openvpn_client 1355 - - Restart pause, 300 second(s)
Jun  7 18:13:17 atlas 1 2022-06-07T18:13:17.825727-07:00 atlas.lan openvpn_client 1355 - - Outgoing Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
Jun  7 18:13:17 atlas 1 2022-06-07T18:13:17.825777-07:00 atlas.lan openvpn_client 1355 - - Outgoing Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
Jun  7 18:13:17 atlas 1 2022-06-07T18:13:17.825784-07:00 atlas.lan openvpn_client 1355 - - Incoming Control Channel Encryption: Cipher 'AES-256-CTR' initialized with 256 bit key
Jun  7 18:13:17 atlas 1 2022-06-07T18:13:17.825791-07:00 atlas.lan openvpn_client 1355 - - Incoming Control Channel Encryption: Using 256 bit message hash 'SHA256' for HMAC authentication
Jun  7 18:13:28 atlas 1 2022-06-07T18:13:28.628430-07:00 atlas.lan openvpn_client 1355 - - TCP/UDP: Preserving recently used remote address: [AF_INET]xxx.xxx.xxx.xxx:1194
Jun  7 18:13:28 atlas 1 2022-06-07T18:13:28.628454-07:00 atlas.lan openvpn_client 1355 - - Socket Buffers: R=[42080->42080] S=[9216->9216]
Jun  7 18:13:28 atlas 1 2022-06-07T18:13:28.628469-07:00 atlas.lan openvpn_client 1355 - - UDP link local: (not bound)
Jun  7 18:13:28 atlas 1 2022-06-07T18:13:28.628475-07:00 atlas.lan openvpn_client 1355 - - UDP link remote: [AF_INET]xxx.xxx.xxx.xxx:1194
Jun  7 18:14:28 atlas 1 2022-06-07T18:14:28.334816-07:00 atlas.lan openvpn_client 1355 - - [UNDEF] Inactivity timeout (--ping-restart), restarting
Jun  7 18:14:28 atlas 1 2022-06-07T18:14:28.334912-07:00 atlas.lan openvpn_client 1355 - - SIGUSR1[soft,ping-restart] received, process restarting
Jun  7 18:14:28 atlas 1 2022-06-07T18:14:28.334934-07:00 atlas.lan openvpn_client 1355 - - Restart pause, 300 second(s)


I'm hoping someone can provide some insight into why my client is losing connection. In the meantime, I'm going to try to narrow down the exact time that it loses connection by pinging 1 packet a second and recording the date and seeing when it stops responding, then cross reference with all the log files.
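A minimal sketch of that ping-and-timestamp idea, assuming a POSIX shell. The function name `log_state`, the `TARGET` address, and the log path are placeholders, not anything from my actual setup:

```shell
#!/bin/sh
# log_state CMD [ARGS...]: run CMD quietly and print "<timestamp> up" if it
# succeeded, "<timestamp> down" otherwise. The timestamp format is chosen to
# be easy to line up with the syslog entries.
log_state() {
    if "$@" > /dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    echo "$(date '+%Y-%m-%dT%H:%M:%S%z') $state"
}

# Usage (runs until interrupted; TARGET is a placeholder for the tunnel
# address of the machine being probed):
#   TARGET=192.168.8.1
#   while :; do log_state ping -c 1 "$TARGET"; sleep 1; done >> /root/vpn_ping.log
```

The first "down" line in the resulting file gives the time to look for in the other logs.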

Thanks!
 

rockybulwinkle

As a bandaid, so I don't have to bother people to restart the server, I set up a cron job to restart the vpn connection every 24 hours. "/usr/sbin/service openvpn_client restart".
 

Stick

Cadet
Joined
Mar 27, 2017
Messages
6
bumping for a very similar if not identical issue:

- Source (Open VPN Client pushing)- TrueNAS Core 13.0 RELEASE. CPU 1x Xeon(R) CPU E5-1620 v4 @ 3.50GHz. RAM 32gb ECC
- Destination (Open VPN Server receiving) - TrueNAS Core 13.0-U1. VMWare with HBA passthrough, 8 vcores 3.2ghz, 64gb ECC. Host R720xd 2x E5-2667v2 128gb RAM

Issue - after a relatively short period of inactivity the VPN tunnel seems to go down, and no further replication tasks succeed until the vpn client (source side) is restarted. Once a replication task is actively running, the tunnel stays up indefinitely with no issues, only to go down again once all traffic completes requiring another restart. I've worked around the issue by throttling traffic on my router so that the large replication task in essence never fully completes (the much smaller regular ones still complete in a timely manner). I've had it run for 4+ months this way with no problem, but as soon as the jobs complete, within 5-10 min or so, the tunnel goes down and my email is flooded with replication errors. I can't figure out why this is happening.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Either you have a NAT gateway (incorrectly called a "router" by many) where the NAT entry for that session is expiring, causing it to drop, or OpenVPN is timing out due to the lack of traffic. Both of these can be fixed with OpenVPN settings. It might be helpful to know what sort of NAT gateway is in use on each end.
 

rockybulwinkle

Either you have a NAT gateway (incorrectly called a "router" by many) where the NAT entry for that session is expiring, causing it to drop, or OpenVPN is timing out due to the lack of traffic. Both of these can be fixed with OpenVPN settings. It might be helpful to know what sort of NAT gateway is in use on each end.
On my server's end, my devices are MB8611 modem <--> WRT3200ACM running OpenWRT <--> TrueNAS

The VPN is running on the WRT3200ACM.

I'm less certain of the network hardware on my client's end as I don't manage that network.

I've removed my domain and keys from below for obvious security/privacy reasons.

My client's options are:
Client Certificate: ovpn_client_cert
Root CA: ovpn_client_ca
Remote: subdomain.example.com
Port: 1194
Protocol: UDP
Device Type: TUN

The rest look like they're at their defaults.

My client's "additional parameters" are:
Code:
--ping 15
--ping-restart 300
--resolv-retry 300
--persist-tun
--persist-key
--remote-cert-tls server
<tls-crypt-v2>
-----BEGIN OpenVPN tls-crypt-v2 client key-----
-----END OpenVPN tls-crypt-v2 client key-----
</tls-crypt-v2>


The settings on my server:
Code:
user nobody
group nogroup
dev tun
port 1194
proto udp
server 192.168.8.0 255.255.255.0
topology subnet
client-to-client
keepalive 10 60
persist-tun
persist-key
push "dhcp-option DNS 192.168.8.1"
push "dhcp-option DOMAIN lan"
push "redirect-gateway def1"
push "persist-tun"
push "persist-key"
<dh>
-----BEGIN DH PARAMETERS-----
-----END DH PARAMETERS-----
</dh>
<tls-crypt-v2>
-----BEGIN OpenVPN tls-crypt-v2 server key-----
-----END OpenVPN tls-crypt-v2 server key-----
</tls-crypt-v2>
<key>
-----BEGIN PRIVATE KEY-----
-----END PRIVATE KEY-----
</key>
<cert>
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
</cert>
<ca>
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
</ca>
 

rockybulwinkle

A different workaround I've been using is to schedule a restart of the vpn service on my client shortly before its scheduled replication.
 

jgreco

You might try adding a ping stanza on your side to keep your side sending traffic. This can have the effect of causing the NAT engine to keep the session alive.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I have a similar problem, but not on TrueNAS. It's on my vanilla FreeBSD transmission server that maintains a 24/7 NordVPN OpenVPN connection.
The way I solved it is by having a simple shell script that checks if the connection is up, and if it isn't, it just restarts the openvpn service. I then register that script in crontab and have it check every hour complete with logging.

From the logs, there doesn't seem to be a consistent pattern for the disconnection. The first time I deployed it, it lasted a day before the OpenVPN service needed to be restarted. After that, 2 days, after that it lasted for 5 days, then back down to 2 days.... *shrugs*. I still can't figure out the root cause, but the crontab is a fine work-around, so it's good enough for me as I can't be bothered to invest more time into fixing it the "right" way.
 

Stick

NAT gateway (router) on each end is a Mikrotik RB4011iGS. I don't believe I have any configuration specific to the VPN on the routers beyond forwarding one UDP port on the server (receiving) end.

Not sure what settings should be checked OpenVPN wise within freenas? I've looked through several times and not seen anything that looks like it would be causing the connection to time out when idle.
 

jgreco

Mikrotik RB4011iGS. I don't believe I have any configuration specific to the VPN on the routers beyond forwarding one UDP port on the server (receiving) end.

Well, there's your problem.

According to Mikrotik, their NAT timeout for established UDP sessions is three minutes.


udp-stream-timeout (time; Default: 3m): Specifies the timeout of UDP connections that have seen packets in both directions

Not sure what settings should be checked OpenVPN wise within freenas? I've looked through several times and not seen anything that looks like it would be causing the connection to time out when idle.

The TrueNAS/FreeNAS OpenVPN instance is little more than a lightly GUI-wrapped bog standard OpenVPN. You would need to refer to the OpenVPN documentation. As someone who operates "high volume" (but not TrueNAS-based) OpenVPN setups, I would suggest looking at the ping stanza as a plausible starting point, followed by the keepalive stanza. Pay attention to what is showing up in the logs as reasons for disconnect.
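The reasoning can be sketched as a quick check: a NAT entry survives an idle period only if some packet crosses it more often than the entry's timeout. The 180 seconds below is the Mikrotik default quoted above; the 10-second ping interval and the function name `nat_entry_survives` are illustrative assumptions, not values taken from anyone's configuration.

```shell
#!/bin/sh
# nat_entry_survives PING_INTERVAL NAT_TIMEOUT (both in seconds)
# Succeeds when keepalive packets are sent more often than the NAT entry expires.
nat_entry_survives() {
    [ "$1" -lt "$2" ]
}

NAT_TIMEOUT=180   # Mikrotik udp-stream-timeout default of 3m, in seconds
PING_INTERVAL=10  # illustrative value for OpenVPN's "ping" option

if nat_entry_survives "$PING_INTERVAL" "$NAT_TIMEOUT"; then
    echo "ping every ${PING_INTERVAL}s refreshes a ${NAT_TIMEOUT}s NAT entry"
else
    echo "NAT entry can expire between pings"
fi
```

With no ping or keepalive configured at all, an idle tunnel sends nothing, so any finite NAT timeout will eventually be exceeded.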
 

Stick

Thanks for the clues, and frankly everything you contribute to the community. My post count may be low, but I've found the solutions to countless issues and configurations over the years and more often than not it seems to be you contributing (or at the very least someone with your avatar!) so thank you for that. Took me a few days to report back since I had to wait for replications to finish to know if it worked, but keepalive did the trick.

For anyone else having this issue, here is what worked for me:
On the Open VPN Server side go to - Services -> OpenVPN Server -> Additional Parameters and add a line "keepalive 10 60"
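For reference, the OpenVPN manual describes keepalive as a helper that expands into ping and ping-restart, and in server mode it also pushes both values to the clients. The sketch below just prints that expansion; the function name `expand_keepalive` is made up for illustration.

```shell
#!/bin/sh
# expand_keepalive INTERVAL TIMEOUT [MODE]
# Prints the directives that "keepalive INTERVAL TIMEOUT" stands for,
# following the expansion described in the OpenVPN manual.
# MODE is "server" or "client" (default: client).
expand_keepalive() {
    interval=$1
    timeout=$2
    mode=${3:-client}
    echo "ping $interval"
    if [ "$mode" = server ]; then
        # The server waits twice as long as the clients before restarting
        # and pushes the client-side values over the tunnel.
        echo "ping-restart $((timeout * 2))"
        echo "push \"ping $interval\""
        echo "push \"ping-restart $timeout\""
    else
        echo "ping-restart $timeout"
    fi
}

expand_keepalive 10 60 server
```

Because of the pushed lines, both ends send a ping every 10 seconds when idle, which is well inside a 3-minute NAT timeout.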

As a suggestion: It would be nice for the GUI to have a checkbox to enable this option in the future. If routers are commonly dropping connections after a relatively short time, and a common expected use case for a VPN on a NAS box would be to replicate snapshots offsite, then it would make sense to have keepalive built in or at the very least noted in the configuration guide.
 

jgreco

Thanks for the clues, and frankly everything you contribute to the community.

Happy to be of some assistance.

As a suggestion: It would be nice for the GUI to have a checkbox to enable this option in the future. If routers are commonly dropping connections after a relatively short time, and a common expected use case for a VPN on a NAS box would be to replicate snapshots offsite, then it would make sense to have keepalive built in or at the very least noted in the configuration guide.

Would you kindly consider using the "Report a Bug" feature in the topbar to make this as a suggestion?

Having worked with OpenVPN since its inception, I'm not certain that a tickbox will be workable due to the need to have this set up on both sides of a connection. However, a mention of it in the configuration guide is a very good idea. You are obviously (now) familiar with how annoying this is. Turns out that it's not TrueNAS, not even really OpenVPN, but rather the Mikrotik. People struggle with discovering the root cause of their issues. I've seen people "solve" it by running continuous ping sessions and other similar hacks. There's clear reasoning for things being the way that they are on both the OpenVPN and Mikrotik sides of this, but it is nonobvious to most newcomers.
 

Stick

Having worked with OpenVPN since its inception, I'm not certain that a tickbox will be workable due to the need to have this set up on both sides of a connection.
At least in my case, the only change I had to make was in the OpenVPN server configuration with no changes to client. So at least for this use case a GUI Setting in the OpenVPN server configuration window would have done the trick.

There's clear reasoning for things being the way that they are on both the OpenVPN and Mikrotik sides of this, but it is nonobvious to most newcomers.
Agreed, which is why I was hesitant to change the Mikrotik UDP timeout from 3m to something like 24h+ which is what I'd need to fix it on that end since I only do daily snapshots on weekends. I don't know WHY they chose a default of 3m, but I'm assuming there is a reason, and changing any default by a factor of ~500x is generally not a good idea if you don't really know what you are doing :smile:
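The arithmetic behind that rough factor, as a quick sketch (the variable names are just for illustration):

```shell
#!/bin/sh
# Compare Mikrotik's default UDP stream timeout with a 24-hour idle gap.
DEFAULT_TIMEOUT_MIN=3            # udp-stream-timeout default, in minutes
NEEDED_TIMEOUT_MIN=$((24 * 60))  # 24 hours, in minutes

FACTOR=$((NEEDED_TIMEOUT_MIN / DEFAULT_TIMEOUT_MIN))
echo "24h is ${FACTOR}x the 3m default"
```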
 

ragametal

Contributor
Joined
May 4, 2021
Messages
188
I have a similar problem, but not on TrueNAS. It's on my vanilla FreeBSD transmission server that maintains a 24/7 NordVPN OpenVPN connection.
The way I solved it is by having a simple shell script that checks if the connection is up, and if it isn't, it just restarts the openvpn service. I then register that script in crontab and have it check every hour complete with logging.

From the logs, there doesn't seem to be a consistent pattern for the disconnection. The first time I deployed it, it lasted a day before the OpenVPN service needed to be restarted. After that, 2 days, after that it lasted for 5 days, then back down to 2 days.... *shrugs*. I still can't figure out the root cause, but the crontab is a fine work-around, so it's good enough for me as I can't be bothered to invest more time into fixing it the "right" way.
Would it be possible for you to share that script?
I'm facing this same problem and no matter what options i change, my openVPN connection keeps dropping every so often.

I'm curious to know which method you used to verify if the connection is up.
 

Whattteva

Would it be possible for you to share that script?
I'm facing this same problem and no matter what options i change, my openVPN connection keeps dropping every so often.

I'm curious to know which method you used to verify if the connection is up.
I sure can, but I want to tell you that there is a prerequisite for this script. My jail is setup with a firewall (I use pf) that makes it so that I ONLY have internet through the VPN tunnel. This makes it so that detecting if the connection is active or not is just as simple as loading ANY website. If it succeeds, then VPN is good, if not, VPN is down.

Anyway, here's the script:
Code:
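#!/bin/sh
# Assumes all internet traffic is forced through the VPN (e.g. by pf),
# so a failed HTTP request means the tunnel is down.

# -sS: no progress meter, but still print errors; --max-time: don't hang forever.
curl -sS --max-time 30 ifconfig.co > /dev/null
ret_val=$?

if [ "$ret_val" -ne 0 ]; then
    # Log the script name and the time of the failure.
    echo "$0 $(date)"
    echo "VPN down, restart OpenVPN and transmission."
    echo "Restarting OpenVPN..."
    service openvpn restart
    echo "Restarting transmission..."
    service transmission restart
    echo
fi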


I put this in a cron job that runs every 5 minutes. Here's the cron entry:
Code:
*/5 * * * * root /root/bin/check_vpn.sh >> /var/log/check_vpn.log


In this way, I can check /var/log/check_vpn.log for how often it has to restart the VPN tunnel.
On average, on my system, it seems to do it every 2 days, but not always. Here's what the logs typically look like:
Code:
/root/bin/check_vpn.sh Mon Jan 23 03:01:41 EST 2023
VPN down, restart OpenVPN and transmission.
Restarting OpenVPN...
Stopping openvpn.
Waiting for PIDS: 32037.
Starting openvpn.
Restarting transmission...
Stopping transmission.
Waiting for PIDS: 32073.
Starting transmission.

/root/bin/check_vpn.sh Tue Jan 24 12:50:23 EST 2023
VPN down, restart OpenVPN and transmission.
Restarting OpenVPN...
Stopping openvpn.
Waiting for PIDS: 47571.
Starting openvpn.
Restarting transmission...
Stopping transmission.
Waiting for PIDS: 47930.
Starting transmission.

/root/bin/check_vpn.sh Thu Jan 26 09:02:08 EST 2023
VPN down, restart OpenVPN and transmission.
Restarting OpenVPN...
Stopping openvpn.
Waiting for PIDS: 60560.
Starting openvpn.
Restarting transmission...
Stopping transmission.
Waiting for PIDS: 60660.
Starting transmission.

/root/bin/check_vpn.sh Sat Jan 28 19:20:56 EST 2023
VPN down, restart OpenVPN and transmission.
Restarting OpenVPN...
Stopping openvpn.
Waiting for PIDS: 78037.
Starting openvpn.
Restarting transmission...
Stopping transmission.
Waiting for PIDS: 78073.
Starting transmission.


Works quite well as you can see.
 

ragametal

I sure can, but I want to tell you that there is a prerequisite for this script. My jail is setup with a firewall (I use pf) that makes it so that I ONLY have internet through the VPN tunnel.

This shouldn't be a problem as, just like you, I use a firewall (pfSense), which is where my OpenVPN server is installed. I have it set up so that all the traffic from the client goes through the OpenVPN tunnel as well, so I guess I can use your script as is.

I just need to find some spare time to go to the remote location where the TrueNAS with the OpenVPN client is installed.

Question: I see you are restarting the OpenVPN service with
Code:
service openvpn restart


Since we are talking about the OpenVPN client, wouldn't it be better to restart just the client as opposed to the whole thing?
Code:
service openvpn_client restart
 

Whattteva

Question, i see you are restarting the openVPN service with
Code:
service openvpn restart


since we are talking about the OpenVPN client, wouldn't it be better to restart just the client as opposed to the whole thing?
Code:
service openvpn_client restart
It's not restarting the "whole thing". OpenVPN can be set up as either a client or a server. openvpn is just the name of the executable script that's set up to run my NordVPN client. On FreeBSD, you can set up multiple scripts with OpenVPN as long as you give each one a different name, and the name can be as arbitrary as openvpn_nordvpn. This way, you can have multiple servers and clients. My jail only has the client and nothing else, so I just use the default name. I suspect that it's set up with a different name on yours.

In other words, my plain "openvpn" IS the client.
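If you want to adapt the idea to a box where the service has a different name, here is a rough sketch with the check and the restart command passed in as arguments. This is untested on TrueNAS; `PEER` is a hypothetical address reachable only through the tunnel, `openvpn_client` is the service name mentioned earlier in this thread, and the function names are made up.

```shell
#!/bin/sh
# Watchdog sketch for an OpenVPN client whose service name may vary.
PEER=${PEER:-192.168.8.1}          # placeholder: host only reachable via the tunnel
SERVICE=${SERVICE:-openvpn_client} # service name as used on TrueNAS Core in this thread

# tunnel_up: succeeds if at least one of three pings to PEER is answered.
tunnel_up() {
    ping -c 3 "$PEER" > /dev/null 2>&1
}

# watchdog CHECK RESTART_CMD [ARGS...]
# Runs CHECK; if it fails, logs a line and runs the restart command.
watchdog() {
    check=$1
    shift
    if ! "$check"; then
        echo "$(date) tunnel down, running: $*"
        "$@"
    fi
}

# Intended call from cron (left commented out so the sketch has no side effects):
#   watchdog tunnel_up service "$SERVICE" restart
```

Pinging something on the far side of the tunnel avoids needing the "internet only through VPN" firewall rule that my curl check depends on.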
 