Long distance replication over VPN running 400Kbps

lambert

Cadet
Joined
Mar 30, 2020
Messages
9
I have replication running over a VPN from one strong FreeNAS box at site A to another strong FreeNAS box at site B (1,000 miles away). I have more than 50 Mbps at each end. The VPN can and does handle 20-50 Mbps when I send test traffic, and latency inside the VPN is about 100 ms. I don't normally need to run much traffic across it; the bottleneck becomes CPU utilization on the site B router. This is a purely personal setup. Once synced up, there will be occasional new files: the odd new photo (I'm not a photographer) and a few documents that change every few days to weeks.

At one time, with the source FreeNAS box being a FreeNAS Mini, replication was using about 6 Mbps, which was the artificial rate limit I had put on it with QoS on the router. I was getting a lot of failed sync task messages, probably because it was starting from scratch and needed to move about 100 GB. I didn't have the cycles to troubleshoot and just turned the sync process off.

After the FreeNAS Mini's main board died, I moved the disks to a duplicate of the FreeNAS box at site B. Both are 2009-model 2U Dell servers:
OS Version:
FreeNAS-11.2-U8
(Build Date: Feb 14, 2020 15:55)

Processor:
Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (24 cores)

Memory:
128 GiB
RAIDZ with four 2 TB WD Red drives at site A and six 2 TB Dell SAS drives at site B; the difference is due to the drives coming from the Mini. I started up the sync process, and in two weeks I've gotten about 6 GB of datasets synced to site B.

As far as I can tell, the sync and SSH processes are not CPU bound.

The replication process is set to replication the entire pool. That's probably a bad idea. I'm still learning how to deal with multiple FreeNAS boxes.
It's set for recursive replication with LZ4 compression, the default encryption cipher, and SSH public-key auth. The replication window begins at 00:00:00 and ends at 23:59:00.
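For reference, I believe the manual equivalent of what the replication task runs is roughly the following (a sketch only; the snapshot names come from the error message below, and "remotepool" stands in for whatever the pool at site B is called):

# Incremental, recursive send of the pool, piped over SSH (sketch):
zfs send -R -i Pool1@auto-20200419.0115-2m Pool1@auto-20200420.0000-2w \
    | ssh root@192.168.128.33 "zfs receive -dF remotepool"
# To watch raw wire speed without the receive side, discard the stream
# remotely and interrupt after a minute or two:
zfs send Pool1@auto-20200420.0000-2w | ssh root@192.168.128.33 "cat > /dev/null"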

What do I need to look at to further troubleshoot this?

This is an example of the failed sync messages:
Hello,
The replication failed for the local ZFS Pool1/iocage/releases/11.2-RELEASE/root while attempting to
apply incremental send of snapshot auto-20200419.0115-2m -> auto-20200420.0000-2w to 192.168.128.33
 

lambert

Cadet
Joined
Mar 30, 2020
Messages
9
The best proof-reading always happens 15 seconds after you hit submit.

"The replication process is set to replicate the entire pool."
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
The best proof-reading always happens 15 seconds after you hit submit.

"The replication process is set to replicate the entire pool."
The problem is the "iocage" dataset.
"iocage" is the mount point location FreeNAS uses when it starts the jails.
I believe it is safe enough to exclude it from replication if your data is stored in its own datasets.
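For example, something like this shows how much of the pool is jail and release clutter versus real data ("Pool1" is taken from the error message above; your names may differ):

zfs list -r -o name,used,referenced Pool1/iocage
# Then point the replication task at the data datasets instead of the whole
# pool (e.g. Pool1/photos, Pool1/documents -- placeholder names), so
# iocage/releases never gets sent.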
 

lambert

Cadet
Joined
Mar 30, 2020
Messages
9
So, odd thing.

iperf from site B to site A = fast. iperf from site A to site B = an average of 419 Kbps. iperf -P 8 from site A to site B = about 3.2 to 3.4 Mbps, which is about 419 Kbps per connection.

So, something in my VPN devices is doing something stupid and somehow rate-limiting each TCP connection to about 400 Kbps.

This does not appear to be a FreeNAS issue at this time.
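Next tests on my list, to separate a window/buffer problem from per-flow policing (a sketch; the IP is the site B target from the error message above):

iperf -s                                  # on the site B box
iperf -c 192.168.128.33 -t 30             # baseline single stream
iperf -c 192.168.128.33 -t 30 -w 512K     # same stream with a larger TCP window
iperf -c 192.168.128.33 -t 30 -P 8        # parallel streams for comparison

If the -w run moves the ceiling, it points at window/buffer sizing; if only -P helps, something is shaping each flow.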
 
Joined
Dec 29, 2014
Messages
1,135
It could be anywhere along the path, or it could be as simple as TCP window size. The TCP window size is how much data the sender will transmit without receiving an acknowledgement. If there are long delays in the transport path, you may not use all of the available bandwidth because of the delays in the receipt of acknowledgements. It is also important to understand that rsync spends a lot of energy determining what needs to be sent. In my experience it almost never uses all the available bandwidth. I use it to do backups, and I am almost never in a hurry because all my stuff is really just lab gear. That means I just care that it works, not that it goes at maximum speed. I have 40G connections between my FreeNAS boxes, and it never uses more than 10% of the available speed. I have not invested any time in trying to make it go faster.
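As a rough back-of-the-envelope check (using the 100 ms RTT and 419 Kbps from earlier in the thread; 64 KB is just the classic default window, your stacks may differ):

# throughput ceiling ~= TCP window / round-trip time
# 64 KB default window / 0.1 s RTT   = ~5 Mbit/s ceiling
# 419 Kbit/s observed * 0.1 s RTT    = ~5 KB effective window
echo "50000000 * 0.1 / 8" | bc   # window in bytes needed to fill 50 Mbit/s: ~625000

An effective window of only a few KB would suggest something along the path is clamping the window or socket buffers, not just the distance.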
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
So, odd thing.

iperf from site B to site A = fast. iperf from site A to site B = an average of 419 Kbps. iperf -P 8 from site A to site B = about 3.2 to 3.4 Mbps, which is about 419 Kbps per connection.

So, something in my VPN devices is doing something stupid and somehow rate-limiting each TCP connection to about 400 Kbps.

This does not appear to be a FreeNAS issue at this time.
This could be one of your ISPs limiting upload speed on one side.
In Canada, there is a provider named Shaw with decent download speeds. You can get the 50 Mbps or the faster 500 Mbps package, but you will only ever be able to upload at a maximum of around 20-30 Mbps.

Add long distance and increased latency, and your final throughput takes a hit.
 

lambert

Cadet
Joined
Mar 30, 2020
Messages
9
This could be one of your ISPs limiting upload speed on one side.
In Canada, there is a provider named Shaw with decent download speeds. You can get the 50 Mbps or the faster 500 Mbps package, but you will only ever be able to upload at a maximum of around 20-30 Mbps.

Add long distance and increased latency, and your final throughput takes a hit.

I am the ISP on the B end. I was the ISP on the A end until three years ago. We're small local ISPs using wireless to reach the customers the big providers don't think are worth their time. I have about 100 Mbps up at the source end. I'm further out in the sticks at the remote end, but it's more than sufficient for my needs.

My FreeNAS servers are in separate VLANs from the laptop and phone network. Using iperf on my laptop (on WiFi), I'm getting good speeds to site B (14 Mbps, with other traffic happening on the far end) and to the server here at site A (80 Mbps). So, I'm beginning to think the VPN is fine. Maybe the Netonix switch or the port on the router to the switch is having an issue. I'll have to wire the laptop into the server VLAN to see if I can isolate things further. Weird fun.
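Rough isolation plan once the laptop is wired into the server VLAN (a sketch; the hostnames are placeholders for my two boxes):

iperf -s                          # on the site A FreeNAS box, then on site B's
iperf -c siteA-freenas -t 30      # laptop -> local server: tests the server VLAN / switch port
iperf -c siteB-freenas -t 30      # laptop -> remote server: tests the VPN path end to end
ping -c 20 siteB-freenas          # look for loss or jitter that would also cap TCP

If the wired laptop sees the same ~400 Kbps per stream to site B that the server does, the problem is upstream of the FreeNAS boxes.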
 