Slow ZFS Replication Speeds

klingon55

Dabbler
Joined
Mar 2, 2017
Messages
12
Howdy!

This is my first time having 2 systems and setting up ZFS replication. We have 340TB that needs to be replicated, and I am unable to get more than 1.1-1.2gbit out of a single replication task.

I have put in a ticket but I assume things are pretty backed up right now.
TrueNAS-12.0-U3
Setup: 2x TRUENAS-M40-HA with 4gbit link between datacenters
Ram: 128GB
CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Network: 2x 10gbit SFP per system (1 used currently)

Does anyone have any recommendations? It seems like a few people have had this issue, but I haven't seen any resolution.
 

klingon55

Dabbler
Joined
Mar 2, 2017
Messages
12
Receiving System
Code:
last pid: 29607;  load averages: 64.65, 44.86, 44.26  up 23+20:41:22  12:03:48
61 processes:  3 running, 57 sleeping, 1 zombie
CPU:  2.5% user,  0.0% nice, 11.2% system,  0.6% interrupt, 85.8% idle
Mem: 2812K Active, 1585M Inact, 670M Laundry, 118G Wired, 4037M Free
ARC: 109G Total, 444M MFU, 107G MRU, 837M Anon, 1380M Header, 28M Other
     104G Compressed, 115G Uncompressed, 1.10:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
21363 root          1  95    0    32M    19M RUN     11  19:40  70.01% sshd
21365 root          2  23    0    18M  8788K piperd  19   1:46   6.08% zfs
  617 root         48  20    0   768M   608M kqread  11  33.7H   1.30% python3.
48726 root          1  20    0    48M    30M CPU7     7  10:31   0.77% winbindd
29604 root          1  20    0    17M  6980K CPU11   11   0:00   0.60% top
92269 www           1  20    0    36M  9872K kqread   5   0:03   0.04% nginx
54032 root          1  20    0    25M    12M select  12  29:53   0.02% snmpd
 4943 root          8  20    0    42M    13M select   2  50:46   0.01% rrdcach

Sending System
Code:
last pid: 76937;  load averages: 34.70, 84.59, 63.12  up 45+02:03:40  15:04:53
98 processes:  5 running, 93 sleeping
CPU:  8.2% user,  0.0% nice, 25.7% system,  1.1% interrupt, 65.0% idle
Mem: 103M Active, 2285M Inact, 1412M Laundry, 117G Wired, 3474M Free
ARC: 92G Total, 11G MFU, 76G MRU, 106M Anon, 2048M Header, 2810M Other
     81G Compressed, 89G Uncompressed, 1.09:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
68594 root          1  99    0 18684K 13996K CPU7     7  23:52  82.22% ssh
 6702 root          1  52    0 19496K 15948K select   1  17.7H  62.89% sshd
   82 root         99  76    0  1437M  1337M CPU18   18 252.8H  51.45% python3.7
68600 root          1  38    0  6252K  2056K CPU11   11   7:21  25.24% throttle
68601 root          2  23    0 10036K  4356K CPU4     4   2:24   8.08% zfs
 6704 root          1  22    0  7852K  4088K piperd   8  74:33   4.55% zfs
  148 root          5  21    0   192M   159M usem    17 406:45   1.62% python3.7
  150 root          4  24    0   190M   159M usem     0 407:02   1.54% python3.7
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
What sort of pool layout are you using, and what kind of data is involved? Replicating small files with lots of metadata from a RAIDZ will tend to be relatively slow, for example. What's the latency like? Have you done any tuning for 10 gigabit speeds? A different congestion control algorithm for wide area network use? Etc.
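For anyone following along, these can all be checked from a shell on a TrueNAS CORE (FreeBSD) box with standard tools; the remote hostname below is a placeholder, and these are just stock FreeBSD sysctls, nothing replication-specific:

Code:
# Round-trip latency to the other head
ping -c 10 remote-truenas.example.com

# Which TCP congestion control algorithm is active, and which are available
sysctl net.inet.tcp.cc.algorithm
sysctl net.inet.tcp.cc.available

# Socket buffer limits, which cap how large the TCP window can grow
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max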
 

klingon55

Dabbler
Joined
Mar 2, 2017
Messages
12
What sort of pool layout are you using, and what kind of data is involved? Replicating small files with lots of metadata from a RAIDZ will tend to be relatively slow, for example. What's the latency like? Have you done any tuning for 10 gigabit speeds? A different congestion control algorithm for wide area network use? Etc.
Iperf shows we can send 3.8gbit over our 4gbit intersite link.

We have 7 raidz2 vdevs with 8x 12TB drives each, plus a log and cache SSD.

The files being replicated are large contiguous files (hundreds of gigabytes to terabytes each).

Latency between the 2 datacenters is 8ms over our 4gbit intersite link.

I am just trying to get at least 3gbit out of this, as at this rate it will take a very long time to sync.

If we set up multiple replication tasks we are able to saturate the 4gbit link; however, there are some very large datasets (150TB) that we obviously cannot split into multiple replication tasks.
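For context on why a single stream tops out where it does: with 8ms of round-trip time, the bandwidth-delay product limits one TCP connection, and the stock FreeBSD socket buffer limits (the defaults below are an assumption about an untuned system) sit right around that ceiling:

Code:
# Bandwidth-delay product for one stream on this link:
#   4 Gbit/s x 0.008 s = 32,000,000 bits, i.e. roughly 4 MB in flight
# With the FreeBSD default socket buffer cap of about 2 MB, one stream
# is limited to roughly 2 MB / 0.008 s = 250 MB/s, i.e. about 2 Gbit/s.
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max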
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
For the replication task, are you using NETCAT+SSH? Are you using Encryption?

The first one will make it fast(er); the second one will make it slow(er).
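To illustrate the difference (a hand-rolled sketch, not the exact commands the TrueNAS middleware runs; dataset, snapshot and host names are placeholders): with plain SSH the whole stream passes through one ssh process, while the netcat transport only uses ssh to set things up and moves the data over a raw TCP connection.

Code:
# Plain SSH transport: every byte is encrypted by a single ssh process,
# which tends to become CPU-bound in the low single digits of gbit.
zfs send -v tank/data@snap | ssh repl@remote "zfs recv -F tank/data"

# Netcat-style transport: ssh only starts the listener; the stream itself
# goes over a raw TCP connection with no per-byte encryption.
ssh repl@remote "nc -l 8023 | zfs recv -F tank/data" &
sleep 2   # crude: give the remote listener a moment to come up
zfs send -v tank/data@snap | nc remote 8023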
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Receiving System / Sending System
(top output quoted from the post above)

It seems pretty clear that you have a bottleneck with SSH. SSH is good if you need the encryption, but if you already have a 4gbit circuit, ditch the encryption and see what happens.
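One quick way to confirm that (generic shell commands, nothing TrueNAS-specific; the host name is a placeholder) is to push junk data through ssh by itself and compare it with raw TCP throughput:

Code:
# How fast can a single ssh connection move data on this CPU?
# If this tops out around the same 1.1-1.2 gbit the replication gets,
# the cipher is the bottleneck rather than ZFS or the network.
dd if=/dev/zero bs=1M count=10000 | ssh repl@remote "cat > /dev/null"

# Raw TCP throughput over the same path, for comparison
iperf3 -c remote -t 30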
 

klingon55

Dabbler
Joined
Mar 2, 2017
Messages
12
For the replication task, are you using NETCAT+SSH? Are you using Encryption?

The first one will make it fast(er); the second one will make it slow(er).
We are just using SSH, and we do use encryption due to the type of data we have stored. I will try netcat+ssh.
 

klingon55

Dabbler
Joined
Mar 2, 2017
Messages
12
As an update, switching to netcat+ssh got us to 1.7gbit. Support reached out and provided some extra tunables to account for the 8ms of latency, which brought that up to 2.8gbit, and that is good enough for us. It took the sync time for 150TB from 14 days down to 5.

Code:
Variable                     Value    Type
net.inet.tcp.cc.algorithm    cubic    sysctl
cc_cubic_load                YES      loader
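
For anyone else who lands here: those two tunables load the CUBIC congestion control module at boot and make it the active TCP congestion control algorithm, which handles the bandwidth-delay product of an 8ms WAN path better than the default NewReno. At 2.8gbit that works out to roughly 30TB per day, which matches the 5 days for 150TB. If you want to try it without rebooting, the runtime equivalent on FreeBSD (standard commands, not anything support-specific) is:

Code:
# Load the CUBIC module now (cc_cubic_load="YES" does this at boot)
kldload cc_cubic

# Switch the active congestion control algorithm
sysctl net.inet.tcp.cc.algorithm=cubic

# Confirm the change and list what is available
sysctl net.inet.tcp.cc.algorithm net.inet.tcp.cc.available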
 