Slow Replication Speeds (Encryption & Compression Disabled)


svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Looking at my pool speed, the fact that it is empty, and knowing disk speed should be my bottleneck (40G direct link between servers / iperf @ ~14G) ... I'd expect stronger performance than slightly north of 200 MB/s (I think it peaked at 300 MB/s at the absolute most).

Others have suggested disabling encryption and compression, which I've done.

Pool = raidz2 in 2 vdevs of 6 x 10 TB Easystores ... With compression off / sync disabled it writes at ~ 900 MB/s.
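For reference, here's a rough sketch of the kind of sequential write test I mean - dataset path and file size are placeholders, not my exact commands:

Code:
# crude sequential write test on an empty dataset; only meaningful with
# compression off, otherwise /dev/zero just compresses away
zfs set sync=disabled Tank1/scratch                                  # hypothetical scratch dataset
dd if=/dev/zero of=/mnt/Tank1/scratch/bigfile bs=1m count=102400     # ~100 GiB
zfs inherit sync Tank1/scratch                                       # restore the inherited sync setting

dd prints the resulting MB/s figure when it finishes.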

Thoughts for how to speed it up? Thx!
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Were you able to figure out what is causing the bottleneck?

Short Answer = Unfortunately not.

Thoughts (I was on my phone previously when I wrote my prior message):
  • Network: I never thought the network was the bottleneck; however, I figured the more headroom the better. I managed to get the direct connection (Mellanox ConnectX-3 MCX354A-FCBT on each host, via VMXNET 3) up to 19G (from 14G) per iperf. It's safe to cross that off the list, IMO.
  • zpool speed
    • Pull Pool: I have 12 WDC WD100EMAZ-00WJTA0 (10TB) drives configured as an encrypted RaidZ (3x4x10.0 TB), and while my knowledge limited my ability to benchmark it properly, the average sequential write speed was 1,065 MB/s (sync=disabled).
    • Push Pool: I have 12 HGST HDN726060ALE610 (6TB) drives configured as RaidZ2 (6x2x6.0 TB), and they can read far faster than I see the Pull Pool writing. It's safe to rule out disk speed as a bottleneck.
  • General
    • Pull = 2 x E5-2690 v2s + 200 GB ECC RAM (of 256 on host)
    • Push = 2 x E5-2690 v2s + 200 GB ECC RAM (of 256 on host)
    • ssh PID shows WCPU ~80%, but the system is ~75% idle overall, so even though single-threaded ssh can be a limiting factor, it shouldn't be one at 2.8 GHz (see the per-core top sketch after this list).
    • I'm at a complete loss here ...
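Regarding the WCPU / idle point above, a per-core view makes it easier to tell whether one core is pinned while the headline figure still shows mostly idle - a minimal check with FreeBSD's stock top:

Code:
# per-core breakdown instead of the aggregate CPU line
top -P
# while a replication is running, watch the "CPU 0".."CPU n" rows;
# if the core ssh lands on sits near 0% idle, single-threaded ssh is the wall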
[As always, thanks for your kind follow-up. Just as an FYI: if I ever manage to figure out the issue on my own, I will of course reply back to my own thread in case it comes up in search results and helps others down the road.]
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
The VMXNET3 is the most likely culprit. Use vmx instead.

Sorry for the delayed reply, but many thanks for your kind offer to assist by way of suggesting VMXNET3 could be the issue. :) Honestly, you caught me by surprise with your comment, as I've used the VMXNET3 adapter since initially following a guide to virtualize FreeNAS back in Apr '17 and haven't had any issues to speak of. I did reference the guide and man page as suggested; however, given the lack of chatter about the adapter on more recent FN versions + other troubleshooting, I've decided this can be ruled out as a culprit.

If I understand correctly, WCPU is weighted, so sshd @ ~90% doesn't necessarily mean that the CPU is maxed out, right? I remain at a loss to explain the current ~250 MB/s replication speeds over a 40G link, but if pinned down for a guess I would begrudgingly suggest that my bottleneck is single-threaded SSH, and that an E5-2690 v2 @ 3.0 GHz and an E5-2680 v2 @ 2.8 GHz can only offer ~250 MB/s via replication. While I lack the experience / acumen of many on this forum, those are rather beefy processors, even if a few generations old, so my uncalibrated opinion is that I'd be surprised if that was all they could handle.
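One way to separate ssh from the pools would be to time a raw send locally and then push the same send through nc instead of ssh - snapshot, dataset, port, and address below are placeholders:

Code:
# 1) raw send speed, no network and no ssh involved
time zfs send -R Tank1@manual-test > /dev/null

# 2) same send over the 40G link, but via nc instead of ssh
#    on the pull host:
nc -l 8023 | zfs recv -F Tank1/replica
#    on the push host (direct-link address of the pull host):
zfs send -R Tank1@manual-test | nc 10.10.10.2 8023

If the nc run is dramatically faster than replication over ssh, that points at the cipher / single-threaded sshd rather than the disks or the network.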

Note, I've included the output of top and zpool iostat to help support my above comments (presented in code tags embedded in spoilers below).

Any thoughts for further troubleshooting (this is driving me nuts)? Other than the fact that FreeNAS is virtualized, I'd suggest that I'm closely aligned with what is commonly accepted as FreeNAS "best practices", down to 512GB of ECC RAM (as one example).

Is it possible this is a bug and I should log it (I've never done so before and don't want to waste anyone's time)?

As always thanks for your assistance and I hope that you and your family are enjoying the holidays.

Code:
  
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
Tank1                                   61.7T  47.3T  1.96K    128   252M   693K
  raidz1                                15.2T  12.1T    501     38  62.4M   184K
    gptid/0fa60f93-0112-11e9-961a-000c292dd999.eli      -      -    316      9  20.7M   122K
    gptid/10d528a5-0112-11e9-961a-000c292dd999.eli      -      -    322      9  21.2M   123K
    gptid/120c3eaa-0112-11e9-961a-000c292dd999.eli      -      -    314      9  20.6M   122K
  raidz1                                15.3T  12.0T    500     29  62.6M   139K
    gptid/138a4938-0112-11e9-961a-000c292dd999.eli      -      -    319      8  20.9M  94.1K
    gptid/14ae4b9e-0112-11e9-961a-000c292dd999.eli      -      -    315      7  20.7M  93.1K
    gptid/15df0a8a-0112-11e9-961a-000c292dd999.eli      -      -    322      7  21.1M  92.6K
  raidz1                                15.6T  11.7T    502     27  63.6M   132K
    gptid/17543098-0112-11e9-961a-000c292dd999.eli      -      -    317      8  21.4M  88.1K
    gptid/18806e57-0112-11e9-961a-000c292dd999.eli      -      -    317      8  21.1M  86.2K
    gptid/d47bbd57-029c-11e9-83a2-000c292dd999.eli      -      -    319      8  21.2M  87.1K
  raidz1                                15.6T  11.6T    500     30  63.1M   153K
    gptid/1b22c086-0112-11e9-961a-000c292dd999.eli      -      -    313      7  20.6M  99.5K
    gptid/1c43521a-0112-11e9-961a-000c292dd999.eli      -      -    316      7  20.9M   101K
    gptid/1d6efe73-0112-11e9-961a-000c292dd999.eli      -      -    323      7  21.7M  99.9K
logs                                        -      -      -      -      -      -
  gpt/Opt-02_Log-01                     2.57M  19.5G      0      1      0  42.0K
  gpt/Opt-02_Log-02                     2.46M  19.5G      0      1      0  42.8K
Code:
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
Tank1                                   8.94T  56.1T      0  2.52K  3.33K   251M
  raidz2                                8.94T  56.1T      0  2.52K  3.33K   251M
    gptid/248423c5-0975-11e9-9693-000c2972d355.eli      -      -      0    254    409  27.7M
    gptid/25da1c0a-0975-11e9-9693-000c2972d355.eli      -      -      0    255    136  27.7M
    gptid/2720df96-0975-11e9-9693-000c2972d355.eli      -      -      0    270    136  27.8M
    gptid/2865d96e-0975-11e9-9693-000c2972d355.eli      -      -      0    269    477  27.8M
    gptid/29e6ad54-0975-11e9-9693-000c2972d355.eli      -      -      0    267    204  27.8M
    gptid/2b2bb2c6-0975-11e9-9693-000c2972d355.eli      -      -      0    267     68  27.8M
    gptid/2c73a254-0975-11e9-9693-000c2972d355.eli      -      -      0    259    477  27.7M
    gptid/2dca862d-0975-11e9-9693-000c2972d355.eli      -      -      0    256    272  27.7M
    gptid/2f40c6c3-0975-11e9-9693-000c2972d355.eli      -      -      0    275    204  27.8M
    gptid/309c5afc-0975-11e9-9693-000c2972d355.eli      -      -      0    275    272  27.8M
    gptid/31db5a01-0975-11e9-9693-000c2972d355.eli      -      -      0    273    409  27.8M
    gptid/333e2746-0975-11e9-9693-000c2972d355.eli      -      -      0    277    341  27.8M
logs                                        -      -      -      -      -      -
  gpt/Opt-02_Log-01                         0  19.5G      0      0      0  62.2K
  gpt/Opt-02_Log-02                      128K  19.5G      0      0      0  61.4K
Code:
last pid: 84086;  load averages:  3.70,  3.91,  3.57                                         up 0+20:55:37  14:04:48
62 processes:  2 running, 60 sleeping
CPU:  4.3% user,  0.0% nice, 22.5% system,  1.9% interrupt, 71.3% idle
Mem: 27M Active, 297M Inact, 873M Laundry, 189G Wired, 4571M Free
ARC: 178G Total, 144G MFU, 33G MRU, 2916K Anon, 954M Header, 275M Other
     171G Compressed, 206G Uncompressed, 1.20:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
68894 root           1  96    0 18592K 13032K CPU1    1  17:44  78.63% ssh
68893 root           1  36    0  5228K  2072K select  3   4:48  21.49% pipewatcher
68891 root           2  32    0  9940K  3996K pipewr  2   4:01  17.88% zfs
Code:
last pid: 56875;  load averages: 21.76, 12.41, 11.47                                         up 0+09:50:45  14:05:00
62 processes:  2 running, 60 sleeping
CPU:  7.7% user,  0.0% nice, 27.1% system,  2.4% interrupt, 62.8% idle
Mem: 55M Active, 625M Inact, 301M Laundry, 189G Wired, 5567M Free
ARC: 180G Total, 29M MFU, 179G MRU, 484M Anon, 363M Header, 23M Other
     174G Compressed, 184G Uncompressed, 1.06:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
42321 root           1  99    0 25212K 20804K CPU5    5  21:40  90.67% sshd
42324 root           1  38    0  7836K  4004K piperd  2   6:46  28.15% zfs
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Howdy, @svtkobra7 -- Hope you're doing well!

I've fought the slow rsync transfer problem myself... Not sure I've come up with an idealized solution, but FWIW, here is the wrapper script I use for rsync transfers. It specifies some ssh options to disable encryption and compression. You're welcome to tinker with it to see if it helps.

Code:
#!/bin/sh
#######################################################################
# A wrapper for invoking rsync to copy a source dataset to a destination
# dataset with options that are 'known to work' with FreeNAS/FreeBSD
# and Windows datasets. This means we have to avoid the '-p' (preserve
# permissions) option in all of its forms, per bug report 7713:
#
# https://bugs.freenas.org/issues/7713
#
# Command-line parameters:
#   1: Source dataset       R_SRC     (local system)  /mnt/tank/sharename/
#   2: Destination dataset  R_DEST    (local system)  /volume1/sharename
#   3: Log file             R_LOGFILE (local system)  /mnt/tank/sysadmin/log/push-log.log
#
# R_SRC or R_DEST may include user ID and hostname specifier
# 
# Example usage, where 'boomer' is a FreeNAS server and 'bertrand' is
# a Synology Diskstation. Note that you should include a trailing slash
# only on the source dataset specifier:
#
# rsync-invoke.sh root@bertrand:/volume1/devtools/ /mnt/tank/devtools /mnt/tank/sysadmin/log/pull-from-bertrand.log
# rsync-invoke.sh /mnt/tank/devtools/ root@bertrand:/volume1/devtools /mnt/tank/sysadmin/log/push-to-bertrand.log
#
# Assumes SSH has been enabled and configured between the source
# and destination hosts.
#
# Invokes rsync with these SSH options to optimize transfer speed:
#
#   -e "ssh -T -c none -o Compression=no -x"
#
# Stock OpenSSH does not accept the 'none' cipher, so you may need to
# remove this altogether, or modify the cipher specifier (e.g. -c arcfour
# on older systems) to use a scheme available on both hosts.
#
# !!! WARNING !!!
# This script deletes files on the destination that don't exist
# on the source! Edit R_OPTIONS below and remove '--delete-during'
# and '--inplace' if you don't want this behavior!
# Tested with:
#   FreeNAS 9.3-STABLE
#   FreeNAS 9.10-STABLE
#   Synology DSM 5.x (as destination only)
#######################################################################

if [ $# -ne 3 ]
then
  echo "Error: not enough arguments!"
  echo "Usage is: $0 r_src r_dest r_logfile"
  exit 2
fi

R_SRC=$1
R_DEST=$2
R_LOGFILE=$3

# Options:
#   -r  recurse into directories
#   -l  copy symlinks as symlinks
#   -t  preserve modification times
#   -g  preserve group
#   -o  preserve owner
#   -D  preserve device and special files
#   -h  human readable progress
#   -v  increase verbosity

#   --delete-during   receiver deletes during the transfer
#   --inplace         write updated data directly to destination file
#   --log-file        specify log file

R_OPTIONS="-rltgoDhv --delete-during --inplace --progress --log-file=${R_LOGFILE}"

# Files to exclude:
#   .windows      FreeNAS ACL settings (?)
#   vmware.log    VMware virtual machine log files
#   vmware-*.log
#   @eaDir/       Synology extended attributes (?)
#   @eaDir
#   Thumbs.db     Windows system files

# R_EXCLUDE="--exclude .windows --exclude vmware.log --exclude vmware-*.log --exclude @eaDir/ --exclude @eaDir --exclude Thumbs.db"
R_EXCLUDE="--exclude vmware.log --exclude vmware-*.log --exclude @eaDir/ --exclude @eaDir --exclude Thumbs.db"

echo "$(date) Copy" ${R_SRC} "to" ${R_DEST} >> ${R_LOGFILE}
rsync ${R_OPTIONS} ${R_EXCLUDE} -e "ssh -T -c none -o Compression=no -x" ${R_SRC} ${R_DEST}
echo "$(date) Copy completed" >> ${R_LOGFILE}
exit
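One caveat: stock OpenSSH refuses '-c none' (there is no null cipher without the HPN patches), so you may have to settle for the fastest real cipher both hosts share. A quick-and-dirty way to compare them - the hostname is a placeholder:

Code:
# push 1 GiB of zeroes through each cipher the client supports and let dd
# report the resulting throughput; ciphers the server refuses just error out
for c in $(ssh -Q cipher); do
  echo "=== $c ==="
  dd if=/dev/zero bs=1m count=1024 | ssh -c "$c" -o Compression=no pull-host 'cat > /dev/null'
done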
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Howdy, @svtkobra7 -- Hope you're doing well!
  • Long time, ol' friend, I hope all is well with you, and it is certainly great to hear from you!
  • Thanks very much for the assist - I will be sure to give it a go!
  • Any theories as to what was causing transfer speeds that didn't meet your expectations?
I've fought the slow rsync transfer problem myself... Not sure I've come up with an idealized solution, but FWIW, here is the wrapper script I use for rsync transfers. It specifies some ssh options to disable encryption and compression. You're welcome to tinker with it to see if it helps.
  • May I ask why you decided to go with rsync vs. snapshot replication (I would think the latter would be faster)? Did you have speed issues with replication too?
  • And what does your replication schema look like if you don't mind me asking?
  • If rsync can get me better speeds, I'd definitely be inclined to move away from snapshot replication, and hopefully it can handle keeping 41.6 TiB in sync. I've played with rsync before when needing to move a bit of data and ended up saying "screw it", temporarily turning one 826 into a makeshift JBOD chassis to speed up the process. Once cascaded off the expander, I found rclone to be much more performant than rsync, which it is modeled after (I *think* rsync is single-threaded / a PITA to multi-thread?).
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
  • Long time, ol' friend, I hope all is well with you, and it is certainly great to hear from you!
  • Thanks very much for the assist - I will be sure to give it a go!
  • Any theories as to what was causing transfer speeds that didn't meet your expectations?

  • May I ask why you decided to go with rsync vs. snapshot replication (I would think the latter would be faster)? Did you have speed issues with replication too?
  • And what does your replication schema look like if you don't mind me asking?
  • If rsync can get me better speeds, I'd definitely be inclined to move away from snapshot replication, and hopefully it can handle keeping 41.6 TiB in sync. I've played with rsync before when needing to move a bit of data and ended up saying "screw it", temporarily turning one 826 into a makeshift JBOD chassis to speed up the process. Once cascaded off the expander, I found rclone to be much more performant than rsync, which it is modeled after (I *think* rsync is single-threaded / a PITA to multi-thread?).
I do use replication tasks for most of my file syncing chores... But sometimes rsync is handy.

I'm running an rsync test right now with the schema I posted earlier. Sad to report that I'm only getting ~500Mb/s over my 10G network. :confused:

When that completes, I'll scrape away the test target data and try a replication job. We'll see if that runs any faster. I'll post results later on.

The only unusual (?) things I do with replication tasks are:
  • Replication Stream Compression = OFF
  • Encryption Cipher = DISABLED
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
I do use replication tasks for most of my file syncing chores... But sometimes rsync is handy.
  • Got it - thanks for clarifying.
I'm running an rsync test right now with the schema I posted earlier. Sad to report that I'm only getting ~500Mb/s over my 10G network. :confused:
  • Jumbo frames??? :oops: (kidding - I'm sure you recall my epically bad sense of humor)
  • I'd take 500 MB/s any day of the week and twice on Sunday compared to current.
  • It's a shame that I'm essentially only saturating 2 x 1G links (hypothetically speaking) when I have a direct connection that supports 40G. I managed to get that link to ~19G via iperf (FN to FN), but using iperf in ESXi (to ESXi) nets me ~33G+.
  • Note, I don't get better replication speeds when pushing across the switched 10G connection (and iperf nets ~9G+).
  • While I have plenty of headroom even at ~19G (obviously the pool's write speed, which can saturate 10G, should be the bottleneck), for some reason I'm missing some tuning in FN. Care to share the tunables / iface options you are running? (The sort of tunables I keep seeing suggested are sketched at the end of this post.)
When that completes, I'll scrape away the test target data and try a replication job. We'll see if that runs any faster. I'll post results later on.
  • I eagerly await hearing back (at your convenience, o/c).
The only unusual (?) things I do with replication tasks is:
  • Replication Stream Compression = OFF
  • Encryption Cipher = DISABLED
  • Same here ... should help, not hurt o/c.

As an aside, would you agree with my assertion that clock speed isn't the bottleneck? Just curious.
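For what it's worth, the only tuning I keep seeing suggested for 10G+ links is larger socket buffers - illustrative starting points only, not a recommendation (added as sysctl-type Tunables in the GUI):

Code:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216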
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
That's a lowercase 'b', so I'm only getting 500 megabits per second. I'd be jumping with joy if it were megabytes instead! :D

Turns out that replication runs about three times faster than rsync on my system: 1.5Gb/s vs 500Mb/s. Nowhere near the capacity of my network, but better than 1G Ethernet!

[attached image: rsync-vs-replication.jpg]
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
That's a lowercase 'b', so I'm only getting 500 megabits per second. I'd be jumping with joy if it were megabytes instead! :D
  • Sorry let me correct that ...
  • zfs set casesensitivity=sensitive svtkobra7
Turns out that replication runs about three times faster than rsync on my system: 1.5Gb/s vs 500Mb/s. Nowhere near the capacity of my network, but better than 1G Ethernet!
  • replication > rsync doesn't surprise me given prior experience (thanks for the follow up).
  • So we are in the same neighborhood in regards to snapshot replication speed, which may actually support the ssh CPU bound theory I dislike.
    • Was Bacon pushing (2.8 GHz)? I have a 2.8 GHz processor pushing as well, and if you look at max throughput, it is 1.8G in my case and 1.7G in yours. But likely I'm looking to connect dots that aren't meant to be connected.
  • Do you have any theories, even if working, unsupported as to what is going on here?
In my case, it clearly isn't the network (as is well established), as FreeNAS has been pushing nearly 20G for the past hour ...
  • rx = iperf (vmx0 = 10G switched / vmx1 = 40G direct)
  • tx = snapshot replication, which interestingly isn't hindered noticeably with all of that traffic.

[attached image: net.png]
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
  • Sorry let me correct that ...
  • zfs set casesensitivity=sensitive svtkobra7

  • replication > rsync doesn't surprise me given prior experience (thanks for the follow up).
  • So we are in the same neighborhood in regards to snapshot replication speed, which may actually support the ssh CPU bound theory I dislike.
    • Was Bacon pushing (2.8 GHz)? I have a 2.8 GHz processor pushing as well, and if you look at max throughput, it is 1.8G in my case and 1.7G in yours. But likely I'm looking to connect dots that aren't meant to be connected.
  • Do you have any theories, even if working, unsupported as to what is going on here?
In my case, it clearly isn't the network (as is well established), as FreeNAS has been pushing nearly 20G for the past hour ...
  • rx = iperf (vmx0 = 10G switched / vmx1 = 40G direct)
  • tx = snapshot replication, which interestingly isn't hindered noticeably with all of that traffic.

[attached image: net.png]

I do not have any theories as to why replication and rsync are both so slow compared to the 10G network infrastructure. I get iperf test results at near line rates -- greater than 9Gb/s between my two 10G systems, 'bandit' and 'boomer' -- but this isn't reflected in real-world performance.

That's why I've been watching your post. I'm hoping that someone more knowledgeable will come along and set us both straight. ;)
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
@dlavigne ... separate issue I wanted to call out for your benefit: Replication Tasks | Potential Bug + User Guide Clarification Suggestion

execsum: Encrypted pools + replication don't play nice: at reboot, because the pool is locked, a complete re-sync from scratch is triggered. I'm unsure if this has been documented before (I couldn't find note of it), but unencrypted pools exhibit different behavior: provided the last replication completed prior to reboot, replication continues incrementally rather than from scratch. Minimally, I'd suggest a note be added to the user guide (even if this is a bit of an edge case); optimally, this gets flagged as a bug and remedied.

Thanks in advance for any assistance you can provide.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Your respective systems leave my puny 10GbE network in the dust. However, in a first round of informal testing, I have found a 500GB L2ARC to be helpful: the time it took to clone a share went from about 6 hours to about 3 hours. I have yet to confirm that this behavior is consistent, however. I can't take credit for this solution - someone else recommended an L2ARC with rsync because it allegedly traverses all the directories over and over, and that can take a lot of time if you have many small files.

I also want to see if setting the secondary cache to metadata only leads to more permanent improvements (i.e. 500GB is likely enough to capture all the metadata associated with a few terabytes of files). Of note, however, is that I have 64GB of memory in this server, so the FreeNAS gods hopefully approve of my system having an L2ARC in the first place. :)
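In case anyone wants to repeat the experiment, the moving parts are just a cache vdev plus one dataset property - pool, dataset, and device names are placeholders:

Code:
# attach the 500GB SSD as an L2ARC (cache) device
zpool add tank cache gpt/l2arc0
# cache only metadata for the dataset rsync walks repeatedly
zfs set secondarycache=metadata tank/share
# revert to the default (data + metadata) if it doesn't pan out
zfs set secondarycache=all tank/share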
 