Replication bottleneck on 10GbE


helloha

Contributor
Joined
Jul 6, 2014
Messages
109
I finally got data replication to my secondary server working, but now I notice that the speed hovers at around 200 MB/s, while my pool can reach speeds of up to 700 MB/s over my 10GbE connection.

I am running low-power CPUs in my setup, and when I run top I see this:

Screen Shot 2017-04-20 at 19.48.31.png


Would the bottleneck be the CPU? I disabled encryption completely and stream compression is set to OFF (since it's mainly incompressible data).

THX!
K.
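(For reference, a raw TCP throughput test between the two machines can rule out the link itself, assuming iperf3 is available on both; `10.0.0.2` below is just a placeholder for the secondary server's address:)

```shell
# On the secondary (receiving) server, start a listener:
iperf3 -s

# On the primary server, run a 30-second test against it:
iperf3 -c 10.0.0.2 -t 30
# A result near 9-10 Gb/s means the link is fine and the bottleneck is elsewhere.
```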
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
How low-power? That CPU certainly looks maxed-out, but it may be waiting for memory or something.
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What speed is the RAM running at? It could conceivably be the bottleneck.
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109
It's running at 1333 MHz. Could this be the issue? Because internally I can max out my pools. I even did stripe tests with 14 disks and could reach speeds of up to 2400 MB/s.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
No, Avoton with dual-channel at 1600MHz can do at least 5Gb/s with SMB, so triple-channel 1333MHz makes a RAM bottleneck unlikely (yes, it's a simplistic comparison, but Atom is a simpler core than Westmere and has less cache).

However, I am somewhat concerned that it's running so fast, since the CPU is only rated for 1066MHz and the microcode might not like that, even if the memory controller works fine (my i7-930 ran fine for years at 1600MHz). Wouldn't be causing your trouble, though.
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109
I could try experimenting with the E5540s that I have around, but that seems only marginally faster. Is ssh multithreaded? It doesn't seem like it.

But I don't feel like breaking my back again by moving 80-pound servers around... I'll see what I can do, and if not, incremental backups don't take that long anyway...
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109
Can anyone tell me what happens when you interrupt the data replication? Does it completely start over the next time it runs, or does it work incrementally and continue where it left off?
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,477
To my knowledge, replication has to start all over. This is one advantage that rsync has over replication: if rsync is interrupted, it can pick right back up where it left off.
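For what it's worth, ZFS versions with resumable send/receive (OpenZFS from around the FreeBSD 11 era) can continue an interrupted stream, although the built-in replication task may not make use of it. A sketch with hypothetical pool/dataset/host names (`tank/data`, `backup/data`, `backuphost`):

```shell
# Start the receive with -s so an interrupted stream leaves a resume token
# on the destination instead of being discarded:
zfs send tank/data@snap1 | ssh backuphost zfs receive -s backup/data

# After an interruption, read the token from the destination dataset...
token=$(ssh backuphost zfs get -H -o value receive_resume_token backup/data)

# ...and restart the send from where it stopped:
zfs send -t "$token" | ssh backuphost zfs receive -s backup/data
```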
 

helloha

Contributor
Joined
Jul 6, 2014
Messages
109
When I'm sending the snapshot, the ssh command is taking 75% of the CPU. Would it be logical to assume that this also impacts transfer speeds when I use other protocols to pull data off the server simultaneously?

My snapshot takes about 8-10 h to complete at 200 MB/s (11TB). When making copies to a USB SSD that can write 350-400 MB/s, I seem to hit 100 MB/s.
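Since ssh is single-threaded, the cipher can make a real difference on a weak core. One rough way to check is to push a stream of zeros through ssh with different ciphers and compare the rate dd reports (`backuphost` is a placeholder for the secondary server):

```shell
# Compare ssh throughput per cipher; on FreeBSD, dd prints the achieved
# transfer rate on stderr when it finishes (bs=1m is FreeBSD dd syntax).
for c in aes128-ctr aes128-gcm@openssh.com chacha20-poly1305@openssh.com; do
  echo "cipher: $c"
  dd if=/dev/zero bs=1m count=2048 | ssh -c "$c" backuphost 'cat > /dev/null'
done
```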

I was wondering if upgrading the CPU would make much difference. I also have additional CPUs, so I could populate the second socket to see what happens, but I don't know if that would change much either.

Code:
last pid: 32115;  load averages:  2.91,  3.02,  3.05    up 0+06:49:11  18:50:32
52 processes:  5 running, 47 sleeping
CPU:  7.0% user,  0.0% nice, 23.2% system,  3.7% interrupt, 66.1% idle
Mem: 686M Active, 38G Inact, 7231M Wired, 646M Cache, 242M Free
ARC: 5526M Total, 4242M MFU, 996M MRU, 33M Anon, 38M Header, 217M Other
Swap: 44G Total, 452M Used, 44G Free, 1% Inuse, 4K In, 1628K Out

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 4922 root         1  94    0 56752K  4812K CPU7    7 283:02  74.27% ssh
 4920 root         1  47    0 12340K  2716K CPU1    1 142:35  37.60% dd
 4919 root         1  52    0 12340K  2720K RUN     6 161:57  36.28% dd
26750 root         1  33    0   146M 15232K CPU0    0  12:37  20.36% afpd
 4917 root         2  27    0 48620K  2368K pipewr  3  54:41  13.57% zfs
 4921 root         1  27    0  9256K  1756K select  2  50:48  12.89% pipewatcher
23911 root         1  22    0   142M  8576K select  3   3:53   3.96% afpd
 2107 root         1 -52   r0  6304K  2272K nanslp  4   1:57   0.20% watchdogd
11839 root         1  20    0 65332K  9752K select  2   3:50   0.10% cnid_dbd
 3619 root         1  52    0   261M 17220K select  0   0:41   0.10% python2.7

Also, I'm not too sure how to read the top output. Is the CPU completely loaded? Because under ZFS Reporting it seems to be only at about 40% load.

zfsgraph.png
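One thing to keep in mind when reading that output: top's idle figure is averaged over all cores, so a single-threaded process like ssh can be pegged on one core while the machine as a whole looks mostly idle. Rough arithmetic, assuming this box has 8 logical CPUs (the `C` column above goes up to 7):

```shell
# ssh's WCPU in top is a percentage of ONE core; divide by the core count
# to get its share of total CPU capacity.
cores=8       # assumption: 8 logical CPUs
ssh_wcpu=74   # ssh's WCPU from top, percent of a single core
echo "$(( ssh_wcpu / cores ))% of total CPU"   # prints "9% of total CPU"
```

So ssh consuming ~74% of one core only shows up as ~9% in the aggregate, which is why the summary line still reports 66% idle even though the replication pipeline is effectively CPU-bound on that one core.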
 