HP N36L FreeNAS to HP N36L FreeNAS rsync


nogi

Explorer
Joined
Jul 5, 2011
Messages
74
I have 2 microservers configured the same:
  • HP N36L Microserver
  • 8GB ECC RAM
  • 4 x 3TB Hitachi 7K3000 HDDs (RAIDZ, ZFS)
  • HP NC360T Dual Gig NIC

I currently have the NC360T plus the onboard NIC configured in a LAGG. Since the onboard NIC doesn't support jumbo frames, I haven't enabled them, so all three NICs are basically running in default mode. This is my current performance between the two boxes: http://www.bebetech.com/images/freenas/LAGG_Perf.tiff.

I am syncing around 6.2TB of data from one box to the other, and it's day 3 and I'm only 30% of the way there. :(

Would I get better performance by disabling the onboard NIC, keeping only the two NICs on the NC360T in the LAGG, and enabling jumbo frames?

iperf:
Code:
[nogi@KANDOR] /> iperf -c 192.168.1.200 -t 60 -i 10 -f M
------------------------------------------------------------
Client connecting to 192.168.1.200, TCP port 5001
TCP window size: 0.03 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.201 port 62528 connected with 192.168.1.200 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   981 MBytes  98.1 MBytes/sec
[  3] 10.0-20.0 sec  1000 MBytes   100 MBytes/sec
[  3] 20.0-30.0 sec   903 MBytes  90.3 MBytes/sec
[  3] 30.0-40.0 sec   987 MBytes  98.7 MBytes/sec
[  3] 40.0-50.0 sec  1024 MBytes   102 MBytes/sec
[  3] 50.0-60.0 sec   966 MBytes  96.6 MBytes/sec
[  3]  0.0-60.0 sec  5861 MBytes  97.7 MBytes/sec
[nogi@KANDOR] /> 
 

Milhouse

Guru
Joined
Jun 1, 2011
Messages
564
I rsync'ed data between two N36Ls (each with 8GB RAM, a single Intel CT NIC with the onboard NIC disabled, and Samsung drives in both source and target) and achieved a consistent 65MB/second (roughly 520Mbit/s), so yes, something is definitely wrong. :)

I agree you should disable the bge interface; in my experience the Broadcom NIC is a major fail where FreeBSD is concerned - I've written about it elsewhere.

Jumbo frames might improve performance slightly, so they're worth a shot once you've got your baseline to where it should be.
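
For reference, enabling them is just a matter of bumping the MTU on the LAGG interface and on the switch ports (lagg0 here is an assumption - use whatever your LAGG interface is actually called, and everything in the path has to agree on the MTU):
Code:
ifconfig lagg0 mtu 9000

The usual approach in FreeNAS is to set this via the interface options in the GUI rather than from the shell, so it persists across reboots.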
 

louisk

Patron
Joined
Aug 10, 2011
Messages
441
Unless you have specifically configured your LAGG and switch, you will not see a benefit from using multiple links. The default behaviour for LACP gives you additional throughput for additional clients, but not for a single client (which sounds like what you have in this scenario).

That said, you should be able to iperf at least 800Mbit/s (somewhere around 100MB/s). Your iperf numbers look pretty reasonable at between 96MB/s and 102MB/s (multiply by 8 if you want to see Mbits). Unless you have some 10G interfaces, I think you're maxing out your 1G network with the iperf. I don't think jumbo frames will make a difference, since you're already maxed out.

That leaves the question of what is slowing rsync down. I would look at "Display System Processes" while an rsync is running and see what percentage of the CPU is being consumed by ssh and what percentage by rsync. If ssh is running at 100%, that is your bottleneck, and I would suggest you set up an rsync module on the other machine and use the rsync protocol instead of ssh - you will see much higher speed without the crypto overhead (see the sketch below). That said, if you're doing this over an untrusted network, you will probably have to live with the slower speed, because you don't want to compromise the integrity of your data.
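
Something along these lines would do it (the module name and path are just examples - adjust to your pool, and on FreeNAS you can also enable the built-in rsync service from the GUI instead of editing the file by hand):
Code:
# rsyncd.conf on the receiving box
# (typically /usr/local/etc/rsyncd.conf on FreeBSD, /etc/rsyncd.conf elsewhere)
uid = root
gid = wheel
use chroot = yes

[backup]
    path = /mnt/tank
    read only = no

Start it with "rsync --daemon" (or enable the rsync service), then push from the sender using the double-colon syntax so it speaks the plain rsync protocol instead of tunnelling over ssh:
Code:
rsync -av --progress /mnt/tank/ 192.168.1.200::backup/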
 

nogi

Explorer
Joined
Jul 5, 2011
Messages
74
Unless you have specifically configured your LAGG and switch, you will not see a benefit to using multiple links.

I'm using an HP ProCurve switch which supports LACP and LAGG.


That leaves the question of what is slowing things down with rsync. I would look at the "Display System Processes" while you're doing an rsync and see what percentage of the CPU is being consumed by SSH and what percentage is being consumed by rsync.

On Sender box:
Code:
PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 3891 root          1 118    0 48692K 11204K CPU1    1  81.5H 98.97% rsync
 2266 root          7  44    0 65044K  7440K ucond   0   2:20  0.00% collectd
 2169 root          6  44    0   126M 65944K uwait   1   0:30  0.00% python
 1916 root          1  44    0 11776K  2380K select  0   0:13  0.00% ntpd
 2644 www           1  44    0 19324K  3636K kqread  1   0:10  0.00% lighttpd
 1712 root          1  44    0 38056K  4516K select  0   0:05  0.00% nmbd
 1716 root          1  44    0 46640K  6720K select  0   0:05  0.00% smbd
 8494 avahi         1  44    0 16932K  2400K select  0   0:03  0.00% avahi-daemon
 6821 root          6  76    0  5684K  1076K rpcsvc  1   0:03  0.00% nfsd
 2104 root          1  44    0 16084K  3284K select  0   0:02  0.00% proftpd
 4479 root          1  50    0  7832K  1304K nanslp  0   0:02  0.00% cron
 1348 root          1  44    0  6904K  1304K select  0   0:01  0.00% syslogd
41548 root          1  44    0 48692K  8128K select  0   0:01  0.00% smbd
 2733 root          1  76    0 64096K 23244K ttyin   0   0:00  0.00% python
 2558 root          1  44    0  7836K  1448K select  0   0:00  0.00% rpcbind
 6820 root          1  44    0  5684K  1244K select  0   0:00  0.00% nfsd
41549 root          1  44    0 46832K  7324K select  0   0:00  0.00% smbd
 1746 root          1  44    0 46628K  6648K select  0   0:00  0.00% smbd


On Receiver box:
Code:
PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
34420 root          1  57    0 48820K  6972K select  1 889:33 20.17% rsync
 2924 root          7  44    0 67092K  9900K ucond   1   3:44  0.00% collectd
 1942 root          6  44    0   170M   107M piperd  1   0:49  0.00% python
 2597 root          1  44    0 11776K  2876K select  0   0:23  0.00% ntpd
 2352 www           1  44    0 19324K  4176K kqread  1   0:19  0.00% lighttpd
34413 root          1  44    0 46772K  6256K select  1   0:12  0.00% rsync
29512 root          1  44    0 38048K  6580K select  0   0:07  0.00% nmbd
30867 avahi         1  44    0 16932K  2956K select  0   0:06  0.00% avahi-daemon
 7820 root          4  44    0  5684K  1236K rpcsvc  0   0:04  0.00% nfsd
 2096 root          1  49    0  7832K  1512K nanslp  0   0:03  0.00% cron
 1370 root          1  44    0  6904K  1520K select  0   0:02  0.00% syslogd
29516 root          1  44    0 46624K  9156K select  0   0:01  0.00% smbd
 2269 root          1  44    0  7836K  1644K select  0   0:01  0.00% rpcbind
 2441 root          1  46    0 64096K 24424K ttyin   1   0:00  0.00% python
 7819 root          1  76    0  5684K  1488K select  0   0:00  0.00% nfsd
23170 root          1  44    0 24972K  4636K select  0   0:00  0.00% sshd
 1024 root          1  44    0  3200K   712K select  1   0:00  0.00% devd
62212 root          1  44    0  9224K  2060K CPU0    0   0:00  0.00% top
 

louisk

Patron
Joined
Aug 10, 2011
Messages
441
Interesting that the performance isn't the same on both boxes.

Are both boxes configured the same way? For example, both have the same number of spindles configured in the same way using ZFS, both have the same number of network interfaces, aggregated the same way...
 

nogi

Explorer
Joined
Jul 5, 2011
Messages
74
Both boxes have identical configurations with the same model components. The only difference is that the box with the higher processor usage is also serving shares to the network, although there wasn't much traffic at the time other than the rsync.
 

nogi

Explorer
Joined
Jul 5, 2011
Messages
74
I'll post the details when I get home, but I essentially used this guide for the FreeNAS to FreeNAS setup.
 

louisk

Patron
Joined
Aug 10, 2011
Messages
441
OK, I would start by doing the following:
1) Break the lagg, pick a single interface on each system, and configure it (rough command sketch below the list).
2) Run your rsync test again.
3) If you still have disparate performance metrics between the two boxes, log in to the console of the one with the higher CPU usage and run 'systat -vmstat'. Give it about 30 seconds, then look towards the bottom left for the name of your interface and note its interrupt counters. Now log in to the other box, run the same command, and compare the counters. If they are similar, we'll have to look somewhere else. Type ':q' when you're done with systat to exit to a shell.
4) It could be that one interface is acting oddly; you can try configuring different interfaces and see whether they behave differently or the same.
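
Roughly like this from the shell - the em0 name and address are just examples for one of the NC360T ports, and if the lagg was built through the FreeNAS GUI you'd delete it there instead:
Code:
ifconfig lagg0 destroy                                    # tear down the lagg (or delete the Link Aggregation in the GUI)
ifconfig em0 inet 192.168.1.201 netmask 255.255.255.0 up  # bring up a single NIC with a static address
systat -vmstat 5                                          # interrupt counters are in the block at the bottom left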

I just helped a guy debug an issue where the interrupts were through the roof when copying files over the network. It turned out a BIOS patch solved the problem.
 