Performance was good, suddenly woeful

Status
Not open for further replies.

solv

Cadet
Joined
Oct 24, 2011
Messages
6
I'm seeing a sudden slowdown, mainly with Samba, but I'm not 100% sure Samba itself is the issue.

I have 3 network interfaces: one is for management, and the other two are link aggregated in LACP mode, with the switch ports configured as an LACP trunk (rough CLI sketch of the lagg below).
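For reference, this is roughly what that lagg amounts to from the FreeBSD command line. I actually set it up through the FreeNAS GUI, so take this as an approximation using my interface names (re0, em0) and the lagg's IP, rather than the exact commands the GUI runs:

# create the LACP lagg from the two data NICs and give it the share IP
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport re0 laggport em0
ifconfig lagg0 inet 172.16.0.186 netmask 255.255.255.0 up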

Things were humming along at about 60 MB/s read and write to the Samba share.
Someone screwed around with my switch and I had to reset it to factory defaults and reconfigure it.

Suddenly SMB has dropped to roughly 10 MB/s read and 4 MB/s write.

I tested with iperf on both the management interface and the lagg, and the network side appears to be okay. I'm not really sure, though, since I didn't run iperf before the issue, so I have nothing to compare against. My iperf results are below:

tech@opensuse:~> iperf -c 172.16.0.6 -i2 <---- (This is the management interface ip)
------------------------------------------------------------
Client connecting to 172.16.0.6, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.129 port 42805 connected with 172.16.0.6 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 2.0 sec 22.5 MBytes 94.4 Mbits/sec
[ 3] 2.0- 4.0 sec 22.5 MBytes 94.4 Mbits/sec
[ 3] 4.0- 6.0 sec 22.4 MBytes 93.8 Mbits/sec
[ 3] 6.0- 8.0 sec 22.5 MBytes 94.4 Mbits/sec
[ 3] 8.0-10.0 sec 22.4 MBytes 93.8 Mbits/sec
[ 3] 0.0-10.0 sec 112 MBytes 94.3 Mbits/sec
tech@opensuse:~> iperf -c 172.16.0.186 -i2 <----- (This is the lagg ip address)
------------------------------------------------------------
Client connecting to 172.16.0.186, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.129 port 33762 connected with 172.16.0.186 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 2.0 sec 166 MBytes 697 Mbits/sec
[ 3] 2.0- 4.0 sec 166 MBytes 696 Mbits/sec
[ 3] 4.0- 6.0 sec 166 MBytes 698 Mbits/sec
[ 3] 6.0- 8.0 sec 167 MBytes 699 Mbits/sec
[ 3] 8.0-10.0 sec 167 MBytes 700 Mbits/sec
[ 3] 0.0-10.0 sec 832 MBytes 698 Mbits/sec
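For completeness, the listening end of both tests is just iperf running in server mode on the FreeNAS box (assuming iperf is available there, which it is on mine):

# run on the NAS; listens on TCP port 5001 by default
iperf -s

Each client run above then just points at whichever NAS address I want to test (172.16.0.6 for management, 172.16.0.186 for the lagg).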

If I do an ifconfig, the lagg0 link appears to be working as expected:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:1b:21:41:cd:73
        inet 172.16.0.186 netmask 0xffffff00 broadcast 172.16.0.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: re0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Lastly, if I do an HDD speed test on my RAID-Z, I get a measly 70 MB/s:

dd if=/dev/zero of=/mnt/allstorage/tech/testfile bs=8192k count=1000
1000+0 records in
1000+0 records out
8388608000 bytes transferred in 114.727514 secs (73117666 bytes/sec)
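For a rough look at the read side I could stream the same file back; this is just a sketch reusing the test file written above (and keep in mind that ARC caching can inflate the number if the file fits in RAM, and that writing from /dev/zero can look faster than real data if compression is enabled on the dataset):

# read back the 8 GB test file and throw it away
dd if=/mnt/allstorage/tech/testfile of=/dev/null bs=8192k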

So I'm kind of wondering if the RAID-Z is the issue, and where I can look to do further testing.

Any help appreciated
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're using two different types of ethernet interfaces for the lagg?

You're only getting 700Mbps out of the lagg?

What's the point of the lagg, again, exactly? You might want to try seeing what happens with just em0 (presumably a halfway decent Intel interface known/expected to work well). The network is probably not your problem, but you may have created unnecessary complexity which may not be helping.

70 MB/s is not awful write performance for at least some use cases. You haven't said anything about the storage side of the equation, yet you seem to feel that might be where the problem is, so all I can really say is that my N36L RAIDZ2 with WD Caviars is a bit slower than that.

8388608000 bytes transferred in 130.662268 secs (64200692 bytes/sec)
 

solv

Cadet
Joined
Oct 24, 2011
Messages
6
You're using two different types of ethernet interfaces for the lagg?

Yeah, probably not a great idea, but I have to put these things together at work from spare parts because my boss can't see the point in 'DIY' anything.

You're only getting 700Mbps out of the lagg?

Well, remember I'm testing it from a PC that only has one Ethernet port connected, so that's perfectly acceptable: a single client's connection only ever rides one member link of the LACP bundle. What I failed to test was two clients running iperf against the NAS concurrently, to see if they both maintained that speed; I'll do so at some point (rough sketch below).
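Something like the following, run at the same time from two separate client machines, is what I have in mind; the clientA/clientB prompts are just placeholders, and the lagg address is the real one:

# run these simultaneously from two different PCs
clientA$ iperf -c 172.16.0.186 -i2 -t30
clientB$ iperf -c 172.16.0.186 -i2 -t30

If the LACP hash puts the two flows on different member links, each client should hold roughly the single-client speed shown above.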

What's the point of the lagg, again, exactly? You might want to try seeing what happens with just em0 (presumably a halfway decent Intel interface known/expected to work well). The network is probably not your problem, but you may have created unnecessary complexity which may not be helping.

There are about five techs at work who grab ISOs, service packs, program installers, etc. on a fairly regular basis, so I want them to get maximum speed when multiple connections are happening at the same time... plus, just because I can =)
That said, I had tested disabling the port trunking on the switch but had forgotten to remove the lagg from the FreeNAS side. I did as you suggested, removed the lagg, and just enabled the em0 NIC, and now I'm getting good speed (roughly what I did, CLI-wise, below). Which is nice... but I hate it when something that has been working fine suddenly stops!
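For anyone following along, this is approximately the CLI equivalent of what I did through the GUI (treat it as a sketch; I just moved the lagg's old IP onto em0):

# tear down the lagg and put the address directly on the Intel NIC
ifconfig lagg0 destroy
ifconfig em0 inet 172.16.0.186 netmask 255.255.255.0 up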

70MBytes/sec is not awful write performance for at least some use cases. You haven't said anything about the storage side of the equation yet you seem to feel that might be where the problem is, so all I can really say is that my N36L RAIDZ2 with WD Caviars is a bit slower than that.
8388608000 bytes transferred in 130.662268 secs (64200692 bytes/sec)

I did some tweaking of loader.conf to adjust the max memory tunables and disabled the ZIL, and now my disk performance has jumped to 110 MB/s, so that is a nice outcome too (rough idea of the tunables below).
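Roughly the kind of thing I mean is below. The exact tunable names and sensible values depend on the FreeNAS/FreeBSD version and the amount of RAM in the box, so these are example values only, not recommendations; also note that disabling the ZIL trades data safety on power loss for write speed:

# /boot/loader.conf (example values only)
vm.kmem_size="1536M"        # size of the kernel memory map
vfs.zfs.arc_max="1024M"     # cap the ZFS ARC
vfs.zfs.zil_disable="1"     # old-style ZIL disable; newer ZFS uses the per-dataset sync property instead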
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sounds like a win. :smile: I would note that the lagg thing will likely be less-well-tested on crummy hardware... typically the people that need lagg have already put a decent server-grade interface in their system and found it deficient, and I'd bet that such a system would have been the starting point for the birth of FreeBSD lagg, rather than someone trying to scrape together some rl and sis interfaces or something like that. That's not to say that it won't work or it shouldn't work, just that it's likely to be less well-tested and less supported.
 

solv

Cadet
Joined
Oct 24, 2011
Messages
6
Hey guys, just thought I'd post an update on this.
I finally found the issue after all this time: it was my managed switch.
After a douchebag created a loopback on our network, I enabled Spanning Tree Protocol to detect loopbacks, but this seems to have interfered with link aggregation.
It also causes problems with DHCP for new hardware connecting on any port, as it adds a 15-second forwarding delay while it checks for loops.
So yeah, turn off STP unless you really need it (see the note below)!
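One middle ground, if the switch supports it, is to leave STP on but mark the client-facing ports as edge ports so they skip the forwarding delay (and use Rapid STP where available). The snippet below is Cisco-IOS-style and purely illustrative; the port number is a placeholder and the exact commands depend on the switch vendor:

! example only; do not set edge/portfast on the LACP trunk ports
interface GigabitEthernet0/5
 spanning-tree portfast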

Not sure how to mark this as solved
 