ESXi 5.5 Network Performance Comparison with VMXNET and Intel EM


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Okay so this isn't exactly FreeNAS networking, but I thought I'd post a few results here since it is interesting and related.

On a nice new E5-2697 machine (2.7GHz), I set up two FreeNAS instances for the purpose of experimenting with network performance. Since both instances live on the same host, the virtual network isn't limited to 1Gbps, and I was curious to see how Intel EM stacked up against VMXNET 2 and VMXNET 3, along with other variables like MTU. The tests were all unidirectional, 20 seconds, run with iperf; most were run several times and I eyeball-picked the average. The same settings were applied on each side. The goal was to approximate the use model of a fileserver serving a single file now and then. I have pruned a lot of superfluous noise and am including just the "statistically interesting" results.
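
For reference, a minimal sketch of the kind of invocation used for these runs (the 10.0.0.x address is a placeholder, not an actual test address):

Code:
# receiving side: iperf server with an explicit 256K window
iperf -s -w 256k

# sending side: single unidirectional stream, 20 seconds, matching 256K window
iperf -c 10.0.0.2 -t 20 -w 256k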

Other observations:

VMXNET 2 has some problem under 5.5. I don't know what, but it failed with both the built-in FreeNAS driver (which I believe is a derivative of the Open VM Tools VMXNET2 driver) and the VMware-supplied version. It would work for just a bit and then crap out.

With an MTU of 1500, EM capped out around 2Gbps and VMXNET 3 at around 3Gbps, regardless of window size. With the larger MTU, a 256K window seems to provide the best throughput; going larger degrades it again.

Code:
em0,  mtu 1500, default window    ~2Gbps
em0,  mtu 9000, default window    ~1.7Gbps
em0,  mtu 9000, 192K window       ~5.3Gbps
em0,  mtu 1500, 256K window       ~2Gbps
em0,  mtu 9000, 256K window       ~5.6Gbps
em0,  mtu 9000, 384K window       ~4.4Gbps
em0,  mtu 9000, 512K window       ~3.9Gbps

vmx3, mtu 1500, default window    ~3Gbps
vmx3, mtu 9000, default window    ~2Gbps
vmx3, mtu 9000, 192K window       ~2.3Gbps
vmx3, mtu 9000, 256K window       ~2.5Gbps
vmx3, mtu 1500, 256K window       ~3Gbps
vmx3, mtu 9000, 384K window       ~2.4Gbps


The surprise was that while VMXNET 3 is more efficient than Intel EM at 1500 MTU, its performance plummets with the large MTU (3Gbps down to 2Gbps at the default window), whereas Intel EM drops only a little (2Gbps down to 1.7Gbps) and then takes off with the 256K window.

So, hey, if you're doing jumbo frames, try the Intel EM driver and a 256K window size.
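
If you want to try that combination, something like the following should approximate it on the FreeBSD side. The interface name and the idea of using the TCP buffer sysctls as the "window" knob are assumptions on my part; iperf's -w flag does the same thing per socket, and the vSwitch has to allow MTU 9000 as well.

Code:
# bump the guest interface to jumbo frames (vSwitch/port group must also allow MTU 9000)
ifconfig em0 mtu 9000

# raise the default TCP socket buffers to ~256K (roughly what iperf -w 256k does per socket)
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.tcp.recvspace=262144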
 

NachoMan77

Dabbler
Joined
Sep 23, 2013
Messages
17
Interesting. I'm trying 5.5 myself.

BTW, what version of virtual hardware are you running FN on?

What about CPU usage?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Interesting. I'm trying 5.5 myself.

BTW, what version of virtual hardware are you running FN on?

8, for the purposes of this experiment. We're actually an ESXi 4.1 shop. 5.5 has turned out to be something of a disappointment. Features such as Flash Read Cache are only available under machine version vmx-10, and there's this incredible cliff that they've made in not supporting the legacy vSphere client for vmx-10. Funny? Really! You can actually bring up a console window but you can't mount a CD with the legacy client on a vmx-10 VM. Ugh. So I'm not looking forward to a future involving both the legacy client and the craptacular Web client. And the HTML5 console absolutely sucks compared to the legacy client.

Whoever at VMware thought I was going to welcome doing all my VM management through a web browser should be fired. I already have far too many tabs open all the time.

What about CPU usage?

It was nailing the CPU pretty hard, which was about what I expected.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Is 5.5 official? I keep hearing about it but I'm not finding any 5.5 update anywhere on VMware's website.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's been out for ... wanna say weeks now.

http://www.vmware.com/products/vsphere/features/esxi-hypervisor.html

And I do think it is a disaster in some ways. For example, I don't have any great love of Windows but we do use it for some things. Interestingly enough among them... administering vSphere 4.1 - which pretty much ONLY works on Windows. So we've got some XP boxes that are suitable.

So get this crap:

VMware switched the SSL code to require higher grade encryption (which Windows XP doesn't support). But there's a tech note on how to disable that - but only for vCenter Server. ESXi itself still mandates Vista or better to run the legacy client. And ESXi by itself doesn't support the new Web Client (featuring Adobe Flash Technology!). So if you have Free ESXi 5.5 and only an XP box, you are entirely out of luck - you cannot manage your system with the VMware tools. And if you have vCenter, then you get a different set of compromises. Coming from the rational and unified management environment that is vSphere 4.1, all I have to say is that 5.5 is teh suck.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I installed 5.5 on my system. I thought I could downgrade back to 5.1 easily. I was wrong. I had to reinstall ESXi from scratch and import my VMs. Total time to undo my 5.5 installation: about an hour.

But I had serious problems. I kept getting a PSOD with my Linux VM. It worked perfectly on 5.1, but after upgrading to 5.5 the VM would only boot for a few seconds before the whole box PSODed. Not cool at all.

Then it's as you already discussed. The standalone client is being phased out in favor of the web client. Basically the standalone client can do everything that the 5.1 client can do, but any new features require the web client. And apparently, if you haven't paid for a license of ESXi, the web client is a feature that isn't available to you. So basically anyone with the free ESXi license is stuck between a rock and a hard place.

Overall, I'll stick with 5.1 until VMware figures out what they are doing (and hopefully fixes whatever problem is PSODing my box). :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Actually it isn't a matter of a paid license for ESXi. It's a matter of using vCenter Server, which is a separate machine that manages all your ESXi hosts. It used to be that you had to run it on a separate Windows machine, which involves some other nasty stuff like a dedicated machine and Windows Server. Now you can get the vCenter Server Appliance, which is a Linux VM(!) that has everything mostly ready to go or wizard-friendly. Except they still basically don't make proper SSL easy.

So anyways yeah the problem is that there's no obvious current path to get vmx-10 features available on free ESXi. It seems an odd choice to make when ESXi is slowly but inevitably losing its lead to Xen.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
It used to be that you had to run it on a separate Windows machine, which involves some other nasty stuff like a machine and Windows Server. Now you can get the vCenter Server Appliance, which is a Linux VM(!) that has everything mostly ready to go or wizard-friendly. Except they still basically don't make proper SSL easy.

Really? I'm running a 'home lab' kind of thing with a separate 2008R2 VM for vCenter Server. Is the Linux vCenter Server Appliance compatible with ESXi 5.1, and does it accept connections from the Windows vSphere client? I'd love to dump the 2008R2 VM for a Linux VM that does the same thing. But I still like running the vSphere Windows client on my desktop.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, the VCSA came out with 5.1, didn't it? 5.5 mainly embiggens the number of VMs etc. it is supposed to handle. I have to assume it is compatible, but as we are a 4.1 shop and I have no spare hosts to play with right now, I haven't tried 5.5->4.1 management, and haven't touched 5.1 at all.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
VCSA was pretty much a mess under 5.x. They switched databases (from DB2 to PostgreSQL after the initial version or so), and the best way to update the beast was to simply delete it all, do a new install, and set everything up again. That said, from a home lab standpoint the VCSA needs too much RAM, whereas I can stick a regular VCS onto a Windows VM that does something else as well.

The current 5.5 VCSA can manage a wide array of ESXi versions; the general rule seems to be that your VCS needs to be at least the same version as the newest ESXi box you want it to manage. There is a VMware document that lists exactly which versions each release supports. VCSA-wise, 5.5 seems like the best version to run so far, though I've personally stuck with VCS on Windows because you still have to run Windows & SQL Server for Update Manager, so you might as well run VCS on the same stack and be done with it (yes, I'm lazy and like using Update Manager to push my updates when it's time to do them).

Nice work on the VMXNET testing. For now I'm still running the VMware driver that came with 5.1U1 in my FreeNAS VM; I'll do some tests with different MTU sizes and see if it matches what you got.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, our old 4.1 VCS is running on XP 64-bit, no longer supported by VMware. I'm the cheap sort who feels we've given Microsoft too much $$$ over the years, so I am fairly rigidly against spending cash on Windows Server. The 5.5 VCSA offers some reasonable value though.

If I could only figure out how to get local root CA SSL support working correctly...
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
So I did some benchmarking as well, system specs:

Xeon E3-1240v2, 32G ECC RAM, ESXi 5.5
I have one FreeNAS 9.1.1 VM with 16G of RAM, and a second Ubuntu 12.04 server VM with 4G of RAM.
This machine is set up as an all-in-one, where FreeNAS exports NFS back to ESXi to store VM images on. FreeNAS is using the vmxnet3 driver for networking, and the MTU is set to 1500. Sync is turned off at the ZFS level for testing.
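
(For anyone repeating this, disabling sync is roughly the following; the dataset name is just an example, not my actual layout.)

Code:
# disable synchronous writes on the dataset backing the NFS export (example name)
zfs set sync=disabled tank/vmstore
# confirm the setting
zfs get sync tank/vmstore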

What prompted this was writes that I thought should be going quite a bit faster: a large dd write to disk from within my Linux VM was getting around 45MB/s of write bandwidth, whereas within FreeNAS itself I was seeing ~350MB/s of sequential write bandwidth. Read bandwidth within the VM was 260MB/s, which was within a factor of two of the read bandwidth measured within FreeNAS. I started looking at the network and found some very strange behavior: bandwidth incoming to FreeNAS was only about 1/20th of the outgoing bandwidth, as measured by iperf:

Code:
[root@freenas] ~# iperf -i 10 -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 10.128.0.20 port 5001 connected with 10.128.0.131 port 60554
[ ID] Interval      Transfer    Bandwidth
[  4]  0.0-119.4 sec  18.9 GBytes  1.36 Gbits/sec
 
[root@freenas] ~# iperf -t 120 -i 10 -c 10.128.0.131
------------------------------------------------------------
Client connecting to 10.128.0.131, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.128.0.20 port 33751 connected with 10.128.0.131 port 5001
[ ID] Interval      Transfer    Bandwidth
[  3]  0.0-120.0 sec  453 GBytes  32.4 Gbits/sec


Can anyone else confirm this? And better yet, does anyone have a suggestion on how to fix this? I don't know for sure that this is the cause of my slow write speeds, since they weren't even maxing out the 1.3Gb/s speeds, but I'm sure this overhead can't be helping.
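
(If anyone wants to check for the same asymmetry in one shot, iperf 2's tradeoff mode runs both directions back to back from a single side; the flags below are just a sketch, with FreeNAS's address taken from the output above.)

Code:
# on FreeNAS
iperf -s
# on the other VM: client->server first, then server->client, sequentially
iperf -c 10.128.0.20 -t 60 -r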
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
David,

For what it's worth, here is your iperf test run between my 2 FreeNAS VMs, which are on physically separate servers connected via a SAN/vMotion-only 10GbE network. I've got about the same setup as you: both my VMs have dual vmxnet3 NICs, the 1st box is hosted on ESXi 5.1 and the 2nd is hosted on ESXi 5.5. Both VMs have 20GB of RAM and 4 vCPU cores, and I've got NFS configured for 4 nfs servers to match the 4 cores.

Code:
[root@san] ~# iperf -i 10 -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 192.168.101.11 port 5001 connected with 192.168.101.111 port 59482
[ ID] Interval      Transfer    Bandwidth
[  4]  0.0-10.0 sec  1.72 GBytes  1.48 Gbits/sec
[  4] 10.0-20.0 sec  1.85 GBytes  1.59 Gbits/sec

[root@sandbox] ~# iperf -t 120 -i 10 -c 192.168.101.11
------------------------------------------------------------
Client connecting to 192.168.101.11, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.101.111 port 59482 connected with 192.168.101.11 port 5001
[ ID] Interval      Transfer    Bandwidth
[  3]  0.0-10.0 sec  1.72 GBytes  1.48 Gbits/sec
[  3] 10.0-20.0 sec  1.85 GBytes  1.59 Gbits/sec

The above #s are about double what I saw when I hit the "management" network side, which is connected to a 1GbE network.

As another reference point, I get the following results on my CentOS 6 test mule VM, which is just a stock install of CentOS 6 with a 2nd unformatted disk attached to it that sits on my FreeNAS SAN, shared via NFS back to ESXi. I'm running sync=standard on the ZFS share, which is keeping my write speeds capped; I did a quick test with sync=disabled to show that the network could push more write bandwidth if I had a lower-latency SLOG device.

Code:
sync=standard
[root@test ~]# dd if=/dev/zero of=/dev/sdb bs=4k count=2000000
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 46.332 s, 177 MB/s
[root@test ~]# dd of=/dev/zero if=/dev/sdb bs=4k count=2000000
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 14.4661 s, 566 MB/s

sync=disabled
[root@test ~]# dd if=/dev/zero of=/dev/sdb bs=4k count=2000000
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 31.1689 s, 263 MB/s

Doing the same dd write test locally gets me 280 MB/s.

Unless my coffee hasn't kicked in enough yet, I think you are simply seeing the standard effect of ESXi doing sync writes via NFS, and the only cure for that is a low-latency SLOG device; even then performance will still drop some. Writing and committing to disk can never be as fast as writing to RAM and flushing to disk later.
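
(For completeness, adding a SLOG is a one-liner; the pool and device names below are placeholders, and the device should be a low-latency SSD with power-loss protection.)

Code:
# attach a separate intent-log (SLOG) device to the pool; names are placeholders
zpool add tank log /dev/da6
# then keep normal sync semantics and let the SLOG absorb the latency
zfs set sync=standard tank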
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Hi pbucher, thanks for the response! See inline:

David,

For what it's worth here is your iperf test run between my 2 FreeNAS VMs that are on physical separate servers that are connected via a SAN/vMotion only 10Gbe network. I've got about the same setup as you, both my VMs have dual vmxnet3 NICs, the 1st box is hosted on ESXi 5.1 and the 2nd is hosted on ESXi 5.5. Both VMs have 20GB of ram, 4 vCPU cores, and I've got NFS configured for 4 nfs servers to match the 4 cores.



The above #s are about double of what I saw when I hit the "management" network side which is connected to a 1Gbe network.

1.6Gb/s seems pretty slow for a 10GbE network; isn't this concerning?

As another reference point I get the following results on my CentOS 6 test mule VM. Which is just a stock install of CentOS6 with a 2nd unformated HD attached to it that sits on my FreeNAS SAN shared via NFS back to the ESXi. I'm running sync=standard on the ZFS share, which is keeping my write speeds caped, I did do a quick test with sync=disabled to show that the network can push more write bandwidth if I had a lower latency slog device.



Unless my coffee hasn't kicked in enough yet I think you are simply seeing the standard effect of ESXi doing sync writes via NFS and the only cure for that is a low latency slog device and even then performance will still drop some. Writing & committing to disk can never be as fast as writing to RAM and then to disk later.


That was a fast write from within your VM - this was a VM writing to its disk mounted via ESXi to FreeNAS within the same physical machine, correct? Have you tried doing the same test but to a disk mounted in FreeNAS on your other machine?

In the test I mentioned above I had sync set to disabled, so unfortunately that is not the culprit. I'm continuing to do some more testing today, but I am definitely still seeing strange behavior (e.g.: http://forums.freenas.org/threads/single-nfs-write-stream-limited-at-70mb-s.16077/).

Also, can you elaborate on why you have two vNICs per VM? And what does "I've got NFS configured for 4 nfs servers to match the 4 cores" mean? I was under the impression nfsd was multithreaded.

Thanks!
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
That was a fast write from within your VM - this was a VM writing to its disk mounted via ESXi to FreeNAS within the same physical machine, correct? Have you tried doing the same test but to a disk mounted in FreeNAS on your other machine?
Correct, both the Linux VM and FreeNAS are on the same ESXi box for the dd tests. It's hard to get good #s hitting the FreeNAS on the other ESXi because that's my main server and it's always got activity on it. I did get a result of 132MB/s doing the usual test, so factoring in the drop from actually hitting the network plus contention from other VMs, it's about what I'd expect.

I do agree that 1.6Gb/s seems rather slow; something isn't quite right. I redid the test from my test mule against FreeNAS on the same ESXi box, so I'm not leaving the virtual network, and I'm seeing 2.4Gb/s. I get the same 1.6 going from the Linux VM to FN on the other physical box.


In the test I mentioned above I had sync set to disabled, so unfortunately that is not the culprit. I'm continuing to do some more testing today, but I am definitely continuing to see strange behavior (ie: http://forums.freenas.org/threads/single-nfs-write-stream-limited-at-70mb-s.16077/).
My bad, I missed that.

Also can you elaborate on why you have two vnics per VM? Also what does "I've got NFS configured for 4 nfs servers to match the 4 cores" mean? I was under the impression nfsd was multithreaded?
The docs say to configure the number of nfs servers to match the number of cores, and watching the processes in top, nfsd looks single-threaded to me.
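
(For reference, on plain FreeBSD that knob is nfsd's -n flag; FreeNAS exposes it as "Number of servers" in the NFS service settings. The rc.conf lines below are just a sketch, with 4 matching my 4 vCPUs.)

Code:
# /etc/rc.conf on stock FreeBSD (FreeNAS manages this via the GUI)
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 4"   # serve UDP and TCP with 4 nfsd servers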
 