VMware virtual disk 2.0 only 300.000MB/s

ERM-Consulting

Dabbler
Joined
Sep 6, 2020
Messages
14
I am wondering why nobody discusses the performance of the VMware Virtual disk 2.0 device within FreeBSD.
I checked many posted dmesg outputs in this forum as well as in the FreeBSD and XigmaNAS forums: all of them show reports like the following:

cd0 at ahcich0 bus 0 scbus2 target 0 lun 0
cd0: <NECVMWar VMware SATA CD00 1.00> Removable CD-ROM SCSI device
cd0: Serial Number 00000000000000000001
cd0: 600.000MB/s transfers (SATA 3.x, UDMA2, ATAPI 12bytes, PIO 8192bytes)
cd0: 520MB (266420 2048 byte sectors)
da0 at mpt0 bus 0 scbus32 target 0 lun 0
da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 8192MB (16777216 512 byte sectors)
da0: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>
da1 at mpt0 bus 0 scbus32 target 1 lun 0
da1: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 22887268MB (46873125584 512 byte sectors)
da1: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>

It is obvious that all VMware Virtual disks 2.0 only provide 300.000MB/s.
Even the VMware-provided CD-ROM reports 600.000MB/s.

In our case, for example, we use 12Gb/s disks in ESXi and provide RAID6-based datastores to the VMs.

We lose about 1/4 of the performance with VMware Virtual disks compared to passthrough disks.

=> Yes, ZFS and direct disk access are recommended, but why does nobody wonder about this poor performance?
=> Does anybody know a solution for this issue?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Is this an issue at all? I mean, have you measured the performance? VMware emulates a particular hardware SCSI controller model and its connected drives. FreeBSD knows that technology to top out at 300 MB/s, hence the boot message. I wouldn't be surprised if you got higher speeds in reality - as long as the underlying real hardware can deliver them ...
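If you want to put a number on it, a quick and dirty sequential read check from the TrueNAS shell looks roughly like this (device name and sizes are just placeholders, and this only reads, it doesn't write anything):

# diskinfo -t /dev/da0
# dd if=/dev/da0 of=/dev/null bs=1048576 count=4096

diskinfo -t runs its own little seek and transfer test; the dd gives you a raw sequential read rate to compare against the "300.000MB/s" in the boot message.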
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
It is obvious that all VMware Virtual disks 2.0 only provide 300.000MB/s.

That's "obvious" to you?

Or are those just the flags that are passed to the mpt device driver, which VMware is emulating?

When you are using a device that is emulated, you inherit the behaviours of the driver that goes along with it. For example, the mpt SAS driver in FreeBSD conveniently tells you what speed the device is attached at. But since the device is not real, and yet the driver is designed for real devices, the fastest of which are 300MB/s, VMware decided to have it report the fastest speed available.

It is clearly capable of going much faster than that, and if you stick an mpt-based vmdk on something like an NVMe backed datastore, it will happily go in the multi-gigabyte-per-second range.

This is just like how the em0 driver goes "much faster than 1Gbps".
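You can see the same disconnect by comparing what the emulated NIC claims with what it actually moves; for example (interface name is just an example):

# ifconfig em0 | grep media

The media line will say plain gigabit, because that's what the emulated chip is, while actual throughput between VMs on the same host can be well beyond that.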

We lose about 1/4 of the performance with VMware Virtual disks compared to passthrough disks.

That, on the other hand, makes complete sense, as it obviously should.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
[mod note: this is a basic virtualization theory discussion, and has nothing to do with TrueNAS or even FreeBSD performance. As such, I am moving it to off-topic -jg]
 

ERM-Consulting

Dabbler
Joined
Sep 6, 2020
Messages
14
Facts:
Attaching the same disks (Seagate 12GB/s) in the same server (Dell R730) via an HBA330 passed through to TrueNAS in VMware, we get 1.67 GB/s copying from one disk to the other.

Using the same disks and the same server via the H730 RAID controller as "VMware Virtual Disk 2.0", we get 0.1 GB/s copying from "disk" to "disk" in the best case.
CPU on the host is at 5%, CPU in the VM (4 cores) at 11%.
gstat:
Storage6 - gstat V01.jpg
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
But that is not due to any artificial 300 MB/s limit. It's due to the fact that FreeBSD thinks it's talking to a SCSI HBA, with all the protocol overhead, queueing and interrupt handling that entails, while VMware, pretending to be just such an HBA in software, painstakingly emulates the other side of that complex and completely unnecessary protocol before finally figuring out which block on the real disk FreeBSD wanted in the first place.

That's why paravirtualisation exists.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Facts:
Attaching the same disks (Seagate 12GB/s) in the same server (Dell R730) via an HBA330 passed through to TrueNAS in VMware, we get 1.67 GB/s copying from one disk to the other.

Using the same disks and the same server via the H730 RAID controller as "VMware Virtual Disk 2.0", we get 0.1 GB/s copying from "disk" to "disk" in the best case.
CPU on the host is at 5%, CPU in the VM (4 cores) at 11%.
gstat:
View attachment 45335

So you use a crappy RAID controller which gives you a slowish datastore, and you don't get a lot of speed. And you're using spinning disks as well.

One of the problems with hypervisors is that you get a lot of complicated effects. For example, if you create a brand new thin provisioned disk on a datastore, and attach that to a FreeBSD VM:

mpt0: Rescan Port: 0
da1 at mpt0 bus 0 scbus2 target 1 lun 0
da1: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
da1: 320.000MB/s transfers (160.000MHz, offset 127, 16bit)
da1: Command Queueing enabled
da1: 32768MB (67108864 512 byte sectors)
da1: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>

Now I run a dd to see how "fast" that is:

# dd if=/dev/da1 of=/dev/null bs=1048576
32768+0 records in
32768+0 records out
34359738368 bytes transferred in 11.343339 secs (3029067469 bytes/sec)

WOW HOLY MACKEREL THREE GIGABYTES PER SECOND!!!

But what's happening here is that ESXi's thin provisioning code is creating artificial zero-filled blocks and then feeding them via the mpt emulator over to FreeBSD. This, incidentally, is the classic proof that the point in your original post is simply wildly incorrect. There is no artificial throttle at 300MByte/sec. FreeBSD is showing you that it can actually communicate an entire order of magnitude faster than 300MBytes/sec via the mpt driver.

But this is a highly optimized case. In actual practice, what happens on a datastore is much more complicated and rarely performs this well. When you ask a vmdk to read what appears to be a sequential list of LBAs, you don't know how that data was actually stored. If it was written noncontiguously, it isn't going to read as fast as a raw disk device would, because there are seeks. If you've taken snapshots, you again have noncontiguous data. If you used thin provisioning, that too is very likely to result in noncontiguous data. All of that results in seeks, which significantly impacts performance.
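One way to get a read number that isn't inflated by the zero-fill shortcut is to put real data on the test vmdk first and then read the same region back (scratch disk only, this overwrites it; device and sizes are just the ones from my example):

# dd if=/dev/random of=/dev/da1 bs=1048576 count=4096
# dd if=/dev/da1 of=/dev/null bs=1048576 count=4096

The write pass forces ESXi to actually allocate and commit blocks on the datastore, so the second dd reads data that really exists down there instead of synthesized zeroes.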

If you are expecting a vmdk to perform similarly to a raw disk device, you need to start with an empty datastore, and you need to use thick provisioned eager zeroed mode, and you must not take snapshots. In such a case, you can get a good fraction of the speed of the underlying hard disks out of the vmdk, I'm guessing you might be able to get 50-75%, I'd be pleased to see even more, but I'm kinda cynical.
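For reference, creating such a disk from the ESXi shell looks roughly like this (a sketch; the path and size are placeholders, and you can get the same result in the vSphere UI by choosing "Thick Provision Eager Zeroed" when adding the disk):

# vmkfstools -c 32G -d eagerzeroedthick /vmfs/volumes/datastore1/myvm/myvm_scratch.vmdk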

But even then, the fastest hard disks out there only deliver about 250MBytes/sec at peak speed, so that's still going to be less than 300MBytes/sec.

There's other stuff in your posts above that is clearly incorrect: Seagate does not make 12GB/s disks (although they do make 12Gbit/sec SAS disks), and it isn't possible to get 1.67GBytes/sec copying disk-to-disk unless there's some sort of caching going on.

Anyways, I've demonstrated that the "300MByte/sec" SCSI channel actually works at ten times that on the FreeBSD side, so any complaints you have are actually related to ESXi datastore performance on the ESXi backend. You are welcome to continue to discuss ESXi with us here in the off-topic forum. I don't really have a lot of time to be bringing folks up to speed on the ins and outs of virtualization I/O, but we do have a number of exceptionally knowledgeable and experienced people with lots of clue, including @Patrick M. Hausen and others.

As a final note, one of the reasons a lot of ESXi admins end up here is because they want to make use of ZFS to accelerate their virtualization environment. ZFS is able to do a lot of crazy things for ESXi performance, but you have to throw quite a bit of resources at it.
 

ERM-Consulting

Dabbler
Joined
Sep 6, 2020
Messages
14
But that is not due to any artificial 300 MB/s limit. It's due to the fact that FreeBSD thinks it's talking to a SCSI HBA, with all the protocol overhead, queueing and interrupt handling that entails, while VMware, pretending to be just such an HBA in software, painstakingly emulates the other side of that complex and completely unnecessary protocol before finally figuring out which block on the real disk FreeBSD wanted in the first place.

That's why paravirtualisation exists.

Do I misunderstand your reference to paravirtualisation?
I chose the "VMware Paravirtual SCSI" controller but cannot access the paravirtual SCSI disks provided to TrueNAS.
Does TrueNAS, running as a guest VM, work with paravirtual SCSI disks at all?
It is said that VMware paravirtual SCSI support might only arrive with FreeBSD 13 - so it should not work yet.
But the TrueNAS/VMware whitepaper contains the advice: "Configure your VM using the VMware paravirtual SCSI controller"

Can you clarify this point?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
I was referring to paravirtualisation in general. Because it is utter nonsense to painstakingly emulate a piece of outdated hardware from the bottom while implementing all the logic to drive that hardware from the top.

FreeBSD can use paravirtualised disks in bhyve or KVM just fine. If VMware decides not to implement VirtIO like everybody else, that's their problem.

My main point was that the transfer is not limited to 300 MB/s just because the diagnostic message of a driver for a piece of hardware that is not even there says so ...
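If you want to check whether the kernel your TrueNAS release is built on ships a pvscsi driver at all, something like this from the shell should tell you (I'm assuming the module is called pvscsi.ko, which is what the FreeBSD 13 driver uses; if it were compiled into the kernel instead of as a module this wouldn't show it, but it's a first approximation):

# freebsd-version -k
# ls /boot/kernel | grep -i pvscsi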
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
This sounds a lot like how the VMXNET3 adapter is "Limited to 10Gb/s" but in reality, close to 20Gb/s is the usual speed achieved
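A rough way to reproduce that kind of number is iperf3 between two VMs on the same host and vSwitch (addresses, stream count and duration are placeholders):

# iperf3 -s
# iperf3 -c 10.0.0.2 -P 4 -t 30

Run the first command on one VM and the second on the other; with VMXNET3 the aggregate can land well above the reported 10Gb/s link speed, which is the point.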
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
This sounds a lot like how the VMXNET3 adapter is "Limited to 10Gb/s" but in reality, close to 20Gb/s is the usual speed achieved

Where does it say it is limited to 10Gb/s?

Random rant:

vmxnet3 sucks. Beware vmxnet3. It is awesome until it isn't.

After having run into similar bad behaviour years ago with vxn (FreeBSD VMXNET2 driver), we had deployed some test hosts using VMXNET3 instead of E1000 doing complex infrastructure stuff. For whatever reason, this seemed to work fine for quite some time, so I had signed off on using this in production on a limited basis. All seemed reasonably happy, until recently when we had to do a full evacuation of a hypervisor for hardware updates, and a bunch of VMXNET3 FreeBSD VM's started experiencing massive packet loss and/or extreme latency. Examination revealed that packets were being corrupted, usually truncated, and led to

https://kb.vmware.com/s/article/2039495

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999

which seemed likely to be related in various ways. Exasperating.
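If you suspect you're hitting the same thing, a generic way to look for it (not necessarily how we tracked it down) is to watch the interface error and drop counters and the actual packets on both ends (interface name is a placeholder):

# netstat -w 1 -d -I vmx0
# tcpdump -n -s 0 -i vmx0 'greater 1400'

Large packets showing up truncated or mangled on one side but not the other are the tell.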
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Where does it say it is limited to 10Gb/s?
Like with the virtual SCSI disk that started this thread, the kernel simply logs the detection and activation of the device with a "10 Gbit/s" speed line. No connection to any real number whatsoever. I bet this is just hardcoded into the driver ... the message, I mean.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
So it doesn't actually claim to limit it to 10Gbps, it just claims a 10Gbps link speed.

I'm sorta trying to make the point that it just adds more confusion for @HarambeLives to insert the words "limited to" in front of that link speed, when really it is just a reported link speed, garbage that has been hardcoded into the driver, as @Patrick M. Hausen notes.

Given the relative care net/if_media.h seems to take in trying to accurately describe so many media possibilities, I really have no idea why some sort of "virtual" media type hasn't been added and why it wouldn't be reported in that manner, but I'm old and cynical about how logic does or doesn't get applied to these sorts of issues these days. :-/

Noticed in passing, if_media.h now lists devices as fast as 400Gbps in FreeBSD 13...
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Given the relative care net/if_media.h seems to take in trying to accurately describe so many media possibilities, I really have no idea why some sort of "virtual" media type hasn't been added and why it wouldn't be reported in that manner, but I'm old and cynical about how logic does or doesn't get applied to these sorts of issues these days. :-/
"Hey, what speed should we make that virtual device report?"

"What's the fastest anyone has got in the datacenter today?"

"1 gig per second."

"Then let's report 10 gig - that's infinity for all practical considerations."


Narrator: "10 years later ..."
 

ShimadaRiku

Contributor
Joined
Aug 28, 2015
Messages
104
Where does it say it is limited to 10Gb/s?

Random rant:

vmxnet3 sucks. Beware vmxnet3. It is awesome until it isn't.

After having run into similar bad behaviour years ago with vxn (FreeBSD VMXNET2 driver), we had deployed some test hosts using VMXNET3 instead of E1000 doing complex infrastructure stuff. For whatever reason, this seemed to work fine for quite some time, so I had signed off on using this in production on a limited basis. All seemed reasonably happy, until recently when we had to do a full evacuation of a hypervisor for hardware updates, and a bunch of VMXNET3 FreeBSD VM's started experiencing massive packet loss and/or extreme latency. Examination revealed that packets were being corrupted, usually truncated, and led to

https://kb.vmware.com/s/article/2039495

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236999

which seemed likely to be related in various ways. Exasperating.
I'd been struggling for a year with slow network throughput and a freezing TrueNAS VM during large file transfers by users. Couldn't figure out why. Thought it was a hardware issue, maybe a TrueNAS configuration issue from upgrading from FreeNAS 9; tried a bunch of stuff until I saw your comment and stopped using VMXNET3.

"When using the VMXNET3 driver on a virtual machine on ESXi, you see significant packet loss during periods of very high traffic bursts. The virtual machine may even freeze entirely." -kb2039495
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
tried a bunch of stuff until I saw your comment and stopped using VMXNET3.

You're welcome (....?) FWIW, every several years I experiment with whatever the current "recommended" virtual driver for FreeBSD is, before I get exasperated, find some major unusability issue, write it off as a poorly designed PoS, and switch back to E1000 or E1000E or whatever emulated card is available in the hw version I happen to be using.
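For anyone wanting to bail out of VMXNET3 the same way: the adapter type can be changed in the vSphere UI by removing the NIC and re-adding it with "Adapter Type" set to E1000/E1000E, or by editing the .vmx with the VM powered off, roughly (the ethernetN index depends on your VM):

ethernet0.virtualDev = "e1000e"

Either way the guest device name changes (vmx0 becomes em0 in FreeBSD), so check the interface assignment inside TrueNAS afterwards.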
 