
VMware virtual disk 2.0 only 300.000MB/s

ERM-Consulting

Neophyte
Joined
Sep 6, 2020
Messages
11
I am wondering why nobody discusses the performance of the VMware Virtual disk 2.0 within FreeBSD.
I have checked many posted dmesg outputs in this forum as well as in the FreeBSD and XigmaNAS forums: all of the posted logs show reports like the following:

cd0 at ahcich0 bus 0 scbus2 target 0 lun 0
cd0: <NECVMWar VMware SATA CD00 1.00> Removable CD-ROM SCSI device
cd0: Serial Number 00000000000000000001
cd0: 600.000MB/s transfers (SATA 3.x, UDMA2, ATAPI 12bytes, PIO 8192bytes)
cd0: 520MB (266420 2048 byte sectors)
da0 at mpt0 bus 0 scbus32 target 0 lun 0
da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 8192MB (16777216 512 byte sectors)
da0: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>
da1 at mpt0 bus 0 scbus32 target 1 lun 0
da1: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 22887268MB (46873125584 512 byte sectors)
da1: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>
It is obvious that all VMware Virtual Disk 2.0 devices only provide 300.000MB/s.
Even the emulated CD-ROM provides 600.000MB/s.

In my case (as an example) we are using 12Gb/s disks in ESXi and provide RAID6-based datastores to the VMs.

We lose about 1/4 of the performance with VMware Virtual disks compared to passthrough disks.

=> Yes, ZFS with direct disk access is the recommended setup, but why does nobody wonder about this poor performance?
=> Does anybody know a solution for this issue?
 

Patrick M. Hausen

Dedicated Sage
Joined
Nov 25, 2013
Messages
2,431
Is this an issue at all? I mean, have you measured the performance? VMware emulates a particular hardware SCSI controller model and the connected drives. FreeBSD knows this technology to be capable of 300 MB/s, hence the boot message. I wouldn't be surprised if you got higher speeds in reality - as long as the underlying real hardware can deliver them ...
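
Something like this from inside the guest would be a start (just a sketch, assuming the virtual disk shows up as da1 and you only read from it, so nothing gets overwritten). The first command runs FreeBSD's built-in transfer-rate test on the raw device, the second is a plain sequential read:

# diskinfo -t /dev/da1
# dd if=/dev/da1 of=/dev/null bs=1m count=4096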
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,329
It is obvious that all VMware Virtual Disk 2.0 devices only provide 300.000MB/s.
That's "obvious" to you?

Or are those just the flags that are passed to the mpt device driver, which VMware is emulating?

When you are using a device that is emulated, you inherit the behaviours of the driver that goes along with it. For example, the mpt SAS driver in FreeBSD conveniently tells you what speed the device is attached at. But since the device is not real, and yet the driver is designed for real devices, the fastest of which are 300MB/s, VMware decided to have it report the fastest speed available.

It is clearly capable of going much faster than that, and if you stick an mpt-based vmdk on something like an NVMe backed datastore, it will happily go in the multi-gigabyte-per-second range.

This is just like how an emulated em0 network interface reports a 1Gbps link but actually goes much faster than 1Gbps.

We lose about 1/4 of the performance with VMware Virtual disks compared to passthrough disks.
That, on the other hand, makes complete sense, as it obviously should.
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,329
[mod note: this is a basic virtualization theory discussion, and has nothing to do with TrueNAS or even FreeBSD performance. As such, I am moving it to off-topic -jg]
 

ERM-Consulting

Neophyte
Joined
Sep 6, 2020
Messages
11
Facts:
Attaching the same disks (Seagate 12GB/s) in the same server (Dell R730) via an HBA330 passed through to TrueNAS in VMware, we get 1.67 GB/s copying from one disk to the other.

Using the same disks and the same server via the H730 RAID controller as "VMware Virtual Disk 2.0", we get 0.1 GB/s copying from "disk" to "disk" in the best cases.
Host CPU is at 5%, VM CPU (4 cores) at 11%.
gstat:
[attachment: Storage6 - gstat V01.jpg]
 

Patrick M. Hausen

Dedicated Sage
Joined
Nov 25, 2013
Messages
2,431
But that is not due to any artificial 300 MB/s limit. It's due to the fact that FreeBSD thinks it's talking to a SCSI HBA, with all the protocol overhead, queueing and interrupt handling that entails, while VMware, pretending to be just such an HBA in software, painstakingly emulates the other side of that complex and completely unnecessary protocol and finally figures out which block on the real disk FreeBSD wanted in the first place.

That's why paravirtualisation exists.
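
As a rough sketch (assuming a reasonably recent FreeBSD guest that ships the pvscsi(4) driver): switch the VM's virtual SCSI controller from "LSI Logic" to "VMware Paravirtual" in ESXi, then check from inside the guest that the paravirtual driver attached instead of mpt(4):

# dmesg | grep -i pvscsi
# camcontrol devlist -v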
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,329
Facts:
Attaching the same disks (Seagate 12GB/s) in the same server (Dell R730) via an HBA330 passed through to TrueNAS in VMware, we get 1.67 GB/s copying from one disk to the other.

Using the same disks and the same server via the H730 RAID controller as "VMware Virtual Disk 2.0", we get 0.1 GB/s copying from "disk" to "disk" in the best cases.
Host CPU is at 5%, VM CPU (4 cores) at 11%.
gstat:
[attachment: Storage6 - gstat V01.jpg]
So you use a crappy RAID controller which gives you a slowish datastore and you don't get a lot of speed. And you're using disks as well.

One of the problems with hypervisors is that you get a lot of complicated effects. For example, if you create a brand new thin provisioned disk on a datastore, and attach that to a FreeBSD VM:

mpt0: Rescan Port: 0
da1 at mpt0 bus 0 scbus2 target 1 lun 0
da1: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
da1: 320.000MB/s transfers (160.000MHz, offset 127, 16bit)
da1: Command Queueing enabled
da1: 32768MB (67108864 512 byte sectors)
da1: quirks=0x140<RETRY_BUSY,STRICT_UNMAP>

Now I run a dd to see how "fast" that is:

# dd if=/dev/da1 of=/dev/null bs=1048576
32768+0 records in
32768+0 records out
34359738368 bytes transferred in 11.343339 secs (3029067469 bytes/sec)

WOW HOLY MACKEREL THREE GIGABYTES PER SECOND!!!

But what's happening here is that ESXi's thin provisioning code is creating artificial zero-filled blocks and then feeding them via the mpt emulator over to FreeBSD. This, incidentally, is the classic proof that your original post's point is simply wildly incorrect. There is no artificial throttle at 300MByte/sec. FreeBSD is showing you that it can actually communicate an entire order of magnitude faster than 300MBytes/sec via the mpt driver.

But this is a highly optimized case. In actual practice, what actually happens on a datastore is much more complicated, and rarely performs this well. When you ask a vmdk to read what appears to be a sequential list of LBAs, you don't know how that was actually stored. If it was written noncontiguously, it isn't going to read it as fast as a raw disk device would, because there are seeks. If you've taken snapshots, you again have a situation where you will have noncontiguous data. If you used thin provisioning, this is very likely to result in noncontiguous data. All of that results in seeks, which significantly impacts performance.

If you are expecting a vmdk to perform similarly to a raw disk device, you need to start with an empty datastore, you need to use thick provisioned, eager zeroed mode, and you must not take snapshots. In such a case, you can get a good fraction of the speed of the underlying hard disks out of the vmdk; I'm guessing you might be able to get 50-75%. I'd be pleased to see even more, but I'm kinda cynical.
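
For example (just a sketch, the size and datastore path below are placeholders), such a disk can be created from the ESXi shell with vmkfstools, or by picking "Thick Provision Eager Zeroed" when adding the disk in the vSphere UI:

# vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/myvm/data.vmdk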

But even then, the fastest hard disks out there only deliver about 250MBytes/sec at peak speed, so that's still going to be less than 300MBytes/sec.

There's other stuff in your posts above that is clearly incorrect; Seagate does not make 12GB/s disks (although they do make 12Gbit/sec SAS disks), and it isn't possible to get 1.67GBytes/sec copying disk-to-disk unless there's some sort of caching going on.

Anyways, I've demonstrated that the "300MByte/sec" SCSI channel actually works at ten times that on the FreeBSD side, so any complaints you have are actually related to ESXi datastore performance on the ESXi backend. You are welcome to continue to discuss ESXi with us here in the off-topic forum. I don't really have a lot of time to be bringing folks up to speed on the ins and outs of virtualization I/O, but we do have a number of exceptionally knowledgeable and experienced people with lots of clue, including @Patrick M. Hausen and others.

As a final note, one of the reasons a lot of ESXi admins end up here is because they want to make use of ZFS to accelerate their virtualization environment. ZFS is able to do a lot of crazy things for ESXi performance, but you have to throw quite a bit of resources at it.
 