ESXi Extremely Slow Network Speeds but Hosted VMs are Fast

ajzride
Cadet
Joined: May 18, 2022
Messages: 2
Before I go into my problem, I'll say that I am new to network storage (but not to hypervisors), and the answer to this question is probably out there somewhere, but I must lack the insight to get my search keywords right. I did close to 7 hours of reading on Sunday, stepped away for a few days, and did a few more hours of searching the forum and Google this morning, but I just can't seem to find a problem similar to mine. There are lots of "slow iSCSI" and "slow NFS" threads, but the resolutions to the ones I found didn't seem to apply to my situation.

A little background: I'm not a network engineer or computer specialist, and I have no formal training in IT infrastructure (aside from classes on Udemy). I'm a chemical engineer at a small consulting company, and we have some very specialized engineering software for which I maintain our own VM farm. The software is very particular about patches, antivirus, etc., so IT has washed their hands of it and I maintain it for our group. Currently we have two Dell PowerEdge R730xd servers with direct-attached storage running about 60 VMs. I would like to migrate these to shared storage so I can take advantage of vMotion for load balancing and high availability. To this end I have set up a small test bed to assess options for network storage. I really like the idea of an open-source platform and not being tied to a vendor, so TrueNAS CORE is the first setup I have tested.

The testbed hypervisor is built on a workstation-class machine: Z170 chipset, i7-6700K, 64GB DDR4 RAM, a 16GB SATA SSD for the boot volume, and two local datastores (1TB NVMe, 2TB 7200RPM SATA). It is running ESXi 7U2 with no vCenter at the moment, just standalone ESXi.

The testbed TrueNAS box is a Dell DR4100 (essentially a PowerEdge R720xd with storage-focused firmware) with 64GB DDR3 ECC RAM and a Xeon E5-2620. The boot volume is in the rear 2.5" bays: 2 x 246GB 10K SAS in RAID 1 on the PERC H710P Mini. The front SAS backplane is 12 x 3.5" HDD bays tied into an H200 flashed to IT mode. It is running TrueNAS CORE 13.0.

Both machines have an Intel X540-T2 (2 x 10GbE RJ45) NIC installed. One port on each machine is on the management network (192.168.1.x), which goes through a Ubiquiti 1GbE unmanaged switch, and the other ports are tied directly together with a Cat5e cable as a data network (10.10.10.x). I know the cable will be a limiting factor, but at the moment it is not the bottleneck. I have ordered some Mellanox ConnectX-3 cards to upgrade to fiber; they just aren't here yet.

My vdevs are set up as follows:

POOL1 (iSCSI VMFS datastore): Mirror1 (2 x 1TB 7200RPM SAS), Mirror2 (2 x 1TB 7200RPM SAS), log (1 x 256GB SATA SSD), cache (1 x 256GB SATA SSD)
POOL2 (NFS datastore): Mirror1 (2 x 1TB 7200RPM SAS)
POOL3 (Windows iSCSI): Mirror1 (2 x 1TB 7200RPM SAS)
POOL4 (SMB): 1 x 500GB SATA SSD

My log and cache devices are not currently battery-backed, but I'm not concerned about data integrity; this is solely for testing purposes.

Also for testing purposes, sync=disabled on all pools.
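For reference, this is roughly how I verified and set that from the TrueNAS shell (I believe the GUI dataset option does the same thing; pool names as above):

    # check the current sync setting on each pool
    zfs get sync POOL1 POOL2 POOL3 POOL4

    # disable sync writes for testing (equivalent to the GUI "Sync: Disabled" option, I believe)
    zfs set sync=disabled POOL1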

The issue I am seeing is that attaching ESXi to the datastores (iSCSI or NFS) and copying files over with the datastore browser is incredibly slow, but if I fire up a VM on the NVMe datastore and transfer the same file through Windows iSCSI or SMB, the speeds are much faster, going over the exact same NICs and cables. For testing purposes I created a 35GB zip file; the transfer speeds I saw under different scenarios are listed below.

On the hypervisor I have the data-network port tied into two vSwitches: one for the iSCSI datastore and one for VM access. Whenever I test transfers from the Windows 10 VM (stored on the local NVMe datastore) to TrueNAS, I disconnect the iSCSI datastore to ensure there is no competing traffic on the cable.
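For what it's worth, these are roughly the commands I use from the ESXi shell to confirm which vSwitch, uplink, and vmkernel port the storage traffic is actually riding on (standard esxcli, nothing exotic):

    # list standard vSwitches with their uplinks and port groups
    esxcli network vswitch standard list

    # list vmkernel interfaces and their IPv4 addresses (to confirm which vmk is on 10.10.10.x)
    esxcli network ip interface ipv4 get

    # confirm negotiated link speed on the physical NICs
    esxcli network nic list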

ESXi datastore browser to iSCSI datastore (POOL1): 0.4 Gb/s upload, 0.3 Gb/s download
ESXi datastore browser to NFS datastore (POOL2): 0.4 Gb/s upload, 0.3 Gb/s download
Windows VM File Explorer to iSCSI volume (POOL3): 1.5 Gb/s upload, 1.5 Gb/s download
Windows VM File Explorer to SMB share (POOL4): 3.5 Gb/s upload, 3.5 Gb/s download

I know that the SMB share being a single SSD could account for the difference between the SMB and Windows iSCSI speeds (3.5 Gb/s vs 1.5 Gb/s), but the Windows iSCSI share and the ESXi NFS share are identical pools, so I feel like there is something in my ESXi or TrueNAS configuration that has to be tweaked to let ESXi communicate faster. While VM performance seems okay on the iSCSI datastore (still faster than the single 7200RPM direct-attached datastore), I'm RAM-limited to about 16 VMs, so I fear that a production environment with 60 would slow to a crawl if I am only getting 0.4 Gb/s. I know the NICs and cables can support at least 3.5 Gb/s, and the spinning disks can support at least 1.5 Gb/s, so why is ESXi topping out at 0.4 Gb/s?
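Before blaming the storage stack I also want to rule out the vmkernel network path itself, so my plan is a raw iperf3 test between the two boxes. TrueNAS CORE ships iperf3; on ESXi 7 there is, from what I've read, a bundled copy under /usr/lib/vmware/vsan/bin/ that has to be copied out (and the firewall temporarily opened) to run it as a client, so treat this as a sketch rather than a recipe:

    # On TrueNAS (data-network side), start a listener:
    iperf3 -s

    # On ESXi (the copy/firewall steps are what I've seen described, not something I've verified):
    cp /usr/lib/vmware/vsan/bin/iperf3 /tmp/iperf3
    esxcli network firewall set --enabled false
    /tmp/iperf3 -c 10.10.10.X -t 30        # replace 10.10.10.X with the TrueNAS data IP
    esxcli network firewall set --enabled true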

Thanks for the support

-Aj.
 

ajzride
Cadet
Joined: May 18, 2022
Messages: 2
The Mellanox cards arrived today, so I installed them and re-tested. I also tested with the MTU set to 9000 on both ESXi and TrueNAS (no noticeable difference).
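For anyone repeating this, a quick sanity check that jumbo frames actually pass end to end (an MTU mismatch on one side can silently fragment or drop) is roughly:

    # From ESXi: 8972-byte payload + 28 bytes of headers = 9000, with don't-fragment set
    vmkping -d -s 8972 10.10.10.X       # TrueNAS data-network IP (placeholder)

    # From TrueNAS (FreeBSD ping): -D sets don't-fragment
    ping -D -s 8972 10.10.10.Y          # ESXi vmkernel IP (placeholder)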

I also created an SMB share on POOL3 so that I could test SMB and iSCSI against the exact same pool (single 7200RPM mirror).

Transfer A: Windows 10 VM hosted on the NVMe datastore transferring to POOL3 via SMB (1.7 Gb/s)
Transfer B: POOL3 transferring to the Windows 10 VM hosted on the NVMe datastore via SMB (5.7 Gb/s)
Transfer C: ESXi datastore copy from POOL1 to the NVMe datastore via iSCSI (1.25 Gb/s)
Transfer D: ESXi datastore copy from the NVMe datastore to POOL1 via iSCSI (0.4 Gb/s)
Transfer E: Windows 10 VM hosted on the NVMe datastore transferring to POOL3 via iSCSI (0.4 Gb/s)
Transfer F: POOL3 transferring to the Windows 10 VM hosted on the NVMe datastore via iSCSI (4.5 Gb/s)

[Attached screenshots: Untitled 4.png, Untitled 5.png]
I suppose my takeaway from tonight is that iSCSI writes, whether from Windows or ESXi, are stupid slow compared to everything else. But at least the two are consistent now, so I am off to research iSCSI in general and see if I can learn something that will lead to a resolution.
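In case anyone spots something obvious, the zvol-level settings I plan to check first are the ones that come up most often in the slow-iSCSI-write threads (sync, volblocksize, compression), plus watching the disks during a transfer; the zvol names below are placeholders for whatever I actually called them:

    # inspect the zvols backing the iSCSI extents (names are placeholders)
    zfs get sync,volblocksize,compression POOL1/iscsi-zvol POOL3/iscsi-zvol

    # watch per-disk busy % during a transfer to see whether the disks or the protocol is the limit
    gstat -p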

Thanks for any input.
 