Performance gap between bare metal and VM installation

cstamato

Cadet
Joined
Oct 22, 2017
Messages
8
Hello everyone,

I have been using TrueNAS for some time and I would like your help with the following findings.

Supermicro X10SL7-F
Xeon E3-1245 v3
32GB RAM ECC
LSI 2308 (on board), IT mode (passthrough to TrueNAS)
4x2TB, ZFS mirror, VMFS6 datastore
Intel 82599 10Gb
Everything updated to the latest firmware.

I haven't run extensive performance tests, just ATTO Disk Benchmark and CrystalDiskMark.

Bare metal installation:
crystal_fast.png

atto_fast.png


ESXi 7.0 U2, VM installation (LSI 2308 and Intel 82599 passthrough), no other VM:
crystal_slow.png

atto_slow.png


I am getting significantly lower performance in the VM installation compared to bare metal.

I have tried TrueNAS autotune, moving the network card to a different PCIe slot, and various BIOS and VM settings, with no success.
To a certain extent, I realize that I might not get the same performance, but the gap seems too big to me.

Any thoughts or suggestions would be greatly appreciated.
Thank you in advance.
 

Attachments

  • crystal_fast.png

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
significantly lower performance in the VM installation compared to bare metal.

Yup. Virtualization involves sharing the processor with other virtual machines, and this increases latency especially for read operations.

Read operations are heavily impacted, especially small ones. A bare metal NAS isn't doing anything else; it's just hanging around waiting for the interrupt from the network card and goes immediately to town servicing it. Under a hypervisor, by contrast, the hypervisor takes the interrupt, processes it through a vSwitch, figures out which VM it belongs to, delivers it to that VM's queue, and wakes up the VM, which then processes the request.

Write operations end up being less impacted because writes get "stacked up" into the NAS VM, so when the VM gets a timeslice, there's often more work queued to do.

Some hints for better VM performance:

Use the smallest number of CPU cores possible (but no less than 2)

Use large buffers on both the clients and server

Set a CPU reservation or set CPU shares to "high"

Use scheduling affinity to pin a CPU core

Use NIC virtual functions to bypass the vSwitch infrastructure

Each of these things is a small world of arcane knowledge of its own.
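
To give a rough idea of what the CPU-related items above look like in practice, here's a sketch of the corresponding .vmx settings. The reservation value and the core numbers in the affinity set are placeholder examples you'd adjust for your own host, and most of these are also exposed in the vSphere client under the VM's CPU settings:

numvcpus = "2"
sched.cpu.min = "2000"
sched.cpu.shares = "high"
sched.cpu.affinity = "2,3"

numvcpus keeps the vCPU count small, sched.cpu.min is the CPU reservation in MHz, sched.cpu.shares maps to the "high" shares setting, and sched.cpu.affinity pins the VM to specific host cores. The virtual function item is a host-side SR-IOV configuration on the NIC rather than a .vmx line, and it's its own rabbit hole.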
 

cstamato

Cadet
Joined
Oct 22, 2017
Messages
8
Hello and thank you for your answer,

Following your recommendations, I changed several settings to see if I could gain a bit of performance, but it is always significantly lower compared to bare metal.

I will stick with the bare metal installation.
 

CDRG

Dabbler
Joined
Jun 12, 2020
Messages
18
@cstamato Where is your client in relation to storage (since it's not indicated)?

@jgreco with respect to his setup and your comment about CPU contention, surely as a single VM in the setup it's not going to be as dramatic as you suggest? He is also passing the NIC directly to the VM, so there's no vSwitch in this context.

At the end of the day, CPU metrics should show it being a bottleneck. To your points though, YMMV, and my HW is currently vastly different from his.

I ask because I'm doing some basic testing under the same premise (this thread caught my eye), but my client in this case is remote, so I would be taking network performance into account as well.

For the initial purposes of my testing I'm only on a 1Gb link to the server, as my new server here is awaiting its new NIC.
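
(Rough math for context: 1 Gb/s works out to 125 MB/s on the wire, and after TCP/SMB overhead the usable ceiling is more like 110-117 MB/s, so the large-block end of anything I test over this link is capped by the network rather than the pool.)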

That said, concentrating on the low I/O sizes shown within ATTO, there is a difference, but it's not a massive one until we hit the 8KB and 16KB mark. But even then, it's not as bad as what you're seeing in a VM.

For the record:
Supermicro H12SSL-i
Epyc 7282
256GB RAM ECC (64GB given when run as a VM)
Broadcom 9305-24i passthru
4x6TB, ZFS mirror (WD60EFZX)

Obviously bare metal using the same HW.

Tests were via an SMB mapping.

I'm also curious to know what disks you're using. You say 4x2TB mirrors; I assume you have 2 vdevs of 2TB mirrors? If so, that R/W seems extraordinarily high unless these are SSDs (also not indicated, and if so, it negates my entire post).
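
To illustrate the layout I have in mind, expressed as the equivalent zpool command (pool name and disk names are just placeholders):

zpool create tank mirror da0 da1 mirror da2 da3

That's two 2-way mirror vdevs striped together: roughly half the raw capacity usable, with writes spread across both vdevs and reads able to be serviced from either side of each mirror.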

I wouldn't mind setting my testing up to mirror your setup to get a more apples to apples comparison.

I'd also had issues from a VM perspective between VMXNET3 and E1000e, with the latter being required to get better throughput to the VM.
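
(For anyone who wants to try both: the adapter type is just the virtualDev value on the VM's NIC in the .vmx, e.g. ethernet0.virtualDev = "vmxnet3" or ethernet0.virtualDev = "e1000e", with the ethernet index being whichever NIC it happens to be; you set one or the other, not both.)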
 

Attachments

  • VM-1Gb.png
  • BM-1Gb.png

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
surely as a single VM in the setup it's not going to be as dramatic as you suggest?

It's much easier to give the pessimistic facts, which are that virtualization can suffer significant overheads (I've seen 30-50% in practice in some bad-case scenarios), and then have a poster's reality turn out to be "not too bad" once the merry-go-round stops, maybe only a 5% performance loss. People expect stuff to be magic, but it is actually compsci, and it's real. Plus, you don't run a single VM setup on a hypervisor. The entire point of a hypervisor is virtualization of multiple workloads.

At the end of the day, CPU metrics should show it being a bottleneck.

ESXi has always been crappy about reflecting this in a way that accurately represents user experience. And they're the best of the bunch.
 

CDRG

Dabbler
Joined
Jun 12, 2020
Messages
18
Plus, you don't run a single VM setup on a hypervisor. The entire point of a hypervisor is virtualization of multiple workloads.
Fair point. But for the purposes of trying to do an apples-to-apples comparison, with all else being equal, it's the best place to start. Once loaded up with other stuff, all bets are off. But at that point you can point the finger elsewhere for any NAS-related performance issues.
 