Slow virtual Epyc vs twice as fast bare-metal Ryzen 5 3600

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
Heyo!

I'm running a Proxmox server with TrueNAS Scale. The hypervisor is an AMD Epyc 7662 with 512 GB of RAM and a 10G NIC.
However, I don't get anywhere near 10G speeds - though I do get faster speeds with an obscene number of cores.

I did three quick tests to illustrate my issue:

1. Allocating 16 threads to the VM.
This sees my SMB throughput limited to below 1G speeds, at around 95-100 MB/s.
16c.png


2. Allocating 48 threads.
This gets me up to around 270 MB/s.
48c.png


3. 120 threads.
Now sustaining around 450-500 MB/s - which is still way below the capabilities of the pool.
120c.png



I also tried going from letting Proxmox handle the NIC to passing it through to the VM, but got the same result.

Any ideas on how to avoid wasting 100+ threads to get decent throughput? I assume I've made some sort of config error that makes this CPU perform so poorly.
And why can a beefy Epyc CPU be beaten in throughput by a lowly Atom CPU?
 

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
I also have a second bare-metal TrueNAS Scale server, running a wimpy AMD Ryzen 5 3600 (non-X), which easily reaches its full pool speed with the same file and the same 10G NIC:
12t.png
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
I also tried going from letting Proxmox handle the NIC to passing it through to the VM, but got the same result.

Any ideas on how to avoid wasting 100+ threads to get decent throughput? I assume I've made some sort of config error that makes this CPU perform so poorly.
How are the drives attached and presented to TrueNAS?
Are you, by any chance, presenting virtual drives handled by Proxmox? Just like you let Proxmox handle the NIC.
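A quick way to check is to dump the VM config on the Proxmox host and look at how the storage is attached. A minimal sketch - the VM ID of 100 is a placeholder for whatever ID your TrueNAS VM has:

# On the Proxmox host: show the VM's hardware configuration
qm config 100

# Virtual disks handled by Proxmox show up as entries like:
#   scsi0: local-lvm:vm-100-disk-0,size=32G
# A passed-through HBA shows up as a hostpci entry instead:
#   hostpci0: 0000:41:00.0,pcie=1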
 

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
How are the drives attached and presented to TrueNAS?
Are you, by any chance, presenting virtual drives handled by Proxmox? Just like you let Proxmox handle the NIC.
Nope, all disks are connected to HBAs, which are passed through to TrueNAS:
passthrough.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I'm running a Proxmox server with TrueNAS Scale. The hypervisor is an AMD Epyc 7662 with 512 GB of RAM and a 10G NIC.

What *kind* of "10G NIC"? They're not all created equal. Far from it.

What happens if you deconstruct the bond and try adding an Intel X710, then use a virtual function for the NAS VM?
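For reference, on an SR-IOV-capable card the virtual functions are created from the host side. A minimal sketch - the interface name (enp65s0f0) and VF count are placeholders:

# On the Proxmox host: carve out 4 virtual functions from the physical NIC
echo 4 > /sys/class/net/enp65s0f0/device/sriov_numvfs

# The new VFs appear as their own PCIe devices, ready to pass to a VM
lspci | grep -i "virtual function"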
 

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
What *kind* of "10G NIC"? They're not all created equal. Far from it.

What happens if you deconstruct the bond and try adding an Intel X710, then use a virtual function for the NAS VM?
Sure they're not - but in my case it's literally the same SKU in both machines. So I don't see that being the problem.


I think I have narrowed it down to (at least) two issues.
1. The Epyc is running at or below its base clock, despite boost and the default TDP being enabled in the BIOS, while staying below 45°C at all times:
no boost.png


2. There is definitely something wrong with how the disks are being passed through.
If I load a file into the Epyc TrueNAS system's ARC, it will deliver that file to me at near as damn it 10G.
So it's only when reading from the disks that performance is atrocious - and just a few days ago the same disks and HBAs were in a bare-metal server where performance was great, so the hardware is not to blame. (Quick ways to check both issues are sketched below.)
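A couple of quick checks, as a rough sketch - the disk device (/dev/sdX) is a placeholder for one of the passed-through drives:

# On the Proxmox host: watch the real core clocks (a guest's /proc/cpuinfo
# often reports a fixed nominal frequency, so check on the host)
watch -n1 "grep MHz /proc/cpuinfo | sort -t: -k2 -n | tail -5"

# Also on the host: see which frequency-scaling governor is active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Inside the TrueNAS VM: raw sequential read from one pool disk,
# bypassing SMB and ZFS entirely (reads are safe on a live pool)
dd if=/dev/sdX of=/dev/null bs=1M count=8192 status=progress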
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sure they're not - but in my case it's literally the same SKU in both machines. So I don't see that being the problem.

It doesn't matter if they're the same SKU. What's the model of the card, please? The Forum Rules, conveniently linked at the top in red, do ask that you provide details about your hardware. I'm literally the guy who wrote the 10 Gig Networking Primer, so it's not like I'm some unknown person asking for irrelevant details.
 

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
It doesn't matter if they're the same SKU. What's the model of the card, please? The Forum Rules, conveniently linked at the top in red, do ask that you provide details about your hardware. I'm literally the guy who wrote the 10 Gig Networking Primer, so it's not like I'm some unknown person asking for irrelevant details.
Hey, if you want to dive deep on the NICs, fine, no harm meant. I just genuinely don't see the point, now that I can see I can max out the NIC just fine from the virtualized TrueNAS server.

Both cards are HP MCX312B-XCCT 546SFP, based on the Mellanox ConnectX-3 Pro. They're dual-port SFP+ cards, and the setup is the same for both: the two ports are bonded together and connected to a Mikrotik CRS317-1G-16S+RM.
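For what it's worth, the network path can be tested in isolation from the disks with a plain iperf3 run. A minimal sketch, assuming iperf3 is available on both ends - the server IP is a placeholder:

# On the TrueNAS VM:
iperf3 -s

# On a client somewhere on the 10G network:
iperf3 -c 192.168.1.10 -P 4 -t 30   # 4 parallel streams for 30 seconds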
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hey, if you want to dive deep on the NICs, fine, no harm meant. I just genuinely don't see the point, now that I can see I can max out the NIC just fine from the virtualized TrueNAS server.

So you're all OK then and not having problems....? This is the sort of thing where you have to make up your mind.

Proxmox isn't a real high-quality hypervisor, and you're not the first person to roll through here having performance problems on AMD. Modern servers are complicated things with many layers, and I'm starting to see a pattern where simultaneous I/O to disk and net may be impacted or interfering with each other. That's just a theory. Ideally, I'd ask you to try it on ESXi and see if it was better, which might clarify whether it was just naive or low-performance code in Proxmox causing the issue.

It's much more helpful if you can keep an open mind to the idea that simple tests might not be sufficient to identify issues. Sometimes they are, sometimes not. If you can, try something "interesting": repeat your "max out" test, but while you're doing it, kick in a "dd" pulling data from one of your pool drives and throwing it into /dev/null. See what I'm trying to do there? If we can step closer to your problem, we may be able to identify it.
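Concretely, that competing read could look something like this - a sketch, with /dev/sdX standing in for one of the pool drives:

# Start this on the TrueNAS VM while the SMB/ARC "max out" test is running
dd if=/dev/sdX of=/dev/null bs=1M status=progress

# If the network throughput collapses the moment this kicks in, the disk
# and net I/O paths are fighting each other inside the VM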
 

iVhksqJjo

Dabbler
Joined
Jan 27, 2023
Messages
17
So you're all OK then and not having problems....? This is the sort of thing where you have to make up your mind.

Proxmox isn't a real high-quality hypervisor, and you're not the first person to roll through here having performance problems on AMD. Modern servers are complicated things with many layers, and I'm starting to see a pattern where simultaneous I/O to disk and net may be impacted or interfering with each other. That's just a theory. Ideally, I'd ask you to try it on ESXi and see if it was better, which might clarify whether it was just naive or low-performance code in Proxmox causing the issue.

It's much more helpful if you can keep an open mind to the idea that simple tests might not be sufficient to identify issues. Sometimes they are, sometimes not. If you can, try something "interesting": repeat your "max out" test, but while you're doing it, kick in a "dd" pulling data from one of your pool drives and throwing it into /dev/null. See what I'm trying to do there? If we can step closer to your problem, we may be able to identify it.
I wrote, in the post just before you asked for the SKUs, that "If I load a file into the Epyc TrueNAS system's ARC, it will deliver that file to me at near as damn it 10G."
So it doesn't look like the NIC is the problem.

I installed TrueNAS directly on the system and saw no issues and great 10G performance, so I've decided to flip my plan on its head: instead of virtualizing TrueNAS in Proxmox, I will be virtualizing the machines I need in TrueNAS Scale.
I've played around a bit - never used TrueNAS as a hypervisor before - and everything seems to work great :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I wrote, in the post just before you asked for the SKUs, that "If I load a file into the Epyc TrueNAS system's ARC, it will deliver that file to me at near as damn it 10G."
So it doesn't look like the NIC is the problem.

You're not understanding what I'm saying. I'm saying that it's quite possible that having two different high-volume I/O streams working simultaneously may be causing a problem, especially since one is a "native" PCIe device and the other is a virtualized network device. The way a hypervisor works with these devices is complicated, and if you are getting interrupts from a real PCIe device while also getting the virtual I/O from a network device, that could be causing the VM to cede its timeslice or have some other unusual interaction with the VM scheduler.

From my perspective, I'd rather see a virtual function presented by the network card (which generates interrupts and acts pretty much like a physical ethernet card) along with the HBA because then both are working on the same playing field; you don't have to worry as much about whether the virtual network card is stabbing you in the back somehow. Debugging this stuff is tedious and quite possibly beyond the scope of this forum, but some experimentation can yield clues. Which is what I was suggesting.
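If you do go the VF route, handing one to the VM is just another hostpci entry. A hypothetical sketch - the VF's PCIe address (0000:41:02.0) and the VM ID of 100 are placeholders carried over from the earlier examples:

# On the Proxmox host: pass the virtual function through to VM 100
# (pcie=1 requires the q35 machine type)
qm set 100 -hostpci1 0000:41:02.0,pcie=1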

I installed TrueNAS directly on the system and saw no issues and great 10G performance, so I've decided to flip my plan on its head: instead of virtualizing TrueNAS in Proxmox, I will be virtualizing the machines I need in TrueNAS Scale.
I've played around a bit - never used TrueNAS as a hypervisor before - and everything seems to work great :)

TrueNAS as a hypervisor will do for basic VM operations, and it will get better with time. Virtualizing TrueNAS is kind of fraught with various perils, so if you can avoid it, best to do so.
 