Mellanox NIC passthrough in ESXi/vSphere - kernel panic

Mal

Cadet
Hello! I recently decided to virtualize TrueNAS in ESXi/vSphere (the reason: I was previously using a 4U case with 15 drive bays, and now I have a 4U disk shelf with 24 bays in its place). In the bare-metal box I was using a Mellanox ConnectX-2 10GbE card and it performed very well. After virtualizing, I noticed that network speed tanked; I maxed out around 2 Gbit/s using the VMXNET3 adapter (even in artificial tests with iperf). After a lot of troubleshooting and unsuccessful changes, I decided to just put another ConnectX-2 card in the host and pass it through directly to the VM.

Unfortunately, TrueNAS (12.0-U5) won't even boot with the passed-through card attached; even the installer hangs and crashes. I get this message before it throws a kernel panic:

Code:
mlx4_core0: Unable to determine PCI device chain minimum BW


I've tried a different PCIe slot; no luck. This is the same card I had in the bare-metal machine, and it still works if I throw it back in there.

I'm not sure how to grab the full console output from the kernel panic, if that's even possible and would help. In all the posts I found while researching this, people at least got the OS to boot and were just having trouble actually using the card; I can't even get that far.
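
(One thing I've seen mentioned, but haven't tried yet, is that ESXi can apparently log a VM's serial console to a file on the datastore with .vmx lines roughly like the ones below; the file path is just a placeholder, and the guest console would also need to be pointed at the serial port, e.g. console="comconsole" at the FreeBSD loader prompt or in /boot/loader.conf.)

Code:
serial0.present = "TRUE"
serial0.fileType = "file"
serial0.fileName = "/vmfs/volumes/datastore1/truenas-console.log"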

Any tips on where to head next are appreciated, or should I just bite the bullet and try a different model of card? Thanks!

Hardware:
Supermicro X9DRH-7TF
2x E5-2697v2
256GB RAM

VM specs:
4 cores
32GB RAM (reserved)
16GB boot disk
LSI SAS9207-8e (PCIe passthrough)
 

jgreco

Resident Grinch
There's really no guarantee that PCIe passthru is going to work for random cards, especially on the somewhat fickle X9D boards. These were really the first generation (Sandy/Ivy) that could do PCIe passthru for ANY hardware with reliability, and a lot of that is limited to LSI HBAs and maybe Intel X520s. I'm sure there's other stuff that might work, but as I recently explained to someone else, the Mellanox cards tend to be designed for OEM use, so if PCIe passthru on Intel Sandy/Ivy boards wasn't on the drawing board at Mellanox a decade ago, there's no real reason to think it'll be supported or work.

I suspect newer cards are more likely to work, as are cards that are commonly sold at retail, such as the Intel X520, which I have done passthru with on X9D gear in the past, IIRC. This really mostly comes down to a lot of finicky bits that all need to line up properly.
 

Mal

Cadet
Thanks @jgreco, that makes sense and is where my mind was headed. I know this isn't really a TrueNAS question, but do you know whether, if I get one of the 2-port X520 cards, I can pass one port through to the VM and use the other for the host?
 

blanchet

Guru
In case it helps:

By default, VMware configures FreeBSD VMs to boot with BIOS, but KB 2142307 says that UEFI is preferable.
To use more than 3.75 GB of total BAR allocation within a virtual machine, add this line to the virtual machine's .vmx file to set the virtual machine BIOS to use UEFI.
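
If I remember correctly, the firmware line in question is the first one below; the two pciPassthru lines are the usual companion settings for passthrough devices with large BARs (64 GB is just an example size):

Code:
firmware = "efi"
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"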

A second article about GPU passthrough says the same thing.
 

jgreco

Resident Grinch
Thanks @jgreco, that makes sense and is where my mind was headed. I know this isn't really a TrueNAS question, but do you know whether, if I get one of the 2-port X520 cards, I can pass one port through to the VM and use the other for the host?

I believe so, yes.

In case it helps:

By default, VMware configures FreeBSD VMs to boot with BIOS, but KB 2142307 says that UEFI is preferable.

It's worth trying, but UEFI has a few problems of its own.
 

Mal

Cadet
I believe so, yes.
Thanks, I've got one ordered. Even if I can't split it between the VM and the host, it's not a big deal.

In case it helps:

By default, VMware configures FreeBSD VMs to boot with BIOS, but KB 2142307 says that UEFI is preferable.
Thanks for the suggestion. I tried it, but the installer still crashed. I managed to fit the whole output into a screenshot, in case anyone is curious about the kernel panic:

[Attachment: panic.jpg]
 

Mal

Cadet
UPDATE:

I finally got a chance to shut everything down and get the new Intel X520 card installed. I passed it through to the TrueNAS VM and it booted with absolutely no issues. The iperf test (roughly the command shown after the results) ran much better:

Code:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.00   sec  3.29 GBytes  5.65 Gbits/sec
[  5]   5.00-10.00  sec  2.75 GBytes  4.73 Gbits/sec
[  5]  10.00-15.00  sec  2.59 GBytes  4.44 Gbits/sec
[  5]  15.00-20.00  sec  2.69 GBytes  4.63 Gbits/sec
[  5]  20.00-25.00  sec  2.64 GBytes  4.53 Gbits/sec
[  5]  25.00-30.00  sec  2.83 GBytes  4.86 Gbits/sec
[  5]  30.00-30.00  sec   844 KBytes  4.69 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-30.00  sec  16.8 GBytes  4.80 Gbits/sec
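
(For reference, this was just a plain single-stream run; the command was roughly what's below, and the address is a placeholder for the TrueNAS VM's IP.)

Code:
# On the TrueNAS VM (server side):
iperf3 -s

# From the other machine, reporting every 5 seconds for 30 seconds:
iperf3 -c 192.168.1.10 -t 30 -i 5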


It improved even more after I used the "Autotune" function in TrueNAS (a note on what that seems to tweak follows the results):

Code:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.00   sec  4.90 GBytes  8.42 Gbits/sec
[  5]   5.00-10.00  sec  4.53 GBytes  7.78 Gbits/sec
[  5]  10.00-15.00  sec  4.63 GBytes  7.95 Gbits/sec
[  5]  15.00-20.00  sec  4.92 GBytes  8.46 Gbits/sec
[  5]  20.00-25.00  sec  5.01 GBytes  8.60 Gbits/sec
[  5]  25.00-30.00  sec  4.85 GBytes  8.33 Gbits/sec
[  5]  30.00-30.00  sec   275 KBytes  1.51 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-30.00  sec  28.8 GBytes  8.26 Gbits/sec
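
(As far as I can tell, the gains mostly come from Autotune bumping FreeBSD's socket and TCP buffer limits. The sysctl names below are real tunables, but the values are only illustrative; Autotune picks its own based on installed RAM, and I haven't checked exactly which ones it changed on my box.)

Code:
# Illustrative values only, not necessarily what Autotune set:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216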


Overall I'm very happy I went ahead and spent the money on the dual-port NIC. I also confirmed that I can pass just one of the two ports through to the VM, so I might use the second port for the rest of the vNICs and pull the existing Mellanox card out of the host.

Thanks everyone for your help!!!
 

jgreco

Resident Grinch
There's also the possibility that the Mellanox card is a fake. That's not something I'd seen be a problem in the past, and I'm a little skeptical of it even now, but just be mindful if you use it anywhere else: if it gives you unexpected problems, it may be best to toss it. Damaged genuine cards are also a thing; unusual, but possible. Or are you saying it was working fine when used alone in ESXi? Sorry, not going to reread the whole thread right now...

The Intel X520 is still the gold standard for overall 10G compatibility with the widest set of operating systems and hypervisors out there, so even if you end up using the X520 in a different role someday, it's unlikely to be wasted money and should be good for years to come.
 

Mal

Cadet
Or are you saying it was working fine when used alone in ESXi?

The card worked fine in bare-metal FreeNAS; I got the expected speeds.

That being said, I use an identical Mellanox NIC as the main ESXi NIC (VM traffic + vmkernel), which "seemed" fine, but it never had to handle high-bandwidth traffic.

I will probably just phase out the Mellanox cards I have in various boxes.
 

jgreco

Resident Grinch
Sorry, I was in a rush before. Yeah, the ConnectX-2 is a really ancient card, PCIe 2.0 x8, and it was kind of late to the party from a FreeBSD perspective, as it wasn't supported early on -- it came as a vendor bolt-on for FreeBSD 9, for example. The upside is a Mellanox-authored driver, but it often seemed problematic for users. I remember there was a lot of pressure shortly after I wrote the 10 Gig Networking Primer in 2014, because Mellanox ConnectX-2 cards flooded eBay for like $20 each, and I actually ended up buying a few to play with here in the shop. I wasn't that impressed, but if you were just looking to "go faster than gig," it was clearly the cheapest credible thing out there. It falls into that grey area of inexpensive 10G cards, like the Chelsio 320s, that are cheap for a reason.
 