Virtualized FreeNAS with NVMe error: nvme0: Missing interrupt

ChrisReeve

Explorer
Joined
Feb 21, 2019
Messages
91
Good morning

I know many people here dislike the idea of virtualizing FreeNAS for a number of reasons. I still followed Stux' guide, with success, and have been running virtualized FreeNAS for about a week now, mostly without issues. All drives are passed through in hardware, and the boot drive is mirrored to a physical USB key. If I have issues, I can reboot my server and boot from the USB stick into a bare-metal copy of FreeNAS. This has been tested, and works.

However, I have an Intel DC P3700 drive as a combined L2ARC and SLOG (20GB SLOG, 256GB L2ARC), which is also passed through in hardware to FreeNAS. This works for a few hours, but then I start getting the following error (about once every 60 seconds) in the FreeNAS command line interface:

Code:
nvme0: Missing interrupt


It doesn't seem to have an immediate effect on either FreeNAS' stability or performance, and I can't find any errors in the FreeNAS GUI (I haven't checked the log files). But when I try to power off or reboot FreeNAS, I am unable to do so. I am also unable to unmount the cache partitions after the error appears. The only way to fix this (as far as I know) is to force the VM to shut down, reboot into bare-metal FreeNAS, detach both SLOG and L2ARC, and shut down. Then I boot into ESXi and remove the passthrough of my DC P3700; from there, the setup seems stable (no issues whatsoever for about 1 week).
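Since the log files have not been checked yet, a rough way to confirm the "about once every 60 seconds" cadence is to scan a saved copy of the system log for the message and measure the gaps. This is only a sketch: the syslog-style line layout, the year, and the sample lines are assumptions made for illustration, not output from this system.

```python
import re
from datetime import datetime

# Assumed syslog-style layout: "Mar  3 10:15:01 freenas nvme0: Missing interrupt"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*nvme0: Missing interrupt")


def missing_interrupt_gaps(lines, year=2019):
    """Return the seconds between consecutive 'Missing interrupt' messages.

    Syslog timestamps carry no year, so one is supplied (assumption).
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            stamps.append(stamp)
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


# Made-up sample lines for illustration only, not taken from the poster's server.
sample = [
    "Mar  3 10:15:01 freenas nvme0: Missing interrupt",
    "Mar  3 10:15:30 freenas kernel: some unrelated message",
    "Mar  3 10:16:01 freenas nvme0: Missing interrupt",
    "Mar  3 10:17:01 freenas nvme0: Missing interrupt",
]

gaps = missing_interrupt_gaps(sample)
print(gaps)  # [60.0, 60.0]
```

On a real system the lines would come from the actual log file instead of the `sample` list, and the regular expression may need adjusting to the log format in use.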

I don't strictly need my cache drive, and for most of my use cases it gives no perceived performance boost. I don't need to install VMs on my pool through an iSCSI partition; I actually prefer to install them on a separate SSD. That way, the other VMs can keep running if I choose to power off FreeNAS for any reason. Still, I would like to know if there is any known fix for this issue.


Also, I am unable to connect any jails to the internet in a virtualized FreeNAS. When choosing DHCP, I get a DHCP error (I don't have a screenshot of the error, but it returns an error with 0.0.0.0/8 as the assigned IP?), and if I try to choose a static IP, I get a DNS mismatch error. Not sure why. I have tried passing through a physical NIC, which lets plugins install successfully, but Plex seems partly broken. I can add media, and it partially scans, but it is unable to download metadata or play media. My original plan was to run Plex as a plugin in FreeNAS, but now I just have a separate VM for Plex. Less efficient, but at least it works, and seems stable (and less buggy than the Plex plugin in FreeNAS).

tl;dr: I run ESXi vSphere 6.7 on my server, with FreeNAS virtualized. I am happy with the result, but have some issues with PCIe drives and plugin installs.

Also, and this is weird: with the exact same setup (switching between a mirrored boot in virtualized FreeNAS and a bare-metal boot of FreeNAS), I see higher SMB sequential performance when running FreeNAS virtualized! Reads from the server top out at around 800MB/s on bare metal, and hold a constant 1.10GB/s when virtualized in ESXi. Everything else is the same. I have no idea why this is the case.
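To put those two numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes the SMB traffic goes over one 10GbE port of the X540-T2 and takes 1 GB as 1000 MB; both are assumptions, and the line rate used is the raw figure before any protocol overhead.

```python
# Figures quoted in the post.
BARE_METAL_MB_S = 800
VIRTUALIZED_MB_S = 1100  # 1.10 GB/s, taking 1 GB = 1000 MB (assumption)

# Raw 10 Gbit/s line rate in MB/s, before Ethernet/TCP/SMB overhead.
TEN_GBE_LINE_RATE_MB_S = 10_000 / 8  # 1250.0

speedup_pct = (VIRTUALIZED_MB_S / BARE_METAL_MB_S - 1) * 100
line_rate_share_pct = VIRTUALIZED_MB_S / TEN_GBE_LINE_RATE_MB_S * 100

print(f"Virtualized is {speedup_pct:.1f}% faster than bare metal")   # 37.5%
print(f"Virtualized reaches {line_rate_share_pct:.0f}% of raw 10GbE")  # 88%
```

Under those assumptions, the virtualized figure sits close to what a single 10GbE link can carry, so the bare-metal run is the one that looks slower than the link would allow.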


Edit: Server specs:
MB: Supermicro X9DRL-3F
CPU: 2x E5-2650 v2
RAM: 128GB ECC DDR3
Cache drive: Intel DC P3700 400GB (20GB SLOG, 256GB L2ARC)
Drives: 10x 10TB WD Red white label (shucked from WD Elements and WD EasyStore; 8 drives connected through the on-board SAS controller, 2 drives connected through the HBA) in a RAIDZ2 pool, encryption and default compression on
NIC: Intel X540-T2
HBA: LSI 9211-8i
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Sounds like you might be experiencing the MSI-X interrupt bug with VMware; I'll copy from an old post here. The user here had an Optane card but it should still apply.

Take a look at this (legacy) bug report around passthrough issues with the Optane cards under ESXi; this post describes disabling the MSI-X interrupts for the device (although note that you may need to confirm whether your Optane card is device 0 or 1):

https://redmine.ixsystems.com/issues/26508#note-68

You may need to also perform the edit to the passthru.map described here:

https://redmine.ixsystems.com/issues/26508#note-62
 

ChrisReeve

Thank you. I did try to add the code to the freeNAS.vmx file, but the issue persists. I can't remember if I rebooted the ESXi host after adding msiEnabled = "FALSE", and will re-attempt this in a few days. I don't have physical access to the server right now, and don't want to screw things up now that everything is running fine.
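Before re-attempting, it may help to confirm that the setting is actually present for the right passthrough device, since the earlier reply notes the card could be device 0 or 1. The sketch below parses the text of a .vmx file and reports the msiEnabled value per passthrough device. The `pciPassthruN.` key prefix and the sample file contents are assumptions for illustration; the exact key to use is the one given in the linked bug report.

```python
import re

# A .vmx file is assumed to consist of lines of the form: key = "value"
ENTRY_RE = re.compile(r'^\s*([\w.:-]+)\s*=\s*"(.*)"\s*$')
# Assumed naming of per-device passthrough keys, e.g. pciPassthru0.msiEnabled
DEVICE_RE = re.compile(r"^pciPassthru(\d+)\.")


def parse_vmx(text):
    """Parse key = "value" lines into a dict, ignoring anything else."""
    settings = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def msi_status(settings):
    """Map each passthrough device index to its msiEnabled value (None if unset)."""
    indexes = set()
    for key in settings:
        match = DEVICE_RE.match(key)
        if match:
            indexes.add(int(match.group(1)))
    return {i: settings.get(f"pciPassthru{i}.msiEnabled") for i in sorted(indexes)}


# Hypothetical file contents, not copied from a real VM.
sample_vmx = """
pciPassthru0.present = "TRUE"
pciPassthru0.msiEnabled = "FALSE"
pciPassthru1.present = "TRUE"
"""

print(msi_status(parse_vmx(sample_vmx)))  # {0: 'FALSE', 1: None}
```

In this made-up example, device 1 has no msiEnabled entry at all, which is the kind of mismatch worth ruling out if the NVMe drive turns out not to be device 0.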
 