Nvidia GPU passthrough for VM?

basb

Cadet
Joined
Jan 2, 2021
Messages
1
I tried to pass through an Nvidia GPU to a Windows VM and ran into a few issues. I finally got it working in a very hacky way, so I'm wondering whether my approach can be improved, and whether this is something TrueNAS SCALE will be able to support properly in the future.

The first issue is that detaching an Nvidia card from the host (to make it available for VMs) is impossible, because the card is held by nvidia-device-plugin, which provides GPU passthrough for Docker containers. I did not find a proper way to disable this feature. Is there a configuration option for it?
Next, I tried to blacklist the Nvidia driver, assuming this would prevent nvidia-device-plugin from running. The usual method of blacklisting a kernel module (adding a file in /etc/modprobe.d/) did not work, as that directory seems to be regenerated on boot. Is there a way to make a persistent change here?
Finally, I just edited the GRUB command line at boot time (adding module_blacklist=nvidia,nouveau to the kernel parameters), which worked, but is of course not persistent across reboots :frown:
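In case it helps anyone reproduce this, here is a minimal sketch of the two approaches (the filename and comments are just examples, not anything TrueNAS ships):

Code:
# /etc/modprobe.d/disable-nvidia.conf  -- example filename; SCALE regenerates this directory on boot
blacklist nouveau
blacklist nvidia
# "install <module> /bin/false" also blocks loading through dependencies
install nouveau /bin/false
install nvidia /bin/false

# Kernel command line alternative, edited interactively at the GRUB menu (not persistent):
#   module_blacklist=nvidia,nouveau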

The second issue is the VM definition. I ran into a bug where choosing the PCI passthrough device from the list did not make it correctly into the libvirt domain XML (I think it is the same bug as this one: https://jira.ixsystems.com/browse/NAS-107243 so I added a comment there).
After manually fixing the domain XML (virsh edit), the device was visible in the VM but not usable; it seems that another option (<kvm><hidden state="on"/></kvm>) is needed, which is not exposed in the TrueNAS web interface. After manually editing the domain XML to add this, I finally got GPU passthrough working.
Only when starting from the command line, of course, as the definition gets overwritten when starting from the web interface. From this bug (https://jira.ixsystems.com/browse/NAS-108713) I understand this is the expected behavior.
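For reference, the relevant bits of the domain XML end up looking roughly like this (the PCI address is a placeholder, take yours from lspci):

Code:
<!-- direct child of <domain>: hide the hypervisor from the guest Nvidia driver -->
<features>
  <acpi/>
  <kvm>
    <hidden state="on"/>
  </kvm>
</features>

<!-- inside <devices>: the passed-through GPU -->
<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
  </source>
</hostdev>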

So, is there any way to improve on my current setup?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
At this stage of the alpha, sorry, no. You could always submit a feature request.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
The current issues with broken Nvidia drivers don't help.
It's also important to note that ALPHA builds are NOT QA-tested by iX with Nvidia hardware.
 

matusu

Cadet
Joined
Oct 2, 2021
Messages
1
A quick hack to make your <kvm><hidden state="on"/></kvm> permanent is to edit:

Code:
/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py


and replace your existing get_features_xml() method with:

Code:
def get_features_xml(self):
    # Adds <kvm><hidden state="on"/></kvm> to the generated <features> block
    # so the Nvidia driver inside the guest doesn't detect the hypervisor.
    return create_element(
        'features', attribute_dict={
            'children': [
                create_element('acpi'),
                create_element('kvm', attribute_dict={'children': [create_element('hidden', state='on')]})
            ],
        }
    )
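
You'll probably need to restart the middleware afterwards (e.g. systemctl restart middlewared) for the change to take effect, and keep in mind that the next update will presumably put the original file back.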
 