Unable to SAVE selected isolated GPU, TrueNAS-SCALE-23.10.1

Daneel

Cadet
Joined
Feb 9, 2019
Messages
3
When configuring a VM to use an isolated GPU, I get an error message when saving the configuration.
The error is "MethodNotFoundError".

As I have not found any identical issue regarding the saving step of the configuration, I am creating a new thread.

Before you can pass a GPU through to a VM, the device must be isolated under System Settings > Advanced. After this is done the GPU should be reported as not available to the host, which can be verified from the CLI.
Code:
[truenas]> system device get_info type=GPU
+--------+------------------------------------------------------------------+---------+--------+------------------------------+-------------------+
| addr   | description                                                      | devices | vendor | uses_system_critical_devices | available_to_host |
+--------+------------------------------------------------------------------+---------+--------+------------------------------+-------------------+
| <dict> | NVIDIA Corporation GM107GL [Quadro K620]                         | <list>  | NVIDIA | false                        | false             |
| <dict> | ASPEED Technology, Inc. ASPEED Graphics Family                   | <list>  | <null> | false                        | true              |
| <dict> | Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega S... | <list>  | AMD    | false                        | true              |
+--------+------------------------------------------------------------------+---------+--------+------------------------------+-------------------+
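The same isolation can presumably also be set from the shell; I believe the setting lives in system.advanced as isolated_gpu_pci_ids (the field name and value format here are from memory, so treat this as a sketch rather than gospel):
Code:
root@truenas[~]# midclt call system.advanced.update '{"isolated_gpu_pci_ids": ["0000:10:00.0"]}'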
After setting up the virtual machine, pressing Save presents me with a popup error message:
Code:
[ENOMETHOD] Method 'get_pci_ids_for_gpu_isolation' not found in 'device'
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/utils/service/call.py", line 30, in _method_lookup
    methodobj = getattr(serviceobj, method_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'CompoundService' object has no attribute 'get_pci_ids_for_gpu_isolation'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 340, in on_message
    serviceobj, methodobj = self.middleware._method_lookup(message['method'])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/utils/service/call.py", line 32, in _method_lookup
    raise MethodNotFoundError(method_name, service)
middlewared.utils.service.call.MethodNotFoundError: [ENOMETHOD] Method 'get_pci_ids_for_gpu_isolation' not found in 'device'
So there seems to be an issue calling the method 'get_pci_ids_for_gpu_isolation' in the 'device' context.
Code:
[truenas]> service vm device get_pci_ids_for_gpu_isolation gpu_pci_id="0000:10:00.0"
pci_0000_10_00_0
pci_0000_10_00_1
pci_0000_00_01_0
pci_0000_00_01_1
Calling this method from the CLI works fine.
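The CLI command above maps to the vm.device namespace in the middleware, while the traceback shows the UI looking the method up under plain device. If I understand midclt's argument handling correctly, the equivalent direct call would be something like this (passing the slot as a JSON string), and it should return the same four pci_* slugs:
Code:
root@truenas[~]# midclt call vm.device.get_pci_ids_for_gpu_isolation '"0000:10:00.0"'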
The virtual machine configuration does get saved, but without the GPU information.
Code:
[truenas]> service vm get_instance 6
+-------------------------+--------------------------------------+
|                      id | 6                                    |
|                    name | Test                                 |
|             description | Test                                 |
|                   vcpus | 1                                    |
|                  memory | 8192                                 |
|              min_memory | 512                                  |
|               autostart | false                                |
|                    time | LOCAL                                |
|              bootloader | UEFI                                 |
|                   cores | 2                                    |
|                 threads | 4                                    |
|   hyperv_enlightenments | false                                |
|        shutdown_timeout | 90                                   |
|                cpu_mode | HOST-PASSTHROUGH                     |
|               cpu_model | <null>                               |
|                  cpuset |                                      |
|                 nodeset |                                      |
|               pin_vcpus | false                                |
|           hide_from_msr | false                                |
|     suspend_on_snapshot | false                                |
|   ensure_display_device | true                                 |
|               arch_type | <null>                               |
|            machine_type | <null>                               |
|                    uuid | c314fd9b-9ec6-4655-91d9-73ca257f089a |
|       command_line_args |                                      |
|         bootloader_ovmf | OVMF_CODE.fd                         |
| trusted_platform_module | false                                |
|                 devices | <list>                               |
|       display_available | false                                |
|                  status | <dict>                               |
+-------------------------+--------------------------------------+
Ideally I would like to inject the missing GPU information and just be done, but I have no idea how to do that...
This is the GPU information in a longer format:
Code:
root@truenas[~/temp]# midclt call device.get_gpus | jq .
[
  {
    "addr": {
      "pci_slot": "0000:10:00.0",
      "domain": "0000",
      "bus": "10",
      "slot": "00"
    },
    "description": "NVIDIA Corporation GM107GL [Quadro K620]",
    "devices": [
      {
        "pci_id": "10DE:13BB",
        "pci_slot": "0000:10:00.0",
        "vm_pci_slot": "pci_0000_10_00_0"
      },
      {
        "pci_id": "10DE:0FBC",
        "pci_slot": "0000:10:00.1",
        "vm_pci_slot": "pci_0000_10_00_1"
      }
    ],
    "vendor": "NVIDIA",
    "uses_system_critical_devices": false,
    "available_to_host": false
  },
  {
    "addr": {
      "pci_slot": "0000:2b:00.0",
      "domain": "0000",
      "bus": "2b",
      "slot": "00"
    },
    "description": "ASPEED Technology, Inc. ASPEED Graphics Family",
    "devices": [
      {
        "pci_id": "1A03:2000",
        "pci_slot": "0000:2b:00.0",
        "vm_pci_slot": "pci_0000_2b_00_0"
      }
    ],
    "vendor": null,
    "uses_system_critical_devices": false,
    "available_to_host": true
  },
  {
    "addr": {
      "pci_slot": "0000:30:00.0",
      "domain": "0000",
      "bus": "30",
      "slot": "00"
    },
    "description": "Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]",
    "devices": [
      {
        "pci_id": "1002:1638",
        "pci_slot": "0000:30:00.0",
        "vm_pci_slot": "pci_0000_30_00_0"
      },
      {
        "pci_id": "1002:1637",
        "pci_slot": "0000:30:00.1",
        "vm_pci_slot": "pci_0000_30_00_1"
      },
      {
        "pci_id": "1022:15DF",
        "pci_slot": "0000:30:00.2",
        "vm_pci_slot": "pci_0000_30_00_2"
      },
      {
        "pci_id": "1022:1639",
        "pci_slot": "0000:30:00.3",
        "vm_pci_slot": "pci_0000_30_00_3"
      },
      {
        "pci_id": "1022:1639",
        "pci_slot": "0000:30:00.4",
        "vm_pci_slot": "pci_0000_30_00_4"
      },
      {
        "pci_id": "1022:15E2",
        "pci_slot": "0000:30:00.5",
        "vm_pci_slot": "pci_0000_30_00_5"
      },
      {
        "pci_id": "1022:15E3",
        "pci_slot": "0000:30:00.6",
        "vm_pci_slot": "pci_0000_30_00_6"
      }
    ],
    "vendor": "AMD",
    "uses_system_critical_devices": false,
    "available_to_host": true
  }
]


OK, what information did I miss?
  • I reinstalled from scratch and got the exact same results.
  • I tried isolating the AMD iGPU instead, with the same results.
  • I checked for reasonable IOMMU groups; a prettified copy is below.
  • I checked the results of the actual isolation, and as far as I can tell the blacklisting seemed fine (quick check shown right below).
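For reference, the quick check is just looking at which kernel driver is bound to the isolated card:
Code:
root@truenas[~]# lspci -nnk -s 10:00.0
If the blacklisting worked, the Quadro should no longer show the regular host driver (nouveau/nvidia) as "Kernel driver in use".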
Any ideas?
Code:
root@truenas[~/temp]# ./iommupretty.sh
IOMMU Group 0:
        00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
        00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
        10:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K620] [10de:13bb] (rev a2)
        10:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
IOMMU Group 1:
        00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
        00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
        00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
        16:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller [1022:43ee]
        16:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset SATA Controller [1022:43eb]
        16:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Upstream Port [1022:43e9]
        20:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
        20:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
        20:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
        20:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
        21:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GBASE-T [8086:15ff] (rev 02)
        21:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GBASE-T [8086:15ff] (rev 02)
        27:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        28:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        2a:00.0 PCI bridge [0604]: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150] (rev 04)
        2b:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 41)
        2c:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. Device [2646:5017] (rev 03)
IOMMU Group 2:
        00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
        00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
        30:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev d8)
        30:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller [1002:1637]
        30:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
        30:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
        30:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
        30:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 01)
        30:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller [1022:15e3]
IOMMU Group 3:
        00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
        00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 4:
        00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0 [1022:166a]
        00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 [1022:166b]
        00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 [1022:166c]
        00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 [1022:166d]
        00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 [1022:166e]
        00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 [1022:166f]
        00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 [1022:1670]
        00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 [1022:1671]
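For completeness, iommupretty.sh is nothing fancy; it is roughly the usual loop over /sys/kernel/iommu_groups, along these lines:
Code:
#!/bin/bash
# iommupretty.sh -- list every PCI device grouped by IOMMU group
shopt -s nullglob
for group in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${group##*/}:"
    for device in "$group"/devices/*; do
        echo -e "\t$(lspci -nns "${device##*/}")"
    done
done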
 

Daneel

Cadet
Joined
Feb 9, 2019
Messages
3
I have a hard time believing this is unique to my configuration.
Anyone who has IOMMU enabled and a GPU should at least be able to tell whether this is unique to my system or happens on other configurations as well.

If anyone has the time to try to replicate this, I would appreciate it if you could respond to this thread with your result.
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
My setup is a bit different from yours, but there's some general weirdness with GPU isolation right now. See this thread and the related JIRA issue for a possibly helpful discussion.

I consider this a workaround rather than a fix I'm in love with for the long term, but I was able to pass through a GPU that TrueNAS wouldn't "isolate" (even though only one device requires passthrough, and that device lives in its own IOMMU group) by simply adding it to the VM as an ordinary PCIe device rather than going through the GPU passthrough rigamarole. I wouldn't do this in a mission-critical business environment, but it's been working fine for a couple months in my homelab. :)
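If you'd rather do the same thing from a shell, something along these lines should be equivalent (untested sketch; I'm assuming vm.device.create still accepts a PCI dtype with a pptdev attribute, and reusing the OP's VM id 6 plus the Quadro's two functions from the device.get_gpus output above):
Code:
# add both functions of the Quadro (the GPU itself and its HDMI audio) as plain PCI devices
midclt call vm.device.create '{"vm": 6, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_10_00_0"}}'
midclt call vm.device.create '{"vm": 6, "dtype": "PCI", "attributes": {"pptdev": "pci_0000_10_00_1"}}'
That's just the CLI equivalent of adding the devices under the VM's device list in the UI.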
 

Kiefff

Cadet
Joined
Jan 20, 2024
Messages
2
I'm running into issues passing a second GPU to a Windows 10 VM on SCALE 23.10.1.1.

This was a recent migration for me though, moving my TrueNAS instance from an i7-3770/32 GB node to a dual-socket Xeon E5-26xx v4 workhorse with 256 GB ECC and 10 Gb uplinks.

I moved over the boot drive plus the eight ZFS pool drives; all I really had to do was move my box's static IP over to the 10 Gb NIC. About 90% of my k3s apps came up without much of an issue, so everything seemed to be a success. I was able to assign the Quadro M2000 to the Plex app for HW transcoding and verified it working.

My current issue is passing my RTX 2070 to a Win10 VM. When assigning the GPU while editing the VM, I get the exact same error you were getting. I also tried the swapped scenario (RTX 2070 for the Plex app transcoding, with the Quadro M2000 isolated to try to assign to the VM).

I then tried assigning the GPU as a passthrough PCIe device, at which point I get IOMMU errors.

I don't want to run ESXi...
 

Kiefff

Cadet
Joined
Jan 20, 2024
Messages
2
It was a late night for me last night, so mistakes were made passing the GPU to the VM.

I just added both the GPU and the GPU audio device, and I now see the GPU inside my VM.

It appears, as mentioned in bcat's thread, that the 'edit VM' GPU passthrough is broken, but PCI passthrough by adding the devices to the VM does work.

The GPU is also isolated in the advanced settings.
 

Daneel

Cadet
Joined
Feb 9, 2019
Messages
3
Thanks @bcat and @Kiefff!
So I guess we will just leave it as it is for now and pass the GPUs through as PCIe devices. I have some more QEMU-related stuff to work through, but I will leave that for another thread :grin:.
 

kashiwagi

Dabbler
Joined
Jul 5, 2011
Messages
35
I'm having exactly the same problem. On top of that, I now get terrible GPU performance inside my Windows 11 guest, where I used to get really great performance.
 