SOLVED Nvidia GPU Passthrough on Debian VM not working

solitarius

Cadet
Joined
Jun 17, 2022
Messages
9
Hello,

I am trying to figure out how to use a GTX 1050 Ti in my Debian VM (to use it for transcoding in a linuxserver/plex Docker container).

Here is my config. So far, I have read a lot on the issue and, even though no thread matched my case exactly, I have done the following:

  • Enabled IOMMU on the motherboard
  • Isolated the GTX 1050 Ti: the driver for this device seems to be correctly blacklisted and the card is now bound to vfio-pci.

Code:
root@truenas[~]# cat /etc/modprobe.d/nvidia.conf
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia* pre: vfio-pci


Code:
root@truenas[~]# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10DE:1C82,10DE:0FB9


Code:
root@truenas[~]# cat /etc/modprobe.d/nvidia-blacklists-nouveau.conf
# You need to run "update-initramfs -u" after editing this file.

# see #580894
blacklist nouveau


Code:
root@truenas[~]# cat /etc/modprobe.d/nvidia-kernel-common.conf
alias char-major-195* nvidia
#options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660
# To enable FastWrites and Sidebus addressing, uncomment these lines
# options nvidia NVreg_EnableAGPSBA=1
# options nvidia NVreg_EnableAGPFW=1


Code:
root@truenas[~]# lspci -v
0a:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd GP107 [GeForce GTX 1050 Ti]
        Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 25
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at f000 [size=128 ]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_current_drm, nvidia_current


0a:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GP107GL High Definition Audio Controller
        Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 25
        Memory at fc080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
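
For reference, a quick host-side sanity check of the isolation looks something like this (a sketch using the PCI addresses from the lspci output above):

Code:
# Host-side sanity check: both functions of the card should be claimed by vfio-pci.
# Addresses taken from the lspci -v output above.
lspci -nnk -s 0a:00.0
lspci -nnk -s 0a:00.1
# Each should show: "Kernel driver in use: vfio-pci"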

  • Checked nvidia-smi on the host: only the GT 710 card is detected.
Code:
root@truenas[~]# nvidia-smi
Fri Jul 29 17:46:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:04:00.0 N/A |                  N/A |
| 40%   51C    P8    N/A /  N/A |      0MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

  • Installed the NVIDIA drivers using apt install nvidia-driver firmware-misc-nonfree
  • Added the device to the VM: The device is listed:
Code:
root@vm:/home/vm# ls -alh /dev/dri/
total 0
drwxr-xr-x  3 root root        120 Jul 29  2022 .
drwxr-xr-x 20 root root       3.2K Jul 29 15:52 ..
drwxr-xr-x  2 root root        100 Jul 29  2022 by-path
crw-rw----  1 root video  226,   0 Jul 29  2022 card0
crw-rw----  1 root video  226,   1 Jul 29  2022 card1
crw-rw----  1 root render 226, 128 Jul 29  2022 renderD128

root@vm:/home/vm# ls -alh /dev/dri/by-path/
total 0
drwxr-xr-x 2 root root 100 Jul 29  2022 .
drwxr-xr-x 3 root root 120 Jul 29  2022 ..
lrwxrwxrwx 1 root root   8 Jul 29  2022 pci-0000:00:02.0-card -> ../card0
lrwxrwxrwx 1 root root   8 Jul 29  2022 pci-0000:00:07.0-card -> ../card1
lrwxrwxrwx 1 root root  13 Jul 29  2022 pci-0000:00:07.0-render -> ../renderD128


Code:
user@vm[~]# sudo lspci -v
00:07.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd GP107 [GeForce GTX 1050 Ti]
        Physical Slot: 7
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 800000000 (64-bit, prefetchable) [size=256M]
        Memory at 810000000 (64-bit, prefetchable) [size=32M]
        I/O ports at c000 [size=128 ]
        Expansion ROM at c9060000 [virtual] [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Kernel driver in use: nvidia
        Kernel modules: nvidia

00:08.0 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GP107GL High Definition Audio Controller
        Physical Slot: 8
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at c9040000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel


  • Checked nvidia-smi in the VM: this is where I am stuck.
Code:
root@vm:/home/vm# nvidia-smi
No devices were found


At this point, I do not know what I missed. I have noticed that lspci in the VM does not list as many capabilities for the GPU as it does on the TrueNAS host, but I have not found out whether that is relevant.

Any idea is much appreciated. Thanks in advance.
 
Last edited:

solitarius

Cadet
Joined
Jun 17, 2022
Messages
9
Also, I have checked dmesg and it shows trouble initializing the GPU:

Code:
root@vm:/home/vm# dmesg | grep 0000:00:07
[    0.317470] pci 0000:00:07.0: [10de:1c82] type 00 class 0x030000
[    0.326338] pci 0000:00:07.0: reg 0x10: [mem 0xc8000000-0xc8ffffff]
[    0.334338] pci 0000:00:07.0: reg 0x14: [mem 0x800000000-0x80fffffff 64bit pref]
[    0.342338] pci 0000:00:07.0: reg 0x1c: [mem 0x810000000-0x811ffffff 64bit pref]
[    0.350338] pci 0000:00:07.0: reg 0x24: [io  0xc000-0xc07f]
[    0.358338] pci 0000:00:07.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
[    0.358624] pci 0000:00:07.0: Enabling HDA controller
[    0.394619] pci 0000:00:07.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.394619] pci 0000:00:07.0: vgaarb: bridge control possible
[    0.410005] pci 0000:00:07.0: can't claim BAR 6 [mem 0xfffe0000-0xffffffff pref]: no compatible bridge window
[    0.410011] pci 0000:00:07.0: BAR 6: assigned [mem 0xc9060000-0xc907ffff pref]
[    2.081814] nvidia 0000:00:07.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    2.250945] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:07.0 on minor 1
[    3.164192] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
[    3.186585] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
[    3.411924] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
[    3.412049] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
[   52.179580] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)


While googling, I found mentions that nvidia-persistenced might be the issue, so I tried to run sudo nvidia-persistenced, but it showed an error.

Code:
nvidia-persistenced failed to initialize. Check syslog for more details.


Code:
vm@vm:~$ sudo tail -n 1000 /var/log/syslog
Jul 29 16:47:53 vm kernel: [ 3354.520852] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
Jul 29 16:47:53 vm kernel: [ 3354.520987] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Jul 29 16:47:53 vm kernel: [ 3354.754563] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
Jul 29 16:47:53 vm kernel: [ 3354.754663] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Jul 29 16:57:24 vm kernel: [ 3926.033142] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
Jul 29 16:57:24 vm kernel: [ 3926.033241] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Jul 29 16:57:24 vm kernel: [ 3926.264552] NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x24:0xffff:1211)
Jul 29 16:57:24 vm kernel: [ 3926.264684] NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Jul 29 16:57:45 vm nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Jul 29 16:57:45 vm nvidia-persistenced: Shutdown (15006)


However, htop confirmed that nvidia-persistenced was already running (as the nvpd user).
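
As far as I can tell, the "Failed to lock PID file" message just means another nvidia-persistenced instance already holds the lock, which would match htop showing it running; a quick way to confirm (sketch) is:

Code:
# Check whether the persistence daemon is already running as a service.
systemctl status nvidia-persistenced
# Or list any running instances (the Debian package runs it as the nvpd user).
ps -u nvpd -o pid,user,cmd
# If it is already active, starting it again by hand fails to lock its PID file.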

Here is the /proc/driver/nvidia/gpus/0000\:00\:07.0/information file (I don't know exactly what it tells, but it does not look correct):

Code:
vm@vm:~$ cat /proc/driver/nvidia/gpus/0000\:00\:07.0/information
Model:           NVIDIA GeForce GTX 1050 Ti
IRQ:             35
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        47 bits
DMA Mask:        0x7fffffffffff
Bus Location:    0000:00:07.0
Device Minor:    0
GPU Excluded:    No
 
Last edited:

solitarius

Cadet
Joined
Jun 17, 2022
Messages
9
I tried an Ubuntu VM but did not have any more luck: the same ????? in the driver information file and no device found by nvidia-smi.

Any idea of what I could try to pinpoint the issue?
 

solitarius

Cadet
Joined
Jun 17, 2022
Messages
9
So, I got an NVIDIA Quadro P2000 and tried with it, but got the exact same result: no detection from nvidia-smi and lots of ? in the driver information.

I don't know what to do next to troubleshoot this.
 

buswedg

Explorer
Joined
Aug 17, 2022
Messages
69
So, I got an NVIDIA Quadro P2000 and tried with it, but got the exact same result: no detection from nvidia-smi and lots of ? in the driver information.

I don't know what to do next to troubleshoot this.
I've got a thread going over here, which I probably should have started in this forum.

I'm able to at least see my GPU via nvidia-smi in the VM, so you might want to try the same steps I note in my most recent post.

Still having issues getting my Jellyfin container (within the VM) to leverage it for transcoding, however.
 

buswedg

Explorer
Joined
Aug 17, 2022
Messages
69
Sorry, I should mention I also ran the below, which bumped me up to driver 515.65. Though I'm quite sure the nvidia-smi output was already showing fine prior to running the CUDA toolkit installer.

Code:
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run


sudo reboot


https://developer.nvidia.com/cuda-d...n&target_version=11&target_type=runfile_local
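
If you only want the newer driver out of that runfile and not the whole toolkit, the installer can be run non-interactively; something like the below should work, though double-check the available options with --help first:

Code:
# Sketch: install only the bundled 515.65.01 driver from the CUDA runfile,
# skipping the toolkit (verify flags with: sudo sh cuda_11.7.1_515.65.01_linux.run --help).
sudo sh cuda_11.7.1_515.65.01_linux.run --silent --driver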
 
Last edited:

solitarius

Cadet
Joined
Jun 17, 2022
Messages
9
I've got a thread going over here, which I probably should have started in this forum.

I'm able to at least see my GPU via nvidia-smi in the VM, so you might want to try the same steps I note in my most recent post.

Still having issues getting my Jellyfin container (within the VM) to leverage it for transcoding, however.

Hi buswedg,

I have been googling and browsing a lot lately, and I happened to figure out my problem. I had to change some settings in my ASRock X570 PRO4 BIOS to enable the proper BAR features:
  • Disable CSM (Compatibility Support Module)
  • Enable Above 4G decoding
  • Enable Re-Size BAR Support
Once the server restarted, the GPU was correctly detected:

Code:
vm@vm:/srv/plex$ nvidia-smi
Sun Aug 21 21:38:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        On   | 00000000:00:07.0 Off |                  N/A |
| 49%   39C    P8     4W /  75W |      1MiB /  5059MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
vm@vm:/srv/plex$ cat /proc/driver/nvidia/gpus/0000\:00\:07.0/information
Model:           Quadro P2000
IRQ:             32
GPU UUID:        GPU-1be609b5-838f-05c8-fa63-7c2f982d9153
Video BIOS:      86.06.3f.00.2f
Bus Type:        PCIe
DMA Size:        47 bits
DMA Mask:        0x7fffffffffff
Bus Location:    0000:00:07.0
Device Minor:    0
GPU Excluded:    No
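
For anyone landing here with the same symptoms: a rough way to confirm on the TrueNAS host that "Above 4G decoding" actually took effect is to look at where the card's large prefetchable BAR gets mapped (sketch; 0a:00.0 was the 1050 Ti's address earlier, adjust it to wherever your card sits):

Code:
# On the host, after the BIOS change: dump the GPU's memory BARs.
lspci -v -s 0a:00.0 | grep -i "memory at"
# With Above 4G decoding enabled, the 256M prefetchable BAR typically ends up
# mapped above 0x100000000 (the 4 GiB boundary) instead of being packed below it.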


I guess I can mark this thread as solved.

I am still having issues getting it properly usable by Plex for transcoding, but it is probably better if we continue that in your thread.
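
In case it helps once the VM side is sorted, the usual pattern for handing the GPU to the linuxserver/plex container is roughly the below; this is just a sketch, it assumes Docker plus the nvidia-container-toolkit are installed in the VM, and the paths and PUID/PGID are placeholders:

Code:
# Sketch: run linuxserver/plex with the passed-through GPU via the NVIDIA runtime.
docker run -d \
  --name=plex \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -e PUID=1000 -e PGID=1000 \
  --network=host \
  -v /srv/plex/config:/config \
  -v /srv/plex/media:/media \
  lscr.io/linuxserver/plex:latest
# Then enable hardware transcoding in Plex's transcoder settings.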
 
Joined
Sep 14, 2022
Messages
2
Just solved this myself with a Quadro P400 - I'm not sure if the "Disable CSM", "Above 4G Decoding", and "Re-Size BAR Support" changes were required on my end, but I did them anyway. What finally put the nail in the coffin was enabling the "Hide from MSR" option in the VM definition in the TrueNAS UI. I googled that setting a while ago, but it did not seem related. Only later did I realize that that setting is responsible for adding the

Code:
<kvm><hidden state='on'/></kvm>


section to the VM XML definition. This confuses me greatly because I thought NVIDIA's drivers were supposed to be tolerant of "workstation class" cards like a Quadro P400 in virtualization scenarios, but I suppose not. Maybe I'm misplacing blame. But nvidia-smi shows my card now.
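
For reference, in case anyone needs to add it by hand instead of through the UI, that toggle corresponds to the standard libvirt trick of hiding the KVM signature; it ends up as something along these lines inside the domain XML (edited with virsh edit on the host; the other contents of <features> are just whatever your VM already has):

Code:
<!-- Sketch: the <kvm><hidden/> element lives inside the domain's <features> block. -->
<features>
  <acpi/>
  <apic/>
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>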

Thanks for writing your findings down here!
 