GPU Isolate for VM

darkobas

Hi,

I have two NVIDIA cards and have reserved one of them for passthrough in the advanced options.
When I add it to a VM and then start the VM, I get:

Code:
kernel: VFIO - User Level meta-driver version: 0.3
kernel: NVRM: Attempting to remove device 0000:03:00.0 with non-zero usage count!


After that, libvirtd freezes completely, so it can't even be restarted. fuser shows nvidia-device-plugin holding both device nodes:

Code:
# fuser -v /dev/nvidia0
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root      29724 F.... nvidia-device-p
# fuser -v /dev/nvidia1
                     USER        PID ACCESS COMMAND
/dev/nvidia1:        root      29724 F.... nvidia-device-p

# ps -ef |grep 29724
root       29724   29700  1 16:06 ?        00:00:00 nvidia-device-plugin
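
Alongside fuser, it may help to confirm which kernel driver currently owns each card, since the NVRM message suggests the nvidia driver was still attached to 0000:03:00.0 when the VM tried to take it. This is only a rough sketch that reads the standard sysfs `driver` link; the function name `bound_driver` and the `SYSFS_PCI` override are made up for illustration (the override exists only so the function can be tried without real hardware).

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none" if nothing is bound.
# SYSFS_PCI is a hypothetical override used only for testing; by default the
# real sysfs path is read.
bound_driver() {
    dev="$1"
    link="${SYSFS_PCI:-/sys/bus/pci/devices}/$dev/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink to the driver's sysfs directory.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# PCI addresses taken from the nvidia-smi output in this post.
bound_driver 0000:03:00.0
bound_driver 0000:04:00.0
```

For a card that is really isolated I would expect to see vfio-pci here rather than nvidia, but that is my assumption about how the isolation is supposed to work.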



Code:
Thu Feb  2 16:17:48 2023     
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 35%   30C    P8    N/A /  75W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   31C    P8     1W /  38W |      0MiB /  1024MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                            
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


Code:
# dmesg |grep -i iommu
[    0.416758] iommu: Default domain type: Passthrough (set via kernel command line)
[    0.501256] DMAR: IOMMU feature sc_support inconsistent
[    0.501257] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.501348] pci 0000:ff:0b.0: Adding to iommu group 0
[    0.501363] pci 0000:ff:0b.1: Adding to iommu group 0
[    0.501377] pci 0000:ff:0b.2: Adding to iommu group 0
[    0.501442] pci 0000:ff:0c.0: Adding to iommu group 1
[    0.501456] pci 0000:ff:0c.1: Adding to iommu group 1
[    0.501470] pci 0000:ff:0c.2: Adding to iommu group 1
[    0.501483] pci 0000:ff:0c.3: Adding to iommu group 1
[    0.501496] pci 0000:ff:0c.4: Adding to iommu group 1
[    0.501511] pci 0000:ff:0c.5: Adding to iommu group 1
[    0.501567] pci 0000:ff:0f.0: Adding to iommu group 2
[    0.501581] pci 0000:ff:0f.1: Adding to iommu group 2
[    0.501595] pci 0000:ff:0f.4: Adding to iommu group 2
[    0.501609] pci 0000:ff:0f.5: Adding to iommu group 2
[    0.501623] pci 0000:ff:0f.6: Adding to iommu group 2
[    0.501679] pci 0000:ff:10.0: Adding to iommu group 3
[    0.501693] pci 0000:ff:10.1: Adding to iommu group 3
[    0.501708] pci 0000:ff:10.5: Adding to iommu group 3
[    0.501723] pci 0000:ff:10.6: Adding to iommu group 3
[    0.501737] pci 0000:ff:10.7: Adding to iommu group 3
[    0.501767] pci 0000:ff:12.0: Adding to iommu group 4
[    0.501782] pci 0000:ff:12.1: Adding to iommu group 4
[    0.501863] pci 0000:ff:13.0: Adding to iommu group 5
[    0.501879] pci 0000:ff:13.1: Adding to iommu group 5
[    0.501894] pci 0000:ff:13.2: Adding to iommu group 5
[    0.501908] pci 0000:ff:13.3: Adding to iommu group 5
[    0.501923] pci 0000:ff:13.4: Adding to iommu group 5
[    0.501937] pci 0000:ff:13.5: Adding to iommu group 5
[    0.501952] pci 0000:ff:13.6: Adding to iommu group 5
[    0.501967] pci 0000:ff:13.7: Adding to iommu group 5
[    0.502030] pci 0000:ff:14.0: Adding to iommu group 6
[    0.502046] pci 0000:ff:14.1: Adding to iommu group 6
[    0.502061] pci 0000:ff:14.2: Adding to iommu group 6
[    0.502075] pci 0000:ff:14.3: Adding to iommu group 6
[    0.502090] pci 0000:ff:14.6: Adding to iommu group 6
[    0.502105] pci 0000:ff:14.7: Adding to iommu group 6
[    0.502153] pci 0000:ff:15.0: Adding to iommu group 7
[    0.502169] pci 0000:ff:15.1: Adding to iommu group 7
[    0.502185] pci 0000:ff:15.2: Adding to iommu group 7
[    0.502200] pci 0000:ff:15.3: Adding to iommu group 7
[    0.502237] pci 0000:ff:16.0: Adding to iommu group 8
[    0.502254] pci 0000:ff:16.6: Adding to iommu group 8
[    0.502270] pci 0000:ff:16.7: Adding to iommu group 8
[    0.502325] pci 0000:ff:17.0: Adding to iommu group 9
[    0.502342] pci 0000:ff:17.4: Adding to iommu group 9
[    0.502358] pci 0000:ff:17.5: Adding to iommu group 9
[    0.502375] pci 0000:ff:17.6: Adding to iommu group 9
[    0.502391] pci 0000:ff:17.7: Adding to iommu group 9
[    0.502445] pci 0000:ff:1e.0: Adding to iommu group 10
[    0.502462] pci 0000:ff:1e.1: Adding to iommu group 10
[    0.502480] pci 0000:ff:1e.2: Adding to iommu group 10
[    0.502497] pci 0000:ff:1e.3: Adding to iommu group 10
[    0.502513] pci 0000:ff:1e.4: Adding to iommu group 10
[    0.502542] pci 0000:ff:1f.0: Adding to iommu group 11
[    0.502560] pci 0000:ff:1f.2: Adding to iommu group 11
[    0.502573] pci 0000:00:00.0: Adding to iommu group 12
[    0.502589] pci 0000:00:01.0: Adding to iommu group 13
[    0.502604] pci 0000:00:01.1: Adding to iommu group 14
[    0.502617] pci 0000:00:02.0: Adding to iommu group 15
[    0.502630] pci 0000:00:03.0: Adding to iommu group 16
[    0.502645] pci 0000:00:05.0: Adding to iommu group 17
[    0.502658] pci 0000:00:05.1: Adding to iommu group 18
[    0.502672] pci 0000:00:05.2: Adding to iommu group 19
[    0.502686] pci 0000:00:05.4: Adding to iommu group 20
[    0.502700] pci 0000:00:11.0: Adding to iommu group 21
[    0.502722] pci 0000:00:11.4: Adding to iommu group 22
[    0.502736] pci 0000:00:14.0: Adding to iommu group 23
[    0.502757] pci 0000:00:16.0: Adding to iommu group 24
[    0.502772] pci 0000:00:19.0: Adding to iommu group 25
[    0.502785] pci 0000:00:1a.0: Adding to iommu group 26
[    0.502799] pci 0000:00:1b.0: Adding to iommu group 27
[    0.502812] pci 0000:00:1c.0: Adding to iommu group 28
[    0.502827] pci 0000:00:1c.2: Adding to iommu group 29
[    0.502841] pci 0000:00:1c.4: Adding to iommu group 30
[    0.502854] pci 0000:00:1d.0: Adding to iommu group 31
[    0.502892] pci 0000:00:1f.0: Adding to iommu group 32
[    0.502913] pci 0000:00:1f.2: Adding to iommu group 32
[    0.502932] pci 0000:00:1f.3: Adding to iommu group 32
[    0.502947] pci 0000:02:00.0: Adding to iommu group 33
[    0.502981] pci 0000:03:00.0: Adding to iommu group 34
[    0.503001] pci 0000:03:00.1: Adding to iommu group 34
[    0.503033] pci 0000:04:00.0: Adding to iommu group 35
[    0.503053] pci 0000:04:00.1: Adding to iommu group 35
[    0.503067] pci 0000:06:00.0: Adding to iommu group 36
[    0.503083] pci 0000:07:00.0: Adding to iommu group 37
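
In the log above, 0000:03:00.0 shares IOMMU group 34 with 0000:03:00.1, and 0000:04:00.0 shares group 35 with 0000:04:00.1. The same grouping can be read from sysfs instead of scrolling through dmesg. A minimal sketch, assuming the usual `/sys/kernel/iommu_groups` layout; `group_members` and the `IOMMU_ROOT` override are names I made up (the override is only there for testing).

```shell
#!/bin/sh
# List every PCI device that sits in the same IOMMU group as the given device.
# IOMMU_ROOT is a hypothetical override for testing; the default is the real
# sysfs location.
group_members() {
    dev="$1"
    root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
    for path in "$root"/*/devices/"$dev"; do
        # Skip the unexpanded glob when the device is not in any group.
        [ -e "$path" ] || continue
        ls "$(dirname "$path")"
    done
}

# Address of the first card in this post.
group_members 0000:03:00.0
```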

What am I doing wrong here?
One thing I can think of that might play a role is that I installed one of the cards after the system was installed. Maybe I need to trigger an update of the initramfs or something? The card does seem to work according to nvidia-smi, though.



EDIT:
OK, I killed the PID that I saw with fuser (nvidia-device-plugin). All hell broke loose and libvirtd crashed, but after a reboot passthrough was working.
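
In hindsight, a check before starting the VM would have shown that something still had the device nodes open. Here is a rough sketch that scans the open file descriptors under /proc, roughly what fuser reports; `holders` and the `PROC_ROOT` override are hypothetical names, and I am only assuming that an empty result means the card is free to hand over.

```shell
#!/bin/sh
# Print the PIDs that have the given device node open, one per line.
# PROC_ROOT is a hypothetical override for testing; the default is /proc.
# Run as root, otherwise other users' file descriptors cannot be read.
holders() {
    node="$1"
    for fd in "${PROC_ROOT:-/proc}"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the file the process has open.
        [ "$(readlink "$fd" 2>/dev/null)" = "$node" ] || continue
        pid=${fd%/fd/*}        # strip the trailing /fd/N
        echo "${pid##*/}"      # keep only the PID component
    done | sort -un
}

# Device nodes from the fuser output in this post.
for node in /dev/nvidia0 /dev/nvidia1; do
    pids=$(holders "$node" | tr '\n' ' ')
    if [ -n "$pids" ]; then
        echo "$node is still held by PID(s): $pids"
    fi
done
```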
 