GPU passthrough (yes, again)

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Hello gentlemen,

I have two GPUs in TrueNAS SCALE 22.12.0: an Nvidia GT 710 for the server display and a GTX 1050 for hardware transcoding (Emby), and I cannot get passthrough of the 1050 to work.

When I attach the 1050 to the VM, it does not actually get attached. The GUI lets me save, but nothing appears under "Device", and if I click Edit again, the GPU is not listed in the last block of the configuration. Of course, if I boot the VM, the GPU is not there.
So I took a look at the logs, and dmesg shows this:

Code:
[734382.068516] NVRM: Attempting to remove device 0000:43:00.0 with non-zero usage count!


My GPUs have the following PCI addresses:
Code:
43:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050] (rev a1)
43:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
44:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
44:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)


The GPU selected as isolated is the GTX 1050:

[Screenshot: gpu-isolated.png]


So it looks like the 1050 is not really isolated and cannot be used for the VM.
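
One way to confirm this is to look at which kernel driver currently owns each function of the card. The sketch below is only an illustration: it reads the standard Linux sysfs `driver` symlink under `/sys/bus/pci/devices`, and the helper name `bound_driver` is made up for this post. It assumes that a card which is really isolated for passthrough would show `vfio-pci` (or no driver at all) rather than `nvidia`; how SCALE implements isolation internally is not confirmed here.

```python
import os

def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to pci_addr, or None."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device does not exist
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # PCI addresses taken from the lspci output in this post.
    for addr in ("0000:43:00.0", "0000:43:00.1", "0000:44:00.0", "0000:44:00.1"):
        print(addr, bound_driver(addr) or "(no driver bound)")
```

If `0000:43:00.0` reports `nvidia`, that would be consistent with the "non-zero usage count" message: the host driver is still holding the 1050.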

I checked dmesg on TrueNAS, and it shows the 710 as unsupported by the NVIDIA driver version currently running (515.65.01). The 710 is now a legacy card and uses the 470.xx driver (470.161.03, released three months ago on 2022-11-22).

Code:
[    8.808334] NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
               NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 515.65.01 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[    8.927474] NVRM: ignoring the legacy GPU 0000:44:00.0
[    8.927987] nvidia: probe of 0000:44:00.0 failed with error -1
[    8.928529] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.65.01  Wed Jul 20 14:00:58 UTC 2022
[    8.936071] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  515.65.01  Wed Jul 20 13:43:59 UTC 2022
[    9.038468] [drm] [nvidia-drm] [GPU ID 0x00004300] Loading driver
[    9.039168] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:43:00.0 on minor 0


So my guess is that the 710 is too old to be recognized by the driver, so TrueNAS does not use it and it cannot be isolated. As a fallback, the host seems to hold on to the 1050 even though it is flagged as isolated, and so I cannot use it for my VM.
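
To make that reading of the log easier to check, here is a minimal sketch that reduces the NVIDIA lines from dmesg to "which PCI address was ignored, which was claimed". The function name `classify` and the two regular expressions are illustrative only; they match the exact message wording shown in the dmesg excerpt in this post and may not match other driver versions.

```python
import re

# Patterns copied from the message wording in the dmesg excerpt above.
IGNORED = re.compile(r"NVRM: ignoring the legacy GPU (\S+)")
CLAIMED = re.compile(r"Initialized nvidia-drm .* for (\S+) on minor")

def classify(dmesg_text):
    """Map PCI address -> 'ignored' or 'claimed' from NVIDIA driver messages."""
    result = {}
    for line in dmesg_text.splitlines():
        m = IGNORED.search(line)
        if m:
            result[m.group(1)] = "ignored"
            continue
        m = CLAIMED.search(line)
        if m:
            result[m.group(1)] = "claimed"
    return result

sample = """\
[    8.927474] NVRM: ignoring the legacy GPU 0000:44:00.0
[    8.927987] nvidia: probe of 0000:44:00.0 failed with error -1
[    9.039168] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:43:00.0 on minor 0
"""
print(classify(sample))
# {'0000:44:00.0': 'ignored', '0000:43:00.0': 'claimed'}
```

On this sample the host driver ignores the 710 (44:00.0) and claims the 1050 (43:00.0), which is the card that was supposed to be left alone for the VM.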

Is my guess correct, and what can I do to work around this issue?

Can I install the 470 driver in place of the 515 one? Would upgrades still work if I do this?

Best regards,
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I'd suggest trying each GPU individually and confirming whether it works, then trying the combination.

I'm sure we don't have a test case with two different generations of GPUs "cooperating".

SCALE 22.12.1 comes out this week, so some behaviors may improve.
 

Okeur75

What do you mean by "try"? Try the GPU in the host only, or try to isolate it?

If you have embedded a newer version of the NVIDIA driver (nvidia-smi) in release 22.12.1, I might even lose support for my 1050, because it's becoming pretty old :wink:
 

morganL

I'm suggesting that you remove one GPU and test the other in the roles you want it to take.
Repeat the process for the other GPU.
If both GPUs work fine, then test the combination.

As it is, the setup is complex, and it's not likely anyone has a similar setup. It's hard to work out where to start to resolve the issues.
 

Okeur75

Ok, let me work on it.

I don't know, though, whether I will be able to isolate a single GPU: my motherboard has no integrated graphics, so I need to keep one GPU for the host. That's why I have the 710 for host usage and the 1050 for GPU passthrough.
If I connect only one GPU, I'm afraid I won't be able to isolate it for the VM.

Which logs would be useful to collect at each step of this test? Would the debug file generated through the UI be enough?
 

Chanabra

Cadet
Joined
Jun 29, 2022
Messages
1
Try removing any GPU from that isolation window. Any time I had a GPU selected and isolated in the Advanced > Isolate GPU window, it would not be available at all for passthrough.
 