Nvidia GPU not appearing for use with SCALE

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Can you, or are there plans to, allow some kind of driver version selection by the user? Sure, it would require a reboot to change the driver, but that is better than forcing a user to choose between major TN release versions just for a different Nvidia driver.

I would recommend three versions to choose from: ~340 or 390 for ancient legacy cards, ~470 for middle-aged cards, and the latest driver version (currently the 525 branch) to support the newest cards and CUDA features. That should cover pretty much everyone, and major TN software releases could focus only on updating whatever the latest stable version is.

The current driver for Bluefin is 515.65.01. It was the latest stable release before RC1 of Bluefin. We'll update again for Cobia.

The supported Nvidia GPUs are listed in the "Supported Products" tab. It's more extensive than I thought and includes most GPUs less than 10 years old.

If there are exceptions, please highlight them.
 

FindingFilene

Dabbler
Joined
Nov 25, 2020
Messages
20
Nothing like being called out of the woodwork due to generational gaps. This downgrade goes much, much further back than just Kepler. I know some Video Cards are $20, though, and that includes the good news about VMs. It doesn't bode well for 10+ year old technology that still works--and still works better than a similarly-priced replacement in this historic moment. (I'm looking at you, ASUS :3 <3 )
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
Simple fix: just get the newer version of that card, which is the P2000. It will work out of the box with no problems. There is a way to downgrade the drivers to legacy versions, but it is unsupported, and they will not help you with any issues that arise from your modification. You will run into a kernel issue if you downgrade too far. Past a certain point there are known kernel and driver issues, which you're probably running into. Currently there is no fix; supporting that card would require both a kernel downgrade and a driver downgrade. Your best bet is to pass it through to a VM and install your own OS, driver, and kernel version.
 

fatalskeptic

Dabbler
Joined
Feb 28, 2023
Messages
13
Hi, I've been banging my head against the wall on what seems to be a similar (but not the same) issue. I noticed that I no longer have the GPU listed under GPU Configuration in any of the "Apps". I am very certain that this was not the case a few days ago. I ran some of the basic commands and here's the output (as far as I can tell, the system recognizes the GPU). Also, it is not isolated: the section for Isolated GPU is blank, and when I go to add my GPU as an isolated GPU, the system shows the GPU in the drop-down but will not let me select it.

TrueNAS-SCALE-22.12.1

1679996018215.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
@fatalskeptic Which NVIDIA card are you using? Your nvidia-smi output is too narrow to show it correctly; you might need to use lspci instead.

Edit: It appears you have a GTX 1650 Super, which should be covered by the 515 driver. I wonder if there's a mismatch in the middleware as we saw previously in this thread. Can you run lspci -v and check that your 1650S is being claimed properly?
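
For anyone following along, here is a minimal sketch of that check. It assumes the lspci -v text is piped in on stdin, and the function name gpu_claims is made up for illustration. It prints each NVIDIA device address together with its "Kernel driver in use" value, so a GPU function claimed by something other than nvidia stands out.

```shell
# gpu_claims: hypothetical helper. Reads `lspci -v` text on stdin and prints
# "<pci-address> <driver>" for every NVIDIA device block.
gpu_claims() {
    awk '
        # A device block starts with a PCI address such as "0c:00.0 VGA ...".
        /^[0-9a-f:]+\.[0-7] / { addr = ""; if ($0 ~ /NVIDIA/) addr = $1 }
        # Inside an NVIDIA block, report the driver that claimed the device.
        addr != "" && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print addr, $0
        }
    '
}
```

On the box itself this would be used as `lspci -v | gpu_claims`.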
 

fatalskeptic

Dabbler
Joined
Feb 28, 2023
Messages
13
HoneyBadger said:
@fatalskeptic Which NVIDIA card are you using? Your nvidia-smi output is too narrow to show it correctly; you might need to use lspci instead.

Edit: It appears you have a GTX 1650 Super, which should be covered in the 515 driver. I wonder if there's a mismatch in the middleware as we saw previously in this thread. Can you lspci -v and look to see that your 1650S is being claimed properly?

Here are the Nvidia-related entries from that command:

0c:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company TU116 [GeForce GTX 1650 SUPER]
Flags: bus master, fast devsel, latency 0, IRQ 109, IOMMU group 18
Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000
Expansion ROM at f7000000 [virtual] [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_current_drm, nvidia_current

0c:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
Subsystem: Hewlett-Packard Company TU116 High Definition Audio Controller
Flags: bus master, fast devsel, latency 0, IRQ 98, IOMMU group 18
Memory at f7080000 (32-bit, non-prefetchable) [size=16K]

Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

0c:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
Subsystem: Hewlett-Packard Company TU116 USB 3.1 Host Controller
Flags: fast devsel, IRQ 50, IOMMU group 18
Memory at f2000000 (64-bit, prefetchable) [size=256K]

Memory at f2040000 (64-bit, prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
 

airberg

Cadet
Joined
Aug 17, 2023
Messages
9
Any ideas, anyone?
I wish the TrueNAS team could speak to why GPU passthrough is not working in Bluefin. Makes me wish I had never upgraded from Core.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
airberg said:
I wish the TrueNAS team could speak of why GPU passthrough is not working in Bluefin. Makes me wish I never upgraded from Core

I assume you have only a single GPU?
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
Let me try to help you guys out. Here is my current system:
Code:
Last login: Sun Aug 13 14:09:08 PDT 2023 on pts/3
root@truenas:~# nvidia-smi
Sun Aug 20 19:17:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:0B:00.0 Off |                  N/A |
| 51%   42C    P8     5W /  75W |      2MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@truenas:~# lsmod | grep nvidia
nvidia_uvm           1302528  2
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  44 nvidia_uvm,nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   647168  4 drm_kms_helper,nvidia,nvidia_drm
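
A small sketch for comparing your own system against the lsmod listing above. The helper name missing_nvidia_modules is hypothetical, and the list of four modules is simply taken from this output, not from any official requirement. It reads lsmod text on stdin and prints any of those modules that are absent.

```shell
# missing_nvidia_modules: hypothetical helper. Reads `lsmod` output on stdin and
# prints each module from the listing above that is NOT loaded.
missing_nvidia_modules() {
    loaded=$(awk '{ print $1 }')   # first column of lsmod is the module name
    for mod in nvidia nvidia_modeset nvidia_drm nvidia_uvm; do
        # -x matches the whole line, so "nvidia" does not match "nvidia_drm"
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}
```

Usage would be `lsmod | missing_nvidia_modules`; no output means all four are loaded.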


I am not using the in-app setting to assign my GPU to apps, because of the limitations Kubernetes imposes.

What I have done instead is add environment variables for Plex, and for any other app that should use the GPU for hardware transcoding or acceleration. With the way SCALE currently implements it, if you allocate the GPU to a container app under GPU Configuration, it takes the GPU away from every other app on the system and can crash other apps. If more than one app has it enabled, the second will not start.

1692584576628.png


So for now, just disable the GPU config and set the environment variables for Plex as seen above, or, if you're really good, find the Nvidia docs that talk about environment variables. Below is a pic showing that it is disabled:
1692584820121.png
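
The screenshots are not reproduced as text here, so the exact values used are unknown. As an assumption only: the variables the NVIDIA Container Toolkit documents for this purpose are NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, and a pair for transcoding might look like the sketch below. Treat both values as placeholders to check against the Nvidia docs, not as what the screenshot shows.

```shell
# Assumed values, NOT copied from the screenshot.
# Which GPUs the container may see ("all", or specific indexes/UUIDs).
NVIDIA_VISIBLE_DEVICES=all
# Driver features exposed to the container; "video" is the capability
# for the Video Codec SDK (NVENC/NVDEC).
NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
```

In the app form these would presumably be entered as two name/value pairs in the environment variables section.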


It does work, by the way. Look at the next code bits:
Code:
root@truenas:~# nvidia-smi
Sun Aug 20 19:29:41 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:0B:00.0 Off |                  N/A |
| 58%   59C    P0    41W /  75W |   1005MiB /  5120MiB |     33%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2183540      C   ...diaserver/Plex Transcoder      501MiB |
|    0   N/A  N/A   2183608      C   ...diaserver/Plex Transcoder      501MiB |
+-----------------------------------------------------------------------------+
root@truenas:~# 


1692585059635.png

If you are running SCALE 22.12.x, update to 22.12.3.3 to get the latest Nvidia driver. Also, handling of blacklisted GPUs is done within the SCALE UI, so there is no need to modify files, which could break something. The reason for the issues is the newer kernel versions. Back in an earlier version of the Linux kernel, running too new an Nvidia driver would break hardware acceleration. It is the combination of kernel and driver versions that was updated to add hardware acceleration back into SCALE. Adding legacy drivers will not work with the newer kernel, hence: upgrade your GPU. Look at this website

For your next GPU upgrade, stick with Pascal and newer GPUs. A GTX 1060 should work, but only with 5 concurrent sessions for encode and only 1 for decode. The P2000 is unlimited, and the card only costs $200 and works flawlessly for me.

It really depends on your use case for the server. Let me know if this helps.
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
morganL said:
I assume you have only a single GPU?

You can only allot one GPU to one VM.
 

airberg

Cadet
Joined
Aug 17, 2023
Messages
9
morganL said:
I assume you have only a single GPU?

I understand the post is about passing to VMs but, like many others in this thread, it won't show up for apps even though nothing else is using it. And yes, I only have 1 GPU.
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
airberg said:
I understand the post is about passing to VMs but, like many others in this thread, it wont show up for apps even though nothing else is using it. And yes, I only have 1 gpu

Have you blacklisted the GPU? If you have, it will not show up for apps; if you have not, it will not show up for VMs. Most likely it is a config issue.

Have you checked your Isolated GPU devices under system advanced settings?
 

Sasquatch

Explorer
Joined
Nov 11, 2017
Messages
87
Saberwolf said:
Have you blacklisted the GPU? If you have, it will not show up for apps; if you have not, it will not show up for VMs. Most likely it is a config issue.

Have you checked your Isolated GPU devices under system advanced settings?

After changing GPU isolation settings you have to manually reset the middleware, at least in 12.2.

Run
midclt call boot.update_initramfs
and reboot.

Source:
 

airberg

Cadet
Joined
Aug 17, 2023
Messages
9
Saberwolf said:
Have you blacklisted the GPU? If you have, it will not show up for apps; if you have not, it will not show up for VMs. Most likely it is a config issue.

Have you checked your Isolated GPU devices under system advanced settings?

The GPU is not isolated (blacklisted) and is still not showing up for apps. GPU drivers are up to date. Everything worked in Core. On a new install of SCALE, the GPU shows but will not show in apps to be selected.
 

Sasquatch

Explorer
Joined
Nov 11, 2017
Messages
87
airberg said:
Gpu is not isolated (blacklisted) and still not showing up for apps. Gpu drivers up to date. Everything worked in core. On a new install to Scale, the gpu shows but will not show in apps to be selected.

I had the exact same problem.
Run
Code:
midclt call boot.update_initramfs

and reboot, worked for me
 

airberg

Cadet
Joined
Aug 17, 2023
Messages
9
Sasquatch said:
I had exact same problem.
Run
Code:
midclt call boot.update_initramfs

and reboot, worked for me
Thanks for the tip... It said "false". Did it say that for you?
 

airberg

Cadet
Joined
Aug 17, 2023
Messages
9
Sasquatch said:
I had exact same problem.
Run
Code:
midclt call boot.update_initramfs

and reboot, worked for me
Still stuck on the same issue :( Thanks for trying.
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
airberg said:
Gpu is not isolated (blacklisted) and still not showing up for apps. Gpu drivers up to date. Everything worked in core. On a new install to Scale, the gpu shows but will not show in apps to be selected.

Are you running the latest release of TrueNAS SCALE? The current version is 22.12.3.3, unless you upgrade to the beta. I would suggest upgrading to the latest Bluefin release and trying again. I know that in 12.3 they applied new GPU drivers.
 

Saberwolf

Explorer
Joined
Feb 7, 2021
Messages
63
Do me a favor: open the shell and run this:

cli
system device gpu_pci_ids_choices

What is your output? Mine looks like this:
Code:
root@truenas:~# cli
[truenas]> system device gpu_pci_ids_choices
+-------------------------------------------+--------------+
| NVIDIA Corporation GP106GL [Quadro P2000] | 0000:0b:00.0 |
+-------------------------------------------+--------------+
[truenas]>
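
To make that output easier to compare between systems, here is a rough sketch that pulls the PCI address and name out of the table. The helper name gpu_choices is made up, and it assumes the two-column layout shown above.

```shell
# gpu_choices: hypothetical helper. Reads the CLI table on stdin and prints
# "<pci-address> <gpu-name>" for each data row.
gpu_choices() {
    awk -F'|' '
        # Data rows start with "|"; border rows start with "+" and are skipped.
        /^\|/ {
            name = $2; addr = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim column padding
            gsub(/^[ \t]+|[ \t]+$/, "", addr)
            if (addr != "") print addr, name
        }
    '
}
```

Paste or pipe the table text into gpu_choices; the address it prints can then be matched against the lspci output posted earlier in the thread.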
 