NVIDIA drivers won't load

Sasquatch

Explorer
Joined
Nov 11, 2017
Messages
87
My system won't load the NVIDIA drivers.
I'm on TrueNAS-SCALE-22.12.0 and switched trains CORE->ANGELFISH->BLUEFIN, if that matters.

Code:
lspci |grep NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
 nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


And from /var/log/errors:
Code:
Jan 12 17:34:14 truenas kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Jan 12 17:34:14 truenas kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Jan 12 17:34:41 truenas systemd-modules-load[2523]: Failed to find module 'vfio_pci ids=10DE:1CB3,10DE:0FB9'
Jan 12 17:34:41 truenas systemd-modules-load[2523]: Failed to find module 'nvidia-drm'
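To see at a glance which modules systemd-modules-load could not find, the log lines above can be filtered down to just the module names. This is a minimal sketch; `failed_modules` is a hypothetical helper name, not something shipped with TrueNAS. Note that the first name in the log contains a space and an `ids=` option, which may mean an options string ended up on a line where systemd-modules-load expects a bare module name.

```shell
# Hypothetical helper: read log lines on stdin and print each distinct
# module name that systemd-modules-load reported as "Failed to find".
failed_modules() {
    sed -n "s/.*Failed to find module '\(.*\)'.*/\1/p" | sort -u
}

# Possible usage on the affected host (current boot only):
#   journalctl -b -u systemd-modules-load.service | failed_modules
# or against the log file quoted in this post:
#   failed_modules < /var/log/errors
```

For the two lines quoted above, this would list `nvidia-drm` and `vfio_pci ids=10DE:1CB3,10DE:0FB9` as separate entries.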


Code:
 lshw -C display
  *-display
       description: VGA compatible controller
       product: GP107GL [Quadro P400]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=vfio-pci latency=0
       resources: irq:11 memory:f5000000-f5ffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:e000(size=128) memory:c0000-dffff
  *-display
       description: VGA compatible controller
       product: ASPEED Graphics Family
       vendor: ASPEED Technology, Inc.
       physical id: 0
       bus info: pci@0000:06:00.0
       version: 21
       width: 32 bits
       clock: 33MHz
       capabilities: pm vga_controller cap_list
       configuration: driver=ast latency=0
       resources: irq:18 memory:f0000000-f3ffffff memory:f4000000-f401ffff ioport:b000(size=128)

hmak604

Cadet
Joined
Nov 26, 2022
Messages
5
Same!

So currently I must isolate the GPU so that my K3S cluster starts up correctly.
 

un4

Cadet
Joined
Aug 9, 2022
Messages
4
Same issue here. I just upgraded to Bluefin (my upgrade path was also CORE->ANGELFISH->BLUEFIN) and the NVIDIA GPU drivers don't load. The NVIDIA card is also not available for selection in apps.

[Attached screenshots: 1674205777650.png, 1674205834271.png]

The GPU is available for isolation. I made sure it is NOT isolated and restarted the system (another post described a similar issue when the GPU was isolated).
 

mjflower

Dabbler
Joined
Sep 14, 2020
Messages
25
I have never had mine isolated, but I've now discovered that it is not available in my Plex app. It is detected by nvidia-smi, but I am getting the "Failed to find module 'nvidia-drm'" error when running: systemctl status systemd-modules-load.service

I'm really pulling my hair out here. Constant issues with the official Plex app on TrueNAS SCALE Bluefin.
 

un4

Cadet
Joined
Aug 9, 2022
Messages
4
Looking back at my post, it seems I stated that the solution is to make sure the GPU is not isolated. To clarify, this did not resolve my issue.

I do not think this is an issue with Plex; it is an issue with TrueNAS itself. I have had a lot of issues with my apps since the upgrade, and this is the last one left to resolve.
 

mjflower

Dabbler
Joined
Sep 14, 2020
Messages
25
Ah, I thought you said you were like me and had never had it isolated. I'll keep looking.
 

hmak604

Cadet
Joined
Nov 26, 2022
Messages
5
Something peculiar: I believe it requires “nvidia driver”, but maybe the latest drivers show up as “nvidia current driver”.
 

mjflower

Dabbler
Joined
Sep 14, 2020
Messages
25
I've just logged a support ticket. Hopefully it can be worked out.
 

dh_vie

Cadet
Joined
Feb 18, 2023
Messages
1
I also have the same problem. nvidia-smi doesn't work and says the driver isn't running. When I look at lspci, I see that the driver loaded for my M2000 is vfio, which is the passthrough driver.

I have no card set in isolation. Prior to the upgrade (Angelfish to Bluefin) I did have the NVIDIA card in isolation, but I removed it post upgrade once I realized that Docker containers need the card to be not isolated (only VMs need isolation).

I also see that the NVIDIA driver is installed; it's just not loaded. Instead, vfio is loaded for the NVIDIA card.
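A quick way to confirm which driver has claimed the card is to read the "Kernel driver in use" line from `lspci -k`. The sketch below assumes the usual `lspci -k` output layout; `driver_in_use` is a hypothetical helper name, and the PCI address in the usage comment is the one from the first post, so it will differ on other systems.

```shell
# Hypothetical helper: read `lspci -k` output on stdin and print the
# driver named on the "Kernel driver in use" line. Prints nothing if
# no driver is bound to the device.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Possible usage (adjust the PCI address to your own card):
#   lspci -k -s 01:00.0 | driver_in_use
```

On a system showing the problem described here, this would be expected to print vfio-pci rather than nvidia.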
 

mjflower

Dabbler
Joined
Sep 14, 2020
Messages
25
Good luck. I lodged a ticket weeks ago with no response. I've given up on apps on TrueNAS and am just using it for storage now.
 

mgoulet65

Explorer
Joined
Jun 15, 2021
Messages
95
Same...following
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello all,

If you're specifically having issues with your NVIDIA card being incorrectly claimed by vfio-pci please check the fix posted here:


NVIDIA GPUs before the Maxwell generation are not supported in Bluefin - please check the supported card listing below for the NVIDIA driver:

 

anto294

Dabbler
Joined
Mar 27, 2023
Messages
11
LoL, I connected a monitor to the server and, after a reboot, this error showed up right after GRUB:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
With Windows and other Linux OSes (Debian-based) there are no issues...
In the BIOS I had already enabled “Above 4G Decoding” and disabled Secure Boot and CSM legacy.
Nothing helped; the GPU still wouldn't load the driver. (The GPU was present in lspci -v.)

Thanks to this nvidia forum thread, I added "pci=realloc=off" to the GRUB kernel options.
Boom: nvidia-smi detects the card, and there is no more "missing nvidia-drm" error.

So I made the fix permanent with:
sudo midclt call system.advanced.update '{"kernel_extra_options": "pci=realloc=off"}'
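
After a reboot, it may be worth checking that the option actually reached the running kernel. This is a minimal sketch, assuming the extra options end up on the kernel command line exposed in /proc/cmdline; `has_kernel_opt` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the exact option given as $1 appears
# as a whole word in the kernel command line read from stdin.
has_kernel_opt() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Possible usage after rebooting:
#   if has_kernel_opt pci=realloc=off < /proc/cmdline; then
#       echo "pci=realloc=off is active"
#   else
#       echo "pci=realloc=off is NOT on the kernel command line"
#   fi
```

Reading the stored value back through the middleware should also be possible, for example with midclt call system.advanced.config, though that is not shown in this thread.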

My HW:
AMD Ryzen 5 5600x
ASRock B550M PG Riptide
32GB ECC Kingston RAM
LSI 9211i HBA
Nvidia Quadro P400 (in a 1x PCIe slot, with supplemental power)
Intel X520 SFP+ 10G NIC
Silverstone CS381 case
4 x 4TB WD Red Plus
 