Multiple Tesla M40 GPU passthrough on one

angst911

Dabbler
Joined
Sep 11, 2015
Messages
12
I have two Tesla M40s installed in my system running TrueNAS SCALE 22.02.3. If I isolate one of the cards, both of them disappear from the operating system. Any thoughts on what's going on?


Code:
root@truenas[~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



Code:
root@truenas[~]# lspci | grep M40
18:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)
c3:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)
 

angst911

Dabbler
Joined
Sep 11, 2015
Messages
12
Code:
[   23.244656] nvidia-nvlink: Nvlink Core is being initialized, major device number 239

[   23.257563] Error: Driver 'pcspkr' is already registered, aborting...
[   23.262985] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.285990] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.316464] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.323462] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[   23.355173] NVRM: No NVIDIA devices probed.
[   23.383908] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
[   23.470610] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x002a7c, prod_id: 0x1b57, dev_id: 0x20)
[   23.554773] checking generic (a4000000 300000) vs hw (a4000000 1000000)
[   23.554777] fb0: switching to astdrmfb from EFI VGA
[   23.564470] Console: switching to colour dummy device 80x25
[   23.570308] ast 0000:05:00.0: [drm] Using P2A bridge for configuration
[   23.576840] ast 0000:05:00.0: [drm] AST 2500 detected
[   23.581899] ast 0000:05:00.0: [drm] Analog VGA only
[   23.586789] ast 0000:05:00.0: [drm] dram MCLK=800 Mhz type=7 bus_width=16
[   23.593645] [TTM] Zone  kernel: Available graphics memory: 65761748 KiB
[   23.597127] ipmi_si IPI0001:00: IPMI kcs interface initialized
[   23.600258] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[   23.612616] [TTM] Initializing pool allocator
[   23.612622] [TTM] Initializing DMA pool allocator
[   23.622012] [drm] Initialized ast 0.1.0 20120228 for 0000:05:00.0 on minor 0
[   23.631899] ipmi_ssif: IPMI SSIF Interface driver
[   23.632196] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[   23.644358] fbcon: astdrmfb (fb0) is primary device
[   23.644369] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.649585] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.649597] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.649598] NVRM: No NVIDIA devices probed.
[   23.649840] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
[   23.656336] Console: switching to colour frame buffer device 128x48
[   23.677733] rndis_host 1-14.2:2.0 enxb03af2b6059f: renamed from eth0
[   23.686803] ast 0000:05:00.0: [drm] fb0: astdrmfb frame buffer device
[   23.892864] intel_rapl_common: Found RAPL domain package
[   23.898207] intel_rapl_common: Found RAPL domain dram
[   23.903284] intel_rapl_common: DRAM domain energy unit 15300pj
[   23.915486] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[   23.923439] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.934639] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.952166] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.975769] NVRM: No NVIDIA devices probed.
[   23.980953] nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
[   23.992477] snd_hda_intel 0000:00:1f.3: enabling device (0140 -> 0142)
[   24.019021] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC888-VD: line_outs=3 (0x14/0x17/0x16/0x0/0x0) type:line
[   24.031331] snd_hda_codec_realtek hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[   24.031334] snd_hda_codec_realtek hdaudioC0D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[   24.031335] snd_hda_codec_realtek hdaudioC0D0:    mono: mono_out=0x0
[   24.031337] snd_hda_codec_realtek hdaudioC0D0:    dig-out=0x1e/0x0
[   24.031338] snd_hda_codec_realtek hdaudioC0D0:    inputs:
[   24.031360] snd_hda_codec_realtek hdaudioC0D0:      Front Mic=0x19
[   24.080552] snd_hda_codec_realtek hdaudioC0D0:      Rear Mic=0x18
[   24.080553] snd_hda_codec_realtek hdaudioC0D0:      Line=0x1a
[   24.132018] input: HDA Intel PCH Front Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input4
[   24.146823] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[   24.155200] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   24.160569] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input5
[   24.166911] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   24.194000] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1f.3/sound/card0/input6
[   24.197272] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
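
The NVRM lines above say the NVIDIA probe routine was never called for either card, which usually means some other kernel driver already owns them. One way to check is to read the "Kernel driver in use" line that `lspci -k` prints under each device. Here is a minimal Python sketch of that check. The function names `drivers_in_use` and `report` are made up for illustration, and it is only an assumption that a passthrough stub such as vfio-pci would turn up on both cards; the script just reports whatever is bound.

```python
import re
import subprocess

# Start of a device record in `lspci -k` output, e.g. "18:00.0 3D controller: ..."
# (an optional 4-digit PCI domain prefix is tolerated).
DEVICE_LINE = re.compile(
    r"^(?:[0-9a-f]{4}:)?([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$", re.IGNORECASE
)
# Indented detail line that names the driver currently bound to the device.
DRIVER_LINE = re.compile(r"^\s+Kernel driver in use:\s*(\S+)")


def drivers_in_use(lspci_text, name_filter="NVIDIA"):
    """Map PCI address -> bound kernel driver (None if nothing is bound).

    Only devices whose description contains name_filter are included.
    """
    result = {}
    current = None  # address of the matching device whose details we are reading
    for line in lspci_text.splitlines():
        dev = DEVICE_LINE.match(line)
        if dev:
            addr, desc = dev.groups()
            current = addr if name_filter.lower() in desc.lower() else None
            if current is not None:
                result[current] = None
            continue
        drv = DRIVER_LINE.match(line)
        if drv and current is not None:
            result[current] = drv.group(1)
    return result


def report():
    """Run `lspci -k` on this host and print the driver bound to each NVIDIA device."""
    out = subprocess.run(
        ["lspci", "-k"], capture_output=True, text=True, check=True
    ).stdout
    for addr, driver in drivers_in_use(out).items():
        print(f"{addr}: {driver or 'no driver bound'}")
```

Calling `report()` from a root shell on the host, once with no card isolated and once with one card isolated, should show whether the second card ends up bound to something other than `nvidia`.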
 

eroji

Contributor
Joined
Feb 2, 2015
Messages
140
Were you able to get this resolved? I'm having the same problem with 2 Tesla P4s.
 

mgoulet65

Explorer
Joined
Jun 15, 2021
Messages
95
What we heard from iX is that this should be addressed in the next release, in mid-December.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
What was the outcome of this? I can't open the Jira.
 

mgoulet65

Explorer
Joined
Jun 15, 2021
Messages
95

iolaus

Cadet
Joined
Aug 21, 2023
Messages
1
@HoneyBadger Is there any new info on this issue, or documentation about exactly what the issue is? I'm running into the same problem with a pair of Nvidia Quadro P2000s. I'd like to isolate one for a VM and use the other for an app, but as soon as I isolate one, they both disappear from the host OS.
 