mgoulet65
"See above. :) Report that you have the same issue in the ticket and attach the debug file."
Done.
"Mine is not recognized by Plex"
Regardless of the failed nvidia-drm module, the Tesla P4 card is detected properly in the @truecharts Plex app.
View attachment 60487
Can someone provide some guidance on how to test that the card is used in Plex for transcoding? I believe the easiest way is to check the dashboard.
On the dashboard, with transcoding disabled/forced, I can see the (hw) indicator:
View attachment 60489 View attachment 60490
Transcoder settings:
View attachment 60488
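Besides the dashboard's (hw) indicator, the same check can be done from a TrueNAS shell with nvidia-smi. A minimal sketch, assuming the NVIDIA driver is loaded and a stream is being transcoded (the watch utility from procps is assumed to be present):
Code:
# Refresh nvidia-smi every 2 seconds; a hardware transcode shows up as a
# "Plex Transcoder" process holding a few hundred MiB of GPU memory.
watch -n 2 nvidia-smi

# Or list only the processes currently using the GPU:
nvidia-smi --query-compute-apps=pid,process_name,used_gpu_memory --format=csv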
"Mine is not recognized by Plex"
Make sure you have nothing set in Isolated GPU Devices.
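For what it's worth, the isolation setting can also be checked from the shell. A rough sketch; the midclt method is the standard SCALE middleware client, but the isolated_gpu_pci_ids field name is an assumption, so verify it against your release:
Code:
# Dump the advanced system settings and pull out the isolated GPU PCI IDs
# (field name assumed; an empty list means no GPU is isolated).
midclt call system.advanced.config | jq '.isolated_gpu_pci_ids'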
Still having the same issue.
"Still having the same issue."
Is the card detected in the OS?
"Is the card detected in the OS?"
Code:
# lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
Code:
# lspci | grep -i nvidia
04:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
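One more detail worth checking when lspci does see the card: which kernel driver actually owns it. A minimal sketch:
Code:
# -k adds the "Kernel driver in use:" line; for Plex the NVIDIA card should
# show "nvidia", not "nouveau" or "vfio-pci".
lspci -nnk | grep -iA3 nvidia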
Unable to determine the device handle for GPU 0000:81:00.0: Unknown Error
Code:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:81:00.0 Off |                  Off |
| N/A   56C    P0    26W /  75W |    239MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    172057      C   ...diaserver/Plex Transcoder      237MiB |
+-----------------------------------------------------------------------------+
"However, if the driver is properly loaded and nvidia-smi shows the GPU is seen then transcode should work properly."
Thank you for that command, I did not know about it. Here's mine, using 239MiB / 8121MiB and slightly hotter at 62C; yours shows 8192MiB:
Code:
# nvidia-smi
Wed Nov 30 18:47:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:03:00.0 Off |                  Off |
| N/A   62C    P0    26W /  75W |    239MiB /  8121MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    211532      C   ...diaserver/Plex Transcoder      237MiB |
+-----------------------------------------------------------------------------+

"Thank you for that command, I did not know about it. Here's mine, using 239MiB / 8121MiB and slightly hotter at 62C; yours shows 8192MiB:"
Mine shows nothing!
Do you have the card installed in a rack server? Do these Tesla P4 cards have upgradable firmware?
Edit: While transcoding, I just saw a crash on screen (Angelfish 22.02.04). I'm rebooting to see if I can catch it again:
Code:
# grep -i '18:57:07' -A4 /var/log/messages
Nov 30 18:57:07 uranus kernel: pcieport 0000:00:02.0: AER: Uncorrected (Fatal) error received: 0000:00:02.0
Nov 30 18:57:07 uranus kernel: nvidia 0000:03:00.0: AER: can't recover (no error_detected callback)
Nov 30 18:57:07 uranus kernel: NVRM: GPU at PCI:0000:03:00: GPU-ff5239ea-ba5d-4952-a55f-6ad0fc8f56bb
Nov 30 18:57:07 uranus kernel: NVRM: Xid (PCI:0000:03:00): 79, pid=0, GPU has fallen off the bus.
Nov 30 18:57:07 uranus kernel: NVRM: GPU 0000:03:00.0: GPU has fallen off the bus.
Nov 30 18:57:07 uranus kernel: NVRM: GPU 0000:03:00.0: GPU serial number is {REMOVED}.
Nov 30 18:57:07 uranus kernel: NVRM: A GPU crash dump has been created. If possible, please run
                               NVRM: nvidia-bug-report.sh as root to collect this data before
                               NVRM: the NVIDIA kernel module is unloaded.
Nov 30 18:57:08 uranus kernel: pcieport 0000:00:02.0: AER: Root Port link has been reset (0)
Nov 30 18:57:08 uranus kernel: pcieport 0000:00:02.0: AER: device recovery failed
# nvidia-smi
Unable to determine the device handle for GPU 0000:03:00.0: Unknown Error
Code:
# nvidia-smi
No devices were found
Code:
Nov 30 19:24:21 uranus kernel: pcieport 0000:00:02.0: AER: Uncorrected (Fatal) error received: 0000:00:02.0
Nov 30 19:24:21 uranus kernel: nvidia 0000:03:00.0: AER: can't recover (no error_detected callback)
Nov 30 19:24:21 uranus kernel: NVRM: GPU at PCI:0000:03:00: GPU-ff5239ea-ba5d-4952-a55f-6ad0fc8f56bb
Nov 30 19:24:21 uranus kernel: NVRM: Xid (PCI:0000:03:00): 79, pid=0, GPU has fallen off the bus.
Nov 30 19:24:21 uranus kernel: NVRM: GPU 0000:03:00.0: GPU has fallen off the bus.
Nov 30 19:24:21 uranus kernel: NVRM: GPU 0000:03:00.0: GPU serial number is [removed].
Nov 30 19:24:21 uranus kernel: NVRM: A GPU crash dump has been created. If possible, please run
                               NVRM: nvidia-bug-report.sh as root to collect this data before
                               NVRM: the NVIDIA kernel module is unloaded.
Nov 30 19:24:22 uranus kernel: pcieport 0000:00:02.0: AER: Root Port link has been reset (0)
Nov 30 19:24:22 uranus kernel: pcieport 0000:00:02.0: AER: device recovery failed
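Since the same Xid 79 ("GPU has fallen off the bus") keeps coming back, it may help to pull every Xid event out of the log and to grab the crash dump NVIDIA asks for. A sketch, assuming nvidia-bug-report.sh was installed alongside the driver on your system:
Code:
# List every NVRM Xid event recorded so far (Xid 79 = GPU has fallen off the bus)
grep -i 'NVRM: Xid' /var/log/messages

# Collect the crash dump the kernel message asks for, before the module is unloaded
nvidia-bug-report.sh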
"I am not experiencing the same issue so far."
Good to know. Can you please force transcoding for a while to see if the issue surfaces? In my case, after one reboot with forced transcoding, I experienced the issue again 15 minutes later while playing from an Apple TV. I'll force transcoding again to see if it happens. In the Apple TV Plex app settings, I set Home Streaming to 10 Mbps, 1080p.
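While a forced transcode is running, per-second utilization can be watched to confirm the card is actually doing the work. A minimal sketch:
Code:
# One line per second: streaming-multiprocessor, memory, encoder and decoder
# utilization (sm/mem/enc/dec columns); enc/dec should be non-zero during a transcode.
nvidia-smi dmon -s u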
NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
Code:
Dec 3 13:03:36 nas-02 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Dec 3 13:03:36 nas-02 kernel: NVRM: The NVIDIA probe routine was not called for 2 device(s).
Dec 3 13:03:36 nas-02 kernel: NVRM: This can occur when a driver such as:
                              NVRM: nouveau, rivafb, nvidiafb or rivatv
                              NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Dec 3 13:03:36 nas-02 kernel: NVRM: Try unloading the conflicting kernel module (and/or
                              NVRM: reconfigure your kernel without the conflicting
Code:
root@nas-02:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Code:
root@nas-02:~# lshw -C display
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       logical name: /dev/fb0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list fb
       configuration: depth=32 driver=vfio-pci latency=0 mode=1280x1024 visual=truecolor xres=1280 yres=1024
       resources: iomemory:3800-37ff iomemory:3800-37ff irq:11 memory:f6000000-f6ffffff memory:38060000000-3806fffffff memory:38070000000-38071ffffff
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:81:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: driver=vfio-pci latency=0
       resources: iomemory:2000-1fff iomemory:2000-1fff irq:11 memory:f0000000-f0ffffff memory:20000000000-2000fffffff memory:20010000000-20011ffffff
  *-display
       description: VGA compatible controller
       product: ASPEED Graphics Family
       vendor: ASPEED Technology, Inc.
       physical id: 0
       bus info: pci@0000:c4:00.0
       version: 41
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller cap_list
       configuration: driver=ast latency=0
       resources: irq:327 memory:b6000000-b6ffffff memory:b7000000-b701ffff ioport:e000(size=128)
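In that lshw output both Tesla P4s show driver=vfio-pci, which is typically what you see when a card is listed under Isolated GPU Devices: SCALE reserves it for VM passthrough, the nvidia driver never binds, and nvidia-smi can't see it. A quick per-device check, using the bus IDs from the output above:
Code:
# Show which driver owns each Tesla P4; "vfio-pci" means the card is held for
# passthrough and unavailable to apps, "nvidia" means the host driver has it.
readlink /sys/bus/pci/devices/0000:01:00.0/driver
readlink /sys/bus/pci/devices/0000:81:00.0/driver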
After the Bluefin 22.12.0 upgrade, all transcoding issues are fixed, no more crashes. I've been force transcoding for over 5 hours without issues.
Code:
# nvidia-smi
Tue Dec 13 17:23:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:03:00.0 Off |                  Off |
| N/A   93C    P0    27W /  75W |    275MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    155778      C   ...diaserver/Plex Transcoder      273MiB |
+-----------------------------------------------------------------------------+

Is it expected to run this hot, at 93 degrees, while transcoding for a while? When not in use, the temperature drops to 41 degrees:
Code:
# nvidia-smi
Tue Dec 13 23:01:33 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:03:00.0 Off |                  Off |
| N/A   41C    P8     6W /  75W |      2MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

"After the Bluefin 22.12.0 upgrade, all transcoding issues are fixed, no more crashes. I've been force transcoding for over 5 hours without issues."
My Quadro P400 runs at 52C while transcoding 4K 10-bit to 1080p and 34C at idle, so your temps are sky high.
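To see whether 93C is a sustained plateau or just a brief spike, temperature and power draw can be logged while a transcode runs. A minimal sketch:
Code:
# Print temperature, power draw and utilization every 5 seconds (Ctrl-C to stop)
nvidia-smi --query-gpu=timestamp,temperature.gpu,power.draw,utilization.gpu --format=csv -l 5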
systemd-modules-load[2523]: Failed to find module 'nvidia-drm'
"Isn't the Tesla P4 relying on case airflow for cooling, without a fan of its own?"
I had to add a fan that pulls air across the card; now I run at 38 degrees at idle. I don't think these cards have a fan; they rely purely on server ventilation. That nvidia-drm message is unrelated; I get it too and everything works properly.