Can multiple apps utilize the same GPU?

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
I was curious if multiple apps can utilize the same Nvidia GPU? I currently have Plex installed utilizing my Nvidia GTX 1650. I was trying to add Tdarr from TrueCharts so that I could transcode my entire media library. When I allocated the GPU to it as well, the app would never deploy. So I changed the Tdarr app settings so it is not allocating the GPU, and the app deployed instantly. I have little to no experience with k8s, but from my understanding, plain nvidia-docker2 is able to allocate the same GPU to multiple containers. Is this not possible on TrueNAS SCALE since it is using Kubernetes?
 

stavros-k

Patron
Joined
Dec 26, 2020
Messages
231
Currently there is no support for using the same GPU on multiple apps (it's an upstream issue, not iX's).
Also note that Tdarr is quite tricky to set up (folders and networking related); I have it on my to-do list to fix it up a bit in the near future :P
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
Currently there is no support for using the same GPU on multiple apps (it's an upstream issue, not iX's).
Also note that Tdarr is quite tricky to set up (folders and networking related); I have it on my to-do list to fix it up a bit in the near future :P
Thanks for the response! From what I have read/researched, it seems to be an issue with Kubernetes. I might detach my GTX 1650 from my Plex app for a while so I can use it for Tdarr. I want to convert my entire 5TB media library to H.265 to help save space.

When I was setting up the Tdarr app I had issues using host paths for its file storage. I have a mirrored 1TB SSD pool that I use for my ix-applications dataset, as well as individual datasets for all of my apps' configurations. When I tried using host paths for the server, config, logs, and temp mounts, I kept getting permissions and copy errors in the logs (even with the apps account, UID 568, given permissions to the respective datasets). Additionally, Tdarr would not pull community plugins. I set the server and config mounts to PVC and it worked fine. This way I only have the temp/transcode mount and my media mount set to host paths. I haven't actually tried to transcode anything from my Tdarr app on TrueNAS yet, though.
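(In case it helps someone else hitting the same host-path permission errors: the standard first step is handing the datasets to the SCALE apps user, UID/GID 568. It didn't fully solve it for me, but it's the first thing to try. The paths below are made up; point them at your own datasets.)
Code:
# Illustrative paths only; substitute your actual app datasets.
chown -R 568:568 /mnt/ssd-pool/apps/tdarr-server /mnt/ssd-pool/apps/tdarr-config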

I'm getting a head start on transcoding right now utilizing my laptop with a GTX 1060, but my 1Gbps network makes file copy and move operations terribly slow. My GTX 1060 seems like it can transcode the files faster than my network can move them back and forth to the server. It would be much nicer to run Tdarr directly on the TrueNAS server, so I will have to do some more tinkering. I am very pleased with the work TrueCharts is doing for the TrueNAS app community. I have often contemplated switching to Unraid for the better app support, but I hate the way their RAID/filesystem works. I have been bitten by the ZFS bug and can't bring myself to leave. Also, the TrueNAS team has put in tons of great work with SCALE and I'm excited to see where it goes. I've been running it since the first beta.
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
I was curious if multiple apps can utilize the same Nvidia GPU? I currently have Plex installed utilizing my Nvidia GTX 1650. I was trying to add Tdarr from TrueCharts so that I could transcode my entire media library. When I allocated the GPU to it as well, the app would never deploy. So I changed the Tdarr app settings so it is not allocating the GPU, and the app deployed instantly. I have little to no experience with k8s, but from my understanding, plain nvidia-docker2 is able to allocate the same GPU to multiple containers. Is this not possible on TrueNAS SCALE since it is using Kubernetes?
Apps can share the same GPU. I've been doing it forever. When you're setting up the deployment, do not allocate the GPU to it, as that will prevent any other pod from using it.

Set your env variables like this on each deployment to utilize the GPU:
[Attached screenshot: the NVIDIA environment variables set on the deployment]
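For anyone who can't see the screenshot: the variables are the standard NVIDIA container-toolkit pair. Only NVIDIA_VISIBLE_DEVICES is spelled out later in this thread; NVIDIA_DRIVER_CAPABILITIES is an assumption about what the screenshot showed, and the UUID below is made up.
Code:
NVIDIA_VISIBLE_DEVICES=GPU-9fa6ee26-b3b4-41b6-9bd4-0f9a2a7c9d11   # or simply: all
NVIDIA_DRIVER_CAPABILITIES=all   # assumption; could also be e.g. compute,video,utility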
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
Apps can share the same GPU. I've been doing it forever. When you're setting up the deployment, do not allocate the GPU to it, as that will prevent any other pod from using it.

Set your env variables like this on each deployment to utilize the GPU:
[Attached screenshot: the NVIDIA environment variables set on the deployment]
For the NVIDIA_VISIBLE_DEVICES value, you need to insert your own device ID or simply insert `all` to utilize any GPU.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
super interesting, did you manage to find a decent way to apply the keylase patch?
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
How do I get the value for the GPU to plug in there?
Sorry, I should have mentioned that. You can run "nvidia-smi -L" to get the ID.
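It prints one line per GPU, something like this (the UUID here is made up):
Code:
root@tnas[~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1650 (UUID: GPU-9fa6ee26-b3b4-41b6-9bd4-0f9a2a7c9d11)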
 


truecharts

Guru
Joined
Aug 19, 2021
Messages
788
For the NVIDIA_VISIBLE_DEVICES value, you need to insert your own device ID or simply insert `all` to utilize any GPU.

This is not likely to do anything of value, as this is literally set by default on all GPU-consuming containers.
It also does not give access to the GPU device on a permissions level.

While there definitely are a few sneaky ways to pass the same GPU through to multiple containers, that definitely requires the use of hostPath mounts and/or device plugins. With these env vars alone, no one is going to get GPU access.

If this were working as described, all our apps would get magical access to all NVIDIA GPUs even without them being selected.
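One way to check what is actually happening rather than guessing (a sketch; the namespace and pod names below are placeholders, substitute your own):
Code:
# Placeholders: use your own ix-<app> namespace and pod name.
k3s kubectl exec -n ix-tdarr tdarr-0 -- printenv NVIDIA_VISIBLE_DEVICES
k3s kubectl exec -n ix-tdarr tdarr-0 -- ls -l /dev/nvidia0 /dev/nvidiactl
# The env var being set does not by itself prove the device nodes were mounted.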
 

ChickenSalad

Cadet
Joined
Jul 10, 2022
Messages
8
This is not likely to do anything of value, as this is literally set by default on all GPU-consuming containers.
It also does not give access to the GPU device on a permissions level.

While there definitely are a few sneaky ways to pass the same GPU through to multiple containers, that definitely requires the use of hostPath mounts and/or device plugins. With these env vars alone, no one is going to get GPU access.

If this were working as described, all our apps would get magical access to all NVIDIA GPUs even without them being selected.
This actually seems to work for me, at least for a custom Docker image. I have a TensorFlow container that I used to have "Allocate 1 nvidia.com/gpu GPU" set for. I switched it to allocate 0, added the ENV variables, and it's definitely picking up the GPU.
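(For anyone who wants to reproduce the check: this one-liner, run inside the container, is roughly what I mean by "picking up the GPU".)
Code:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# With the GPU visible you get something like:
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]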
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
This actually seems to work for me, at least for a custom Docker image. I have a TensorFlow container that I used to have "Allocate 1 nvidia.com/gpu GPU" set for. I switched it to allocate 0, added the ENV variables, and it's definitely picking up the GPU.

It's important to note that the "Launch Docker" button runs everything as root by default, so a lot of "normal" security precautions are thrown out of the window. ;-)
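Easy to verify for any given app (a sketch; the namespace and pod names are placeholders):
Code:
# Placeholders: use your own ix-<app> namespace and pod name.
k3s kubectl exec -n ix-myapp myapp-0 -- id
# A "Launch Docker" workload will typically report: uid=0(root) gid=0(root)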
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I used to allocate my Nvidia GPU to my TrueCharts Plex container. I removed that config and added these variables to three TrueCharts apps (Tdarr, Tdarr Node, Plex), and now all three can access my GPU. Plex uses it for HW-assisted transcoding, and Tdarr uses it for GPU-assisted full-file health checks.

Code:
root@tnas[~]# nvidia-smi
Mon Aug 29 13:23:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T600         Off  | 00000000:65:00.0 Off |                  N/A |
| 55%   71C    P0    N/A /  41W |   2041MiB /  3909MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    104034      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    113339      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    116827      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    118993      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    126108      C   tdarr-ffmpeg                     1105MiB |
|    0   N/A  N/A    127887      C   tdarr-ffmpeg                      273MiB |
|    0   N/A  N/A    128250      C   tdarr-ffmpeg                      271MiB |
+-----------------------------------------------------------------------------+
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
I used to allocate my Nvidia GPU to my TrueCharts Plex container. I removed that config and added these variables to three TrueCharts apps (Tdarr, Tdarr Node, Plex), and now all three can access my GPU. Plex uses it for HW-assisted transcoding, and Tdarr uses it for GPU-assisted full-file health checks.

Code:
root@tnas[~]# nvidia-smi
Mon Aug 29 13:23:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T600         Off  | 00000000:65:00.0 Off |                  N/A |
| 55%   71C    P0    N/A /  41W |   2041MiB /  3909MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    104034      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    113339      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    116827      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    118993      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    126108      C   tdarr-ffmpeg                     1105MiB |
|    0   N/A  N/A    127887      C   tdarr-ffmpeg                      273MiB |
|    0   N/A  N/A    128250      C   tdarr-ffmpeg                      271MiB |
+-----------------------------------------------------------------------------+
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.
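(One way to check the runtime question instead of guessing; as far as I know, this is the path where k3s writes its generated containerd config on SCALE:)
Code:
# Look for an nvidia runtime entry in k3s's generated containerd config:
grep -i nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml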
 

Ppriorfl

Dabbler
Joined
May 22, 2021
Messages
46
Sorry, I should have mentioned that. You can run "nvidia-smi -L" to get the ID.

Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.
There have been a lot of folks looking for this functionality; I was just adding my voice and a screen scrape showing that this is indeed real. This shared approach, combined with the keylase driver patch, has significantly increased the value of adding the T600 card.

It's good to hear that the NVIDIA_VISIBLE_DEVICES variable is not required; I will remove it from my config as well... simpler is better!
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
The latest update came with the 470 driver, which should support your 3080. Are you on version 22.02.3? Does it show up when you run "lspci | grep VGA"?
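If the card is detected at all, it should show up along these lines (the bus address and revision here are illustrative):
Code:
root@tnas[~]# lspci | grep VGA
65:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)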
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
Driver 470.103.01 does indeed support the standard 3080 10GB cards.
Maybe you have a 12GB version? That variant seems to require driver version 510.39.01 at minimum. I may be wrong, but I found some info here.
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.

Let's make some guesses here... maybe the Nvidia device plugin reads the env vars?
 