Can multiple apps utilize the same GPU?

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
I was curious if multiple apps can utilize the same Nvidia GPU? I currently have Plex installed utilizing my Nvidia GTX 1650. I was trying to add Tdarr from TrueCharts so that I could transcode my entire media library. When I allocated the GPU to it as well, the app would never deploy. So I changed the Tdarr app settings so it is not allocating the GPU, and the app deployed instantly. I have little to no experience with k8s, but from my understanding, plain nvidia-docker2 is able to allocate the same GPU to multiple containers. Is this not possible on TrueNAS SCALE since it is using Kubernetes?
 

stavros-k

Patron
Joined
Dec 26, 2020
Messages
231
Currently there is no support for using the same GPU on multiple apps (it's an upstream issue, not iX's).
Also note that Tdarr is quite tricky to set up (folders and networking related); I have it on my to-do list to fix it up a bit in the near future :P
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
Currently there is no support for using the same GPU on multiple apps (it's an upstream issue, not iX's).
Also note that Tdarr is quite tricky to set up (folders and networking related); I have it on my to-do list to fix it up a bit in the near future :P
Thanks for the response! From what I have read/researched, it seems to be an issue with Kubernetes. I might detach my GTX 1650 from my Plex app for a while so I can use it for Tdarr. I want to convert my entire 5TB media library to H.265 to help save space.

When I was setting up the Tdarr app I had issues using host paths for its file storage. I have a mirrored 1TB SSD pool that I use for my ix-applications dataset, as well as individual datasets for all of my apps' configurations. When I tried using host paths for the server, config, logs, and temp mounts, I kept getting permissions and copy errors in the logs (even with the apps account, UID 568, given permissions to the respective datasets). Additionally, Tdarr would not pull community plugins. I set the server and config mounts to PVC and it worked fine. This way I only have the temp/transcode mount and my media mount set to host paths. I haven't actually tried to transcode anything from my Tdarr app on TrueNAS yet, though.
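(In case it helps someone else hitting the same host-path permission errors: the standard first step is handing the datasets to the SCALE apps user, UID/GID 568. It didn't fully solve it for me, but it's the first thing to try. The paths below are made up; point them at your own datasets.)
Code:
# Illustrative paths only; substitute your actual app datasets.
chown -R 568:568 /mnt/ssd-pool/apps/tdarr-server /mnt/ssd-pool/apps/tdarr-config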

I'm getting a head start on transcoding right now utilizing my laptop with a GTX 1060, but my 1Gbps network makes file copy and move operations terribly slow. My GTX 1060 seems like it can transcode the files faster than my network can move them back and forth to the server. It would be much nicer to run Tdarr directly on the TrueNAS server, so I will have to do some more tinkering. I am very pleased with the work TrueCharts is doing for the TrueNAS app community. I have often contemplated switching to Unraid for the better app support, but I hate the way their RAID/filesystem works. I have been bitten by the ZFS bug and can't bring myself to leave. Also, the TrueNAS team has put in tons of great work with SCALE and I'm excited to see where it goes. I've been running it since the first beta.
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
I was curious if multiple apps can utilize the same Nvidia GPU? I currently have Plex installed utilizing my Nvidia GTX 1650. I was trying to add Tdarr from TrueCharts so that I could transcode my entire media library. When I allocated the GPU to it as well, the app would never deploy. So I changed the Tdarr app settings so it is not allocating the GPU, and the app deployed instantly. I have little to no experience with k8s, but from my understanding, plain nvidia-docker2 is able to allocate the same GPU to multiple containers. Is this not possible on TrueNAS SCALE since it is using Kubernetes?
Apps can share the same GPU. I've been doing it forever. When you're setting up the deployment, do not allocate the GPU to it, as that will prevent any other pod from using it.

Set your env variables like this on each deployment to utilize the GPU:
[Attached screenshot: the NVIDIA environment variables set on the deployment]
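For anyone who can't see the screenshot: the variables are the standard NVIDIA container-toolkit pair. Only NVIDIA_VISIBLE_DEVICES is spelled out later in this thread; NVIDIA_DRIVER_CAPABILITIES is an assumption about what the screenshot showed, and the UUID below is made up.
Code:
NVIDIA_VISIBLE_DEVICES=GPU-9fa6ee26-b3b4-41b6-9bd4-0f9a2a7c9d11   # or simply: all
NVIDIA_DRIVER_CAPABILITIES=all   # assumption; could also be e.g. compute,video,utility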
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
Apps can share the same GPU. I've been doing it forever. When you're setting up the deployment, do not allocate the GPU to it, as that will prevent any other pod from using it.

Set your env variables like this on each deployment to utilize the GPU:
[Attached screenshot: the NVIDIA environment variables set on the deployment]
For the NVIDIA_VISIBLE_DEVICES value, you need to insert your own device ID or simply insert `all` to utilize any GPU.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
super interesting, did you manage to find a decent way to apply the keylase patch?
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
How do I get the value for the GPU to plug in there?
Sorry, I should have mentioned that. You can run "nvidia-smi -L" to get the ID.
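It prints one line per GPU, something like this (the UUID here is made up):
Code:
root@tnas[~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1650 (UUID: GPU-9fa6ee26-b3b4-41b6-9bd4-0f9a2a7c9d11)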
 


truecharts

Guru
Joined
Aug 19, 2021
Messages
788
For the NVIDIA_VISIBLE_DEVICES value, you need to insert your own device ID or simply insert `all` to utilize any GPU.

This is not likely to do anything of value, as this is literally set by default on all GPU-consuming containers.
It also does not give access to the GPU device on a permissions level.

While there definitely are a few sneaky ways to pass the same GPU through to multiple containers, that definitely requires the use of hostPath mounts and/or device plugins. With these env vars alone, no one is going to get GPU access.

If this were working as described, all our apps would get magical access to all NVIDIA GPUs even without them being selected.
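One way to check what is actually happening rather than guessing (a sketch; the namespace and pod names below are placeholders, substitute your own):
Code:
# Placeholders: use your own ix-<app> namespace and pod name.
k3s kubectl exec -n ix-tdarr tdarr-0 -- printenv NVIDIA_VISIBLE_DEVICES
k3s kubectl exec -n ix-tdarr tdarr-0 -- ls -l /dev/nvidia0 /dev/nvidiactl
# The env var being set does not by itself prove the device nodes were mounted.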
 

ChickenSalad

Cadet
Joined
Jul 10, 2022
Messages
8
This is not likely to do anything of value, as this is literally set by default on all GPU-consuming containers.
It also does not give access to the GPU device on a permissions level.

While there definitely are a few sneaky ways to pass the same GPU through to multiple containers, that definitely requires the use of hostPath mounts and/or device plugins. With these env vars alone, no one is going to get GPU access.

If this were working as described, all our apps would get magical access to all NVIDIA GPUs even without them being selected.
This actually seems to work for me, at least for a custom Docker image. I have a TensorFlow container that I used to have "Allocate 1 nvidia.com/gpu GPU" set for. I switched it to allocate 0, added the ENV variables, and it's definitely picking up the GPU.
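(For anyone who wants to reproduce the check: this one-liner, run inside the container, is roughly what I mean by "picking up the GPU".)
Code:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# With the GPU visible you get something like:
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]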
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
This actually seems to work for me, at least for a custom Docker image. I have a TensorFlow container that I used to have "Allocate 1 nvidia.com/gpu GPU" set for. I switched it to allocate 0, added the ENV variables, and it's definitely picking up the GPU.

It's important to note that the "Launch Docker" button runs everything as root by default, so a lot of "normal" security precautions are thrown out of the window. ;-)
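Easy to verify for any given app (a sketch; the namespace and pod names are placeholders):
Code:
# Placeholders: use your own ix-<app> namespace and pod name.
k3s kubectl exec -n ix-myapp myapp-0 -- id
# A "Launch Docker" workload will typically report: uid=0(root) gid=0(root)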
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I used to allocate my Nvidia GPU to my TrueCharts Plex container. I removed that config and added these variables to three TrueCharts apps (Tdarr, Tdarr Node, Plex), and now all three can access my GPU. Plex uses it for HW-assisted transcoding, and Tdarr uses it for GPU-assisted full-file health checks.

Code:
root@tnas[~]# nvidia-smi
Mon Aug 29 13:23:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T600         Off  | 00000000:65:00.0 Off |                  N/A |
| 55%   71C    P0    N/A /  41W |   2041MiB /  3909MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    104034      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    113339      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    116827      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    118993      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    126108      C   tdarr-ffmpeg                     1105MiB |
|    0   N/A  N/A    127887      C   tdarr-ffmpeg                      273MiB |
|    0   N/A  N/A    128250      C   tdarr-ffmpeg                      271MiB |
+-----------------------------------------------------------------------------+
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
I used to allocate my Nvidia GPU to my TrueCharts Plex container. I removed that config and added these variables to three TrueCharts apps (Tdarr, Tdarr Node, Plex), and now all three can access my GPU. Plex uses it for HW-assisted transcoding, and Tdarr uses it for GPU-assisted full-file health checks.

Code:
root@tnas[~]# nvidia-smi
Mon Aug 29 13:23:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T600         Off  | 00000000:65:00.0 Off |                  N/A |
| 55%   71C    P0    N/A /  41W |   2041MiB /  3909MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    104034      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    113339      C   ...diaserver/Plex Transcoder       98MiB |
|    0   N/A  N/A    116827      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    118993      C   ...diaserver/Plex Transcoder       95MiB |
|    0   N/A  N/A    126108      C   tdarr-ffmpeg                     1105MiB |
|    0   N/A  N/A    127887      C   tdarr-ffmpeg                      273MiB |
|    0   N/A  N/A    128250      C   tdarr-ffmpeg                      271MiB |
+-----------------------------------------------------------------------------+
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.
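(One way to check the runtime question instead of guessing; as far as I know, this is the path where k3s writes its generated containerd config on SCALE:)
Code:
# Look for an nvidia runtime entry in k3s's generated containerd config:
grep -i nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml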
 

Ppriorfl

Dabbler
Joined
May 22, 2021
Messages
46
Sorry, I should have mentioned that. You can run "nvidia-smi -L" to get the ID.

Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.
There have been a lot of folks looking for this functionality; I was just adding my voice and a screen scrape showing that this is indeed real. This shared approach, combined with the keylase driver patch, has significantly increased the value of adding the T600 card.

It's good to hear that the NVIDIA_VISIBLE_DEVICES variable is not required; I will remove it from my config as well... simpler is better!
 

GenericEric

Cadet
Joined
Aug 28, 2022
Messages
9
Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
The latest update came with the 470 driver, which should support your 3080. Are you on version 22.02.3? Does it show up when you run "lspci | grep VGA"?
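If the card is detected at all, it should show up along these lines (the bus address and revision here are illustrative):
Code:
root@tnas[~]# lspci | grep VGA
65:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)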
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
Well crap, I still get “No devices found” as a response to that. Nvidia 3080 card; I guess the drivers still don't support 3080s… Sigh.
Driver 470.103.01 does indeed support the standard 3080 10GB cards.
Maybe you have a 12GB version? That variant seems to require driver version 510.39.01 at minimum. I may be wrong, but I found some info here.
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
I think truecharts was pointing out that
Code:
NVIDIA_VISIBLE_DEVICES
is already set in the containers that can utilize the GPU, so it's unnecessary, which I confirmed. I'm not sure if they run on the Nvidia runtime by default, though; they don't seem to, since my deployments don't work without the variables.

Let's make some guesses here... maybe the Nvidia device plugin reads the env vars?
 