Plex NVIDIA Gpu Passthrough SCALE 21.02

flatline69 · Jun 18, 2021

waqarahmed said:
@flatline69 when a GPU is added for passthrough, that will result in host not seeing it which is desired as otherwise GPU passthrough wouldn't be possible. About cuda errors, i am not sure how you are accessing it, but nothing should change and it should work just fine. If you are able to reproduce this with a clean slate on latest nightlies, please let me know and i can debug/fix as required. ( Please don't alter system state via CLI as that can potentially change system state and how middleware/system expects it to be which can result in other various issues, so during these tests, kindly avoid that so we know for sure that something is wrong with this workflow )

When I did the GPU passthrough, the host could not see the GPU (via nvidia-smi) and neither could the container (in fact, it refused to start because of this.) When I undid the change via CLI then the host/container could see it but I was back to CUDA initialization errors that I had pre-apt-package install (which originally fixed the issue for me.)

flatline69 · Jun 18, 2021

@waqarahmed I just want to be sure when I reinstall all of this. GPU passthrough for docker containers is a requirement for Emby (or Plex) to leverage the GPU? As noted, when I installed the packages via apt above, Emby could see the GPU as a viable transcoder until I excluded it via midclt which resulted in the host not seeing the GPU (as expected) but it also affected the container as well (it couldn't see it either.)

waqarahmed · Jun 18, 2021

@flatline69 GPU passthrough is for VMs and not containers. When a GPU is selected for passthrough, everything on host will not be able to see it as it's meant to be consumed by a VM now so nothing on the host can consume/look it up. If you just want to use it with a container, you can select it in the app installation/creation wizard that a GPU is required and it will expose the GPU to the container/pod in question then.

BetYourBottom · Jun 19, 2021

I'm attempting to test GPU passthrough in a similar manner but with Jellyfin instead.

Whenever I set it to allocate 1 GPU to the container it fails to launch. Inside the Application Events window it lists "0/1 nodes are available: 1 Insufficient nvidia.com/gpu"

I can see the gpu just fine via the TrueNAS shell with nvidia-smi. I read that it might be because I'm testing with an old GTX780 but I'm unsure how to apply the fix recommended in that thread.

waqarahmed · Jun 19, 2021

@BetYourBottom can you please confirm if the GPU is being consumed by another installed app ? If it is, then it can't be used with multiple apps as one has already claimed it.

BetYourBottom · Jun 19, 2021

waqarahmed said:
@BetYourBottom can you please confirm if the GPU is being consumed by another installed app ? If it is, then it can't be used with multiple apps as one has already claimed it.

Can't be. As I'm only testing things at the moment, Jellyfin is the only app/docker image installed.

waqarahmed · Jun 19, 2021

Interesting, i wonder if kubernetes is then recognizing the GPU in question. Can you perhaps share the output of "k3s kubectl describe nodes" please ?

BetYourBottom · Jun 19, 2021

waqarahmed said:
Interesting, i wonder if kubernetes is then recognizing the GPU in question. Can you perhaps share the output of "k3s kubectl describe nodes" please ?

Code:

truenas# k3s kubectl describe nodes
Name:               ix-truenas
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ix-truenas
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    openebs.io/nodename=ix-truenas
Annotations:        csi.volume.kubernetes.io/nodeid: {"zfs.csi.openebs.io":"ix-truenas"}
                    k3s.io/node-args:
                      ["server","--flannel-backend","none","--disable","traefik,metrics-server,local-storage","--disable-kube-proxy","--disable-network-policy",...
                    k3s.io/node-config-hash: TI6KLJJHDZINTAO3MG3HINQINYAUKVR5BZDGLCEIO4PBMDAJNU7A====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/mnt/test/ix-applications/k3s/data/11347498feda7a0048cf376e3f4c1626523dbb94ae900b8256db941e2113a653"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 18 Jun 2021 23:28:02 -0700
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ix-truenas
  AcquireTime:     <unset>
  RenewTime:       Sat, 19 Jun 2021 12:01:51 -0700
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 19 Jun 2021 11:58:29 -0700   Fri, 18 Jun 2021 23:28:00 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 19 Jun 2021 11:58:29 -0700   Fri, 18 Jun 2021 23:28:00 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 19 Jun 2021 11:58:29 -0700   Fri, 18 Jun 2021 23:28:00 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 19 Jun 2021 11:58:29 -0700   Sat, 19 Jun 2021 11:58:19 -0700   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  xxx.xxx.xxx.xxx
  Hostname:    ix-truenas
Capacity:
  cpu:                4
  ephemeral-storage:  919796608Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16319976Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  894778139561
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16319976Ki
  nvidia.com/gpu:     0
  pods:               110
System Info:
  Machine ID:                 12
  System UUID:                42
  Boot ID:                    76
  Kernel Version:             5.10.18+truenas
  OS Image:                   Debian GNU/Linux bullseye/sid
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.5
  Kubelet Version:            v1.20.4-k3s1
  Kube-Proxy Version:         v1.20.4-k3s1
PodCIDR:                      172.16.0.0/16
PodCIDRs:                     172.16.0.0/16
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                    ------------  ----------  ---------------  -------------  ---
  kube-system                 openebs-zfs-node-nmswj                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         12h
  ix-jellyfin                 jellyfin-ix-chart-8484b87dd4-gw26r      0 (0%)        0 (0%)      0 (0%)           0 (0%)         11h
  kube-system                 coredns-854c77959c-scxtt                100m (2%)     0 (0%)      70Mi (0%)        170Mi (1%)     12h
  kube-system                 nvidia-device-plugin-daemonset-pjnfk    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12h
  kube-system                 openebs-zfs-controller-0                0 (0%)        0 (0%)      0 (0%)           0 (0%)         12h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             70Mi (0%)  170Mi (1%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
  nvidia.com/gpu     0          0
Events:
  Type     Reason                   Age    From     Message
  ----     ------                   ----   ----     -------
  Normal   Starting                 3m42s  kubelet  Starting kubelet.
  Normal   NodeAllocatableEnforced  3m42s  kubelet  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  3m42s  kubelet  Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    3m42s  kubelet  Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     3m42s  kubelet  Node ix-truenas status is now: NodeHasSufficientPID
  Warning  Rebooted                 3m42s  kubelet  Node ix-truenas has been rebooted, boot id: c7d6fe6b-83a2-4ea7-bdf0-52c686eaea51
  Normal   NodeNotReady             3m42s  kubelet  Node ix-truenas status is now: NodeNotReady
  Normal   NodeReady                3m32s  kubelet  Node ix-truenas status is now: NodeReady

flatline69 · Jun 19, 2021

For what it's worth, I finally got this going on my end here using Emby and it was by staying within the lines and not circumventing the UI as previously I had to install additional packages via apt to get it working.

Almost all of my containers I created via "Launch Docker Image" from the Apps page and filled in the details. Emby sees the GPU for transcoding and nvidia-smi shows the Emby ffmpeg process when transcoding is going on.

flatline69 · Jun 20, 2021

BetYourBottom said:

I don't know if this will help but I noticed that you have 1 GPU in the system but for whatever reason it's not able to be allocated. I'm sorry I didn't notice in this thread how you spun up jellyfin but was it via launch docker image and you filled in the values or a official/truechart scale app? Did you exclude the GPU through the UI or otherwise as per the dev notes?

Here's mine, you can see capacity=1 and allocatable=1:

Code:

Capacity:
  cpu:                12
  ephemeral-storage:  945730304Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65585196Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                12
  ephemeral-storage:  920006439010
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65585196Ki
  nvidia.com/gpu:     1
  pods:               110

Maybe this will help you, here's my config, adjust accordingly. I used "Launch Docker Image" from the Apps tab (see below.) I'm currently running host networking because of the port 9000+/kube requirement, etc but I did have it working in an isolate container as well before I used my current process of spinning up containers via the UI versus docker-compose:

Code:

AppName: emby
Image repo: linuxserver/emby
Update Strat: create new/kill old
Image Tag: latest
Image Pull Policy: only if not present on host
Container CMD: <empty>
Conatainer Args: <empty>
Envvars:
- TZ=America/Edmonton
- NVIDIA_VISIBLE_DEVICES=all
- NVIDIA_DRIVER_CAPABILITIES=all
- PGID=911
- PUID=911
Host Network: enabled/checked*
External Interfaces: <empty>
DNS Policy: default policy
Nameservers: <empty>
Searches: <empty>
Port Forwarding: <empty>
Host Paths:
- /mnt/ZFSPOOL/docker-data/emby:/config
- etc
Volumes: <empty>
Privileged: disabled/unchecked
Resource Reserveration: Allocate 0 nvidia.com/gpu GPU (although I have a "1 nvidia.com/gpu GPU" as well.)

waqarahmed · Jun 21, 2021

@BetYourBottom kubernetes is not recognizing your GPU, it could be for various reasons. Can you please confirm your GPU model so that i can see if the nvidia device plugin supports it ( older models of nvidia gpu are not supported ) and a debug of your system would also be nice to make sure everything is in place from our end. Can you please email the debug at waqar at the rate of ixsystems.com ? Thank you

waqarahmed · Jun 21, 2021

@flatline69 glad it worked, let me know please if you run into issues moving on. Thanks!

BetYourBottom · Jun 21, 2021

waqarahmed said:
@BetYourBottom kubernetes is not recognizing your GPU, it could be for various reasons. Can you please confirm your GPU model so that i can see if the nvidia device plugin supports it ( older models of nvidia gpu are not supported ) and a debug of your system would also be nice to make sure everything is in place from our end. Can you please email the debug at waqar at the rate of ixsystems.com ? Thank you

My GPU is an Nvidia GTX 780 which I understand is a rather old model.
I did see some information that it might be caused by nvidia's software returning the GPU healthcheck as bad by default for old GPUs but that it should be able to be bypassed with an environment variable.

guyp2k · Jul 11, 2021

Kris Moore said:
Ahh, since its a container that changes things. Try this:

# apt update
# apt install nvidia-cuda-dev nvidia-cuda-toolkit

Those packages are 6GB, so they will take a while... After done, reboot.

Once done, click "edit" on your plex container. At the bottom do you see some options for enabling nvidia GPU now?

Dumb question, apt-install nvidia-cuda-dev..., I assume I run that on the SCALE host and not the container? If I type nvidia-smi on the host I see the GPU, on the container I do not or any nvidia in /dev.

Plus the following:

kube-system nvidia-device-plugin-daemonset-2tpw8 0/1 CrashLoopBackOff

Thanks

ornias · Jul 12, 2021

guyp2k said:
Dumb question, apt-install nvidia-cuda-dev..., I assume I run that on the SCALE host and not the container? If I type nvidia-smi on the host I see the GPU, on the container I do not or any nvidia in /dev.

Plus the following:

kube-system nvidia-device-plugin-daemonset-2tpw8 0/1 CrashLoopBackOff

Thanks

Please don't necro threads.
This is a thread about 21.02, that already got necro'ed twice. please don't.

BetYourBottom · Jul 12, 2021

ornias said:
Please don't necro threads.
This is a thread about 21.02, that already got necro'ed twice. please don't.

Guess I should take that as my question is never going to be answered.

ornias · Jul 13, 2021

BetYourBottom said:
Guess I should take that as my question is never going to be answered.

Because you should write a bugreport instead ;-)

BetYourBottom · Jul 13, 2021

ornias said:
Because you should write a bugreport instead ;-)

I don't know if it's a bug or just a mistake on my end.

Also condescending replies like that are worthless ;-)

ornias · Jul 13, 2021

BetYourBottom said:
I don't know if it's a bug or just a mistake on my end.

Also condescending replies like that are worthless ;-)

I wasn't being condecending. If I wanted to be that would be a lot more clear.

iX just rather wants more than less bug reports.

BetYourBottom · Jul 13, 2021

ornias said:
I wasn't being condecending.

Then you need to figure out what signals you are sending off by going "Guess you are doing it wrong then *winkyface*"

iX probably wants bug reports that are actually bugs. Not, what I expect it to be, a lack of support for 700 series GPUs with Nvidia's software that allows GPU passthrough.

The only thing they might, MIGHT, do is add a checkbox that allows me to use the possible fix I previously mentioned, to bypass whatever disallows it from being allocated.

Important Announcement for the TrueNAS Community.

Plex NVIDIA Gpu Passthrough SCALE 21.02

Dabbler

Dabbler

iXsystems

Contributor

iXsystems

Contributor

iXsystems

Contributor

Dabbler

Dabbler

iXsystems

iXsystems

Contributor

Dabbler

Wizard

Contributor

Wizard

Contributor

Wizard

Contributor

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Plex NVIDIA Gpu Passthrough SCALE 21.02"

Similar threads