Immich (ix-apps) with amd gpu - 0/1 nodes are available: 1 Insufficient amd.com/gpu

se7zer

Cadet
Joined
Feb 14, 2024
Messages
3
Thank you everyone in advance for any insight or help you may be able to offer!

I'm using TrueNAS-SCALE-22.12.4.2 on bare metal with an AMD RX 5500 GPU, and I have the following settings:
- Applications/Settings/Advanced Settings/'Enable GPU support' selected
- System Settings/Advanced/Isolated GPU Device(s) is empty

lspci -k
Code:
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev c5)
        Subsystem: Sapphire Technology Limited Navi 14 [Radeon RX 5500/5500M / Pro 5500M]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu


The GPU shows up under both Capacity and Allocatable in k3s kubectl describe nodes:
Code:
Capacity:
  amd.com/gpu:        1
  cpu:                12
  ephemeral-storage:  3607177856Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16284308Ki
  pods:               250
Allocatable:
  amd.com/gpu:        1
  cpu:                12
  ephemeral-storage:  3509062615565
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16284308Ki
  pods:               250
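
(For reference, the same count can be queried directly; this jsonpath one-liner is just a sketch, and the only non-obvious part is escaping the dots in the resource name:)
Code:
# Print only the allocatable amd.com/gpu count for the (single) node
k3s kubectl get nodes -o jsonpath='{.items[0].status.allocatable.amd\.com/gpu}'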


Immich, which I'm trying to install, is the only app I have.

When trying to install Immich v1.94.1_3.0.9 via ix-apps (TrueNAS community) with "Allocate 1 amd.com/gpu GPU" selected,
I get the following error on the last (5/5) deployment:
0/1 nodes are available: 1 Insufficient amd.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
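
For anyone wanting to see the full scheduling failure, the pending pod's events can be inspected like this (replace the placeholder with the actual immich-microservices-... pod name):
Code:
# List the pods and spot the one stuck in Pending
k3s kubectl get pods -n ix-immich
# Describe it; the FailedScheduling event carries the message quoted above
k3s kubectl describe pod <immich-microservices-pod-name> -n ix-immich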

The other four deployments are scheduled and running correctly:
Code:
root@truenas[/home/admin]# k3s kubectl get deployment -n ix-immich
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
immich-redis             1/1     1            1           151m
immich-postgres          1/1     1            1           151m
immich                   1/1     1            1           151m
immich-machinelearning   1/1     1            1           151m
immich-microservices     0/1     1            0           151m


It seems that k3s allocates the entire GPU to one pod, so the other pods can't claim it.
Code:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                140m (1%)   16 (133%)
  memory             270Mi (1%)  32938Mi (207%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  amd.com/gpu        1           1
Events:              <none>
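
A rough way to confirm which pod is actually holding the device (just a grep sketch, the filter may need tweaking):
Code:
# List pod names alongside any amd.com/gpu requests/limits they carry
k3s kubectl describe pods -n ix-immich | grep -E '^Name:|amd.com/gpu'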


How can I make this GPU allocatable to every pod that needs it, or even to every app, if I ever install another one?
If I select "Allocate 0 amd.com/gpu GPU", it seems to me that will disable hardware acceleration (HWA), and I want to keep it enabled.

Please tell me if you need more logs or information, I am happy to provide.
 

yandalorian

Cadet
Joined
Dec 2, 2023
Messages
4
I have the same issue with an onboard Intel GPU. It used to work with an earlier version of Immich.
 

strauss2

Cadet
Joined
Aug 17, 2022
Messages
4
I am getting the same issue with TrueCharts Jellyfin version 18.1.3. Maybe the issue isn't isolated to Immich?
 

yandalorian

Cadet
Joined
Dec 2, 2023
Messages
4
Here's what I tried over the weekend, and it worked for me: I assigned 0 GPUs to Immich, stopped and started the app, then stopped it again, assigned 1 GPU, and started it. It now deploys without issue.
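
For anyone who prefers the shell for the stop/start part, a rough equivalent might be scaling the stuck deployment down and back up after toggling the GPU setting in the app's edit screen (untested sketch; the middleware may reconcile the replica count on its own):
Code:
k3s kubectl scale deployment immich-microservices -n ix-immich --replicas=0
k3s kubectl scale deployment immich-microservices -n ix-immich --replicas=1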
 

strauss2

Cadet
Joined
Aug 17, 2022
Messages
4
se7zer said:
Please tell me if you need more logs or information, I am happy to provide.

Would you be able to file a bug report on Jira? I am still seeing this issue, but on my side it affects TrueCharts apps, which are not eligible for a bug report. Please link the issue here when you do! Thanks so much.
 

se7zer

Cadet
Joined
Feb 14, 2024
Messages
3
I've learned some things along the way and wanted to share with you guys.
I couldn't find a solution for the AMD GPU, though.

1. GPUs are not fractionable by default in Kubernetes. You have to use a "gpushare" scheduler extender.
2. This post explains very well how to share GPUs with an NVIDIA GPU: post. It uses the gpushare scheduler extender from Aliyun (https://github.com/AliyunContainerService/gpushare-scheduler-extender).

So, from my understanding, this would work with an NVIDIA GPU but not AMD. I couldn't find anything about a gpushare/fractional-GPU extender for AMD GPUs. A rough illustration of what gpushare-style sharing looks like is sketched below.
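
For illustration only: with the Aliyun extender (NVIDIA-only, and not shipped with SCALE), a pod asks for a slice of GPU memory instead of a whole device. This is just a sketch of what such a request looks like, not something that will work on a stock TrueNAS box:
Code:
# Hypothetical pod requesting 3 GiB of one GPU via the gpushare device plugin,
# instead of the integer-only amd.com/gpu / nvidia.com/gpu whole-card resource.
cat <<'EOF' | k3s kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpushare-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # 3 GiB slice, not the whole card
EOF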
 

clasyc

Cadet
Joined
Apr 3, 2024
Messages
2
se7zer said:
I've learned some things along the way and wanted to share with you guys.
I couldn't find a solution for the AMD GPU, though.

1. GPUs are not fractionable by default in Kubernetes. You have to use a "gpushare" scheduler extender.
2. This post explains very well how to share GPUs with an NVIDIA GPU: post. It uses the gpushare scheduler extender from Aliyun (https://github.com/AliyunContainerService/gpushare-scheduler-extender).

So, from my understanding, this would work with an NVIDIA GPU but not AMD. I couldn't find anything about a gpushare/fractional-GPU extender for AMD GPUs.
Yeah, I've created an issue on the TrueNAS charts repository: https://github.com/truenas/charts/issues/2336

The author mentioned that he believes AMD might not be supported, so, based on your message as well, it unfortunately seems that AMD GPUs cannot be shared, although he might investigate further.

This is a big problem because I have an integrated GPU in my Ryzen 4650G and I can't use it properly :(
 