Immich (ix-apps) with amd gpu - 0/1 nodes are available: 1 Insufficient amd.com/gpu

se7zer

Cadet
Joined
Feb 14, 2024
Messages
3
Thank you everyone in advance for any insight or help you may be able to offer!

I'm using TrueNAS-SCALE-22.12.4.2 on bare metal with an AMD RX 5500 GPU, and I have the following settings:
- Applications/Settings/Advanced Settings/'Enable GPU support' selected
- System Settings/Advanced/Isolated GPU Device(s) is empty

lspci -k
Code:
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev c5)
        Subsystem: Sapphire Technology Limited Navi 14 [Radeon RX 5500/5500M / Pro 5500M]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu


The GPU shows up under both Capacity and Allocatable in k3s kubectl describe nodes:
Code:
Capacity:
  amd.com/gpu:        1
  cpu:                12
  ephemeral-storage:  3607177856Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16284308Ki
  pods:               250
Allocatable:
  amd.com/gpu:        1
  cpu:                12
  ephemeral-storage:  3509062615565
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16284308Ki
  pods:               250
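
(For reference, the same count can be queried directly; this jsonpath one-liner is just a sketch, and the only non-obvious part is escaping the dots in the resource name:)
Code:
# Print only the allocatable amd.com/gpu count for the (single) node
k3s kubectl get nodes -o jsonpath='{.items[0].status.allocatable.amd\.com/gpu}'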


Immich, which I'm trying to install, is the only app I have.

When trying to install Immich v1.94.1_3.0.9 via ix-apps (TrueNAS community) with "Allocate 1 amd.com/gpu GPU" selected,
I get the following error on the last (5/5) deployment:
0/1 nodes are available: 1 Insufficient amd.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
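
For anyone wanting to see the full scheduling failure, the pending pod's events can be inspected like this (replace the placeholder with the actual immich-microservices-... pod name):
Code:
# List the pods and spot the one stuck in Pending
k3s kubectl get pods -n ix-immich
# Describe it; the FailedScheduling event carries the message quoted above
k3s kubectl describe pod <immich-microservices-pod-name> -n ix-immich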

The other four deployments are scheduled and running correctly:
Code:
root@truenas[/home/admin]# k3s kubectl get deployment -n ix-immich
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
immich-redis             1/1     1            1           151m
immich-postgres          1/1     1            1           151m
immich                   1/1     1            1           151m
immich-machinelearning   1/1     1            1           151m
immich-microservices     0/1     1            0           151m


It seems that k3s allocates the entire GPU to one pod, so the other pods can't claim it.
Code:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                140m (1%)   16 (133%)
  memory             270Mi (1%)  32938Mi (207%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  amd.com/gpu        1           1
Events:              <none>
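
A rough way to confirm which pod is actually holding the device (just a grep sketch, the filter may need tweaking):
Code:
# List pod names alongside any amd.com/gpu requests/limits they carry
k3s kubectl describe pods -n ix-immich | grep -E '^Name:|amd.com/gpu'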


How can I make this GPU allocatable to every pod that needs it, or even to every app, if I ever install another one?
If I select "Allocate 0 amd.com/gpu GPU", it seems to me that will disable hardware acceleration (HWA), and I want to keep it enabled.

Please tell me if you need more logs or information, I am happy to provide.
 

yandalorian

Cadet
Joined
Dec 2, 2023
Messages
4
I have the same issue with an onboard Intel GPU. It used to work with an earlier version of Immich.
 

strauss2

Cadet
Joined
Aug 17, 2022
Messages
4
I am getting the same issue with TrueCharts Jellyfin version 18.1.3. Maybe the issue isn't isolated to Immich?
 

yandalorian

Cadet
Joined
Dec 2, 2023
Messages
4
Here's what I tried over the weekend, and it worked for me: I assigned 0 GPUs to Immich, stopped and started the app, then stopped it again, assigned 1 GPU, and started it. It now deploys without issue.
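
For anyone who prefers the shell for the stop/start part, a rough equivalent might be scaling the stuck deployment down and back up after toggling the GPU setting in the app's edit screen (untested sketch; the middleware may reconcile the replica count on its own):
Code:
k3s kubectl scale deployment immich-microservices -n ix-immich --replicas=0
k3s kubectl scale deployment immich-microservices -n ix-immich --replicas=1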
 

strauss2

Cadet
Joined
Aug 17, 2022
Messages
4
se7zer said:
Please tell me if you need more logs or information, I am happy to provide.

Would you be able to file a bug report on Jira? I am still seeing this issue, but on my side it affects TrueCharts apps, which are not eligible for a bug report. Please link the issue here when you do! Thanks so much.
 

se7zer

Cadet
Joined
Feb 14, 2024
Messages
3
I've learned some things along the way and wanted to share with you guys.
I couldn't find a solution for the AMD GPU, though.

1. GPUs are not fractionable by default in Kubernetes. You have to use a "gpushare" scheduler extender.
2. This post explains very well how to share GPUs with an NVIDIA GPU: post. It uses the gpushare scheduler extender from Aliyun (https://github.com/AliyunContainerService/gpushare-scheduler-extender).

So, from my understanding, this would work with an NVIDIA GPU but not AMD. I couldn't find anything about a gpushare/fractional-GPU extender for AMD GPUs. A rough illustration of what gpushare-style sharing looks like is sketched below.
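
For illustration only: with the Aliyun extender (NVIDIA-only, and not shipped with SCALE), a pod asks for a slice of GPU memory instead of a whole device. This is just a sketch of what such a request looks like, not something that will work on a stock TrueNAS box:
Code:
# Hypothetical pod requesting 3 GiB of one GPU via the gpushare device plugin,
# instead of the integer-only amd.com/gpu / nvidia.com/gpu whole-card resource.
cat <<'EOF' | k3s kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpushare-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # 3 GiB slice, not the whole card
EOF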
 

clasyc

Cadet
Joined
Apr 3, 2024
Messages
2
se7zer said:
I've learned some things along the way and wanted to share with you guys.
I couldn't find a solution for the AMD GPU, though.

1. GPUs are not fractionable by default in Kubernetes. You have to use a "gpushare" scheduler extender.
2. This post explains very well how to share GPUs with an NVIDIA GPU: post. It uses the gpushare scheduler extender from Aliyun (https://github.com/AliyunContainerService/gpushare-scheduler-extender).

So, from my understanding, this would work with an NVIDIA GPU but not AMD. I couldn't find anything about a gpushare/fractional-GPU extender for AMD GPUs.
Yeah, I've created an issue on the TrueNAS charts repository: https://github.com/truenas/charts/issues/2336

The author mentioned that he believes AMD might not be supported, so, based on your message as well, it unfortunately seems that AMD GPUs cannot be shared, although he might investigate further.

This is a big problem because I have an integrated GPU in my Ryzen 4650G and I can't use it properly :(
 