allan.tatter
I have a fresh install of TrueNAS SCALE on my old PC. I have the following hardware:
- CPU: Intel Core i5-8400 (with integrated graphics)
- Motherboard: GIGABYTE H370M DS3H
- Graphics card: AMD Radeon HD 7970/8970 OEM / R9 280X
However, when I try to deploy any custom app (any public image from Docker Hub), the state stays at "Deploying" and never gets to "Running". In the custom app's Details section, the following Related Kubernetes Event keeps recurring:
Code:
Allocate failed due to no healthy devices present; cannot allocate unhealthy devices amd.com/gpu, which is unexpected
With kubectl I see the following:
Code:
$ kubectl -n ix-whoami4 get pods
NAME                                READY   STATUS                     RESTARTS   AGE
whoami4-ix-chart-749c8ff779-mbrqp   0/1     UnexpectedAdmissionError   0          91s
whoami4-ix-chart-749c8ff779-hf824   0/1     UnexpectedAdmissionError   0          90s
whoami4-ix-chart-749c8ff779-htw88   0/1     UnexpectedAdmissionError   0          89s
whoami4-ix-chart-749c8ff779-lghjc   0/1     UnexpectedAdmissionError   0          89s
whoami4-ix-chart-749c8ff779-9zmz7   0/1     UnexpectedAdmissionError   0          87s
whoami4-ix-chart-749c8ff779-p6g2v   0/1     UnexpectedAdmissionError   0          85s
whoami4-ix-chart-749c8ff779-jnh9m   0/1     Pending                    0          84s

$ kubectl -n ix-whoami4 describe pods
Name:             whoami4-ix-chart-749c8ff779-mbrqp
Namespace:        ix-whoami4
Priority:         0
Node:             ix-truenas/
Start Time:       Sun, 12 Nov 2023 19:41:04 +0200
Labels:           app.kubernetes.io/instance=whoami4
                  app.kubernetes.io/name=ix-chart
                  pod-template-hash=749c8ff779
Annotations:      rollme: dcArL
Status:           Failed
Reason:           UnexpectedAdmissionError
Message:          Pod was rejected: Allocate failed due to no healthy devices present; cannot allocate unhealthy devices amd.com/gpu, which is unexpected
IP:
IPs:              <none>
Controlled By:    ReplicaSet/whoami4-ix-chart-749c8ff779
Containers:
  ix-chart:
    Image:      traefik/whoami:latest
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      amd.com/gpu:         0
      gpu.intel.com/i915:  0
      nvidia.com/gpu:      0
    Requests:
      amd.com/gpu:         0
      gpu.intel.com/i915:  0
      nvidia.com/gpu:      0
    Environment:           <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4xrh6 (ro)
Volumes:
  kube-api-access-4xrh6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                    Age   From               Message
  ----     ------                    ----  ----               -------
  Normal   Scheduled                 103s  default-scheduler  Successfully assigned ix-whoami4/whoami4-ix-chart-749c8ff779-mbrqp to ix-truenas
  Warning  UnexpectedAdmissionError  103s  kubelet            Allocate failed due to no healthy devices present; cannot allocate unhealthy devices amd.com/gpu, which is unexpected
By contrast, I successfully deployed the nginx-proxy-manager app from the Discover apps section. That is probably because it does not have `amd.com/gpu` in its Limits and Requests sections.
Code:
$ kubectl -n ix-nginx-proxy-manager get pods
NAME                                   READY   STATUS    RESTARTS   AGE
nginx-proxy-manager-747c57ddf4-qnvfk   1/1     Running   0          119s

$ kubectl -n ix-nginx-proxy-manager describe pods nginx-proxy-manager-747c57ddf4-qnvfk
Name:             nginx-proxy-manager-747c57ddf4-qnvfk
Namespace:        ix-nginx-proxy-manager
Priority:         0
Node:             ix-truenas/192.168.1.10
Start Time:       Sun, 12 Nov 2023 20:32:25 +0200
Labels:           app=nginx-proxy-manager-1.0.18
                  app.kubernetes.io/instance=nginx-proxy-manager
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=nginx-proxy-manager
                  app.kubernetes.io/version=2.10.4
                  helm-revision=1
                  helm.sh/chart=nginx-proxy-manager-1.0.18
                  pod-template-hash=747c57ddf4
                  pod.name=npm
                  release=nginx-proxy-manager
Annotations:      k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "ix-net",
                        "interface": "eth0",
                        "ips": [
                            "172.16.0.161"
                        ],
                        "mac": "3e:a4:04:04:81:b6",
                        "default": true,
                        "dns": {},
                        "gateway": [
                            "172.16.0.1"
                        ]
                    }]
                  rollme: uVBno
Status:           Running
IP:               172.16.0.161
IPs:
  IP:           172.16.0.161
Controlled By:  ReplicaSet/nginx-proxy-manager-747c57ddf4
Containers:
  nginx-proxy-manager:
    Container ID:   containerd://0528182aff475d4963bb237f5e6fb2708a9850bb5fb62a268533bb21f249d5e2
    Image:          jc21/nginx-proxy-manager:2.10.4
    Image ID:       docker.io/jc21/nginx-proxy-manager@sha256:e1000dd653d193ac70cb3635c27333b0183a11f987e2b1c6043589d9d948bc0f
    Ports:          80/TCP, 443/TCP, 81/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 12 Nov 2023 20:32:26 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  8Gi
    Requests:
      cpu:      10m
      memory:   50Mi
    Liveness:   exec [/bin/check-health] delay=10s timeout=5s period=10s #success=1 #failure=5
    Readiness:  exec [/bin/check-health] delay=10s timeout=5s period=10s #success=2 #failure=5
    Startup:    exec [/bin/check-health] delay=30s timeout=2s period=5s #success=1 #failure=120
    Environment:
      TZ:                      Europe/Tallinn
      UMASK:                   002
      UMASK_SET:               002
      NVIDIA_VISIBLE_DEVICES:  void
      PUID:                    1000
      USER_ID:                 1000
      UID:                     1000
      PGID:                    1000
      GROUP_ID:                1000
      GID:                     1000
      DB_SQLITE_FILE:          /data/database.sqlite
      DISABLE_IPV6:            true
    Mounts:
      /data from data (rw)
      /etc/letsencrypt from certs (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  certs:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/pool-2/ix-applications/releases/nginx-proxy-manager/volumes/ix_volumes/certs
    HostPathType:
  data:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/pool-2/ix-applications/releases/nginx-proxy-manager/volumes/ix_volumes/data
    HostPathType:
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       86s                default-scheduler  Successfully assigned ix-nginx-proxy-manager/nginx-proxy-manager-747c57ddf4-qnvfk to ix-truenas
  Normal   AddedInterface  86s                multus             Add eth0 [172.16.0.161/16] from ix-net
  Normal   Pulled          86s                kubelet            Container image "jc21/nginx-proxy-manager:2.10.4" already present on machine
  Normal   Created         85s                kubelet            Created container nginx-proxy-manager
  Normal   Started         85s                kubelet            Started container nginx-proxy-manager
  Warning  Unhealthy       46s (x2 over 51s)  kubelet            Startup probe failed: NOT OK
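To make the comparison between the two pods explicit, here is a minimal Python sketch that lists which non-core resources each container declares. The Limits values are copied from the two `describe` outputs in this post. The helper name `extended_resources` and the set of "core" resource names are my own assumptions for illustration, not anything from TrueNAS or Kubernetes tooling.

```python
# Limits copied by hand from the two `kubectl describe pods` outputs in this post.
failing_limits = {"amd.com/gpu": "0", "gpu.intel.com/i915": "0", "nvidia.com/gpu": "0"}
working_limits = {"cpu": "4", "memory": "8Gi"}

# Assumption for this sketch: treat these names (and hugepages-*) as core
# resources; anything else is counted as a vendor/extended resource.
CORE_RESOURCES = {"cpu", "memory", "ephemeral-storage"}


def extended_resources(limits):
    """Return the sorted resource names in `limits` that are not core resources."""
    return sorted(
        name
        for name in limits
        if name not in CORE_RESOURCES and not name.startswith("hugepages-")
    )


if __name__ == "__main__":
    print("custom app (whoami4):", extended_resources(failing_limits))
    print("nginx-proxy-manager: ", extended_resources(working_limits))
```

Only the custom app declares GPU resources at all (each with a value of 0), while the catalog app declares none. That fits my guess above, but it does not prove that the zero-valued `amd.com/gpu` entry is what triggers the admission error.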
Additional debugging info:
Code:
$ lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]
Code:
$ lsmod
gpu_sched              53248  1 amdgpu
drm_buddy              20480  2 amdgpu,i915
drm_display_helper    184320  3 amdgpu,radeon,i915
drm_ttm_helper         16384  2 amdgpu,radeon
i2c_algo_bit           16384  3 amdgpu,radeon,i915
video                  65536  3 amdgpu,radeon,i915
Do I need to do anything special for the discrete GPU when doing a fresh install of TrueNAS SCALE, such as installing drivers? I have another machine with different hardware and only integrated graphics, and it has no such issues.