zmalqp
Cadet | Joined: Nov 5, 2022 | Messages: 2
Hi,
I'm having an issue with GPU support in TrueNAS SCALE: none of the applications can see or use the GPU (RTX 3060).
Here is some information:
I'm running TrueNAS-SCALE-22.02.4.
The GPU is not isolated:
Application GPU menu:
This does not work.
Command "systemctl status systemd-modules-load.service"
Code:
Load Kernel Modules
     Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static)
     Active: active (exited) since Wed 2022-11-02 02:23:56 CET; 3 days ago
       Docs: man:systemd-modules-load.service(8)
             man:modules-load.d(5)
   Main PID: 3267 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 18439)
     Memory: 0B
        CPU: 0
     CGroup: /system.slice/systemd-modules-load.service

Warning: journal has been rotated since unit was started, output may be incomplete.
Command "nvidia-smi" (everything looks OK)
Code:
Sat Nov  5 19:04:34 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:B3:00.0 Off |                  N/A |
| 35%   36C    P0    40W / 170W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Command "lspci | grep NVIDIA"
Code:
0000:02:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)
0000:02:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
0000:b3:00.0 VGA compatible controller: NVIDIA Corporation Device 2504 (rev a1)
0000:b3:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)
Command "k3s kubectl describe nodes" (note the "nvidia.com/gpu: 0" line)
Code:
Name:               ix-truenas
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    egress.k3s.io/cluster=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ix-truenas
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    openebs.io/nodeid=ix-truenas
                    openebs.io/nodename=ix-truenas
Annotations:        csi.volume.kubernetes.io/nodeid: {"zfs.csi.openebs.io":"ix-truenas"}
                    k3s.io/node-args: ["server","--cluster-cidr","172.16.0.0/16","--cluster-dns","172.17.0.10","--data-dir","/mnt/NCC-1647/ix-applications/k3s","--kube-apiserve...
                    k3s.io/node-config-hash: 4WLIOEEHXIUWUFC3B5ZWAWEEBA4CTYN7DE5CWRAURRH3F2SBQIXA====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/mnt/NCC-1647/ix-applications/k3s/data/c7ce83f20ceffcbc9abbeb57698113658d664374455f25106875b3393927f7ce"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 05 Oct 2022 15:15:21 +0200
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ix-truenas
  AcquireTime:     <unset>
  RenewTime:       Sat, 05 Nov 2022 19:07:10 +0100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 05 Nov 2022 19:04:06 +0100   Thu, 27 Oct 2022 13:15:25 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 05 Nov 2022 19:04:06 +0100   Thu, 27 Oct 2022 13:15:25 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 05 Nov 2022 19:04:06 +0100   Thu, 27 Oct 2022 13:15:25 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 05 Nov 2022 19:04:06 +0100   Wed, 02 Nov 2022 02:25:28 +0100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.2.200
  Hostname:    ix-truenas
Capacity:
  cpu:                12
  ephemeral-storage:  420037504Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16069172Ki
  nvidia.com/gpu:     0
  pods:               250
Allocatable:
  cpu:                12
  ephemeral-storage:  408612483571
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16069172Ki
  nvidia.com/gpu:     0
  pods:               250
System Info:
  Machine ID:                 9847b502f1aa4eafa45e91b5b05ed754
  System UUID:                03d502e0-045e-05ad-0c06-6b0700080009
  Boot ID:                    c2a74b26-ed1e-4380-8a06-e262c7835569
  Kernel Version:             5.10.142+truenas
  OS Image:                   Debian GNU/Linux 11 (bullseye)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://Unknown
  Kubelet Version:            v1.23.5+k3s-fbfa51e5-dirty
  Kube-Proxy Version:         v1.23.5+k3s-fbfa51e5-dirty
PodCIDR:                      172.16.0.0/16
PodCIDRs:                     172.16.0.0/16
Non-terminated Pods:          (34 in total)
  Namespace                      Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                      ----                                        ------------  ----------  ---------------  -------------  ---
  kube-system                    coredns-d76bd69b-rpsgh                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     3d16h
  ix-traefik                     traefik-778bbdbc64-c579h                    10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  kube-system                    nvidia-device-plugin-daemonset-4dv5n        0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-k8s-gateway                 k8s-gateway-74b5579c44-tglkf                10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  kube-system                    svclb-satisfactory-query-eab207c0-tp84g     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-traefik                     svclb-traefik-wf7vv                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-gittea                      svclb-gittea-gitea-ssh-fkp69                0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-pihole                      svclb-pihole-dns-tcp-6xwkv                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  kube-system                    svclb-satisfactory-c62002de-7fhvt           0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-k8s-gateway                 svclb-k8s-gateway-22gz7                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  kube-system                    svclb-satisfactory-beacon-7ea4217b-7kzhh    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-k8s-gateway                 k8s-gateway-74b5579c44-sp2t9                10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-home-assistant              svclb-home-assistant-4pp4t                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-traefik                     svclb-traefik-tcp-m6gfz                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-gittea                      gittea-memcached-77589bccdc-xd84m           10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-esphome                     svclb-esphome-cvhp5                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  kube-system                    openebs-zfs-node-h5mz2                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-pihole                      svclb-pihole-zqjx5                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-pihole                      svclb-pihole-dns-42lfk                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  kube-system                    openebs-zfs-controller-0                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d16h
  ix-kavita                      kavita-659d87895c-wf6p9                     10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-gittea                      gittea-postgresql-0                         10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-pihole                      pihole-86bbffddc7-n9wnq                     10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-esphome                     esphome-67797c88f9-9pc98                    10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-nextcloud                   nextcloud-redis-0                           10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-onlyoffice-document-server  onlyoffice-document-server-postgresql-0     10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-onlyoffice-document-server  onlyoffice-document-server-redis-0          10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-home-assistant              home-assistant-postgresql-0                 10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-gittea                      gittea-gitea-86d47567c7-z5j6k               10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-nextcloud                   nextcloud-postgresql-0                      10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-onlyoffice-document-server  onlyoffice-document-server-f958bff5f-j5x7x  10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-home-assistant              home-assistant-7bf7dc5f59-hnl9d             10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-nextcloud                   nextcloud-6f55649b69-ncn87                  10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      3d16h
  ix-jellyfin                    jellyfin-7596f64f8d-zxk2g                   10m (0%)      4 (33%)     50Mi (0%)        8Gi (52%)      2d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                280m (2%)   72 (600%)
  memory             970Mi (6%)  147626Mi (940%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     0           0
Events:              <none>
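For anyone who wants to run the same check on their own system, here is a minimal Python sketch that pulls the "nvidia.com/gpu" values out of the Capacity and Allocatable sections of saved "k3s kubectl describe nodes" output. The function name gpu_counts and the shortened sample text are made up for illustration, and the parsing assumes the usual indented layout of the describe output; it only reads the text and does not fix anything.

```python
import re


def gpu_counts(describe_output: str) -> dict:
    """Return the nvidia.com/gpu values found under Capacity and Allocatable.

    Hypothetical helper: expects the plain-text layout printed by
    `kubectl describe nodes` (top-level headings unindented, fields indented).
    """
    counts = {}
    section = None
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped in ("Capacity:", "Allocatable:"):
            section = stripped.rstrip(":")
            continue
        # Any other unindented line is a new top-level heading and ends the section.
        if line and not line[0].isspace():
            section = None
        match = re.match(r"nvidia\.com/gpu:\s*(\d+)$", stripped)
        if section and match:
            counts[section] = int(match.group(1))
    return counts


# Shortened sample in the same layout as the output above (not the full output).
sample = """\
Capacity:
  cpu:             12
  nvidia.com/gpu:  0
Allocatable:
  cpu:             12
  nvidia.com/gpu:  0
System Info:
  Kernel Version:  5.10.142+truenas
"""

print(gpu_counts(sample))
```

A value of 0 in both sections means the node is advertising no GPU to Kubernetes, which matches what the applications see here.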
Does anyone have any idea how I can fix this issue?
Thanks for the support