TrueNAS SCALE unable to boot without a monitor connected to the GPU

madforic (Cadet, joined Nov 27, 2013, 6 messages)
Hi, folks!
I have been running a NAS on CORE for over 10 years, since it was FreeNAS 8.
Now I'm building a new setup on TrueNAS-SCALE-23.10.2 and running into trouble.
I have one GPU, an Nvidia Quadro P400, which I intend to use as a transcoding accelerator in Apps.
But I have some issues:
1. Unable to boot without a monitor connected to the GPU. Nothing happens even 10, 20, or 30 minutes after power-on (the average boot time with a monitor connected is 2 minutes). When I then connect a monitor, there is nothing but a black screen and no reaction to Ctrl+Alt+Del; only the reset or power button restarts it. This is annoying: I don't want a monitor permanently connected to my server, costing me extra money. Is this a TrueNAS SCALE bootloader issue, or should I configure my BIOS differently?
2. When I boot with a monitor connected, SCALE boots correctly and I was able to assign the GPU to the Emby app. But after some time there is no GPU available to assign, even though nvidia-smi shows my P400 up and running.
Code:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P400                    Off | 00000000:04:00.0 Off |                  N/A |
| 46%   50C    P0              N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                        
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Kernel modules loaded
Code:
04:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
        Subsystem: NVIDIA Corporation GP107GL [Quadro P400]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_current_drm, nvidia_current
04:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: NVIDIA Corporation GP107GL High Definition Audio Controller
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel


But no GPU is available for Apps.


According to other forum threads this problem is common, but there is still no solution, and now I've run into the same trouble.
I suspect an issue with the plugin container, but I'm a noob with containers, so it's hard for me to work out the next troubleshooting steps.

Code:
# k3s kubectl get pod -A | grep nvidia
kube-system              nvidia-device-plugin-daemonset-f8tsb                  0/1     CrashLoopBackOff           36 (4m40s ago)   164m
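nvidia-smi only shows that the host driver sees the card. My understanding is that an app can only be given the GPU while the NVIDIA device plugin advertises the `nvidia.com/gpu` resource on the node, which a pod stuck in CrashLoopBackOff cannot do. A minimal sketch for checking that, assuming `nvidia.com/gpu` is the resource name used on SCALE 23.10 and that the single-node apps cluster is reachable through `k3s kubectl` (the function name `gpu_allocatable_check` is hypothetical):

```shell
#!/bin/sh
# Sketch (assumption): check whether the k3s node currently advertises the
# NVIDIA device plugin resource "nvidia.com/gpu" as allocatable.
# KUBECTL can be overridden (e.g. for testing); it defaults to SCALE's k3s kubectl.
KUBECTL="${KUBECTL:-k3s kubectl}"

gpu_allocatable_check() {
  # jsonpath needs the dot inside "nvidia.com" escaped; a missing key prints nothing.
  count="$($KUBECTL get nodes -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}')"
  if [ -z "$count" ] || [ "$count" = "0" ]; then
    echo "no nvidia.com/gpu allocatable: check the nvidia-device-plugin pod"
    return 1
  fi
  echo "nvidia.com/gpu allocatable: $count"
}
```

Pasted into a root shell, `gpu_allocatable_check` either prints the advertised count or points back at the device plugin. In the CrashLoopBackOff state shown above, `k3s kubectl logs -n kube-system nvidia-device-plugin-daemonset-f8tsb --previous` would show why the last plugin container exited.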


I'd be grateful for any help.
 

Arwen (MVP, joined May 17, 2014, 3,611 messages)
Some non-server boards are not designed to work as servers. Most server boards are designed to work without video, or to provide basic video through IPMI.

Further, some power-saving features may kick in (via the video driver) and put the video output to sleep after a while.

One solution is to buy a "fake" monitor dongle. It looks like a small USB flash drive, except it has an HDMI or DisplayPort connector. Get your computer configured the way you want, then shut it down completely. Disconnect the monitor and plug in the "fake" monitor dongle. Power up, and that may solve your problem.
 

madforic
Thanks a lot! I've ordered a fake monitor dongle and am waiting to try it.
The second issue is resolved too. I found some pods in an unexpected state:

Code:
 k3s kubectl get pods -A
NAMESPACE                NAME                                                  READY   STATUS                     RESTARTS          AGE
ix-nextcloud-tr          nextcloud-tr-notify-65f656bd54-lkxw8                  0/1     UnexpectedAdmissionError   0                 13h
ix-torrserve             torrserve-ix-chart-5d94b7bf46-6x5pm                   0/1     UnexpectedAdmissionError   0                 13h
kube-system              nvidia-device-plugin-daemonset-f8tsb                  0/1     CrashLoopBackOff           163 (3m42s ago)   13h


I killed them with this command:
Code:
k3s kubectl delete pods -n ix-<app-name> <pod name>


After some time, nvidia-device-plugin-daemonset-f8tsb restarted automatically and the GPU became available for Apps again.
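Rather than copying each namespace and pod name by hand, the `k3s kubectl get pods -A` output can be filtered on the UnexpectedAdmissionError status. A minimal sketch (the function name is hypothetical) that only prints the delete commands so they can be reviewed first; it assumes the default column order NAMESPACE, NAME, READY, STATUS seen in the listing above:

```shell
#!/bin/sh
# Sketch: read `k3s kubectl get pods -A` output on stdin and print one delete
# command per pod whose STATUS (4th column) is UnexpectedAdmissionError.
# Nothing is deleted here; pipe the result to `sh` only after reviewing it.
admission_error_delete_cmds() {
  awk '$4 == "UnexpectedAdmissionError" {
    printf "k3s kubectl delete pod -n %s %s\n", $1, $2
  }'
}
```

Usage would be `k3s kubectl get pods -A | admission_error_delete_cmds`, then the same pipeline with `| sh` appended once the printed commands look right. The header row and the CrashLoopBackOff plugin pod are skipped because their 4th column does not match.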
 

madforic
Actually, I solved problem 1:
I disabled CSM in the BIOS, then reinstalled TrueNAS and imported the config.
Now I can boot headless and the GPU is available for Apps.
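With CSM disabled the board can only boot through UEFI, not the legacy BIOS path. To confirm which mode the reinstalled system actually came up in, a small check: on Linux, the directory `/sys/firmware/efi` exists only when the kernel was started by UEFI firmware. The function name `boot_mode` is hypothetical, and the optional argument exists only so the check can be tested against another directory:

```shell
#!/bin/sh
# Sketch: report how the running Linux system was booted.
# /sys/firmware/efi is only present when the kernel was started by UEFI firmware.
boot_mode() {
  efi_dir="${1:-/sys/firmware/efi}"
  if [ -d "$efi_dir" ]; then
    echo "UEFI"
  else
    echo "legacy BIOS (CSM)"
  fi
}
```

Running `boot_mode` with no argument in the SCALE shell should print `UEFI` after the reinstall described above.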

Problem 2 occurs on every reboot: some applications hang in UnexpectedAdmissionError and
nvidia-device-plugin-daemonset goes into CrashLoopBackOff:

kube-system nvidia-device-plugin-daemonset-f8tsb 0/1 CrashLoopBackOff 163 (3m42s ago) 13h

The only way I've found to resolve this is to manually kill all the hung processes.
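Until the root cause is found, the manual cleanup could in principle be automated, for example as a Post Init entry under System Settings > Advanced > Init/Shutdown Scripts. This is a hedged sketch with several assumptions: the apps service is already running when it executes (a delay may be needed), the DaemonSet is named `nvidia-device-plugin-daemonset` (inferred from the pod name above), and the function name is hypothetical. Pods rejected with UnexpectedAdmissionError end up in phase Failed, so the sketch deletes by phase, which also removes any other failed pods:

```shell
#!/bin/sh
# Hypothetical post-init cleanup for the k3s-based apps on TrueNAS SCALE 23.10.
# KUBECTL can be overridden (e.g. for testing); it defaults to SCALE's k3s kubectl.
KUBECTL="${KUBECTL:-k3s kubectl}"

post_init_gpu_cleanup() {
  # Pods rejected by the kubelet (UnexpectedAdmissionError) end up in phase Failed.
  # Note: this also deletes any other pods that are in phase Failed.
  $KUBECTL delete pods --all-namespaces --field-selector=status.phase=Failed || return 1
  # Recreate the device plugin pod so it can register nvidia.com/gpu again.
  # The DaemonSet name is an assumption inferred from the pod name.
  $KUBECTL -n kube-system rollout restart daemonset/nvidia-device-plugin-daemonset
}
```

Saved as a script, it would need a final line calling `post_init_gpu_cleanup`, possibly preceded by a `sleep` as a crude wait for the apps service. Since the plugin pod was seen recovering on its own once the stuck pods were gone, the restart step may turn out to be unnecessary.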
 