SOLVED TrueNAS Core 12-U2 not booting with "KDB: debugger backends: ddb"

Joined
Oct 24, 2021
Messages
13
Hello

I hope you can help me with this issue.
Suddenly today my TrueNAS Core installation (the platform is a Supermicro CSE-848X server, 4x 12C Xeon E7-4850 v2 2.3GHz, 256GB, 24x LFF) with 10 disks running a fusion pool with NVMes is not booting anymore. When I use the remote KVM console I see that it hangs with the message: KDB: debugger backends: ddb (please see screenshot). The system shut down properly the day before. I did not change anything in hardware or software.

How can I get the system up again?
TrueNas.png
 
Joined
Oct 24, 2021
Messages
13
Booting in verbose mode also shows nothing that would enlighten me. In the BIOS I see all HDDs being detected.
I really need help here, because booting in "Safe Mode" yields the same error too.
TrueNas2.png
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,924
If you have 4 64GB memory sticks on board, your second screenshot would suggest a memory stick error to me.
 
Joined
Oct 24, 2021
Messages
13
The system has a total of 256GB ECC RAM, which passes the initial check before booting... I doubt there are 64GB modules; the entire chassis is filled with memory sticks, at least 16 of them (I never counted them). Is there a way to figure out which DIMM could be affected, or is it now good old trial and error? o_O
 
Joined
Oct 24, 2021
Messages
13
Finally I read through the FreeBSD kernel source documentation and figured out what SLIT.Localities are: the 4 means 4 processors (NUMA nodes), and the table below it gives the relative access latency between those nodes for local vs. remote memory access ( https://www.leidinger.net/FreeBSD/dox/dev_acpica/html/d1/d8f/acpi__pxm_8c_source.html ).
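
To make that reading of the table concrete, here is a minimal Python sketch. The matrix values are hypothetical (they are not the ones from the screenshot), and the helper names `relative_cost` and `describe` are made up for illustration. The only thing taken from the ACPI specification is that SLIT entries are relative distances with local access normalized to 10.

```python
# Hypothetical SLIT for a 4-node (4-socket) system.
# The numbers are illustrative only, not taken from the screenshot in this thread.
SLIT = [
    [10, 21, 21, 21],
    [21, 10, 21, 21],
    [21, 21, 10, 21],
    [21, 21, 21, 10],
]

LOCAL_DISTANCE = 10  # ACPI normalizes the distance of a node to itself to 10


def relative_cost(slit, src, dst):
    """Return the cost of node `src` accessing memory on node `dst`,
    expressed as a multiple of a local access."""
    return slit[src][dst] / LOCAL_DISTANCE


def describe(slit):
    """Return one human-readable line per (source node, target node) pair."""
    lines = []
    for src, row in enumerate(slit):
        for dst, dist in enumerate(row):
            kind = "local" if src == dst else "remote"
            lines.append(
                f"node {src} -> node {dst}: {kind}, {dist / LOCAL_DISTANCE:.1f}x"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(describe(SLIT)))
```

With these made-up values, a remote access is treated as 2.1 times as expensive as a local one. The table only describes the NUMA topology; it says nothing about whether a DIMM is healthy.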

Which part of the messages points in the direction of the memory DIMMs? The SRATs above? I am new to FreeBSD, but otherwise know my way around IT; just here I need help. I am willing to pull all the memory and start experimenting, though. The data on the system is important and, like all good IT experts, I have no complete backup... :/
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,924
My comment was based only on the adjacent display lines showing Memory enabled, then disabled. It was a shot in the dark and may turn out to be a red herring. (My guess of 4 by 64 came from the total of 256 and the 4 display lines reporting Memory enabled.)

If the memory passes a check on POST then I'd assume it's good, too.

I hope that someone knowledgeable can chip in here and give you some real guidance!

Good Luck!
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
On my system, immediately after the
FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
line, I see
VT(efifb): resolution 1280x1024

Could it be your server somehow believes it's now using the serial console for boot output? Can you get into the BIOS to check?
 
Joined
Oct 24, 2021
Messages
13
I will check if I see something in the BIOS. I am logged into the system via IPMI/KVM. It is odd: if I watch it boot, it boots in cycles, but when I do not watch it boot, it hangs! That's weirding me out. I do not have much experience with the Supermicro BIOS, but I will try my best. Could it help to physically connect a monitor to VGA-out?
 
Joined
Oct 24, 2021
Messages
13
These are my settings. I have not changed them since setting up the system last year.
iKVM_capture.jpg

The second console seems enabled with parameters:
iKVM_capture.jpg

Could the KVM be using this for its display in the BMC of the Supermicro?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, that confirms your system is redirecting all output to COM2/Serial-over-LAN. If the SuperMicro BIOS is like my AMI BIOS, try disabling that redirection. KVM should then have the same output as the VGA-out, and you'll be able to view more of the boot output. I suspect your failure to boot is because your system lacks the loader tunable console="comconsole".
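
For anyone who wants to check whether a `console` variable is set at all, here is a rough Python sketch that scans loader.conf-style text. The helper name `loader_console` is hypothetical, the sketch assumes simple `name="value"` lines with `#` comments, and it assumes the last assignment wins. Which file TrueNAS stores its tunables in is deliberately left out.

```python
def loader_console(conf_text):
    """Return the value assigned to `console` in loader.conf-style text,
    or None if the variable is not set.

    Assumptions: one `name="value"` assignment per line, `#` starts a
    comment, and a later assignment overrides an earlier one.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "console":
            value = val.strip().strip('"')
    return value


if __name__ == "__main__":
    sample = 'autoboot_delay="5"\nconsole="comconsole"\n'
    print(loader_console(sample))
```

A result of `None` means the text never sets `console`, so the loader's default console would be used.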
 
Joined
Oct 24, 2021
Messages
13
I have now disabled all console redirection, but it still yields the same result: it stops after printing out the SLIT.Localities :/
I also disabled the COM ports, just to be sure, but still no change. This is disturbing.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Does the board boot with any Linux live USB drive? You may be in the unfortunate situation of a motherboard failure.
 
Joined
Oct 24, 2021
Messages
13
I will try to create a live USB from FreeBSD and check. Is it safe to use, or might it do something to the TrueNAS storage pool?
I found a suspicious message in the BIOS system log, claiming there is something wrong with DIMM A1 sporadically. Maybe it's still worth a try to check the main memory.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
To be on the safe side, you should disconnect your pool disks while you're running the Live USB boot checks. A bad stick would also definitely result in a boot failure, but is easier to overcome.
 
Joined
Oct 24, 2021
Messages
13
It's a fusion pool with 10 HDDs and 3 NVMe drives (PCI connected). I guess I will have to open the system anyhow... (it weighs 30kg ^^)

iKVM_capture_mem.jpg
Are those normal?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Joined
Oct 24, 2021
Messages
13
My TrueNAS is up and running again.
I removed all the memory riser cards, replaced all DIMMs for which I got errors in the log, and put the risers back. The ECC memory had passed all checks, though, which makes me wonder what that check actually checks. I lost three 4GB memory modules, but oh well.
@Redcoat: Seems your "red herring" was it
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,924
Joined
Oct 24, 2021
Messages
13
Well, how about that? For once a red herring may not in fact have been one. So glad you have it fixed!
Strange... TrueNAS boots, but now only when a physical monitor is connected.
If not, it hangs either directly in the BIOS when I watch with IPMI, or, if I do not watch, later on when TrueNAS boots, at some USB stuff. Well, the backup is running now. But still, suggestions are welcome.
 