Boot Issues with TrueNAS-SCALE-22.02.0.1

bacon_o

Cadet
Joined
May 2, 2022
Messages
6
I am having two boot issues with a fresh install of TrueNAS SCALE on my Lenovo TD340. I searched Google and the forums but was not able to find any relevant posts or info about the issues I am having.

System info:
Lenovo ThinkServer TD340 BIOS: A3TSF5A
MoBo: L32TT2 (Lenovo)
CPU: Intel Xeon E5-2420 v2
Memory: 48G DDR3 1333 ECC
Boot Drive: Crucial CT256M4SSD2
NIC: AOC-STGN-I2S (Intel 82599)
Disk Controller: TBD
Storage Array: TBD

1. Booting with SR-IOV enabled: With SR-IOV enabled in the BIOS, the TrueNAS system would never fully boot. It got to GRUB and started the TrueNAS boot process, but then went into what appeared to be some sort of loop. I left it running for about an hour and it never finished booting. It constantly showed the same error messages:

May 1 19:18:33 truenas kernel: Sending NMI from CPU 6 to CPUs 2:
May 1 19:18:33 truenas kernel: NMI backtrace for cpu 2
May 1 19:18:33 truenas kernel: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.93+truenas #1
May 1 19:18:33 truenas kernel: Hardware name: LENOVO ThinkServer TD340 /ThinkServer TD340, BIOS A3TSF5A 09/11/2020
May 1 19:18:33 truenas kernel: RIP: 0010:io_serial_in+0x14/0x20
May 1 19:18:33 truenas kernel: Code: 00 00 d3 e6 48 63 f6 48 03 77 10 8b 06 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f b6 8f b9 00 00 00 8b 57 08 d3 e6 01 f2 ec <0f> b6 c0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f b6 8f b9 00
May 1 19:18:33 truenas kernel: RSP: 0000:ffffb467400239d8 EFLAGS: 00000002
May 1 19:18:33 truenas kernel: RAX: ffffffffb7be5700 RBX: ffffffffb95f5220 RCX: 0000000000000000
May 1 19:18:33 truenas kernel: RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffffb95f5220
May 1 19:18:33 truenas kernel: RBP: 0000000000001eec R08: 0000000000000002 R09: 0000000000000882
May 1 19:18:33 truenas kernel: R10: 000000000000075d R11: 207375625f696370 R12: 0000000000000020
May 1 19:18:33 truenas kernel: R13: ffffffffb94ea3ee R14: 0000000000000001 R15: 0000000000000000
May 1 19:18:33 truenas kernel: FS: 0000000000000000(0000) GS:ffff9fbcffa80000(0000) knlGS:0000000000000000
May 1 19:18:33 truenas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 1 19:18:33 truenas kernel: CR2: 0000000000000000 CR3: 000000072520a001 CR4: 00000000001706e0
May 1 19:18:33 truenas kernel: Call Trace:
May 1 19:18:33 truenas kernel: wait_for_xmitr+0x40/0xb0
May 1 19:18:33 truenas kernel: serial8250_console_putchar+0x18/0x30
May 1 19:18:33 truenas kernel: ? wait_for_xmitr+0xb0/0xb0
May 1 19:18:33 truenas kernel: uart_console_write+0x43/0x50
May 1 19:18:33 truenas kernel: serial8250_console_write+0x300/0x380
May 1 19:18:33 truenas kernel: ? vt_console_print+0x2bb/0x3f0
May 1 19:18:33 truenas kernel: console_unlock+0x3c6/0x530
May 1 19:18:33 truenas kernel: vprintk_emit+0x208/0x250
May 1 19:18:33 truenas kernel: dev_vprintk_emit+0x12c/0x150
May 1 19:18:33 truenas kernel: dev_printk_emit+0x4e/0x65
May 1 19:18:33 truenas kernel: ? __dev_printk+0x2d/0x69
May 1 19:18:33 truenas kernel: _dev_info+0x6c/0x83
May 1 19:18:33 truenas kernel: pci_bus_dump_resources.cold+0x14/0x19
May 1 19:18:33 truenas kernel: pci_bus_dump_resources+0x5b/0x70
May 1 19:18:33 truenas kernel: pci_assign_unassigned_root_bus_resources+0xfd/0x1c0
May 1 19:18:33 truenas kernel: pci_assign_unassigned_resources+0x1f/0x7c
May 1 19:18:33 truenas kernel: pcibios_assign_resources+0x1b/0xcd
May 1 19:18:33 truenas kernel: ? xsk_init+0xbe/0xbe
May 1 19:18:33 truenas kernel: do_one_initcall+0x44/0x1d0
May 1 19:18:33 truenas kernel: kernel_init_freeable+0x21e/0x280
May 1 19:18:33 truenas kernel: ? rest_init+0xb4/0xb4
May 1 19:18:33 truenas kernel: kernel_init+0xa/0x10c
May 1 19:18:33 truenas kernel: ret_from_fork+0x22/0x30
May 1 19:18:33 truenas kernel: pci_bus 0000:0c: resource 10 [mem 0x80000000-0xfbffffff window]
May 1 19:18:33 truenas kernel: rcu: rcu_sched kthread starved for 706 jiffies! g-1119 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1

I don't require the functionality of SR-IOV so leaving it disabled in the BIOS is okay for me. I was just wondering if this was the expected result.
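
For anyone reading the trace above, the function names can be pulled out with a small filter. This is only a reading aid: a minimal sketch that assumes journal-style lines like the ones shown here and a grep that supports -o. The helper name trace_functions and the file name trace.txt are made up for illustration.

```shell
# Print the function names from kernel trace lines read on stdin,
# skipping the unreliable frames that the kernel marks with "?".
trace_functions() {
    grep -v 'kernel: ? ' \
        | grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' \
        | sed 's/+0x.*//'
}

# Usage (trace.txt is a hypothetical file holding the lines above):
#   trace_functions < trace.txt
```

Fed the trace above, the first names it prints are io_serial_in, wait_for_xmitr and serial8250_console_putchar, which belong to the kernel's 8250 serial driver and its console output path.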

2. Long boot times: The other issue I am having is what I think would be considered long boot times. Timed from the moment GRUB starts to when the web GUI is available, boot usually takes 20-25 minutes. I'm not looking for desktop-level boot times here, but this does seem longer than it should be. The console output from issue #1 also shows up a few times while booting to the working system (it just doesn't get stuck in a loop). Is there anything I should be checking in the BIOS or elsewhere that could help bring those times down a bit?
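
One way to see where the time goes is to look for the largest jumps between consecutive kernel log timestamps. The following is a minimal sketch, assuming the default dmesg format with a leading "[  seconds.micros]" stamp; the helper name largest_gaps is made up for illustration.

```shell
# Print the N largest gaps (in seconds) between consecutive kernel log
# timestamps, each followed by the line that ended the gap.
# Reads dmesg-style lines ("[   12.345678] message") on stdin; N defaults to 5.
largest_gaps() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Timestamp is the text between the brackets.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen) printf "%.3f\t%s\n", ts - prev, $0
            prev = ts
            seen = 1
        }' | sort -rn | head -n "${1:-5}"
}

# Usage: dmesg | largest_gaps 5
```

The lines printed next to the biggest gaps are the messages that came right after each stall, which is usually a good place to start reading.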

I ran the system through a 24-hour memory test and a CPU stress test and everything passed.

I'm not really sure what info is most relevant, so I figured I'd start by posting the debug file.
 

Attachments

  • debug-truenas-20220502112639.tgz
    407.8 KB

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
That is interesting. I'd suspect some combination of BIOS options with SR-IOV is playing havoc here. You are running the latest BIOS, so that is good. You can check whether it's a software / kernel issue by testing the Bluefin ISO image.


Otherwise, a bug ticket in Jira with the debug file may be the best path forward here.
 

bacon_o

Cadet
Joined
May 2, 2022
Messages
6
Thanks for the reply!

I did some additional testing with different boot environments and it looks like opening a bug ticket is probably the way to go.

1. TrueNAS Bluefin with SR-IOV enabled: Got stuck in the boot loop with the same messages.
2. TrueNAS Bluefin with SR-IOV disabled: System booted but it took over 20 minutes. It was stuck on "Loading initial ramdisk" for about 15 minutes.
3. Debian 11.3 with SR-IOV enabled: I figured I would try a base Debian install to see what happens. No issues during boot; the system booted in less than a minute from GRUB. It was just a base system so not many packages had to load, but at least I know Linux can boot correctly with SR-IOV enabled.
4. TrueNAS CORE 12.0-U8 with SR-IOV enabled: System booted successfully, in well under 2 minutes.
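
Because stock Debian boots quickly on the same hardware, one low-effort comparison is the kernel command line each system boots with (cat /proc/cmdline on each). A minimal sketch of such a comparison follows; the helper name cmdline_only_in_first and the two sample strings are made up for illustration and were not captured from these installs.

```shell
# Print each parameter that appears in the first kernel command line ($1)
# but not in the second ($2), one per line.
cmdline_only_in_first() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | while IFS= read -r tok; do
        [ -n "$tok" ] || continue
        case " $2 " in
            *" $tok "*) ;;                 # also present in the second line
            *) printf '%s\n' "$tok" ;;     # unique to the first line
        esac
    done
}

# Hypothetical sample values, for illustration only:
scale_cmdline='ro console=ttyS0,115200 console=tty1 intel_iommu=on'
debian_cmdline='ro quiet'
cmdline_only_in_first "$scale_cmdline" "$debian_cmdline"
```

With these sample strings it lists the two console= parameters and intel_iommu=on, so the differences are visible at a glance.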
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Please do open a ticket with all the relevant information. The fact that it booted with Bluefin is better, but it may still indicate we have something configured differently in the kernel or boot loader. We'll want to investigate that further.
 

bacon_o

Cadet
Joined
May 2, 2022
Messages
6
I opened bug ticket NAS-116023 and provided all the info. In the interim I will probably use TrueNAS CORE while the issue is investigated, as it appears to behave better on my system.
 

bacon_o

Cadet
Joined
May 2, 2022
Messages
6
I did some additional testing and discovered that my issue seems to be with the GRUB boot option console=ttyS0,115200. After removing that option, the system booted in a normal amount of time with SR-IOV enabled in the BIOS. With console=ttyS0,115200 configured in GRUB, I tried a variety of serial configuration changes in the BIOS but none of them fixed the issue.

I don't require serial console output, so leaving that option out of GRUB is a valid workaround for me. I am going to update the Jira ticket with this info as well.
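
A rough back-of-the-envelope check of why a serial console could cost this much time. This rests on assumptions that are not confirmed in the ticket: that the 8250 driver's wait_for_xmitr (visible in the trace in the first post) polls for up to roughly 10 ms per character when the UART never reports an empty transmitter, and that the boot log is about 100,000 bytes, which is a made-up round number. The helper name console_seconds is also made up.

```shell
# Estimate whole seconds spent writing boot messages to a serial console.
#   $1 = bytes of log output (assumed figure)
#   $2 = microseconds spent per character
console_seconds() {
    echo $(( $1 * $2 / 1000000 ))
}

# Healthy link at 115200 baud, 10 bits per character: about 87 us each.
console_seconds 100000 87       # prints 8
# Every character waits out an assumed 10 ms (10000 us) polling timeout.
console_seconds 100000 10000    # prints 1000
```

Under those assumptions the second case works out to about 17 minutes, which is in the same range as the boot times seen here, but it is an estimate and not a measurement.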

Steps I did to fix the issue - EDIT * Doesn't fix the issue across reboots *
  1. On first boot, wait for GRUB to load. Select the TrueNAS SCALE entry and press 'e' to edit it.
  2. Navigate to the line
    Code:
    linux   /ROOT/22.02.0.1@/boot/vmlinuz-5.10.93+truenas root=ZFS=boot-pool/ROOT/22.02.0.1 ro console=ttyS0,115200 console=tty1 libata.allow_tpm=1 systemd.unified_cgroup_hierarchy=0 amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=on
    and delete console=ttyS0,115200, then press F10 (or Ctrl+X) to boot the system.
  3. Once the system is booted, edit the file /etc/default/grub.d/truenas.cfg, removing console=ttyS0,115200 from the line GRUB_CMDLINE_LINUX="console=ttyS0,115200 console=tty1"
  4. Run # update-grub
  5. Reboot the system to make sure it boots up correctly.
Looks like the fix is much simpler: I just need to disable the serial console from the web GUI. Go to System Settings --> Advanced --> Console and set Enable Serial Console to disabled.
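
For reference, the manual edit in step 3 amounts to deleting one token from the GRUB_CMDLINE_LINUX line. Here is a minimal sketch with sed, assuming a sed that supports -E; strip_serial_console is a made-up helper name. As noted above, the manual change did not persist across reboots on this system, so the web GUI setting is the one to rely on.

```shell
# Remove any serial-console parameter (console=ttyS<N>[,options]) from the
# text on stdin and print the result. Other console= entries such as
# console=tty1 are left alone. Nothing is written back to any file.
strip_serial_console() {
    sed -E 's/(^|[[:space:]"])console=ttyS[0-9]+(,[^[:space:]"]*)?[[:space:]]*/\1/g'
}

# Preview the edited file contents without modifying the file:
#   strip_serial_console < /etc/default/grub.d/truenas.cfg
```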
 

CrownedMartyr

Dabbler
Joined
Nov 20, 2021
Messages
21
@bacon_o, I'm also having this same issue on my ThinkServer TD340. The long boot times I reported at the "loading initial ramdisk" stage are still occurring, but the boot process does continue after 15-30 minutes or so.

However, I get stuck in a boot loop similar to the one you posted. I haven't tried editing GRUB as you did, but since it appears to be only a temporary fix, I'm inclined not to try it at all. It is interesting that we both have the same hardware.

Hopefully someone can come up with a solution, because I have just about given up on troubleshooting this further on my own.

Note that I see the same behavior on the latest Bluefin too.
 

bacon_o

Cadet
Joined
May 2, 2022
Messages
6
This issue has been resolved for me since I disabled the serial console output.
 