SOLVED Problem: TrueNAS SCALE upgrade from Bluefin to latest Cobia version hangs

sylaan

Cadet
Joined
Mar 9, 2024
Messages
9
Hello all,

I have a problem upgrading my TrueNAS SCALE system to the latest 23.10 Cobia version. I was on an older Bluefin version, so after reading up on the upgrade path, I first upgraded via the GUI to TrueNAS-SCALE-22.12.4.2 without issues. Then I attempted to upgrade to TrueNAS-23.10.2 and that's when things went wrong. The system hung hard at a very early stage of the boot. I could watch this via the IPMI of my Supermicro board; I could not see all the messages, but the last ones were:

Code:
......
[Thu Mar  7 23:06:56 2024] PCI host bridge to bus 0000:00
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000cc000-0x000cffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfeafffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [mem 0xc00000000-0xfbfffffff window]
[Thu Mar  7 23:06:56 2024] pci_bus 0000:00: root bus resource [bus 00-3e]
[Thu Mar  7 23:06:56 2024] pci 0000:00:00.0: [8086:0c08] type 00 class 0x060000
[Thu Mar  7 23:06:56 2024] pci 0000:00:01.0: [8086:0c01] type 01 class 0x060400
[Thu Mar  7 23:06:56 2024] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[Thu Mar  7 23:06:56 2024] pci 0000:00:14.0: [8086:8c31] type 00 class 0x0c0330
[Thu Mar  7 23:06:56 2024] pci 0000:00:14.0: reg 0x10: [mem 0xf7800000-0xf780ffff 64bit]
[Thu Mar  7 23:06:56 2024] pci 0000:00:14.0: PME# supported from D3hot D3cold
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.0: [8086:8c3a] type 00 class 0x078000
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.0: reg 0x10: [mem 0xf7816000-0xf781600f 64bit]
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.1: [8086:8c3b] type 00 class 0x078000
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.1: reg 0x10: [mem 0xf7815000-0xf781500f 64bit]
[Thu Mar  7 23:06:56 2024] pci 0000:00:16.1: PME# supported from D0 D3hot D3cold
[Thu Mar  7 23:06:56 2024] pci 0000:00:1a.0: [8086:8c2d] type 00 class 0x0c0320
[Thu Mar  7 23:06:56 2024] pci 0000:00:1a.0: reg 0x10: [mem 0xf7813000-0xf78133ff]
[Thu Mar  7 23:06:56 2024] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.0: [8086:8c10] type 01 class 0x060400
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold


It apparently has something to do with some PCI devices (I think). Maybe something in the new version (or its kernel) does not get along with some of my hardware, even though everything works just fine on TrueNAS-SCALE-22.12.4.2. I had to power cycle the server and tried several times, but it hangs at the exact same spot every time.
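
To see which PCI devices the kernel had already enumerated before the hang, the log above can be parsed mechanically. This is only a rough sketch: the regular expression is written against the line format shown in the log above, the function name `enumerate_pci` is made up for illustration, and the class-code table covers just a handful of standard PCI class codes.

```python
import re

# Matches enumeration lines such as:
#   pci 0000:00:1c.0: [8086:8c10] type 01 class 0x060400
# (format taken from the log above; other kernels may print it differently)
PCI_LINE = re.compile(
    r"pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\] "
    r"type (?P<type>[0-9a-f]{2}) class (?P<cls>0x[0-9a-f]{6})"
)

# A few standard PCI class codes (base class, subclass, prog-if).
CLASS_NAMES = {
    "0x060000": "host bridge",
    "0x060400": "PCI-to-PCI bridge",
    "0x0c0330": "USB xHCI controller",
    "0x0c0320": "USB EHCI controller",
    "0x078000": "other communication controller",
    "0x020000": "Ethernet controller",
}


def enumerate_pci(log_text):
    """Return the PCI devices announced in a kernel log, in log order."""
    devices = []
    for line in log_text.splitlines():
        match = PCI_LINE.search(line)
        if match:
            devices.append({
                "addr": match.group("addr"),
                "id": match.group("vendor") + ":" + match.group("device"),
                "kind": CLASS_NAMES.get(match.group("cls"), "unknown"),
            })
    return devices
```

The last entry returned for the hanging boot is the last device the kernel announced before it stopped. That does not prove this device is the culprit, only that the hang happened at or after that point.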

If I choose the latest Bluefin version (TrueNAS-SCALE-22.12.4.2) from the boot menu, it boots OK. For comparison, these are the boot messages for a working boot (https://pastebin.com/Qu6rCLmt); one can see there what the boot should look like, compared to when it hangs.

In the working boot, the first few lines after the point where it hangs above are:

Code:
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.0: Enabling MPC IRBNCE
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.2: [8086:8c14] type 01 class 0x060400
[Thu Mar  7 23:06:56 2024] pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
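
The comparison above (find the last message of the hanging boot in the working boot log, then read what comes next) can also be done with a small script. This is a minimal sketch: it assumes both logs use the bracketed timestamp prefix shown above, and the function names are hypothetical. Timestamps are stripped first because they differ from one boot to the next.

```python
import re

# Bracketed timestamp prefix as seen in the logs above,
# e.g. "[Thu Mar  7 23:06:56 2024] ".
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")


def strip_timestamp(line):
    """Remove the leading timestamp so lines from different boots compare equal."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def lines_after_hang(hang_log, good_log, count=4):
    """Return up to `count` messages that follow, in the working boot,
    the last message printed by the hanging boot."""
    hang = [strip_timestamp(l) for l in hang_log.splitlines() if l.strip()]
    good = [strip_timestamp(l) for l in good_log.splitlines() if l.strip()]
    if not hang:
        return []
    last_seen = hang[-1]
    for index, message in enumerate(good):
        if message == last_seen:
            return good[index + 1:index + 1 + count]
    # The last message of the hanging boot never appears in the working boot.
    return []
```

Whatever this returns is what the kernel would have done next on a healthy boot, which makes it a reasonable starting point for searching, not a diagnosis.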


I am not sure what the MPC IRBNCE and ACS workaround messages are, something to do with Intel. No idea why that fails on Cobia. This is my hardware:

OS Version: TrueNAS-SCALE-22.12.4.2
Mainboard: X10SLM+-LN4F
CPU: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
Memory: 32 GB (non-ECC)

Disks:
1x Crucial 256GB SSD: boot pool
3x 8TB Seagate Exos 7E10 (ST8000NM017B-2TJ): data pool

NICs:
4x Intel i210 Gb ports (built into the mainboard, not connected/used)
2x Intel 82571EB/82571GB ports (not connected/used)
1x ConnectX-3 Mellanox 10Gbps NIC (connected, used).

Does anyone have any idea what is happening here? Is there any other info that I can provide?

Thank you in advance, any help is much appreciated.

--
Sylaan
 

sirknight115

Dabbler
Joined
Oct 18, 2015
Messages
10
How do you upgrade to this new version? I am running the same version you have, 22.12.4.2, and I am having an issue with my Emby app, so I want to upgrade to 23. Can you help me with how to do this?
Thank you
 

sirknight115

Dabbler
Joined
Oct 18, 2015
Messages
10
OK, I found out how to do it. I upgraded my TrueNAS to the new version and it took care of the problem I had been having with Emby. I hope someone helps you out.
 

sylaan

Cadet
Joined
Mar 9, 2024
Messages
9
Glad it worked for you. I am still on 22.12.4.2; I am a bit reluctant to try again until I have an idea why it fails so hard.
 

sirknight115

Dabbler
Joined
Oct 18, 2015
Messages
10
Hey, I was thinking about your issue. I ran into something like this a couple of years ago, but I don't remember what I did. Have you checked your BIOS? Is it up to date? Also check the time and date in the BIOS: I ran into another issue loading TrueNAS when the time and date were off on my computer.
 

sylaan

Cadet
Joined
Mar 9, 2024
Messages
9
The CPU is an Intel(R) Xeon(R) CPU E3-1231 v3, which is definitely 64-bit. Time is also OK. The BIOS is not the latest, I'll admit, but I didn't think that would have such a big impact since I am not changing the hardware and it works fine now. But there is a new version for this Supermicro board, so I'll update it and see if that helps.
 

sylaan

Cadet
Joined
Mar 9, 2024
Messages
9
The BIOS upgrade didn't help, so I'll keep looking. I have a feeling it might be related to the 2-port NIC I have; I'll see if removing that helps.
 

sylaan

Cadet
Joined
Mar 9, 2024
Messages
9
It seems I was right and it was the 2-port NIC after all. After removing the card from the system, TrueNAS booted on 23.10 without problems.

So for some reason, there was some sort of incompatibility between the kernel used in TrueNAS SCALE 23.10 (6.1.74-production+truenas) and that 2-port NIC (Intel 82571EB/82571GB). Maybe in combination with my particular Supermicro motherboard, hard to say.

The only strange thing that happened after booting into 23.10 was that one of my local users (which was used for an SMB share) completely disappeared from the system. It was missing from the GUI and also from /etc/passwd. Dataset permissions were still referencing the now non-existent UID and GID. The associated SMB share was also gone. Not sure what happened, but re-creating the user and the share fixed everything.
 

sirknight115

Dabbler
Joined
Oct 18, 2015
Messages
10
Well, I'm glad everything worked out! I also have two network ports on my computer and I was thinking of setting one up for my VPN and the other for normal traffic, but I am running out of storage space, so I might work on getting bigger hard drives before I set up anything else. I hope the next version of TrueNAS gets even better; over the last couple of years it has been great!
 