Migrate back to TrueNAS Core from Scale due to too many kernel panics

Purplebogan

Cadet
Joined
Dec 2, 2022
Messages
3
Hi all,
After upgrading my TrueNAS Core server (latest release version) to TrueNAS Scale (latest release version), it has done nothing but kernel panic constantly and then stop responding.
Here is a typical screen shot of a panic after attaching a monitor to it.

The hardware is an HP N54L Microserver with 16GB of RAM, which I have owned since it was new.
The whole time I have had FreeNAS/TrueNAS Core on it, and it has NEVER had a single kernel panic or any failure.

The OS runs on a Crucial SSD and worked perfectly on TrueNAS Core/FreeBSD.

In trying to get the Linux-based TrueNAS Scale more stable, I upgraded the ZFS pools to the latest version. That didn't help at all.
I only upgraded to Scale because I wanted to try Home Assistant, which became available only in Scale.
I have decided I don't need Home Assistant and would much rather just be on TrueNAS Core and not get any more kernel panics etc.

So my biggest question is: can I reinstall a fresh copy of the latest TrueNAS Core, and will my ZFS pools, upgraded via the latest version of Scale, be compatible with Core, or is Core behind in any way in that regard?
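
One way to gather the facts for this question is to list which feature flags the upgraded pool actually has in the "active" state, since per the OpenZFS feature-flag rules those (not the merely "enabled" ones) are what another ZFS implementation has to support. The sketch below is only an illustration: it assumes the `zpool` command is on the PATH, the pool name `tank` is hypothetical, and the resulting list would still have to be compared by hand against what `zpool upgrade -v` reports on the Core install.

```python
import subprocess


def parse_feature_states(zpool_get_output: str) -> dict[str, str]:
    """Map feature name -> state from `zpool get -H -o property,value all POOL`.

    With -H the fields are separated by a single tab and no header is printed.
    """
    states: dict[str, str] = {}
    for line in zpool_get_output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0].startswith("feature@"):
            states[parts[0][len("feature@"):]] = parts[1].strip()
    return states


def active_features(states: dict[str, str]) -> list[str]:
    """Features whose on-disk format changes are already in use on the pool."""
    return sorted(name for name, state in states.items() if state == "active")


def read_pool_features(pool: str) -> dict[str, str]:
    """Run zpool against a real pool (the pool name is supplied by the caller)."""
    result = subprocess.run(
        ["zpool", "get", "-H", "-o", "property,value", "all", pool],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_feature_states(result.stdout)
```

On the Scale box this could be called as `active_features(read_pool_features("tank"))`, with `tank` replaced by the real pool name.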

If there is a more stable version of the TrueNAS Scale kernel available, then I would be happy to use it. I don't care if it's last generation or a super slow version that runs everything at half the speed...
 

Purplebogan

Cadet
Joined
Dec 2, 2022
Messages
3
Here is a typical Linux kernel panic. I have since updated the BIOS of the HP N54L.
20221114_183212 - Copy.jpg
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Interesting. Seems like it's unhappy during an ATA SCSI queue command. Which would mayhap point to an issue with the module for that particular hardware.

What are you using for a controller? Onboard ports? Is this the AMD Turion II? And are SATA ports set to AHCI or some other setting?

Could this be related? https://bugzilla.kernel.org/show_bug.cgi?id=201693

Did you try
Code:
libata.force=noncq,3.0G
in kernel parameters and if so, did it help?
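
To confirm whether the parameter actually reached the running kernel after a reboot, the boot command line can be inspected. The following is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper names are made up for illustration.

```python
from pathlib import Path


def libata_force_options(cmdline: str) -> list[str]:
    """Return the comma-separated values passed via libata.force=, if any."""
    options: list[str] = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "libata.force" and sep:
            options.extend(v for v in value.split(",") if v)
    return options


def running_kernel_cmdline() -> str:
    """Read the parameters the running Linux kernel was booted with."""
    try:
        return Path("/proc/cmdline").read_text()
    except OSError:
        # Not on Linux, or /proc is unavailable: report an empty command line.
        return ""


if __name__ == "__main__":
    opts = libata_force_options(running_kernel_cmdline())
    print("libata.force options:", ", ".join(opts) if opts else "none set")
```

If the output shows `noncq` and `3.0G`, the setting was picked up at boot; if it shows "none set", the parameter was not applied.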
 

Purplebogan

Cadet
Joined
Dec 2, 2022
Messages
3
Thanks for the suggestion. I tried that and it appeared to help a tiny bit. Since I had become convinced it was some kind of corruption, I did the following things.
I replaced the SATA cable to the SSD OS/TrueNAS drive with a brand new one (this is the one SATA connection that doesn't use the 3.5" bays).
I also fully stripped the HP N54L Microserver down and cleaned every connector possible with electrical contact cleaner, including the RAM slots, USB and SATA points, power connectors, everything.
I also re-applied thermal paste to the CPU and motherboard chipset heatsinks, for the first time in its life.

I also did a fresh clean install of TrueNAS SCALE Bluefin RC1 at the same time, onto the SSD.

And since then everything has been super-solid, not a tiny beep of an error/problem. I have since upgraded to the Bluefin release, also with no problems.
So while it's a tad annoying that I did not isolate my steps and test in between to know exactly what made the difference, I am glad it's solid now.

Just wanted to say thanks for pointing me in this direction.
 