Truenas Scale complete Lockup/Crash.

Firebird99ta

Dabbler
Joined
Sep 14, 2014
Messages
11
So, before I get into the issue I am having, here are the specs of my system:


Motherboard: Supermicro x11ssh

CPU: Intel E3-1245v5 4c/8t

Memory: 64gig Micron DDR4 2400 ECC

Hard Drives: 2-Western Digital Red 8tb 5460rpm 128meg cache CMR – Raid 1 mirror

2-Samsung 870 Evo 500gig 2.5” SSD Raid 1 mirror for Apps

2-PNY Optima 120gig mirrored boot drives.

Network Cards: Onboard Intel i210 1gb ethernet controller

Power supply is 600 watt 80+ Gold


Network Environment:

pfSense appliance with an Intel X550-T2 negotiating a 5Gb ISP connection, feeding one 8-port Netgear managed Gigabit switch, which then splits to two more Netgear managed Gigabit switches. All the switches are the same model and support flow control and 2-port bonding. Various devices on the network, from game systems to workstations to streaming appliances, show no packet issues, collisions, or dropped connections. No cabling issues. For all intents and purposes, a clean environment, with no prior network or communication issues.


Onto the problem:

Supermicro has the current BIOS/BMC-IPMI firmware installed.

Truenas Scale is installed on the 120gig mirrored SSDs, with 16gig swap file, in UEFI mode. Boots normally and allows clean access with no problems.

Mounted the 2 Samsung SSDs and created a Dataset to store App data, configurations, backups Etc.

Mounted the 2 WD Red drives in an initial Mirror with multiple main Datasets, I.E. Documents, Music, Videos, Programs etc.

Whenever I try to copy data (could be as little as 300MB or as large as 30gigs) from either my workstation or VM server, Truenas locks up to the point that I have to initiate a server reset through IPMI, or power off and reset.

The transfers start out strong and gradually fall off to 0MB/s, at which point Windows can no longer find the folder share, the browser shows it is searching for Truenas, and IPMI shows nothing out of the ordinary.
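For anyone trying to reproduce the fall-off described above, here is a minimal sketch of a write probe that records how throughput changes over the course of a transfer. It is not something used in this thread; the function names and the target path are hypothetical, and it assumes the share is already mounted or mapped on the client.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024, clock=time.monotonic):
    """Write total_bytes to path in chunks.

    Returns a list of (elapsed_seconds, cumulative_bytes) samples,
    one per chunk, so a stall shows up as a long gap between samples.
    """
    # Random payload so dataset compression does not shrink the writes.
    payload = os.urandom(chunk_bytes)
    samples = []
    written = 0
    start = clock()
    with open(path, "wb") as fh:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            fh.write(payload[:n])
            fh.flush()
            os.fsync(fh.fileno())  # push each chunk out instead of caching it
            written += n
            samples.append((clock() - start, written))
    return samples


def throughput_mb_s(samples):
    """Convert cumulative samples into per-chunk rates in MB/s."""
    rates = []
    prev_t, prev_b = 0.0, 0
    for t, b in samples:
        dt = t - prev_t
        if dt > 0:
            rates.append((b - prev_b) / dt / 1e6)
        prev_t, prev_b = t, b
    return rates
```

Calling something like `throughput_mb_s(timed_write(r"Z:\probe.bin", 2 * 1024**3))` (with `Z:` standing in for wherever the share is mapped) and printing the rates should show roughly where in the transfer the rate starts to drop, which is more precise than watching the Windows copy dialog.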

IPMI doesn’t show any alerts, systems issues, temp issues, cpu, nothing.

Ran memtest, memory checks fine. It’s running at 2133mhz which shouldn’t be a problem at all.


Steps I have used to try and solve the issue are as follows:

Swapped out the 64gig Micron for a 16gig Samsung stick. – Same issues.

Hooked up both 1gig Nic ports – Same issue.

Tried Network Flow control – Same issue.

Tried moving Truenas server to the same switch as either the workstation or VM server. – Same Issue

I don't see any threads here or on some of my other forums describing this exact issue, unless maybe it is worded differently.


What am I missing?
 

Firebird99ta

Dabbler
Joined
Sep 14, 2014
Messages
11
Update:

As it stands, I installed Truenas Core and set up a basic dataset/test folder, then did a direct connect between my workstation and NAS, transferring 300+gigs of information at a constant 900MB/sec rate with no drop.

This points more towards Scale being the issue than Core.

All in all, I think I may have come to an idea what the problem is or was. I'll have a better idea when I reinstall Scale.
 

Firebird99ta

Dabbler
Joined
Sep 14, 2014
Messages
11
Well, I installed Scale the same way and set up the same test datasets and permissions, and the crash still occurred. So it looks like, for me anyway, it's gonna be Core for now.
 
Joined
Feb 14, 2024
Messages
4
Did you ever find a resolution to this? I've run into a very similar issue after upgrading from Proxmox 7 to 8 on the same CPU, motherboard, and possibly RAM. Is it a hard lock where all logging stops abruptly with no error? I suspect a Linux kernel regression or firmware bug that the kernel is "hitting" in a new way. Core wouldn't be affected since it obviously uses FreeBSD.
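One way to answer the "does logging stop abruptly" question is a heartbeat file: a small script that appends a timestamp every second and forces it to disk, so that after a reset the last line brackets the moment of the lockup. The following is only a sketch under that assumption; the function names and the log path are hypothetical and nothing like it was run in this thread.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line to path and force it to disk."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a hard reset
    return stamp


def last_heartbeat(path):
    """Return the last complete timestamp line, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as fh:
            # Ignore a trailing partial line left by an interrupted write.
            lines = [ln.strip() for ln in fh if ln.endswith("\n") and ln.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path, interval_s=1.0, beats=None):
    """Write a heartbeat every interval_s seconds (forever if beats is None)."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval_s)
```

Started with something like `run("/var/tmp/heartbeat.log")` on the affected host before kicking off a transfer, then read back with `last_heartbeat(...)` after the reset, the final timestamp can be compared against the last entry in the system log. If both stop at about the same second with no error in between, that fits a hard lock rather than a slow failure.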
 

Firebird99ta

Dabbler
Joined
Sep 14, 2014
Messages
11
Sadly, no, I have not. That being the case, I ended up using Core for now until they acknowledge the problem exists. I may go a completely different route if Scale ends up being the primary go-to. I haven't downloaded the latest version update to test, but your response tells me the issue still exists.
 
Joined
Feb 14, 2024
Messages
4
If it's a firmware or kernel bug for such an old motherboard, it may never get fixed. Were you experiencing hard locks with no log output?
 

Firebird99ta

Dabbler
Joined
Sep 14, 2014
Messages
11
I don't think it's related to the age of the motherboard, especially since it just came off of EOL. Also, Linux tends to be far more current and liberal with hardware support compared to FreeBSD. In my opinion, it's related to how the software handles network communications to the storage. I first used the onboard Intel 1g NIC ports, then I installed an Intel 10gig card and had the same problem. It's possible it could be the SATA bottleneck, but it would still need to handle the data transfers accordingly. Scale is the ONLY software that this happens on. CORE, Unraid, OMV, Proxmox, and XCP-NG all handle storage and network traffic without any issues.
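To put rough numbers on the SATA bottleneck idea, here is a small worked calculation of nominal interface rates. It only uses the standard line rates (1 and 10 Gbit/s Ethernet, 6 Gbit/s SATA III with 8b/10b encoding); the helper names are made up for illustration, protocol overhead is ignored, and actual drive speeds are not modeled.

```python
def link_mb_per_s(gbit_per_s, encoding_efficiency=1.0):
    """Nominal payload rate of a link in MB/s (decimal megabytes)."""
    return gbit_per_s * 1e9 * encoding_efficiency / 8 / 1e6


# Nominal ceilings only; real-world throughput is lower.
LINKS = {
    "1GbE": link_mb_per_s(1),
    "10GbE": link_mb_per_s(10),
    # SATA III signals at 6 Gbit/s but 8b/10b encoding leaves 80% for data.
    "SATA III": link_mb_per_s(6, 0.8),
}


def bottleneck(*names):
    """Return the slowest of the named links."""
    return min(names, key=LINKS.__getitem__)


if __name__ == "__main__":
    for name, rate in LINKS.items():
        print(f"{name}: about {rate:.0f} MB/s")
    print("1GbE vs SATA III ->", bottleneck("1GbE", "SATA III"))
    print("10GbE vs SATA III ->", bottleneck("10GbE", "SATA III"))
```

On these nominal figures the network is the limit with the 1g NIC and a single SATA link becomes the limit with the 10gig card. Either way, a bottleneck like this should only cap the transfer rate, not lock the machine.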

Hard lock after initiating a network transfer. No specific time, no specific data size. One thing I hadn't tried was a direct transfer mounting an existing drive with data.

Since CORE works without issue, I just stayed with it and used my hypervisor to redirect internal/external storage requests instead of using the built-in apps. They are not as easily installed and configured compared to SCALE, but honestly, I would never run NAS software as a VM, or vice versa, or use the built-in apps. You're opening up a can of worms.
 
Joined
Feb 14, 2024
Messages
4
No, it's more likely a firmware bug than the age of the board. I can't even figure out how to update the firmware, since IPMI BIOS updating is locked behind a paywall and I get nothing but a black screen from DOS boot disks. I tried FreeDOS and MS-DOS, both of which boot perfectly on my desktop. I'm ready to throw this motherboard in the trash and purchase ASRock or something.
 
Joined
Feb 14, 2024
Messages
4
By following a helpful guide (https://peterkleissner.com/2018/05/27/reverse-engineering-supermicro-ipmi/) I was able to update the firmware, but it didn't help with hard locks. After hours of trial and error I finally managed to get a stable system.

Unfortunately, the only advice I can give is to try swapping PCI cards into different slots and possibly disabling SuperIO in the BIOS. After playing with other BIOS settings I somehow got the Supermicro firmware to hang on "SuperIO Initialization" and couldn't even get back into BIOS settings. After re-flashing the firmware via IPMI, I was able to disable SuperIO and everything has been running great since. I even got iGPU passthrough to work, which would always cause hangs when I tried it in the past.
 