New install - thread already at 100%

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
Afternoon.

Just installed TrueNAS SCALE (TrueNAS-SCALE-22.12.1) on a brand new system. Below are the specs of the new system.

ASRock EPYCD8-2T
Epyc 7302P 12-Core 3.1 GHz
Crucial 256GB DDR4 3200
2x Icy Dock 8x 2.5" Bay
8x PNY CS900
8x Crucial MX500
Silicom PE310G4I71LBEU-XR-LP(Intel 710X)
Qlogic 16Gb HBA
LSI 9400-16i HBA IT-mode(internal connection to SSD)

After a UEFI install, I logged in and saw one thread at 100%. I jumped into the shell to see what was causing it: two kworker processes and one ksoftirqd were chewing up cycles. As a test, I installed TrueNAS-13.0-U4, and CPU usage was at 0% with no runaway thread. So I reset the CMOS and installed SCALE again, and boom, one thread at 100%, with the same processes as before chewing up cycles. Nothing has been configured or set up, so I don't understand why one thread is at 100%. Any ideas where to look? SCALE is not a must, but it would be nice to utilize the SED later down the line as I keep expanding this NAS. Thanks.
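For anyone chasing a similar runaway thread, here is a minimal sketch (not something the OP ran) that compares two snapshots of /proc/stat and shows, per CPU, how much of the busy time is interrupt or softirq work. It assumes a Linux host such as SCALE; the function names are made up for illustration.

```python
import time

# Column order of the per-CPU lines in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: jiffies}} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-CPU lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if len(parts) > len(FIELDS) and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def busy_breakdown(before, after):
    """Percent busy, and percent spent in irq+softirq, per CPU between two snapshots."""
    out = {}
    for cpu, now in after.items():
        if cpu not in before:
            continue
        delta = {f: now[f] - before[cpu][f] for f in FIELDS}
        total = sum(delta.values())
        if total <= 0:
            continue
        idle = delta["idle"] + delta["iowait"]
        out[cpu] = {
            "busy_pct": 100.0 * (total - idle) / total,
            "irq_pct": 100.0 * (delta["irq"] + delta["softirq"]) / total,
        }
    return out


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the breakdown."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return busy_breakdown(first, second)
```

Calling sample() from a root shell and sorting by busy_pct should point at the pegged core. A high irq_pct on that core would suggest interrupt handling rather than an ordinary userland process, which fits with ksoftirqd showing up near the top.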
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
Welp. Found the issue: it's the LSI HBA. I don't understand why, seeing that it was already in IT mode and it worked with TrueNAS CORE. I will continue digging, and if anyone has any ideas, I'm all ears.
 
Joined
Jun 15, 2022
Messages
674
I would have guessed it was related to the Realtek adapter. I wonder what's causing the HBA to have issues? Is it somehow fighting with the Qlogic 16Gb HBA???
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
This is so weird. I just moved PCIe slots and the runaway thread went away. This board has been doing all kinds of weird shtuff. Anyway, time to dig in and start doing some testing. I'm looking forward to this build and have some crazy plans for it: running file shares as well as iSCSI to both my VMware lab and Proxmox lab. LET'S DO THIS!!

PS Sorry for the false alarm.
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
I would have guessed it was related to the Realtek adapter. I wonder what's causing the HBA to have issues? Is it somehow fighting with the Qlogic 16Gb HBA???
Thanks for the idea. There is something really weird happening with this ASRock board, and I'm going to zero in on slot 2. I have a few other cards to test with to see what happens. I'm worried that slot 2 may have a problem, and this board is brand spanking new.
 
Joined
Jun 15, 2022
Messages
674
Could the boards be fighting on shared PCIe slots, and moving one of the boards to a non-shared slot solved the problem?
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
That's a good question. I'm diving into it to see what happens. The vendor's block diagram shows that everything has its own access to the CPU. Let's hope that is right and a flaw didn't show up. Will post when I have more.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
LSI 9400-16i HBA IT-mode(internal connection to SSD)
Switching an HBA to IT mode is not the same as flashing it with IT firmware. If your words were chosen precisely, you may have an issue here.
 
Joined
Jun 15, 2022
Messages
674
Switching an HBA to IT mode is not the same as flashing it with IT firmware. If your words were chosen precisely, you may have an issue here.
That's an excellent (and helpful) point which I missed. Sometimes those little details are indeed the key to success.

 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Could the boards be fighting on shared PCIe slots, and moving one of the boards to a non-shared slot solved the problem?
Very unlikely on an AMD Epyc, which has 128 PCIe lanes off the CPU. (Unless using dual socket, then it is 64 PCIe lanes off each CPU.)

Also, I think the OP meant this system board - ROMED8-2T. This board lists slot 2 as either 16x lanes, or 8x lanes if the NVMe and/or OCuLink is in use. All other slots are 16x lanes.
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
Everyone has very good points and I greatly appreciate it.
Switching an HBA to IT mode is not the same as flashing it with IT firmware. If your words were chosen precisely, you may have an issue here.
I wish what you were saying was true, but it was flashed; if it wasn't, TrueNAS would have problems even seeing the drives. All drives show up as available, and preliminary tests show decent numbers with different configurations. I'm currently testing all slots on the mobo, seeing that I sent the board to ASRock for testing only to get the same board back with a beta BIOS, which has me a little concerned. TrueNAS keeps complaining about "downgraded" links/speed of the PCI bridge. Yet it's not just TrueNAS; it's Ubuntu 20/22 and CentOS 8 too. I don't like working with beta anything.
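One way to see exactly which links are reported as downgraded is to compare the LnkCap and LnkSta lines that `lspci -vv` prints for each device. The sketch below is a rough illustration that only parses text in that usual layout; the function name is hypothetical, and the regex assumes the "Speed ...GT/s ... Width xN" wording that pciutils normally uses.

```python
import re

# Matches e.g. "Speed 8GT/s, Width x8" or "Speed 2.5GT/s (downgraded), Width x4 (ok)".
LINK_RE = re.compile(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def find_downgraded_links(lspci_text):
    """Return [(device_line, (cap_speed, cap_width), (sta_speed, sta_width))]
    for every device whose negotiated link is below its advertised capability."""
    results = []
    device = None
    cap = None
    for line in lspci_text.splitlines():
        # Device header lines start in column 0; detail lines are indented.
        if line and not line[0].isspace():
            device, cap = line.strip(), None
            continue
        stripped = line.strip()
        match = LINK_RE.search(stripped)
        if not match:
            continue
        link = (float(match.group(1)), int(match.group(2)))
        if stripped.startswith("LnkCap:"):
            cap = link
        elif stripped.startswith("LnkSta:") and cap is not None:
            if link[0] < cap[0] or link[1] < cap[1]:
                results.append((device, cap, link))
    return results
```

Feed it the saved output of `lspci -vv` (run as root so the capability blocks are readable). Keep in mind that a "downgraded" entry is not automatically a fault: a bridge or root port that is capable of x16 will be reported that way whenever a narrower or slower card sits behind it.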
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
Very unlikely on an AMD Epyc, which has 128 PCIe lanes off the CPU. (Unless using dual socket, then it is 64 PCIe lanes off each CPU.)

Also, I think the OP meant this system board - ROMED8-2T. This board lists slot 2 as either 16x lanes, or 8x lanes if the NVMe and/or OCuLink is in use. All other slots are 16x lanes.
Actually it's an ASRock EPYCD8-2T. - mobo specs - BUT per what you are saying, with which I agree, the blurry block diagram in the manual shows that all PCIe slots and the OCuLink, NVMe, and mini-SAS HD connections have a direct path to the CPU. What is confusing me is that the OS says everything has 16x lanes (causing the "downgrading"), which is completely impossible and makes me think the block diagram isn't completely correct. I have a theory that the OCuLink, mini-SAS HD, and NVMe share lanes, or the firmware is providing incorrect information. OCuLink x8 + mini-SAS HD x4 + NVMe x4 = 16 lanes. There are two sets of each, which would mean 32 lanes total. When I get a U.2 drive from a friend I will be able to test this theory. In the meantime I will be swapping different cards, from different gens, to make sure the slots function like they should. My gut is telling me I might run into problems.
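The lane tally in that theory can be written out as a quick check. The widths below are simply the ones quoted in the post, and the 128-lane figure is the single-socket Epyc number mentioned earlier in the thread; none of it is taken from the board manual, so treat it as a sketch of the theory, not of the actual wiring.

```python
# Connector widths as quoted in the post (assumed, not verified against the manual).
ONBOARD_GROUP = {"OCuLink": 8, "mini-SAS HD": 4, "NVMe": 4}
GROUP_COUNT = 2   # "There are two sets of each"
CPU_LANES = 128   # single-socket Epyc figure quoted earlier in the thread


def lanes_per_group(group):
    """Total lanes used by one set of onboard connectors."""
    return sum(group.values())


def onboard_total(group, count):
    """Total lanes used by `count` identical sets of onboard connectors."""
    return lanes_per_group(group) * count


per_group = lanes_per_group(ONBOARD_GROUP)           # 8 + 4 + 4
total = onboard_total(ONBOARD_GROUP, GROUP_COUNT)    # two sets
left_for_slots = CPU_LANES - total
print(per_group, total, left_for_slots)
```

If each set really adds up to exactly one x16 block, that would at least be consistent with the OS reporting x16-capable bridges in front of connectors that can never use all 16 lanes at once.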
 

m3m

Dabbler
Joined
Dec 28, 2022
Messages
11
Short update. Things just got interesting. Slot 2 looks to have a problem: I have tried 3 different PCIe cards in it (10GbE Mellanox, Qlogic QLE2672, and 9300-8i), and each one caused a single thread to max out. I even forced the PCIe gen version in the BIOS, which did absolutely nothing. So I have engaged ASRock support to find out what is going on with my slot 2. I used the same three cards in all the other slots and saw no maxed thread. Hm...
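Since the symptom follows the slot and not the card, it may help to see which interrupt source is firing while a card sits in slot 2. The sketch below diffs two copies of /proc/interrupts and lists the fastest-growing entries. The helper names are made up, and it only assumes the standard layout: a header row of CPU columns, then one "label: counts... description" row per source.

```python
def parse_interrupts(text):
    """Return {label: (total_count_across_cpus, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Up to ncpu leading numeric tokens are counters; rows like ERR have fewer.
        for tok in tokens[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        description = " ".join(tokens[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def top_interrupt_deltas(before, after, limit=5):
    """Return [(delta, label, description)] for the sources that grew the most."""
    deltas = []
    for label, (total, description) in after.items():
        previous = before.get(label, (0, ""))[0]
        if total > previous:
            deltas.append((total - previous, label, description))
    deltas.sort(reverse=True)
    return deltas[:limit]
```

Read /proc/interrupts twice a few seconds apart, parse both copies, and pass them in. If one source tied to the card in slot 2 is climbing by a huge amount per second, that would point toward an interrupt storm on that slot and would be concrete data to hand to ASRock support.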
 