Machine running TrueNAS Core 12.0-U8.1 has high CPU

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Hi,

I hope somebody will be able to help me figure out this puzzle, please.

The Set-up

I have two machines, each running TrueNAS Core 12.0-U8.1. They are not exactly the same spec:

Server A

AMD Ryzen 5 3600 6-cores
32GB unbuffered Memory (non-ECC)
8x 6TB 12Gb SAS drives, split into 2x RAIDZ1 vdevs of 4 drives each
1x NVMe log (SLOG) drive
2x 10GBase-SR LC-LC fibre interfaces, MTU 9000, hardware offloading enabled
Server B

AMD Ryzen 7 2700x 8-cores
64GB unbuffered ECC Memory
8x 10TB 12Gb SAS drives, split into 2x RAIDZ1 vdevs of 4 drives each
1x NVMe log (SLOG) drive
1x NVMe cache (L2ARC) drive
2x 10GBase-SR LC-LC fibre interfaces, MTU 9000, hardware offloading enabled

Each of the above uplinks to a Cisco Nexus 3064 switch, also with MTU set to 9000. The Cisco in turn links to 2x HP DL360p servers, each equipped with 10GBase-SR cards and running vSphere 6.7.
iSCSI is enabled on both Server A and Server B, bound to the fibre interfaces on their respective VLANs.
2x zvols of 6TB and 3TB respectively are created on each server's single volume, logical blocksize 512k, and then presented as LUNs to vSphere, where a VMFS6 datastore has been created on each.
Multipath I/O is set up as round-robin with IOPS=1.
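
For reference, the multipath policy was applied on each ESXi host roughly like this (the naa ID below is only a placeholder for the actual LUN identifier):

```
# List iSCSI devices and their current path selection policy
esxcli storage nmp device list

# Set round-robin as the path selection policy for the LUN (placeholder device ID)
esxcli storage nmp device set --device naa.XXXXXXXXXXXXXXXX --psp VMW_PSP_RR

# Switch paths after every single I/O instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXXXXXXXXXX --type iops --iops 1
```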

The problem

Traffic from A to B and vice versa, such as would be generated by vMotion, seems to flow OK and as expected. Performance, I feel, could be better, but that will involve some more configuration within vSphere.
What I do notice, though, when transferring large amounts of data from A to B or from B to A:

Server A CPU usage stays virtually on the floor - maybe 10% max - while both 10GBase-SR interfaces show Outs of approx 50MiB/s each
Server B CPU usage bounces between roughly 30% and 100%, while its 10GBase-SR interfaces show Ins similar to Server A's Outs, i.e. about 50MiB/s

The situation can be reversed by getting traffic to flow the other way at similar rates; the same difference in CPU load is seen, i.e. Server A = low, Server B = high.
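
For what it's worth, this is roughly how I've been watching both boxes while a transfer runs (both are TrueNAS Core, so standard FreeBSD tools; the interface name is just an example):

```
# Live per-interface throughput - watch the two 10GbE interfaces
systat -ifstat 1

# Per-disk I/O load across the pool members
gstat -p

# Confirm the jumbo-frame MTU actually took on the NIC (interface name will vary)
ifconfig ix0 | grep mtu
```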

I've been racking my brains trying to figure out why there would be such a difference in CPU load. I see no obvious configuration differences between the two servers, other than the hardware differences listed above.
It might not even be a problem as such, just a "feature", but could anybody please offer me a clue as to why Server B's CPU usage is so high? Thanks in advance.


Andy (P.S. Please let me know if you need me to provide any other config information)
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Any guesses please? Is there any more information I can give to help? Thanks
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
My only observation is the 1x NVMe Cache drive in B

NIC models and Disk Controller models may matter

Try turning off the cache drive and see what happens
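
Removing the L2ARC is non-destructive. Assuming the pool is called tank and the cache device shows up as nvd1 (names are just examples), something like this from a shell, or the Pool Status page in the UI, should do it:

```
# Find the cache (L2ARC) device listed under the "cache" heading
zpool status tank

# Detach the L2ARC device - pool data is unaffected
zpool remove tank nvd1

# It can be added back later if it turns out to make no difference
zpool add tank cache nvd1
```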
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
My only observation is the 1x NVMe Cache drive in B

NIC models and Disk Controller models may matter

Try turning off the cache drive and see what happens
Hi, thanks for your input. I will certainly give disabling the cache a go, and report back.
The controllers and NICs across both machines are the same.
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
With the processors you are using, the motherboards may have restrictions regarding PCIe bus width. Read the manuals carefully and check which slots your NICs and HBAs are in.
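
On TrueNAS Core you can also check the negotiated link width from a shell; something along these lines (device names are examples, yours may differ):

```
# Find the PCI device entries for the NIC and HBA
pciconf -lv | grep -B 3 -i -e ethernet -e sas

# Show the PCI Express capability for a given device, e.g. an mps(4) HBA
# "link x4(x8)" would mean an x8 card has only negotiated x4
pciconf -lvc mps0
```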
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
With the processors you are using, the motherboards may have restrictions regarding PCIe bus width. Read the manuals carefully and check which slots your NICs and HBAs are in.
Doesn't appear to be the cache! I'll check over the hardware. I think the boards are the same too, although I'll double-check the slots. Could ECC memory play a part in what I'm seeing?
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
Doesn't appear to be the cache! I'll check over the hardware. I think the boards are the same too, although I'll double-check the slots. Could ECC memory play a part in what I'm seeing?

I would not expect ECC to matter unless there are a huge number of errors, and I'd expect those to be logged.
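
If you want to rule it out on Server B, something like this should show whether any machine-check / memory errors have been recorded:

```
# Machine-check events end up in the kernel message buffer
dmesg | grep -i mca

# Older events survive in the persisted system log
grep -i mca /var/log/messages
```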
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Since you mention high CPU usage as a problem, it would be good to see what is on top of `top -SHIz` output.

What do you mean by "logical blocksize 512k"? The WebUI does not even allow block sizes that large for ZVOLs. For RAIDZ1 I'd recommend a ZVOL block size of 32KB, though for virtualization pools on HDDs we generally recommend mirrors to get the maximum possible IOPS.
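
Note that volblocksize is fixed at creation time, so changing it means creating a new ZVOL and migrating the VMs onto it. Roughly (pool and ZVOL names are just examples):

```
# Check what the existing ZVOL was created with
zfs get volblocksize tank/vmfs-zvol1

# Create a replacement sparse ZVOL with 32KB blocks, then present it as a new LUN
zfs create -s -V 6T -o volblocksize=32K tank/vmfs-zvol1-32k
```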
 