Machine running TrueNAS Core 12.0-U8.1 has high CPU

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Hi,

I hope somebody will be able to help me figure out this puzzle, please.

The Set-up

I have two machines, each running TrueNAS Core 12.0-U8.1. They are not exactly the same spec:

Server A

AMD Ryzen 5 3600 6-cores
32GB unbuffered Memory (non-ECC)
8x 6TB 12Gb SAS drives, split into 2x RAIDZ1 vdevs of 4 drives each
1x NVMe log (SLOG) drive
2x 10GBase-SR LC-LC fibre interfaces, MTU 9000, hardware offloading enabled
Server B

AMD Ryzen 7 2700x 8-cores
64GB unbuffered ECC Memory
8x 10TB 12Gb SAS drives, split into 2x RAIDZ1 vdevs of 4 drives each
1x NVMe log (SLOG) drive
1x NVMe cache (L2ARC) drive
2x 10GBase-SR LC-LC fibre interfaces, MTU 9000, hardware offloading enabled

Each of the above uplinks to a Cisco Nexus 3064 switch, also with MTU set to 9000. The Cisco in turn links to 2x HP DL360p servers, each equipped with 10GBase-SR cards and running vSphere 6.7.
iSCSI is enabled on both Server A and Server B, bound to the fibre interfaces on their respective VLANs.
2x zvols of 6TB and 3TB respectively are created on each server's single volume, logical blocksize 512k, and then presented as LUNs to vSphere, where a VMFS6 datastore has been created on each.
Multipath I/O is set up as round-robin with IOPS=1.
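
For reference, the multipath policy was applied on each ESXi host roughly like this (the naa ID below is only a placeholder for the actual LUN identifier):

```
# List iSCSI devices and their current path selection policy
esxcli storage nmp device list

# Set round-robin as the path selection policy for the LUN (placeholder device ID)
esxcli storage nmp device set --device naa.XXXXXXXXXXXXXXXX --psp VMW_PSP_RR

# Switch paths after every single I/O instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXXXXXXXXXX --type iops --iops 1
```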

The problem

Traffic from A to B and vice versa, such as would be generated by vMotion, seems to flow OK and as expected. Performance, I feel, could be better, but that will involve some more configuration within vSphere.
What I do notice, though, when transferring large amounts of data from A to B or from B to A:

Server A CPU usage stays virtually on the floor - maybe 10% max - while both 10GBase-SR interfaces show Outs of approx 50MiB/s each
Server B CPU usage bounces between roughly 30% and 100%, while its 10GBase-SR interfaces show Ins similar to Server A's Outs, i.e. about 50MiB/s

The situation can be reversed by getting traffic to flow the other way at similar rates; the same difference in CPU load is seen, i.e. Server A = low, Server B = high.
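
For what it's worth, this is roughly how I've been watching both boxes while a transfer runs (both are TrueNAS Core, so standard FreeBSD tools; the interface name is just an example):

```
# Live per-interface throughput - watch the two 10GbE interfaces
systat -ifstat 1

# Per-disk I/O load across the pool members
gstat -p

# Confirm the jumbo-frame MTU actually took on the NIC (interface name will vary)
ifconfig ix0 | grep mtu
```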

I've been racking my brains trying to figure out why there would be such a difference in CPU load. I see no obvious configuration differences between the two servers, other than the hardware differences listed above.
It might not even be a problem as such, just a "feature", but could anybody please offer me a clue as to why Server B's CPU usage is so high? Thanks in advance.


Andy (P.S. Please let me know if you need me to provide any other config information)
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Any guesses please? Is there any more information I can give to help? Thanks
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
My only observation is the 1x NVMe Cache drive in B

NIC models and Disk Controller models may matter

Try turning off the cache drive and see what happens
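
Removing the L2ARC is non-destructive. Assuming the pool is called tank and the cache device shows up as nvd1 (names are just examples), something like this from a shell, or the Pool Status page in the UI, should do it:

```
# Find the cache (L2ARC) device listed under the "cache" heading
zpool status tank

# Detach the L2ARC device - pool data is unaffected
zpool remove tank nvd1

# It can be added back later if it turns out to make no difference
zpool add tank cache nvd1
```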
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
My only observation is the 1x NVMe Cache drive in B

NIC models and Disk Controller models may matter

Try turning off the cache drive and see what happens
Hi, thanks for your input. I will certainly give disabling the cache a go, and report back.
The controllers and NICs across both machines are the same.
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
With the processors you are using, the motherboards may have restrictions regarding PCIe bus width. Read the manuals carefully and check which slots your NICs and HBAs are in.
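
On TrueNAS Core you can also check the negotiated link width from a shell; something along these lines (device names are examples, yours may differ):

```
# Find the PCI device entries for the NIC and HBA
pciconf -lv | grep -B 3 -i -e ethernet -e sas

# Show the PCI Express capability for a given device, e.g. an mps(4) HBA
# "link x4(x8)" would mean an x8 card has only negotiated x4
pciconf -lvc mps0
```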
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
With the processors you are using, the motherboards may have restrictions regarding PCIe bus width. Read the manuals carefully and check which slots your NICs and HBAs are in.
Doesn't appear to be the cache! I'll check over the hardware. I think the boards are the same too, although I'll double-check the slots. Could ECC memory play a part in what I'm seeing?
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
Doesn't appear to be the cache! I'll check over the hardware. I think the boards are the same too, although I'll double-check the slots. Could ECC memory play a part in what I'm seeing?

I would not expect ECC to matter unless there are a huge number of errors, and I'd expect those to be logged.
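
If you want to rule it out on Server B, something like this should show whether any machine-check / memory errors have been recorded:

```
# Machine-check events end up in the kernel message buffer
dmesg | grep -i mca

# Older events survive in the persisted system log
grep -i mca /var/log/messages
```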
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Since you mention high CPU usage as a problem, it would be good to see what is on top of `top -SHIz` output.

What do you mean by "logical blocksize 512k"? The WebUI does not even allow block sizes that large for ZVOLs. For RAIDZ1 I'd recommend a ZVOL block size of 32KB, though for virtualization pools on HDDs we generally recommend mirrors to get the maximum possible IOPS.
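
Note that volblocksize is fixed at creation time, so changing it means creating a new ZVOL and migrating the VMs onto it. Roughly (pool and ZVOL names are just examples):

```
# Check what the existing ZVOL was created with
zfs get volblocksize tank/vmfs-zvol1

# Create a replacement sparse ZVOL with 32KB blocks, then present it as a new LUN
zfs create -s -V 6T -o volblocksize=32K tank/vmfs-zvol1-32k
```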
 