High load on idle server

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
I decided to test out Scale by converting my backup server. It has no apps, virtualization, shares, etc on it. I only use it as a replication target for my primary server which runs Core. Replication tasks are scheduled for overnight. It is configured to use a remote syslog server.

The backup server is a Xeon E3-1220v3. The CPU reporting shows that it's mostly idle, but none of the load averages ever drops below 1. The short-term average occasionally spikes to 2-3 for a minute or two but then drops back down to around 1. I'm not sure what's causing the spikes.

Any suggestions for what to look for? Nothing obvious is jumping out at me from top. I don't recall having this problem with Core. I'm currently running 23.10.1.1 because 23.10.1.3 broke my drive temp reporting.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Please post a screen capture showing what you are talking about. I typically would not get too excited about small SCALE issues at this time; in my opinion it still has a ways to go before I migrate to it. I run tests using SCALE, but with the ZFS problems and migrations I will not use it. Maybe in another year it will be mature enough. However, you can look at Jira to see if there is a problem reported on this issue.

EDIT: If you unplug the network cable, does the activity go away? Just in case it is something on the network.
 

CJRoss

This is what I'm talking about. Not much is going on and yet my load never drops below 1. I'm pretty sure the spikes in the top graph are just my browsing the UI.

I have not tried unplugging the network cable as this is my backup NAS and that would prevent the backups and my checking the load.

scale cpu.jpg
 

joeschmuck

I have not tried unplugging the network cable as this is my backup NAS and that would prevent the backups and my checking the load.
Just unplug it for 1 hour and see what the statistics look like then. If they drop, then you know it's some kind of network traffic. It doesn't need to be specifically related to the NAS operations; it could be a computer on your network asking who is out there.
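
One way to test the network theory without pulling the cable is to watch the interface counters for a while. The sketch below is only an illustration, assuming a Linux system (as SCALE is) that exposes `/proc/net/dev` in its usual layout. The function names `parse_net_dev`, `delta`, and `watch` are made up for this example and are not part of TrueNAS.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, rx_packets, tx_bytes, tx_packets)}."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue
        # Receive columns come first (8 of them), then transmit columns.
        counters[name.strip()] = (
            int(fields[0]), int(fields[1]), int(fields[8]), int(fields[9])
        )
    return counters


def delta(before, after):
    """Counter increase per interface between two snapshots."""
    return {
        name: tuple(a - b for a, b in zip(after[name], before[name]))
        for name in after
        if name in before
    }


def watch(seconds=60, path="/proc/net/dev"):
    """Print how much traffic each interface saw over `seconds`."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = parse_net_dev(f.read())
    for name, (rx_b, rx_p, tx_b, tx_p) in delta(first, second).items():
        print(f"{name}: rx {rx_p} pkts / {rx_b} B, tx {tx_p} pkts / {tx_b} B")
```

Calling `watch(60)` from a Python shell on the NAS during a quiet period prints the packet and byte counts for that minute. A steady stream of received packets on an otherwise idle box would be consistent with broadcast or discovery chatter, while near-zero counts would suggest looking elsewhere. It shows traffic volume only, not whether that traffic is what drives the load.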

So what are you focusing on here? CPU Idle (first posting) or System Load (last posting)?

To understand what all these values mean, look each one up. A Google search for "debian cpu iowait" will explain I/O Wait.
For Nice, search "debian cpu nice".
For System Load Average, search the same way.

Can you see what I'm doing here? Look up all the items and you will have a better idea what they mean. If you still feel there is a problem, you will need to disable/isolate things such as services, one at a time. I doubt anyone will just tell you "the problem is your RealTek NIC". If you have one, it actually could be the problem.
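
For the System Load Average lookup in particular, one detail is worth knowing: on Linux the load average counts tasks in uninterruptible sleep (state D, typically waiting on I/O or a kernel lock) as well as runnable ones, so it can sit above zero while the CPU graph shows idle. Here is a minimal sketch for reading the raw numbers, assuming the standard `/proc/loadavg` format. `parse_loadavg`, `load_per_cpu`, and `report` are hypothetical helper names used only for illustration.

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg contents, e.g. '1.02 0.98 0.95 2/345 6789'."""
    parts = text.split()
    one, five, fifteen = (float(p) for p in parts[:3])
    # Fourth field is "currently runnable tasks / total tasks".
    runnable, total = (int(n) for n in parts[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable_now": runnable,
        "total_tasks": total,
    }


def load_per_cpu(load, cpus):
    """Normalise a load figure by the number of logical CPUs."""
    return load / cpus


def report(path="/proc/loadavg"):
    """Print the load averages alongside their per-CPU equivalents."""
    with open(path) as f:
        info = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    for key in ("1min", "5min", "15min"):
        print(f"{key}: {info[key]:.2f} ({load_per_cpu(info[key], cpus):.2f} per CPU)")
    print(f"runnable right now: {info['runnable_now']} of {info['total_tasks']} tasks")
```

The Xeon E3-1220v3 has four cores, so a load of 1.0 works out to 0.25 per core. If `runnable_now` is usually 1 (the reporting process itself) while the averages stay near 1, that would hint that something is blocked in state D rather than actually using CPU.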

If you need further help, please follow the forum rules and post your system specs. It may or may not help us out but it will not hurt.
 

CJRoss

I'm not sure why you're confused about what I'm concerned about. As I said, the cpu is idle yet I have a high load. None of my Core systems do that. Even my main NAS running all of my jails doesn't, although admittedly it has a lot more cores.

I ran Core on this server for a decade, starting when it was originally my primary NAS and then continuing when it moved to backup status. https://www.cjross.net/my-freenas-build/ I'm familiar with TrueNAS requirements and know the common gotchas.

From what I can tell it's something to do with Scale, irrespective of the hardware. If I can find it, I might plug the old Core drive in and see what it shows, but more likely I'll come back to it whenever I convert another one of my Core boxes to Scale.

There's no service degradation that I've noticed, so it's not been high on my list to troubleshoot. It's just concerning, as it shouldn't be doing that and is probably wasting power. That is something I'm curious about between Core and Scale for the same loads, but again, it's low on my list of priorities.
 

joeschmuck

I am running SCALE 23.10.1.3 and do not have the same issue, see my screenshot. The only burp was when I logged into the system at 16:30. System_Load.jpg

As I said, the cpu is idle yet I have a high load.
My definition of "High Load" and yours are quite different but I understand what you are saying, it is not what you want and you would like to figure out why it is higher than basically zero.

I recommend that you install 23.10.1.3, in spite of the temperature reporting being broken, just to test if the problem is corrected. Also, I already said that you could disable your services one at a time to see if the issue correlates to one of those.
 

CJRoss

I am running SCALE 23.10.1.3 and do not have the same issue, see my screenshot. The only burp was when I logged into the system at 16:30. View attachment 75519

Interesting. I will admit that I only have the current sample size of one. I'll probably revisit once I've loaded Scale on another machine or two.

My definition of "High Load" and yours are quite different but I understand what you are saying, it is not what you want and you would like to figure out why it is higher than basically zero.

If there were things running on the server I wouldn't necessarily consider this high load. But for a system that spends 90+ percent of its time idle, I consider a constant load of 1-3 to be high.

I recommend that you install 23.10.1.3 in spite of the Temperature being broken just to test if the problem is corrected. Also I already said that you could disable your services one at a time to see if the issue correlates to one of those.

I'll probably just update to Dragonfish instead. It's a backup server so as long as snapshot replication and scrubs are working I'm not too concerned about other issues. The temperature problem just happened to occur right as I was replacing all of the fans and trying to make sure the drives were staying cool.

In regards to services, I'm only running SMART, SSH, and UPS. I did create a VM a while back but it's been turned off for weeks.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Have you tried running "htop" from the CLI and doing a bit of investigating?
 

CJRoss

Yes. That was the very first thing I did. There's nothing obviously running. As shown in the percentage graph, there's not much showing as using the cpu.
 

Apollo

Are you using the default "htop" settings?
I find if you press "F2" you can change how "htop" behaves (refresh rate, Tree view...).
You can have a very stable tree and more clearly see which command/function is used.
 

CJRoss

I was actually just using top, as I forgot htop was installed. Also, I didn't realize htop now did IO. When did that happen?

htop shows no IO activity and very little CPU. The only things of note are some spikes from middleware and occasionally smaller ones from netdata. htop is using more CPU than netdata is.
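
Because the spikes only last a minute or two, a live htop screen can easily miss whatever causes them. The following sketch samples process states repeatedly and tallies which commands were seen runnable (R) or in uninterruptible sleep (D), the two states that count toward the Linux load average. It assumes the standard `/proc/<pid>/stat` layout; `parse_stat`, `snapshot`, `count_load_contributors`, and `sample` are names invented for this example. It only looks at thread-group leaders, so a blocked thread inside a multi-threaded process may not show up, and the sampler itself will appear as an R entry.

```python
import os
import time
from collections import Counter


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def snapshot(proc="/proc"):
    """List (pid, comm, state) for every process visible right now."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                tasks.append(parse_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited or is unreadable; skip it.
            continue
    return tasks


def count_load_contributors(samples):
    """Count how often each (command, state) pair was seen in R or D."""
    seen = Counter()
    for tasks in samples:
        for _pid, comm, state in tasks:
            if state in ("R", "D"):
                seen[(comm, state)] += 1
    return seen


def sample(seconds=120, interval=0.5):
    """Take snapshots for `seconds` and return the tally."""
    samples = []
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        samples.append(snapshot())
        time.sleep(interval)
    return count_load_contributors(samples)
```

Something like `sample(300).most_common(10)` run across one of the spikes would list the commands most often caught in R or D. If middleware or netdata dominate the R entries, that would match what htop hinted at; a recurring D entry would point at something waiting on I/O instead.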
 