CPU usage spikes every 20 seconds

tprelog

Patron
Joined
Mar 2, 2016
Messages
297
After upgrading to TrueNAS-SCALE-23.10-BETA.1 I'm seeing a spike in CPU usage every 20 seconds.
EDID_CPU-history.png


I noticed that "EDID block 0 is all zeroes" also appears in the console every 20 seconds.

EDID_console.png


Ticket filed


Reverting to SCALE-22.12.3.1
 

tprelog

EDID Block seems to be related to monitor and integrated GPU and Kernel 6.1

Reading through that thread, there is a link to the same issue in the Unraid forum. Basically, the OP happens to be using the same MB and CPU as me.

6.11 update > EDID block 0 is all zeroes

Following that thread, I tried isolating my iGPU, which seems to be the workaround people were successful with. Connecting monitors and/or blacklisting kernel modules seems to be no help.

Trying to isolate my iGPU fails with the following message:

Code:
0000:00:02.0 GPU pci slot(s) consists of devices which cannot be isolated from host


I'm not running a media server right now, so the iGPU is not needed for transcoding atm. I went into the BIOS and disabled it. Now the EDID message is gone, and the CPU is back to normal.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
So it might be related to the specific iGPU.
Is it integrated Intel HD Graphics P530?
 

tprelog

Is it integrated Intel HD Graphics P530?
Yep, that's the one.

My server specs are in my sig. But I don't think that shows up on mobile, so here are the basics again, just in case.
Code:
Supermicro X11SSH-F
Intel Xeon CPU E3-1245 v5 @ 3.50GHz
32GB ( 2x Samsung M391A2K43BB1-CPB 16GB DDR4-2133 ECC )
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
EDID (https://en.wikipedia.org/wiki/Extended_display_identification_data) is specifically related to displays. Based on the error printing, I am assuming you don't have a monitor plugged in? Consumer GFX cards (including onboard) all seem to have periodically had this problem or similar ones over the years. I've had cards that, when passed through to a VM in ESXi, would refuse to work and throw driver errors in Windows. This seems similar.
IIRC this type of problem has impacted certain generations of NVIDIA and AMD cards, and isn't specific to Intel.

Not sure what the current state is but two things to try:
  1. What happens if you plug in a monitor?
  2. What happens if you don't have a monitor plugged in?
 

tprelog

Hi Nick,
Normally, I have a monitor connected (turned off unless in use) and access the system through the remote console, which is where I grabbed the screenshot. I tried all combinations of power on/off and monitor plugged in vs. unplugged, but with the iGPU enabled I get the same EDID message. When I disable the iGPU, the monitor can be connected without the EDID error.
 

tprelog

I just noticed the specs for my CPU do not list VGA as a supported output. I'm guessing that's why the EDID is all zeros.

1693138436407.png

My motherboard only has VGA, and I still get graphics from the onboard ASPEED AST2400 BMC.


At some point, I will need to enable the iGPU to use Quick Sync.
Should I file a separate Jira ticket for the inability to isolate the iGPU?
 

NickF

Clever! So this is interesting because I have an ASPEED controller and an external graphics card, but I never ran into this issue. Perhaps iGPUs are handled differently, whether on purpose or not. I have also been passing through this card since day 1, and I don't have an iGPU without VGA support on a board with only VGA... So I think your issue may be an "edge case"?

Given the weird join of these two pieces of information...perhaps TrueNAS is trying, and failing, to use the VGA output on your board THROUGH the iGPU...I think we are onto something.

What happens if you mess around with this:
1693144624569.png
 

tprelog

What happens if you mess around with this:

With the iGPU enabled, I see both GPUs.
GPU.png

I am able to isolate the onboard ASPEED graphics without any error, but that kills my monitor and remote console once TrueNAS boots.

When I try to isolate the iGPU, I get the following message
Code:
0000:00:02.0 GPU pci slot(s) consists of devices which cannot be isolated from host
 

NickF

IMO you're most likely hitting one of two problems. Either there's a bug at the driver/kernel level or there's a bug in how TrueNAS is handling this edge case. I'm still not sure which. Given the problems in Unraid as well, it's probably not TrueNAS?

Do you see any clues in /var/log/messages or /var/log/middlewared.log from the time you were messing around with this setting? Also, I don't remember where those files live in the new debug/logging system, but they will be in different places.

It may give us a better idea of what's happening. If you want my help, PM me a debug, or just post some snippets from those logs from when you were messing around.

EDIT: Looks like you're already down the same rabbit hole in your bug ticket. Sorry :) But I think we have more context now than you did last week and I don't think you ever uploaded a debug to your bug report.
 

tprelog

I don't think you ever uploaded a debug to your bug report.

I did, but I used the private upload link.

I've been poking around in a fresh debug, but I don't see anything helpful (at least nothing meaningful to me). I'll PM you a copy; maybe you'll see something I don't. And thanks for mentioning that new debug/logging system. It really is helpful and not something I would have thought to look through prior to your resource.
 

bcat

Explorer
Joined
Oct 20, 2022
Messages
84
I had a similar issue with my setup (ASRock Rack E3C246D4M-4L mobo with Intel E-2276G CPU) after upgrading to Cobia. Hardware transcoding worked fine, but dmesg was spammed with "EDID block 0 is all zeroes" errors every few seconds.

My suspicion is indeed that the issue comes from having a "headless" iGPU not actually connected to any video outputs. (The mobo's VGA output is connected to the BMC, and can be used at the same time as the iGPU is used for transcoding.)

Blacklisting the i915 driver or setting i915.modeset=0 resolves the log spam, but prevents hardware transcoding from working, so that's not a useful option for me. But I found a different option that seems to work: i915.disable_display=1. Set it as follows:

Code:
$ sudo midclt call system.advanced.update '{"kernel_extra_options": "i915.disable_display=1"}'

In my (admittedly limited) testing so far, this option disables the i915 driver's polling for connected displays, but preserves transcoding support. It resolves the kernel log spam issue, and hardware transcoding still works correctly in Plex. So if anyone else runs into this issue, I suggest giving this kernel parameter a try. (No need to isolate the GPU, pass it through to a VM, or anything else fancy.)
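
One caveat with the command above: as far as I can tell, it sets kernel_extra_options to exactly the given string, so any options already stored there would be replaced. Below is a minimal sketch of a way to append instead. The helper name merge_kernel_opts is made up for illustration, and the commented usage assumes that jq is installed and that system.advanced.config returns the current kernel_extra_options value; check both on your own system before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of TrueNAS): append a kernel option to an
# existing space-separated option string unless it is already present.
# Pure string handling; it makes no middleware calls itself.
merge_kernel_opts() {
    existing=$1
    new=$2
    # Return the existing string unchanged if the option is already there.
    for opt in $existing; do
        if [ "$opt" = "$new" ]; then
            printf '%s\n' "$existing"
            return 0
        fi
    done
    if [ -z "$existing" ]; then
        printf '%s\n' "$new"
    else
        printf '%s %s\n' "$existing" "$new"
    fi
}

# Possible use on a live system (assumption: jq is available; not run here):
#   current=$(midclt call system.advanced.config | jq -r '.kernel_extra_options')
#   merged=$(merge_kernel_opts "$current" "i915.disable_display=1")
#   sudo midclt call system.advanced.update "{\"kernel_extra_options\": \"$merged\"}"
```

If kernel_extra_options is empty to begin with, this ends up setting the same value as the one-liner above.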
 

tprelog

Thanks for the tip, @bcat - I will try this ASAP — hopefully sometime this week. For now, I'm back on Bluefin.
 

carp3-noctem

Cadet
Joined
Jun 21, 2019
Messages
3
I'm not running a media server right now, so the iGPU is not needed for transcoding atm. I went into the BIOS and disabled it. Now the EDID message is gone, and the CPU is back to normal.
Hey,
I'm running the same board and getting the same message.
For now I have done this as well. I don't need transcoding or anything, so thanks for sharing!

While the EDID message was showing in the shell and I was copying data to my system, I also got constant freezes of TrueNAS and had to force-reset the server. They are gone for now. The logs never showed anything; I only found this by looking at the iKVM screen, which showed the EDID message each time the system froze.
 

tprelog

But I found a different option that seems to work: i915.disable_display=1.
Thank you for this! It has also stopped the "EDID block 0 is all zeroes" errors and CPU spikes for me.

Unfortunately, I can't go any further to test transcoding, since I'm still unable to isolate my iGPU so I can pass it through to a VM. (Sorry, the whole "apps" thing just doesn't fit my use case.) I'm watching the "Unable to isolate iGPU..." thread now.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Testing this now. I am hoping this resolves instability. I migrated from Core to SCALE and after a day or so, SCALE would stop responding entirely - no UI, no SSH. The only obvious error message was this one. One step at a time. If this change doesn't resolve the instability, it's on to more troubleshooting.
 

Vertex

Cadet
Joined
Jan 14, 2023
Messages
8
I have a QNAP TS-664-4G running TrueNAS-SCALE-23.10.2 (I migrated from Core a few weeks ago) and I had the same issue.
Code:
Feb 28 17:46:30 truenas kernel: EDID block 0 is all zeroes
Feb 28 17:46:31 truenas kernel: EDID block 0 is all zeroes
Feb 28 17:46:54 truenas kernel: EDID block 0 is all zeroes
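
For anyone who wants a number rather than eyeballing the console, the occurrences can be counted from the kernel log. This is only a sketch: count_edid_lines is a made-up helper name, and it matches on the exact message text quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines on stdin that contain the
# EDID message from this thread. -F matches the text literally and -c
# prints the number of matching lines.
count_edid_lines() {
    # grep exits non-zero when nothing matches (it still prints 0),
    # so mask the exit status to keep "set -e" scripts alive.
    grep -cF 'EDID block 0 is all zeroes' || true
}

# Possible use on a live system (not run here):
#   sudo dmesg | count_edid_lines
#   sudo journalctl -k --since "10 minutes ago" | count_edid_lines
```

Running it before and after a change gives a quick way to tell whether the spam has actually stopped.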

Code:
$ lspci -k
00:02.0 VGA compatible controller: Intel Corporation JasperLake [UHD Graphics] (rev 01)
    DeviceName: Onboard - Video
    Subsystem: Intel Corporation JasperLake [UHD Graphics]
    Kernel driver in use: i915
    Kernel modules: i915

I cannot isolate the GPU either:
Bildschirmfoto 2024-03-02 um 10.07.22.png


@bcat Your code solved the issue for me - thank you very much
Code:
$ sudo midclt call system.advanced.update '{"kernel_extra_options": "i915.disable_display=1"}'
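
Kernel options only apply from the next boot, so it can be worth confirming after a reboot that the parameter actually reached the kernel command line. A minimal sketch, with cmdline_has_opt as a made-up helper name; it only checks whether the option appears as a whole word in the text it is given:

```shell
#!/bin/sh
# Hypothetical helper: succeed if option $2 appears as a whole
# space-separated word in the kernel command line text $1.
cmdline_has_opt() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on a live system (not run here):
#   if cmdline_has_opt "$(cat /proc/cmdline)" "i915.disable_display=1"; then
#       echo "option is active"
#   else
#       echo "option not found, reboot may still be pending"
#   fi
```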

[EDIT] Just for reference, here is the CPU graph before and after:

Bildschirmfoto 2024-03-02 um 10.14.12.png
 