Correct tunable to reduce out of control scrub CPU usage (z_rd_int)?

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I've got a 4 core CPU + 256 GB RAM, and scrub on 2 pools is leaving almost no resources at all for user activity. SSH and Web UI sessions can't connect, the shell prompt and console key presses take around 3~12 seconds to appear after keys are pressed, and nice --5 doesn't seem to help. It's not RAM starvation, as top says there's 115GB free.

Pausing scrub fixes it immediately, but I need scrub to be able to work in the background so it's not really a solution.

top says it's related to z_rd_int, and there are threads on Google saying that this can indeed max out CPU. But I can't find a thread that says, specifically, on FreeBSD, how to tune it back under control. The only relevant tunables I can find seem to be vdev.scrub_max_active (amount of simultaneous IO that scrub can issue, as z_rd_int sounds like scrub IO) and sync_taskq_batch_pct, which controls the max % of CPU the queue can use. They don't seem to be helping.

So I swapped the 4 core Xeon for an 8 core Xeon and immediately scrub now wants to consume 100% of 8 cores not 100% of 4 cores. Great win!

The server is totally idle otherwise - no local tasks, no client use, other than a console session and scrubbing.

Any suggestions as to which tunables might be encouraging this excessive resource use, or able to bring it under control, would be really helpful.

[Image: IMG_20200811_101932 v2.jpg]

(8 core / 16 thread Xeon)
Free CPU is mostly 0.0%; this is a rare moment when it's nonzero, at 0.4%.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702

Stilez

Doesn't help.
I've tried all tunables I can find that could bring scrub under control without affecting other operations. Right now I'm at:

vfs.zfs.vdev.scrub_min_active=0 (down from 1, essentially allowing it to do nothing)
vfs.zfs.vdev.scrub_max_active=1 (down from 2)
vfs.zfs.scan_vdev_limit=50000 (down from 40M)


and it's still using 100% of an 8 core Xeon v4 for scrub to the point that WebUI, SSH and largely, keystrokes in the console, fail or are greatly delayed.
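
For anyone wanting to try the same values, here is a minimal sketch of applying them at runtime. The tunable names and values are the three listed above; the `apply_tunables` and `scrub_tunables` helpers and the `DRY_RUN` switch are made up for illustration. As written it only prints the commands it would run, and anything set this way with `sysctl` does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: reads "name=value" lines on stdin and applies each one
# with sysctl. With DRY_RUN=1 (the default) it only prints the commands.
apply_tunables() {
    while IFS= read -r line; do
        # Skip blank lines and comments.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sysctl $line"
        else
            sysctl "$line"
        fi
    done
}

# The values quoted in this post (hypothetical wrapper, for illustration).
scrub_tunables() {
    cat <<'EOF'
vfs.zfs.vdev.scrub_min_active=0
vfs.zfs.vdev.scrub_max_active=1
vfs.zfs.scan_vdev_limit=50000
EOF
}

# Dry run: show what would be executed.
scrub_tunables | apply_tunables
```

Running it as root with `DRY_RUN=0` would actually change the settings, so it is worth reading the printed commands first.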

Probably because scrub controls focus on I/O, not the implied CPU hashing/checksumming demands??

Any ideas appreciated
 

sretalla

My main system doesn't even break a sweat (well, the disks get hotter) when doing a scrub... CPU is quite low and the system is perfectly usable throughout the process (nearly 18 hours these days).

I note you use the same HBA as I do... LSI 9305-24i (PCIe 3.0) IT mode, no boot/efi

I am running an insanely old firmware on it (just never updated from factory), but I note the thread from @JoshDW19 regarding the lockups of that series of card and the special .12 version of the latest firmware (which is not available for our specific card... at least not yet)... see here:


Maybe there's something linked to that, since your card will be under maximum load during a scrub, and maybe the resets then cause the CPU to max out in attempts to reconnect and catch up once the card comes back? Just a shot in the dark.

Also to note my very old version is not impacted by the reset, so I can't speak from direct experience on it.
 

Stilez

Interesting.

Almost all (but not quite all) my HDDs are SAS. There are a few SATA.

Code:
# sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3224(A1)

Controller Number : 0
Controller : SAS3224(A1)
PCI Address : 00:04:00:00
SAS Address : 500062b-2-02f0-4680
NVDATA Version (Default) : 10.00.00.03
NVDATA Version (Persistent) : 10.00.00.03
Firmware Product ID : 0x2228 (IT)
Firmware Version : 16.00.01.00 <---------------- below 16.00.12.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9305-24i
BIOS Version : 08.37.00.00
UEFI BSD Version : 18.00.00.00
FCODE Version : N/A
Board Name : SAS9305-24i
Board Assembly : 03-25699-02004
Board Tracer Number : SP73907428

Finished Processing Commands Successfully.
Exiting SAS3Flash.



I'd like to try it..... it seems a long shot that this is the issue, but worthwhile anyway. But I'm waiting to hear if it affects 9305 as well as 9300 series, and if @JoshDW19 knows of a firmware for the 9305-24i since the resource only has firmwares for up to 16 port cards. So I can't test if that helps, until I have a firmware to flash.
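
In case it helps anyone comparing cards, a rough sketch of checking the reported firmware against the 16.00.12.00 version mentioned above. It assumes the `Firmware Version : x.y.z.w` line format shown in the `sas3flash -list` output in this post; the `fw_version` and `version_lt` helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pull the firmware version out of `sas3flash -list`
# output supplied on stdin (first token after the colon).
fw_version() {
    awk -F: '/Firmware Version/ { split($2, f, " "); print f[1]; exit }'
}

# Hypothetical helper: exit 0 when dotted version $1 is numerically lower
# than $2, comparing field by field (so "09" is treated as 9).
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # equal versions are not "lower"
    }'
}

# Demo on a sample line copied from the output above.
# On a real system: current=$(sas3flash -list | fw_version)
sample='Firmware Version : 16.00.01.00'
current=$(printf '%s\n' "$sample" | fw_version)
if version_lt "$current" "16.00.12.00"; then
    echo "$current is below 16.00.12.00"
fi
```

This only compares version numbers; it says nothing about whether a given firmware image is valid for a particular card.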
 

sretalla

Stilez said: "Let's try it..... it seems a long shot that this is the issue, but worthwhile anyway."
Well, not much point until there's a version we can try... but I agree, it's interesting to see if it will make a difference (assuming LSI comes to the party).

Just for the sake of airing my shame (at exactly how old the firmware is...):

Code:
root@freenas:/mnt # sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

    Adapter Selected is a Avago SAS: SAS3224(A1)

    Controller Number              : 0
    Controller                     : SAS3224(A1)
    PCI Address                    : 00:03:00:00
    SAS Address                    : 500062b-2-0299-0c80
    NVDATA Version (Default)       : 00.25.00.03
    NVDATA Version (Persistent)    : 00.25.00.03
    Firmware Product ID            : 0x2228 (IT)
    Firmware Version               : 09.00.100.00
    NVDATA Vendor                  : LSI
    NVDATA Product ID              : SAS9305-24i
    BIOS Version                   : 08.27.00.00
    UEFI BSD Version               : 13.00.00.00
    FCODE Version                  : N/A
    Board Name                     : SAS9305-24i
    Board Assembly                 : 03-25699-02004
    Board Tracer Number            : SP63808066

    Finished Processing Commands Successfully.
    Exiting SAS3Flash.
 

sretalla

Check out the posts above yours in the patch thread... from @WaltR... seems to have been directed to this issue by engineering and has a 9305.
 

Stilez

Confused what this is about. It's just a post from one more person (WaltR) with a 9305 looking for a possible firmware. Not a solution or a firmware for that chip? No useful content? Or am I missing something?
 

sretalla

He's referring to the fact that engineering referred him, so it is some kind of confirmation that there should be a .12 version somewhere for our card. That's all. (You had asked whether it was applicable to the 9305 in your question to Josh.)
 


morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Stilez,

Did you find any workarounds for this issue? I'm wondering if this exhibits more because you have a reasonably large amount of RAM in your system? Scrub uses all that RAM to create sequential I/O that then overwhelms the CPU.

How many VDEVs and drives are there - 9 VDEVs and 18 drives?

Morgan
 

Stilez

No workaround yet. 1 pool: 4 x 3-way mirrors of SAS3 7200 rpm enterprise drives for data, and a mirrored pair of NVMe Optane 905p 480G for special.

Other disks exist for other test pools and twin mirror SSD boot pool, but only that one data pool is active - the rest are static with no data or scrub activity at the time. So excluding the boot device, 5 vdevs 14 drives on that pool.
 

morganL

There is a proposed OpenZFS fix under review. If it passes review it may make it into 12.0-U4, but there is a lot of testing it has to pass.
 

Stilez

Yah, I saw it on Jira. A partial fix, it's cautiously described as.
Well done that Motin!!
And thank you team!
 

chruk

Dabbler
Joined
Sep 4, 2021
Messages
27
Hi guys, the reason vfs.zfs.vdev.scrub_min_active=0 is ineffective is that the minimum value is 1.

I have had issues with scrub strangling I/O on OpenZFS machines that run on spindles. Older ZFS used to allow scrub to pause when there was activity, but for some reason this got removed, and I have not been able to find a way to imitate the old behaviour.

However, there are ways to reduce the CPU usage.

You can revert scrub to a legacy mode, which stops it sorting metadata first; that sorting is CPU intensive, so this should help. It is the vfs.zfs.scan_legacy variable.
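
A small sketch of checking for that variable before changing anything. It assumes the OID name is exactly `vfs.zfs.scan_legacy` as given above (it may be named differently on other ZFS versions), and the `scan_legacy_hint` function name is made up for illustration. It only reads the current value and prints the command to run; it does not change the setting itself.

```shell
#!/bin/sh
# OID name as given in this post; treat it as an assumption on other systems.
OID="vfs.zfs.scan_legacy"

# Hypothetical helper: report the current value if the OID exists and print
# the command that would switch scrub to legacy mode.
scan_legacy_hint() {
    if sysctl -n "$OID" >/dev/null 2>&1; then
        echo "current $OID: $(sysctl -n "$OID")"
        echo "to enable legacy scan mode, run as root: sysctl $OID=1"
    else
        echo "$OID not found on this system"
    fi
}

scan_legacy_hint
```

Like any runtime `sysctl` change, setting it by hand would not persist across a reboot.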
 