Scrubbing a very large pool - Strange system load pattern

Status
Not open for further replies.

jallis

Dabbler
Joined
Sep 13, 2016
Messages
12
I have a very large pool of 63 × 8 TB disks across 7 RAIDZ2 vdevs. The server has 384 GB of RAM and 24 CPU cores.
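
For a sense of scale, here is a rough back-of-the-envelope calculation of the layout. It is only a sketch: it assumes the 63 disks are split evenly across the 7 vdevs (the post does not state this), uses vendor terabytes, and ignores padding, metadata, and reserved space, so the real usable figure will be lower.

```python
# Rough pool layout arithmetic. ASSUMPTION: the 63 disks are split evenly
# across the 7 raidz2 vdevs; the post does not say so explicitly.
DISKS = 63
VDEVS = 7
DISK_TB = 8          # vendor terabytes (10**12 bytes)
PARITY_PER_VDEV = 2  # raidz2 keeps two disks' worth of parity per vdev

disks_per_vdev = DISKS // VDEVS
data_disks = VDEVS * (disks_per_vdev - PARITY_PER_VDEV)
raw_tb = DISKS * DISK_TB
data_tb = data_disks * DISK_TB  # ignores padding, metadata and reserved space

print(f"{disks_per_vdev} disks per vdev, {data_disks} data disks")
print(f"{raw_tb} TB raw, roughly {data_tb} TB before ZFS overhead")
```

A scrub reads all allocated data, so on a pool this size it has a great deal of data to walk through if the pool is reasonably full.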

When running a scrub on this pool I see a very strange load pattern: the system is completely idle for about two hours, then runs at 100% capacity for about two hours, and then repeats this on/off cycle. See the attached screenshot.
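
For anyone who wants to capture a pattern like this as numbers rather than a screenshot, a minimal sketch follows. The function name `load_phases`, the threshold of 12, and the sample data are all hypothetical and synthetic; they are not measurements from this server. On a real system the samples could come from something like Python's `os.getloadavg()` logged at a fixed interval.

```python
from itertools import groupby

def load_phases(samples, busy_threshold):
    """Collapse (minute, load) samples into (state, first_minute, last_minute) runs."""
    labelled = [(minute, "busy" if load >= busy_threshold else "idle")
                for minute, load in samples]
    phases = []
    for state, run in groupby(labelled, key=lambda item: item[1]):
        minutes = [minute for minute, _ in run]
        phases.append((state, minutes[0], minutes[-1]))
    return phases

# SYNTHETIC samples, one every 30 minutes: two hours idle, two hours busy, repeated.
# A load of 24 stands in for "all 24 cores busy"; the threshold of 12 is arbitrary.
samples = [(m, 0.5 if (m // 120) % 2 == 0 else 24.0) for m in range(0, 480, 30)]
phases = load_phases(samples, busy_threshold=12.0)

for state, start, end in phases:
    print(f"{state:>4}: minute {start} to {end}")
```

Comparing the phase boundaries against the scrub progress reported by `zpool status` at the same times would show whether the busy periods line up with the scrub actually advancing.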

Is this a normal load pattern for a scrub? Why is it doing this? Would adding *even more* memory help at all?

When the system load is at 100% the system is barely usable. Can I use the "Resilver Priority" setting to avoid 100% system load during work hours?
 

Attachments

  • 2018-02-23-090452.png
    39.5 KB · Views: 180

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What version of FreeNAS?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Okay, first of all, 11.1-U0 has several extremely nasty bugs. Definitely update to 11.1-U1 or later ASAP.

I believe you may be hitting one of them in a less severe form. On some people's pools, scrubbing would cause a near deadlock and effectively halt the system. It seems fairly likely that in your case the deadlock eventually resolves itself (your system is rather grunty, so brute force might be enough to overcome it).
 

jallis

Dabbler
Joined
Sep 13, 2016
Messages
12
Thank you for the feedback. I'll update as soon as this scrub is finished :)

Are you also able to answer my question about using the "Resilver Priority" setting?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I expect that it will be an academic question after the update. Any performance limits your applications still hit would then come down to IOPS, in which case you can try the Resilver Priority option. It's fairly recent, so there's not much data on it yet.
 