ZFS "ARC" is getting smarter with version 2.2+ :smiley_face_emoji:

Joined: Oct 22, 2019 · Messages: 3,641
This is a continuation and update of this thread:

Looks like OpenZFS 2.2+ introduces cleaned-up and rewritten code¹ to handle ARC data/metadata eviction from RAM more intelligently and gracefully. It's such a major change to the code that it will not be backported to version 2.1.10.

We'll also be able to use a new tunable parameter² named zfs_arc_meta_balance, which defaults to a value of "500". (Any value over "100" prioritizes metadata, and the higher you set this value, the lower the pressure for metadata eviction. Rest assured, it can never be set to a value that will absolutely prevent metadata eviction.)

In other words, we should see less abrupt metadata eviction as the ARC "adapts" to your workflow, and there's always the option to tweak this value higher or lower to guide the ARC's logic to best suit your general needs. :cool:
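
For anyone who wants to experiment once 2.2 lands, here's a rough sketch of what that looks like from a shell on a Linux-based system such as SCALE. The value 1000 is purely illustrative, and on Core the equivalent knob should surface under sysctl (check sysctl -a | grep meta_balance):

Code:
# Read the current balance (the default is 500)
cat /sys/module/zfs/parameters/zfs_arc_meta_balance

# Temporarily favor metadata even more strongly (as root; reverts on
# reboot, so persist it with a Pre-Init script if you decide to keep it)
echo 1000 > /sys/module/zfs/parameters/zfs_arc_meta_balance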

This is a feature I'm happily looking forward to!

If it works as intended, not only will it address a longstanding issue with aggressive metadata eviction, but it also keeps things simpler and cleaner with less tweaking involved. The ARC will behave more closely to its namesake of being truly "adaptive".


Here is a graphical representation of how to use this new parameter, as described by its developers:

[Image: zfs_arc_meta_balance-dial.png]


Because it defaults to 500, ZFS 2.2.x prioritizes metadata in the ARC without any adjustment or tweaking. The idea is that you can "dial it lower or higher", depending on your needs. (Any value below 100 will favor data over metadata.)
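
If you're curious whether the default suits your workload, you can peek at how much of the ARC currently holds metadata versus data before turning the dial. A rough sketch (kstat field names can vary slightly between ZFS versions):

Code:
# SCALE (Linux): how much of the ARC currently holds data vs. metadata
grep -E '^(data|metadata)_size' /proc/spl/kstat/zfs/arcstats

# Core (FreeBSD): the same counters are exposed as sysctls
sysctl kstat.zfs.misc.arcstats.data_size kstat.zfs.misc.arcstats.metadata_size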


[1] https://github.com/openzfs/zfs/pull/14359

[2] https://openzfs.github.io/openzfs-docs/man/4/zfs.4.html#zfs_arc_meta_balance
 

NickF (Guru)
Joined: Jun 12, 2014 · Messages: 763
As a strong supporter of the special metadata VDEV, I find this an interesting development. Rather than worry about this problem, I threw hardware at it, and the results have been impressive.
 
Joined: Sep 10, 2023 · Messages: 2

Hi there, can you help me with four questions:

1. Now that the recent update has made the ARC much smarter, how does your performance on the latest update compare to back when you could add a hard limit? Is it just as fast, is the ARC still evicting some metadata, or is it miraculously faster?

2. Given the benefits realized from forcing ARC to NOT EVICT metadata so aggressively, is there really a need for a Special Metadata device?

3. Would a Special Metadata device be faster than caching metadata in ARC?

4. Have you tried the new tuneable? How well do you like it, compared to the previous tuneable?
 
Joined: Oct 22, 2019 · Messages: 3,641
Hi there, can you help me with four questions:
You only get two questions. To unlock this limitation, please subscribe to the premium plan. Use code WINNIE20 at checkout for 20% off your first year.


1. Now that the recent update has made the ARC much smarter, how does your performance on the latest update compare to back when you could add a hard limit? Is it just as fast, is the ARC still evicting some metadata, or is it miraculously faster?
This was actually a regression that was corrected with another hotfix release. It wasn't affecting metadata only: it affected the ARC in general. Massive evictions were hurting performance and RAM efficiency. It's no longer an issue now with the latest releases of TrueNAS Core and SCALE. (It's also unrelated to the aggressive metadata eviction issue.)


2. Given the benefits realized from forcing ARC to NOT EVICT metadata so aggressively, is there really a need for a Special Metadata device?
That's a matter of opinion. Personally, I found no need for a Special (Metadata) VDEV. It requires an additional device (which in turn requires redundancy) and introduces more complexity into the pool: if you lose the Special VDEV, you lose the pool. Once aggressive metadata eviction was resolved (with the tuneable), the ARC did what it does best and leveraged the speed of RAM for ideal performance. Other users may still opt for a Special VDEV, but I'd urge them to first resolve the ARC issue, and then gauge whether they really do need a Special VDEV or L2ARC.

Always increase your RAM, if possible (and affordable), before tinkering with Special VDEVs. As it stands now, you can play with the tuneable in the original thread to find a sweet spot that prevents aggressive metadata eviction. However, this will soon become moot, since such tweaks will be replaced by a simplified universal "dial" in OpenZFS 2.2 (see the first post in this thread).
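
If you're unsure whether you even have a problem worth throwing hardware at, the ARC's own statistics usually tell the story. A rough sketch using the arc_summary and arcstat scripts that ship with OpenZFS (exact output labels vary between versions):

Code:
# Overall ARC size, hit ratios, and metadata usage
arc_summary | grep -iE 'hit ratio|arc size|meta'

# Watch hits and misses live (every 5 seconds) while running your usual workload
arcstat 5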


3. Would a Special Metadata device be faster than caching metadata in ARC?
The ARC lives in RAM, so it will always be faster than any HDD, SSD, or NVMe device. The same principle applies to an L2ARC vdev (i.e., the "secondary cache"): it too is slower than RAM.


4. Have you tried the new tuneable? How well do you like it, compared to the previous tuneable?
OpenZFS 2.2 has not been released yet. As of last week it was at the RC4 (release candidate) stage, so it's probably coming very soon. I'm definitely going to share my experience with it once it lands in TrueNAS Core. (Can't speak for SCALE, sorry.)


Keep in mind that even when OpenZFS 2.2 lands in Core and SCALE, memory management is different on SCALE, so even with this new tuneable you might not see an obvious benefit there. (For example, SCALE currently limits the ARC to a maximum of 50% of available RAM. That affects not only metadata eviction, but also how much data can be held in the ARC overall.)
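
For what it's worth, here's a rough sketch of how you could check (and temporarily raise) that cap on SCALE. The 24 GiB figure is purely an example, not a recommendation, and you'd still want to leave headroom for apps and services:

Code:
# Current ARC ceiling in bytes (0 means "use the built-in default")
cat /sys/module/zfs/parameters/zfs_arc_max

# Example only: raise the cap to 24 GiB (as root; reverts on reboot,
# so persist it with a Pre-Init script if you keep it)
echo 25769803776 > /sys/module/zfs/parameters/zfs_arc_max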
 
Joined: Sep 10, 2023 · Messages: 2
Thanks for the response.

It provided some much-needed clarity. From what you've mentioned, the new version of ZFS has not been released yet, meaning ARC behavior is still the same as in your initial post: aggressive metadata eviction.

I'm on scale, and I'd like to try the tuneable for assigning ram to the metadata in ARC.

Are you familiar with the process for Scale?
I've tried searching online for someone implementing this on scale but haven't had much luck.
 
Joined: Oct 22, 2019 · Messages: 3,641
Meaning, ARC behavior is still the same as it was from your initial post - aggressive metadata evictions.
Correct, unless you use the tuneable (from the original thread).


I'm on scale, and I'd like to try the tuneable for assigning ram to the metadata in ARC.

Are you familiar with the process for Scale?
Since SCALE is based on Linux, the name / format is a bit different. The correct format was discovered halfway through the original thread.


You'll have to manually invoke (as the root user):
Code:
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_meta_min

* Replace 4294967296 with your desired value in bytes. (A quick way to compute this is shown after the examples below.)

For example:
  • 4294967296 = 4 GiB
  • 6442450944 = 6 GiB
  • 8589934592 = 8 GiB
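
The byte values are just GiB × 1024 × 1024 × 1024, so you can compute any other size with a quick shell one-liner:

Code:
# 6 GiB expressed in bytes
echo $((6 * 1024 * 1024 * 1024))
# prints: 6442450944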

To persist across reboots, you can have this command run as a "Pre-Init" script. Here is the particular post for reference, which works on SCALE.

The Init/Shutdown page in TrueNAS SCALE:
[Screenshot: AddInitShutdownScriptConfigScreen.png]




On TrueNAS Core 13.0-U5.3, I'm sticking with 6 GiB, since I greatly prioritize metadata over everything else in RAM. It has worked wonderfully. In reality, total metadata in the ARC has never exceeded 4 GiB (yet), but 6 GiB is a nice value with room to breathe, and it doesn't affect userdata in the ARC. The 6 GiB isn't a "hard reservation"; it's a threshold, and only when metadata exceeds it will aggressive eviction of metadata commence. Even though only 4 GiB of metadata actually lives in my ARC, the remaining 2 GiB can still be used for other data; those 2 GiB will simply prioritize metadata over userdata if the two compete for space.
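
If you want to see how your own system compares against that threshold (on ZFS 2.1.x, where these counters still exist), something along these lines should work; exact kstat names can differ between versions:

Code:
# TrueNAS Core (FreeBSD): ARC kstats are exposed as sysctls
sysctl kstat.zfs.misc.arcstats.arc_meta_used kstat.zfs.misc.arcstats.arc_meta_min

# TrueNAS SCALE (Linux): the same counters live in procfs
grep -E '^arc_meta_(used|min)' /proc/spl/kstat/zfs/arcstats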


* Keep in mind none of the above will be relevant once OpenZFS 2.2 lands in TrueNAS, because...
  • The (old) ZFS parameter will no longer exist
  • There will (hopefully) no longer be erratic, aggressive metadata eviction without external pressure
  • The default behavior will prioritize metadata
  • A new tuneable parameter will be available, which works like "turning a dial"
 
Joined: Oct 22, 2019 · Messages: 3,641
Since OpenZFS 2.2 was released today, the tweaks and tuneables used for ZFS 2.1.x are no longer necessary (nor available) on systems running 2.2.x.

The only relevant tuneable applies to the newly introduced zfs_arc_meta_balance, as described in the opening post of this thread. Think of it as a "dial" that you adjust according to your needs.



 

artstar (Dabbler)
Joined: Jan 10, 2022 · Messages: 36
As promising as this seems, my rsync push performance appears to have suffered even more.

The graph below shows very clearly what is happening, starting from the ritual 4 am rsync and running for approximately 16 hours.

Reading these graphs with my 32 GB of RAM, how can I determine the minimum amount of RAM I would need to improve this performance? Are there additional metrics I can gather manually to see whether I need 64 GB or 128 GB of RAM? Server memory isn't cheap, so I want to be sure I'm not pouring money into diminishing returns.

Thanks in advance!

[Image: 1700961894994.png]
 
Joined: Oct 22, 2019 · Messages: 3,641
Having trouble reading the charts. Are you on SCALE? Does each rsync task actually last for... 16 hours?
 

artstar (Dabbler)
Joined: Jan 10, 2022 · Messages: 36
Having trouble reading the charts. Are you on SCALE? Does each rsync task actually last for... 16 hours?
Truly! It starts at 4 am and is done at around 8 pm. And yes, this is SCALE; Bluefin and Cobia both have the same problem. The target unit is a Synology NAS running DSM 7.2, but I also tested against an Ubuntu lab server for a few weeks to determine whether the receiver, rather than my TrueNAS appliance, was the problem.
 

masterjuggler (Dabbler)
Joined: Jan 19, 2022 · Messages: 10
With this change, it's still worth running a startup job with something like find /mnt > /dev/null to build the metadata on each boot, right?
 
Joined: Oct 22, 2019 · Messages: 3,641
With this change, it's still worth running a startup job with something like find /mnt > /dev/null to build the metadata on each boot, right?
I would not. Here's why:

That command will crawl your entire pool (or pools), even directories and files that you rarely access, and it might not even read all of the metadata that you want held in the ARC.

What would be preferable is to either (a) only crawl the root directories you use often, such as those served via rsync, SMB, NFS, etc.; or (b) let your daily routine re-cache the metadata into the ARC through normal usage, including a dry-run with rsync.
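
If you do want a warm-up job, a targeted sketch might look something like this (the dataset paths and destination host are hypothetical placeholders, not a recipe):

Code:
# Crawl only the shares you actually serve, instead of all of /mnt
find /mnt/tank/photos /mnt/tank/documents > /dev/null

# Or let a no-op rsync pass touch the same metadata your nightly job needs
rsync -a --dry-run /mnt/tank/photos/ backup-host:/volume1/photos/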
 