ZFS "ARC" doesn't seem that smart...

Joined
Oct 22, 2019
Messages
3,641
What was your final 'reasonable floor'? 4GB? or +16GB?
4 GiB is working beautifully for me, even still to this day.


I traversed my directory tree and watched the size of ARC metadata.
Yet I don't see it growing particularly much. I noticed the NFS share directory traversal became snappier, but I expected arcstats.metadata_size to have at least approached the new floor of 4 GB. It does not; it stays at 1.35 GB.

Mine doesn't go much past 2 GiB. The reason I'm sticking with 4 GiB is just in case I need the extra breathing room in the future. Technically, I could set mine to 2.1 GiB, and I would still be cruising. :smile: But there's no harm done with 4 GiB, since it might be needed later (as more and more files populate the filesystems, and hence, grow the metadata.)

In your case, all your metadata fits within 1.35 GiB (assuming there are no other filesystems on the TrueNAS server in question which haven't yet been traversed.) In my case, from three different clients (three different datasets), all the metadata fits within 2.1 GiB.

What setting mine to 4 GiB really means is: "I'm happy with metadata taking up to 4 GiB of the ARC for itself if it's needed." For now, it's only taking up about 2 GiB in the ARC, not 4 GiB. But should it ever need to take up 4 GiB, it has the means/permission to do so. :wink:

I suppose "floor" isn't the right term to use. A more accurate way to refer to it is perhaps "allowable ceiling before ZFS starts aggressively evicting metadata from the ARC."


Additionally, I stumbled upon this, which might be of interest to your testing too:
I believe I tried those parameters as well and found no difference (others from different communities shared the same grievances).

The tunable you proposed only covers a specific part of the metadata, not "metadata in the ARC in general".


For what it's worth, finding the appropriate value for vfs.zfs.arc.meta_min does the trick, which you've also experienced yourself by traversing and listing directories via NFS (and I've noticed with rsync listings, as well as browsing SMB shares with folders containing thousands of files.)
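For reference, this is roughly what the change looks like on CORE; a minimal sketch assuming shell access (and, I believe, it also needs to be added as a sysctl-type Tunable in the GUI if you want it to persist across reboots):

Code:
# Set a 4 GiB metadata "floor" (4 GiB = 4294967296 bytes) on CORE/FreeBSD
sysctl vfs.zfs.arc.meta_min=4294967296

# Verify the live value and the current metadata footprint in the ARC
sysctl vfs.zfs.arc.meta_min kstat.zfs.misc.arcstats.metadata_size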



Setting it to 4 GiB seems to be the best-case scenario. Here's why I believe that:

Metadata is separate from raw user data (which is what really demands large amounts of RAM), so for any system with 32+ GiB of memory and typical use cases, it's unlikely the total metadata will exceed 4 GiB. That means on a 32 GiB system, even if the entire 4 GiB in the ARC somehow gets saturated with pure metadata, that's still only about 12% of total system memory. For 64 GiB of RAM? Only about 6%. And that's the highest possible "cost" for a snappier system with a performance boost: immensely faster rsync tasks, much faster directory listings and metadata reads, and snappier browsing over NFS and SMB.
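(The arithmetic behind those percentages, if anyone wants to double-check it:)

Code:
# Worst case: metadata fills the entire 4 GiB allowance
echo "scale=1; 4*100/32" | bc   # 12.5 -> ~12% of a 32 GiB system
echo "scale=1; 4*100/64" | bc   # 6.2  -> ~6% of a 64 GiB system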

I consider it a "win". :cool:
 
Last edited:

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Very nice!

I'm attempting to get a grasp on actually validating the use case of an L2ARC, as I'm currently trigger-happy e-shopping... if I were to set one up and configure it as metadata=only, would I not see more than a few GBs of use out of it?
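(For reference, "metadata=only" here would mean the dataset-level secondarycache property; a minimal sketch, with "tank/dataset" as a placeholder name:)

Code:
# Restrict L2ARC caching to metadata for a given dataset ("tank/dataset" is a placeholder)
zfs set secondarycache=metadata tank/dataset
# Confirm the setting
zfs get secondarycache tank/dataset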

edit:
I stressed the system a bit harder than previously. I did a full NFS -> properties on share -> check the number of files from the client.

I believe approximately 20 TB and probably half a million files should be covered, and... that rush took a higher toll on the ARC hit ratio.
[Attached screenshot: ARC hit-ratio graph]



Seeing this, I expect the metadata to start filling up, sort of validating the "need" for an L2ARC.
However, looking frequently at kstat.zfs.misc.arcstats.metadata_size during the scan, I see the following:
Code:
After the first few TBs checked:                         3411508736
Peaks, then the operation crawls (less than 1k files
  scanned per second, dropping to ~200 files/sec later): 3449173504
Eviction starts happening??                              3425596928
More eviction:                                           3388564992
Some 10 minutes later:                                   2905722368
Less than halfway through the scan:                      3054305792

Am I not looking at evictions here, even within a timespan of only some 15 minutes?

As ARC evictions seem to occur, is it correct to assume there is no improvement to be gained by installing an L2ARC, either as metadata=only or in a normal configuration?


(realizing the second question is approaching the related topic here https://www.truenas.com/community/threads/impact-of-svdev-on-rsync-vs-l2arc.93371/page-2)

edit:
Rerunning the same "calculate number of files and size" over NFS for a second run is FAR SLOWER than the first time. That has me really confused now.
 

Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I could try lowering the value from 4 GiB to 3 GiB, and do more tests, but 4 GiB seems like a comfortable amount with enough breathing room for future uses.
I would be very curious to see your arc_summary output and in particular the "Metadata cache size (current):"

I would expect that this must not be larger than the tunable setting, or it won't do any good.
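A quick way to put the two side by side on CORE (a sketch; both values are reported in bytes):

Code:
# Current metadata footprint vs. the meta_min tunable
sysctl kstat.zfs.misc.arcstats.metadata_size vfs.zfs.arc.meta_min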
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I would be very curious to see your arc_summary output and in particular the "Metadata cache size (current):"
I would expect that this must not be larger than the tunable setting, or it won't do any good.
Here's what mine looks like. (~20 TB, ~1.5 million files, 32 GB RAM, no L2ARC [but curious about the topic, as I'm rebuilding a system with 128 GB atm])

Code:
arc_summary | grep "Metadata cache"
        Metadata cache size (hard limit):              75.0 %   23.2 GiB
        Metadata cache size (current):                 17.9 %    4.2 GiB

At the same moment in time:
Code:
sysctl -a | grep kstat.zfs.misc.arcstats.metadata_size && date
kstat.zfs.misc.arcstats.metadata_size: 3061378560


Do I interpret this correctly: does this mean I'm consuming 4.2 GiB as metadata, which in turn uses approximately 3 GiB of ARC to remain indexed?
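(For what it's worth, converting that counter, assuming it's reported in bytes:)

Code:
# 3061378560 bytes expressed in GiB
echo "scale=2; 3061378560 / 1024^3" | bc   # prints 2.85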
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Interestingly, I was seeing different numbers on my system before applying any setting:
kstat.zfs.misc.arcstats.metadata_size: 3482709504

with

Code:
        Metadata cache size (hard limit):              75.0 %   95.1 GiB
        Metadata cache size (current):                  7.0 %    6.7 GiB


I don't think I've done anything to cause a change of any significance yet after adding the 4GB setting, but we'll see soon enough.

The first change I see is a small reduction:
kstat.zfs.misc.arcstats.metadata_size: 3203688960

And correspondingly:
Code:
        Metadata cache size (hard limit):              75.0 %   95.1 GiB
        Metadata cache size (current):                  6.7 %    6.4 GiB
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
And to continue the story a bit... it continued to go down, so I changed the tunable to 8GB and following that it started to climb instead:

kstat.zfs.misc.arcstats.metadata_size: 3330087424 (after having dropped to 3177687040 at the low-point just before I changed to 8GB)

To answer a point from earlier, I'm not 100% sure of the link between the tunable, the kstat.zfs.misc.arcstats.metadata_size value, and the "Metadata cache size (current)" value, but what I wanted to do was confirm my suspicion that if you set the tunable below the current amount, you won't be helping to hold metadata in the ARC.
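(For reference, "8GB" here is 8589934592 bytes; applying it live on CORE would look something like this sketch:)

Code:
# 8 GiB = 8 * 1024^3 bytes
sysctl vfs.zfs.arc.meta_min=8589934592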
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
I would like to set the zfs.arc.meta_min variable in SCALE; what is the correct name there? A post in this thread implied it differed between CORE and SCALE.

Is there a reference available where you can look up tunables for SCALE?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I would like to set the zfs.arc.meta_min variable in SCALE; what is the correct name there? A post in this thread implied it differed between CORE and SCALE.

Is there a reference available where you can look up tunables for SCALE?
I expect zfs.arc.meta_min to be the one used in SCALE.
CORE: vfs.zfs.arc.meta_min
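One way to confirm the exact name on SCALE (Linux) would be to list the ZFS module parameters; a sketch, assuming shell access:

Code:
# On Linux, OpenZFS module parameters are exposed under /sys/module/zfs/parameters
ls /sys/module/zfs/parameters/ | grep -i meta_min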
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
I expect zfs.arc.meta_min to be the one used in SCALE.
CORE: vfs.zfs.arc.meta_min
I tried that one:
[Attached screenshot: arc_summary output]


After a reboot nothing appears to have changed:
Code:
# arc_summary | grep meta_min      
        zfs_arc_meta_min                                               0
 
Joined
Oct 22, 2019
Messages
3,641
I would be very curious to see your arc_summary output and in particular the "Metadata cache size (current):"

I would expect that this must not be larger than the tunable setting, or it won't do any good.

It's always remained comfortably below 4 GiB.

Code:
kstat.zfs.misc.arcstats.metadata_size: 2296452608

Metadata cache size (current):                 13.6 %    3.2 GiB


Though I'm confused about the discrepancy between the two numbers.

Why does kstat.zfs.misc.arcstats.metadata_size report approximately 2.1 GiB, yet "metadata cache size" report 3.2 GiB?




I would expect that this must not be larger than the tunable setting, or it won't do any good.

As you can see from both reports, it stays below the tunable that I set to 4 GiB (which I refer to as the sweet spot). Likewise, I have not experienced any aggressive metadata eviction (from the ARC) ever since setting this parameter.

However, my interpretation is that it still does come in handy even if the 4 GiB threshold is exceeded, because I'm assuming that it will not evict all metadata from the ARC. If my system gets to a point where housing all my metadata exceeds the 4 GiB threshold, then I'm left with two considerations:
  • Should I accept that anything beyond 4 GiB (in regards to metadata and snappier operations) should fall back to the default ZFS behavior of prioritizing userdata over metadata in the ARC? (I.e., "Fine, fine, after 4 GiB, go ahead and start evicting metadata if it's really that important at this point.")
  • Should I consider bumping up this tuneable to maybe 6 GiB? 8 GiB? How important is it to keep as much metadata in my ARC as is possible? Where should I draw the line in which I begin to accept the first bullet-point above?


Right now with my use-case for a system with 32 GiB of RAM and my current datasets/operations, it looks like 4 GiB is the sweet spot and even has room to breathe if my metadata needs slowly grow in the foreseeable future.

This value (4 GiB) also seems to be a good starting point: either (a) it won't all be required, in which case your metadata takes up even less space in the ARC and the unused space remains available for userdata, or (b) you'll notice you easily exceed 4 GiB from normal usage, rsyncs, browsing, etc., in which case you can start to consider the above points on your own.
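If anyone wants the raw numbers for those candidate thresholds, the tunable takes a value in bytes; a quick sketch to compute them:

Code:
# GiB -> bytes for the thresholds discussed above
for gib in 4 6 8; do echo "$gib GiB = $((gib * 1024 * 1024 * 1024)) bytes"; done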



The real challenge is: what approach would work best for a generic TrueNAS installation? I have a few ideas, which I will list later, and perhaps file a feature request in Jira.
 
Joined
Oct 22, 2019
Messages
3,641
After a reboot nothing appears to have changed:
Not sure with SCALE.

Anyone using SCALE know the correct variable name?

UPDATE: I would try using what @HoneyBadger wrote in an earlier post in this thread:

arc_meta_min

Try this variable instead, without the "zfs." at the start of it.


Or maybe with all underscores, like this (to match the variable name):

zfs_arc_meta_min

It's these differences in naming and formatting that really get to me... o_O
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
Rerunning the same "calculate number of files and size" over NFS for a second run is FAR SLOWER than the first time. That has me really confused now.
I think I know what might be happening.

Something about calculating the total number of files and used space (on a share) needs to pull up additional metadata, versus simply navigating, browsing, and rsync'ing the same filesystem tree. (What that "additional" metadata stuff is, I'm not really sure, and it's beyond my simple understanding. But it seems to add an extra toll that is above and beyond rsync crawls, browsing, directory tree listings, etc?)

Because when I re-created what you did (albeit over an SMB share with 750,000 files), I did see my "Metadata cache size" shoot up an extra half GiB.

So in your case, you might be exceeding the 4 GiB threshold, and ZFS begins to evict metadata from your ARC, which will subsequently slow down other metadata-heavy operations.
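(To be clear about the kind of client-side operation we're talking about: it's essentially a full-tree walk that stats every entry; a rough sketch, with "/mnt/share" as a placeholder mount point:)

Code:
# Roughly what "check number of files and total size" boils down to on the client
find /mnt/share -type f | wc -l   # count files (stats every directory entry)
du -sh /mnt/share                 # total size (walks and stats the tree again)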
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
Not sure with SCALE.

Anyone using SCALE know the correct variable name?

UPDATE: I would try using what @HoneyBadger wrote in an earlier post in this thread:

arc_meta_min

Try this variable instead, without the "zfs." at the start of it.

Or maybe with all underscores, like this (to match the variable name):

zfs_arc_meta_min

It's these differences in naming and formatting that really get to me... o_O
It's listed as zfs_arc_meta_min in arc_summary, so I tried that; sadly, you immediately get an error message when trying to save it:

Code:
Value can start with a letter and end with an alphanumeric. A period (.) once is a must. Alphanumeric and underscore characters are allowed
 
Joined
Oct 22, 2019
Messages
3,641
It's listed as zfs_arc_meta_min in arc_summary, so I tried that; sadly, you immediately get an error message when trying to save it:

Code:
Value can start with a letter and end with an alphanumeric. A period (.) once is a must. Alphanumeric and underscore characters are allowed

What about:

zfs.arc_meta_min

UPDATE: What do you get if you grep with sysctl:

Code:
sysctl -a | grep meta_min
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
What about:

zfs.arc_meta_min

UPDATE: What do you get if you grep with sysctl:

Code:
sysctl -a | grep meta_min
With zfs.arc_meta_min there's no difference:
Code:
# arc_summary | grep meta_min
        zfs_arc_meta_min                                               0


Code:
sysctl -a | grep meta_min

I tested that earlier, hoping to find the name of the tunable, but it just returns to the prompt with no new output.

Looking at the OpenZFS page, the tunable should take effect immediately, since "Change" is listed as "Dynamic". I have tried rebooting just to be sure, but I guess that should not be necessary. I just need to find the correct activation mojo.
 
Joined
Oct 22, 2019
Messages
3,641
What about skipping the GUI (in SCALE) and trying to see if you can set the variable manually, and then check if it reports the updated value with "arc_summary"?
sysctl -w zfs.arc.meta_min=4294967296

And then checking to see if it is updated:
arc_summary | grep meta_min

EDIT: And if that still doesn't work, maybe try the same format as with CORE?
sysctl -w vfs.zfs.arc.meta_min=4294967296
 
Last edited:

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
What about skipping the GUI (in SCALE) and trying to see if you can set the variable manually, and then check if it reports the updated value with "arc_summary"?

sysctl zfs.arc.meta_min=4294967296

And then checking to see if it is updated:

arc_summary | grep meta_min
Code:
]# sysctl zfs.arc.meta_min=4294967296
sysctl: cannot stat /proc/sys/zfs/arc/meta_min: No such file or directory

Hmm.

Edit: While it may have been implied, I'll add that I confirmed arc_summary was not updated with the change.
 
Last edited:

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
I found a file called zfs_arc_meta_min with the contents of "0" here:
/sys/module/zfs/parameters/

However, I am hesitant to change it. Shouldn't this be something that is changed in the UI?

Edit: I decided to do this:
Code:
echo 4294967296 >> /sys/module/zfs/parameters/zfs_arc_meta_min

And verified it with:
Code:
# arc_summary | grep meta_min
        zfs_arc_meta_min                                      4294967296

So it's there. But this seems like a hack. I'd rather apply it with the UI so that it survives export/import of the configuration, updates, and so on.
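If the UI keeps rejecting the name, one interim workaround (a sketch, assuming SCALE's init/shutdown-script facility can run a post-init command) would be to reapply it at boot:

Code:
# Hypothetical post-init command: reapply the module parameter after each boot
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_meta_min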
 
Joined
Oct 22, 2019
Messages
3,641
Oops! I forgot the -w

sysctl -w

I'd rather apply it with the UI so that it survives export/import of the configuration, updates, and so on.

Maybe it’s an issue with SCALE’s GUI then? I don’t have SCALE to test this out.
 
Last edited:

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I think I know what might be happening.

Something about calculating the total number of files and used space (on a share) needs to pull up additional metadata, versus simply navigating, browsing, and rsync'ing the same filesystem tree. (What that "additional" metadata stuff is, I'm not really sure, and it's beyond my simple understanding. But it seems to add an extra toll that is above and beyond rsync crawls, browsing, directory tree listings, etc?)

Because when I re-created what you did (albeit over an SMB share with 750,000 files), I did see my "Metadata cache size" shoot up an extra half GiB.

So in your case, you might be exceeding the 4 GiB threshold, and ZFS begins to evict metadata from your ARC, which will subsequently slow down other metadata-heavy operations.

I tested the idea of the 4 GiB threshold being surpassed. I bumped the tunable to 8 GiB and ran the same "check file numbers/size" over NFS.
With the help of a one-liner:

Code:
while true; do (date &&  sysctl -a | grep kstat.zfs.misc.arcstats.metadata_size) && sleep 10;  done
kstat.zfs.misc.arcstats.metadata_size: 4751194624
Fri May 27 09:49:27 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4792864256
Fri May 27 09:49:37 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4833134592
Fri May 27 09:49:47 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4872423424
Fri May 27 09:49:57 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4915468288
Fri May 27 09:50:08 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4946579456
Fri May 27 09:50:18 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4931018752
Fri May 27 09:50:28 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4751501312
Fri May 27 09:50:38 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4738164224
Fri May 27 09:50:48 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4724104704
Fri May 27 09:50:59 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4714394112
Fri May 27 09:51:09 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4698387456
Fri May 27 09:51:19 CEST 2022
kstat.zfs.misc.arcstats.metadata_size: 4692469248

I find that the number did surpass 4 GiB, but came nowhere close to 8 GiB.
Still, it is being evicted.

I believe I had too high hopes of being able to "prime" the system.

edit: 1.5hrs later, it is back down to 3108356608
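(If I keep poking at this, the metadata hit/miss counters are probably worth watching alongside the size, assuming they're exposed the same way as metadata_size; a sketch:)

Code:
# Metadata demand hits vs. misses, to gauge how well the ARC serves metadata over time
sysctl kstat.zfs.misc.arcstats.demand_metadata_hits kstat.zfs.misc.arcstats.demand_metadata_misses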
 
Last edited: