Is L2ARC (even with secondarycache=metadata) Not A Suitable Metadata Special Device Replacement?

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
I keep seeing metadata-only L2ARC offhandedly thrown around as a safe alternative to a metadata special VDEV, but I've never seen anyone actually post comparisons of their performance or behavior. I've tried both over the last week or so (switching back and forth a few times, rebuilding the pool as necessary), and I have not been able to get L2ARC to work as an actual replacement for a metadata special VDEV. Note that I've got a pair of Optane P1600X drives I purchased specifically for a metadata special VDEV mirror, but given the advantages I've more recently read about (safer, plus the ability to remove it later), I'd really like to get them working as a metadata-only L2ARC if at all possible.

When I do a "/bin/ls -lahR /mnt/", I get a sub-2-minute list time when I have a metadata special VDEV set up. However, regardless of how I configure it, I cannot get the persistent L2ARC under 18 minutes on a fresh boot (subsequent runs before the next reboot are extremely fast, of course, since everything is then cached in ARC).
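For reference, the comparison is roughly this (assuming TrueNAS SCALE, where the OpenZFS kstats live under /proc/spl/kstat/zfs; the grep is just how I've been checking whether the listing was served from L2ARC at all):

# time the full recursive metadata walk on a fresh boot
time /bin/ls -lahR /mnt/ > /dev/null
# then see whether the L2ARC served any of it
grep -E '^l2_(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats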

I've tried both secondarycache=all and secondarycache=metadata.

I have set both l2arc_noprefetch=0 and l2arc_headroom=0. Also, since I am running TrueNAS SCALE, l2arc_rebuild_enabled is on by default.
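For anyone wondering how these get applied on SCALE: they are Linux ZFS module parameters rather than sysctls, so something along these lines, re-applied after each boot (e.g. from a post-init command):

echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
echo 0 > /sys/module/zfs/parameters/l2arc_headroom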

I've verified after a reboot that these settings took, using arc_summary and zfs get secondarycache {poolname}.
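Concretely, the post-reboot checks look like this ({poolname} as above):

cat /sys/module/zfs/parameters/l2arc_noprefetch /sys/module/zfs/parameters/l2arc_headroom
zfs get secondarycache {poolname}
arc_summary | grep -i l2arc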

How does one get this so-often-recommended alternative to actually work as a metadata special VDEV replacement? It's very possible I'm doing something wrong, but given how often this is suggested without any caveats, I would have thought it would work without additional configuration.
 

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
No ideas? Given how many "just use L2ARC with secondarycache=metadata!" responses I've seen to people mentioning metadata special devices while researching this, I would've thought there'd be at least someone who's found it to work effectively.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
No ideas? Given how many "just use L2ARC with secondarycache=metadata!" responses I've seen to people mentioning metadata special devices while researching this, I would've thought there'd be at least someone who's found it to work effectively.

It sure feels like you're not trying that hard. I posted response #59 in


which is a thread in which you asked effectively the same question, AND WHICH YOU THEN ALSO REPLIED TO. I gave three timing examples showing access to a large amount of metadata in about a second, from a system where secondarycache=metadata has been in use for... maybe as much as two years?
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
467
What you see is the expected behavior - except the part where your system doesn't seem to honor the persistence of the L2ARC. That seems to be an issue in its own right - and an important one.
 

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
It sure feels like you're not trying that hard. I posted response #59 in


which is a thread in which you asked effectively the same question, AND WHICH YOU THEN ALSO REPLIED TO. I gave three timing examples showing access to a large amount of metadata in about a second, from a system where secondarycache=metadata has been in use for... maybe as much as two years?
My bad, I'd only seen (and replied to) your first post (#58, the one you had directed to me).

Just read through your second post. I will have a look at the arcstats when I am back home this evening.

In the meantime, the l2arc_noprefetch=0 parameter should essentially make counters like kstat.zfs.misc.arcstats.mfu_evictable_metadata account for the entirety of the metadata in ARC, correct? Since l2arc_noprefetch=0, if I'm understanding the description correctly, should make the entire contents of ARC eligible for caching to L2ARC.
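For anyone following along: those counters can be read straight out of arcstats on SCALE - this is what I'm watching (kstat.zfs.misc.arcstats.* is the FreeBSD sysctl spelling of the same counters):

# evictable metadata in the MRU/MFU lists, plus overall ARC metadata usage
grep -E '^(mru_evictable_metadata|mfu_evictable_metadata|arc_meta_used|metadata_size) ' /proc/spl/kstat/zfs/arcstats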

EDIT: Speaking of the message I replied with, any idea whether any of these interpretations of the parameters are incorrect? When I looked up the first six referenced, they all seemed to relate to how quickly L2ARC pulls data over from ARC, rather than how much of ARC is eligible to be cached to L2ARC, so I'd be interested in understanding how they would help in this situation. I understand how they would apply in a busy environment where ARC is cycled through quickly (the data would be purged from ARC before L2ARC could grab it). But this system is purely being tested at this stage: it sees no usage outside of a directory/file listing or some other almost entirely metadata-focused task, and only uses 500MB of the available ARC, so I'm not sure how those parameters would affect it.
The following are all related to tuning how quickly L2ARC is populated on a busy server, correct?
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608

The two parameters you mentioned that affect what data is eligible to go into L2ARC are the ones I've changed. Here are those two items and what I understand them to do (do correct me if my understanding of them is wrong):
l2arc_headroom=0
From what I've read, setting this to 0 expands the portion of ARC that L2ARC is eligible to copy data from, from a small window at the "tail" of ARC to the entirety of ARC. With the value set to 0, it should be able to start caching data as soon as it shows up in ARC.

l2arc_noprefetch=0
Setting this to 0 makes prefetched data in ARC eligible for caching to L2ARC as well, rather than only data that has actually been requested, so effectively all data in ARC becomes potentially eligible.

Even with secondarycache=all (and previously with secondarycache=metadata), and with the L2ARC having grown larger than the ARC ever gets under the limited usage it has seen during testing, it is clearly still not caching the metadata - upon reboot, a "/bin/ls -lahR /mnt/" takes just as long as if there were no cache whatsoever.

EDIT2: Just re-read your message above, and realized your point was that you'd given an example of one working. Fair enough.

What you see is the expected behavior - except the part where your system doesn't seem to honor the persistence of the L2ARC. That seems to be an issue in its own right - and an important one.
I'm not certain that it isn't honoring persistence - after a reboot, the ZFS stats under Reporting (I believe that's what the tab is called) still show the L2ARC as populated. I'll need to check those ZFS kstats jgreco referenced to see whether they still contain data. I'm not sure whether this is a matter of L2ARC persistence not working, or of L2ARC never caching the metadata from ARC in the first place - despite settings that should, as I understand them, make the entirety of ARC eligible for caching, and despite having more than ample time to do so regardless of what l2arc_write_max is set to.
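One thing I can do after the next reboot, before touching anything else, is dump the raw L2ARC counters - if persistence is working, l2_size/l2_asize should already be non-zero, and recent OpenZFS also exposes l2_rebuild_* counters for the header rebuild itself:

grep '^l2_' /proc/spl/kstat/zfs/arcstats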
 
Last edited:

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
It sure feels like you're not trying that hard. I posted response #59 in


which is a thread in which you asked effectively the same question, AND WHICH YOU THEN ALSO REPLIED TO. I gave three timing examples showing access to a large amount of metadata in about a second, from a system where secondarycache=metadata has been in use for... maybe as much as two years?
Question about your post: You mentioned that the system was "warmed up." Do you get similarly fast performance when recently rebooted? If not, then the posted results weren't a relevant test - L2ARC is the focus of my posts, not ARC (which is why I moved the question to a separate thread; the previous one was focused on ARC). I, too, am getting good performance when accessing metadata on a warmed-up system, when the metadata is already present in ARC.

-----------------------------------

I've spent the evening checking what my server shows for those metrics you mentioned, under various circumstances. I'm seeing that "arc_meta_used" (1339445536) and "metadata_size" (615198208) both substantially exceed the "L2ARC size (adaptive)" (357.6 MiB, from arc_summary). It is obviously not pulling over all of the metadata, despite the two parameters I have found that seem to be applicable to this:
l2arc_noprefetch=0
l2arc_headroom=0
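The specific comparison, for anyone who wants to reproduce it - ARC's view of metadata versus what has actually landed on the L2ARC device (on SCALE):

# name and value columns from arcstats
awk '$1 ~ /^(arc_meta_used|metadata_size|l2_size|l2_asize)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

l2_size/l2_asize there should roughly track the "L2ARC size (adaptive)" line from arc_summary.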


I also spent some time looking through the OpenZFS GitHub, and it appears that this kind of metadata caching behavior is a ZFS issue that has been ongoing for some time... One example of a few potentially related issues I ran across: https://github.com/openzfs/zfs/issues/12028


------------------------------------

My goal has been to have all metadata cached in L2ARC, so that no matter what happens to ARC (metadata evicted, or a reboot), the metadata would be quickly accessible.

Given that this appears to be a larger, ongoing issue (with ZFS itself, obviously, not TrueNAS-specific), I'm going to resort to running an automated warming task on boot to force metadata into ARC, then set the arc_meta_min parameter to around 2-4GB (as referenced here), and see how things go. If I notice that the metadata is being evicted from ARC after some time regardless of arc_meta_min, I will set up a scheduled periodic warming task. If these don't work satisfactorily for whatever reason, I will return to my original plan of using these two Optane drives as a metadata special device mirror... and revisit this in however many years, whenever I upgrade the pool, to see if it has been fixed on the ZFS side.
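Concretely, the plan is nothing fancier than a post-init (or cron) command along these lines - zfs_arc_meta_min being the Linux module-parameter spelling of the tunable mentioned above, if I've mapped the name correctly:

# pin ~4GB of ARC for metadata, then walk the pool once to pull the metadata in
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_meta_min
/bin/ls -lahR /mnt/ > /dev/null 2>&1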
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Question about your post: You mentioned that the system was "warmed up." Do you get similarly fast performance when recently rebooted?

For the stuff I intended to be cached, yes. The server is only restarted every year or so, and it only has 32GB RAM handling a large pool. Adding a 128GB L2ARC and then a little bit of effort hinting at what datasets would benefit from metadata caching is highly effective.
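The "hinting" is nothing exotic, by the way - just the secondarycache property set per dataset, roughly along these lines (dataset names made up for illustration):

# metadata-only caching for the big archival dataset, nothing for scratch space
zfs set secondarycache=metadata tank/archive
zfs set secondarycache=none tank/scratch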

L2ARC is the focus of my posts, not ARC (which is why I moved the question to a separate thread - the previous one was focused on ARC). I, too, am getting good performance when accessing metadata on a warmed-up system, when the metadata is already present in ARC.

Well, in this context, "warmed up" is a bit different. The cached data has to be read in from the pool HDDs the first time; that's the warmup. Once it's in the ARC system, the L2ARC is still going to be slower than ARC, of course, but the L2ARC can handle thousands of IOPS whereas the pool HDDs can handle far fewer (large RAIDZ3 array, so figure maybe low hundreds of IOPS). The goal is to avoid hitting the pool HDDs unnecessarily. Since it's a large archival pool, there are still a million files I can access that will incur HDD IOPS because they haven't been accessed in a decade.

I've spent the evening checking what my server shows for those metrics you mentioned, under various circumstances. I'm seeing that "arc_meta_used" (1339445536) and "metadata_size" (615198208) both substantially exceed the "L2ARC size (adaptive)" (357.6 MiB, from arc_summary). It is obviously not pulling over all of the metadata, despite the two parameters I have found that seem to be applicable to this:
l2arc_noprefetch=0
l2arc_headroom=0


I also spent some time looking through the OpenZFS GitHub, and it appears that this kind of metadata caching behavior is a ZFS issue that has been ongoing for some time... One example of a few potentially related issues I ran across: https://github.com/openzfs/zfs/issues/12028

Well, you're basically doing the sorts of experimentation that I would do if I cared much about this. In my experience, ZFS requires some experimentation and research to understand the finer points. You've basically surpassed the level of interest I had, so it is likely that you know more than I do at this point, or if not, you soon will. Do feel free to post back anything interesting you learn.
 

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
For the stuff I intended to be cached, yes. The server is only restarted every year or so, and it only has 32GB RAM handling a large pool. Adding a 128GB L2ARC and then a little bit of effort hinting at what datasets would benefit from metadata caching is highly effective.



Well, in this context, "warmed up" is a bit different. The cached data has to be read in from the pool HDDs the first time; that's the warmup. Once it's in the ARC system, the L2ARC is still going to be slower than ARC, of course, but the L2ARC can handle thousands of IOPS whereas the pool HDDs can handle far fewer (large RAIDZ3 array, so figure maybe low hundreds of IOPS). The goal is to avoid hitting the pool HDDs unnecessarily. Since it's a large archival pool, there are still a million files I can access that will incur HDD IOPS because they haven't been accessed in a decade.



Well, you're basically doing the sorts of experimentation that I would do if I cared much about this. In my experience, ZFS requires some experimentation and research to understand the finer points. You've basically surpassed the level of interest I had, so it is likely that you know more than I do at this point, or if not, you soon will. Do feel free to post back anything interesting you learn.
Understood - thank you for taking the time to post your thoughts on this. At this point, I believe I've got a good-enough workaround for my use case not to spend much more time tweaking it. I may revisit later on, but I believe I'm settled for now.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Understood - thank you for taking the time to post your thoughts on this. At this point, I believe I've got a good-enough workaround for my use case not to spend much more time tweaking it. I may revisit later on, but I believe I'm settled for now.
Are all your tweaks mentioned in this thread, or do you have additional ones?
I'm looking to implement your settings, to see where it takes me.

I've already seen a 7x improvement in L2ARC hits after 2 days with
vfs.zfs.l2arc_noprefetch=0
vfs.zfs.l2arc_headroom=0
without taking any other "warm up" actions, i.e. this is just how it unfolds on its own.

Would you maybe consider writing up a little piece on what settings you've settled on, and if possible the reasoning behind each value?

edit: I'm particularly interested in why vfs.zfs.l2arc_headroom=0 would help.
The documentation, the way I read it, suggests the value would rather benefit from being higher than the default? https://openzfs.github.io/openzfs-docs/Performance and Tuning/Module Parameters.html?highlight=l2arc#l2arc-headroom
 
Last edited:

NateroniPizza

Dabbler
Joined
Dec 19, 2022
Messages
14
Are all your tweaks mentioned in this thread, or do you have additional ones?
I'm looking to implement your settings, to see where it takes me.

I've already seen a 7x improvement in L2ARC hits after 2 days with
vfs.zfs.l2arc_noprefetch=0
vfs.zfs.l2arc_headroom=0
without taking any other "warm up" actions, i.e. this is just how it unfolds on its own.

Would you maybe consider writing up a little piece on what settings you've settled on, and if possible the reasoning behind each value?

edit: I'm particularly interested in why vfs.zfs.l2arc_headroom=0 would help.
The documentation, the way I read it, suggests the value would rather benefit from being higher than the default? https://openzfs.github.io/openzfs-docs/Performance and Tuning/Module Parameters.html?highlight=l2arc#l2arc-headroom
Yes, the only settings I've changed are in this thread - I haven't really found others that sound like they would help with this.

For l2arc_headroom=0, see here: https://openzfs.github.io/openzfs-docs/man/4/zfs.4.html - setting it to 0 makes the entirety of the ARC eligible for caching.

For l2arc_noprefetch=0, my understanding of this setting (from the page above, as well as reading other places) is that it makes prefetched data (which sounds like data that hasn't specifically been requested, but that ZFS anticipates may be needed) eligible for caching.

I have more recently set vfs.zfs.arc.meta_min=4294967296, but that is only to work around the difficulties I've run into with this.
 