Odd ARC Memory Usage Behavior in 13.0-U4

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
Meanwhile at Villabajo — things magically dropped overnight by 100GB. Never saw anything like this before, neither with Solaris/OmniOS nor with FreeBSD. The use case is constantly the same: hourly backups from 3 to 5 macOS clients (Time Machine via SMB), daily backups from 2 Linux clients (rsync via NFS), and daily replication to a remote (vanilla) FreeBSD ZFS file server (zfs send via SSH). It's all about scientific data. No DVD archives, no video editing. And of course: no reboots/service daemon restarts.
Just wanted to say that I'm seeing behavior very similar to this, with a very similar use case. A large file write (bigger than the system's RAM) comes in as part of the daily backups, and when that write finishes the system seems to just dump the whole ARC for no reason I can surmise.

Sounds like it's indeed a bug, and I'll be reading through the links. It's not hitting our performance noticeably, but it definitely just seems wrong. Our use case is very light on the cache: mostly throwing a bunch of data at TrueNAS during nightly backups and reading it back to sync it into AWS.
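
For anyone who wants to confirm the same pattern outside the reporting graphs, here is a minimal sketch (assuming a FreeBSD/TrueNAS CORE shell; the kstat.zfs.misc.arcstats.* sysctls are the ARC statistics OpenZFS exports on FreeBSD) that logs the ARC size and target around a backup window:

Code:
# Log ARC size, target size and max every minute; run it in tmux/screen while a backup is writing.
while true; do
    date
    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c kstat.zfs.misc.arcstats.c_max
    sleep 60
done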
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
The patch has been approved for ZFS 2.1.10. The new version will be included in U5, it seems.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Just for fun, last weekend I tried running a 32-bit FreeBSD VM, where I observed this issue much more severely due to extreme memory pressure, since its minuscule amount of RAM can be wiped in a fraction of a second. I found that the patch helps it happily use a stable 1-1.2GB of ARC out of ~1.7GB of kernel memory.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The patch has been approved for ZFS 2.1.10. The new version will be included in U5, it seems.
Which means U4 is kinda dead. Sad.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Which means U4 is kinda dead. Sad.
The extra arc_reduce_target_size() call has been there since the switch to OpenZFS in FreeNAS 12. The issue can't be new; maybe some of its triggers have changed inadvertently. If it is so annoying, I think a small explicit reduction of the maximum ARC size may help to reduce memory pressure and the frequency of these events.
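
For example, a minimal sketch of such a cap (the 48GiB value is purely illustrative and should be sized to your own system; vfs.zfs.arc.max is the sysctl as exposed by OpenZFS on FreeBSD, also reachable under its legacy name vfs.zfs.arc_max):

Code:
# Illustrative only: cap the ARC at 48GiB (51539607552 bytes) at runtime.
sysctl vfs.zfs.arc.max=51539607552
# On TrueNAS CORE this can be persisted as a sysctl-type tunable in the GUI.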
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
The code logic issue itself may not be new. But the effect is. For sure. I tested it.

It seems (as already written above) that some recent improvements/changes in other modules/functions concerning metadata handling had unintended effects on the (old) ARC code. That's why the ARC prune/purge wasn't triggered before.

And (again, as already mentioned): a complete rewrite of the ARC function is underway for ZFS 2.2, as Alexander Motin mentioned in the Jira ticket. That the old function has been in the ARC since version xy doesn't mean it was right all the time.

BTW: What "helps" is a

Code:
sysctl vfs.zfs.arc.meta_prune=0


as long as you don't have hundreds of gigabytes of metadata in cache and can afford some "wasted" pages.
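
If that workaround helps, a small sketch for checking and persisting it (assuming TrueNAS CORE, where sysctl-type tunables are set under System > Tunables; 10000 is the usual OpenZFS default for this parameter):

Code:
# Check the current value first (the OpenZFS default is typically 10000).
sysctl vfs.zfs.arc.meta_prune
# To keep the workaround across reboots on TrueNAS CORE, add a sysctl-type
# tunable vfs.zfs.arc.meta_prune = 0 under System > Tunables.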
 
Last edited:

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I have only followed this topic superficially, since I am still on v12. The disconcerting aspect, at least for me, is that history seems to be repeating itself here. Unless I am mistaken, and I would be happy to be completely wrong, the 13.0-U4 update introduced a major regression. My problem is less the fact that it happened than what appears to be a lack of test coverage, which is what happened with various v12 updates as well.

It would be great to have someone from iXsystems shed some light onto the situation. Again, I am happy to be on the wrong track here.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I have only followed this topic superficially, since I am still on v12. The disconcerting aspect, at least for me, is that history seems to be repeating itself here. Unless I am mistaken, and I would be happy to be completely wrong, the 13.0-U4 update introduced a major regression. My problem is less the fact that it happened than what appears to be a lack of test coverage, which is what happened with various v12 updates as well.

It would be great to have someone from iXsystems shed some light onto the situation. Again, I am happy to be on the wrong track here.
It's a pity, since v13 was great up till now. U3.1 especially was of high quality, imho.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Unless I am mistaken, and I would be happy to be completely wrong, the 13.0-U4 update introduced a major regression.
There is no data corruption, there is no service outage, and if not for the graphs, nobody would even notice it. Yeah, "major regression", sure.
But it appears to be a lack in test coverage
We do have lots of automated tests, plus dedicated QA and performance teams. But do you know that full performance characterization of a new software version for a SINGLE SYSTEM takes TWO MONTHS of pure run time? And that is if everything is perfect and none of the tests fail during that time. We actually have open positions in all of those teams, and we are willing to pay those who want to make TrueNAS better.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
There is no data corruption, there is no service outage, and if not for the graphs, nobody would even notice it. Yeah, "major regression", sure.
Even though the first part is perfectly right, and even though TN is a free product, so nobody gets to complain, I really must say: Wow. Thanks a lot.

and if not for the graphs, nobody would even notice it.

Well, that's why Chris asked, I guess. Does nobody even notice by looking at the graphs during two months of evaluation run time? Or does it mean that even if somebody eventually notices, performance is not (that) relevant?

Edit:

And with all due respect, and just in case anyone asks: performance CAN end up being relevant to stable operation. In my company, for example, we have critical backup rotations ranging from 8 to 30TB daily. These backups are created every 8 hours and replicated offsite daily. Pure write performance is not the big issue, but read performance for the verification of the backups is. Currently we are at 5.5 hours per rotation on the final ZFS target, which runs the (Linux) file verify. And we only achieve these 5.5 hours because we have 1TB of RAM, an 8x L2ARC caching MFU only (a verify with MRU over 8TB of data would probably need around 8TB of RAM, which we cannot install due to the lack of slots on the motherboard), and a few other tuning measures to squeeze out the last percent of performance. We operate such systems extremely conservatively, which is why only TrueNAS 12-U8.1 is running on the iSCSI target to date. We time every run and keep the timings constantly on screen as one of our control metrics. This is also the reason why we test everything for these systems on test systems, in different scenarios, before it goes into production.
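
As a side note, a sketch of how such an MFU-only L2ARC policy can be set on FreeBSD with OpenZFS 2.x (assuming the vfs.zfs.l2arc.mfuonly tunable is available; not necessarily how the systems described above are configured):

Code:
# 0 = cache both MRU and MFU data in L2ARC (default), 1 = cache MFU data only.
sysctl vfs.zfs.l2arc.mfuonly=1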
But let's assume for a moment that I was an early adopter, or that I simply trusted the statement that data integrity would not (immediately) be affected by the ARC behaviour ... if a backup verify takes longer than 8 hours and thus runs into the next rotation cycle, the systems are at a standstill. And THEN we would be at risk of significant data loss.


Now you could always say: "So what? It's OSS. No guarantees. If you want them, sign the service contract." But my point is not THAT a bug exists. It's about how you classify it and how you react. So whether a (mis)development amounts to "a major regression" is something I think has to be left to the users (and the voluntary beta testers).
 
Last edited:

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Does nobody even notice by looking at graphs in two months evaluation run time? Or does that mean that even though somebody eventually notices performance is not (that) relevant?
I noticed this behavior myself while working on something else, a few weeks before the ticket was opened. I made a mental note, but I had other, more important things to do first. After we got the ticket, it was fixed in 4 days.
But my point is not THAT a bug exists. It's about how you classify it and how you react. So whether a (mis)development means "a major regression", I think that has to be left to the users (and voluntary beta testers).
We are working hard, 40+ hours a week, to make a better product. And considering how many users have TrueNAS in production, I would not exactly call them "beta testers". We are highly grateful to early adopters actually running BETA and RC releases and reporting issues, helping us to help others. Others just have no moral right to criticize here, not without opening tickets, providing requested data, agreeing to remote debugging, testing patches, etc. We've come a long way improving quality over the years. But nevertheless, for EVERY release there appear some people who say the new release is a major regression, our testing sucks, and they will run some old version that is out of support and never upgrade. I take it personally, as if my work does not matter. And you know, IT HURTS! So considering the circumstances, and after everything constructive is already done, the patches upstreamed, and the ticket closed, I reserve my right to a healthy dose of sarcasm!
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
[...]


The Jira ticket:


A deep bow to Richard Kojedzinszky for submitting the ticket and to Alex Motin from iXsystems for the fix.

[...]
I'll add another one for you.

And just for the record: I didn't call all users "beta testers" either. That's why I named both categories. I fall into both (on different systems).
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
After a few months of observation (since U5 came out with the fix), I can say the new behavior is consistent: the system apparently has a target of 1GB of free RAM.
 

Juan Manuel Palacios

Contributor
Joined
May 29, 2017
Messages
146
After a few months of observation (since U5 came out with the fix), I can say the new behavior is consistent: the system apparently has a target of 1GB of free RAM.
I can vouch for this; indeed, I'm pretty much seeing the same thing in day-to-day usage and observation.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I was looking at arc_summary and found the arc_free_target tunable set to 172676: this might be the reason for the new behaviour. Anyone willing to comment on this? Was this present before the U5 fix, or is setting this tunable part of the fix? What are the impacts on performance (aka, why is it set to a value greater than 0)?

Currently on CORE 13-U5.3
 
Joined
Oct 22, 2019
Messages
3,641
What are the impacts on performance (aka, why is set to a value greater than 0)?
Seems to only be applicable to ZFS on FreeBSD.

It's a value in 4K pages (multiply it by the page size to get the total number of bytes) that works as a sort of early-warning "cushion" in case your memory requirements might soon exceed your total RAM.

Once available free memory drops below this value (e.g., 512MiB), the ARC will release and shrink itself so that the system can meet its memory demands. Meanwhile, the OS (FreeBSD) understands: "Hey, I know the ARC had to make a temporary sacrifice, so once my demands soften, the ARC's target size can safely return."

Something like that. I think.

It's this seamless integration between system memory management and the ARC that makes ZFS on FreeBSD a first-class citizen.
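
As a rough sanity check of what that threshold works out to in bytes, a sketch (assuming a POSIX shell on FreeBSD/TrueNAS CORE; 172676 is the page count quoted above):

Code:
# arc_free_target is expressed in pages; multiply by the page size to get bytes.
sysctl vfs.zfs.arc_free_target hw.pagesize
# e.g. 172676 pages * 4096 bytes/page = 707280896 bytes (~675 MiB)
echo $(( $(sysctl -n vfs.zfs.arc_free_target) * $(sysctl -n hw.pagesize) ))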
 