TrueNAS SCALE VM + ZFS cache memory usage


chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7
I have a SCALE 22.02 system with 32G of RAM and about 12TB usable space (18TB raw). When I first installed, I saw the cache memory usage sitting at around 15G, which was expected, as that is half the system memory. I've now created a couple of VMs and have noticed that the ZFS cache is limited to half the system memory minus any memory given to VMs. So with two VMs (one with 8G and one with 2G), my ZFS cache maxes out at about 5G (15G - 10G). This seems odd, as it leaves approximately half the system RAM doing nothing. I have tested it: as soon as a VM is stopped, the cache is allowed to expand into the freed space, and vice versa when a VM is started.

My question is: is this on purpose or a bug (I know it is a beta OS)? And in either case, what would be the best way forward to overcome this? I have tried setting the ZFS cache size in a boot script, but it doesn't stick, because as soon as the VMs are started it gets overridden.
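For reference, the boot script just runs something like this (a rough sketch; the 16 GiB figure below is only an example value, and the path is the standard OpenZFS module parameter on Linux):

```sh
#!/bin/sh
# Post-init sketch: set the ARC cap to 16 GiB (the parameter takes bytes).
# 16 GiB = 16 * 1024^3 bytes; adjust to taste.
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
```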

Thanks in advance for the help!
 

chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7
Just to clarify, the OS version is:
TrueNAS-SCALE-22.02-RC.2
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
We've not done extensive tuning of memory yet; right now ZFS will default to using 1/2 of total memory for ARC. We're doing some internal testing now so we can determine how to tune this for the best performance mix.
 

dirtyfreebooter

Explorer
Joined
Oct 3, 2020
Messages
72
Yeah, I also ran into this when I upped my ZFS ARC max; I bought 256GB of memory to use it for ARC, not for half of it to go unused. I noticed that whenever I started a VM, whatever I set in a post-init script via echo <size> > /sys/module/zfs/parameters/zfs_arc_max was overridden back to 1/2.

I filed a bug to make ZFS parameters more of a first-class citizen in SCALE, as sysctl doesn't apply here on Linux and this is, after all, a ZFS appliance :)
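If anyone wants to confirm the override for themselves, this is roughly what I check; just a sketch using the standard OpenZFS files (a zfs_arc_max of 0 means the built-in default, i.e. roughly half of RAM on Linux):

```sh
# Configured cap in bytes (0 = OpenZFS default)
cat /sys/module/zfs/parameters/zfs_arc_max

# Current and maximum ARC size as ZFS sees them (bytes)
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats
```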

feel free to up vote my ticket :)
 

chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7
We've not done extensive tuning of memory yet; right now ZFS will default to using 1/2 of total memory for ARC. We're doing some internal testing now so we can determine how to tune this for the best performance mix.
Hi Kris, thanks for taking the time to reply. I understand that ZFS on Linux uses half the memory by default, and that this can be overridden as @dirtyfreebooter has said. It seems the cache upper limit is being deliberately adjusted when VMs are started and stopped, but maybe this is by design while you do the memory tuning?
 

chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7
Hello, I am now running the official release version of SCALE and still seeing this behaviour. Can anyone confirm whether this is the expected behaviour? I can't seem to find anyone else talking about it.
 

andyjay777

Dabbler
Joined
Jan 31, 2022
Messages
27
Hello, I am now running the official release version of SCALE and still seeing this behaviour. Can anyone confirm whether this is the expected behaviour? I can't seem to find anyone else talking about it.
Yes, it is the same for me. 64GB of RAM; ARC is 32GB initially (with 12GB of services running).
When I start a VM with 6GB of RAM allocated, ARC drops to 26GB, services increase to 18GB, and free RAM remains the same.

Once you have a stable set of services/VMs running, it would be good to be able to adjust the ARC upwards (a future SCALE feature, perhaps?)
 

chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7
Yes, it is the same for me. 64GB of RAM; ARC is 32GB initially (with 12GB of services running).
When I start a VM with 6GB of RAM allocated, ARC drops to 26GB, services increase to 18GB, and free RAM remains the same.

Once you have a stable set of services/VMs running, it would be good to be able to adjust the ARC upwards (a future SCALE feature, perhaps?)
That's really interesting, but good to know I am not the only one (sort of).
It's interesting because, as far as my understanding goes, that isn't how ARC is supposed to work. It is supposed to automatically decrease its size when the system runs out of free memory, but here it looks like the VM subsystem is manually changing the ARC size, leaving lots of free RAM.
It can be changed manually, but that is annoying to do every time a VM is started.
I hope this gets changed in a future release.
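(If you really wanted to automate the manual change, something like the sketch below could be run from a cron job or after starting a VM; it's untested on my side, and the 16 GiB value is just an example.)

```sh
#!/bin/sh
# Rough workaround sketch: re-apply the desired ARC cap if something has changed it.
# WANT is an example value (16 GiB in bytes); pick your own.
WANT=17179869184
CUR=$(cat /sys/module/zfs/parameters/zfs_arc_max)
if [ "$CUR" != "$WANT" ]; then
    echo "$WANT" > /sys/module/zfs/parameters/zfs_arc_max
fi
```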
 

andyjay777

Dabbler
Joined
Jan 31, 2022
Messages
27
That's really interesting, but good to know I am not the only one (sort of).
It's interesting because, as far as my understanding goes, that isn't how ARC is supposed to work. It is supposed to automatically decrease its size when the system runs out of free memory, but here it looks like the VM subsystem is manually changing the ARC size, leaving lots of free RAM.
It can be changed manually, but that is annoying to do every time a VM is started.
I hope this gets changed in a future release.
You are right, it appears the VM subsystem is taking memory from ARC ahead of using free RAM. Hope this is an easy fix in a future release.

It can be changed manually, but that is annoying to do every time a VM is started.
How?
 

chillyphilly

Cadet
Joined
Feb 5, 2022
Messages
7

mroptman

Dabbler
Joined
Dec 2, 2019
Messages
23
I can confirm the same behavior is occurring on 22.02.02 - running VMs cause ARC to decrease even when the system has free memory. The memory consumption stats below were captured while the SCALE system was 1) sending/receiving snapshots and 2) serving one SMB client adding/downloading files. The testing was done on a SCALE system with the following memory specs:

Total system memory = 62.7 GiB
50% ARC target = 31.35 GiB

It is expected that the max ARC is 50% of system memory due to a ZFS on Linux limitation, per the ticket below:
That's unfortunately just how it is for ZFS on Linux; half the memory is the safe margin for ARC due to fragmentation in the SLUB allocator.

You may try to tweak it but you may find issues.

Zero VMs Running:

Services = 7.2 GiB
ARC = 31.3 GiB
Free = 24.2 GiB
  • With zero VMs running, the ARC will expand up to 50% of system memory, depending on system load
  • This is expected memory usage (50% ARC max size)
Starting Two VMs with 8 GiB Memory each:

Services = 9.7 GiB
(2.5 GiB increase used by the two VMs; the host only allocates used VM memory, which is great)
ARC = 15.4 GiB
(Decreased to 25% of system memory, and continues to decrease for each powered-on VM, for an unknown reason)
Free = 37.6 GiB
  • VM memory consumption is reflected in the Services category
    • Would be nice if VM memory was a separate graph category
  • Free memory increases while VMs are on
    • Free memory is not used for the VMs first (ARC is shrunk first and that memory is given to the VMs)
    • The max ARC size decreases when any VMs are booted
  • Shutting down all VMs causes ARC to resume using 31.4 GiB / 50% of system memory as expected
  • Expected behaviors:
    • Free memory should be used before reclaiming/reducing ARC
    • About 20-24 GiB of memory should have been available to the rest of the system before decreasing max ARC size
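For anyone who wants to reproduce the total/free/ARC figures above from a shell, this is roughly what I use; a quick sketch against the standard Linux and OpenZFS stat files:

```sh
#!/bin/sh
# Snapshot of total/available memory and current/max ARC size, reported in GiB.
awk '/MemTotal|MemAvailable/ {printf "%s %.1f GiB\n", $1, $2/1048576}' /proc/meminfo
awk '$1 == "size" || $1 == "c_max" {printf "ARC %s: %.1f GiB\n", $1, $3/1073741824}' /proc/spl/kstat/zfs/arcstats
```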
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
Wow. I would say I can understand a little because SCALE is so new, but I believe this behavior was added to CORE first.

I am (was?) seriously considering running TrueNAS SCALE on bare metal and replacing ESXi as my hypervisor; while ESXi made sense at first, these days my usage of it is quite light and it's therefore overkill. If I switched to SCALE I would actually only need 1, maybe 2 VMs tops.

I get that by default the ARC target is 50% since this is the standard strategy on Linux, but I've seen many reports of people increasing it safely to around 75% on ZFS dedicated machines. Regardless though, for my purposes 50% is fine.

I'd have 64GB and would want the breakdown to be like this:
- 32GB ARC
- 4GB Hass.io VM
- 8GB for Plex transcode RAM disk
- 6GB for 2 Charts apps (including Plex)
- 14GB leftover for the general system

However, it seems that because of this, the 4GB for the VM will instead come out of ARC and I'll just have 18GB floating around...

To be frank, this is absurd in a NAS focused system. It's just silly...

The initial 4GB isn't a huge loss, but like I said I might run an additional VM (and may get more RAM then, but the problem still applies).

Normally, I'd just increase the ARC size to compensate (i.e. in this initial case setting the max to 36GB) so that when all VMs are on, the ARC size gets reduced back to 32GB; but if I'm understanding what people have said here correctly, once you start VMs TrueNAS effectively resets the max to 50% and then starts pulling RAM away from ARC, meaning this isn't even an option -_-. Is that true?

Is there any kind of workaround that persists? Needing to readjust the value via the console every time a VM is started (likely on reboot of the system) is just too janky for me.

Please tell me that charts don't work this way as well...
 

mroptman

Dabbler
Joined
Dec 2, 2019
Messages
23
All my testing so far points to:
  • With 0 VMs running, ARC will use 50% of system memory (only verified with a 64GB memory system)
  • With 1-2 VMs running (probably more), ARC decreases to ~20% or less of system memory and [ARC] continues to decrease as the VMs consume memory while free memory increases
    • Lowest ARC size was about 0.6 GiB
    • When all VMs are powered off, ARC increased back to 50% of system memory
I cannot understand why this is happening. Perhaps memory tunables could change/configure memory management behavior with QEMU. I'd like to believe that most SCALE users want memory usage to behave like on CORE, i.e. "unused memory is wasted memory". The current memory usage behavior in SCALE looks like an anti-pattern: memory demand increases, so ARC is decreased first even when there is free memory. It seems very strange that with 2 (or more) VMs running there would be more free memory, and with 0 VMs less free memory.

As a sanity check, my Core system (32 GB memory) that has been serving my non-commercial homelab needs since 2015 reports:
  • 8.9 GiB Services
  • 22.3 GiB ARC
  • 0.5 GiB memory free
@oblivioncth Looks like we are in similar situations. I too am trying to migrate off an ESXi "all-in-one" setup with CORE + ESXi VMs to SCALE + KVM instead. The SCALE Kubernetes app features look really cool, but unfortunately I have not had the time to dedicate to learning this tech stack yet, nor would I trust any of the community-provided Helm charts from a security perspective. It is just too easy to slip untrusted software into a container these days; I cannot take that risk.
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
@mroptman, not that I expect much from it, but you can vote on the suggestion to change this here:


I only just started looking into it, but it does seem this may just be a matter of Linux being much less efficient at using RAM for ARC than BSD. Going off what you noted, I found the following:


It basically says that (regardless of how likely this is to occur) in a theoretical worst-case scenario, because of Linux memory-management overhead that I know almost nothing about, the amount of memory consumed by ARC activities can be twice as much as is actually effectively utilized. This means that to be 100% safe you always need to have double the RAM of your cache size, hence the 50% default.

Technically, the logic SCALE uses is still flawed in that case, as it pulls all of the VM's RAM out of ARC when it only needs to pull half by that same principle. E.g. a VM needs 4GB of RAM: 2GB can be pulled from ARC, which means 2GB less needs to be held in reserve for this overhead, so the other 2GB can be taken from "free" RAM.

If this is what's happening:
1) I'm shocked this is the first time I'm hearing of it, as that seems like a major shortcoming of ZFS on Linux.
2) At the very least, the TrueNAS UI should make it clear that that memory isn't actually free but is instead provisioned for ARC reliability. I imagine it doesn't do this currently because it isn't a stat reported by ZFS, just a practice, so it would be on TrueNAS to determine the "reserved" portion and assign a percentage to it. The problem is that it's not actually reserved; it's just the result of capping the ARC at 50%. It's negative space, a shadow. If you increase ARC to 70%, it isn't that 30% is now reserved; you're just playing closer to the fire and essentially missing "theoretically required" memory, so I do see how labeling that "usage" is tricky.

All in all, I'm still kind of astonished and want to look into this more, as I feel like I would have seen such a limitation talked about more. I think I may have seen some sentiment that you can reasonably increase above 50% on a dedicated ZFS appliance, because the things that would cause that worst-case scenario aren't really happening on such a machine, but I'd have to go back and check to be sure. Though the question then would be: does that change the moment you start running VMs and other non-ZFS-focused services?
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
Tangential, but a few days ago the ability to pass through individual USB devices was merged into the source for Bluefin (22.12):


This is much better than having to pass through a whole controller, especially for Hass.io, and makes SCALE line up better as an ESXi replacement for my uses.
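(For context, SCALE VMs are libvirt/KVM domains under the hood, so single-device USB passthrough basically amounts to attaching a USB hostdev like the sketch below; the vendor/product IDs and domain name are made-up examples, and the Bluefin GUI should take care of this for you.)

```sh
# Hypothetical example: attach one USB device (vendor:product ID from `lsusb`)
# to a libvirt domain instead of passing through the whole controller.
cat > usb-dev.xml <<'EOF'
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x10c4'/>
    <product id='0x8a2a'/>
  </source>
</hostdev>
EOF
virsh attach-device my-vm usb-dev.xml --persistent   # 'my-vm' is a placeholder domain name
```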

I think I'll try the switch when Bluefin releases and just hope the VM/RAM situation improves over time, as well as getting more RAM if needed.
 

inman.turbo

Contributor
Joined
Aug 27, 2019
Messages
149
Instead of scaling back ARC when a VM is started which needs some of that memory, SCALE will just warn you, then crash your system (out of memory).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
@inman.turbo, I must kindly ask that the discussion not be spread out all over the place.

 