"Services" uses way too much RAM

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
For months I've been seeing services use way too much RAM. Sometimes it looks normal, then it climbs very high, forcing the ARC to shrink to a minimum, and after some hours it drops back to normal values.
It looks like this:
services1b.png


I tried not starting any services that aren't critical, to see if one of them is leaking memory or something, but the problem is still there. Right now I'm down to only running SSH, SMB, SMART, UPS and SNMP. No VMs, jails or plugins are running. I would normally start 3 VMs using 3+1+0.5 GB RAM, but bhyve won't let me start them because I'm out of memory...

I don't know how to find out what is causing this. The processes shown by top don't account for all that RAM:
services3.png

This time most of the RAM is "Inact", but I have also seen nearly all of it shown as "Wired".
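For reference, in case exact numbers are more useful than screenshots, these are the counters behind top's memory line and the usual ways to break down wired kernel memory:

Code:
# current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size
# page counts behind top's "Wired" and "Inact" figures
sysctl vm.stats.vm.v_wire_count vm.stats.vm.v_inactive_count
# kernel malloc and UMA zone usage (where wired memory usually hides)
vmstat -m
vmstat -z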

Any ideas?

Edit:
And for about a week now I've been getting that "Getting started" popup every time I log in. Shouldn't that only show up on the first visit after installing TrueNAS?

Edit:
services4.png
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
I've now watched the RAM for 24 hours, and services stay so high that the ARC only uses between 2 and 3 GB of my 32 GB RAM. That really doesn't sound healthy when 34 TB of disks only get 2 GB of ARC...
services.png
services2.png
services3.png
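For the record, a simple shell loop is enough to sample the ARC size over time alongside the GUI graphs (log path and interval are arbitrary):

Code:
#!/bin/sh
# append a timestamped ARC-size sample (in bytes) every 5 minutes
while true; do
    echo "$(date '+%F %T') $(sysctl -n kstat.zfs.misc.arcstats.size)" >> /tmp/arc.log
    sleep 300
done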
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
Well, we know you have 32 GB RAM.
We know you have an E3-1230v3.
And we know you have 3 pools, one of which seems to be USB-based - which is a big no-no.

I think it's doing really well to run on no motherboard, and have three pools with no disks.

So - what do you expect us to do?
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
Well, we know you have 32 GB RAM.
We know you have an E3-1230v3.
And we know you have 3 pools, one of which seems to be USB-based - which is a big no-no.

I think it's doing really well to run on no motherboard, and have three pools with no disks.
So - what do you expect us to do?
It's a Supermicro X10SLL-F, an Intel E3-1230v3, the maximum possible ECC RAM (32 GB), a Mellanox ConnectX-3, 2x LSI 9211-8i in IT mode, 4x 8TB WD Whites, 5x Intel S3710 400GB and 3x Intel S3710 200GB (which aren't in use right now because I can't add them as special devices until I destroy my HDD pool and recreate it with native instead of GELI encryption). The system was initially set up as FreeNAS 11.2, so until this week (when I destroyed and recreated the SSD pool with ZFS native encryption) both pools used GELI encryption. I needed a place to store some scripts that can be executed while the pools aren't unlocked, so I used two USB sticks as a mirror for them. I know ZFS will kill USB sticks in no time, but I only need them to hold a few KB of scripts, and that has worked fine so far (over a year now). Now that my SSD pool uses native encryption, I will most likely move those scripts to an unencrypted dataset on the SSD pool, which wasn't possible before.

The question is why the TrueNAS services are using that much RAM when basically nothing is running except some network shares. I guess most of it is some kind of non-ZFS caching, or a memory leak, because the sum of all RAM used by the processes shown by top is very small. All drives use ZFS, and ZFS uses its ARC for caching, so what is forcing the ARC to shrink down to 2 GB all the time? As far as I understand it, RAM should be used for running processes, and all RAM not used by processes should be used by ZFS to speed up I/O instead of being wasted sitting free and unused. At least that's what all my other Linux servers running ZFS do: there, for example, the page cache won't force the ARC to shrink to a minimum. And there is basically no other caching worth mentioning, because ZFS is used for everything, so there should be a big ARC.

If I, for example, run sysctl vfs.zfs.arc_min="8589934592", the ARC will grow from 2 to 8 GB and stay there instead of always sitting between 2 and 3 GB. So I guess something is caching with a higher priority than the ARC, and the ARC is forced to shrink to the minimum size at which it can still operate. So what might be caching there (if it is caching at all), and should that really be more important than ZFS?
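For completeness, that's the whole experiment (8589934592 bytes = 8 GiB); note that a sysctl set on the shell is lost on reboot, so to keep it, it has to go in as a sysctl-type tunable under System > Tunables:

Code:
# raise the ARC floor to 8 GiB at runtime
sysctl vfs.zfs.arc_min=8589934592
# confirm the floor took, and watch the current ARC size
sysctl vfs.zfs.arc_min kstat.zfs.misc.arcstats.size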
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
OK - so it's proper kit. 32 GB is plenty to run TN in.
I agree with what you are saying - if you look at my setup, I have a single Jail and services are using 22.5 GB:
1634960994346.png

The Jail is Plex, which generally shouldn't be memory-hungry.
Here's another NAS with only 8 GB RAM (it's a backup used solely to receive snapshots, so ARC is rather less important):
1634961374489.png

Both servers have been running for a while.

I am hoping someone more knowledgeable than me will be along soon, as all I can do is confirm that you seem to have an issue.

One suggestion I do have is to make a backup of the config file and then reinstall - just to see if that makes any difference. Ideally you would do a blank install, import the pools and see what happens.
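If you want a copy straight from the shell as well, on CORE the config is a SQLite database at /data/freenas-v1.db (the destination path here is just an example):

Code:
# copy the live config database to a pool before reinstalling
cp /data/freenas-v1.db /mnt/tank/backup/freenas-v1-$(date +%Y%m%d).db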
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
It's a Supermicro X10SLL-F, an Intel E3-1230v3, the maximum possible ECC RAM (32 GB), a Mellanox ConnectX-3, 2x LSI 9211-8i in IT mode, 4x 8TB WD Whites, 5x Intel S3710 400GB and 3x Intel S3710 200GB (which aren't in use right now because I can't add them as special devices until I destroy my HDD pool and recreate it with native instead of GELI encryption). The system was initially set up as FreeNAS 11.2, so until this week (when I destroyed and recreated the SSD pool with ZFS native encryption) both pools used GELI encryption. I needed a place to store some scripts that can be executed while the pools aren't unlocked, so I used two USB sticks as a mirror for them. I know ZFS will kill USB sticks in no time, but I only need them to hold a few KB of scripts, and that has worked fine so far (over a year now). Now that my SSD pool uses native encryption, I will most likely move those scripts to an unencrypted dataset on the SSD pool, which wasn't possible before.

The question is why the TrueNAS services are using that much RAM when basically nothing is running except some network shares. I guess most of it is some kind of non-ZFS caching, or a memory leak, because the sum of all RAM used by the processes shown by top is very small. All drives use ZFS, and ZFS uses its ARC for caching, so what is forcing the ARC to shrink down to 2 GB all the time? As far as I understand it, RAM should be used for running processes, and all RAM not used by processes should be used by ZFS to speed up I/O instead of being wasted sitting free and unused. At least that's what all my other Linux servers running ZFS do: there, for example, the page cache won't force the ARC to shrink to a minimum. And there is basically no other caching worth mentioning, because ZFS is used for everything, so there should be a big ARC.

If I, for example, run sysctl vfs.zfs.arc_min="8589934592", the ARC will grow from 2 to 8 GB and stay there instead of always sitting between 2 and 3 GB. So I guess something is caching with a higher priority than the ARC, and the ARC is forced to shrink to the minimum size at which it can still operate. So what might be caching there (if it is caching at all), and should that really be more important than ZFS?
Your equipment seems fine, as far as that goes.

Your HDD pool is 94% full -- I wonder if that has something to do with the problem?
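The exact numbers (and fragmentation, which also matters on a full pool) are quick to check:

Code:
# capacity, free space and fragmentation for every pool
zpool list -o name,size,alloc,free,cap,frag,health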
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
Hadn't spotted that. I've not run a pool that full.
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
Your equipment seems fine, as far as that goes.

Your HDD pool is 94% full -- I wonder if that has something to do with the problem?
I set a pool-wide quota of 90% so ZFS always has some free space to work with. So it's 94% of 90%, which is really only 85%. And that's only because I temporarily needed some space for the upgrade to Win11 I'm doing right now. It's the same problem when my HDDpool is only 72% filled. In general I try to keep my pools below 80% usage.
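For reference, the quota sits on the pool's root dataset, so the real headroom can be checked like this (HDDpool is my pool's name):

Code:
# quota on the root dataset vs. actual usage
zfs get -o property,value quota,used,available HDDpool
# raw capacity as the pool itself sees it
zpool list -o name,size,alloc,cap HDDpool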

When I reboot my server, it only uses around 4 GB for services, 2 GB for ARC, and the remaining 26 GB are free. Then the ARC grows into the free RAM, so it's more like 4 GB for services, 25 GB ARC and 1 GB free. And then the services grow to more like 8-14 GB, but they can also go up to 29 GB like in the pictures above. Sometimes the services shrink again over time and leave the ARC room to grow; sometimes they don't, and the ARC stays at only 2 GB while everything else is used by services.
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
I found a thread where someone reported something similar, and he wrote that switching from jumbo frames back to 1500 MTU fixed it for him. I'm not sure when it started here (months ago), but it could have been when I installed my 10Gbit NIC, which uses jumbo frames because otherwise the CPU can't handle all the packets and only 3-4 Gbit of the 10 Gbit would be usable. Does anyone know if the network stack or NIC driver could cause the services to grow? I think the services grow faster when I copy over a lot of data. But that could also be NFS or SMB or something else doing caching.
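In case it's worth testing, dropping the MTU is reversible on the fly (mlxen0 is only my guess at the ConnectX-3 interface name on FreeBSD; check ifconfig for the real one):

Code:
# temporarily fall back to standard frames on the 10GbE NIC
ifconfig mlxen0 mtu 1500
# revert to jumbo frames afterwards
ifconfig mlxen0 mtu 9000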
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I found a thread where someone reported something similar, and he wrote that switching from jumbo frames back to 1500 MTU fixed it for him. I'm not sure when it started here (months ago), but it could have been when I installed my 10Gbit NIC, which uses jumbo frames because otherwise the CPU can't handle all the packets and only 3-4 Gbit of the 10 Gbit would be usable. Does anyone know if the network stack or NIC driver could cause the services to grow? I think the services grow faster when I copy over a lot of data. But that could also be NFS or SMB or something else doing caching.
The network stack can consume memory for buffers and so forth, but I don't know that it would ever use tens of gigabytes.

Are you using any network tunables? Here are the ones I use for 10GbE:
network-tunables-2021-10-03.jpg


The hw.sfxge.* settings are specific to my hardware, and the cc_cubic_load and net.inet.tcp.cc.algorithm settings are for congestion control, but the rest should be applicable to any system using a 10GbE NIC and should limit the total memory used for network buffers.

These may or may not help, and it won't hurt anything to try them.
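Since the screenshot may be hard to read, tunables of the kind I mean look roughly like this (illustrative values and common starting points, not necessarily the exact ones shown above):

Code:
# loader tunable: load the CUBIC congestion-control module at boot
cc_cubic_load="YES"
# sysctl tunables: congestion control and socket-buffer caps
net.inet.tcp.cc.algorithm=cubic
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216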

I still wonder if this has to do with your pool being so full; even at 80% you're still running above the recommended maximum if I remember correctly. I believe that ZFS handles things differently when a pool is nearly full, and that this may be the source of your problems. Perhaps someone more knowledgeable about the inner workings of ZFS will enlighten us.
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
The network stack can consume memory for buffers and so forth, but I don't know that it would ever use tens of gigabytes.

Are you using any network tunables? Here are the ones I use for 10GbE:
network-tunables-2021-10-03.jpg

The hw.sfxge.* settings are specific to my hardware, and the cc_cubic_load and net.inet.tcp.cc.algorithm settings are for congestion control, but the rest should be applicable to any system using a 10GbE NIC and should limit the total memory used for network buffers.

These may or may not help, and it won't hurt anything to try them.

I still wonder if this has to do with your pool being so full; even at 80% you're still running above the recommended maximum if I remember correctly. I believe that ZFS handles things differently when a pool is nearly full, and that this may be the source of your problems. Perhaps someone more knowledgeable about the inner workings of ZFS will enlighten us.
Thanks, I will try some network tunables. Right now I'm not using any tunables except for powerd_enable and powerd_flags.

I've read that pools might get slower after reaching 80% usage and that ZFS switches to a much slower best-fit allocation strategy somewhere above 90%. That's why I set the quota to 90%. And normally I'm under 80%, so that should be fine.
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
Now the ARC is down from 8 GB to 2-4 GB again, even with the minimum ARC size set to 8 GB...
services4.png

services5.png
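Comparing the configured floor with what the ARC itself reports would rule out the tunable silently not applying:

Code:
# configured minimum vs. the ARC's own floor and current size
sysctl vfs.zfs.arc_min
sysctl kstat.zfs.misc.arcstats.c_min kstat.zfs.misc.arcstats.size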
 

Dunuin

Contributor
Joined
Mar 7, 2013
Messages
110
So after a reboot, without doing any writes, services stayed at 8 GB for 11 hours and the remainder was used by ZFS. Then I copied over an 8 GB file via SMB. While transferring, the services RAM went back and forth between 8 and 10 GB. Right after the transfer completed, services jumped from 8 GB to 18 GB and then stayed there...

So it must be some non-ARC file caching, a strange SMB problem, some ZFS issue, or a general problem with the network. According to top, that additional 10 GB of RAM is "wired" and belongs to no process...

Edit:
If I do reads via SMB, the ARC grows again and services shrink back down to 8 GB... so it really looks like some kind of non-ARC caching is bloating the services. Why is TrueNAS caching 10 GB under services when I write an 8 GB file, and if that is some kind of read cache, why isn't it using the ARC for it?
It doesn't make sense to me that two caches are fighting each other.
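To put numbers on this during a transfer, wired memory can be logged against the ARC with a small loop (plain sh; the interval is arbitrary):

Code:
#!/bin/sh
# sample wired memory and ARC size every 10 seconds
while true; do
    wired=$(( $(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n hw.pagesize) ))
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "$(date +%T) wired=$((wired / 1048576)) MiB arc=$((arc / 1048576)) MiB"
    sleep 10
done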
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
If you search the TrueNAS issue tracker for "laundry" you will see a few more angles on this.
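(For anyone following along: the laundry queue is FreeBSD's set of dirty pages waiting to be written back before they can be reused, and its size is visible directly, assuming a FreeBSD 12+ base:)

Code:
# dirty pages currently sitting in the laundry queue
sysctl vm.stats.vm.v_laundry_count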
 