What is Services memory?

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
I have a massive amount of duplication in several of my datasets. To combat this, I've installed rmlint in a jail. Fantastic program, BTW.

Whenever I run rmlint, my Services memory balloons. Shutting the jail down doesn't make the Services memory usage drop. Looking at top and the memory reporting shows that I have over 20G of memory marked Inactive.

What does Services represent, and how do I reclaim it? Normally my Services memory usage is very stable.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Post the output of top -o res, please.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
VMs, Jails, SMB, NFS, RAMdisks (if you have them)

Try restarting SMB; it's often caching a lot, so it can use a lot of memory if there's some free.
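If you'd rather do that from a shell than the web UI, something like this should work on CORE (the middleware service name is from memory, so treat it as a sketch and double-check on your version):

Code:
# restart the SMB service via the middleware (service name assumed to be "cifs")
midclt call service.restart cifs

# then see how much resident memory smbd is holding afterwards
top -b -o res | grep smbd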
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
Post the output of top -o res, please.

Is there something specific you're looking for?

Code:
last pid: 32094;  load averages:  0.17,  0.24,  0.25                                                                                                                                                                    up 0+17:16:21  08:15:25
123 processes: 1 running, 122 sleeping
CPU:  0.8% user,  0.1% nice,  0.1% system,  0.1% interrupt, 99.0% idle
Mem: 832M Active, 24G Inact, 7260K Laundry, 36G Wired, 1648M Free
ARC: 29G Total, 24G MFU, 3568M MRU, 3244K Anon, 229M Header, 1250M Other
     26G Compressed, 28G Uncompressed, 1.09:1 Ratio
Swap: 6144M Total, 6144M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 7104    975       69  52    0  2872M   645M uwait    8   3:56   0.09% java
 6541 syncthing    27  29    9  1472M   492M uwait    7  33:53   0.03% syncthing
 5920 plex         19  52    0   793M   474M uwait    4   5:51   0.08% Plex Media Server
 1793 root         32  20    0   552M   316M kqread   2   5:16   0.01% python3.9


VMs, Jails, SMB, NFS, RAMdisks (if you have them)

Try restarting SMB; it's often caching a lot, so it can use a lot of memory if there's some free.

It's not SMB. My services memory is generally very stable. It's rmlint causing the ballooning. I can watch it increase with every invocation. If I don't run rmlint, I don't get a services memory increase.

That's why I'm trying to figure out why it's not going back down once I shut down the rmlint jail.
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
Looking at the reporting graphs from when I ran rmlint previously, it looks like all the memory gets dumped into Inactive and this shows up as Services on the UI.

Not sure why ARC isn't reclaiming the memory. It looks like it might end up getting released as Free but I'm not sure how long that will take.
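In the meantime, a simple way to watch whether the ARC creeps back up is to poll the raw counter from a shell (this is just the standard FreeBSD arcstats sysctl; the value is in bytes):

Code:
# print the current ARC size every 10 seconds
while true; do sysctl -n kstat.zfs.misc.arcstats.size; sleep 10; done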
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Seems like a "reporting problem" to me.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
To me it looks like all the ARC that is "claimed"/used by VM/jail storage is de facto counted as "Services" memory. Usually inactive mem gets freed (after some time). How long did you wait after searching for duplicates? Is the system anywhere near starting to swap?
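To check the swap and the memory counters from a shell (assuming CORE/FreeBSD), something like:

Code:
# current swap usage
swapinfo -h

# the page counters behind Active/Inact/Wired/Free in top (values are in pages)
sysctl vm.stats.vm.v_inactive_count vm.stats.vm.v_free_count hw.pagesize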

BTW: Where do you run your duplicate files finder scripts?
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
To me it looks like all the ARC that is "claimed"/used by VM/jail storage is de facto counted as "Services" memory. Usually inactive mem gets freed (after some time). How long did you wait after searching for duplicates? Is the system anywhere near starting to swap?

Inactive is showing as Services, not ZFS Cache. So far it's been three hours since I shut the jail down and Inactive has not changed. ARC did increase when Plex kicked off its scheduled tasks, but that just consumed half of the Free memory. It hasn't swapped yet.

When I noticed the problem previously, it was after the ARC had been reduced to its bare minimum, so I rebooted the machine. I'm not sure how long I'm going to give it this time. I'd like to get more duplicates cleared.

BTW: Where do you run your duplicate files finder scripts?

I just created a jail and attached my datasets to it. Then I run rmlint and execute the scripts from the same directory I ran rmlint from.

If you want to see the rmlint scripts properly formatted, you'll need to install bash and use that to run them. Additionally, I install ncdu as I find it useful to determine the best places to start rmlint so that it doesn't take forever.
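Roughly, a single pass looks like this (the path is just an example):

Code:
# inside the jail, scan one directory at a time instead of the whole dataset
cd /mnt/tank/media       # example path
rmlint -g -p .           # -g shows progress, -p runs in paranoid (byte-by-byte) mode

# rmlint writes its removal script into the current directory; run it with bash
bash ./rmlint.sh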
 
Joined
Oct 22, 2019
Messages
3,641
How many files are you dealing with? Thousands? More than 10,000? More than 50,000?
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Inactive is showing as Services, not ZFS Cache.

That's what I meant when I wrote "seems like a 'reporting problem'". The middleware in TN is doing fancy things. I had the same "problem" with TN12 (don't remember the minor version) when replicating locally. Half the ARC (at minimum) got "attributed" to services.

So far it's been three hours since I shut the jail down and Inactive has not changed. ARC did increase when Plex kicked off its scheduled tasks, but that just consumed half of the Free memory. It hasn't swapped yet.

So no harm done.


[...]
I just created a jail and attached my datasets to it. Then I run rmlint and execute the scripts from the same directory I ran rmlint from.

Well, just imagine the ARC gets grabbed by / "attributed to" this jail (seen by TN as a "service") ... that's what I tried to explain above.

If you want to see the rmlint scripts properly formatted, you'll need to install bash and use that to run them. Additionally, I install ncdu as I find it useful to determine the best places to start rmlint so that it doesn't take forever.

No problem on my side :wink:
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
That's what I meant when I wrote "seems like a 'reporting problem'". The middleware in TN is doing fancy things. I had the same "problem" with TN12 (don't remember the minor version) when replicating locally. Half the ARC (at minimum) got "attributed" to services.

I thought so but just wanted to confirm. :)


So no harm done.

Except for the fact that it evicted almost half my ARC and still hasn't gone down.


Well, just imagine the ARC gets grabbed by / "attributed to" this jail (seen by TN as a "service") ... that's what I tried to explain above.

But the jail has been stopped since before I started this thread. I would have expected the ARC to begin creeping back up.

It almost seems like the ARC can't use any memory that's marked as Inactive and has to wait for it to move to Free.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
You name it. "It almost seems …" I don't believe the colored dashboard. It's like "motor temp deduced from the _shown_ coolant temps".

Back in those days, when my replication jobs seemed to take over memory, the ARC "returned" and the services' mem got freed as soon as the regular caching "pressure" came back (from net backups, in my case). If I were you, I'd wait and see … as long as there is no swapping, all is fine IMHO.
 
Joined
Oct 22, 2019
Messages
3,641
How many files are you dealing with? Thousands? More than 10,000? More than 50,000?
What about this? (I'm referring to the directory tree that you're running the rmlint scan against.)
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
You name it. "It almost seems …" I don't believe the colored dashboard. It's like "motor temp deduced from the _shown_ coolant temps".

Back in those days, when my replication jobs seemed to take over memory, the ARC "returned" and the services' mem got freed as soon as the regular caching "pressure" came back (from net backups, in my case). If I were you, I'd wait and see … as long as there is no swapping, all is fine IMHO.

Maybe. My concern is that without knowing what's going on under the covers, it's going to lead to a performance degradation or a sudden crash. I don't want other workloads to break because I'm trying to clean up my files.

What about this? (I'm referring to the directory tree that you're running the rmlint scan against.)

Sorry, I didn't see the original post. The dataset is massive, but I'm scanning smaller chunks in order to try and break it up. I'm not particularly concerned about rmlint's memory usage, especially as I'm running it in paranoid mode.

My issue is that I would have expected the memory to be freed up once I shut down the jail. Inactive still hasn't dropped any since I shut down the jail before posting this thread.
 
Joined
Oct 22, 2019
Messages
3,641
but I'm scanning smaller chunks in order to try and break it up.
There's a reason I'm asking.

In total, about how many files are being crawled by rmlint? Thousands? Over 10,000? Over 50,000? Not just "one" of the runs, but added up all together, even if they are broken down into "smaller chunks".
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
This issue prompted me to look through the memory reporting graph. Going back to the beginning of the year, I can't find Inactive memory going down without a reboot.

There's a reason I'm asking.

In total, about how many files are being crawled by rmlint? Thousands? Over 10,000? Over 50,000? Not just "one" of the runs, but added up all together, even if they are broken down into "smaller chunks".

The smaller chunks are achieved by running rmlint over a directory instead of the whole dataset. So it would only see those files. I would wager that today's runs were over 50k files with a mix of small and large files.

What's the reason?
 
Joined
Oct 22, 2019
Messages
3,641
I would wager that today's runs were over 50k files with a mix of small and large files.
That's a lot of metadata to crawl. What you could be seeing is that much RAM being used for the metadata ARC. ZFS, to this day, has a propensity to aggressively evict metadata from the ARC, even if there is no immediate userdata to take its place.

Because it's being run in a jail, it could be "reported" as "Services" memory (as @awasb noted), and then, as it's evicted, the reporting fails to show it as "Free".
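A quick way to sanity-check how much of the ARC is metadata (just grepping the counters; the exact labels vary a bit between OpenZFS versions):

Code:
# metadata vs. total ARC size from arc_summary
arc_summary | grep -iE 'metadata cache size|arc size'

# or poll the raw arcstats counters directly
sysctl kstat.zfs.misc.arcstats | grep -iE 'meta|size'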


Gauge the output of the command below at three distinct moments. (Run it on the TrueNAS system, not within a jail.)
Code:
arc_summary | head -n 20

a) Before you run an extensive rmlint crawl.

b) Immediately after you finish an rmlint crawl. (Or near the end of a crawl.)

c) An hour after an rmlint crawl (without using the server much, so as not to inadvertently read records into the ARC).
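If it's easier, you could capture the three snapshots to files and diff them afterwards (filenames are just examples):

Code:
# a) before the crawl
arc_summary | head -n 20 > /tmp/arc_before.txt

# b) right after (or near the end of) the crawl
arc_summary | head -n 20 > /tmp/arc_after.txt

# c) roughly an hour later, with the server otherwise idle
arc_summary | head -n 20 > /tmp/arc_later.txt

diff /tmp/arc_before.txt /tmp/arc_after.txt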
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
That's a lot of metadata to crawl.

That's why I'm trying to break it up. :)

What you could be seeing is that much RAM being used for the metadata ARC. ZFS, to this day, has a propensity to aggressively evict metadata from the ARC, even if there is no immediate userdata to take its place.

Because it's being run in a jail, it could be "reported" as "Services" memory (as @awasb noted), and then, as it's evicted, the reporting fails to show it as "Free".

But wouldn't the ARC start filling back up fairly quickly?

Gauge the output of the command below at three distinct moments. (Run it on the TrueNAS system, not within a jail.)
Code:
arc_summary | head -n 20

a) Before you run an extensive rmlint crawl.

b) Immediately after you finish an rmlint crawl. (Or near the end of a crawl.)

c) An hour after an rmlint crawl (without using the server much, so as not to inadvertently read records into the ARC).

I'll give this a try once the existing workloads finish. I'll have to figure out when the best time to grab the third one is. I have a decent amount of things running relatively constantly, which is part of why I'm surprised the ARC isn't getting bigger.
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
Turns out I had cleaned things up more than I expected. This run was against 1,900 files and deleted 63 files, saving 23GB.

It looks like it's the rmlint.sh script causing the jump, not the rmlint command itself. I assume it's due to running it in paranoid mode, which invokes this function for each deletion.

Code:
check_for_equality() {
    if [ -f "$1" ]; then
        # Use the more lightweight builtin `cmp` for regular files:
        cmp -s -- "$1" "$2"
    else
        # Fallback to `rmlint --equal` for directories:
        "$RMLINT_BINARY" -p --equal  -- "$1" "$2"
    fi
}


After starting the jail.
Code:
ARC size (current):                                    41.3 %   26.0 GiB
        Target size (adaptive):                        43.6 %   27.4 GiB
        Min size (hard limit):                          3.2 %    2.0 GiB
        Max size (high water):                           31:1   62.8 GiB
        Most Frequently Used (MFU) cache size:         84.2 %   20.8 GiB
        Most Recently Used (MRU) cache size:           15.8 %    3.9 GiB
        Metadata cache size (hard limit):              75.0 %   47.1 GiB
        Metadata cache size (current):                  5.2 %    2.5 GiB
        Dnode cache size (hard limit):                 10.0 %    4.7 GiB
        Dnode cache size (current):                    13.6 %  657.8 MiB


After running rmlint.
Code:
ARC size (current):                                    40.5 %   25.5 GiB
        Target size (adaptive):                        40.6 %   25.5 GiB
        Min size (hard limit):                          3.2 %    2.0 GiB
        Max size (high water):                           31:1   62.8 GiB
        Most Frequently Used (MFU) cache size:         79.3 %   19.1 GiB
        Most Recently Used (MRU) cache size:           20.7 %    5.0 GiB
        Metadata cache size (hard limit):              75.0 %   47.1 GiB
        Metadata cache size (current):                  5.0 %    2.4 GiB
        Dnode cache size (hard limit):                 10.0 %    4.7 GiB
        Dnode cache size (current):                    14.6 %  703.7 MiB


After running rmlint.sh.
Code:
ARC size (current):                                    15.4 %    9.7 GiB
        Target size (adaptive):                        15.4 %    9.7 GiB
        Min size (hard limit):                          3.2 %    2.0 GiB
        Max size (high water):                           31:1   62.8 GiB
        Most Frequently Used (MFU) cache size:         91.1 %    7.6 GiB
        Most Recently Used (MRU) cache size:            8.9 %  757.8 MiB
        Metadata cache size (hard limit):              75.0 %   47.1 GiB
        Metadata cache size (current):                  4.9 %    2.3 GiB
        Dnode cache size (hard limit):                 10.0 %    4.7 GiB
        Dnode cache size (current):                    14.7 %  708.3 MiB


After shutting down the jail.
Code:
ARC size (current):                                    16.1 %   10.1 GiB
        Target size (adaptive):                        16.2 %   10.2 GiB
        Min size (hard limit):                          3.2 %    2.0 GiB
        Max size (high water):                           31:1   62.8 GiB
        Most Frequently Used (MFU) cache size:         91.6 %    8.0 GiB
        Most Recently Used (MRU) cache size:            8.4 %  754.7 MiB
        Metadata cache size (hard limit):              75.0 %   47.1 GiB
        Metadata cache size (current):                  5.2 %    2.4 GiB
        Dnode cache size (hard limit):                 10.0 %    4.7 GiB
        Dnode cache size (current):                    14.6 %  706.9 MiB
 