SOLVED RAM & Pagefile issues

Status
Not open for further replies.

HughT

Dabbler
Joined
Nov 21, 2017
Messages
10
Hi to the FreeNAS community! First post here and relatively new to FreeNAS so apologies if I've not followed the right procedures (I have read the forum guidelines and searched for this issue but couldn't find a solution).

My daily emailed security run output last night showed dozens of swap_pager_getswapspace failed messages (as below), which indicate that the system had run out of space in the pagefile (swap).

Code:
freenas.local kernel log messages:
> swap_pager_getswapspace(19): failed
> swap_pager_getswapspace(19): failed
> swap_pager_getswapspace(21): failed
.......
> swap_pager_getswapspace(11): failed
> swap_pager_getswapspace(6): failed
> swap_pager_getswapspace(17): failed
-- End of security output --

Looking at the reports (see images below, which I've tried to show in chronological order), I saw that the system had run out of RAM (32 GB) and had overflowed into the pagefile. I rebooted the system this morning and RAM usage returned to more normal levels; however, it does seem to be slowly creeping up again and is already at c.16 GB (slightly higher than I would expect given the 1 GB per 1 TB rule of thumb and the c.8 TB of files on the system).

[Attached image: Untitled-1.png]


Looking at this with my inexperienced eye, a few things stick out:
1) The pagefile size decreased around the time I migrated from FreeNAS 9 to 11; I don't think this is relevant.
2) Historically my memory usage has been c.100%. Is this normal??
3) Sometime around the 29th of Jan the graph shows my total available memory dropping to c.18 GB (I presume just a GUI reporting error), but at the same time the pagefile started being used. Over the next week the total available memory climbed back to 32 GB, while pagefile usage rose to 100%.

Has anyone experienced something similar? Is it a memory leak? My main concerns are:
a) memory usage being so high when I only have 8 TB of files, so I was expecting c.8 GB of usage (I thought I had over-provisioned with 32 GB; do I need more?)
b) that I didn't notice the system ticking over into the pagefile and then running at 100%. Is there an easy way of tracking this using the daily reports?
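On the tracking question, since the daily security mail already contains these kernel lines, a small sketch like the following could be run from cron to flag swap trouble early (count_swap_failures is my own name, not a FreeNAS tool, and the log path may differ on your system):

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeNAS): count swap_pager_getswapspace
# failures in a saved copy of the daily security output or /var/log/messages.
# A non-zero count means the kernel failed to allocate swap at least once.
count_swap_failures() {
    grep -c 'swap_pager_getswapspace' "$1"
}
```

e.g. run `count_swap_failures /var/log/messages` nightly and mail yourself whenever the count is non-zero, rather than waiting for the pagefile to hit 100%.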

As an aside, I have been getting infrequent "mps1: Out of chain frames, consider increasing hw.mps.max_chains." messages, which seem to come from the FreeBSD mps(4) driver; the forum discussions of these are a little over my head, but they seem to suggest this shouldn't be causing an issue...
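For reference, hw.mps.max_chains is a boot-time loader tunable, so if you do decide to raise it, it goes in /boot/loader.conf and only takes effect after a reboot; the value below is just the commonly suggested doubling of the default, not a recommendation for this system:

```shell
# /boot/loader.conf (FreeBSD/FreeNAS) -- boot-time tunable, needs a reboot.
# The mps(4) default is 2048; doubling it is the usual first step when the
# "Out of chain frames" message appears.
hw.mps.max_chains="4096"
```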

Any thoughts would be much appreciated!

Thanks

Hugh

System Specs:
MOBO/CPU: X11SSL-CF / Xeon E3-1220 v6
HDD: 8x 4TB WD Black RAIDZ2
RAM: 2x 16GB Samsung DDR4 ECC

Software:
FreeNAS-11.1-RELEASE
No jails or VMs

Workflow:
An office file server with one SMB share (the only other service running is SMART), taking periodic snapshots from 9am to 6pm with two-week retention. 8 TB used and 12 TB available in the dataset, which holds c.500k files. c.30 users access it, each with an individual account, all in the same group. A separate nightly process backs up to secondary servers, but it is initiated by them.
 

HughT

Dabbler
Hi Dlavigne,

Thanks for coming back so swiftly. I'm going to run the update tonight after all users are off and the backup processes have run, and will report back tomorrow.

Thanks. H
 

HughT

Dabbler
Hi guys, I updated last night and let a day of usual activity occur, and it seems to have fixed the problem. As per the image below (yesterday vs. today), "wired / kernel" memory is staying at a much more reasonable level through peak usage.

[Attached image: Untitled-1.png]


Thanks for the help!

Hugh
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Why is there such a large amount of inactive memory on Wednesday (which is presumably after you upgraded)?
 

HughT

Dabbler
Hi toadman,

The midpoint of the left graph (Tuesday morning) is when I rebooted under 11-STABLE. The left edge of the right graph is when I upgraded to 11.1-U1 (Tuesday evening) and rebooted again. Overnight no users were on the system, hence the flat usage up to the midpoint of the right graph, at which point everyone starts using it (around 30 users) and the hourly snapshots run, causing the RAM spike from c.8am to c.6pm.

With my limited knowledge of FreeNAS, I think old data is kept in RAM on the off chance it might be requested again. When the server stopped being used, it looks like the system reclaimed the inactive RAM as free RAM (you can see the drop at the rightmost end of the right graph).
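If you want to put numbers on those graph categories, FreeBSD exposes the underlying page counts via sysctl (e.g. vm.stats.vm.v_inactive_count and vm.stats.vm.v_wire_count, counted in 4 KiB pages on amd64). A small sketch to turn a page count into GiB, assuming that 4 KiB page size (pages_to_gib is my own naming):

```shell
#!/bin/sh
# Convert a vm page count (4096-byte pages, the amd64 default) to GiB.
# Feed it the output of e.g.: sysctl -n vm.stats.vm.v_inactive_count
pages_to_gib() {
    awk -v pages="$1" 'BEGIN { printf "%.1f\n", pages * 4096 / (1024 ^ 3) }'
}
```

e.g. `pages_to_gib "$(sysctl -n vm.stats.vm.v_inactive_count)"` gives the inactive total in GiB, which is easier to compare against the reporting graphs than raw page counts.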

Interestingly, over the last two hours I have seen the wired/kernel RAM allocation gradually ticking up by 10% for no reason I can discern, so I will keep monitoring and report back if anything further anomalous happens.

Thanks
 

toadman

Guru
Given that the ARC is included in "Wired", I would have expected the ramp of Wired to look like it did on the left, i.e. as the ARC fills up, Wired ramps up to a large percentage of the total. Inactive memory is just sitting around doing nothing, so you don't want large amounts of it. Maybe the ARC just hasn't warmed up yet.

I don't think the ARC evicts data unless it has to, so I would expect it to keep live data as Wired rather than release large amounts to inactive. I could be wrong. So if you continue to see high amounts of inactive memory (as a percentage of the system), something else may be amiss. For example, on my 16 GB system the inactive memory amount is typically below 1 GB.
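For anyone following along, you can compare the ARC against total RAM directly: kstat.zfs.misc.arcstats.size and hw.physmem are both reported in bytes on FreeBSD, and a helper like this (arc_pct is my own naming, just a sketch) turns them into a percentage:

```shell
#!/bin/sh
# Express ARC size as a percentage of physical RAM; both arguments in bytes,
# e.g.: arc_pct "$(sysctl -n kstat.zfs.misc.arcstats.size)" \
#               "$(sysctl -n hw.physmem)"
arc_pct() {
    awk -v arc="$1" -v ram="$2" 'BEGIN { printf "%.0f\n", 100 * arc / ram }'
}
```

On a healthy file server the ARC normally grows to occupy most of the RAM not needed elsewhere, which is why Wired sitting at a large percentage of the total is expected rather than alarming.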
 

toadman

Guru
i.e. I think it should look more like weeks 49-50 in your original graphs.
 

HughT

Dabbler
Joined
Nov 21, 2017
Messages
10
Ah, I wasn't aware the ARC sat there (I actually thought it was counted as inactive; clearly I still have some reading up to do!).

Wired usage did eventually tick up, but it was actually overnight when the backups run; then through the day it has been slowly ticking down. Odd...

As an aside, I did get another "mpr0: Out of chain frames, consider increasing hw.mpr.max_chains." in the daily log, but apart from that the system seems okay.

Also a huge thanks for all your insight!

[Attached image: Untitled-1-Recovered.png]
 