SOLVED Swapspace Lockup

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
I have been getting kernel errors in my email about swap space running out. Today, the machine actually hung completely. I can't find in the logs which process is hogging the memory, because it has run along for a couple of months and swap never got to be more than 2 GB. The only thing that's more recent is Plex, though even that has run for weeks without issue. Any ideas on which logs I can dig through to see what's killing the server?

swapframe.png


FreeNAS-11.2-U4.1
(Build Date: May 10, 2019 21:33)
Processor:
AMD Ryzen 5 2600 Six-Core Processor (12 cores)
Memory:
16 GiB

Jails --
DNSMasq
192.168.0.2/24
none
up
jail
11.2-RELEASE-p9

plexmediaserver-plexpass
DHCP: 192.168.0.143
none
up
pluginv2
11.2-RELEASE-p9

rslsync
192.168.0.60/24
none
up
pluginv2
11.2-RELEASE-p9

ZFS Info
I have 8 4TB drives in RAIDZ1. And the boot is 120GB SSD (which I realize is overkill).

nasverse.rrealms.com kernel log messages:
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(32): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(32): failed
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
So, looking at the data you provided, it looks like you are using a lot of swap space. A properly configured system will rarely use swap space, so I'd recommend more RAM. Let's look at a few things...

1. On your dashboard how much RAM does FreeNAS say it has? I would expect 16GB but maybe it's less.
2. Provide a screen shot of the Memory (like you did the swap above).
3. Provide the output of
Code:
gpart show
so we can verify the swap partition size and which drives have it. By default you should have 2GB of swap per hard drive (total of 16GB for your 8 drives).
4. Maybe you have a runaway jail. I'd disable the Plex jail first to ensure it's not causing your issue, since it was the most recent change. If that still fails, then disable the other jails. You will need to monitor your swap usage; any significant use is not good and, as I said, you may need more RAM. You said you manually created the jail, so it could be something you did. You might also try to delete that jail and try again.

That is all I can think of off the top of my head. BTW, you shouldn't bump a thread; it makes the response counter go up from 0 to 1 and many of us think the problem is being addressed. I start looking for a 0 count first and if I have time after answering those postings, I'll look at some others.

Good Luck
 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
The box has a single stick of 16GB Kingston ECC memory in it. Swap stays at zero usage for days (and sometimes weeks) of usage... the only thing that taxes the box is when torrent sync is backing big changes from my development PC and when Plex is generating thumbnails and stuff. That 6 core AMD CPU just blasts through everything ... it is so overkill in every other regard... it rarely gets above 5% utilization and the 16 gigs of memory again hardly ever goes above 8 gigs of usage. There is SOMETHING that happens where the swap space just starts to be consumed.

Here's how it looks right now... and how it looks before whatever triggering event is happening.
mempic.png


When it goes haywire it takes about 30 hours to croak... so this is some sort of leak I think. I had seen something related to something that happens after a scrub. Two scrubs just happened to kick off today, one of the system and one of the data pool... but everything looks fine.

Here's a picture of the swap consumption progression: (It takes almost 30 hours for it to hang)
swapit.png


Some other possible factors, I have one jail in which I'm running DNSMasq, and Squid (to act as the proxy for the network). I initially thought I had configured squid weird, but there's nothing in the logs and it's configured to the defaults. There's a minimal configuration of nginx in there with webmin to configure the services in that jail. That's the only thing that's not a standard plug-in. Plex and Rslsync are standard and I've been keeping them up to date with the iocage update command line thing until that's fixed in a future release.

I do notice the system shows 8 GB of swap... even though there are 8 drives... which should be 16GB.

swapinfo.png


As a defensive measure, I created a cron job that runs every 10 minutes looking for
Code:
dmesg -a | /usr/bin/egrep -e "swap_pager_getswapspace.*failed"

and if it finds a match, it will snapshot /var/log/* and do a "shutdown -r now". Hopefully that will catch whatever is doing this in the act.
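
For anyone wanting to reproduce the watchdog described above, here is a rough sketch of what such a cron script could look like. It is not the exact script from this post: the snapshot directory (`/mnt/tank/swap-watchdog`), the tarball naming, and the `--run` guard are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical swap watchdog sketch; paths and names are assumptions.
SNAP_DIR="${SNAP_DIR:-/mnt/tank/swap-watchdog}"   # assumed location for log snapshots

# Reads kernel messages on stdin; succeeds if a swap allocation failure is present.
swap_failure_seen() {
    grep -Eq 'swap_pager_getswapspace.*failed'
}

# Archives /var/log into a timestamped tarball under SNAP_DIR.
snapshot_logs() {
    mkdir -p "$SNAP_DIR" &&
        tar -czf "$SNAP_DIR/varlog-$(date +%Y%m%d-%H%M%S).tgz" -C / var/log
}

# Only act when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    if dmesg -a | swap_failure_seen; then
        snapshot_logs
        shutdown -r now
    fi
fi
```

It would be scheduled from cron every 10 minutes with the `--run` argument, for example as `/root/swap_watchdog.sh --run` (the script path is hypothetical).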

Here's the gpart show you requested:
Code:
=>       40  488397088  nvd0  GPT  (233G)
         40       1024     1  freebsd-boot  (512K)
       1064  488396056     2  freebsd-zfs  (233G)
  488397120          8        - free -  (4.0K)

=>        40  7814037088  da0  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da1  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da2  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da3  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da4  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da5  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da6  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)

=>        40  7814037088  da7  GPT  (3.6T)
          40          88       - free -  (44K)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  7809842688    2  freebsd-zfs  (3.6T)
  7814037120           8       - free -  (4.0K)
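
As a quick cross-check of the "2GB per drive, 16GB total" expectation against the listing above, the swap partitions can be summed with a small awk filter. This is only a sketch: the function name `swap_total_gib` is made up, and it assumes `gpart show` is reporting sizes in 512-byte sectors (which matches the 3.6T shown for these drives).

```sh
# Sum all freebsd-swap partitions from `gpart show` output read on stdin.
# ASSUMPTION: sizes are in sectors of 512 bytes unless another size is passed as $1.
swap_total_gib() {
    awk -v sector_bytes="${1:-512}" '
        $4 == "freebsd-swap" { sectors += $2 }   # column 2 is the partition size in sectors
        END { printf "%.1f\n", sectors * sector_bytes / (1024 * 1024 * 1024) }
    '
}
```

Usage would be `gpart show | swap_total_gib`. Each 4194304-sector partition is 2 GiB, so the eight partitions listed add up to 16 GiB of swap available on disk, even if fewer of them are actually in use.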

 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
I decided to follow the instructions in this post:
Reallocate Swap

16Gswap.png


Here is the swapinfo after a reboot:

16Gswap2.png


I'm thinking at the very least it will double the length of time it takes the leak to frag up everything.

There's plenty of space on that Samsung EVO SSD (240G)... shouldn't the swap actually be 2x the memory size... i.e. 32GB?
 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
I think I may have found the culprit, though I don't know what to do about it... there's a pattern in the ARC usage that coincides with the swap going crazy.

ZFSARC.png
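
To check a correlation like this with numbers rather than graphs, one option is to log ARC size and swap usage side by side. The sketch below is an assumption about how that might be done, not something from this thread: the function names are invented, and it relies on the FreeBSD `kstat.zfs.misc.arcstats.size` sysctl and on `swapinfo -k` listing each swap device on a line starting with `/dev/`.

```sh
# Sum the "Used" column (in 1K blocks) over all swap devices in `swapinfo -k` output (stdin).
# Lines not starting with /dev/ (the header and any "Total" line) are ignored.
swap_used_kb() {
    awk '$1 ~ /^\/dev\// { used += $3 } END { print used + 0 }'
}

# Print one timestamped sample of ARC size (bytes) and swap in use (KB).
log_arc_and_swap() {
    arc_bytes=$(sysctl -n kstat.zfs.misc.arcstats.size)
    swap_kb=$(swapinfo -k | swap_used_kb)
    echo "$(date '+%Y-%m-%d %H:%M:%S') arc_bytes=$arc_bytes swap_used_kb=$swap_kb"
}
```

Calling `log_arc_and_swap` from cron and appending the output to a file would give a simple time series to compare against the graphs.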
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Well, the way I read this graph is that less and less ARC space is required. The other thing you may be able to say, given that swap space is being used so heavily, is that available RAM is getting shorter. Also, your Memory graph shows Inactive RAM dropping down to practically zero. Glad you reallocated the swap space, much better with 16GB; however, again, a properly designed and operating system will rarely use swap space. This slows things down considerably. Turn off a jail, wait and see if the problem goes away or never comes back, but give it adequate time so you know if it's the fault. If you turn off all your jails/plugins and the problem persists then I'd write up a bug report.

Something else to try, type
Code:
top
and then
Code:
o
and then
Code:
size
to show the stuff running and have it sorted by size. See what might be eating up your RAM. I'd reboot the system and after a few minutes, take a screen shot. Then wait 24 hours or however long you think you need to wait, 2 days? Take another screen shot. Compare them, is one using more RAM than expected? Do this periodically until you think you can isolate the problem.
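
The before/after comparison suggested above can also be done with text snapshots instead of screenshots. This is only a sketch under a few assumptions: the snapshot file names are arbitrary, the function name `rss_growth` is invented, and processes are grouped by the first word of their command name.

```sh
# Take a snapshot with, for example:
#   ps -ax -o rss= -o comm= > /tmp/mem_before.txt      (file name is arbitrary)
# and later a second one into /tmp/mem_after.txt.

# Compare two snapshots (lines of "<rss_kb> <command>") and list the commands
# whose total resident memory grew, largest growth first.
rss_growth() {
    awk '
        NR == FNR { before[$2] += $1; next }   # first file: totals per command
                  { after[$2]  += $1 }         # second file: totals per command
        END {
            for (c in after) {
                d = after[c] - before[c]
                if (d > 0) print d, c
            }
        }
    ' "$1" "$2" | sort -rn
}
```

Running `rss_growth /tmp/mem_before.txt /tmp/mem_after.txt` a day or two apart should make a slowly growing process stand out at the top of the list.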

I'm out of suggestions right now and good luck!
 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
So, studying this more, it does appear that ARC is taking up as much as 10 gigs... and when swap only had 4 gigs... this would exhaust the space. Another part of this is that when swap is allocated, it is apparently never de-allocated. So each time something brushes up against it... more is used because the previous allocation is not freed. So, this seems like a swap leak / bug... though I don't know how to prove that. I did find a swap-in script which I guess I will run until the matter is resolved. A factor in this is that swap is being used even though there's free memory available. The thing had 7 gigs of free memory AND 325MB of swap in use... which seems buggy to me.

So, the steps I took: I increased swap to 32 gigs and moved it onto the SSD, which should prevent slowdowns when swap gets used for whatever reason. The leak is slow, so it would take a week to exhaust the larger allocation. I have a swap error detection script which should catch it if there is some catastrophic issue, and I've put the swap-in script in place, which I had to modify a little to work with the SSD. I anticipate this should keep the machine from locking up for the foreseeable future. *fingers crossed*
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Maybe you understand this or maybe not, but FreeBSD does not free up memory immediately all the time; actually it only frees up memory when it feels like it. Also, I suspect that the 325MB of space being used was for something not needing to be in RAM at the time. 325MB is a small amount of space and I'd only be concerned with values above 1GB, but that is just me. I doubt increasing your swap space to 32GB will help, but the move to the SSD should make your system operate smoother during the swapping of RAM. As I said, it's likely one of your jails causing the issue. Keep those fingers crossed!
 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
@joeschmuck thank you, I believe I have singled out the culprit. I did a top -w... and sorted by swap and discovered that squid running in one of the jails is periodically going nuts... not sure what to make of that... it goes for hours and then starts eating memory and swap... guess I'll start tweaking the settings. The odd thing is it's doing it when the other computers on the network shouldn't be hitting it. Ah, the mysteries of IT... I work in computers all day programming... I don't want to do systems admin when I get home... I just want my stuff to work. o_O
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I work in computers all day programming... I don't want to do systems admin when I get home... I just want my stuff to work. o_O
I hear you on that one brother. Glad you found the culprit and good luck making it work properly.
 