Memory management (OOM causing middlewared to be killed)

jenksta

Dabbler
Joined
Sep 4, 2022
Messages
10
I've been having an issue on SCALE for a while where apps would randomly stop or the web UI would go down. After some investigation today, I noticed in my logs that the system is running out of memory and killing processes to free some up. Admittedly I only have a measly 16 GB of RAM, but I don't have much running, as it's just a small home server used primarily for Plex streaming. I need to check whether a memory leak is causing the excessive usage, but for now I'm just going to install more RAM.

I've also noticed that the reason I sometimes cannot get back into the web UI when this happens (my temporary fix is to SSH in and restart the middlewared and nginx services) is that the out-of-memory handler is repeatedly killing middlewared. Is this the desired behaviour?
 

MBlais13

Cadet
Joined
Dec 27, 2021
Messages
2
I am seeing this on my Dell R710 currently. While I was trying to wipe two of my 8 TB SAS drives, the TrueNAS GUI froze and the system started killing processes.
Not sure how to resolve this. I guess I'll leave it on for a few days and see if it does its thing :rolleyes:
memkill.png
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Hey, you may want to check this thread as well:
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
So, if you're SSHing in, you should be able to see which process(es) are consuming all the memory via top or other tools before restarting middlewared. Have you done so? Which process(es)?
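As a rough sketch of that check, the function below ranks processes by resident memory using the VmRSS field of /proc/<pid>/status. The function name top_rss and its optional second argument (an alternative proc directory, handy for trying it on sample data) are made up for illustration and are not part of TrueNAS.

```shell
# Print the N largest processes by resident memory as "MiB PID NAME".
# top_rss and its proc_root argument are illustrative names, not a TrueNAS tool.
top_rss() {
    count="${1:-10}"
    proc_root="${2:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Kernel threads have no VmRSS line, so they print nothing.
        # Processes that exit mid-scan are ignored via 2>/dev/null.
        awk -v pid="$(basename "$(dirname "$status")")" '
            /^Name:/  { name = $2 }
            /^VmRSS:/ { rss_kib = $2 }
            END { if (rss_kib > 0) printf "%d %s %s\n", rss_kib / 1024, pid, name }
        ' "$status" 2>/dev/null
    done | sort -rn | head -n "$count"
}
```

Running `top_rss 10` over SSH before restarting middlewared would show the ten largest processes at that moment. Note that the ZFS ARC is kernel memory and will not appear in a per-process list like this.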
 

MBlais13

Cadet
Joined
Dec 27, 2021
Messages
2
Looking at my Grafana stats, when I started wiping my 8 TB SAS drives the system started using my RAM as a cache. The ARC grew to consume all of my memory instead of leaving room for the middleware processes, causing them to 'run out of memory'.
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
I have also had a tricky time figuring out what causes oom-killer to start. It's easy to see what is using a lot of RAM; it's not easy to see which process caused oom-killer to start killing the things with the worst OOM score.
I don't have any real-time monitoring set up, so it's hard to zero in on the moment oom-killer starts and find out what was happening.
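One way to narrow this down after the fact, sketched here under assumptions: the kernel log normally records an "invoked oom-killer" line naming the process whose allocation triggered the killer, followed by a line naming the process that was killed. The filter below pulls out just those lines. The function name oom_events is made up for illustration, and the exact message wording varies between kernel versions, so the patterns may need adjusting.

```shell
# Filter kernel log text on stdin down to OOM-killer events.
# oom_events is an illustrative name; message wording differs across kernels.
oom_events() {
    grep -E 'invoked oom-killer|Out of memory:|Killed process'
}

# Possible usage (requires access to the kernel log):
#   journalctl -k | oom_events
#   dmesg -T | oom_events
```

The timestamps on the matching lines give the moment to look at in whatever graphs or logs are available.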
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I was of course speaking to the OP, since he said he is restarting middlewared while it is happening, at least the way I read it. That is the most useful time, as commands can be run then. ZFS is designed to ideally consume all free RAM, though on SCALE it is currently not doing so. When other apps put memory pressure on ZFS, it releases RAM. So ZFS using "all my memory" is expected. We really need to see what else is consuming memory, as otherwise there is not much to go on. I had read the other thread a while back. I am an Emby user and also do transcoding, and it never causes an issue for me. But something obviously is.

What I would be curious to see (now, not when it's happening) is the results of the two commands below. I've seen ZFS indirectly cause oom-killer to run under certain conditions, and I'm curious.

cat /proc/spl/kstat/zfs/arcstats | grep c_

and

free -m
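For anyone who finds the raw arcstats output hard to read, here is a small sketch that prints only the current ARC size and its target and limits in MiB, which is easier to compare against the free -m numbers. It assumes the usual three-column kstat layout (name, type, value in bytes); the function name arc_summary and its optional file argument are made up for illustration.

```shell
# Print ARC size (current), c (target), c_min and c_max (bounds) in MiB.
# arc_summary is an illustrative name; the file argument defaults to the live kstat.
arc_summary() {
    awk '$1 == "size" || $1 == "c" || $1 == "c_min" || $1 == "c_max" {
        printf "%-6s %8d MiB\n", $1, $3 / 1048576
    }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Calling `arc_summary` with no argument reads the live statistics. Unlike the grep above, it matches field names exactly, so it also picks up the plain c and size rows.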
 