smbd memory leak

alpha754293 · Sep 1, 2022

mistermanko said:
The pie chart is just a representation of what top is also showing, see:

I understand that, but there's got to be some way for the pie chart to distinguish between the different categories of RAM usage (i.e. free vs. ZFS cache vs. services).

Prior to rebooting the server, top showed 13 GB of wired memory in use, but the sum of (I think) the first 14 rows of the output of

Code:

top -o res

only works out to 4329M of the 13 GB wired that top is reporting. So I am not sure what is using the remaining roughly 8.671 GB of RAM.
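The hand-summing of the RES column can be scripted. The sketch below is a hypothetical helper (the names `res_to_mib` and `sum_res` are made up for illustration); it assumes FreeBSD top's usual process-table header, where a `PID ... RES ...` line precedes the rows and RES values carry a K/M/G/T suffix, and that the text has already been captured from `top -o res`.

```python
import re

# Unit suffixes FreeBSD top uses in its SIZE/RES columns, expressed in MiB.
_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024.0}


def res_to_mib(value: str) -> float:
    """Convert a top RES value such as '1536M' or '1.5G' to MiB."""
    match = re.fullmatch(r"([\d.]+)([KMGT])", value)
    if match is None:
        raise ValueError(f"unrecognised RES value: {value!r}")
    number, unit = match.groups()
    return float(number) * _TO_MIB[unit]


def sum_res(top_output: str, rows: int = 14) -> float:
    """Sum the RES column (in MiB) over the first `rows` process lines."""
    lines = top_output.splitlines()
    # Locate the process-table header and the RES column's position in it.
    for i, line in enumerate(lines):
        fields = line.split()
        if "PID" in fields and "RES" in fields:
            res_col = fields.index("RES")
            body = lines[i + 1:]
            break
    else:
        raise ValueError("no PID/RES header line found")

    total, counted = 0.0, 0
    for line in body:
        fields = line.split()
        if len(fields) <= res_col:  # blank or truncated line
            continue
        total += res_to_mib(fields[res_col])
        counted += 1
        if counted == rows:
            break
    return total
```

Keep in mind that RES counts shared pages once per process, so the sum can overstate process memory, and it never includes kernel (wired) allocations at all.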

mistermanko said:
Can you share a log file where you've found that error message regarding smbd?

Here is what I've been able to find in /var/log/console.log:

The interesting thing is that, at about the time the system wrote those messages to the console, here is what the RAM and swap usage looked like:

You can see that it was never close to the 10 GiB maximum, and yet console.log says "out of swap space".

2022-08-17 was the last time I had to hard-reset the system, and you can see that RAM usage has kept creeping up from then until today. In preparation for the migration, and to protect the system from having this issue again, I rebooted the server, and I am currently backing up the data on the system before performing the update, just in case. (It's going to take a while to back up ~33 TB of data.)

I asked about the smbd issue because that's what the console is telling me, even though, again, the plot shows that I wasn't remotely close to running out of swap space.

Your help is greatly appreciated.

Thank you.

mistermanko · Sep 2, 2022

The fact that your system has to grab swap is showing you that there is a problem. Mainly insufficient RAM. What's your use case? How many active Samba users? How many SMB shares? Are you aware that 16 GB is the bare minimum that TrueNAS requires?

anodos · Sep 2, 2022

mistermanko said:
The fact that your system has to grab swap is showing you that there is a problem. Mainly insufficient RAM. What's your use case? How many active Samba users? How many SMB shares? Are you aware that 16 GB is the bare minimum that TrueNAS requires?

smbd getting reaped by the OOM killer doesn't mean that it's to blame. "services" in the GUI doesn't mean sharing services. For instance, it can include page cache, IIRC. Your best bet is probably to update and see if it's still an issue.

mistermanko · Sep 2, 2022

anodos said:
smbd getting reaped by the OOM killer doesn't mean that it's to blame.

If it correlates with system freezes/crashes, which he said it does, I'd say it is.

anodos said:
"services" in the GUI doesn't mean sharing services.

I never said that.

alpha754293 · Sep 2, 2022

mistermanko said:
The fact that your system has to grab swap is showing you that there is a problem. Mainly insufficient RAM.

What's interesting, though, is that the memory plot shows long stretches where the system DOESN'T need to grab swap, and then swap usage increases relatively suddenly and dramatically.

My understanding was that ZFS releases its cache (the ARC) when other things on the system need the memory.

So far, I haven't been able to complete an audit of the system that fully accounts for everything running under "services".
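One possible first step in such an audit is to split the wired figure into the ZFS ARC and everything else. On FreeBSD (which TrueNAS CORE is built on), ARC memory is wired kernel memory, so it never shows up in any process's RES column in top. The sketch below is hypothetical (the helper names are made up) and assumes the FreeBSD sysctls `vm.stats.vm.v_wire_count` (wired pages), `hw.pagesize`, and `kstat.zfs.misc.arcstats.size` (ARC bytes) are available on the system.

```python
import subprocess
from typing import Dict

GIB = 2 ** 30


def read_sysctl(name: str) -> int:
    """Read a numeric sysctl via sysctl(8); FreeBSD / TrueNAS CORE only."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def wired_breakdown(wire_pages: int, page_size: int, arc_bytes: int) -> Dict[str, float]:
    """Split wired memory into the ZFS ARC and the remainder, in GiB."""
    wired = wire_pages * page_size
    return {
        "wired_gib": wired / GIB,
        "arc_gib": arc_bytes / GIB,
        # Wired memory that is not ARC: other kernel allocations, wired user pages, etc.
        "non_arc_wired_gib": (wired - arc_bytes) / GIB,
    }


def wired_breakdown_from_sysctl() -> Dict[str, float]:
    """Gather the inputs from the live system (assumed sysctl names)."""
    return wired_breakdown(
        read_sysctl("vm.stats.vm.v_wire_count"),      # wired memory, in pages
        read_sysctl("hw.pagesize"),                   # bytes per page
        read_sysctl("kstat.zfs.misc.arcstats.size"),  # current ARC size, in bytes
    )
```

On the server itself, `print(wired_breakdown_from_sysctl())` would give the split. If the non-ARC remainder is large or keeps growing over days, `vmstat -m` and `vmstat -z` list kernel malloc and UMA zone usage on FreeBSD.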

Based on the screenshots shown above, I am not sure why PID 297 (python3.8, i.e. middlewared) is taking up 1.5 GB of RAM.

If I can't figure out what's "eating up" the RAM, then I can't really figure out why the system is grabbing swap.

My thought process is to work backwards through a root-cause analysis: figure out why "services" is consuming 13.3 GiB of RAM, since that may be what forces the system to grab swap.

Fix the RAM usage issue (if one exists), and that should fix the swap grab. (At least, in theory.)

mistermanko said:
What's your use case?

My system is a dumb file server.

There are no VMs, no jails, nothing else running on it.

I have an iSCSI target for my Steam library.

NFS is there so that my Linux systems can connect to the server.

mistermanko said:
How many active Samba users?

One. Me.

mistermanko said:
How many SMB shares?

One.

mistermanko said:
Are you aware that 16 GB is the bare minimum that TrueNAS requires?

No, I wasn't aware of that, but then again, I was just repurposing an old dual Xeon E5310 server to be my TrueNAS server, so it has whatever RAM the system came with when I bought it.

anodos said:
smbd getting reaped by the OOM killer doesn't mean that it's to blame.

What I don't understand is this: if the system is printing error messages to

Code:

/var/log/console.log

while there's supposed to be 10 GiB of swap available, why would it say that it is out of swap when, at peak, it has only used 3.56 GiB of the 10 GiB of swap?

I must be missing something, because using 3.56 GiB out of 10 GiB of available swap should not produce an "out of swap space" error. If the system were at, say, 9.9 GiB used out of 10 GiB, then I could understand why it would print the "out of swap space" error message to

Code:

/var/log/console.log

But that's not the case here, so I don't understand why it would be doing that at only 35.6% swap used (64.4% free).
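A point-in-time reading of the swap numbers the plot shows can be taken with FreeBSD's `swapinfo -k`, which lists each swap device separately. The sketch below is a hypothetical helper that parses that output and computes the percentage used (for example, 3.56 GiB of 10 GiB is 35.6%). It assumes swapinfo's usual five-column layout (Device, 1K-blocks, Used, Avail, Capacity) and that the text has already been captured, e.g. with `subprocess.run(["swapinfo", "-k"], capture_output=True, text=True).stdout`.

```python
from typing import Tuple


def swap_usage(swapinfo_output: str) -> Tuple[int, int, float]:
    """Return (used_kib, total_kib, percent_used) parsed from `swapinfo -k` output."""
    used = total = 0
    for line in swapinfo_output.splitlines()[1:]:  # skip the column header
        fields = line.split()
        # Device rows are: Device 1K-blocks Used Avail Capacity. swapinfo adds a
        # "Total" row when several devices exist; skip it to avoid double counting.
        if len(fields) < 5 or fields[0] == "Total":
            continue
        total += int(fields[1])
        used += int(fields[2])
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent
```

Looking at the individual device rows, not just the total, shows how the used swap is spread across devices at the moment the messages appear.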

anodos said:
"services" in the GUI doesn't mean sharing services.

This is why I am trying to find out what exactly is in "services".

Code:

top

doesn't really seem to provide that answer, as it lists only about 14 processes, which account for only about one-third of the total RAM used by said "services".

anodos said:
For instance, it can include page cache, IIRC.

So, I've implemented this script, which pages any used swap back into RAM, in order to prevent the kernel from crashing:

Script to pagein any used swap to prevent kernel crashes

I tested hot-swap on my server that I'm commissioning today, and the kernel crashed because of this known issue https://forums.freenas.org/index.php?threads/swap-with-9-10.42749/ In response I've written a script: # This script is designed to page in used swap on any device that has swap in...

www.truenas.com
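The linked script's contents are not reproduced here. As a rough, hypothetical sketch of the general idea only: running `swapoff` on a swap device forces the pages stored on it back into RAM, after which `swapon` re-enables the device. The helper names are made up, and the sketch defaults to a dry run that only prints the commands, because `swapoff` fails when there isn't enough free RAM to absorb the swapped-out pages, and encrypted `.eli` swap providers on TrueNAS may need extra handling.

```python
import subprocess
from typing import List


def devices_with_used_swap(swapinfo_output: str) -> List[str]:
    """Return swap devices from `swapinfo -k` output whose Used column is non-zero."""
    devices = []
    for line in swapinfo_output.splitlines()[1:]:  # skip the column header
        fields = line.split()
        # Skip blank lines and the summary "Total" row.
        if len(fields) >= 5 and fields[0] != "Total" and int(fields[2]) > 0:
            devices.append(fields[0])
    return devices


def pagein_commands(devices: List[str]) -> List[List[str]]:
    """Build swapoff/swapon pairs; swapoff pages that device's contents back into RAM."""
    commands = []
    for dev in devices:
        commands.append(["swapoff", dev])
        commands.append(["swapon", dev])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order (stopping at the first failure)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Because `check=True` raises on the first failing command, a `swapoff` that fails for lack of RAM leaves that device active and skips the matching `swapon`.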

anodos said:
Your best bet is probably to update and see if it's still an issue.

Yeah, I'm still in the middle of backing up said ~33 TB of data before I can safely run the upgrade, just in case something goes horribly wrong with said upgrade.

And if that doesn't work, I am getting ready to prep my old Core i7-6700K (which has 64 GB of RAM) to be my new TrueNAS server instead of the dual Xeon, whose processors alone are almost 15 years old by now.

And whilst my Core i7-6700K is an almost-8-year-old processor by now, it's still better than two almost-15-year-old processors.

I'm still puzzled by why this error state appears to be happening in the first place.

Your help is greatly appreciated.

Thank you.
