runaway smbd memory usage upon upgrade to 8.3.0

Status
Not open for further replies.

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Hello All,

I have an HP N40L microserver now with 16GB of RAM & 3 hard disks, 2 x 1TB drives which are mirrored, and 1 x 3TB drive which is on its own.

I have run the system on 8.0.4 without issue, for most of that time with only 8GB of RAM. However, I have wanted some of the features provided in later releases. I previously tried to upgrade to FreeNAS 8.2.0, but when I did, physical memory would fill up, then swap would fill up, and then the machine would die. It was discussed briefly in another thread, but I rolled back to 8.0.4 before we could spend time figuring out the problem.

I recently discovered that I could upgrade my machine to 16GB of RAM, so I did that and decided to give the upgrade another try, this time with 8.3.0.

Unfortunately I am seeing similar behaviour, only this time the system seems to get close to falling over (all physical memory and swap is full) and something then happens to reset memory usage before it rises again. Behold the graphs below for physical memory and swap usage:

physmem.png swap.png

top shows at the time of the graph snapshot:

last pid: 24559; load averages: 0.30, 0.27, 0.28 up 1+11:40:41 22:55:31
33 processes: 2 running, 31 sleeping
CPU: 3.9% user, 0.0% nice, 6.2% system, 0.0% interrupt, 89.8% idle
Mem: 5800M Active, 31M Inact, 1467M Wired, 620K Cache, 172M Buf, 8445M Free
Swap: 6144M Total, 51M Used, 6092M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
21371 root 1 52 0 5743M 5712M select 0 46:08 15.19% smbd
17606 root 2 44 0 54432K 7116K select 1 11:29 0.10% python
2091 root 6 44 0 191M 87700K CPU1 1 10:49 0.00% python
2295 root 7 44 0 70304K 6672K ucond 1 1:11 0.00% collectd
17230 root 1 44 0 37696K 2940K select 0 0:08 0.00% afpd
1836 root 1 44 0 11672K 1880K select 1 0:06 0.00% ntpd
2774 root 1 76 0 90868K 16748K ttyin 1 0:02 0.00% python
23810 www 1 44 0 14400K 2568K kqread 1 0:02 0.00% nginx
2046 root 1 44 0 40340K 3904K select 1 0:01 0.00% nmbd
2206 avahi 1 44 0 16804K 2104K select 1 0:01 0.00% avahi-daem
2217 root 2 76 0 25208K 2784K select 0 0:01 0.00% afpd
1622 root 1 44 0 6784K 1044K select 1 0:01 0.00% syslogd
2048 root 1 44 0 47920K 5496K select 1 0:01 0.00% smbd
2415 root 1 76 0 7840K 492K nanslp 1 0:01 0.00% cron
17231 root 1 44 0 32416K 2992K select 1 0:00 0.00% cnid_dbd
2643 root 1 44 0 7844K 1008K select 0 0:00 0.00% rpcbind
2054 root 1 44 0 13292K 732K nanslp 1 0:00 0.00% smartd

If I were to sample again later, you would see smbd's memory usage continue to climb.
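
For anyone who wants to watch for this automatically, here is a minimal sketch (not part of FreeNAS) that picks out processes with a large resident size from top output like the above. It assumes the 12-column layout shown in this post and that the K/M/G suffixes are 1024-based; the function names and the 1 GiB limit are made up for illustration.

```python
import re

# Multipliers for the size suffixes seen in the SIZE and RES columns
# (assumed to be 1024-based).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(field):
    """Convert a top size field such as '5712M' or '7116K' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", field)
    if not match:
        raise ValueError(f"unrecognised size field: {field!r}")
    return int(match.group(1)) * UNITS[match.group(2)]

def big_processes(top_lines, limit_bytes):
    """Return (pid, command, resident_bytes) for rows whose RES exceeds the limit."""
    found = []
    for line in top_lines:
        cols = line.split()
        # A process row has 12 columns and starts with a numeric PID;
        # the header row and the summary lines are skipped.
        if len(cols) != 12 or not cols[0].isdigit():
            continue
        resident = to_bytes(cols[6])
        if resident > limit_bytes:
            found.append((int(cols[0]), cols[11], resident))
    return found

sample = [
    "PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND",
    "21371 root 1 52 0 5743M 5712M select 0 46:08 15.19% smbd",
    "17606 root 2 44 0 54432K 7116K select 1 11:29 0.10% python",
    "2048 root 1 44 0 47920K 5496K select 1 0:01 0.00% smbd",
]

for pid, command, resident in big_processes(sample, limit_bytes=1024**3):
    print(f"pid {pid} ({command}) resident: {resident / 1024**2:.0f} MiB")
```

With the sample rows above, only the smbd child with pid 21371 is reported; the parent smbd (pid 2048) stays small.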

The system doesn't do anything clever; it just serves media around the house for me and my family, who are away (hence my having time to try the upgrade).

While it's good that the system doesn't fall over completely, I need to get this fixed or I will have to roll back to 8.0.4 and restore my old config.

I have not upgraded my ZFS pools to the new version, and I am not using autotune, dedup, or anything clever (as far as I know). The configuration is pretty much untouched since the upgrade, apart from a minor tweak discussed in another thread to restore 2GB of swap that wasn't being picked up.

Anyone have any ideas please?

Thanks
Erik
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
So the kernel log confirms that smbd exhausts all memory, then swap, and the system then kills it, after which the usage starts to climb again.

+swap_pager_getswapspace(2): failed
+swap_pager_getswapspace(2): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(2): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(2): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(4): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(16): failed
+swap_pager_getswapspace(1): failed
+pid 13117 (smbd), uid 1001, was killed: out of swap space
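
A small sketch for summarising log lines like these, in case it helps anyone comparing notes. The regular expression is written against the exact wording of the kill message above; other FreeBSD versions may word it differently, and the function name is just for illustration.

```python
import re

# Pattern for the kill message exactly as it appears in the excerpt above.
KILL_RE = re.compile(r"pid (\d+) \((\S+)\), uid (\d+), was killed: out of swap space")

def summarise(log_lines):
    """Count swap allocation failures and collect (pid, name, uid) of killed processes."""
    failures = 0
    killed = []
    for line in log_lines:
        if "swap_pager_getswapspace" in line and line.rstrip().endswith("failed"):
            failures += 1
        match = KILL_RE.search(line)
        if match:
            killed.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return failures, killed

sample = [
    "+swap_pager_getswapspace(16): failed",
    "+swap_pager_getswapspace(1): failed",
    "+pid 13117 (smbd), uid 1001, was killed: out of swap space",
]

failures, killed = summarise(sample)
print(f"{failures} swap allocation failures")
for pid, name, uid in killed:
    print(f"killed: pid {pid} ({name}), uid {uid}")
```

The uid in the kill message is the share user rather than root, which suggests it is the per-connection smbd child that grows, not the parent daemon.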

I can't find anything else about this on the forums or by googling the web at large. I cannot believe this is just me; my setup is quite unremarkable.
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
Not seeing smbd overusing RAM here, on an N36L with 5gig or another with just 1gig. (I believe the N40L is the same beast except for a slightly higher speed CPU)

That implies that it's happening due to a config setting (possibly tickling a bug). Could you post your share setups?
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Hi Jaimie,

Thanks for your reply. Yes, the system worked perfectly for almost a year with 8.0.3/8.0.4, so I agree it is likely some part of config. I haven't changed the configuration at all since the upgrade apart from the missing swap drive fix mentioned above.

My 2 x 1TB drives are mirrored. My 1 x 3TB drive stands alone.

I have one username which has full access to everything on both disks and this is used by all devices on my LAN which access the NAS.

I have enabled AFP and CIFS for both shares. Other than that the only services switched on are SMART and SSH.

I hope this answers your question, apologies if you were expecting the output of some command. You may have to spell it out for me!

Thanks
Erik
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
That all sounds pretty minimal and unlikely to cause issues. I have very similar here.

What do you have in the CIFS share setup panel? The one you get when clicking on a share under Sharing/Windows Shares. Mine are
Name (the name)
Comment (empty)
Path (the path)
Export RO (no)
Browseable (yes)
Inherit owner (no)
Inherit perms (no)
Export bin (no)
Show hidden (no)
Allow guest (no)
Only allow guest (no)
Then in Advanced Mode,
Hosts allow (empty)
Hosts deny (empty)
Auxiliary parameters - you'll like this, since you have mixed AFP and Windows networking: it hides all those .Apple* files and folders from the Windows machines.
veto files = /Temporary Items/.DS_Store/.AppleDB/.TemporaryItems/.AppleDouble/.bin/.AppleDesktop/Network Trash Folder/.Spotlight/.Trashes/.fseventd/
delete veto files = yes
hide dot files = yes
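
To illustrate how a veto files value like the one above breaks down: the entries are separated by '/', which is what lets names with spaces be listed, and '*' and '?' can be used as wildcards. The sketch below is only an approximation of Samba's matching using Python's fnmatch. The case-insensitive comparison is an assumption, and the function names are made up.

```python
from fnmatch import fnmatchcase

def parse_veto(value):
    """Split a 'veto files' value into its entries; '/' is the separator."""
    return [entry for entry in value.split("/") if entry]

def is_vetoed(filename, entries):
    """Approximate check: case-insensitive wildcard match against each entry."""
    name = filename.lower()
    return any(fnmatchcase(name, entry.lower()) for entry in entries)

veto = ("/Temporary Items/.DS_Store/.AppleDB/.TemporaryItems/.AppleDouble/.bin/"
        ".AppleDesktop/Network Trash Folder/.Spotlight/.Trashes/.fseventd/")

entries = parse_veto(veto)
for name in (".DS_Store", "Network Trash Folder", "movie.mkv"):
    print(name, "->", "hidden" if is_vetoed(name, entries) else "visible")
```

As far as I understand it, each entry is compared against the whole file or directory name, so an entry without a wildcard only hides names that match it exactly.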
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Are you sharing the same files through AFP and CIFS? That's not recommended as it can result in file corruption. It could be that Samba is getting stuck in some kind of feedback loop when AFP is also being used. I assume if you turn off CIFS when the system is using swap space it will suddenly go away? What happens if you turn off AFP when the system is using swap space?
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Hi Jaimie,

Thanks for the reply. For the two mirrored 1TB drives it looks like:

Name: share
Comment: empty
Path: /mnt/share
Export Read Only: unticked
Browseable to Network Clients: ticked
Inherit Owner: unticked
Inherit Permissions: unticked
Export Recycle Bin: unticked
Show Hidden Files: unticked
Allow Guest Access: unticked
Only Allow Guest Access: unticked
Hosts Allow: empty
Hosts Deny: empty
Auxiliary Parameters: was empty, but I have added your useful parameters.

For the 3TB standalone drive the parameters are as above, except for:

Name: 3TB_Share
Path: /mnt/NASVOL3TB

So it looks like we are very similar, so why am I being picked on ;(

Thanks
Erik
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Are you sharing the same files through AFP and CIFS? That's not recommended as it can result in file corruption. It could be that Samba is getting stuck in some kind of feedback loop when AFP is also being used. I assume if you turn off CIFS when the system is using swap space it will suddenly go away? What happens if you turn off AFP when the system is using swapspace?

I'd say your setup is actually pretty "remarkable" if you are sharing the same files with both AFP and CIFS.

Hi noobsauce,

Thanks for the reply.

Yes, I am sharing the same files through AFP and CIFS. I didn't realise this was a bad idea; I did it on 3 generations of Buffalo LinkStation, which the readynas box has replaced, and as I said it has worked fine for almost a year under 8.0.x. I have had another look through the wiki and see that the reason for the non-recommendation is to do with file locking. In my setup I only need to be able to write from AFP, so I have now set the flag to make the CIFS shares read-only. I hope that is sufficient to protect me from the corruption you mention.

I have tried turning off AFP, but unfortunately memory usage continues to increase. We've had two more cycles of maxing out memory and smbd being restarted already today. Here is top with no AFP running:

last pid: 33939; load averages: 0.24, 0.33, 0.32 up 2+01:19:52 12:34:42
30 processes: 1 running, 29 sleeping
CPU: 4.7% user, 0.0% nice, 9.0% system, 0.0% interrupt, 86.3% idle
Mem: 12G Active, 33M Inact, 1571M Wired, 452K Cache, 172M Buf, 1741M Free
Swap: 6144M Total, 43M Used, 6101M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
27925 root 1 55 0 12335M 12316M select 1 98:09 20.17% smbd
17606 root 2 44 0 54432K 7664K select 1 11:46 0.20% python
2091 root 6 44 0 193M 91516K uwait 0 12:18 0.10% python
2295 root 7 44 0 70304K 6656K ucond 1 1:40 0.00% collectd
1836 root 1 44 0 11672K 1880K select 1 0:08 0.00% ntpd
2774 root 1 76 0 90868K 16748K ttyin 1 0:02 0.00% python
2046 root 1 44 0 40340K 3904K select 0 0:02 0.00% nmbd
1622 root 1 44 0 6784K 1044K select 0 0:01 0.00% syslogd
2415 root 1 76 0 7840K 492K nanslp 0 0:01 0.00% cron
2048 root 1 44 0 47920K 5496K select 1 0:01 0.00% smbd
17231 root 1 44 0 32416K 3360K select 1 0:00 0.00% cnid_dbd
31483 www 1 44 0 14400K 2580K kqread 0 0:00 0.00% nginx
2643 root 1 44 0 7844K 1008K select 1 0:00 0.00% rpcbind
2054 root 1 44 0 13292K 1000K nanslp 1 0:00 0.00% smartd
2153 root 1 44 0 14400K 1532K pause 0 0:00 0.00% nginx
2052 root 1 44 0 47920K 5008K select 1 0:00 0.00% smbd
33939 root 1 44 0 9240K 1792K CPU1 0 0:00 0.00% top

Hope this is helpful. If I don't crack this by tomorrow night I will have to roll back to 8.0.4 again :(

Really hoping this gives you, or someone else who knows this better than me, a clue!

To answer your other question: if I turn CIFS off, even if I leave AFP on, memory usage comes right back down again:

last pid: 34442; load averages: 0.35, 0.37, 0.33 up 2+01:23:37 12:38:27
29 processes: 1 running, 28 sleeping
CPU: 2.1% user, 0.0% nice, 0.0% system, 0.2% interrupt, 97.7% idle
Mem: 126M Active, 23M Inact, 1534M Wired, 416K Cache, 172M Buf, 14G Free
Swap: 6144M Total, 38M Used, 6105M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
33942 root 2 44 0 54432K 13516K select 1 0:03 0.10% python
2091 root 6 44 0 193M 91580K uwait 0 12:21 0.00% python
2295 root 7 44 0 70304K 6644K ucond 1 1:40 0.00% collectd
1836 root 1 44 0 11672K 1880K select 1 0:08 0.00% ntpd
2774 root 1 76 0 90868K 16748K ttyin 1 0:02 0.00% python
2046 root 1 44 0 40340K 3904K select 1 0:02 0.00% nmbd
1622 root 1 44 0 6784K 1044K select 1 0:01 0.00% syslogd
34187 root 1 76 0 25208K 3692K select 1 0:01 0.00% afpd
2415 root 1 76 0 7840K 492K nanslp 1 0:01 0.00% cron
17231 root 1 44 0 32416K 3360K select 1 0:00 0.00% cnid_dbd
31483 www 1 44 0 14400K 2580K kqread 1 0:00 0.00% nginx
2643 root 1 44 0 7844K 1008K select 0 0:00 0.00% rpcbind
2054 root 1 44 0 13292K 1000K nanslp 1 0:00 0.00% smartd
2153 root 1 44 0 14400K 1532K pause 0 0:00 0.00% nginx
34436 root 1 44 0 9240K 1796K CPU0 0 0:00 0.00% top
34402 avahi 1 46 0 16804K 2460K select 0 0:00 0.00% avahi-daem
34435 root 1 44 0 7024K 2424K wait 0 0:00 0.00% bash

Thanks
Erik
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
If it helps, this is the entire contents of the log.smbd file. It looks like the logs were emptied when I did the upgrade:
[2013/01/04 11:16:01, 0] smbd/server.c:1053(main)
smbd version 3.6.7 started.
Copyright Andrew Tridgell and the Samba Team 1992-2011
[2013/01/04 11:16:01.606617, 1] smbd/files.c:218(file_init)
file_init: Information only: requested 16384 open files, 11075 are available.
[2013/01/05 04:00:09.598884, 1] smbd/service.c:1114(make_connection_snum)
10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 13117)
[2013/01/05 17:54:07.397857, 1] smbd/server.c:319(remove_child_pid)
Scheduled cleanup of brl and lock database after unclean shutdown
[2013/01/05 17:54:07.534801, 1] smbd/service.c:1114(make_connection_snum)
10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 21370)
[2013/01/05 17:54:08.811112, 1] smbd/service.c:1378(close_cnum)
10.0.10.2 (10.0.10.2) closed connection to service share
[2013/01/05 17:54:09.588775, 1] smbd/service.c:1114(make_connection_snum)
10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 21371)
[2013/01/05 17:54:27.400563, 1] smbd/server.c:272(cleanup_timeout_fn)
Cleaning up brl and lock database after unclean shutdown
[2013/01/06 04:00:22.939791, 1] smbd/service.c:1378(close_cnum)
10.0.10.2 (10.0.10.2) closed connection to service share
[2013/01/06 04:00:23.704075, 1] smbd/service.c:1114(make_connection_snum)
10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 27925)
[2013/01/06 12:37:49.409255, 1] smbd/service.c:1378(close_cnum)
10.0.10.2 (10.0.10.2) closed connection to service share
[2013/01/06 12:39:48, 0] smbd/server.c:1053(main)
smbd version 3.6.7 started.
Copyright Andrew Tridgell and the Samba Team 1992-2011
[2013/01/06 12:39:48.236977, 1] smbd/files.c:218(file_init)
file_init: Information only: requested 16384 open files, 11075 are available.
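
A rough way to put a number on the growth is to combine the connect times in this log with the RES values from the top samples earlier in the thread. This is only a sketch: it assumes the connect entry marks the start of the smbd child, and the dates of the top samples are inferred from the uptime shown in top (top itself prints only the time of day). The function names are made up.

```python
import re
from datetime import datetime

HEADER_RE = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
CONNECT_RE = re.compile(r"connect to service (\S+) initially as user (\S+) .*\(pid (\d+)\)")

def connections(log_lines):
    """Return {pid: connect_time} from the header/message line pairs in log.smbd."""
    result = {}
    stamp = None
    for line in log_lines:
        header = HEADER_RE.match(line)
        if header:
            # Remember the timestamp; the message follows on the next line.
            stamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S")
            continue
        match = CONNECT_RE.search(line)
        if match and stamp is not None:
            result[int(match.group(3))] = stamp
    return result

def growth_mib_per_hour(connected, sampled, resident_mib):
    """Average growth, treating resident memory at connect time as negligible."""
    hours = (sampled - connected).total_seconds() / 3600
    return resident_mib / hours

log = [
    "[2013/01/05 17:54:09.588775, 1] smbd/service.c:1114(make_connection_snum)",
    "10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 21371)",
    "[2013/01/06 04:00:22.939791, 1] smbd/service.c:1378(close_cnum)",
    "10.0.10.2 (10.0.10.2) closed connection to service share",
    "[2013/01/06 04:00:23.704075, 1] smbd/service.c:1114(make_connection_snum)",
    "10.0.10.2 (10.0.10.2) connect to service share initially as user erik (uid=1001, gid=0) (pid 27925)",
]

# (pid, inferred sample time, RES in MiB) taken from the two top outputs in this thread.
top_samples = [
    (21371, datetime(2013, 1, 5, 22, 55, 31), 5712),
    (27925, datetime(2013, 1, 6, 12, 34, 42), 12316),
]

started = connections(log)
for pid, sampled, resident in top_samples:
    rate = growth_mib_per_hour(started[pid], sampled, resident)
    print(f"pid {pid}: roughly {rate:.0f} MiB per hour")
```

Both samples work out to somewhat over 1 GiB per hour, which fits with 16GB of RAM plus 6GB of swap being used up in well under a day.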
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
In my setup I only need to be able to write from AFP, so I have now set the flag to make the CIFS shares read-only. I hope that is sufficient to protect me from the corruption you mention.

It will, yes - though occasionally you may write files over AFP that don't show up in a Windows browser without a refresh. I have been sharing single locations both AFP and CIFS always, and have never had any issues.

Can't help with the memory problem though, all this setup and the logs you've posted seem fine. Given your setup seems pleasantly simple, it may be worth running off a new 8.3.0 stick and reconfiguring from scratch - should only take ten minutes, and if it fixes the problem then that's a win.

Perhaps saving the config first and including it while logging this as a bug would be handy.
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Hi Jaimie,

Thanks again for your reply. There don't seem to be any other suggestions and I'm out of time so I want to try your ideas, but I'm afraid I have some newbie questions if you wouldn't mind.

1. I am happy to post the bug separately, including config files from both 8.0.4 before the upgrade and 8.3.0 after upgrading the earlier install. However, looking through the files, I am a bit concerned that they may contain recoverable passwords or other security information. Do you have any suggestions for editing the file without rendering it useless for addressing the bug? Am I being paranoid?

2. I am happy to do the new install and redo the config, importing the old volumes, but I am a little nervous about screwing up my data. I have a fairly recent backup, but backing up that much data takes more time than I have. I looked at the pages at http://doc.freenas.org/index.php/Volumes, but these seem to be incomplete and the illustrations referred to in the text seem to be missing. Is Storage → Volumes → Auto Import Volume fairly easy to use? Is there anything I should look out for to avoid trashing my data?

3. Do you think it's worth me trying the "factory restore" and then reconfiguring before doing a fresh install?

Thanks
Erik
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
1. It will include the passwords you have set for users and shares, but you can save config, change passwords, save config to send, reload first config.

2. Auto Import does indeed Just Work. You should update the backup first though - set up an rsync task and it'll only update the differences, not the whole lot.

3. Should be the same thing - factory restore wipes all the user settings and takes you back to a newly-installed stick. I'd burn a new stick though, since you never know - it might even be a problem with the actual USB stick having a slightly corrupted install.
 

Erik Carlson

Dabbler
Joined
Aug 2, 2012
Messages
19
Thanks Jaimie,

1. I can't go back and clean up the passwords in the original 8.0.4 config because I don't have a working 8.0.4 system any more; I only have the backup I took before the upgrade. I will clean the passwords in the 8.3.0 config as you suggest.

2 & 3. Will do; I have just picked up a new memory stick.

Thanks for your help! I will post whether the new install is successful.

Erik
 
Joined
Mar 14, 2013
Messages
6
Hi All
Not sure if I should have just started a new thread as this one died without resolution.
Everything works like a champ with 4 Mobotix cams writing to my FreeNAS, except for the climbing memory usage. It's a new system with 8.3.0. Any thoughts would be greatly appreciated.
Memory.JPG
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Start a new thread and provide detailed information about your hardware and how you've configured your disk storage.

 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Also, the climbing memory usage is pretty normal: ZFS will max out your RAM over time and handle the reallocation. The weird thing about the OP's system was the resulting crash and the use of swap.
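
One way to tell the two cases apart is to compare the "Mem:" lines of successive top outputs. On FreeBSD the ZFS cache is, as far as I know, counted under Wired, while a growing userland process such as smbd shows up under Active. Here is a minimal sketch using the two Mem lines the OP posted; the function name is made up.

```python
import re

# Factors to convert each suffix to MiB (assumed 1024-based).
TO_MIB = {"K": 1 / 1024, "M": 1, "G": 1024}

def parse_mem(line):
    """Parse a top 'Mem:' line into {category: MiB}."""
    pairs = re.findall(r"(\d+)([KMG]) (\w+)", line)
    return {name: int(number) * TO_MIB[unit] for number, unit, name in pairs}

first = parse_mem("Mem: 5800M Active, 31M Inact, 1467M Wired, 620K Cache, 172M Buf, 8445M Free")
later = parse_mem("Mem: 12G Active, 33M Inact, 1571M Wired, 452K Cache, 172M Buf, 1741M Free")

for name in ("Active", "Wired"):
    print(f"{name}: {first[name]:.0f} MiB -> {later[name]:.0f} MiB")
```

In the OP's numbers, Wired barely moves while Active more than doubles, which points at a process rather than the ZFS cache.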
 