Samba using up most of the RAM

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I think I'm experiencing the same problem; so far I have tested on 11.1, 11.1-U1, and 11.1-U2.

FreeNAS is running as a file server for a small company with Active Directory. Zeroconf is unticked. Every new SMB connection initially uses about 150 MB of RAM, but after some activity it grows to ~2 GB per user.
View attachment 23051
View attachment 23052

The memory is freed again once the connection is closed, but by the end of each day only 20 GB of the 50 GB of RAM are left for the ARC.

View attachment 23053

Can you please send me a private message with a debug file. System -> Advanced -> Save Debug.
 

BaT

Explorer
Joined
Jun 16, 2017
Messages
62
Can you please send me a private message with a debug file. System -> Advanced -> Save Debug.

Or better, open (yet another) ticket in the bug tracking system with the debug log attached.

We already have numerous reports attached to https://redmine.ixsystems.com/issues/28585, but we are still unable to reproduce this memory leak in a test environment. So all details regarding the configuration, and in particular the activities performed by the users for whom smbd consumes a lot of memory, as well as their directory structure, are of great help to us.

If you can reliably reproduce the process growth in your environment, it would be feasible to run an audit module and analyze the exact operations that were performed. We can also supply a memory-debug build of Samba that can track down the offending parts of the code (though that comes with performance penalties, of course).
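For context, the "audit module" mentioned here would typically be Samba's vfs_full_audit, which logs each VFS operation via syslog. A minimal sketch of what enabling it on one share might look like — the operation list and log facility below are illustrative choices, not the exact settings iXsystems would ask anyone to run:

```
[somешare]
    vfs objects = full_audit
    full_audit:prefix = %u|%I|%S
    full_audit:success = open opendir mkdir unlink rename
    full_audit:failure = none
    full_audit:facility = LOCAL5
    full_audit:priority = NOTICE
```

Logging every successful open/opendir on a busy share generates a lot of syslog traffic, so this is something to enable only while reproducing the leak.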
 

m7ed

Cadet
Joined
Feb 26, 2018
Messages
2
Thanks for your quick response. I tried to submit a bug report in the FreeNAS UI, but I only got an "Invalid proxy server response" error. Guess I am doing something wrong, so I have sent the debug file to anodos via PM.

Most users work with AutoCAD (usually with many referenced files), Photoshop, Office, and Outlook (a PST archive file is stored on FreeNAS). Furthermore, Folder Redirection is enabled for Desktop and My Documents. SBS 2011 is used as the domain controller.

I can reproduce the growth of the smbd process by opening the properties of a large folder (170 GB, 135k files, 10k subfolders) in Windows 10 Explorer. While it counts the number of files, the memory usage of the smbd process slowly rises, about 20 MB per minute. I can repeat this over and over, even with the same folder, and the smbd process keeps growing.

Your suggestion to use an audit module/debug version of Samba sounds promising. Please let me know if I can be of any help.
 

Morpheus187

Explorer
Joined
Mar 11, 2016
Messages
61
I've got a hint from anodos to do the following:

Services->SMB:
1) Uncheck Unix Extensions
2) Add auxiliary parameter wide links = yes

For me this change helped: smbd is no longer using ridiculous amounts of memory and stays at around 163 MB, even after 30 minutes of heavy file transfers.
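For anyone checking whether the workaround actually took effect, those two GUI changes should show up as two lines in the [global] section of the generated /usr/local/etc/smb4.conf (a sketch; the file is regenerated by the FreeNAS middleware, so make the change through the GUI rather than editing the file by hand):

```
[global]
    unix extensions = no
    wide links = yes
```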
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
Hi anodos
when will the next update be available (11.1-U5 is planned for around 30.4.2018)? Is there a workaround/fix available in the meantime, similar to pfSense's x.y.z-p1 patch releases?
Cheers
Michael
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Hi anodos
when will the next update be available (11.1-U5 is planned for around 30.4.2018)? Is there a workaround/fix available in the meantime, similar to pfSense's x.y.z-p1 patch releases?
Cheers
Michael

The workaround is to disable Unix Extensions under "Services" -> "SMB", then set the auxiliary parameter "wide links = yes" under "Services" -> "SMB".

11.1U5 will contain the fix.
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
Hi anodos
Darn! We have already had these settings in place (and rebooted) for about 10 days, but to no avail: the memory leak is still there (as are the crashes). Should I open a new Redmine ticket for this case?
Cheers
Michael
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Also, if you don't mind, use top to find an smbd process that is exhibiting the memory leak, then target it with the following dtrace script:
Code:
#!/usr/sbin/dtrace -s

pid$1:::entry
{
    @hist[probefunc, probemod] = count();
}

You can do this as follows:
1) Open /tmp/dtrace.dt in a text editor, e.g. nano /tmp/dtrace.dt (or whatever editor you prefer).
2) Paste the above lines into the editor and save.
3) chmod +x /tmp/dtrace.dt
4) Run /tmp/dtrace.dt <pid of smbd process> -o /tmp/pidhist.txt, e.g. /tmp/dtrace.dt 8765 -o /tmp/pidhist.txt
5) Let it run for about 3-5 seconds while the process is actively leaking, then press Ctrl+C to stop the script.

Once you've done that, download the output file and upload it here. This will give me a list of the internal Samba functions being called on your system. But give me the smb4.conf file first :)
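If you want to eyeball the result before uploading it, the aggregation output has one "function module count" row per probe, so sorting by the last column shows the hottest functions first. A small sketch, using a fabricated sample file in place of the real /tmp/pidhist.txt (the function names below are just stand-ins):

```shell
#!/bin/sh
# Fabricated stand-in for /tmp/pidhist.txt; the real file has one row
# per function/module pair with the call count in the last column.
cat > /tmp/pidhist_sample.txt <<'EOF'
vfswrap_getcwd smbd 4812
talloc_named_const smbd 977
strlcpy libc.so.7 310
EOF

# Most frequently called functions first; the top entries are the ones
# worth calling out in a bug report.
sort -k3,3 -rn /tmp/pidhist_sample.txt | head -3
```

With the sample data above, the first line printed is the vfswrap_getcwd row, since it has the highest count.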
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
Hi anodos
just seen that you continued:

[global]
interfaces = 127.0.0.1 192.168.1.2 192.168.80.2
bind interfaces only = yes
encrypt passwords = yes
dns proxy = no
strict locking = no
oplocks = yes
deadtime = 15
max log size = 51200
max open files = 942599
logging = file
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
getwd cache = yes
guest account = nobody
map to guest = Bad User
obey pam restrictions = yes
ntlm auth = no
directory name cache size = 0
kernel change notify = no
panic action = /usr/local/libexec/samba/samba-backtrace
nsupdate command = /usr/local/bin/samba-nsupdate -g
server string = AEV Filer
ea support = yes
store dos attributes = yes
lm announce = yes
unix extensions = no
acl allow execute always = true
dos filemode = yes
multicast dns register = no
domain logons = no
idmap config *: backend = tdb
idmap config *: range = 90000001-100000000
server role = member server
workgroup = DEVNET
realm = DEVNET.AEV
security = ADS
client use spnego = yes
local master = no
domain master = no
preferred master = no
ads dns update = yes
winbind cache time = 7200
winbind offline logon = yes
winbind enum users = yes
winbind enum groups = yes
winbind nested groups = yes
winbind use default domain = yes
winbind refresh tickets = yes
idmap config DEVNET: backend = rid
idmap config DEVNET: range = 20000-90000000
allow trusted domains = no
client ldap sasl wrapping = plain
template shell = /bin/sh
template homedir = /home/%D/%U
netbios name = DEVNETNAS
netbios aliases = FILER
create mask = 0666
directory mask = 0777
client ntlmv2 auth = yes
dos charset = CP1250
unix charset = UTF-8
log level = 2
wide links = yes


[IT$]
path = "/mnt/SAN2LV0_500TB"
comment = Replikationsshare SAN2LV0
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = no
access based share enum = no
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-12m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
hide dot files = no
hosts allow = 192.168.1.0/24
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[LV0$]
path = "/mnt/SANLV0_500TB"
comment = Admin SAN1 LUN0 Share
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = no
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
vfs objects = zfs_space zfsacl streams_xattr recycle
hide dot files = no
hosts allow = 192.168.1.0/24
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[checkpointdata]
path = "/mnt/SANLV0_500TB/DataSet3_5TiB/aev21"
comment = CheckPointData
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[checkpointdaten]
path = "/mnt/SANLV0_500TB/DataSet3_5TiB/aev21"
comment = CheckPointDaten
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[datasets]
path = "/mnt/SANLV0_500TB/DataSet2_50TiB/aev21"
comment = Datasets
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[datensets]
path = "/mnt/SANLV0_500TB/DataSet2_50TiB/aev21"
comment = Datensets
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[exchange]
path = "/mnt/SANLV0_500TB/DataSet5_Exchange"
comment = Exchange
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = no
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
vfs objects = zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[rawdata]
path = "/mnt/SANLV0_500TB/DataSet1_200TiB/aev21"
comment = Rawdata
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[rohdaten]
path = "/mnt/SANLV0_500TB/DataSet1_200TiB/aev21"
comment = Rohdaten
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare


[smartcar]
path = "/mnt/SANLV0_500TB/DataSet4_80TiB"
comment = smartcar Poppinga
printable = no
veto files = /.snapshot/.windows/.mac/.zfs/
writeable = yes
browseable = yes
access based share enum = yes
recycle:repository = .recycle/%U
recycle:keeptree = yes
recycle:versions = yes
recycle:touch = yes
recycle:directory_mode = 0777
recycle:subdir_mode = 0700
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:localtime = yes
shadow:format = auto-%Y%m%d.%H%M-6m
shadow:snapdirseverywhere = yes
vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr recycle
hide dot files = no
guest ok = no
nfs4:mode = special
nfs4:acedup = merge
nfs4:chown = true
zfsacl:acesort = dontcare
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Hi anodos
just seen that you continued:

[full smb4.conf quoted above; snipped]
Also post output of ps aux | grep smbd

I haven't had coffee yet so these are the first things to come to mind.

What is the indication that you have a memory leak? vfswrap_getcwd() definitely leaks memory, and there may be some usage patterns that tickle the bug more than others. In my test environment with those parameter changes, that function was still being called occasionally, but not thousands of times as with the defaults.
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
Here's the trace.
Currently only a slow test transfer is running in the background, as my only main user (with rsync transfers of files in the TB range) has already jumped the gun over the stability issues and switched back to the Windows AD storage.
 

Attachments

  • pidhist.txt
    93.4 KB · Views: 394

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
xavier.ricou 92548 57.0 0.7 280608 247408 - R 13:52 8:38.20 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
root 7257 0.0 0.4 190412 147768 - S 14:07 0:00.04 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
root 83049 0.0 0.4 175140 145020 - Ss 23:00 0:03.86 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
root 83065 0.0 0.3 130608 102408 - S 23:00 1:29.90 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
root 83066 0.0 0.3 132160 103960 - S 23:00 0:00.11 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
root 7259 0.0 0.0 6696 2596 0 S+ 14:07 0:00.00 grep smbd
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
[ps output quoted above; snipped]
Post output of top
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
last pid: 24721; load averages: 0.48, 0.69, 0.74 up 1+05:03:14 14:27:02
55 processes: 2 running, 53 sleeping
CPU: 16.1% user, 0.0% nice, 4.0% system, 0.2% interrupt, 79.7% idle
Mem: 273M Active, 425M Inact, 171M Laundry, 19G Wired, 11G Free
ARC: 16G Total, 1100M MFU, 14G MRU, 12M Anon, 106M Header, 785M Other
15G Compressed, 31G Uncompressed, 2.12:1 Ratio
Swap: 34G Total, 301M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
24662 root 1 78 0 22508K 18376K CPU1 1 0:02 93.43% python3.6
92548 root 1 52 0 312M 280M select 4 19:26 63.26% smbd
83065 root 1 20 0 128M 100M select 0 1:31 0.13% smbd
10981 root 18 29 0 46356K 14820K uwait 3 2:44 0.13% consul
228 root 22 20 0 200M 112M kqread 6 11:08 0.05% python3.6
10364 root 2 20 0 14392K 3888K select 0 0:38 0.03% vmtoolsd
96617 root 1 20 0 13776K 5928K select 3 0:01 0.02% sshd
83061 root 1 20 0 47976K 21428K select 4 0:08 0.00% winbindd
10950 root 1 20 0 147M 15728K kqread 6 0:08 0.00% uwsgi
10149 root 1 20 0 10336K 10436K select 3 0:03 0.00% ntpd
83094 root 1 20 0 93800K 65332K select 0 5:01 0.00% winbindd
11079 root 12 20 0 99712K 12224K nanslp 7 2:28 0.00% collectd
10938 root 1 22 0 101M 70672K select 6 1:08 0.00% python3.6
11965 root 15 20 0 226M 96484K umtxn 2 0:13 0.00% uwsgi
9721 root 2 20 0 23460K 5212K kqread 2 0:10 0.00% syslog-ng
10974 root 19 20 0 52440K 7536K uwait 0 0:07 0.00% consul-alerts
9522 root 1 20 0 9176K 1044K select 4 0:06 0.00% devd
83049 root 1 20 0 171M 142M select 4 0:05 0.00% smbd
10530 root 1 52 0 12908K 4940K select 3 0:03 0.00% sshd
12116 root 1 20 0 72708K 0K wait 5 0:02 0.00% <python3.6>
12191 root 17 20 0 33936K 7664K uwait 5 0:02 0.00% consul
11877 root 1 20 0 9004K 2888K select 6 0:01 0.00% zfsd
83044 root 1 20 0 37096K 15964K select 3 0:01 0.00% nmbd
96619 root 1 20 0 9792K 3892K pause 0 0:01 0.00% csh
83073 root 1 20 0 50864K 22220K select 1 0:00 0.00% winbindd
92962 root 1 32 0 7096K 1132K wait 4 0:00 0.00% sh
11891 root 1 20 0 6760K 836K nanslp 2 0:00 0.00% cron
9850 root 1 -52 r0 3520K 3584K nanslp 6 0:00 0.00% watchdogd
37638 www 1 20 0 31452K 6640K kqread 5 0:00 0.00% nginx
12190 root 9 25 0 32528K 6300K uwait 4 0:00 0.00% consul
83066 root 1 20 0 129M 102M select 3 0:00 0.00% smbd
24660 root 1 21 0 186M 145M select 2 0:00 0.00% smbd
83067 root 1 20 0 46004K 19092K select 5 0:00 0.00% winbindd
83072 root 1 20 0 46896K 19744K select 1 0:00 0.00% winbindd
33906 root 1 20 0 9792K 2072K ttyin 0 0:00 0.00% csh
10891 nobody 1 20 0 7148K 2432K select 1 0:00 0.00% mdnsd
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
P.S.: when I first started dtrace on the "memory hog" smbd (PID 83089), dtrace stopped instantly and smbd collapsed with it. Two hours ago this is what top reported (while smbd was still quite tame):
root@DEVNETNAS:~ # top
last pid: 77421; load averages: 0.93, 0.87, 0.91 up 1+04:11:32 13:35:20
52 processes: 2 running, 50 sleeping
CPU: 4.0% user, 0.0% nice, 3.5% system, 1.4% interrupt, 91.1% idle
Mem: 2794M Active, 14G Inact, 596M Laundry, 13G Wired, 420M Free
ARC: 10G Total, 1268M MFU, 7954M MRU, 104M Anon, 90M Header, 776M Other
8756M Compressed, 23G Uncompressed, 2.66:1 Ratio
Swap: 34G Total, 301M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
83089 root 1 88 0 37476M 17361M CPU7 7 531:33 53.06% smbd
228 root 22 36 0 200M 112M kqread 3 10:49 0.53% python3.6
10981 root 18 30 0 42260K 16732K uwait 2 2:39 0.14% consul
83065 root 1 20 0 128M 100M select 2 1:27 0.12% smbd
10364 root 2 20 0 14392K 3864K select 7 0:37 0.03% vmtoolsd
9522 root 1 20 0 9176K 1044K select 5 0:06 0.02% devd
96617 root 1 20 0 13776K 5916K select 2 0:01 0.02% sshd
10149 root 1 20 0 10336K 10436K select 4 0:03 0.01% ntpd
11877 root 1 20 0 9004K 2888K select 7 0:01 0.00% zfsd
10950 root 1 20 0 147M 15728K kqread 6 0:08 0.00% uwsgi
83061 root 1 20 0 47968K 21360K select 5 0:07 0.00% winbindd
83094 root 1 23 0 93800K 65332K select 5 4:44 0.00% winbindd
11079 root 12 20 0 99712K 12212K nanslp 6 2:24 0.00% collectd
10938 root 1 34 0 101M 70672K select 1 1:06 0.00% python3.6
11965 root 15 20 0 226M 96472K umtxn 4 0:13 0.00% uwsgi
9721 root 1 20 0 23456K 5156K kqread 0 0:09 0.00% syslog-ng
10974 root 19 20 0 52440K 7424K uwait 0 0:07 0.00% consul-alerts
10530 root 1 52 0 12908K 4940K select 6 0:03 0.00% sshd
83049 root 1 20 0 171M 142M select 6 0:03 0.00% smbd
12116 root 1 20 0 72708K 0K wait 5 0:02 0.00% <python3.6>
12191 root 17 20 0 33936K 7600K uwait 5 0:02 0.00% consul
83044 root 1 20 0 37096K 15964K select 0 0:01 0.00% nmbd
83073 root 1 20 0 50864K 22216K select 0 0:00 0.00% winbindd
92962 root 1 32 0 7096K 1132K wait 4 0:00 0.00% sh
11891 root 1 40 0 6760K 836K nanslp 3 0:00 0.00% cron
9850 root 1 -52 r0 3520K 3584K nanslp 5 0:00 0.00% watchdogd
37638 www 1 20 0 31452K 6640K kqread 4 0:00 0.00% nginx
12190 root 9 25 0 32528K 6272K uwait 4 0:00 0.00% consul
96619 root 1 20 0 9792K 3264K pause 7 0:00 0.00% csh
83066 root 1 20 0 128M 101M select 7 0:00 0.00% smbd
83067 root 1 20 0 46004K 19088K select 7 0:00 0.00% winbindd
83072 root 1 20 0 46896K 19744K select 6 0:00 0.00% winbindd
33906 root 1 20 0 9792K 2072K ttyin 0 0:00 0.00% csh
10891 nobody 1 20 0 7148K 2432K select 3 0:00 0.00% mdnsd
10883 root 1 20 0 29404K 5092K pause 0 0:00 0.00% nginx


This was top yesterday (I cron'd a Samba service restart at 23:00 to avoid the daily crash):
last pid: 4554; load averages: 1.61, 1.04, 0.95 up 0+03:04:21 12:28:02
84 processes: 4 running, 80 sleeping
CPU: 17.2% user, 0.0% nice, 15.1% system, 0.4% interrupt, 67.3% idle
Mem: 11G Active, 3036M Inact, 1030M Laundry, 16G Wired, 660M Free
ARC: 12G Total, 1132M MFU, 9812M MRU, 1888K Anon, 196M Header, 793M Other
10G Compressed, 19G Uncompressed, 1.88:1 Ratio
Swap: 34G Total, 105M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
4493 root 1 52 0 80628K 70936K select 5 0:02 91.99% python3.6
5140 root 1 97 0 63296M 14271M CPU5 5 69:06 80.66% smbd
847 root 1 84 0 127M 122M CPU3 3 0:18 35.04% python3.6
315 root 1 22 0 65488K 28160K uwait 7 0:03 8.62% dtrace
10981 root 16 30 0 38676K 19348K uwait 6 0:18 0.23% consul
4491 root 1 20 0 8212K 3680K CPU0 0 0:00 0.19% top
644 root 1 20 0 59440K 26868K uwait 5 0:00 0.12% dtrace
10364 root 2 20 0 14392K 4868K select 6 0:04 0.04% vmtoolsd
311 root 1 20 0 65488K 27572K uwait 6 0:00 0.04% dtrace
96617 root 1 20 0 13776K 8480K select 7 0:00 0.02% sshd
386 root 1 20 0 67536K 26292K uwait 7 0:00 0.02% dtrace
850 root 1 22 0 7096K 3660K pause 1 0:00 0.01% sh
228 root 22 21 0 199M 133M kqread 3 1:23 0.01% python3.6
10950 root 1 20 0 147M 96356K kqread 3 0:06 0.00% uwsgi
16330 root 1 20 0 51584K 23988K select 2 0:02 0.00% winbindd
10149 root 1 20 0 10336K 10436K select 6 0:00 0.00% ntpd
17334 root 1 23 0 96404K 65532K select 7 1:00 0.00% winbindd
11079 root 12 20 0 99712K 20048K nanslp 0 0:15 0.00% collectd
10938 root 1 32 0 101M 84024K select 1 0:12 0.00% python3.6
11965 root 16 20 0 226M 131M umtxn 1 0:07 0.00% uwsgi
12116 root 1 20 0 72708K 65168K wait 5 0:02 0.00% python3.6
9721 root 3 20 0 23200K 6424K kqread 5 0:01 0.00% syslog-ng
10974 root 16 20 0 47704K 10400K uwait 2 0:01 0.00% consul-al
16324 root 1 20 0 171M 140M select 5 0:01 0.00% smbd
9522 root 1 20 0 9176K 1208K select 4 0:01 0.00% devd
16358 root 1 20 0 53868K 24348K select 5 0:00 0.00% winbindd
10530 root 1 52 0 12908K 6752K select 7 0:00 0.00% sshd
10884 www 1 20 0 31356K 8524K kqread 3 0:00 0.00% nginx
11877 root 1 20 0 9004K 4548K select 6 0:00 0.00% zfsd
16319 root 1 20 0 37096K 15008K select 5 0:00 0.00% nmbd
98464 root 1 20 0 13776K 8792K select 0 0:00 0.00% sshd
92962 root 1 34 0 7096K 3792K wait 3 0:00 0.00% sh
12191 root 9 20 0 32784K 13520K uwait 5 0:00 0.00% consul
99657 root 1 52 0 7096K 3660K pause 0 0:00 0.00% sh
16342 root 1 20 0 48280K 19192K select 3 0:00 0.00% winbindd
11891 root 1 20 0 6500K 2532K nanslp 4 0:00 0.00% cron
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
[top output quoted above; snipped]
10883 root 1 20 0 29404K 5092K pause 0 0:00 0.00% nginx


This was top yesterday (I cron'd a samba service restart at 23:00 to avoid the daily crash):
last pid: 4554; load averages: 1.61, 1.04, 0.95 up 0+03:04:21 12:28:02
84 processes: 4 running, 80 sleeping
CPU: 17.2% user, 0.0% nice, 15.1% system, 0.4% interrupt, 67.3% idle
Mem: 11G Active, 3036M Inact, 1030M Laundry, 16G Wired, 660M Free
ARC: 12G Total, 1132M MFU, 9812M MRU, 1888K Anon, 196M Header, 793M Other
10G Compressed, 19G Uncompressed, 1.88:1 Ratio
Swap: 34G Total, 105M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
4493 root 1 52 0 80628K 70936K select 5 0:02 91.99% python3.6
5140 root 1 97 0 63296M 14271M CPU5 5 69:06 80.66% smbd
847 root 1 84 0 127M 122M CPU3 3 0:18 35.04% python3.6
315 root 1 22 0 65488K 28160K uwait 7 0:03 8.62% dtrace
10981 root 16 30 0 38676K 19348K uwait 6 0:18 0.23% consul
4491 root 1 20 0 8212K 3680K CPU0 0 0:00 0.19% top
644 root 1 20 0 59440K 26868K uwait 5 0:00 0.12% dtrace
10364 root 2 20 0 14392K 4868K select 6 0:04 0.04% vmtoolsd
311 root 1 20 0 65488K 27572K uwait 6 0:00 0.04% dtrace
96617 root 1 20 0 13776K 8480K select 7 0:00 0.02% sshd
386 root 1 20 0 67536K 26292K uwait 7 0:00 0.02% dtrace
850 root 1 22 0 7096K 3660K pause 1 0:00 0.01% sh
228 root 22 21 0 199M 133M kqread 3 1:23 0.01% python3.6
10950 root 1 20 0 147M 96356K kqread 3 0:06 0.00% uwsgi
16330 root 1 20 0 51584K 23988K select 2 0:02 0.00% winbindd
10149 root 1 20 0 10336K 10436K select 6 0:00 0.00% ntpd
17334 root 1 23 0 96404K 65532K select 7 1:00 0.00% winbindd
11079 root 12 20 0 99712K 20048K nanslp 0 0:15 0.00% collectd
10938 root 1 32 0 101M 84024K select 1 0:12 0.00% python3.6
11965 root 16 20 0 226M 131M umtxn 1 0:07 0.00% uwsgi
12116 root 1 20 0 72708K 65168K wait 5 0:02 0.00% python3.6
9721 root 3 20 0 23200K 6424K kqread 5 0:01 0.00% syslog-ng
10974 root 16 20 0 47704K 10400K uwait 2 0:01 0.00% consul-al
16324 root 1 20 0 171M 140M select 5 0:01 0.00% smbd
9522 root 1 20 0 9176K 1208K select 4 0:01 0.00% devd
16358 root 1 20 0 53868K 24348K select 5 0:00 0.00% winbindd
10530 root 1 52 0 12908K 6752K select 7 0:00 0.00% sshd
10884 www 1 20 0 31356K 8524K kqread 3 0:00 0.00% nginx
11877 root 1 20 0 9004K 4548K select 6 0:00 0.00% zfsd
16319 root 1 20 0 37096K 15008K select 5 0:00 0.00% nmbd
98464 root 1 20 0 13776K 8792K select 0 0:00 0.00% sshd
92962 root 1 34 0 7096K 3792K wait 3 0:00 0.00% sh
12191 root 9 20 0 32784K 13520K uwait 5 0:00 0.00% consul
99657 root 1 52 0 7096K 3660K pause 0 0:00 0.00% sh
16342 root 1 20 0 48280K 19192K select 3 0:00 0.00% winbindd
11891 root 1 20 0 6500K 2532K nanslp 4 0:00 0.00% cron

Well, in my testing the widelinks / unix extensions workaround didn't 100% eliminate calls to the leaky function. It just reduced an arterial gush of memory to a "flesh wound", but different workloads and environments will tickle this issue differently. Try to reproduce on 11.1-U5 when it comes out before filing a new bug ticket.
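(For readers landing here later: the "widelinks / unix extensions" knobs are smb.conf parameters, set in FreeNAS via the share's Auxiliary Parameters. A minimal sketch of the parameters involved is below; the share name is made up, and whether flipping these actually mitigates the leak in your environment is exactly what the bug ticket is still investigating, so treat this as orientation, not a recommendation.)

```
[global]
    # "unix extensions" and "wide links" interact: Samba only honors
    # wide links = yes when unix extensions are disabled, and together
    # they select which symlink-resolution code path smbd takes on
    # every open.
    unix extensions = no

[example_share]          ; hypothetical share name
    wide links = yes
```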
 

michael.samer

Dabbler
Joined
Feb 19, 2018
Messages
21
Hello anodos,
today I put some time into the leaking smbd process. Every time I trigger a trace, the traced task stops, no up/download is interrupted in any way, and no trace output is left behind:
a) I started top for the PID (the task with huge RAM and lots of CPU time, sometimes running for many hours)
b) I triggered your dtrace script with that PID
c) I looked for the log and there was none
d) the smbd PID vanished as well and a new one appeared
e) looking at the clients, the rsync process (running against a CIFS share) is still going with no sign of interruption.
As there's only one user currently on the server, it looks like some kind of zombie process:

root@DEVNETNAS:~ # top
last pid: 26443; load averages: 0.89, 1.08, 1.04 up 0+05:48:02 18:07:39
51 processes: 1 running, 50 sleeping
CPU: 1.5% user, 0.0% nice, 10.9% system, 0.7% interrupt, 86.9% idle
Mem: 13G Active, 10G Inact, 38M Laundry, 7506M Wired, 573M Free
ARC: 4462M Total, 669M MFU, 2822M MRU, 7626K Anon, 75M Header, 888M Other
2688M Compressed, 5350M Uncompressed, 1.99:1 Ratio
Swap: 34G Total, 305M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
39705 root 1 52 0 68689M 23508M select 6 62:58 43.72% smbd
13508 root 1 21 0 128M 55720K select 7 3:49 1.39% smbd
25693 root 1 20 0 8212K 3592K CPU5 5 0:00 1.01% top
229 root 16 52 0 194M 108M kqread 0 2:23 0.61% python3.6
4460 root 16 30 0 39188K 14288K uwait 1 0:31 0.33% consul
4453 root 17 20 0 44248K 7288K uwait 1 0:01 0.29% consul-alerts
7778 root 1 20 0 13776K 6184K select 0 0:01 0.15% sshd
3834 root 2 20 0 14524K 3560K select 5 0:07 0.03% vmtoolsd
13501 root 1 20 0 46936K 14688K select 7 0:03 0.03% winbindd
13830 root 1 24 0 93232K 57144K select 4 1:51 0.00% winbindd
5316 root 15 20 0 220M 87128K kqread 6 0:33 0.00% uwsgi
4535 root 12 20 0 99M 14108K nanslp 4 0:30 0.00% collectd
4417 root 1 22 0 101M 27512K select 2 0:17 0.00% python3.6
4429 root 1 20 0 147M 94872K kqread 0 0:06 0.00% uwsgi
1809 root 1 20 0 21104K 3008K kqread 6 0:02 0.00% syslog-ng
5849 root 1 52 0 72808K 10656K ttyin 1 0:02 0.00% python3.6
13495 root 1 20 0 171M 97040K select 0 0:02 0.00% smbd
1608 root 1 20 0 9176K 896K select 5 0:01 0.00% devd
4003 root 1 52 0 13004K 5112K select 0 0:01 0.00% sshd
2737 root 1 20 0 10432K 10544K select 3 0:01 0.00% ntpd
4362 www 1 20 0 31452K 4116K kqread 1 0:01 0.00% nginx
13529 root 1 20 0 50440K 14804K select 1 0:00 0.00% winbindd
13490 root 1 20 0 37096K 11444K select 1 0:00 0.00% nmbd
5871 root 16 20 0 32912K 7860K uwait 2 0:00 0.00% consul
root@DEVNETNAS:~ # /root/dtrace.dt 39705 -o /root/pidtrace5.txt
^C
root@DEVNETNAS:~ # top
last pid: 26617; load averages: 0.90, 1.05, 1.04 up 0+05:48:37 18:08:14
51 processes: 1 running, 50 sleeping
CPU: 1.4% user, 0.0% nice, 6.0% system, 0.7% interrupt, 91.9% idle
Mem: 349M Active, 206M Inact, 38M Laundry, 7261M Wired, 23G Free
ARC: 4478M Total, 665M MFU, 2809M MRU, 13M Anon, 76M Header, 916M Other
2672M Compressed, 5482M Uncompressed, 2.05:1 Ratio
Swap: 34G Total, 305M Used, 34G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
26616 root 1 47 0 203M 121M select 2 0:02 45.38% smbd
13508 root 1 20 0 128M 55720K select 5 3:49 1.42% smbd
26617 root 1 21 0 8212K 3584K CPU1 1 0:00 0.46% top
4460 root 16 30 0 39188K 14288K uwait 1 0:31 0.15% consul
7778 root 1 20 0 13776K 6184K select 0 0:01 0.05% sshd
3834 root 2 20 0 14524K 3560K select 0 0:07 0.03% vmtoolsd
229 root 16 20 0 194M 108M kqread 0 2:23 0.00% python3.6
13830 root 1 20 0 93232K 57144K select 1 1:51 0.00% winbindd
5316 root 15 20 0 220M 87128K umtxn 6 0:33 0.00% uwsgi
4535 root 12 20 0 97664K 10324K nanslp 6 0:30 0.00% collectd
4417 root 1 21 0 101M 27512K select 2 0:17 0.00% python3.6
4429 root 1 20 0 147M 94872K kqread 3 0:06 0.00% uwsgi
13501 root 1 20 0 46936K 14688K select 4 0:03 0.00% winbindd
1809 root 3 20 0 21112K 3048K kqread 3 0:02 0.00% syslog-ng
5849 root 1 52 0 72808K 10656K ttyin 1 0:02 0.00% python3.6
13495 root 1 30 0 171M 97040K select 0 0:02 0.00% smbd
4453 root 17 20 0 44248K 7288K uwait 1 0:01 0.00% consul-alerts
1608 root 1 20 0 9176K 896K select 7 0:01 0.00% devd
4003 root 1 52 0 13004K 5112K select 7 0:01 0.00% sshd
2737 root 1 20 0 10432K 10544K select 2 0:01 0.00% ntpd
4362 www 1 20 0 31452K 4116K kqread 0 0:01 0.00% nginx
13529 root 1 20 0 50440K 14804K select 4 0:00 0.00% winbindd
13490 root 1 20 0 37096K 11444K select 3 0:00 0.00% nmbd
5871 root 16 20 0 32912K 7860K uwait 2 0:00 0.00% consul
root@DEVNETNAS:~ # ls -all
total 28
drwxr-xr-x 4 root wheel 15 Apr 19 14:16 .
drwxr-xr-x 21 root wheel 28 Apr 19 12:20 ..
-rw------- 1 root wheel 6 Apr 19 12:16 .bash_history
-rw-r--r-- 1 root wheel 1128 Apr 16 09:19 .bashrc
-rw-r--r-- 2 root wheel 887 Apr 16 09:19 .cshrc
-rw-r--r-- 1 root wheel 140 Apr 16 09:19 .gdbinit
-rw------- 1 root wheel 2482 Apr 19 12:17 .history
-rw-r--r-- 1 root wheel 80 Apr 16 09:19 .k5login
-rw-r--r-- 1 root wheel 224 Apr 16 09:19 .login
-rw-r--r-- 1 root wheel 559 Apr 16 09:19 .profile
-rw-r--r-- 1 root wheel 1128 Apr 16 09:19 .shrc
drwxr-xr-x 2 root wheel 6 Feb 6 13:35 .ssh
-rwxr-xr-x 1 root wheel 77 Apr 19 14:16 dtrace.dt
-rw-r--r-- 1 root wheel 50144 Apr 19 14:16 pidhist3.txt
drwxr-xr-x 3 root wheel 5 Apr 16 10:37 SMB



I experienced the vanishing twice today, and once three days ago when I first used your script.

My main problem with waiting for U5 is: if it doesn't stop the effect 100%, I'll be overruled and we'll have to move to some other solution (RHEL+XFS or Win2012R2+ReFS) which lacks essential functions of FreeNAS but is stable at our other locations. So waiting patiently is no longer within my grasp.

I'm not much of a BSD scripter, but combining your script with ps (and a break a few seconds later) to deliver the PID, run as a cron task every few hours, seems like it would solve my immediate problem...
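Sketching that idea out (the helper function and the canned demo data are mine, not from this thread; it assumes FreeBSD-style `ps -axo pid,rss,comm` output with RSS in KiB), a cron job could pick out the heaviest smbd like this:

```shell
# Hypothetical cron helper: print the PID of the smbd process with
# the largest resident set size, read from "pid rss comm" lines.
biggest_smbd_pid() {
    awk '$3 == "smbd" && $2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Live use (root, FreeBSD) would be:
#   ps -axo pid,rss,comm | biggest_smbd_pid

# Demonstration with canned output resembling the top listing above:
printf '%s\n' \
    '83089 17777664 smbd' \
    '83065   102400 smbd' \
    '  228   114688 python3.6' \
| biggest_smbd_pid    # prints: 83089
```

A cron task could then compare that process's RSS against a threshold and restart the SMB service only when it is actually ballooning, rather than unconditionally every night at 23:00.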
Cheers
Michael


P.S.: The move away from FreeNAS is mainly because of this incident, as we could not find a third-party professional: we tried twice via the iXsystems web site in search of a pro, and once via the German TrueNAS support partner (named rahi or something like that), and got no reply or paid help in two weeks. That makes my position pretty harsh when we want to spend money and nobody can help us.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Well, my script isn't supposed to kill your smbd process... you can do that pretty easily through other means. The kill is just an incidental side effect of the sheer volume of dtrace probes being hit. I'll give you a reduced-scope script in a second.
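(A reduced-scope script of this kind typically narrows the probe set to just the allocator instead of instrumenting every libc call. The D sketch below is illustrative only, not the script actually sent here; `libc.so.7` is FreeBSD 11's libc module name.)

```d
/* Illustrative reduced-scope allocation tracer (sketch).
 * Run as:  dtrace -s trace_malloc.d -p <smbd PID>
 * Sums malloc() request sizes per user stack, so only allocator
 * probes fire and the traced smbd is barely perturbed. */
pid$target:libc.so.7:malloc:entry
{
    @bytes[ustack(8)] = sum(arg0);
}

/* Stop after 60 seconds; dtrace prints the aggregation on exit,
 * heaviest allocation stacks last. */
tick-60s
{
    exit(0);
}
```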
 
Top