Hello Anodos
Today I put some time into the leaking smbd process. Every time I trigger a trace, the process stops; no upload or download is interrupted in any way, and the trace leaves no log behind:
a) I started top to find the PID (the task with huge RAM usage and lots of CPU time, sometimes running for many hours).
b) I triggered your dtrace script with that PID.
c) I looked for the log and there was none.
d) The smbd PID vanished as well and a new one appeared.
e) On the clients, the rsync process (via the CIFS share) was still running with no sign of an interruption.
As there's only one user currently using the server, it looks like some kind of zombie process:
root@DEVNETNAS:~ # top
last pid: 26443; load averages: 0.89, 1.08, 1.04 up 0+05:48:02 18:07:39
51 processes: 1 running, 50 sleeping
CPU: 1.5% user, 0.0% nice, 10.9% system, 0.7% interrupt, 86.9% idle
Mem: 13G Active, 10G Inact, 38M Laundry, 7506M Wired, 573M Free
ARC: 4462M Total, 669M MFU, 2822M MRU, 7626K Anon, 75M Header, 888M Other
2688M Compressed, 5350M Uncompressed, 1.99:1 Ratio
Swap: 34G Total, 305M Used, 34G Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
39705 root 1 52 0 68689M 23508M select 6 62:58 43.72% smbd
13508 root 1 21 0 128M 55720K select 7 3:49 1.39% smbd
25693 root 1 20 0 8212K 3592K CPU5 5 0:00 1.01% top
229 root 16 52 0 194M 108M kqread 0 2:23 0.61% python3.6
4460 root 16 30 0 39188K 14288K uwait 1 0:31 0.33% consul
4453 root 17 20 0 44248K 7288K uwait 1 0:01 0.29% consul-alerts
7778 root 1 20 0 13776K 6184K select 0 0:01 0.15% sshd
3834 root 2 20 0 14524K 3560K select 5 0:07 0.03% vmtoolsd
13501 root 1 20 0 46936K 14688K select 7 0:03 0.03% winbindd
13830 root 1 24 0 93232K 57144K select 4 1:51 0.00% winbindd
5316 root 15 20 0 220M 87128K kqread 6 0:33 0.00% uwsgi
4535 root 12 20 0 99M 14108K nanslp 4 0:30 0.00% collectd
4417 root 1 22 0 101M 27512K select 2 0:17 0.00% python3.6
4429 root 1 20 0 147M 94872K kqread 0 0:06 0.00% uwsgi
1809 root 1 20 0 21104K 3008K kqread 6 0:02 0.00% syslog-ng
5849 root 1 52 0 72808K 10656K ttyin 1 0:02 0.00% python3.6
13495 root 1 20 0 171M 97040K select 0 0:02 0.00% smbd
1608 root 1 20 0 9176K 896K select 5 0:01 0.00% devd
4003 root 1 52 0 13004K 5112K select 0 0:01 0.00% sshd
2737 root 1 20 0 10432K 10544K select 3 0:01 0.00% ntpd
4362 www 1 20 0 31452K 4116K kqread 1 0:01 0.00% nginx
13529 root 1 20 0 50440K 14804K select 1 0:00 0.00% winbindd
13490 root 1 20 0 37096K 11444K select 1 0:00 0.00% nmbd
5871 root 16 20 0 32912K 7860K uwait 2 0:00 0.00% consul
root@DEVNETNAS:~ # /root/dtrace.dt 39705 -o /root/pidtrace5.txt
^C
root@DEVNETNAS:~ # top
last pid: 26617; load averages: 0.90, 1.05, 1.04 up 0+05:48:37 18:08:14
51 processes: 1 running, 50 sleeping
CPU: 1.4% user, 0.0% nice, 6.0% system, 0.7% interrupt, 91.9% idle
Mem: 349M Active, 206M Inact, 38M Laundry, 7261M Wired, 23G Free
ARC: 4478M Total, 665M MFU, 2809M MRU, 13M Anon, 76M Header, 916M Other
2672M Compressed, 5482M Uncompressed, 2.05:1 Ratio
Swap: 34G Total, 305M Used, 34G Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
26616 root 1 47 0 203M 121M select 2 0:02 45.38% smbd
13508 root 1 20 0 128M 55720K select 5 3:49 1.42% smbd
26617 root 1 21 0 8212K 3584K CPU1 1 0:00 0.46% top
4460 root 16 30 0 39188K 14288K uwait 1 0:31 0.15% consul
7778 root 1 20 0 13776K 6184K select 0 0:01 0.05% sshd
3834 root 2 20 0 14524K 3560K select 0 0:07 0.03% vmtoolsd
229 root 16 20 0 194M 108M kqread 0 2:23 0.00% python3.6
13830 root 1 20 0 93232K 57144K select 1 1:51 0.00% winbindd
5316 root 15 20 0 220M 87128K umtxn 6 0:33 0.00% uwsgi
4535 root 12 20 0 97664K 10324K nanslp 6 0:30 0.00% collectd
4417 root 1 21 0 101M 27512K select 2 0:17 0.00% python3.6
4429 root 1 20 0 147M 94872K kqread 3 0:06 0.00% uwsgi
13501 root 1 20 0 46936K 14688K select 4 0:03 0.00% winbindd
1809 root 3 20 0 21112K 3048K kqread 3 0:02 0.00% syslog-ng
5849 root 1 52 0 72808K 10656K ttyin 1 0:02 0.00% python3.6
13495 root 1 30 0 171M 97040K select 0 0:02 0.00% smbd
4453 root 17 20 0 44248K 7288K uwait 1 0:01 0.00% consul-alerts
1608 root 1 20 0 9176K 896K select 7 0:01 0.00% devd
4003 root 1 52 0 13004K 5112K select 7 0:01 0.00% sshd
2737 root 1 20 0 10432K 10544K select 2 0:01 0.00% ntpd
4362 www 1 20 0 31452K 4116K kqread 0 0:01 0.00% nginx
13529 root 1 20 0 50440K 14804K select 4 0:00 0.00% winbindd
13490 root 1 20 0 37096K 11444K select 3 0:00 0.00% nmbd
5871 root 16 20 0 32912K 7860K uwait 2 0:00 0.00% consul
root@DEVNETNAS:~ # ls -all
total 28
drwxr-xr-x 4 root wheel 15 Apr 19 14:16 .
drwxr-xr-x 21 root wheel 28 Apr 19 12:20 ..
-rw------- 1 root wheel 6 Apr 19 12:16 .bash_history
-rw-r--r-- 1 root wheel 1128 Apr 16 09:19 .bashrc
-rw-r--r-- 2 root wheel 887 Apr 16 09:19 .cshrc
-rw-r--r-- 1 root wheel 140 Apr 16 09:19 .gdbinit
-rw------- 1 root wheel 2482 Apr 19 12:17 .history
-rw-r--r-- 1 root wheel 80 Apr 16 09:19 .k5login
-rw-r--r-- 1 root wheel 224 Apr 16 09:19 .login
-rw-r--r-- 1 root wheel 559 Apr 16 09:19 .profile
-rw-r--r-- 1 root wheel 1128 Apr 16 09:19 .shrc
drwxr-xr-x 2 root wheel 6 Feb 6 13:35 .ssh
-rwxr-xr-x 1 root wheel 77 Apr 19 14:16 dtrace.dt
-rw-r--r-- 1 root wheel 50144 Apr 19 14:16 pidhist3.txt
drwxr-xr-x 3 root wheel 5 Apr 16 10:37 SMB
I saw the process vanish like this twice today, and also three days ago when I first used your script.
My main problem with U5 is this: if it does not stop the effect 100%, I will be overruled and we will have to move to some other solution (RHEL+XFS or Win2012R2+ReFS), which lacks essential functions of FreeNAS but is stable in other locations. So waiting patiently is no longer an option for me.
I'm not much of a BSD scripter, but using ps to find the PID, passing it to your script, and breaking off a few seconds later, all from a cron job that runs every few hours, looks like something that would solve my immediate problem.
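Something along these lines is what I have in mind. This is only an untested sketch: the script path and the way it is called are taken from my session above, while the memory limit, the trace duration, the output file name and the function names are my own assumptions. It also assumes the trace exits cleanly on SIGTERM.

```shell
#!/bin/sh
# Sketch only: limit, duration and output path are assumptions.

RSS_LIMIT_KB=${RSS_LIMIT_KB:-8388608}   # hypothetical limit: 8 GiB resident
TRACE_SECS=${TRACE_SECS:-10}            # hypothetical trace duration in seconds

# Read "pid rss comm" lines on stdin and print the PID of the smbd
# process with the largest RSS, but only if it exceeds the limit (in KB).
pick_leaking_smbd() {
    awk -v limit="$1" '
        $3 == "smbd" && $2 + 0 > limit + 0 && $2 + 0 > max { max = $2 + 0; pid = $1 }
        END { if (pid != "") print pid }'
}

# Attach the trace script to a PID the same way as in the session above,
# then stop it again after a few seconds.
trace_briefly() {
    /root/dtrace.dt "$1" -o "/root/pidtrace-$1.txt" &
    tracer=$!
    sleep "$TRACE_SECS"
    kill -TERM "$tracer" 2>/dev/null
    wait "$tracer" 2>/dev/null
}

# Only act when called with "run", so the functions can be tried on their own.
if [ "${1:-}" = "run" ]; then
    pid=$(ps -axo pid=,rss=,comm= | pick_leaking_smbd "$RSS_LIMIT_KB")
    if [ -n "$pid" ]; then
        trace_briefly "$pid"
    fi
fi
```

Saved under some name of my choosing and called with the argument run from root's crontab every few hours, it would do nothing as long as no smbd is above the limit.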
Cheers
Michael
P.S.: The move away from FN is mainly because of this incident, as we could not find a third-party professional. We tried twice via the iXsystems web site to find a pro and once via the German TrueNAS support (named rahi or something like that), and got no reply or paid help in two weeks. This puts me in a pretty tough spot: we want to spend money and nobody can help us.