SMB share access breaks, High CPU telemetry-gather.py and mdns lookup errors in syslog

Status
Not open for further replies.

RichardV

Cadet
Joined
Jan 6, 2017
Messages
5
Hi All,

I have a few FreeNAS servers running and until now had no problems that i could not solve, but now i could use some help, to be sure i do not forget any info i maybe a bit long.

I have problems with the SMB Shares, the NAS is AD Domain Member, and i have 3 SMB Shares witch are use by a Windows 2012R2 Document management server, at daily intervals there is a Indexing process on the Windows server that scans the SMB Share Directories and indexes these are small text files of about 10Kb the result is written to an index file.

This process runs for 45min and then suddenly breaks, i reports a SMB Share write error and stops, at that point the FreeNAS server is using all its CPU for 3 processes.

  • python2.7 (telemetry-gather.py )
  • syslog-ng
  • smbd

The syslog-ng process is running so high because it is processing 21799834 new log entries from smbd in about 15min, the syslog is filled with these messages :

Code:
Jan  6 11:27:26 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of HOSTNAME
Jan  6 11:27:26 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of HOSTNAME
Jan  6 11:27:26 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of HOSTNAME
Jan  6 11:27:26 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of HOSTNAME
Jan  6 11:27:26 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of HOSTNAME


The system was updated last night from FreeNAS-9.10.1-u2 on witch i had the same problem.
And what i did notice that the telemetry script was already running working on the messages log , and before the upgrade and reboot there were event more of those scripts running probably waiting for each other.....

Code:
nibeheer@vdl-nas01:~ % ps -waxu | grep python
root  79438 49.8  0.0  84536  13696  -  R  Fri03AM  6276:42.50 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  91434 48.9  0.0  84536  13696  -  R  Wed03AM  1132:40.47 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  27023 48.1  0.0  84536  13696  -  R  Mon03AM  2700:21.76 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  61989 46.9  0.0  84536  13696  -  R  Sat03AM  4857:58.70 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  9312 46.5  0.0  84536  13696  -  R  Tue03AM  1878:05.01 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  44566 46.2  0.0  84536  13696  -  R  Sun03AM  3649:27.78 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  73911 45.1  0.0  84536  13696  -  R  3:05AM  451:41.79 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  26814 43.9  0.0  88632  17332  -  R  21Dec16  19210:19.65 /usr/local/bin/python /usr/local/bin/telemetry-gather.py /var/log/messages /var/log/messages.0 /var/log/messages.0.bz2
root  3741  0.4  0.2 282588 103304  -  S  15Nov16  124:00.47 python: alertd (python2.7)



Let's introduce the System :
  • Build FreeNAS-9.10.2 (a476f16)
  • Platform Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz
  • Memory 65396MB
  • Product Name: SSG-6048R-E1CR24L
  • System Board: X10DRH-iT
  • 20x WD Red 3Tb
  • 4x WD Red 6TB
  • 2x Intel SSD (ZIL & Log)

And the main ZPOOL
Code:
  pool: VDL-POOL01
 state: ONLINE
config:
   NAME  STATE  READ WRITE CKSUM
   VDL-POOL01  ONLINE  0  0  0
	raidz2-0  ONLINE  0  0  0
	gptid/e1436626-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e1ec5e0d-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e296f08b-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e33eb804-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e3e5da58-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	raidz2-1  ONLINE  0  0  0
	gptid/e49703e5-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e5405f58-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e5ef8625-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e69c6827-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e7424753-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	raidz2-2  ONLINE  0  0  0
	gptid/e7f6ed50-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e8a48b76-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e94e3ce7-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/e9f24a46-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/ea9c3cb9-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	raidz2-3  ONLINE  0  0  0
	gptid/eb56ea5b-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/ebff66c2-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/eca85bf4-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/ed4e33da-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
	gptid/edf86101-8564-11e6-984b-0cc47ade8080  ONLINE  0  0  0
   logs
	gptid/75b52b5f-8be7-11e6-8dc9-0cc47ade8081  ONLINE  0  0  0
   cache
	gptid/8d5cc7ad-8be7-11e6-8dc9-0cc47ade8081  ONLINE  0  0  0



And the SAMBA testparm output (Privacy data removed)

Code:
Load smb config files from /usr/local/etc/smb4.conf
Processing section "[smb_share1]"
Processing section "[smb_share2]"
Processing section "[smb_share3]"
Loaded services file OK.
WARNING: You have some share names that are longer than 12 characters.
These may not be accessible to some older clients.
(Eg. Windows9x, WindowsMe, and smbclient prior to Samba 3.0.)
Server role: ROLE_DOMAIN_MEMBER

Press enter to see a dump of your service definitions
# Global parameters
[global]
   bind interfaces only = Yes
   dos charset = CP437
   interfaces = 127.0.0.1 192.168.115.40
   multicast dns register = No
   realm = DOMAIN.LOCAL
   server string = VDL-NAS01 FreeNAS Server
   workgroup = DOMAIN
   domain master = No
   lm announce = Yes
   local master = No
   preferred master = No
   nsupdate command = /usr/local/bin/samba-nsupdate -g
   client ldap sasl wrapping = plain
   logging = file
   max log size = 51200
   cache directory = /var/tmp/.cache/.samba
   kernel change notify = No
   panic action = /usr/local/libexec/samba/samba-backtrace
   pid directory = /var/run/samba
   disable spoolss = Yes
   load printers = No
   printcap name = /dev/null
   allow trusted domains = No
   map to guest = Bad User
   obey pam restrictions = Yes
   security = ADS
   server role = member server
   deadtime = 15
   hostname lookups = Yes
   max open files = 1883378
   template shell = /bin/sh
   winbind cache time = 7200
   winbind enum groups = Yes
   winbind enum users = Yes
   winbind offline logon = Yes
   winbind refresh tickets = Yes
   dns proxy = No
   idmap config vdlindencs: range = 20000-90000000
   idmap config vdlindencs: backend = rid
   idmap config *: range = 90000001-100000000
   idmap config * : backend = tdb
   store dos attributes = Yes
   strict locking = No
   directory name cache size = 0
   dos filemode = Yes
   acl allow execute always = Yes
   ea support = Yes
   create mask = 0666
   directory mask = 0777


[smb_share1]
   comment = ***********
   path = /mnt/VDL-POOL01/SMB_SHARES/SMB_SHARE1
   veto files = /.snapshot/.windows/.mac/.zfs/
   read only = No
   vfs objects = zfs_space zfsacl aio_pthread
   zfsacl:acesort = dontcare
   nfs4:chown = true
   nfs4:acedup = merge
   nfs4:mode = special


[smb_share2]
   comment = *********
   path = /mnt/VDL-POOL01/SMB_SHARES/SMB_SHARE2
   veto files = /.snapshot/.windows/.mac/.zfs/
   read only = No
   vfs objects = zfs_space zfsacl aio_pthread
   zfsacl:acesort = dontcare
   nfs4:chown = true
   nfs4:acedup = merge
   nfs4:mode = special


[smb_share3]
   comment = *******
   path = /mnt/VDL-POOL01/SMB_SHARES/SMB_SHARE3
   veto files = /.snapshot/.windows/.mac/.zfs/
   read only = No
   vfs objects = zfs_space zfsacl aio_pthread
   zfsacl:acesort = dontcare
   nfs4:chown = true
   nfs4:acedup = merge
   nfs4:mode = special



I hope there are some ideas in what i can do/check or collect next.

Many thanks.
Richard
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
You might want to turn of telemetry, all it does is send performance data back to iX Systems. Since you're generating a lot of entries in the log, it's taking time to process it all.

You can disable it under: System > Advanced > Enable automatic upload of kernel crash dumps and daily telemetry
 

RichardV

Cadet
Joined
Jan 6, 2017
Messages
5
Hi m0nkey_ thanks voor the location, had found a post from cyberjock that there was an ON/OFF switch but could not find where ;-)
it's now disabled and will see how the system behaves.
 

RichardV

Cadet
Joined
Jan 6, 2017
Messages
5
OK, though i would post an update.

After disabling the telemetry gatherer the server is running a little better, that means is stays more responsive when it goes wrong.

Still after running an Indexing script of the 3rd party software after about 30min the logs fill up again with :

Code:
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 07:50:18 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local


This goes so fast that its almost impossible to see is de log.smb is giving anything useful around the time the messages starts.
the smbd process CPU goes to 100% in one of the cores, with the syslog-ng daemon also using lots of CPU in order to process all the
syslog messages as mentioned above.

When i open the SMB settings page and simply save without changing settings the SMBD daemon restarts and after logging the messages below
the 3rd party Indexing Script (wich just reported that the share was unavailable) simple continues where is was pauzed.

Code:
Jan 18 08:25:33 vdl-nas01 smbd: mdns: Unexpected return from select on lookup of vdl-srv04.DOMEIN.local
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 smbd: dnssd_clientstub DNSServiceProcessResult called with DNSServiceRef with no ProcessReply function
Jan 18 08:25:33 vdl-nas01 notifier: Waiting for PIDS: 27291.
Jan 18 08:25:33 vdl-nas01 notifier: Starting mdnsd.
Jan 18 08:25:33 vdl-nas01 notifier: Cannot 'reload' netatalk. Set netatalk_enable to YES in /etc/rc.conf or use 'onereload' instead of 'reload'.


So my question is for what do i need 'mdns' i do not use zeroconf share discovery and i have only one server that is accessing these shares.
can i disable this somewhere.

Some extra info about the Indexer script ;
The Indexing script scans the share for all files and reads some small (10kb) text files for indexing data i does this in 3 processes with
each 3 concurrent threads, i sends this data to an index database with a flush every 10 sec , these indexes are also on the same share.


Thanks in advance for any sugestions.
Richard
 
Status
Not open for further replies.
Top