SOLVED Samba: smbd processes using all available CPU, not responsive, don't react to signals

Status
Not open for further replies.

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
Hi all,

thank you for providing such great software free of charge, and thanks to everyone in this forum for their help and patience.

We are running FreeNAS-11.1-U5 on a HP Microserver N54L, 6GB RAM, 2x3 TB SATA (mirror). The system has seen quite a few software upgrades; the first FreeNAS on this machine was 9.3, if I'm not mistaken.

We use this machine as a small (5-10 users) workgroup NAS with samba.

The system usually runs well, but after a while one smbd process - and, given more time, additional smbd processes - begins to take up 100% CPU. The user connected to that smbd process experiences hangs on their Windows machine when trying to do anything with the NAS. We have already lost in-progress work more than once :(

When this happens, I can't kill -HUP or regular kill the process from the command line; it does not react.

I have not found any real clues yet. Any help debugging this would be appreciated. I attach winbind and smbd log excerpts, TOP output, and a copy-paste of a backtrace from gdb attached to the stalled process. It looks like it is doing something with xattrs and/or shadow copy a lot. But when I try to trace the process with truss, it comes up empty (maybe I'm using it wrong?), so it looks more like a deadlock to me than real work. I have no dtrace experience unfortunately; if that would help, I would be glad to collect some data.
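For reference, this is the kind of first-pass inspection I have been doing - just a sketch, the PID defaults to the current shell's own so it runs anywhere; substitute the spinning smbd PID in practice:

```shell
#!/bin/sh
# Quick look at a wedged process. $1 is the smbd PID; it defaults to the
# current shell's PID here only so the sketch can run anywhere.
PID="${1:-$$}"

# State "R" (runnable, i.e. burning CPU) vs "D" (uninterruptible wait)
# is the quickest hint whether it spins in userland or sits in the kernel.
ps -o pid,state,time -p "$PID"

# FreeBSD-only follow-ups (need root, so left commented out here):
#   procstat -kk "$PID"    # per-thread kernel stacks
#   truss -p "$PID"        # live syscall trace (empty = userland spin)
```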

How to go on with troubleshooting?
 

Attachments

  • nas-2018-07-25-top.txt (1 KB)
  • nas-2018-07-25-smbd-backtrace.txt (2.5 KB)
  • nas-2018-07-25-log-wb-nas-example.txt (908 bytes)
  • nas-2018-07-25.png (203.1 KB)
  • nas-2018-07-25-reporting.png (53.2 KB)

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
We are running FreeNAS-11.1-U5 on a HP Microserver N54L, 6GB RAM
You do not have the MINIMUM amount of RAM for FreeNAS. For the purpose of a simple file server,
you must have 8GB or more. Please see the User Guide.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
How to go on with troubleshooting?
It's going to be an exercise in frustration until your hardware meets
MINIMUM requirements.
I am aware of that.
Ok, that's good. Install some more memory and if you are still having the issues you posted about above, we will help out the best we can.
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
I really don't want to offend you with my being stubborn. I just require that an OS gives me some kind of notice when it breaks because it is running out of memory. And my system is clearly not doing that. If FreeNAS breaks for lack of RAM without the OS logging any errors, that is a huge quality problem. But it should not be my problem.

So. I have a box with two mirrored drives and 6 GB of ECC RAM. I believe that should be more than enough hardware for a samba server for a group of a half dozen people.

Can I somehow debug that smbd process to find out what it is doing?
 
Joined
Dec 29, 2014
Messages
1,135
If FreeNAS breaks for lack of RAM without the OS logging any errors, that is a huge quality problem. But it should not be my problem.

If you don't meet the minimum hardware requirements and it doesn't perform, that is on you.
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
My smbd is not lacking memory. It has plenty. If it lacked memory, it would tell me, but it clearly does not. And if it lacked memory, it would log something and die or be killed by the OOM killer or whatever. With too little memory it might even block, but it would not livelock like this. It doesn't make sense.

Thanks still.
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
Thank you @anodos ! Here it comes:

Code:
Load smb config files from /usr/local/etc/smb4.conf
Processing section "[accurate]"
Processing section "[backup]"
Processing section "[cloudsim]"
Processing section "[guest]"
Processing section "[homes]"
Processing section "[sync]"
Loaded services file OK.
Server role: ROLE_STANDALONE

# Global parameters
[global]
		bind interfaces only = Yes
		dos charset = CP437
		interfaces = 127.0.0.1 192.168.1.12
		server string = FreeNAS Server
		workgroup = ACCURATE
		domain master = Yes
		lm announce = Yes
		preferred master = Yes
		nsupdate command = /usr/local/bin/samba-nsupdate -g
		logging = file
		max log size = 51200
		kernel change notify = No
		panic action = /usr/local/libexec/samba/samba-backtrace
		disable spoolss = Yes
		load printers = No
		printcap name = /dev/null
		time server = Yes
		map to guest = Bad User
		obey pam restrictions = Yes
		security = USER
		server role = standalone server
		deadtime = 15
		hostname lookups = Yes
		max open files = 174890
		dns proxy = No
		idmap config *: range = 90000001-100000000
		idmap config * : backend = tdb
		store dos attributes = Yes
		strict locking = No
		directory name cache size = 0
		dos filemode = Yes
		acl allow execute always = Yes
		ea support = Yes
		create mask = 0660
		directory mask = 0770


[accurate]
		comment = Authenticated users only
		path = "/mnt/storage/accurate"
		delete veto files = Yes
		hide dot files = No
		veto files = /Thumbs.db/.DS_Store/._.DS_Store/.apdisk/
		read only = No
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-2d
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot


[backup]
		comment = Better safe than sorry
		path = "/mnt/storage/backup"
		delete veto files = Yes
		veto files = /Thumbs.db/.DS_Store/._.DS_Store/.apdisk/
		read only = No
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-3d
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot


[cloudsim]
		comment = Stairway to heaven
		path = "/mnt/storage/cloudsim"
		veto files = /.snapshot/.windows/.mac/.zfs/
		read only = No
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-2d
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot


[guest]
		comment = Free4all
		path = "/mnt/storage/guest"
		delete veto files = Yes
		veto files = /Thumbs.db/.DS_Store/._.DS_Store/.apdisk/
		create mask = 0777
		directory mask = 0777
		force group = nogroup
		force user = nobody
		guest ok = Yes
		read only = No
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-7d
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot


[homes]
		comment = Home Directories
		path = "/mnt/storage/home/%U"
		browseable = No
		delete veto files = Yes
		veto files = /Thumbs.db/.DS_Store/._.DS_Store/.apdisk/
		read only = No
		valid users = %U
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-3y
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot


[sync]
		comment = Our Group Share
		path = "/mnt/storage/sync"
		delete veto files = Yes
		hide dot files = No
		veto files = /Thumbs.db/.DS_Store/._.DS_Store/.apdisk/
		force group = wheel
		force user = btsync_jail
		read only = No
		vfs objects = shadow_copy2 zfs_space zfsacl streams_xattr
		zfsacl:acesort = dontcare
		nfs4:chown = true
		nfs4:acedup = merge
		nfs4:mode = special
		shadow:snapdirseverywhere = yes
		shadow:format = auto-%Y%m%d.%H%M-7d
		shadow:localtime = yes
		shadow:sort = desc
		shadow:snapdir = .zfs/snapshot
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Use top to find a pid that's spinning at 100% CPU, then increase its log level via smbcontrol <pid> debug 7 for a few seconds, then drop it back down with smbcontrol <pid> debug 1 and upload /var/log/samba4/log.smbd.
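As a concrete sketch (dry run by default: RUN="echo" only prints the commands, set RUN="" on the NAS to execute them; 59596 stands in for whatever PID top shows):

```shell
#!/bin/sh
# Dry-run sketch of the debug-level cycle. RUN="echo" prints the
# commands instead of running them; the PID is a placeholder.
RUN="${RUN:-echo}"
PID="${PID:-59596}"

$RUN smbcontrol "$PID" debug 7   # raise log level for this one process
$RUN sleep 10                    # let it log for a few seconds
$RUN smbcontrol "$PID" debug 1   # drop back down to the normal level
# then upload /var/log/samba4/log.smbd
```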
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
I restarted samba on Friday and have no new 100% processes yet - will check regularly and let you know.

Maybe of interest: The smbd processes did not react (within about two or three minutes of waiting) to a kill -HUP or kill -TERM. I only got them to die with a kill -KILL.

So if you already have an idea what I can do next when smbcontrol doesn't work out, that might come in handy soon :)
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I restarted samba on Friday and have no new 100% processes yet - will check regularly and let you know.

Maybe of interest: The smbd processes did not react (within about two or three minutes of waiting) to a kill -HUP or kill -TERM. I only got them to die with a kill -KILL.

So if you already have an idea what I can do next when smbcontrol doesn't work out, that might come in handy soon :)

The goal isn't to kill the smbd process. We want to know what's going on in it. If it doesn't respond to smbcontrol, then get the following:
1) pcap of traffic
tcpdump -i YOUR_NIC -p -s 0 -w FILENAME port 445 and host IP_ADDRESS_OF_THE_CLIENT

2) Sample truss output for the pid.
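Concretely, something like the following (a dry-run sketch: CAP="echo" only prints the commands, set CAP="" on the NAS to execute them; interface, client IP, output filename, and PID are placeholders):

```shell
#!/bin/sh
# Dry-run sketch of the two captures. NIC, CLIENT, and PID are
# placeholders for the real interface, client address, and smbd PID.
CAP="${CAP:-echo}"
NIC="${NIC:-bge0}"
CLIENT="${CLIENT:-192.168.1.217}"
PID="${PID:-59596}"

# 1) packet capture of SMB traffic to/from the affected client
$CAP tcpdump -i "$NIC" -p -s 0 -w smbd_hang.pcap port 445 and host "$CLIENT"

# 2) syscall trace of the spinning process; empty output here means it
#    makes no syscalls at all, i.e. it spins in userland
$CAP truss -p "$PID"
```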
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
The goal isn't to kill the smbd process. We want to know what's going on in it.

Yes you're right of course. I wanted to give a reason why I believe smbcontrol might not work :)

In here I try smbcontrol:

I have one smbd process using 100% cpu currently, its PID is 59596.

It seems running smbcontrol 59596 debug 7 did not have any effect. Here are the last couple of lines of /var/log/samba4/log.smbd - all of it from after I issued the smbcontrol debug 7 command.

Code:
[2018/07/30 17:59:23.301178,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [ss] -> [ss] -> [ss] succeeded
[2018/07/30 18:00:23.280075,  2] ../source3/smbd/process.c:2878(deadtime_fn)
  Closing idle connection
[2018/07/30 18:00:23.444188,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[accurate]"
[2018/07/30 18:00:23.445223,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[backup]"
[2018/07/30 18:00:23.446196,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[cloudsim]"
[2018/07/30 18:00:23.447126,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[guest]"
[2018/07/30 18:00:23.448297,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[homes]"
[2018/07/30 18:00:23.449328,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[sync]"
[2018/07/30 18:00:23.458654,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [ss] -> [ss] -> [ss] succeeded
[2018/07/30 18:00:37.036461,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[accurate]"
[2018/07/30 18:00:37.037522,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[backup]"
[2018/07/30 18:00:37.038493,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[cloudsim]"
[2018/07/30 18:00:37.039425,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[guest]"
[2018/07/30 18:00:37.040596,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[homes]"
[2018/07/30 18:00:37.041640,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[sync]"
[2018/07/30 18:00:37.044191,  2] ../source3/auth/auth.c:332(auth_check_ntlm_password)
  check_ntlm_password:  Authentication for user [Sophie] -> [Sophie] FAILED with error NT_STATUS_NO_SUCH_USER, authoritative=1
[2018/07/30 18:00:37.044347,  2] ../auth/auth_log.c:760(log_authentication_event_human_readable)
  Auth: [SMB2,(null)] user [Dell-T1700-2]\[Sophie] at [Mon, 30 Jul 2018 18:00:37.044297 CEST] with [NTLMv2] status [NT_STATUS_NO_SUCH_USER] workstation [DELL-T1700-2] remote host [ipv4:192.168.1.154:64037] mapped to [Dell-T1700-2]\[Sophie]. local host [ipv4:192.168.1.12:445]
[2018/07/30 18:00:37.047976,  2] ../source3/smbd/service.c:338(create_connection_session_info)
  guest user (from session setup) not permitted to access this share (accurate)
[2018/07/30 18:00:37.048066,  1] ../source3/smbd/service.c:521(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2018/07/30 18:00:37.048881,  2] ../source3/smbd/service.c:338(create_connection_session_info)
  guest user (from session setup) not permitted to access this share (accurate)
[2018/07/30 18:00:37.048957,  1] ../source3/smbd/service.c:521(make_connection_snum)
  create_connection_session_info failed: NT_STATUS_ACCESS_DENIED
[2018/07/30 18:01:23.458137,  2] ../source3/smbd/process.c:2878(deadtime_fn)
  Closing idle connection
[2018/07/30 18:01:23.541661,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[accurate]"
[2018/07/30 18:01:23.542697,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[backup]"
[2018/07/30 18:01:23.543705,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[cloudsim]"
[2018/07/30 18:01:23.544666,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[guest]"
[2018/07/30 18:01:23.545842,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[homes]"
[2018/07/30 18:01:23.546895,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[sync]"
[2018/07/30 18:01:23.556097,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [ss] -> [ss] -> [ss] succeeded
[2018/07/30 18:01:29.622199,  2] ../source3/smbd/server.c:803(remove_child_pid)
  Could not find child 80116 -- ignoring
[2018/07/30 18:02:23.535067,  2] ../source3/smbd/process.c:2878(deadtime_fn)
  Closing idle connection
[2018/07/30 18:02:23.742423,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[accurate]"
[2018/07/30 18:02:23.743474,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[backup]"
[2018/07/30 18:02:23.744445,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[cloudsim]"
[2018/07/30 18:02:23.745408,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[guest]"
[2018/07/30 18:02:23.746580,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[homes]"
[2018/07/30 18:02:23.747615,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[sync]"
[2018/07/30 18:02:23.756905,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [ss] -> [ss] -> [ss] succeeded
[2018/07/30 18:03:23.736067,  2] ../source3/smbd/process.c:2878(deadtime_fn)
  Closing idle connection
[2018/07/30 18:03:24.548028,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[accurate]"
[2018/07/30 18:03:24.549084,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[backup]"
[2018/07/30 18:03:24.550053,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[cloudsim]"
[2018/07/30 18:03:24.550977,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[guest]"
[2018/07/30 18:03:24.552151,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[homes]"
[2018/07/30 18:03:24.553184,  2] ../source3/param/loadparm.c:2787(lp_do_section)
  Processing section "[sync]"
[2018/07/30 18:03:24.562504,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [ss] -> [ss] -> [ss] succeeded


Sophie really has no valid user account currently. She started working here for the summer today. Sophie and Sophia (ss) are two distinct persons.

ss is currently having trouble with the NAS (her computer stalls, she is not able to connect, etc.), and smbstatus shows her UID on the process in question:

smbstatus | grep ss
Code:
59596   ss		   accurate	 SS-X230.office.accu-rate (ipv4:192.168.1.217:64203) SMB3_11		   -
		partial(AES-128-CMAC)
74554   ss		   accurate	 SS-X230.office.accu-rate (ipv4:192.168.1.217:49736) SMB3_11		   -
		partial(AES-128-CMAC)
ss		   74554   SS-X230.office.accu-rate Mon Jul 30 17:11:10 2018 CEST	-			-
ss		   59596   SS-X230.office.accu-rate Mon Jul 30 14:59:12 2018 CEST	-			-
Pid		  Uid		DenyMode   Access	  R/W		Oplock		   SharePath   Name   Time
60253		1007	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte/redacted/1-requirements-engineering/Grundrisse   Mon Jul 30 17:56:03 2018
60253		1007	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte/redacted/1-requirements-engineering/Grundrisse   Mon Jul 30 17:56:03 2018
60253		1007	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte/redacted/3-simulation/3-2-szenario01-besucherfluss   Mon Jul 30 17:56:03 2018
60253		1007	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte/redacted/3-simulation/3-2-szenario01-besucherfluss   Mon Jul 30 17:56:03 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/Pressemitteilungen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/Pressemitteilungen   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   .   Mon Jul 30 14:59:15 2018
74554		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   .   Mon Jul 30 17:11:09 2018
74554		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   .   Mon Jul 30 17:11:09 2018
59596		817		DENY_WRITE 0x13019f	RDWR	   LEASE(RWH)	   /mnt/storage/sync   20 Sales & Marketing/messen/~$messe-output.xlsx   Mon Jul 30 14:58:33 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   Literatur   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   Literatur   Mon Jul 30 14:59:11 2018
59596		817		DENY_WRITE 0x12019f	RDWR	   LEASE(RWH)	   /mnt/storage/sync   20 Sales & Marketing/messen/messe-output.xlsx   Mon Jul 30 14:58:33 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   11 Business Development   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   11 Business Development   Mon Jul 30 14:59:11 2018


This is what else that process is doing according to smbstatus:

smbstatus | grep 59596
Code:
59596   ss		   accurate	 SS-X230.office.accu-rate (ipv4:192.168.1.217:64203) SMB3_11		   -
		partial(AES-128-CMAC)
sync		 59596   SS-X230.office.accu-rate Mon Jul 30 14:57:29 2018 CEST	-			-
IPC$		 59596   SS-X230.office.accu-rate Mon Jul 30 15:05:30 2018 CEST	-			-
accurate	 59596   SS-X230.office.accu-rate Mon Jul 30 14:59:11 2018 CEST	-			-
ss		   59596   SS-X230.office.accu-rate Mon Jul 30 14:59:12 2018 CEST	-			-
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development/workshops-konferenzen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development/workshops-konferenzen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/Pressemitteilungen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/Pressemitteilungen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development/Softwareentwicklung/produkt/schulung   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development/Softwareentwicklung/produkt/schulung   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   .   Mon Jul 30 14:59:15 2018
59596		817		DENY_WRITE 0x120089	RDONLY	 LEASE(RWH)	   /mnt/storage/sync   50 Finance & Legal/finanzen/redacted/redacted.xlsx   Mon Jul 30 15:05:30 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   10 Research & Development   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing
 Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing
 Mon Jul 30 14:59:11 2018
59596		817		DENY_WRITE 0x13019f	RDWR	   LEASE(RWH)	   /mnt/storage/sync   20 Sales & Marketing/messen/~$messe-output.xlsx   Mon Jul 30 14:58:33 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/10 Vorlagen   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/10 Vorlagen   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   Literatur   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/home/ss   Literatur   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   40 Operations & Organisation   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   40 Operations & Organisation   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte/1-completed projects/redacted/1-requirements-engineering/1-2-ausgang/e-mails   Mon Jul 30 15:05:31 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/50 Strategie   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/50 Strategie   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen   Mon Jul 30 15:05:28 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen   Mon Jul 30 15:05:28 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/redacted   Mon Jul 30 15:05:30 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/redacted   Mon Jul 30 15:05:30 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/2018/redacted/03-angebot/vertrag   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/2018/redacted/03-angebot/vertrag   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/redacted/#archiv   Mon Jul 30 15:05:30 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/redacted/#archiv   Mon Jul 30 15:05:30 2018
59596		817		DENY_WRITE 0x12019f	RDWR	   LEASE(RWH)	   /mnt/storage/sync   20 Sales & Marketing/messen/messe-output.xlsx   Mon Jul 30 14:58:33 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   11 Business Development   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   11 Business Development   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/redacted   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/redacted   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/controlling   Mon Jul 30 15:05:28 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzen/controlling   Mon Jul 30 15:05:28 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzierungen/redacted   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   50 Finance & Legal/finanzierungen/redacted   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   .   Mon Jul 30 15:04:12 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/accurate   sync-archiv/projekte   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/infopaket   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   20 Sales & Marketing/infopaket   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/2018/redacted/03-angebot/angebotsteile-vertrag   Mon Jul 30 14:59:11 2018
59596		817		DENY_NONE  0x100081	RDONLY	 NONE			 /mnt/storage/sync   akquise/2018/redacted/03-angebot/angebotsteile-vertrag   Mon Jul 30 14:59:11 2018
59596		1012	   DENY_NONE  0x80		RDONLY	 NONE			 /mnt/storage/accurate   .   Mon Jul 30 15:00:49 2018


That's that. Next up is truss and tcpdump.
 
Last edited:

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
truss

Running truss -p 59596 for about 10 minutes while trying to use the NAS from Sophia's machine yields nothing, just like I had written in my original post. Running the same command with the PID of the smbd process my own workstation is connected to yields lots of output as soon as I interact with the NAS using Windows Explorer.

tcpdump

Running tcpdump -i bge0 -p -s 0 -w 2018-07-30_tcpdump_smbd.pcap port 445 and host 192.168.1.217 for a few minutes gave me the pcap file I attached to this message. I had a quick look with wireshark. To me, this looks like a couple of connection attempts that time out.

On the user side

I attached a screenshot of the error message that Sophia gets when she wants to work on the NAS. Looks like a time out to me. (It's German - the last line translates to "The specified network name is no longer available.")

What next? I tried generating a backtrace in gdb - it most often shows that this specific smbd instance is doing something regarding xattrs. To see that, please refer to the attachment section of my original post.
 

Attachments

  • 2018-07-30_tcpdump_smbd.zip (3.2 KB)
  • photo5402571102179404055.jpg (24.2 KB)
Last edited:

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
From that GDB backtrace I kinda guess this has to do with ACLs. And I was just thinking: I couldn't care less about supporting ACLs. We have only one group, every user is in that group, that's it.

I could try turning off ACLs and their storage in ZFS xattrs altogether. I just don't know which things to turn off in which order, or how broken my machine will be afterwards. The machine is in production, after all :cool: ... but... #yolo

On Server Fault, Andrew Schulmann says it's possible to mount a ZFS file system without NFSv4 ACLs (link). I like that better than trying to recursively delete all ACL info automatically. Maybe that's worth a try? Does anyone have experience with this? My googling turned up pretty much nothing :/
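Before changing anything I could at least inspect the current settings read-only - a dry-run sketch (Z="echo" only prints the commands, set Z="" on the NAS; "storage/accurate" is my guess at the dataset name based on the share path /mnt/storage/accurate above):

```shell
#!/bin/sh
# Read-only inspection of ACL-related ZFS settings, dry-run by default.
Z="${Z:-echo}"
DATASET="${DATASET:-storage/accurate}"

# How ACLs and xattrs are currently configured on the dataset:
$Z zfs get aclmode,aclinherit,xattr "$DATASET"

# What one path's ACL actually looks like (FreeBSD getfacl):
$Z getfacl "/mnt/$DATASET"
```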
 
Last edited:

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
Dafuq. Spent way too long an evening in the office trying to upgrade the N54L to 12 GB, or at least 8 GB, with an 8 GB stick from my own workstation. I tried all kinds of configurations, but the Microserver wouldn't boot. Only later did I find out it's not even supposed to take more than 4 GB per slot. Taking that thing apart and putting it back together is really a shitshow, I must say. Sorry for not making any progress here @BigDave @Elliot Dierksen .
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
From that GDB backtrace I kinda guess this has to do with ACLs. And I was just thinking: I couldn't care less about supporting ACLs. We have only one group, every user is in that group, that's it.

I could try turning off ACLs and their storage in ZFS xattrs altogether. I just don't know which things to turn off in which order, or how broken my machine will be afterwards. The machine is in production, after all :cool: ... but... #yolo

On Server Fault, Andrew Schulmann says it's possible to mount a ZFS file system without NFSv4 ACLs (link). I like that better than trying to recursively delete all ACL info automatically. Maybe that's worth a try? Does anyone have experience with this? My googling turned up pretty much nothing :/

Don't try to disable the ACLs. The xattr calls in your backtrace are from vfs_streams_xattr or trans2.c (most likely the latter).

Code:
#!/usr/sbin/dtrace -s
pid$1:::entry
{
	@hist[probefunc, probemod] = count();
}

Paste that into a file, make it executable, then run it with the smbd PID as an argument. Let it run for about 10 seconds once it has attached to the process.
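In other words, roughly this (dry run: RUN="echo" just prints the steps, set RUN="" on the NAS; the script filename is a placeholder for whatever you save it as, and 59596 is the PID from earlier in the thread):

```shell
#!/bin/sh
# Dry-run of the steps above; nothing executes while RUN="echo".
RUN="${RUN:-echo}"
PID="${PID:-59596}"

$RUN chmod +x ./anodos_dtrace.sh
$RUN ./anodos_dtrace.sh "$PID"
# let it run ~10 seconds after "matched N probes" appears, then hit
# Ctrl+C; dtrace prints the aggregated per-function call counts on exit
```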

Also generate a debug and send it to me via PM.

Is the [sync] share also a btsync jail?
 

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
Dafuq. Spent way too long an evening in the office trying to upgrade the N54L to 12 GB, or at least 8 GB, with an 8 GB stick from my own workstation. I tried all kinds of configurations, but the Microserver wouldn't boot. Only later did I find out it's not even supposed to take more than 4 GB per slot. Taking that thing apart and putting it back together is really a crapshow, I must say. Sorry for not making any progress here @BigDave @Elliot Dierksen .
Off-topic: The N54L can be upgraded beyond 8GiB, but only if you use exactly the right stick(s) of DRAMs: http://n40l.wikia.com/wiki/Memory
 

fsfsfs

Dabbler
Joined
Feb 5, 2018
Messages
24
but only if you use exactly the right stick(s) of DRAMs
Thanks @ewhac! If I get everything to run smoothly, I shall order a pair of 8GiB ECC sticks.

Is the [sync] share also a btsync jail?
Yes, that's what generates what little CPU load the machine has under normal circumstances. That could be a lead, couldn't it? Now that I think of it, btsync probably won't care about ACLs? Maybe it's a good idea to disable ACLs only there? As you can see above, I already force users to a single Unix UID on that share.

Let it run for about 10 seconds once it has connected to the process.
I quickly get this output:
Code:
nas% sudo ./anodos_dtrace.sh 59847
Password:
dtrace: script './anodos_dtrace.sh' matched 33163 probes

and then nothing, even after waiting a couple of minutes. When I interrupt with CTRL+C, I get no further output at all.

I am generating the debug now. I guess it contains credentials? We ought to change them afterwards.
 