smbd crashing on heavy use

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
Hi all, I'm a long-time reader of these forums and have found them very useful, but this problem has me stumped and I can't find any useful threads.

Due to some poor boot media choices, some poor design choices in 9.0, and some bad luck, I recently had to rebuild my FreeNAS from scratch on 11.1-U4 (now upgraded to U5). Unfortunately, since the rebuild, smbd crashes every night during some Windows server backups that use a CIFS share on the NAS. I can also replicate the failure by running the backups ad hoc.

I've increased the logging level for Samba, and I can see the backup sessions ticking away for a long while until eventually I get a bunch of access errors just before the service falls over.
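
For reference, this is roughly what I changed to get the extra detail. A minimal sketch, set as an auxiliary parameter on the SMB service; the level itself is just what I settled on (10 is the maximum and gets very noisy):
Code:
# smb.conf auxiliary parameter, added via the FreeNAS SMB service settings
log level = 3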

I have a single pool with a number of datasets. From these I'm sharing via CIFS, NFS and AFP. The NAS is joined to my Active Directory domain, which it uses for CIFS authentication.
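
For what it's worth, the AD join itself looks healthy. These are the quick sanity checks I ran from the shell (a sketch, nothing exhaustive):
Code:
# Verify the machine trust account secret against the domain controller
wbinfo -t
# Confirm domain users are resolvable through winbind
wbinfo -u | head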

/var/log/messages just says:
Code:
Jun 25 04:06:36 newunxnas01 kernel: pid 65194 (smbd), uid 0: exited on signal 6 (core dumped)


Here's a snippet from log.smbd just before the service restarts:
Code:
 [2018/06/25 04:06:35.649166,  3] ../source3/param/loadparm.c:1598(lp_add_ipc)
  adding IPC service
[2018/06/25 04:06:35.649187,  3] ../source3/auth/auth.c:189(auth_check_ntlm_password)
  check_ntlm_password:  Checking password for unmapped user [xxx]\[xxx$]@[xxx] with the new password interface
[2018/06/25 04:06:35.649197,  3] ../source3/auth/auth.c:192(auth_check_ntlm_password)
  check_ntlm_password:  mapped user is: [xxx]\[xxx$]@[xxx]
[2018/06/25 04:06:35.650602,  3] ../source3/auth/auth.c:256(auth_check_ntlm_password)
  auth_check_ntlm_password: winbind authentication for user [xxx$] succeeded
[2018/06/25 04:06:35.654123,  3] ../auth/auth_log.c:760(log_authentication_event_human_readable)
  Auth: [SMB2,(null)] user [xxx]\[xxx$] at [Mon, 25 Jun 2018 04:06:35.654112 CEST] with [NTLMv1] status [NT_STATUS_OK] workstation [xxx] remote host [ipv4:10.13.10.10:49993] became [xxx]\[xxx$] [S-1-5-21-889256152-66508950-1569661102-1192]. local host [ipv4:10.13.10.5:445]
[2018/06/25 04:06:35.654146,  2] ../source3/auth/auth.c:314(auth_check_ntlm_password)
  check_ntlm_password:  authentication for user [xxx$] -> [xxx$] -> [xxx\xxx$] succeeded
[2018/06/25 04:06:35.654664,  3] ../auth/ntlmssp/ntlmssp_sign.c:509(ntlmssp_sign_reset)
  NTLMSSP Sign/Seal - Initialising with flags:
[2018/06/25 04:06:35.654681,  3] ../auth/ntlmssp/ntlmssp_util.c:69(debug_ntlmssp_flags)
  Got NTLMSSP neg_flags=0xe2088215
[2018/06/25 04:06:35.654705,  3] ../auth/ntlmssp/ntlmssp_sign.c:509(ntlmssp_sign_reset)
  NTLMSSP Sign/Seal - Initialising with flags:
[2018/06/25 04:06:35.654714,  3] ../auth/ntlmssp/ntlmssp_util.c:69(debug_ntlmssp_flags)
  Got NTLMSSP neg_flags=0xe2088215
[2018/06/25 04:06:35.655203,  3] ../source3/smbd/password.c:144(register_homes_share)
  Adding homes service for user 'xxx\xxx$' using home directory: '/home/xxx/xxx_'
[2018/06/25 04:06:35.657628,  3] ../lib/util/access.c:361(allow_access)
  Allowed connection from xxx (10.13.10.10)
[2018/06/25 04:06:35.657684,  3] ../source3/smbd/service.c:595(make_connection_snum)
  Connect path is '/mnt/cosas/Backups/Backups' for service [Backups]
[2018/06/25 04:06:35.657710,  3] ../source3/smbd/vfs.c:113(vfs_init_default)
  Initialising default vfs hooks
[2018/06/25 04:06:35.657720,  3] ../source3/smbd/vfs.c:139(vfs_init_custom)
  Initialising custom vfs hooks from [/[Default VFS]/]
[2018/06/25 04:06:35.657732,  3] ../source3/smbd/vfs.c:139(vfs_init_custom)
  Initialising custom vfs hooks from [streams_xattr]
[2018/06/25 04:06:35.657741,  3] ../source3/smbd/vfs.c:139(vfs_init_custom)
  Initialising custom vfs hooks from [zfsacl]
[2018/06/25 04:06:35.657751,  3] ../source3/smbd/vfs.c:139(vfs_init_custom)
  Initialising custom vfs hooks from [zfs_space]
[2018/06/25 04:06:35.657760,  3] ../source3/smbd/vfs.c:139(vfs_init_custom)
  Initialising custom vfs hooks from [shadow_copy2]
[2018/06/25 04:06:35.657924,  2] ../source3/smbd/service.c:841(make_connection_snum)
  xxx (ipv4:10.13.10.10:49993) connect to service Backups initially as user xxx\xxx$ (uid=21192, gid=20516) (pid 97507)
[2018/06/25 04:06:35.658657,  1] ../source3/smbd/vfs.c:926(vfs_GetWd)
  vfs_GetWd: couldn't stat "." error Permission denied (NFS problem ?)
[2018/06/25 04:06:35.658686,  3] ../source3/smbd/filename.c:1382(get_real_filename_full_scan)
  scan dir didn't open dir [.]
[2018/06/25 04:06:35.658701,  3] ../source3/smbd/smb2_server.c:3115(smbd_smb2_request_error_ex)
  smbd_smb2_request_error_ex: smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_ACCESS_DENIED] || at ../source3/smbd/smb2_create.c:293
[2018/06/25 04:06:36.406792,  3] ../source3/smbd/server.c:868(remove_child_pid)
  ../source3/smbd/server.c:867 Unclean shutdown of pid 65194
[2018/06/25 04:06:36.433500,  1] ../source3/smbd/server.c:877(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
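
If a backtrace from the core would help, I can try to pull one, roughly like this. A sketch only: the core file location is a guess (FreeBSD's default kern.corefile pattern is %N.core in the process's working directory), and it assumes gdb is available on the box:
Code:
# Locate the dumped core, then open it against the smbd binary
find / -name 'smbd.core' 2>/dev/null
gdb /usr/local/sbin/smbd /path/to/smbd.core
(gdb) bt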


Please let me know if there's anything else I can add to give a better picture of the problem.

Any help or advice will be much appreciated!
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
Apologies, it’s:
  • Dell Optiplex 9020
  • Core i7-4790
  • 16 GB RAM
  • onboard Intel NIC
  • onboard SATA controller
  • PCIe Syba SI-PEX40064 SATA controller
  • 3 x 8TB Seagate IronWolf HDDs
  • 3 x 4TB Seagate NAS HDDs
  • 1 x 100GB SSD
(edited to reflect simplified hardware setup for troubleshooting)
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
What is the client? Windows Server or a desktop?
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
The clients are one Windows 10 machine and two Windows Server 2012 R2 machines, although I can replicate it by running the backup on any of them independently.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
Indeed, the current state of the system is temporary; it was set up this way to migrate from the old pool to the new one. I can remove the 3 x 4TB drives and test again, although I'd assume a low RAM-to-storage ratio would cause poor performance, not system instability. But I will test it.

FWIW, the pool I am using is entirely on the onboard SATA.

I’ll also add that this system was very stable using the same hardware (except the HDDs) on my old 9.0 build.

There’s a couple of things I would be suspect of in your build RE: system stability under any kind of usage.
I’d appreciate it if you could elaborate so I can address them.
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
I've tried to simplify conditions and replicate again:
  1. Have torn out the 3 x Seagate NAS HDDs
  2. Have removed the Syba SATA card
  3. Have replicated the problem with just a single backup from a Windows 10 machine
This does not look like a hardware issue to me, so I've raised Bug #36098.
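
For completeness, the pool still reports healthy after pulling the hardware out. This is the check I ran (pool name taken from the share path in the logs above):
Code:
# Verbose pool health check; expect state ONLINE and zero errors
zpool status -v cosas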

In the meantime, any further ideas and suggestions are welcome :)
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
Sometimes newer software and hardware do not get along. Check your BIOS and make sure you are not in RAID mode on the SATA controller. Also try disabling any hardware (like sound cards and other things) that you are not going to be using. I bet it is a hardware issue, more along the lines of a compatibility problem...
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
Check your BIOS and make sure you are not in RAID mode on the SATA controller. Also try disabling any hardware (like sound cards and other things) that you are not going to be using. I bet it is a hardware issue, more along the lines of a compatibility problem...

SATA Mode = already AHCI
Sound card = now disabled
Other things = nothing else to disable

I've also just upgraded my BIOS to the latest version as it was a few years old.
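
For the record, FreeBSD itself agrees the controller is in AHCI mode. Roughly how I double-checked from the shell:
Code:
# The controller should attach as ahci0/ahcichX rather than atapciX
dmesg | grep -i ahci
# All the pool disks should be listed behind the AHCI channels
camcontrol devlist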

Result.... same behaviour :(

@hescominsoon I know you want to blame my (lame) hardware, but I really think I'm hitting a bug.

This Optiplex was the largest system I could afford...
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
This is most likely a compatibility issue, so it's mainly your hardware not liking the software. For the server clients, try setting up iSCSI and see if that works; it's fairly simple, and I can post a link after I get home if you want to find out how to do it on both ends. For the desktop clients you might be able to use iSCSI there as well; I'm just not familiar with client-side file transfers other than SMB.
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
This is most likely a compatibility issue, so it's mainly your hardware not liking the software.

I’m not sure what’s led you to this conclusion. So far, between what I have seen in the smbd logs regarding the access errors and the tests I have done, I’m not seeing any evidence to support your hypothesis.

W/r/t iSCSI, although it may work, it also doesn’t actually fix the problem of flaky SMB under heavy load.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
1 gigabit is not a heavy load under any circumstance for FreeNAS.
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
I’m not sure what’s led you to this conclusion. So far, between what I have seen in the smbd logs regarding the access errors and the tests I have done, I’m not seeing any evidence to support your hypothesis.

W/r/t iSCSI, although it may work, it also doesn’t actually fix the problem of flaky SMB under heavy load.
SMB is a Microsoft-specific technology... and if I can avoid using the Unix version, I do. iSCSI is an industry standard, implemented easily and well by both Unix and MS. If you want to try to work around your "heavy load" issue, try using something that isn't proprietary to Microsoft. BTW, I use 11.x U5 right now with a 10 GbE fiber connection between my server and my FreeNAS machine. I used it with SMB and it was perfectly stable, but not as fast, due to it being single-threaded (bursts to 2.5 gigabits/sec); plus it isn't really native to Unix. iSCSI is rock solid, and now my chokepoint is the source server's drives themselves (I now average 2.5 gigabits/sec with bursts to 4 from the SSDs). SMB is fine, but for truly heavy loads, if you want REAL reliability, use a block protocol that is native to both Unix and Windows.
 

marvin

Dabbler
Joined
Jan 7, 2014
Messages
13
Well, that settles it. Your dad-NAS would definitely beat my NAS in a fight :p
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
I am assuming this is sarcasm, as this was not a my-NAS-is-bigger-than-yours type of post... :)
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
Just as an FYI to anyone else reading this thread: the debug dump was made available in Bug #36098.
It seems that bug has since been restricted to private access... which makes sense, as the debug information could reveal everything about your setup.
 