smb/cifs suddenly not accessible

Status
Not open for further replies.

Stuart Quimby

Dabbler · Joined: Mar 28, 2014 · Messages: 13
I have the following server on a medium-size network, regularly accessed by about 50 users:

Build FreeNAS-9.3-STABLE-201502162250
Platform Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
Memory 8156MB

Over the past two days, smb/cifs has simply stopped responding. Clients attempting to connect get a variety of responses (depending on the client OS) that amount to 'timeout on server'. Clients access the server by its fixed IP address. Other client-to-client drive mappings (Windows 'mapped network drives') on the same network that don't involve the FreeNAS server still work. The server is not part of an Active Directory domain. User rights don't seem to be the issue, as even volumes that allow guest access don't respond. It almost seems as if the server is firewalled, but I've checked all the firewall (pfSense) rules and there isn't anything that could be affecting this server or the ports it uses.

An nmap scan of the server shows that the CIFS port (445) is open:
Code:
08:57:29 > Detected service: 445 (microsoft-ds)
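
An open port in an nmap scan only shows that the TCP handshake succeeded when the scan ran. A quick way to repeat that check from an affected client, and to tell a refused connection apart from a silent timeout, is a plain socket connect with a deadline. The Python sketch below is illustrative only: the function name `probe_tcp` and the 5-second default timeout are assumptions, and an "open" result does not show that smbd will actually answer an SMB request behind the port.

```python
import socket


def probe_tcp(host: str, port: int = 445, timeout: float = 5.0) -> str:
    """Classify a TCP connect to host:port as 'open', 'refused' or 'timeout'.

    Only the TCP handshake is tested; 'open' does not prove that the
    service behind the port answers protocol requests.
    """
    try:
        # create_connection() completes the three-way handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer within the deadline: packets dropped or filtered en route.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return f"error: {exc}"
```

For example, `probe_tcp("10.0.1.106")` (the server address mentioned later in this thread) should return "open" when the handshake completes; "timeout" would suggest packets are being dropped somewhere between client and server.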

Nothing was changed in the volume options or the CIFS share options, except that I checked the "Bind IP Addresses:" boxes in the CIFS options to see if I could get a response. I'm including screenshots of my CIFS screens (snapshot1 and snapshot2). I also ran testparm and see no problems (output attached as testparm.txt). In the course of working on this I did a minor upgrade to the latest 9.3-STABLE, which required a reboot. I also tried cycling CIFS off and on.

I've turned on full logging for CIFS, and the only thing I can see that looks like a problem is this:
Code:
freenas winbindd[35449]:   sam_rids_to_names: possible deadlock - trying to lookup SID S-1-5-21-4229639678-1894977342-3747114420


I'm also getting a timeout on an NFS share, but that attempt is through a VPN which may be the issue.

Any clues would be appreciated.
 

Attachments

  • snapshot1.png (32.7 KB)
  • snapshot2.png (37 KB)
  • testparm.txt (7.8 KB)

dlavigne

Guest
Before the upgrade were you on an earlier 9.3 or still in 9.2.1.x? If you search for "sam_rids_to_names: possible deadlock" at bugs.freenas.org, you'll see that there are a couple of related issues. Do any of them seem to apply to your scenario?
 

Stuart Quimby


I was on the 1/29 build of 9.3 (pre-STABLE, I think). The problem developed a couple of weeks after the 1/29 build was installed; the upgrade to 9.3-STABLE was just yesterday. The search on bugs.freenas.org yielded several results, so it will take me a few minutes to work my way through them...
 

Stuart Quimby

So, I'm not seeing a direct connection between that thread and my issue, but I'm not well informed at this level.
For instance, I changed the workgroup name to 'Freenasgrp' and then used Dolphin on a Linux box on that network to browse SMB servers. The new workgroup shows up, but attempting to list the servers in that group yields a timeout. Entering something like smb://10.0.1.106 also yields a timeout. The bug you referenced seems to relate more to user authentication being refused than to the entire CIFS service being unreachable.

I did notice that the SID listed in /var/log/messages matched the SIDs in the output of 'net groupmap list' minus the -[group ID] appended to the end... with the exception of the Windows-style user/group/share that I created from scratch earlier today as a test.

I also created a new NFS share and attempted to mount and access it from the Linux box, and that protocol is timing out as well. I'll have someone on-site in the next few minutes and we'll test AFP. SSH file access ('fish://' in Dolphin) works fine.
 

Stuart Quimby

OK, there may be some connection to the bug you listed, but I don't have a clue what to do about it.
The SID listed when I run 'net getlocalsid' is:
S-1-5-21-1424582618-3971464718-817107718
compared to:
S-1-5-21-4229639678-1894977342-3747114420
which is listed in the /var/log/messages file on the lines with the 'sam_rids_to_names' issue.
I'm gathering that these should be identical and they're not.
Any suggestions on how to fix this?
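
The comparison above can be automated. A minimal Python sketch, assuming log lines shaped like the winbindd message quoted in the first post and a machine SID of the S-1-5-21-a-b-c form printed by `net getlocalsid`: it collects every such SID in the log and returns the ones whose domain part differs from the local SID, ignoring any trailing RID. The function name `foreign_domain_sids` is hypothetical and not part of Samba or FreeNAS. Because the local SID string is searched rather than matched exactly, surrounding text from the command's output is tolerated.

```python
import re

# Matches S-1-5-21-<a>-<b>-<c>, optionally followed by a -<RID>.
SID_RE = re.compile(r"\bS-1-5-21-(\d+)-(\d+)-(\d+)(?:-(\d+))?\b")


def foreign_domain_sids(log_lines, local_sid):
    """Return domain SIDs found in log_lines whose domain part differs from local_sid.

    local_sid is the machine SID (as printed by `net getlocalsid`); any RID
    appended to a logged SID is ignored when comparing.
    """
    m = SID_RE.search(local_sid)
    if m is None:
        raise ValueError(f"not a domain SID: {local_sid!r}")
    local_prefix = m.group(1, 2, 3)
    foreign = set()
    for line in log_lines:
        for sid in SID_RE.finditer(line):
            if sid.group(1, 2, 3) != local_prefix:
                foreign.add("S-1-5-21-" + "-".join(sid.group(1, 2, 3)))
    return sorted(foreign)
```

Fed the deadlock line from /var/log/messages and the local SID above, it would flag S-1-5-21-4229639678-1894977342-3747114420 as belonging to a different domain.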
 

Stuart Quimby

Progress report...
I followed through on https://bugs.freenas.org/issues/5828 concerning the 'sam_rids_to_names: possible deadlock' message, found the 'fixsid.py' script and (after cleaning up my users/groups to get rid of members with conflicting SIDs) successfully ran it and rebooted.

Unfortunately, even though a tail of my /var/log/messages file shows no new instances of the deadlock, CIFS services are still invisible to the rest of the network.

Any new ideas would be appreciated. If there's a way to escalate this to professional support, we'd be happy to pay, as several people in the organization haven't been able to do their work for more than 2 days...

Thanks
 

Stuart Quimby

Attached is the Windows 'Map network drive' error that is generated now. The behavior has changed slightly: it no longer just spins its wheels and then times out...
 

Attachments

  • snapshot6.png (24.5 KB)

cyberjock

Inactive Account · Joined: Mar 25, 2012 · Messages: 19,526
You have a failing disk bro...

Feb 19 00:50:08 freenas smartd[2644]: Device: /dev/ada0, 189 Currently unreadable (pending) sectors
Feb 19 00:50:08 freenas smartd[2644]: Device: /dev/ada0, 189 Offline uncorrectable sectors
Feb 19 00:50:08 freenas smartd[2644]: Device: /dev/ada0, previous self-test completed with error (read test element)
Feb 19 00:50:08 freenas smartd[2644]: Device: /dev/ada0, new Self-Test Log error at hour timestamp 26322
Feb 19 01:20:09 freenas smartd[2644]: Device: /dev/ada0, 189 Currently unreadable (pending) sectors
Feb 19 01:20:09 freenas smartd[2644]: Device: /dev/ada0, 189 Offline uncorrectable sectors

What's happened is that disk has dragged your pool's performance down so far that Samba requests are timing out before they can be processed. So yeah... offline that disk (you don't even have to start resilvering a new disk) and you'll probably see the server instantly "just start working properly". You may or may not have to physically remove the disk from the system, but failing disks can bring (and have brought) pools to their knees.

Seen the problem before. I was gonna search your /var/log/messages for "smartd", but it's spammed all over the log file.
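
Because smartd's warnings end up scattered through /var/log/messages, a small filter makes them easier to spot. A minimal Python sketch, assuming the message format shown in the excerpt above ("Device: /dev/adaN, <count> Currently unreadable (pending) sectors" and "... Offline uncorrectable sectors"); other smartd messages are ignored, and the function name `bad_sector_report` is hypothetical. It keeps the most recent count reported for each device.

```python
import re
from collections import defaultdict

# Matches smartd lines such as:
#   ... smartd[2644]: Device: /dev/ada0, 189 Currently unreadable (pending) sectors
#   ... smartd[2644]: Device: /dev/ada0, 189 Offline uncorrectable sectors
SMARTD_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), (?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def bad_sector_report(log_lines):
    """Map each device to its latest pending / offline-uncorrectable sector counts."""
    report = defaultdict(dict)
    for line in log_lines:
        m = SMARTD_RE.search(line)
        if m is None:
            continue  # not one of the two sector-count messages
        key = "pending" if m.group("kind").startswith("Currently") else "uncorrectable"
        # Later log lines overwrite earlier ones, so the newest count wins.
        report[m.group("dev")][key] = int(m.group("count"))
    return dict(report)
```

On the server it could be run over the live log, e.g. `with open("/var/log/messages") as f: print(bad_sector_report(f))`.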
 

Stuart Quimby

Well, thanks, cyberjock. I saw the unreadable-sectors messages in the logs but figured that the FreeNAS GUI would have popped an alert if it was as bad as you say. I'm rsync'ing a full backup before I take that disk offline (it's one of a 3-disk RAIDZ), because otherwise it might be a bit too risky. After that, I'll pull the disk and replace it. Thanks again for the advice, along with a healthy side dish of attitude ;-)
 

cyberjock

Gotta keep it fun in here. ;)

You should set up email monitoring for your server. If you had, you'd have been spammed to death with SMART errors and you'd have known about the failing disk without needing hookers and blow to tell you so. ;)
 

Stuart Quimby

So I offlined the defective drive (leaving the pool running degraded on the remaining 2 drives), and I still have no CIFS access. I'm waiting for the replacement drive to show up, but I really need to get Samba up and running again. The last pertinent /var/log/messages entry after a restart of CIFS is:
Code:
Feb 22 13:40:09 freenas winbindd[6390]:   STATUS=daemon 'winbindd' finished starting up and ready to serve connections[ 6389]: list trusted domains

which would lead me to believe that all is well. Clients are simply timing out: no auth prompt, just spinning wheels, even when using \\IP\share to map...
SSH access to files works fine, but most of the clients on this network are Windows machines with mapped network drives.

Any other ideas of where to look?
 