SMB transfer fails across subnets

Chem

Cadet
Joined
Feb 2, 2018
Messages
2
Hello all,
I'm not one to post on forums much, although I read plenty of them. I'm out of avenues to test and have made little progress. FreeNAS has been working great: I can get NFS shares and iSCSI mounts to work fine. SMB is the problem, however, and my limited experience with FreeNAS and FreeBSD has led me to a wall.

FreeNAS file transfers on the same subnet work perfectly. When I try to transfer data to a FreeNAS SMB share across subnets, I get error 0x8007003B. It doesn't matter whether I transfer to or from the share; it fails either way. I've tried adding firewall rules at the top of the ruleset (pfSense) to completely open up every port and every type of traffic, to no avail. Windows-to-Windows transfers work fine across subnets, by the way. I tried removing any permissions to see if they were the issue; they don't seem to be. I tried enabling NTLMv1 with no luck. I also tried forcing SMBv1, but that fails as well. Windows 7 clients and Windows 10 clients both fail.
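For readers landing here from a search on the error code: 0x8007003B is a Windows HRESULT, and a minimal sketch of splitting it into its standard fields is shown below. The function name `decode_hresult` is made up for illustration. The low 16 bits come out as Win32 error 59 (ERROR_UNEXP_NET_ERR, "An unexpected network error occurred"), which only says the connection broke and does not say why.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    return {
        "failure": bool((hresult >> 31) & 1),   # bit 31: severity (1 = failure)
        "facility": (hresult >> 16) & 0x7FF,    # bits 16-26: facility (7 = FACILITY_WIN32)
        "code": hresult & 0xFFFF,               # bits 0-15: the underlying error code
    }


fields = decode_hresult(0x8007003B)
print(fields)
```

Facility 7 means the code is a plain Win32 error wrapped in an HRESULT, so the same failure can also show up elsewhere as "error 59".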

If I try transferring Windows -> FreeNAS over a different subnet, it fails outright. If I try transferring FreeNAS -> Windows, it usually runs at full speed for a couple seconds, then fails. However, I was able to see some action in the logs when I try pulling data from FreeNAS. I attached a bit of the log.smb file that shows some seemingly odd behavior. It appears as though the transfer is chugging along happily, then the Samba service somehow restarts completely. Thinking along these lines I looked at the services before and after a file transfer failure:

Before:
Code:
root@NAS:/var/log/samba4 # top | grep smbd
14493 root 1 52 0 169M 141M select 3 0:00 0.39% smbd
14502 root 1 52 0 128M 100M select 3 0:00 0.39% smbd
14501 root 1 48 0 128M 100M select 1 0:00 0.29% smbd


After:
Code:
root@NAS:/var/log/samba4 # top | grep smbd
14531 root 1 20 0 177M 146M select 1 0:01 0.00% smbd
14493 root 1 20 0 169M 141M select 3 0:00 0.00% smbd
root@NAS:/var/log/samba4 #
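The two listings above can also be compared mechanically rather than by eye. The following is only a rough sketch: the helper name `smbd_pids` is hypothetical, and it assumes the `top` lines have the PID in the first column and the command name in the last, as in the output pasted above.

```python
def smbd_pids(top_output: str) -> set[int]:
    """Collect the PIDs of smbd processes from `top | grep smbd` style output."""
    pids = set()
    for line in top_output.splitlines():
        fields = line.split()
        # Keep only process rows: numeric PID first, command name last.
        if fields and fields[0].isdigit() and fields[-1] == "smbd":
            pids.add(int(fields[0]))
    return pids


BEFORE = """\
14493 root 1 52 0 169M 141M select 3 0:00 0.39% smbd
14502 root 1 52 0 128M 100M select 3 0:00 0.39% smbd
14501 root 1 48 0 128M 100M select 1 0:00 0.29% smbd
"""

AFTER = """\
14531 root 1 20 0 177M 146M select 1 0:01 0.00% smbd
14493 root 1 20 0 169M 141M select 3 0:00 0.00% smbd
"""

before, after = smbd_pids(BEFORE), smbd_pids(AFTER)
print("exited:  ", sorted(before - after))
print("started: ", sorted(after - before))
print("survived:", sorted(before & after))
```

On the data posted here, PID 14493 appears in both listings while 14501 and 14502 are gone and 14531 is new. That would be consistent with individual smbd processes being replaced rather than the whole service restarting, though the listings alone do not prove it.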


Seems odd to me. I'm getting pretty desperate at this point, any help troubleshooting this would be greatly appreciated! Other posts on similar errors either don't have solutions that work for me, or just don't have solutions. I also attached my smb.conf file.

Currently I have twelve 2 TB SAS drives set up as six mirrors.
Motherboard X10SLM-F
Platform Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

Memory 32689MB

Thanks for looking!
 

Attachments

  • SMB.conf.txt
    1.9 KB · Views: 311
  • log.smbd.txt
    13.2 KB · Views: 451

Chem

Cadet
Joined
Feb 2, 2018
Messages
2
Nope. I've poked around a little bit but haven't had a lot of time to work on it. I've had to test some new deployment software for SCCM for an upcoming job at work.

I have a second FreeNAS box (old build) that I hooked up and it has the same issue, so I'm back to suspecting my network. I think my next step will be to try a new router (I'll be able to virtualize it with a really, really basic config), and if that doesn't work, I have a different switch I can try too. Right now I'm using a 24-port ProCurve of some sort, but I have an old TP-Link lying around somewhere. I can also try putting version 9 back on my old FreeNAS box and see if that works.
 

deadhand

Cadet
Joined
Feb 20, 2018
Messages
4
EDIT: This does indeed appear to have been caused by asymmetric routing, the result of a configuration issue with my pfSense failover cluster.

Hi,

I'm getting this same issue as well, though I'm not using FreeNAS for my file server. I am, however, using pfSense for my firewall / routing.
I thought I'd post here since the issue seems near identical to what you've experienced, though I do believe it's pfSense related, and not specifically Samba (though Samba doesn't appear to handle it as gracefully - whatever's going on).

I've been able to reproduce it in two environments, and while Windows <-> Windows (Both 2012 R2) share transfers don't seem to cut out completely with the network error (0x8007003B), they seem to exhibit the same pattern before 'recovering' (and subsequently transferring at a much faster speed, though still not top speed). If I do transfers between Windows <-> Samba share (in my case on ZoL, and two different configs - one with Winbind, the other with a stripped down, simple user / smbpass auth), they frequently cut out with that same error. It seems much more reproducible with larger files (1+ GB) than smaller files, which often go through successfully. Regardless, this doesn't seem to be related to any specific SMB implementation.

I've also noticed that the performance is rather bad, even when it does work.

Here are some Windows <-> Windows transfers (two different transfers, but the pattern is identical across both):
[See 'Screenshot #1' in post #10 at later point in thread] (sudden drops to 0 byte/s, stays like this for a few seconds)
[See 'Screenshot #2' in post #10 at later point in thread] (sudden 'recovery')

Note: I have 10G connections between Windows shares as well as Windows and Samba.

If I disable packet filtering on pfSense, these issues go away. The Samba transfers work at full speed, and the Windows transfers don't experience the initial hiccup / recovery. Simply setting rules to pass everything doesn't help, but disabling packet filtering entirely does. (Obviously unacceptable, but this is a test environment and not my primary firewall.)

Here's another Windows <-> Windows transfer through pfSense, but this time with packet filtering disabled:



My general configuration is a bit complicated. I have pfSense virtualized and in an HA configuration (this is on Hyper-V, unfortunately), but another firewall does NAT. pfSense in my case only does routing / firewalling between a set of subnets. (Shutting down the second pfSense instance doesn't seem to improve anything, so I don't think it's related to HA, and I don't believe there are any asymmetric routing issues in my case, either.) The pfSense logs seem clear of any traffic relating to this.

Mod note: Removed imgur links
 
Joined
Dec 29, 2014
Messages
1,135
There are several threads about this kind of issue. Based on both your descriptions (same subnet = good, across subnets = bad), this is almost certainly an asymmetric routing issue. If FreeNAS has IPs on both networks and the client references an IP that is on the other subnet, that is absolutely what is happening. The client's request goes through the firewall, so the firewall sees one side of the conversation. FreeNAS responds from the directly connected interface on the client's original subnet. The firewall sees only half the conversation (which it hates) and decides to reset the connection. This type of connection will never, ever work unless you turn off the firewall features that make it reset connections where it only sees part of the traffic. That in effect cripples the firewall, so it isn't a good idea. The really short version (speaking as a full-time network person) is that firewalls are a bad choice for inter-VLAN routing in general.
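The multi-homed case described above can be checked on paper with a few lines of Python. This is a minimal sketch, not a diagnostic tool: the function name `reply_bypasses_firewall` and all addresses are hypothetical, and it assumes the client uses the same netmask as the server's interface on that subnet. It does not model other causes of asymmetric routing, such as a misconfigured firewall cluster.

```python
import ipaddress


def reply_bypasses_firewall(client_ip, target_ip, server_interfaces):
    """Return True when the request must be routed but the reply can go out directly."""
    client = ipaddress.ip_address(client_ip)
    target = ipaddress.ip_address(target_ip)
    interfaces = [ipaddress.ip_interface(i) for i in server_interfaces]

    # Find the server interface that owns the address the client connected to.
    target_iface = next((i for i in interfaces if i.ip == target), None)
    if target_iface is None:
        raise ValueError(f"{target_ip} is not one of the server's addresses")

    # The request crosses the router/firewall if the client is outside that subnet.
    request_routed = client not in target_iface.network
    # The reply skips the firewall if the server is directly attached to the client's subnet.
    reply_direct = any(client in i.network for i in interfaces)
    return request_routed and reply_direct


# Hypothetical addressing: server has a leg in both subnets.
nas = ["192.168.1.10/24", "192.168.2.10/24"]
print(reply_bypasses_firewall("192.168.1.50", "192.168.2.10", nas))  # asymmetric path
print(reply_bypasses_firewall("192.168.1.50", "192.168.1.10", nas))  # same-subnet path
```

In the first call the client reaches the server's address on the other subnet, so the request is routed while the reply leaves through the directly connected interface; pointing the client at the address on its own subnet avoids the firewall in both directions.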
 

deadhand

Cadet
Joined
Feb 20, 2018
Messages
4
Hi Elliot,

Thanks for the quick response.

I don't believe I have any asymmetric routing issues (Don't have a given machine on more than one subnet at the moment, with the exception of pfSense), but I'll simplify my test environment as much as possible and re-test. It's certainly possible I've missed something.

I have had an asymmetric routing issue in the past, but it was obvious in the firewall logs at the time. Is it possible such a thing would be silent?

Thanks,

Deadhand
 
Joined
Dec 29, 2014
Messages
1,135
I am not familiar with pfSense, but I would certainly think there would be some kind of message indicating that it had reset/terminated the connection.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Please post any images you wish to include in your posts directly to the forum by copy/pasting them into your posts.
 

deadhand

Cadet
Joined
Feb 20, 2018
Messages
4
It does indeed appear to have been an asymmetric routing issue.
I thought I had ruled out the second instance being a potential cause, but apparently not.
The firewall logs on the second instance were populated by block logs that indicate such an issue.

Anyway, in my case it appears the solution was to check off 'Bypass firewall rules on the same interface' in pfSense.

There may also have been some issues with the pfSync LAG between the two hosts the pfSense instances reside on, as Microsoft likes to assign the virtual management interfaces and the LAG interface the same MAC address as one of the LAG's underlying physical interfaces, and then complain about it in the logs.

I was able to change the MAC addresses to unique ones on one host, but after a couple hours I was unable to change it on the second host. (It just ignores the manual assignments... even after reboots...)

I think I'm done troubleshooting this for today, but if I have more detail in the future I can provide more information for those coming from Google search results.

Ericloewe: EDIT: I see they were removed, so I'll post them in now. I'll also update my first post to state that this indeed was an asymmetric routing issue.

EDIT: I get 'uploads are not available' if I try to directly drop the images into the post.
 

deadhand

Cadet
Joined
Feb 20, 2018
Messages
4
Screenshot upload works in a new post:

Screenshot #1:
0byte_transfer.png


Screenshot #2:
recovery.png
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
This is a problem with SMB and NFS themselves. I'm having the same problem whether it's in the same VLAN or across subnets. You will notice that if you ping the FreeNAS box from a terminal, you still get a response after the transfer has dropped.

It's annoying because I have two VLANs on my FreeNAS box, one for media and the other for CCTV, but it's useless as it doesn't work.

It's not a network issue, that's for sure.

I'm still on 11.1 because anything newer than 11.1 has issues with LAGG / 4 Gb and 8 Gb links.
 