SMB drops connection

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
Hi,

short time lurker, first time poster.

I built a NAS with the latest version of TrueNAS SCALE (TrueNAS-SCALE-22.12.2).
When I connect to the NAS with my cheese grater MacPro, the connection randomly drops. I've read that setting net.inet.tcp.reass.maxqueuelen to 1436 or even 16384 solves the problem.

When I try it in System Settings > Advanced > Sysctl, then I always get the error message "Sysctl 'net.inet.tcp.reass.maxqueuelen' does not exist in kernel."

I'm not a Linux Pro... so any detailed description on how to do it would be highly appreciated.

How do I set that value correctly?

Thanks for any help!

---

My hardware:
Former repurposed 19" rack mount videowall controller
Mainboard: Asus P6T7 WS SuperComputer
CPU: Xeon X5690 @ 3.47 GHz
RAM: 24 GB (non ECC)
HD: 6 x 14TB WD Enterprise drives (RAIDZ2)
Bootdrive: Samsung SSD 850 EVO 256GB
EVGA 650 GT Power Supply

3 Corsair RGB fans (No, I will not apologize for that)
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
You're trying to apply a FreeBSD kernel tunable to Linux. The error is expected, since you're running SCALE, which uses the Linux kernel.
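
To illustrate the point: on Linux, sysctl tunables show up as files under /proc/sys, with the dots in the name turned into directory separators. Here is a minimal Python sketch that checks whether a given name exists on the running kernel. The helper names are made up for illustration and the name-to-path mapping is simplified. `net.ipv4.tcp_rmem` is only an example of a Linux-style name, not a suggested replacement for the FreeBSD tunable.

```python
from pathlib import Path

# Linux exposes sysctl tunables as files under /proc/sys; the dots in the
# sysctl name correspond to directory separators (simplified mapping).
PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name to its file under the given root."""
    return root.joinpath(*name.split("."))


def sysctl_exists(name: str, root: Path = PROC_SYS) -> bool:
    """Return True if the running Linux kernel exposes this sysctl."""
    return sysctl_path(name, root).is_file()


if __name__ == "__main__":
    # net.inet.* is FreeBSD naming; Linux uses net.ipv4.*, net.core.*, etc.
    for name in ("net.inet.tcp.reass.maxqueuelen", "net.ipv4.tcp_rmem"):
        state = "present" if sysctl_exists(name) else "not present"
        print(f"{name}: {state}")
```

On a SCALE box the first name should be reported as not present, which is the same thing the GUI error message is saying.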
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
Ok. That explains that.

I'm trying to copy my files from the old NAS to the new one. I have it all mounted. I grab a folder with about 5000 files in it (190 GB) and drop it onto the new NAS. Sometimes it works, and sometimes it randomly stops after 20 minutes or so with an error message. When this happens, I have to unmount the NAS and mount it again to regain access to it.

What could be the problem?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
_Chris_ said: "Mainboard: Asus P6T7 WS SuperComputer"

It's because your ASUS SuckyComputer sports not just one but TWO Realtek 8111 ethernet controllers, for double the suck.

2 x Gigabit LAN Controller(s), Realtek® 8111C


Strongly suggest you acquire an Intel Desktop CT ethernet PCIe card and use that instead.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
Thank you for the info about the Realtek 8111 controller. I know it's not the best thing in the world. That's why I was already testing other controllers...

Here are the configurations that I tested so far over the last 2 days:

Configuration 1:
ASUS SuckyComputer MB using onboard Realtek 8111 Port connected to Netgear Switch
MacPro connected to the same Netgear Switch
Transfer of a 190GB folder with about 5000 files to SuckyComputer

Software: TrueNAS TrueNAS-SCALE-22.12.2
Result: Transfer crashes randomly with error message before transfer is complete

Software: Windows 10, shared a folder to the network so that I can copy stuff on it from the MacPro
Result after multiple attempts: No crash


Configuration 2:

ASUS SuckyComputer MB with Intel Gigabit CT EXPI9301CTBLK PCIe card connected to Netgear Switch
MacPro connected to the same Netgear Switch
Transfer of a 190GB folder with about 5000 files to SuckyComputer

Software: TrueNAS TrueNAS-SCALE-22.12.2
Result: Transfer crashes randomly with error message before transfer is complete

Software: Windows 10, shared a folder to the network so that I can copy stuff on it from the MacPro
Result after multiple attempts: No crash


Configuration 3:

ASUS SuckyComputer MB with TRENDnet TEG-10GECTX PCIe card connected to Netgear Switch
MacPro connected to the same Netgear Switch
Transfer of a 190GB folder with about 5000 files to SuckyComputer

Software: TrueNAS TrueNAS-SCALE-22.12.2
Result: Transfer crashes randomly with error message before transfer is complete

Software: Windows 10, shared a folder to the network so that I can copy stuff on it from the MacPro
Result after multiple attempts: No crash


So, I see this pattern:
- TrueNAS does not work. The longest transfer I got was about 40 minutes long. Then the SMB share dropped and the file transfer failed.
- Stupid old Windows 10 always worked. Not a single drop. I was able to send that 190GB folder multiple times to Windows.

I did not test TrueNAS Core so far. I've heard that SCALE is new and not necessarily super stable.

Should I try Core??
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
_Chris_ said: "I did not test TrueNAS Core so far. I've heard that SCALE is new and not necessarily super stable. Should I try Core??"

It's worth a try. Why did you start with SCALE anyway? Unless you need the apps or maybe more robust VM support, I really don't see the need for SCALE.

Anyways, even the apps on SCALE are kinda buggy as hell. I run one just for testing purposes and my Logitech Media Server has been stuck in "deploying" for a couple of days now. Seven other apps have updates but always fail when I click upgrade, so they're stuck at whatever version they are now. To be fair, these are TrueCharts, which isn't the official repository, but honestly, I find setting up whatever apps I need in CORE jails much more reliable.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Maybe your issue is related to your MacPro being capable of grinding only North American cheese?
You should try putting it up against Western European, world-famous, highest-quality raw milk cheese.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
Whattteva said: "Why did you start with SCALE anyway?"

SCALE is newer and I thought that with it being at Version 22 already it's a stable and reliable piece of software. I was obviously wrong. SCALE is not reliable and very unstable and should have never been released.

I did another SCALE test overnight where I was attempting to copy a single 100GB folder with big files (movies) to the NAS using the Intel NIC I have. Result: 4 attempts, all failed within 10-15 minutes each. SCALE crashed every time. I noticed that the GUI also resets to the login screen exactly when the transfer fails. This would not happen if the NIC was the problem. So I am very sure that this is a software problem with TrueNAS SCALE.

I then plugged in a HD with a Windows 10 installation, shared a folder and started copying away, which never crashed.

So, my ASUS SuperComputer (yes, I also think that the name is ridiculous) motherboard with (according to jgreco's opinion) "horrible" Realtek NICs isn't too bad after all. This board was running as a server in a datacenter for several years and was only replaced because the customer wanted something newer and faster... and not because the MB was failing.

I will give CORE a shot today.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
_Chris_ said: "Version 22"

That's the year. 2022.

_Chris_ said: "SCALE crashed every time. I noticed that the GUI also resets to the login screen exactly when the transfer fails. This would not happen if the NIC was the problem. So I am very sure that this is a software problem with TrueNAS SCALE."

As much as I dislike Linux, that seems unlikely. What you describe isn't a crash. Are you configuring your NAS with a static IP address, or are you using something like DHCP? A DHCP lease reissue can cause this sort of thing.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
I configured a static IP, set up in pfSense on a Protectli Vault. No IP conflict possible.

SCALE somehow drops the connection and then reconnects randomly. I was already attempting to time it to see if it happens in a certain interval. But it's totally random. The GUI then defaults to the login screen and any transfer stops.
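
For anyone who wants to time the drops more systematically, here is a rough Python sketch that polls the SMB port (TCP 445) and records when it stops answering. It rests on assumptions: the function names are invented for this example, and a TCP probe only shows whether the NAS accepts new connections. It does not see an individual SMB session being torn down, and a very short outage can fall between two probes.

```python
import socket
import time
from datetime import datetime


def probe(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def gaps(drop_times: list[float]) -> list[float]:
    """Seconds between consecutive drops; near-equal gaps hint at a timer."""
    return [later - earlier for earlier, later in zip(drop_times, drop_times[1:])]


def watch(host: str, probes: int, interval: float = 5.0) -> list[float]:
    """Probe the SMB port `probes` times; return the times it went down."""
    drops: list[float] = []
    was_up = True
    for _ in range(probes):
        is_up = probe(host)
        if was_up and not is_up:
            # Record only the transition from reachable to unreachable.
            drops.append(time.time())
            print(datetime.now().isoformat(timespec="seconds"), "port 445 unreachable")
        was_up = is_up
        time.sleep(interval)
    return drops
```

Running something like `gaps(watch("192.168.0.12", probes=720))` during a transfer would give the spacing between outages. Roughly constant gaps would point at a timer, irregular gaps at something load-related.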

The entire install is (was) 100% vanilla. No plugins, no anything added onto it.

I just killed the install and installed CORE. Let's see how this is behaving.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
_Chris_ said: "I configured a static IP, set up in pfSense on a Protectli Vault."

You set up a static IP for the NAS using pfSense? Or you actually configured the static IP address on the NAS in the Networking configuration for the NAS? Two very different things.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
When I started up SCALE the first time I did not define a static IP. PfSense assigned 192.168.0.70 to TrueNAS. That's where I did my first test to see if it works... and I already saw failed transfers in that configuration.

Then later on I took the MAC address and defined in PfSense a static IP 192.168.0.12 with the same result... failed transfers.

I did not change any Network settings in TrueNAS.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
_Chris_ said: "When I started up SCALE the first time I did not define a static IP. PfSense assigned 192.168.0.70 to TrueNAS. That's where I did my first test to see if it works... and I already saw failed transfers in that configuration."


Okay, that's what I asked. When you say "pfSense assigned", you mean that you are using DHCP. When you use DHCP, it is possible for DHCP to renew the DHCP lease in such a way that the client interface (TrueNAS is the "client" in this case, a DHCP client) gets reconfigured and drops connections. The behaviour you are describing with the interrupted transfers and the logout of the GUI suggests that this may be what is happening.

_Chris_ said: "Then later on I took the MAC address and defined in PfSense a static IP 192.168.0.12 with the same result... failed transfers."

This is ALSO a DHCP assigned address. The fact that you made it a fixed IP doesn't change the way it all works under the sheets.

Please assign a static IP address using the TrueNAS web GUI. Get DHCP out of the mix. DHCP is inappropriate for use with servers.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
jgreco said: "Get DHCP out of the mix."

You're making a valid point. My opinion, however, is that it shouldn't make a difference whether the router gives a client an available IP address (DHCP), or a static IP (based on a manually set up MAC address / IP list), or whether you set up a static IP on the client and make the router ignore the client. The only thing that could theoretically get in the way is the periodic renewal of the IP address lease, which I have, however, set to 86400 seconds in pfSense.

SCALE was dropping the connection all the time and not just once a day (a day has 86400 seconds). So I am pretty sure that a DHCP lease renewal was not the cause of the dropping connection with SCALE. This is why I now deleted SCALE altogether and installed CORE to see if CORE works better. And guess what: CORE works.
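
The lease arithmetic can be sketched in a few lines of Python. This assumes the default timers from RFC 2131, where the client first tries to renew at T1 = 0.5 × lease and falls back to rebinding at T2 = 0.875 × lease. A DHCP server can override both, so treat the numbers as the usual case rather than a guarantee. The function name is made up for this example.

```python
def dhcp_timers(lease_seconds: int) -> tuple[float, float]:
    """Return (T1, T2), the default renew and rebind times in seconds."""
    t1 = 0.5 * lease_seconds    # client starts renewing with its own server
    t2 = 0.875 * lease_seconds  # client falls back to broadcasting (rebind)
    return t1, t2


t1, t2 = dhcp_timers(86400)
print(f"renew after {t1 / 3600:.1f} h, rebind after {t2 / 3600:.1f} h")
# prints "renew after 12.0 h, rebind after 21.0 h"
```

With an 86400-second lease, a renewal would be expected roughly every 12 hours under these defaults, which is still far less often than drops after 10 to 40 minutes.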

I have not made any changes on the router, and it's the same ASUS UltraSuperDuperMegaComputer hardware with the wonderfully working Realtek NICs.

I am currently copying the files from my old NAS (WD My Cloud EX4) to the new ASUS server running TrueNAS-13.0-U4 CORE. A few TB are already copied with zero connection drops. SCALE never got even close to that.

pfSense doesn't have a function to forcefully renew a DHCP lease, so I guess I have to wait and see what happens when it gets renewed tomorrow.

I will not test SCALE anymore. It miserably failed on me.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
_Chris_ said: "This is why I now deleted SCALE altogether and installed CORE to see if CORE works better. And guess what: CORE works."

I never expected that suggestion to work so well, but I'm glad it did.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Whattteva said: "I never expected that suggestion to work so well, but I'm glad it did."

There is an edge case with vfs_io_uring (SCALE) IIRC where, if in-flight AIO gets cancelled by the client (possibly due to flakey network), the error handling will basically close the uring, causing the connection to de facto drop. Log messages would probably clarify what is happening. It would probably make more sense to drain the uring queue and re-init it in this case, but I'd probably have to see a concrete reproducer and familiarize myself with the relevant APIs.

In Core I wrote the AIO backend to use kevent and FreeBSD's kernel AIO, and don't really have this sort of edge-case in error handling (though there is one pending issue that is getting resolved in U5 related to how we handle draining that pending queue during unexpected session drops).
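
As a starting point for collecting those log messages, here is a small Python sketch that pulls candidate lines out of the smbd log. Everything specific in it is an assumption. The log path `/var/log/samba4/log.smbd` and the search terms are guesses to be adjusted, and the helper name is invented. It does not reflect any known Samba message format.

```python
from pathlib import Path

# Assumed log location and search terms; adjust both to the actual system.
SMBD_LOG = Path("/var/log/samba4/log.smbd")
KEYWORDS = ("io_uring", "aio", "cancel")


def matching_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [line.rstrip("\n") for line in lines
            if any(k in line.lower() for k in lowered)]


if __name__ == "__main__":
    if SMBD_LOG.is_file():
        with SMBD_LOG.open(errors="replace") as handle:
            for line in matching_lines(handle):
                print(line)
    else:
        print(f"{SMBD_LOG} not found; point SMBD_LOG at your smbd log")
```

Comparing the timestamps of any matches with the moment a transfer failed would show whether the two line up.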
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
anodos said: "possibly due to flakey network"

So I'm just going to say that in a dozen years doing support here on these forums, the Realteks, especially the 8111s, have shown over and over that they are flaky, especially under stress. I'll leave it to the youngsters to analyze the details, but it makes a certain amount of sense that if there's something causing some sort of faux reset or drop, that could be affecting not only the transport connection, but perhaps also the web UI session. The Linux and FreeBSD drivers are known to behave somewhat differently, and the Linux ones are not always better behaved. @anodos has given you a good suggestion that you may wish to follow up on, and I encourage you to do so. You have the attention of someone who does a lot of high-quality work with Samba, and it could be beneficial to the world if this could be made better somehow.
 

_Chris_

Cadet
Joined
May 19, 2023
Messages
8
To set up the TrueNAS-SCALE-22.12.2 server easily, I moved the test hardware all into one spot.

This part of the network looks like this:

One GS108 Netgear switch to which it all is connected:
- Apple 2010 MacPro A1289, Dual Xeon X5690
- Protectli Vault 2410
- ASUS P6T7 WS Supercomputer (it's still a ridiculous name from Asus)

There is one additional cable going from the GS108 switch to another switch, where my APs, printer, TVs, a MacMini for HomeAssistant and everything else is connected.

As I said in a post above, I've tested 3 different NICs on the ASUS computer:
- Both on-board Realtek 8111
- An Intel Gigabit CT EXPI9301CTBLK PCIe card
- A TRENDnet TEG-10GECTX PCIe card

All 3 NICs exhibited the exact same behavior: transferring hundreds of gigabytes always stopped with an error message, sometimes after 10 minutes, sometimes after 40 minutes.

After I deleted SCALE and installed CORE on the exact same SSD and without changing anything else, everything works. No more dropped connections. I copied 2 TB of data today to the ASUS computer without any problems at all. And on a sidenote: The ASUS computer is right now connected to the switch with the on-board Realtek 8111 NIC.

@anodos If you need any more info that can help you find the problem and improve the product, I am glad to help.
 