Anyone using jumbo MTU on their FreeNAS network? SSH issues?

Status
Not open for further replies.

Dave Genton (Contributor, joined Feb 27, 2014, 133 messages)
Wondering if anyone else is having issues with SSH on FreeNAS 9.3 Beta when using jumbo MTU? My storage network uses jumbo MTU. I have multiple FreeNAS servers set up, each with two live NICs: one on the management network at 1500 MTU and the other on the storage network at 9000 MTU. I was setting up replication but found that the SSH key scan fails over the jumbo MTU network. When I try to SSH between the boxes directly from the shell, that also fails. If I set the NICs back to 1500 on 2 of the 3 FreeNAS boxes, I can instantly SSH between them and, of course, complete the SSH key scan instantly. Is there something in the SSH config that needs to be changed when using jumbo MTU, or should I "let it slide" since it's a beta build? As a data center network architect, it's common practice for me to use jumbo MTU on any isolated layer 2 VLAN where data replication, iSCSI storage, etc. takes place, but with FreeNAS I've yet to see why SSH won't work at 9000 MTU.

Also, FYI: I have been doing HEAVY data replication this week, moving large amounts of data mostly just to stress the data center and also FreeNAS 9.3 beta, as I'm currently installing it on 3 different networks. Last night I had continuous replication failures that I didn't have on those builds during the previous 3 nights. Again SSH appears to be the culprit, with the error being either a corrupted packet or a corrupted stream, but the only change was the nightly build updates via the automated client. Then, finally, one of the 3 FreeNAS boxes is sitting at a "db" prompt and won't recover. I got notified that a replication failed after about 400 GB or so, and on checking found one box's console sitting at db>. I did a reset and a reboot; nothing brings it back up. After power cycling the box, it returns to db.

Back to testing the two running boxes: replications appear to be flying by, with reporting showing 800 Mbps of network traffic, steady reads/writes across all drives, and no errors. This is, of course, after changing both storage NICs from 9000 to 1500; otherwise they don't talk at all. Any advice on a box dropping dead to the db prompt? I've been doing this for over 20 years and know networking and storage networking inside and out, but I admit I am not an expert on Unix/BSD-type operating systems: not new to them, but not my area of expertise.

In the lab, two boxes have 16 GB RAM and the third has 32 GB (a production box borrowed for testing). All use Broadcom NICs and Core i7 CPUs, and all network switches are Cisco Systems, since I'm a Cisco engineer :) The servers use very little CPU, but I can make them use most of their memory for cache and flood the NICs at 800-900 Mbps with zero frame errors to be found on the network. Strangely, though, detailed analysis shows the majority of frames are quite small, despite attempts to make segments and frames larger.

Dave
 

Dave Genton
My final dataset is replicating across the network from FreeNAS 9.3 server 1 to FreeNAS 9.3 server 2 with none of the issues described above. Today's build of 9.3 beta is no longer giving me any of the SSH errors or replication failures I have been seeing lately. It's moving data quite efficiently and quickly, but I'd like to find out why SSH fails with MTU 9000 enabled on my NICs in FreeNAS, if anyone knows how that can be rectified. I'm thinking it may not be a beta thing but an SSH config thing :)

Dave
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)
Jumbo frames are what I consider a "dangerous game", and they rarely provide an improvement in performance, for a bunch of reasons. If you read around the forums, not only do we not recommend anything except the default MTU, but it seems everyone who tries it has read the propaganda about jumbo frames and bought it lock, stock and barrel.


Your SSH is failing because your entire pipeline between source and destination doesn't support (or doesn't have enabled) the frame size you are using. Note that if you set MTU=9000 on an Intel NIC and a Realtek NIC, they aren't actually matched. This is why playing with MTUs is playing with fire. It's likely to break shit you didn't want to break, and the upside is potentially a 1-3% increase in performance. Big whoop on that increase. The risk just isn't worth the benefit.


Set your MTUs back to default and watch everything "just work" ;)
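The "pipeline" point can be checked with simple arithmetic. The sketch below assumes IPv4 and TCP without options and reports, for a given IP MTU, the largest ICMP echo payload that fits unfragmented, the TCP MSS a host typically advertises, and the on-wire Ethernet frame size with and without an 802.1Q tag. The frame-size numbers matter because one commonly cited reason two devices both "set to 9000" disagree is that some vendors quote the Ethernet frame size while others quote the IP MTU. The generated command uses FreeBSD `ping` flags (`-D` sets Don't Fragment, `-s` sets the payload size, `-c` the count; payloads above the default need root). The function names and the host name are illustrative placeholders.

```python
from typing import Sequence

# Header sizes in bytes, assuming IPv4 without options and TCP without options.
IPV4_HEADER = 20
ICMP_HEADER = 8
TCP_HEADER = 20
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence
DOT1Q_TAG = 4     # extra bytes when the frame carries an 802.1Q VLAN tag


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss_for_mtu(mtu: int) -> int:
    """MSS a host typically advertises for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ethernet_frame_size(mtu: int, vlan_tagged: bool = False) -> int:
    """On-wire Ethernet frame size (including FCS) for a full-MTU packet."""
    return mtu + ETH_HEADER + ETH_FCS + (DOT1Q_TAG if vlan_tagged else 0)


def path_carries(mtu: int, hop_mtus: Sequence[int]) -> bool:
    """True only if every hop (NICs and switch ports, as IP MTUs) accepts a full-MTU packet."""
    return all(hop >= mtu for hop in hop_mtus)


def freebsd_df_ping(host: str, mtu: int) -> str:
    """FreeBSD ping command that sends full-size packets with Don't Fragment set."""
    return f"ping -D -s {icmp_payload_for_mtu(mtu)} -c 3 {host}"


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, icmp_payload_for_mtu(mtu), tcp_mss_for_mtu(mtu),
              ethernet_frame_size(mtu), ethernet_frame_size(mtu, vlan_tagged=True))
    # Both NICs at 9000 but one switch port left at 1500: full-size packets cannot pass.
    print(path_carries(9000, [9000, 1500, 9000]))
    print(freebsd_df_ping("storage-peer.example", 9000))
```

Run between the two storage NICs: if the 1472-byte ping succeeds and the 8972-byte one fails, some hop is not passing full-size jumbo frames. Because a layer 2 switch drops oversized frames without sending any ICMP message, path MTU discovery cannot correct for it, so small packets still get through while anything sent at the full 8960-byte MSS disappears. That pattern would fit SSH hanging partway through connection setup.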
 

Dave Genton
Quote: "Your SSH is failing because your entire pipeline between source and destination doesn't support (or have enabled the frame size you are using)."

After writing a lengthy response, which I have since deemed a waste of time, I'll simply ask again for SSH configuration assistance in seeing why the session is failing. My pipeline is NOT the problem; building enterprise networks and data centers, including storage fabrics, has been my area of expertise for 23 years now. The pipeline can be reduced to 2 identical boxes on the same switch and the same VLAN, with the same behavior. I cannot revert 2 of the 3 boxes running 9.3 back to 9.2.1.9 due to zpool upgrades, nor do I have the time. One was reverted because of the crash I reported (which got no comment), where heavy transfers dropped 9.3 to the "db>" prompt from which it would never recover, so I reloaded the OS with 9.2.1.9 and it has since been stable as a target for the two 9.3 beta boxes.

When I build data centers or enterprise-wide networks, it's always done based upon best practice. Some best practices exist despite having little if any benefit over other settings, but nonetheless what a manufacturer deems "best practice" is what we consulting engineers stand by. Without debating it (trust me, I have, many times over the years): isolated layer 2 VLANs created for iSCSI storage or boot networks are built with jumbo MTU enabled for the most efficient and optimal use. Yes, today MTU is not nearly as much of a concern as in the past, when software dictated our routing and switching; today specialized ASICs do this work in conjunction with ultra-fast TCAM lookups. Your answer doesn't help me fix my issue with my 9.3 FreeNAS box or SSH; it just offers an opinion. In many applications I share your opinion, but in many I don't, and I have seen the data because it's my area of expertise and what I do. I want FreeNAS in my toolbox for testing failover and replication scenarios before handing a new data center network over to the customer, and for that I require not only that I can transfer data with like protocols and payloads but also that I can create and hold a session.
During transfers, while the session is established, a long documented list of failover and redundancy tests is performed across the core and distribution layers of my network. Long story short, a lost frame or packet here and there isn't the end of the world, but if I lose that established SSH session one single time, it's a total failure. I redesign, or work with Cisco on code revisions, etc., so that I can repeatedly fail that link, port, line card, supervisor module, or entire node, whatever it may be, without losing the session. Since I'm already using FreeNAS 9.3, I want to use these boxes to create that session too, which never happens when MTU 9000 is set on both identical boxes. Syslog is configured and running, but I get no output as to why, whereas all the other devices I bring in do log it. Not only do I want and need SSH access, but I also replicate data, which of course uses SSH. Not being an expert on BSD or FreeNAS, is there any help that can be provided with the SSH configuration? Logging output, or additional logging levels so I can see SSH debug output? No errors are given; it just won't establish. Any help given would be appreciated. I respect your knowledge of BSD/FreeNAS, which is why I asked; please respect mine in the network I built, as it cannot be written off that simply. That's the first thing I thought of and disproved.

Also, FYI: I did run replication with loads taxing the max output of a single NIC, and the target box crashed and couldn't be recovered. Each time I got the box running near 100% by replicating multiple data streams to a single 9.3 target, it would eventually crash to the "db>" prompt, this final time never to return with a reload or reboot command. With no replies on that, I finally reloaded the box with 9.2.1.9, and it ran nonstop through the night receiving many terabytes of data without a single error. So I have a concern with 9.3 when taxed hard and will try it again, but SSH I need to fix, and I need to see whether it's the SSH service configuration or a 9.3 thing. Sorry, I cannot take another box back to 9.2.1.9 to test SSH with jumbo again, so I'm working with both the beta and the SSH config.

I'm not being selfish in using FreeNAS as my portable traveling SAN for testing my new networks before the SAN vendor arrives. I'm working more and more with smaller businesses and local governments that cannot afford EMC or NetApp yet want the benefits of SAN boot and virtualization technology. Today I am seeing NAS products I haven't heard of come out of the woodwork, and I must integrate them with my network and compute clusters; in most cases that's a challenge, with limited features. Using FreeNAS, I have iSCSI SAN boot from a remote LUN as they want, plus iSCSI storage with NFS and CIFS shares at the same time, all while performing great at both 1 GbE and 10 GbE NIC speeds. These customers can benefit from watching their consultant do the final redundancy and failover testing with FreeNAS. I invite them to load it in their labs, because with 9.3's features and performance, once it reaches 9.2.1.9 stability, it will be what I tell those customers to look into. TrueNAS will be a much better product, by far, than the ones I have seen lately serving iSCSI and/or NFS shares, so I'm promoting the product while trying to get it working properly in the 9.3 line, which I am enjoying greatly. Hopefully all this will get a response beyond questioning my ability to build a network or a simple storage VLAN on a network switch.

d-
 

solarisguy (Guru, joined Apr 4, 2014, 1,125 messages)
Did you try ssh -v, up to ssh -vvv?

Can you try to see what is going through the wire with an analyzer? I guess you can replicate the issue with only SSH running, so the picture would be clear.

To give us some perspective, could you please post here the output of ssh -V from both 9.2.1.9 and 9.3 systems?
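A sketch of how that debug output might be read: the OpenSSH client writes its `-v`/`-vvv` messages to stderr, so they can be saved with something like `ssh -vvv user@peer 2> ssh-debug.log`, and the interesting part is the last `debug1:` line before the hang. If it is one of OpenSSH's `expecting SSH2_MSG_KEX...` lines, the connection got as far as the key exchange and then stopped receiving data; a stall at exactly that point is a commonly reported symptom of an MTU mismatch (a packet in the exchange being silently dropped) rather than of an sshd configuration problem. The script name, function names, and messages below are hypothetical, not part of FreeNAS or OpenSSH, and the classification is only a heuristic.

```python
import re
import sys
from typing import Optional

# OpenSSH client debug lines look like "debug1: ...", "debug2: ...", "debug3: ...".
DEBUG_LINE = re.compile(r"^debug(\d): (.*)$")
# Prefix of the OpenSSH messages logged while waiting for the server's key-exchange reply.
KEX_WAIT_PREFIX = "expecting SSH2_MSG_KEX"


def last_debug1(log_text: str) -> Optional[str]:
    """Return the message of the last debug1 line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = DEBUG_LINE.match(line.strip())
        if match and match.group(1) == "1":
            last = match.group(2)
    return last


def diagnose(log_text: str) -> str:
    """Classify where an ssh -v/-vvv session stopped (heuristic only)."""
    if "Authenticated to" in log_text:
        return "authenticated: transport and key exchange completed"
    last = last_debug1(log_text)
    if last is None:
        return "no debug1 output: was stderr captured and -v given?"
    if last.startswith(KEX_WAIT_PREFIX):
        return "stalled waiting for key-exchange reply: check MTU along the path"
    return "stopped after: " + last


if __name__ == "__main__":
    # Hypothetical usage: python ssh_diag.py ssh-debug.log
    if len(sys.argv) != 2:
        print("usage: python ssh_diag.py ssh-debug.log")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(diagnose(fh.read()))
```

Comparing the result for the same pair of hosts at 1500 and at 9000 shows whether the jumbo setting changes where the session stops.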
 

Dave Genton
I have not. Thank you, that's exactly the type of thing I was asking for, as again BSD is not my expertise. I will try to get in there today, run ssh with those options, and see what it says, first at 1500 and then again at 9000. You're correct about SSH, which is why I spoke of it: I found the issue during replication failures but was using ssh alone to test and troubleshoot. When "mtu 9000" is the option added to each host, simply put, ssh fails; it hangs while attempting to establish the session. This is why I moved both boxes onto the same switch, to rule out the network in the middle.

Thanks again for the requested ssh input.

dave
 

Dave Genton

Thanks for the logging/debug switches; I was not aware of them. I used them while using an analyzer to capture a SPAN session on the switch. I was able to connect without issues, unlike 3 days prior, even with "mtu 9000" enabled. I tried to replicate, and again it now worked without issue. I do still see this issue with one node: each time, it times out on the same key exchange, and the SPAN capture shows retransmits over and over until it times out. However, this node is the same one that failed and dropped to the "db>" prompt last week. So, aside from the node with known issues, things are progressing well. Nothing was changed as far as the network or the settings on the FreeNAS boxes, but they have had 4 daily updates since the initial failures, which are now gone, so I'm happy. I'm creating a new thread now about one symptom I do see with replications when the checkboxes for verbose replication and for removing stale snapshots on the remote side are checked.
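Since the SPAN capture on the remaining node shows the same key-exchange data being retransmitted until timeout, one quick check is whether the retransmitted segments are all larger than a standard 1500-byte MTU allows. This sketch assumes the capture has been exported from the analyzer into simple (source, sequence number, payload length) records; the `Segment` type and both function names are hypothetical, not part of any analyzer's API, and it treats a repeated (source, sequence number) pair as a retransmission, ignoring repacketization.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """One TCP segment from a capture (hypothetical export format)."""
    src: str          # sending host
    seq: int          # TCP sequence number of the first payload byte
    payload_len: int  # TCP payload bytes (headers excluded)


def retransmitted(segments: Iterable[Segment]) -> List[Segment]:
    """First copy of each data segment whose (src, seq) was sent more than once."""
    data = [s for s in segments if s.payload_len > 0]  # skip pure ACKs
    counts = Counter((s.src, s.seq) for s in data)
    seen = set()
    result = []
    for s in data:
        key = (s.src, s.seq)
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            result.append(s)
    return result


def oversized_for(mtu: int, segments: Iterable[Segment]) -> List[Segment]:
    """Segments too large to cross a hop with this IP MTU (IPv4/TCP, no options)."""
    max_payload = mtu - 40  # 20-byte IPv4 header + 20-byte TCP header
    return [s for s in segments if s.payload_len > max_payload]
```

If `oversized_for(1500, retransmitted(...))` returns the retransmitted key-exchange segments, a hop that still drops frames above 1500 bytes is the likely suspect for that node; if it returns nothing, the retransmissions are small enough to fit a standard MTU and the cause is more likely elsewhere.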
 

solarisguy
I'm glad that SSH works for you. However, I do not like when systems and networks do not behave predictably :)
 