Multiple interfaces on same subnet


jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
So I really don't understand this at all. There is absolutely no technical reason for not being able to have multiple interfaces on the same subnet, regardless of whether they are serving the same service or not.

I mean really, there's no reason why I can't have two iSCSI initiators in the same subnet for load balancing purposes. It's not anything new, and FreeNAS/TrueNAS is the only product I've come across that denies this. My old EMC VNXe's can do it, and I do just that, same subnet, two IP's bonded to two separate physical NICs connected to two different switches that are not capable of vPC or such.

So why is this being actively blocked? As a test, I have my main lagg that is static on the subnet. I created a second lagg that is DHCP on the same subnet. At first, I tried to add two "portals", but it wouldn't let me set up two on the same subnet (go figure), so I told a single portal to listen on 0.0.0.0. Go to ESXi, add dynamic discovery for both IPs from both laggs, and guess what, it works perfectly and as expected. No hell portals have opened (didn't expect them to), nor any of that voodoo nonsense I've seen mentioned as to why "this is a bad idea".

NFS can run the same way; I have, for years, had a load balancer in front of an NFS server with multiple IPs. Why? For the same reason: not every network has high-end switches that can do vPCs and such, and to balance across two switches you need two interfaces. Why eat a subnet for this when it's not needed?

Also, why can an entire system only have a single DHCP enabled interface? That's really limiting as well for no reason at all.
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
We have a resource that will explain this to you.


old EMC VNXe's can do it, and I do just that, same subnet, two IP's bonded

Some specialized equipment does this because it's based on custom high-performance IP stacks with very limited functionality. Modern UNIX stacks are abstracted so that all the various networking constructs such as bridges, tunnels, VLANs, filtering, forwarding, NAT, etc., are possible. This dictates a different design for modern operating systems.

and FreeNAS/TrueNAS is the only product I've come across that denies this.

Well, that's just not true.

works perfectly and as expected.

It's unlikely that it actually works "as expected." That's usually the pain point in this discussion. Please see the resource above and literally every previous instance of this discussion on the forums.

Also, why can an entire system only have a single DHCP enabled interface? That's really limiting as well for no reason at all.

Because options such as gateways, DNS servers, NTP servers, and other stuff that can be specified as DHCP options are systemwide properties.

If you get two different default gateways from two different networks, and you then "ping 8.8.8.8", which one do you choose?

If you get two different DNS servers, especially where one serves something like Active Directory DNS or ".local" DNS, and you try to look up "mycoolhost.local" out via the OTHER link, you get NXDOMAIN.

The scope of these settings is actually systemwide, not per-interface. Allowing DHCP on multiple interfaces makes the results nondeterministic.
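For illustration, here is roughly what you would end up with if two interfaces each accepted a DHCP-provided gateway (hypothetical addresses and Linux iproute2 output; exact metrics depend on the DHCP client):

Code:
# Two default routes, one per DHCP lease:
$ ip route show default
default via 192.168.1.1 dev eth0
default via 10.0.0.1 dev eth1

# Which one "ping 8.8.8.8" uses is whatever the lookup happens to return:
$ ip route get 8.8.8.8
8.8.8.8 via 192.168.1.1 dev eth0 src 192.168.1.50 uid 0
    cache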
 

jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
Now granted, I should’ve said “of all the various NAS devices and software I’ve used over 25 years, FreeNAS/TrueNAS is the only one I’ve had this limitation with”. I will give you that.



As far as working as expected, since all the storage is on a single L2 segment, the gateway wouldn't be an issue and binding a service to a particular IP/Interface is simple and just works. The iSCSI target is looking for a particular initiator, say 192.168.1.45/24. If there's an interface that has an IP on that subnet, then Bob's your uncle and you are on your way. If that interface has a gateway of 192.168.1.1, it doesn't matter, because it's not the system default gateway. If the system didn't have an interface with an IP in that subnet, then there's no entry in the routing table and the packet would traverse the default gateway, i.e. the 0.0.0.0 route entry. The default gateway, 0.0.0.0, can only go out one interface on a system; nothing changes there, for the most part. If you know routing tables you can have fun. Systems like ESXi allow you to override the gateway for a VMkernel interface, but not the default gateway, which will always be metric 0, and the management network VMkernel will always receive the default gateway traffic. Traffic for segments attached to other VMkernels will go out the appropriate interface.



So really, you don’t assign a gateway when you put in the IP address for an interface in TrueNAS Core or Scale, so by having 5 interfaces on the same network will always use the same gateway. If you set an interface to bind the management console to, then you are saying, in the routing table, that 0.0.0.0 is available via 192.168.1.1 via eth0. Even if eth1,2,3,4 all are in the same subnet, the default route in the kernel routing table states “via eth0” so that’s the interface the traffic will always leave.



The issues I’ve had with all the discussions of the past, and why I posted the question hoping to not just get the blind canned response, is the focus on FreeBSD or other Unix systems, of which TrueNAS Scale is not since it’s a Debian base. The old arguments of FreeBSD don’t all apply, some do though, but really my points above do apply to FreeBSD as well, which is why things like pfSense work so well, it’s all about the routing table.



IP aliases are a good way to stack multiple IPs on a single interface, but they don't resolve the overall issue I bring up.



So yes, there is a single default route in the routing table that defines 0.0.0.0 as going via a specific IP and interface with a metric of 0, but other gateways can be added; they just won't be the default 0.0.0.0, as there can be only one. If there are multiple interfaces all in the same subnet, then the management console/service would have to listen on all those interfaces, because the packet will return to the originating interface. I would love to see an advanced option, with a huge warning, that would allow me to add two interfaces, with static IPs, on my single storage subnet across two switches. Most iSCSI clients do multipathing very well and, in my case with ESXi, it just works very well and exactly as expected. I have two paths, one has active IO and the other does not. You can change the multipath policy to round-robin if you want, but the purpose is redundancy, not load balancing so much.
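For reference, the ESXi-side multipath policy change mentioned above, sketched with a hypothetical device identifier (esxcli syntax; verify against your ESXi release):

Code:
# Show the current path selection policy for an iSCSI LUN (hypothetical naa. ID):
esxcli storage nmp device list -d naa.600000000000000000000001

# Switch that LUN to round robin instead of the default policy:
esxcli storage nmp device set -d naa.600000000000000000000001 -P VMW_PSP_RR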
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
As noted in the linked article, I've had this discussion at least dozens of times over the years, and I'm documenting trivially provable facts.

[linked article is not valid because of] the focus on FreeBSD or other Unix systems, which TrueNAS SCALE is not, since it's Debian-based.

And yet Debian is not magic and has abstractions similar to those of other modern operating systems.

The old arguments of FreeBSD don’t all apply,

A few specific facts will change from OS to OS, granted. As a whole, the premise of my linked article is still 100% correct. Let's agree for the moment that I am trying to explain an overall premise and not micro-litigate the correctness of individual points on a per-OS basis.

but really my points above do apply to FreeBSD as well, which is why things like pfSense work so well, it’s all about the routing table.

Yes, but the routing table is used for directly connected hosts too.
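One quick way to see this on a Linux host (hypothetical addresses; the same check works from the TrueNAS SCALE shell): every connected network appears in the table as a kernel "scope link" route, and that entry, not the address a socket is bound to, is what selects the output device.

Code:
$ ip route show
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.11
# Traffic to any 192.168.1.x peer matches one of the connected routes above
# (typically the first), so it can egress eth0 even when the sending socket
# was bound to 192.168.1.11 on eth1.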

So here. I'm going to ask you to do some homework, to repeat on your own network. I'm doing my best to educate you here because you seem to have thought about this a bit more than some of the others I've discussed this with over the years. I'm hoping you can accept correction gracefully. I am delighted to help educate. However, as noted in the linked article, it is effing dreary to pointlessly argue with people who are cocksure that they're right despite being wrong. So please work with me here if you want to learn something new.

Presented for your consideration is a freshly booted TrueNAS SCALE host, with four ethernet interfaces on a single network. Everything has been left to default, so it did in fact do multiple DHCP, which is arguably a bug which I've also discussed on the forums.

[Screenshot: startup.png]


And we go out to the shell and run "ifconfig | more", which is conveniently sized to get all four interfaces on the screen.

[Screenshot: step1.png]


And the interesting and useful things we will look at here are the "RX packets" and "TX packets" counts for each interface.

Now we begin. We take a firehose, if you can call a FreeBSD VM on a Raspberry Pi a firehose, and send about a million packets.

[Screenshot: step2.png]


And we observe on SCALE:

[Screenshot: step3.png]


And look, about a million more packets both received and transmitted on ens256. That's the result you're expecting.

But now let's firehose ens161, 10.64.70.89. I've cropped the image to avoid confusion.

[Screenshot: step4.png]


And look at the SCALE-side results:

[Screenshot: step5.png]


ens161, a million packets received, but, what, only 59 transmitted?!?!? Oh look at that. Return path traffic was via ens256 which now has a million extra packets allocated to it.

Now, that's sufficient to prove my point. It works the way I said it does.
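Roughly how to repeat this (interface names and addresses are the ones from the screenshots; the flood flags are an assumption about how "about a million packets" were sent):

Code:
# On the sending host (a FreeBSD VM here): blast ~1M ICMP echoes at the
# address that lives on ens161:
ping -f -c 1000000 10.64.70.89

# On the SCALE host, compare per-interface counters before and after:
ifconfig ens161 | grep -E "RX packets|TX packets"
ifconfig ens256 | grep -E "RX packets|TX packets"
# RX grows on ens161, but the replies' TX shows up on ens256, the
# interface selected by the routing table.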

As far as working as expected, since all the storage is on a single L2 segment, the gateway wouldn’t be an issue and binding a service to a particular IP/Interface is simple and just works.

Clearly that's not true, as just demonstrated. This is all within a single L2 segment, sorry about the /23 netmask, there's only so much effort I'm willing to put into this discussion.

Now granted, I should’ve said “of all the various NAS devices and software I’ve used over 25 years, FreeNAS/TrueNAS is the only one I’ve had this limitation with”. I will give you that.

This is also probably not true, since a large majority of NAS devices are based on BSD or Linux kernels. The place where your statement will be true is where someone has used a specialized embedded OS with limited IP stack functionality that is just designed to drive the ethernet ports. I can confirm that a number of these do in fact allow what you're talking about, but when you move on down the line to your EMC StorCenter (linux based) or Synology (also linux based), they work the way I'm explaining to you.

Some of your other ideas about ESXi also appear to be errant but commonly held misconceptions.

Now, personally, this is as boring to me as it was back in 2014 when I first wrote the article for these forums;


The only reason I bothered to do this exercise this morning was because I was vaguely curious if there were any obvious corrections to the article I could make with regard to SCALE. But the article had previously been vetted against Linux networking behaviour anyways.

I have things I need to be moving on to this morning. I hope you can take correction gracefully. Please do go and read the linked article, don't just skim it. And please also take to heart that I was tired of debating people about how this works back in 2014. I am documenting the way things actually work and I really have no bandwidth for debates about how it could work, how it should work, how your NetApp ONTAP actually does do this, or any other pointless arguments. I cannot unilaterally change the way modern IP stacks work, so I do not need convincing of the utility of wanting to do this. Argumentative words are wasted words. It's a fait accompli. I am simply trying to help people understand a frequently-misunderstood aspect of modern networking. I really am trying to help, but you know the old saying about a horse and water. It's your choice whether to drink. I am just another community member like you, I don't get paid to work with TrueNAS or to participate on these forums, so there's a limited amount of effort I can put into this.

That said, even though my frustration is probably bleeding out your screen, I am happy to continue discussions in a productive direction, or at least spitball about concepts and ideas once we agree on facts.
 

Arwen

MVP
Joined: May 17, 2014
Messages: 3,611
To throw a wrench in the discussion, Solaris DOES allow multiple IPs on the same subnet, over multiple NICs.

However, Sun Microsystems' engineers went through a lot of trouble to make this work. You do HAVE to use an IPMP, (Internet Protocol Multi-Pathing), group in an Active-Active configuration.

I thoroughly tested this on Solaris 10 over 12 years ago. A server set up as a NAS with 4 x 1Gbps ports seemed like a good candidate for IPMP for reliability. It turns out that while most clients were NFS (this was AT Sun Microsystems), there were some Samba shares. So I put them on a different IP and assigned it to the other NIC in the IPMP group. Worked perfectly. Exactly as it should. All traffic (both send & receive) for Samba went through one NIC, and NFS through the other.

To be clear, NAS-initiated TCP connections would pick the lower IP as the source UNLESS told otherwise by the source software, which most programs don't do. Client-initiated connections DO work fine to the proper IP & NIC.
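For flavor, roughly what that Solaris 10 configuration looked like (illustrative only; interface names are hypothetical, and the optional IPMP test addresses for failure detection are omitted):

Code:
# Put two NICs in the same IPMP group, each with a data address on the subnet:
ifconfig e1000g0 plumb 192.168.1.10 netmask 255.255.255.0 group storage0 up
ifconfig e1000g1 plumb 192.168.1.11 netmask 255.255.255.0 group storage0 up
# in.mpathd then monitors the group; with both data addresses up this is an
# active-active pair, so NFS on .10 and Samba on .11 can ride different NICs.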


But, @jgreco is correct about FreeBSD / TrueNAS Core. It does not have the extra software to make it work.

Linux is somewhat of a crapshoot: there are quite a few networking options, but getting one of them to work as active-active would be beyond the scope of TrueNAS SCALE. Worse, a kernel update could break such a configuration.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined: Feb 6, 2014
Messages: 5,112
Some of your other ideas about ESXi also appear to be errant but commonly held misconceptions.
Regarding how IP traffic flows outbound from ESXi, it can actually be forcibly changed via gateway overrides and service/port binding; however, this is a "feature" that's made possible by VMware doing naughty things in kernel-space. It isn't expected to extend to consumer OSes, and it also doesn't change the behavior of the routing table on the array's (e.g. TrueNAS) side, so the array would need to have had similar "enhancements" server-side. Clients would be able to get "load-balanced writes" but would have no "load-balanced reads" - and the TrueNAS server would quickly overload whatever the default outbound interface is (ens256), so you'd be effectively bottlenecked by that interface.
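For context, the initiator-side port binding being referred to looks roughly like this (hypothetical vmk/vmhba names; esxcli syntax, check your ESXi version). Note that it only controls which VMkernel ports the initiator uses; it does nothing about return-path selection on the array side, which is the point above.

Code:
# Bind two VMkernel ports to the software iSCSI adapter (hypothetical names):
esxcli iscsi networkportal add --adapter vmhba64 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba64 --nic vmk2

# Verify the bindings:
esxcli iscsi networkportal list --adapter vmhba64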
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
To throw a wrench in the discussion, Solaris DOES allow multiple IPs on the same sub-net, over multiple NICs.

Yes, but not by default. I'm not saying that there aren't implementations that are capable of this sort of screwing around. I'm saying that the common examples people use of "but I know this works because I did it on a {FreeBSD, Linux, Windows, even Solaris} box and it seemed to work" all tend to be easily disproven upon closer inspection, such as I did above.

I can actually make at least some of these things work, sorta, kinda, on various UNIX or UNIX-like systems, but the normal IP stack on most modern systems uses the routing table even for directly connected networks (because it's really a performance/complexity tradeoff), and there are also examples of operating systems where receipt of an ARP request on ANY interface for one of the host's IPs will cause that interface's MAC to be returned as an ARP response. This leads to the Kafkaesque situation where you might end up with traffic neither ingressing nor egressing the "expected" interface.

I kinda wish every once in a while someone would introduce fresh arguments. A decade later, I still don't think I've heard anyone ask about the "-interface" argument to route, which does actually (correctly!) allow DESTINATION-based specification of the output interface for a given route. I mean, the issue here in this thread is essentially that none of these things allow SOURCE-based selection (specifically, emitting traffic whose source address matches a system interface out of that interface), but there are so many interesting things that could be discussed, or hacked up in ipfw, or coded up in netgraph (where I kinda expect this problem could actually be solved).
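For the curious, a sketch of that destination-based form, assuming FreeBSD route(8) syntax and hypothetical interface/network names (exact invocation may vary by release):

Code:
# FreeBSD: send traffic *destined* for 192.168.10.0/24 out em1.
# This is destination-based; it does not make source-bound traffic follow em1.
route add -net 192.168.10.0/24 -interface em1
netstat -rn | grep 192.168.10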

Basically this whole topic is just a boring and uninteresting discussion to me. There will be people who are absolutely insistent on doing it the way they were taught by some vendor whose hardware implements a basic IP stack that actually does multiple interfaces this way. Usually this is because many storage people cannot wrap their heads around IP networking, so the simpler, the better. Doesn't make it correct, doesn't mean it works universally. I'm happy if I can disabuse someone of their misconceptions now and then though.
 

jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
This actually doesn't really prove your point because I will bet, if you look at your routing table, ens256 is the interface for the default route with a metric of 0. In that case, this makes perfect sense because you are using a ping, which is not going to bind to a specific interface, and trying to demonstrate how a service/process that is bound to an interface will behave, which is to say they will not behave the same in any way, shape, or form. Additionally, ICMP does not replicate the TCP conversation that is involved in iSCSI, which is what the main topic is about, but I can replicate this on many fronts: DNS server, DHCP, etc. So all you've actually shown is that ICMP can be received on one interface but will always go out the default interface, which is why ICMP is very limited in actual network troubleshooting and is not the be-all and end-all of it.

Now, the default interface is handled pretty similarly in both Windows and Linux. I can't speak to FreeBSD because I do not know its internals that well, but with the others the default interface is determined by three criteria: first, which one is assigned to the network that I am on (source IP matching). If none match, then the default route (0.0.0.0) is selected. Then, which of the remaining interfaces are connected (line protocol up), and finally, which one has the lowest metric. The metric is the "tie breaker" when multiple interfaces exist on the same subnet, and in the case of having multiple interfaces with a default gateway defined, if you do not explicitly state the metrics for each interface, it's literally first come, first served. So whichever interface is initialized first (and gets DHCP first, in your case) will be metric 0. Debian (and derivatives) will assign 100 as the metric on all other interfaces by default. What happens if you have multiple interfaces that meet all three criteria? You get a round-robin event, which can be a lot of fun. I don't think Linux will use the lower IP as the source every time, but then again, that's only been my observation of function, not an absolute statement.

Now, this is the caveat to the discussion that makes your broad statement true: if the interfaces are ALL on the same subnet, then yes, you will have some goofy asymmetric routing because of that, but in the case of iSCSI, it doesn't matter, because iSCSI is multipath-aware and can handle having this occur. In other cases, such as OpenSSH Server, you bind it to an interface, and that interface is the only one that will accept incoming connections. As for the outgoing, it's not applicable because it's a constant connection that returns to the same interface and path it's received on, unless you really screw with your routing and force an asymmetric route.

Now, the key to having multiple interfaces on the same subnet for load-balancing purposes in a storage network is that the storage subnet is different from the one the console (or default gateway) is using. This is the same as what @Arwen mentions. This means that the first criterion, source IP <> subnet matching, will cause one of the storage interfaces to be selected over the default because it was matched first. In the case of multiples, well, if you leave the metric the same, then you get a round-robin event, which is only a concern if you are the client and you are not using a multipath-aware protocol. Also, on these interfaces, you don't assign a default gateway, because quite frankly, I wouldn't want my iSCSI traffic going across my router, or even allow the router to see the broadcast domain at all. Yes, Linux (and Windows, surprisingly) attach a specific interface to every route entry, which is how this "magic" works.

Observe my routing table below:
[Screenshot: ss3.png]


Notice how each route entry has a specific interface assigned to it? This is how I know what interface to expect certain traffic on. I don't know if FreeBSD does the same by default, but I know Linux does (case in point) and Windows does as well, to the best of its ability. BTW, only ens160 is statically assigned; the other 3 are DHCP in my example, which is why it's the default route for the entire system. I have not defined that in my netplan file, but certainly could.

Any traffic destined for the 10.27.204.0/24 subnet will go out either ens224 or ens256; it will not go out ens160 or ens192 UNLESS both the other interfaces are down. Now, since this is iSCSI, my observation has been that the iSCSI initiator/portal is smart enough to know that since the request from the target was received on ens224, it will send the response traffic out that interface. In this case, and I have found nothing to contradict this, the iSCSI server is aware of this and able to attach itself directly to the IP stack and send out the desired interface. Again, this is my observation using Wireshark, and having found no documentation to state otherwise, I have to accept it as the expected behavior. I am using the interfaces as examples, so take that with a grain of salt.

You mention that "Some specialized equipment does this because it's based on custom high-performance IP stacks with very limited functionality", which, while technically true, doesn't change that they still use a BSD or Linux kernel behind the scenes. I'm specifically referring to the COTS equipment that enterprises typically use, which is what my experience is in. Some of the vendors that use basic IP implementations probably wouldn't have hit my radar in my career, so I have no knowledge of what they can and cannot do.

Now, this statement: "Some of your other ideas about ESXi also appear to be errant but commonly held misconceptions". I challenge you to tell me what "ideas" I have that are errant or misconceptions. I have been an ESXi and vSphere SME for a number of years, both internally in my company and externally, so please, educate me and the rest of the world on these errant misconceptions. Like @HoneyBadger stated, there are some interesting things that are done in ESXi if you want to have multiple default gateways on a system, but I'm not even touching that here. I welcome the debate, because what I have detailed above is exactly how my production environment runs, with absolutely no issues in performance, data degradation, or otherwise anything that says this is a problem. I also run my home network like this to balance across two physical switches, without using vPC, and have no issues, because ESXi not only has multipath-aware iSCSI, but also because I don't have a single flat network. I am absolutely interested to hear this, because there may be something I learn from it, but honestly it is morbid curiosity on my part.

Overall, as a single broad statement, some of your article and what you've said is spot on, but when you start getting into specific use cases, the statement falls apart. I totally agree that trying to do this on a single, flat network is not the smartest idea in the world and things will be wonky, to say the least, but like anything in the IT world, there's not a one-size-fits-all scenario, and blanket statements often lead to limited decision making. After looking back at my original post, I did not detail the separate interfaces on a segregated subnet piece, which honestly probably wouldn't have changed your response based on the "I do not wish to entertain a debate as to whether or not this is right or wrong." statement you make in the article. You are set in the knowledge you have and will not entertain even the possibility of being slightly incorrect or able to learn something new. Honestly, with that statement, why did you even bother to respond to my post, unless you are interested in just trolling folks and not entertaining open discussion that might alter your long held beliefs.

As for my qualifications and background, since you have made some assumptions: like previously stated, I am a vSphere and ESXi SME, and I'm also a network "engineer" (I hate the legal distinctions with that word), and I also manage OS-level items in Linux and Windows. I actually am not a "storage person" by trade; to me it's all a network protocol to deal with. I don't deal with "It's a SAS. It's a SATA. It's an NVMe..." That's all interface-level stuff I care less about. What I care about is talking between the systems; let the storage folks geek out on the interface stuff and let me know the high level.

I see this as being an Advanced Feature that has all the appropriate warnings and the associated "I understand the risks" checkbox before proceeding, but to not even offer it as an option in edge cases is frustrating to say the least, and to have an "I'm right, you're wrong, piss off" stance is really a good way to make folks get defensive.

On the last point in my original post, which hasn't been addressed: why can't I have 3 DHCP interfaces in different subnets? There's no technical reason behind this either, and could also be an Advanced Setting that we could enable if we want. I guess the main thing is I consider that management interface to be separate and isolated onto a management subnet that no services run over; maybe I'm the edge case in this, who knows.
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
This actually doesn't really prove your point because I will bet, if you look at your routing table, ens256 is the interface for the default route with a metric of 0.

Yes. So? The connected routes have a metric of zero too.

[Screenshot: round2a.png]


Whee. Who gives a flying f? What happens here is implementation specific, or nondeterministic.

In that case, this makes perfect sense because you are using a ping, which is not going to bind to a specific interface, and trying to demonstrate how a service/process that is bound to an interface will behave, which is to say they will not behave the same in any way, shape, or form.

I see what you're thinking. You're thinking that it's a hidden behaviour and that TCP will somehow magically "bind" to the interface. It doesn't. The user-level layers where IP "binding" happens are far removed from the guts of the routing engine, which is what decides where packets go. This is part of what I mean when I say that the IP stack on modern systems is highly abstracted.

So let's see.

[Screenshot: round2b.png]


So let's light it up. I've run "iperf3 -s -B 10.64.70.89" on the TrueNAS host. I've then run "iperf3 -c 10.64.70.89; iperf3 -c 10.64.70.89 -R" to exercise both directions, from a random host.

[Screenshot: round2c.png]


So your theory is that all of this will have been transmitted via ens161 because of TCP binding. My theory is that it will work the way I say it works.

[Screenshot: round2d.png]


Survey says ... it ran thru ens256, making you wrong again.
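For anyone who wants to repeat this, it boils down to roughly the following (interface names and addresses are the ones from the screenshots above; exact invocations are a sketch):

Code:
# On the TrueNAS host: note the TX counters, then start a server bound to ens161's address
ip -s link show ens161 | grep -A1 "TX:"
ip -s link show ens256 | grep -A1 "TX:"
iperf3 -s -B 10.64.70.89        # leave running

# From another host on the same L2 segment:
iperf3 -c 10.64.70.89
iperf3 -c 10.64.70.89 -R

# Back on the TrueNAS host: re-check the counters. The bulk of the TX bytes
# appear on the route-selected interface (ens256 here), not on ens161.
ip -s link show ens161 | grep -A1 "TX:"
ip -s link show ens256 | grep -A1 "TX:"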

but I know Linux does (case in point)

And we know that I just showed this to be false.

and Windows does as well, to the best of its ability.

Windows also does not. I even link to the article in my multiple interfaces article, and since you clearly can't be arsed to read it, I'll quote Microsoft for you here:

Assume that the server has to send a packet by using the TCP/IP protocol to a client whose address is 192.168.0.119. This address is located on the local subnet. Therefore, a gateway does not have to be used to reach the client. The protocol stack uses the first route that it finds in the local routing table. Typically, this is the first adapter that was installed.

Overall, as a single broad statement, some of your article and what you've said is spot on, but when you start getting into specific use cases, the statement falls apart.

This is a FreeNAS and now TrueNAS forum. The information I post here is specific to these systems; however, I have provided links to explanations of behaviour on other systems that correct your errant misconceptions as well. The only meaningful use case here is interacting with TrueNAS. If you go get yourself a Chelsio card and set it up in ESXi with a native Chelsio iSCSI initiator connecting to a NetApp SAN, then yes, some of my statements are not applicable, because you are no longer dealing with system-level IP stacks. It's still considered bad networking practice to have multiple network interfaces on a single network, but storage admins tend to have difficulty with IP networking, so it has been made to work. The Chelsio/NetApp situation really isn't relevant to TrueNAS. But you might want to listen to what I'm teaching if you want to replace that NetApp with a TrueNAS host.

which honestly probably wouldn't have changed your response based on the "I do not wish to entertain a debate as to whether or not this is right or wrong." statement you make in the article. You are set in the knowledge you have and will not entertain even the possibility of being slightly incorrect or able to learn something new.

That's completely untrue. However, when I speak as a subject matter expert trying to explain issues in simple words to the users here, I am extremely cautious about what I say, and rarely if ever am I shown to be wrong. Having expertise and making correct statements based on that expertise has the marvelous effect of making you right almost all of the time.

My reason for saying "I do not wish to entertain a debate" is because there has been a stream of people who cannot believe that their trite understanding of networking is fundamentally flawed, or wish to engage in debates about how it "should" work, or otherwise waste my time. I don't have some magic lever I can move to a "make it work differently" position, so it is really incumbent on you to listen to the thing I'm saying and take it at face value.

Assuming that I do not want to learn something new would be a significant mistake on your part. I thrive on learning new things. You are confusing that for having a solid command of my facts in an area which I am well versed.

unless you are interested in just trolling folks and not entertaining open discussion that might alter your long held beliefs.

And yet I'm always happy to show my homework, and even to show YOU how to do my homework. This borders on insulting. Beliefs do not have any place in technical discussions where facts can be established.

As for my qualifications and background, since you have made some assumptions: like previously stated, I am a vSphere and ESXi SME, and I'm also a network "engineer"

Great. So, just to clarify MY background, I'm the guy that's called when "engineers" like you muck it up. For certain classes of things, that is. I don't consider this to be a bad thing. Everyone should know their limits. In IT, people are often pushed past their limits. Those of us who brought the commercial Internet into being have significant experience with that. I admire people who are willing to try to find their way. There was a time "before Google" when it was particularly hard to find good information, and even today, I mean, look at this issue. It's obvious to me, but you're sitting here flailing trying to defend your understanding against what I know to be easily demonstrable truth. I respect that you're trying to understand. I'm also TRYING to help you understand, which is why I've taken extra time out to work with you on this. Because at least you're not mumbling through a mouthful of marbles about vague ideas, unlike so many before you. You have a clear, but incorrect, idea of how this works. I feel like I can work with that.

I see this as being an Advanced Feature that has all the appropriate warnings and the associated "I understand the risks" checkbox before proceeding, but to not even offer it as an option in edge cases is frustrating to say the least,

Yes. Except that it cannot be made to work correctly without significant redesign of the Linux and FreeBSD IP stacks. And iXsystems has been highly resistant to introducing broken functionality into their product. I feel that this is rightly so. You are free to your own opinion.

to have a "I'm right, you're wrong, piss off" stance is really a good way to make folks get defensive.

But while you are free to your own opinion, you are not welcome to make up your own facts. And quite frankly, I have had so many discussions over the years with people who are cocksure of their own errant "facts". Why do you think my stuff is posted here as resources? Because it's wrong? It's been sitting there posted publicly for years and years and years and it's wrong and no one's called me on it? I'm right because I've got the experience, I've done my homework, I've checked it multiple times, and then no one has found technical fault with it. So, I'm right, you're wrong. Please just deal with it. What can I do to get you on track? That is my goal. I can post a resource... heh.

On the last point in my original post, which hasn't been addressed: why can't I have 3 DHCP interfaces in different subnets? There's no technical reason behind this either, and could also be an Advanced Setting that we could enable if we want. I guess the main thing is I consider that management interface to be separate and isolated onto a management subnet that no services run over; maybe I'm the edge case in this, who knows.

This has been discussed several times in the last week or two, feel free to search the forums. Basically it's an issue with nondeterministic behaviour. A UNIX-like system with a modern IP stack behaves in an indeterminate manner with multiple default routes provided via DHCP, for example. Or what sort of resolver behaviour is correct? Do you forward requests out all interfaces and see what answers return? How do you arbitrate that? libc/resolv.conf doesn't work like that.
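To illustrate the resolver point, hypothetical contents (exactly what a DHCP client writes varies):

Code:
# /etc/resolv.conf is one systemwide file; there is no per-interface resolver.
# If two DHCP leases both write to it, you just get an ordered list:
$ cat /etc/resolv.conf
search mycompany.local
nameserver 192.168.1.10     # AD DNS offered with interface 1's lease
nameserver 10.0.0.53        # resolver offered with interface 2's lease
# Which server ends up first depends on which lease was processed last, and
# there is no per-interface selection, so a lookup for a name only the AD
# server knows can easily go to the other resolver and come back NXDOMAIN.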

DHCP is generally something that shouldn't be used for infrastructure systems like NAS. It creates conundrums. Not always, and not unavoidable ones, because, yes, of course, you could have a network that didn't provide a gateway or DNS servers to the NAS, and of course that would be safe to run alongside another DHCP interface. But in general, DHCP creates fragility in a network by creating a new single point of failure. The classic example would be the data center that can't cold start because the hypervisors and the storage are both waiting for DHCP from DHCP servers that are VM's. Yes I am sure *you* understand that to be really idiotic, but it has actually happened to companies. And of course, if you're operating at hyperscale, with a staffed data center and out of band access to everything, then DHCP might actually be acceptable, except that you've probably implemented some infrastructure-as-code configuration system that does a whole lot more than DHCP anyways.

There's no technical reason behind this either, and could also be an Advanced Setting that we could enable if we want.

At the end of the day, iXsystems is building a product that they sell to their customers, and their customers are generally not using DHCP for infrastructure systems. To actually get multi-interface DHCP right involves considering the multiple defroute issues, which they do actually seem to have given some thought to; you can see some discussion of that in this thread here. I was actually quite impressed since this indicated a level of interest in supporting DHCP that I didn't think existed. I do feel like they might be willing to build on that, so feel free to submit a Jira ticket asking for it as a feature. Who knows. But it seems like a crapton of work for a modest return, honestly. You'd need to indicate which DHCP interface you wanted to allow to provide default route and resolver records from. There may be other less-obvious difficulties as well.

I guess the main thing is I consider that management interface to be separate and isolated onto a management subnet that no services run over; maybe I'm the edge case in this, who knows.

Not at all. That's certainly BCP. However, it is also wicked difficult to do, because the IP stack of a modern FreeBSD or Linux system is abstracted and allows traffic flows that are undesirable. We've talked about that recently on the forums too. We had a guy who was convinced that VLANs were the end-all of network segregation and had no idea that modern IP stacks will take a packet regardless of the ingress interface. That makes for a horrible security attack surface and makes it very difficult to safely allow an untrusted host L2 access to any network that a NAS is on. I think this discussion may be right up your alley, feel free to check in at

https://www.truenas.com/community/t...-to-avoid-this-issue.94713/page-2#post-672038
 

jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
In the attached netstat -rn output, that doesn't show the route metric, just FYI. The irtt column is "initial round trip time", which is different from the metric, which is why I specifically use route -n (so I can see the Metric column) and not netstat -rn.
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
Okay, fair enough:

Code:
truenas# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.64.70.1      0.0.0.0         UG    0      0        0 ens256
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens256
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens161
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens224
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens192
172.16.0.0      0.0.0.0         255.255.0.0     U     0      0        0 kube-bridge
truenas#


Is it just that Linux is "special" about the columns it displays in "netstat -rn" and I didn't notice the column heading? Or was there a point you're making that flew over my head? Honestly, I'm tired enough it could be either way.
 

jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
Not really a point, just clarity. It's interesting that the metric is set to 0 on all the interface gateways, which is not the typical behavior of Debian or Ubuntu, so something in the TrueNAS code is definitely setting that. It really doesn't change the behavior you've demonstrated, but if your ens192 and ens224 are on a different subnet and you did the test from that subnet (not over a router), the packets would still go out the proper route, not the default, which was the crux of the point I was making.

Very interesting how this is done behind TrueNAS; not what I was expecting, but then again, the codebase is being run on two vastly different OS bases, so I guess I should expect some oddity and "unexpected" results. I'm guessing netplan isn't used either, just a guess, because that doesn't exist in FreeBSD, so then you would have a fork in the code. At some point that fork will have to happen, but in Buster, netplan is not yet required. I was honestly surprised when I saw that SCALE runs on Linux for this very reason. The only reason I went with SCALE is because I despise bhyve as a hypervisor.

In the end, I guess burning a couple of 10.0.0.0/8 subnets isn't the worst thing in the world to achieve the same goal, and it would definitely make things much cleaner for sure. In keeping with KISS, keeping it clean and simple is probably the best way.
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
It's interesting that the metric is set to 0 on all the interface gateways, which is not the typical behavior of Debian or Ubuntu, so something in the TrueNAS code is definitely setting that.

Oh, my. We've now moved on to paranoid-delusion-grade theories. iXsystems re-coded this so that it doesn't work the normal way, is that the new line? Have to tell you, you're still wrong.

Code:
user@debian-1090-desktop-amd64-lab1:~$ su
Password:
root@debian-1090-desktop-amd64-lab1:/home/user# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

auto ens161
iface ens161 inet dhcp

auto ens192
iface ens192 inet dhcp

auto ens224
iface ens224 inet dhcp

auto ens256
iface ens256 inet dhcp

root@debian-1090-desktop-amd64-lab1:/home/user# /sbin/ifconfig -a
ens161: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.64.70.62  netmask 255.255.254.0  broadcast 10.64.71.255
        inet6 fe80::250:56ff:feab:f4d1  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:ab:f4:d1  txqueuelen 1000  (Ethernet)
        RX packets 3712  bytes 335136 (327.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 43  bytes 5906 (5.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.64.70.59  netmask 255.255.254.0  broadcast 10.64.71.255
        inet6 fe80::250:56ff:feab:3f51  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:ab:3f:51  txqueuelen 1000  (Ethernet)
        RX packets 3808  bytes 349182 (340.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 54  bytes 7232 (7.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens224: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.64.70.83  netmask 255.255.254.0  broadcast 10.64.71.255
        inet6 fe80::250:56ff:feab:915f  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:ab:91:5f  txqueuelen 1000  (Ethernet)
        RX packets 3756  bytes 341472 (333.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 46  bytes 6330 (6.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens256: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.64.70.51  netmask 255.255.254.0  broadcast 10.64.71.255
        inet6 fe80::250:56ff:feab:2dbd  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:ab:2d:bd  txqueuelen 1000  (Ethernet)
        RX packets 4741  bytes 1298122 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 608  bytes 56035 (54.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 8  bytes 480 (480.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 480 (480.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@debian-1090-desktop-amd64-lab1:/home/user# /sbin/route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.64.70.1      0.0.0.0         UG    0      0        0 ens256
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens256
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens192
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens224
10.64.70.0      0.0.0.0         255.255.254.0   U     0      0        0 ens161
root@debian-1090-desktop-amd64-lab1:/home/user#


So there we have it. A default Debian 10.9.0, all DHCP, all on the same network, metrics all zero. iXsystems did NOT rewrite Linux DHCP or Linux IP networking. They have always used available bits to the maximum extent possible. This is not some insidious plot to subvert this bizarre and broken use case. It's improper IPv4 networking design for a host to have multiple interfaces on an L2 network. Behaviours vary, but are often not what you would expect.

So I don't even understand why you're hung up on metrics. If unique metrics were set, that would cause all traffic to flow out a single interface, the one with the lowest metric. That STILL proves my point about multiple interfaces on a single subnet not working "correctly" for the definition of "correctly" that I'm debunking.

It really doesn't change the behavior you've demonstrated, but if your ens192 and ens224 are on a different subnet and you did the test from that subnet (not over a router), the packets would still go out the proper route, not the default, which was the crux of the point I was making.

But that's not the point of this thread. The point of this thread isn't about how things become less-broken when you move closer to a proper network design. I've always advocated proper network design. I'm gleeful to teach proper network design.

You were explaining to me how this was all supposed to work correctly on a single L2 network. My point has always been that this does not work the way people generally expect. UNIX and Linux both egress packets on an interface selected by the routing table, which is inherently based on the DESTINATION address, not the SOURCE address people imagine when they think of ethernet interfaces and binding sockets to an IP address that happens to be assigned to one of those ethernet interfaces. I *know* they expect all traffic to/from that bound IP address to egress/ingress via the related physical ethernet interface.

The entire point of this lesson is that this doesn't happen, and because it doesn't happen, if you hypothetically have 10 ESXi hosts, and you put 10 gigabit ethernet interfaces in your NAS, numbered 192.168.0.[10...19] on a L2 network and have your first ESXi host connect to .0.10, the second to .0.11, the third to .0.12, etc., people EXPECT that the NAS is going to transmit traffic to each ESXi through a "dedicated" ethernet port. For ingress traffic to the NAS, this is actually going to happen, since neither FreeBSD nor Debian are overly promiscuous about ARP processing, but for egress traffic, you get the single interface selected by the route table. And so you only get 1Gbps total output from the NAS towards *ALL* of your ESXi's, going over that single selected interface.
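A quick way to check this on any Linux box, using hypothetical addresses in the spirit of that example (the "from" address stands in for a bound source; absent policy routing, it does not change the selected device):

Code:
# Ask the kernel which interface it would use to reach an ESXi host at
# 192.168.0.200 (hypothetical), with and without a specific bound source:
$ ip route get 192.168.0.200
192.168.0.200 dev eth0 src 192.168.0.10 uid 0
    cache
$ ip route get 192.168.0.200 from 192.168.0.12
192.168.0.200 from 192.168.0.12 dev eth0 uid 0
    cache
# Same device either way: the lookup is destination-based, which is why all
# of those "dedicated" ports drain out a single NIC on egress.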

This is all stuff I explained back in 2014. It hasn't really changed.
 

jlw52761

Explorer
Joined: Jan 6, 2020
Messages: 87
Actually, "Now, the key to why having multiple interfaces on the same subnet for the purposes of load balancing in a storage network is, the storage subnet being different than the one the console (or default gateway) is using." So I didn't specifically say all interfaces on the same L2 segment, just the storage interfaces on the same L2 segment and management treated on a separate L2. I have several times made that distinction after noticing in my initial post I did not make that distinction unfortunately. I have also agreed with you that if you only have a single L2, then you do not want to do what I propose because you will have major issues.
As for the behavior, yes, traffic from the NAS to ESXi will come out of one of many of the ports in that network segment, and that's ok because, for this use case, iSCSI is fine with that, as long as it goes to the initiator that sent it, which it would.
As for the "conspiracy theory", that's a bit much. I was simply stating that your routing table is not the typical behavior I've seen over the years with Debian and multiple interfaces, where it just shoves them all as metric 0. The behavior I've always seen is reflected in my screenshot of my vanilla system, whereas yours is from TrueNAS, and is different. It's not a stretch for me to hypothesize that there may be some mechanism that is acting different because of TrueNAS, and I think I also correctly surmised that it could be because the codebase is targeting multiple OS's with different behavior.
I have also agreed with you in the end, that having two interfaces in different subnets to split across the switches when vPC is not available is most likely the better and cleaner solution, so why do you insist on taking such as offensive stance towards me? If you are as tired of discussing it as you claim, then just ignore the post, stop responding to it or similar, and move on with your life.
At this point, there is no further benefit from discussing further or even looking at this post, so I will not.
 

jgreco

Resident Grinch
Joined: May 29, 2011
Messages: 18,680
So I didn't specifically say all interfaces on the same L2 segment, just the storage interfaces on the same L2 segment and management treated on a separate L2.

It doesn't matter. N>1 interfaces of a UNIX host on an L2 segment is problematic. I don't care what you do with the OTHER interfaces that might also be on that host. That's a different network (or networks) and a separate problem.

for this use case, iSCSI is fine with that, as long as it goes to the initiator that sent it, which it would.

But that's not fine, because the behaviour people are expecting is that they can get greater-than-single-gigE output onto their storage network. This doesn't work.

I was simply stating that your routing table is not the typical behavior I've seen over the years with Debian and multiple interfaces, where it just shoves them all as metric 0. The behavior I've always seen is reflected in my screenshot of my vanilla system, whereas yours is from TrueNAS, and is different.

Everything I've shown you, including both TrueNAS and generic Debian, is set up with zero metrics.

Your screenshot of metrics for 10.27.200.0/24 ALSO shows metric 0.


So I don't see a difference and as I said there's no point to this anyways, as traffic will egress just one of them.

so why do you insist on taking such an offensive stance towards me?

Offensive? Because you keep posting factually incorrect statements. I'm sorry if you interpret that as "offensive". Some people are unable to take correction gracefully. My role over my career has often been to come in, establish the actual facts, and tell people the things that they don't want to hear. I am fine with them not liking the facts. I am also fine with my being unable to change facts. I am also used to people arguing the facts. I often make money correcting them, helping the client establish a path forward. I am the guy CTO's go to for a second opinion, when their own team isn't cutting it. I have had both successes and failures at educating people. It goes with the territory.

I basically do the same thing here in the forums, and here I do it for free.

If you are as tired of discussing it as you claim, then just ignore the post, stop responding to it or similar, and move on with your life.
At this point, there is no further benefit from discussing further or even looking at this post, so I will not.

Fine, I'll close the thread then.

People, at the end of the day, the story is as posted over at


The homework proving this for SCALE is enclosed in my posts above. My recommendation is still to move to faster networking rather than trying LACP or multiple network interfaces. Faster networking is much easier with fewer crufty sharp edges in arcane areas of networking knowledge.
 