Network struggles with SCALE

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
Hi all,

I've been struggling to get networking to work properly with TrueNAS SCALE. My goal is to have multiple network interfaces in multiple VLANs for segregation. Just to be clear, we're talking about different subnets: no subnet is shared between VLANs, each VLAN has its own /24 subnet.

The "issues" I've been running into and the questions I have ("issue" does not per se mean that there is something wrong with TrueNAS):

- Just one interface can be configured using DHCP.
- Which gateway takes preference, the DHCP or the standard gateway? When using DHCP it does not set the standard gateway.
- Each interface seems to use the same standard gateway.
- How do you configure a gateway for an interface?
- From what I've read the Kubernetes implementation used can use only one external subnet. Is this indeed true?
- It seems established traffic is being routed back through the interface that has DHCP set. Either my networking knowledge is a lot worse than I thought or that's really bad.

The network I'm testing this on allows traffic between all the VLANs with an allow-all rule.

My envisioned setup would be as follows:
1697977674460.png


en1/2 are 10 Gb/s, en3/4 are 1 Gb/s
en4 would be disabled at the switch level unless there are issues with the other interfaces. Since there are many avenues for issues I decided to simplify.
VLAN1: 192.168.1.42/24
VLAN2: 192.168.2.42/24
VLAN3: 192.168.3.42/24
VLAN4: 192.168.4.42/24
VLAN5, client network accessible through the gateway of each VLAN: 192.168.5.42/24

I've simplified for testing:
1697978029817.png


The WebUI is reachable on BOTH IPs from VLAN5 as long as both interfaces, en1 and en4, are connected and active.
- 192.168.2.42 is not reachable if en1 is disconnected.
- The WebUI is not reachable from VLAN5 if en4 is disconnected.
- The WebUI is reachable on 192.168.2.42 from a laptop connected to en1 with IP 192.168.2.41 while en4 is disconnected.
- The WebUI is reachable on 192.168.2.42 from VLAN2 when en4 is disconnected.

This leads me to believe that established traffic is routed back over the default gateway of VLAN1 on en4. The weird thing is that en4 seems to keep its IP configuration even though the NIC en4 is down; I can see this when I log in using the iDRAC virtual console.

I've tried setting the default gateway to 192.168.2.1 (the default gateway for VLAN2), however that does not make 192.168.2.42 available outside of VLAN2. It seems to ignore the default gateway.

The question is: was TrueNAS SCALE indeed designed to be segregated on a network? As that's fairly standard and TrueNAS is meant to be an enterprise-level appliance, I would assume so.

How would one go about defining multiple VLANs and routing the network traffic within TrueNAS properly?
 

Attachments

  • 1697977215634.png (12.8 KB)
  • 1697977462705.png (13.4 KB)

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
- Just one interface can be configured using DHCP.
Yes, of course.

- Which gateway takes preference, the DHCP or the standard gateway? When using DHCP it does not set the standard gateway.
If the DHCP server defines a default gateway that takes precedence.

- Each interface seems to use the same standard gateway.
It's called "default gateway" because it is the default of the system, not the interface.

- How do you configure a gateway for an interface?
You can't. There is only a single IP stack and a single routing table.

- From what I've read the Kubernetes implementation used can use only one external subnet. Is this indeed true?
Yes.

- It seems established traffic is being routed back through the interface that has DHCP set. Either my networking knowledge is a lot worse than I thought or that's really bad.
That follows directly from the fact that there is only a single global routing table and a single default gateway. Of course traffic originating in a particular VLAN will stay in that VLAN.

You might want to read this resource:
 

nemesis1782

Yes, of course.
Thank you for the reply. This is not as logical as you make it seem though ;) It also DOES not function separately from the interface, well, not completely. So it is not as "of course" as you say.

If the DHCP server defines a default gateway that takes precedence.
For the interface the DHCP is configured on right? A device can have multiple interfaces which should be segregated.

You can't. There is only a single IP stack and a single routing table.
From what I read, that is not the case for TrueNAS CORE. It is also extremely counterintuitive for an enterprise-level system, ESPECIALLY a storage system.

That follows directly from the fact that there is only a single global routing table and a single default gateway. Of course traffic originating in a particular VLAN will stay in that VLAN.
What do you mean by the last part? The way I interpret it, it clashes with what you've said before. From my testing this also does not seem to be the case.
I hope we can agree that this is a choice made by TrueNAS and not a limitation of Kubernetes.

You might want to read this resource:
I already read this. It ends in a sentence that I often miss and look for, but have yet to find on this forum, tbh.

"I am happy to discuss ways to do your IP networking within this framework to make all your systems happy though."

So let's continue on that notion. Since you did not react to my goals, just to my test conclusions, I assume you understand what I want to achieve and that this does not sound out of the ordinary to you?

So how would one achieve proper network segregation, as one would with any other enterprise-level storage solution or virtualization platform? Which is what TrueNAS appears to be according to: https://www.truenas.com/

I'll add another question I just thought of: which parts are considered "TrueNAS, do not tweak manually, it will break during an update", and what can I tweak manually? I can get what I need working by working around TrueNAS, but that's not really something I want, since it's generally not a good idea. I do want to be able to upgrade at some point, after all.
 

Patrick M. Hausen

For the interface the DHCP is configured on right? A device can have multiple interfaces which should be segregated.
Yes, coulda/shoulda/wanna but they are not. Neither in SCALE nor in CORE.

All interfaces run the same single IP stack, routing table, etc.

You can of course use multiple interfaces as layer 2 connections for e.g. jails or VMs in CORE. The NAS does not even need, and frequently does not have, an IP address on these. Sort of like a single VLAN vSwitch. That's probably what you mean by segregation.
Look for bridge interfaces for that.

As soon as the NAS has got an IP address on an interface it's working with the same single IP stack.

I hope we can agree that this is a choice made by TrueNAS and not a limitation of Kubernetes.
I guess so. But it's a fact in TrueNAS at the moment. I don't run Kubernetes. I prefer jails for containerization. So much that I run two data centres with more than 1000 of them in total :wink:

Back to segregation - you can layer 2 segregate VMs. And of course you can attach the NAS to several VLANs. And if a host with e.g. 192.168.4.101 in VLAN 4 contacts a service on the NAS at 192.168.4.1 in VLAN 4, neither these packets nor the replies will go through the default route. It is strictly local traffic to that VLAN. ARP etc. ... neighbor discovery in IPv6.

Now if you have a client in VLAN 3 that sends a packet through its default gateway to the TrueNAS in VLAN 4 ... of course the TrueNAS has no other choice but to send the reply to its default gateway. Unless there is a local static route.

You need to keep in mind that TN does strict host based routing based on destination address and nothing else. Destination address on a locally connected network/VLAN --> that way. Destination address on a network where I do not have a direct connection to --> default gateway.
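
A minimal Python sketch of that destination-only lookup, for illustration. The route entries mirror the simplified test setup from the first post; the gateway address 192.168.1.1 and the function name `pick_route` are assumptions, and this models the decision only, not how TrueNAS implements it.

```python
import ipaddress

# Hypothetical single routing table for a host with two connected subnets.
# The default gateway 192.168.1.1 is an assumption for illustration.
ROUTES = [
    # (destination network, outgoing interface, next hop or None if directly connected)
    (ipaddress.ip_network("192.168.1.0/24"), "en4", None),
    (ipaddress.ip_network("192.168.2.0/24"), "en1", None),
    (ipaddress.ip_network("0.0.0.0/0"), "en4", "192.168.1.1"),
]


def pick_route(destination):
    """Return (interface, next_hop) using longest-prefix match on the destination only."""
    dst = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if dst in route[0]]
    # The most specific matching network wins; the default route matches everything.
    _network, interface, next_hop = max(matches, key=lambda route: route[0].prefixlen)
    # A directly connected destination is its own next hop.
    return interface, next_hop if next_hop is not None else str(dst)


print(pick_route("192.168.2.41"))  # neighbour on the VLAN2 subnet
print(pick_route("192.168.5.10"))  # client behind a gateway
```

With this table a reply to a client in 192.168.5.0/24 always leaves through en4, regardless of which interface the request arrived on. The source address and the incoming interface never enter the lookup.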

It's neither a router nor a switch nor as capable in terms of running multiple IP stacks with different policies as e.g. VMware.

You need to carefully plan your network topology to achieve any kind of segregation, e.g. a separate storage network for iSCSI or NFS to some hypervisor.

HTH,
Patrick
 

nemesis1782

Yes, coulda/shoulda/wanna but they are not. Neither in SCALE nor in CORE.
Just trying to make sure we understand each other correctly.

You can of course use multiple interfaces as layer 2 connections for e.g. jails or VMs in CORE. The NAS does not even need, and frequently does not have, an IP address on these. Sort of like a single VLAN vSwitch. That's probably what you mean by segregation.
Look for bridge interfaces for that.
Well, yeah, that's a way to achieve segregation; layer 2 would indeed be optimal. There are many ways and it all depends on the specific implementation and use cases.

You describe jails, thus I assume you mean in the case of CORE or does this also work for SCALE in combination with VMs/Kubernetes?

I might be ignorant here. Is there a difference between a VLAN and NIC from the TrueNAS perspective? Both are interfaces and I would assume the bridge solution would work for both. Not saying it's something I want to do just curious.

Back to segregation - you can layer 2 segregate VMs. And of course you can attach the NAS to several VLANs. And if a host with e.g. 192.168.4.101 in VLAN 4 contacts a service on the NAS at 192.168.4.1 in VLAN 4, neither these packets nor the replies will go through the default route. It is strictly local traffic to that VLAN. ARP etc. ... neighbor discovery in IPv6.
This is indeed what I expected. I would've expected that at least established traffic would return to whence it came using layer 2. Not sure, this might have been a misconception.

You need to keep in mind that TN does strict host based routing based on destination address and nothing else. Destination address on a locally connected network/VLAN --> that way. Destination address on a network where I do not have a direct connection to --> default gateway.
Thank you. You say you have to keep in mind, I would counter with, thank you for adding it to my mind. Haven't read this anywhere yet so unambiguously. This does mean I CAN use VLANs and such even for TrueNAS things, hell, I can even add one for my homeIO network, as long as there is a route or it's within the same subnet. For me this will probably be the latter.

It's neither a router nor a switch nor as capable in terms of running multiple IP stacks with different policies as e.g. VMware.
This is what I'm indeed learning. Which for my use case is probably fine. It is a shame though.

You need to carefully plan your network topology to achieve any kind of segregation, e.g. a separate storage network for iSCSI or NFS to some hypervisor.
This is what I'm trying to do, well, actually what I did. The plan, however, did not fit reality, so it's back to the drawing board. I have a setup in mind that might work; I'll try to post it tomorrow.

HTH,
Patrick
It's definitely helpful. I feel I'm probably not 100% at a full understanding yet though, and I'll have to get my feet wet first. It's been a while since I've done any networking on this level.

Thanks,

Davy
 

Patrick M. Hausen

I might be ignorant here. Is there a difference between a VLAN and NIC from the TrueNAS perspective? Both are interfaces and I would assume the bridge solution would work for both.
No. Same same. Physical interface, VLAN - just interfaces. You can bridge and route both to your heart's content. I run everything (1x CORE, 1x ESXi, 1x SCALE) in my home lab as VLANs on top of LAGG (or that ESXi proprietary half-assed link aggregation because LACP needs a full vSphere license).

You describe jails, thus I assume you mean in the case of CORE or does this also work for SCALE in combination with VMs/Kubernetes?
The problem is that you can assign the bridge interface or even a couple of them for each jail individually in CORE. While there's only a single assignment for all apps in SCALE. Not claiming this is an inherent property of K8S, but it's a fact in the current implementation of SCALE.

Another reason why I prefer to stick with CORE. All features nice and orthogonal to each other and combinable in whatever ways you choose.

Thank you. You say you have to keep in mind, I would counter with, thank you for adding it to my mind. Haven't read this anywhere yet so unambiguously.
Well, that's essentially routing 101 in the IP model. Routing works based on destination address only since the invention of IP. Anything different is a proprietary feature of a specific platform. TrueNAS uses what run-of-the-mill Linux or FreeBSD bring.

Which for my use case is probably fine. It is a shame though.
But then VMware only connects virtual machines and uses external storage as a client. It does not provide any storage services in itself. Most frequently people complain about a lack of separation of the control plane (web UI) and the storage services. To which one can only answer "deal with it", because that's the state of the software. No separate "Management Network" as in ESXi.
 

nemesis1782

The problem is that you can assign the bridge interface or even a couple of them for each jail individually in CORE. While there's only a single assignment for all apps in SCALE. Not claiming this is an inherent property of K8S, but it's a fact in the current implementation of SCALE.
I think SCALE runs k3s, not sure though. I indeed already came to that conclusion.

But then VMware only connects virtual machines and uses external storage as a client. It does not provide any storage services in itself. Most frequently people complain about a lack of separation of the control plane (web UI) and the storage services. To which one can only answer "deal with it", because that's the state of the software. No separate "Management Network" as in ESXi.
This is not necessarily true, see https://www.vmware.com/products/vsan.html (not saying you should use it). I used it at one point when it was fairly new, you don't want to know...

To which one can only answer "deal with it", because that's the state of the software. No separate "Management Network" as in ESXi.
For home usage I agree. As TrueNAS is trying to position itself as an enterprise level system I STRONGLY disagree! Security is something too many companies are WAY too lax in, and too many people just give in.

Well, that's essentially routing 101 in the IP model. Routing works based on destination address only since the invention of IP. Anything different is a proprietary feature of a specific platform. TrueNAS uses what run-of-the-mill Linux or FreeBSD bring.
Well, I'd disagree on that. It depends: a router on layer 3 can and may take other things into consideration, and there are IEEE standards that describe this behavior. I do not know them by heart and would have to look them up though.
 

Patrick M. Hausen

For home usage I agree. As TrueNAS is trying to position itself as an enterprise level system I STRONGLY disagree!
An enterprise level storage system does storage. Put it in the isolated storage VLAN and be done with it.
If it's not block storage for hypervisors but a fileserver for your office, put it in the office VLAN and be done with it.

About routing:

(4) The router examines the destination IP address of the IP
datagram, as described in Section [5.2.3], to determine how it
should continue to process the IP datagram. There are three
possibilities:

o The IP datagram is destined for the router, and should be
queued for local delivery, doing reassembly if needed.

o The IP datagram is not destined for the router, and should be
queued for forwarding.

o The IP datagram should be queued for forwarding, but (a copy)
must also be queued for local delivery.
Note the initial "the router examines the destination IP address of the IP datagram".

There is no other data on which to base a forwarding decision but the destination address in a regular router implementation. That was the genius of the design of IP!

Of course all sorts of commercial products include policy routing features. As do Linux and FreeBSD. But no system does any of this by default.
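
For contrast, a toy Python model of what source-based policy routing adds on top of the plain destination lookup. This is not the Linux or FreeBSD API; the rule list, the table names `main` and `vlan2`, and the gateway addresses are made up for illustration.

```python
import ipaddress

# Toy model only. Each table maps destination prefixes to (interface, next hop).
# All names and gateway addresses here are illustrative assumptions.
TABLES = {
    "main": [
        ("192.168.1.0/24", "en4", None),
        ("192.168.2.0/24", "en1", None),
        ("0.0.0.0/0", "en4", "192.168.1.1"),
    ],
    "vlan2": [
        ("192.168.2.0/24", "en1", None),
        ("0.0.0.0/0", "en1", "192.168.2.1"),
    ],
}

# Rules are checked in order; the first matching source prefix selects a table.
RULES = [
    ("192.168.2.42/32", "vlan2"),
    ("0.0.0.0/0", "main"),
]


def lookup(table, destination):
    """Longest-prefix match on the destination inside one table."""
    dst = ipaddress.ip_address(destination)
    entries = [(ipaddress.ip_network(net), dev, via) for net, dev, via in TABLES[table]]
    _net, dev, via = max((e for e in entries if dst in e[0]), key=lambda e: e[0].prefixlen)
    return dev, (via if via is not None else destination)


def route_reply(source, destination):
    """Pick a table from the packet's source address, then do the usual lookup."""
    src = ipaddress.ip_address(source)
    for prefix, table in RULES:
        if src in ipaddress.ip_network(prefix):
            return lookup(table, destination)
    raise LookupError("no rule matched")


print(route_reply("192.168.2.42", "192.168.5.10"))
print(route_reply("192.168.1.42", "192.168.5.10"))
```

In this model a reply sent from 192.168.2.42 leaves via en1 and 192.168.2.1, while every other source still uses the single default gateway. The extra rule layer is exactly what a default host configuration does not have.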
 

nemesis1782

I think the original comment was based on ARP though, which is layer 2. According to this document that would then classify as a bridge. I think we got a bit sidetracked, since me forgetting about ARP has little to do with routers.

An enterprise level storage system does storage. Put it in the isolated storage VLAN and be done with it.
If it's not block storage for hypervisors but a fileserver for your office, put it in the office VLAN and be done with it.

About routing:


Note the initial "the router examines the destination IP address of the IP datagram".

There is no other data on which to base a forwarding decision but the destination address in a regular router implementation. That was the genius of the design of IP!

Of course all sorts of commercial products include policy routing features. As do Linux and FreeBSD. But no system does any of this by default.
I disagree with your interpretation though. Yes, routers only use the destination address in the datagram. This is however not the only information it uses to make a decision. The routing table is specified as well.


I will most definitely concede that according to the referenced RFC, which we should take as the truth. In this case, if there are two "incoming" interfaces which require different destination routes, this would not be a function of the router. You would instead require a router and routing table per interface.

I, like many, do not always use exactly the correct term in every situation.
 

Patrick M. Hausen

I will most definitely concede that according to the referenced RFC, which we should take as the truth. In this case, if there are two "incoming" interfaces which require different destination routes, this would not be a function of the router.
The RFC describes entire systems with possibly hundreds of interfaces. A router is a system that forwards packets and makes forwarding decisions based on layer 3 destination addresses. Pick any shiny new Cisco Nexus whatever for thousands of dollars and it will do exactly that.

The incoming interface as well as the source address are irrelevant for the routing process. How often does one need to repeat this?

You would instead require a router and routing table per interface.
One can wish for many things.

Fact: A TrueNAS host today (both CORE and SCALE) only has one routing table and one instance of what could be called a router.

Conclusion: The observed behavior of a TrueNAS host is perfectly in accordance with all standards and not in any way "messed up" or whatever some people call it.

Conclusion II: One might propose a feature request for policy based routing or isolation of the K3S IP stack/routing table from the host's. If the latter is possible in Linux. Chances of ever getting that implemented will depend on the question if iX see a business case, probably.

N.B. Actually CORE has got an advantage here with jails vs. the current state of Docker/containerd on SCALE, because VNET jails do have a completely separate virtual IP stack, can be connected by strictly layer 2 interfaces and communicate completely independent of the host's configuration.
 

nemesis1782

As for the network for my TrueNAS: I've been contemplating this as well and will be testing the following setup. Please let me know if you spot errors:
1698152380239.png

Sorry for the random iconography. The idea is basically that I have a single IP the management interface is available on. The only things in this VLAN and subnet are TrueNAS and the firewall. The firewall will then make it available to the rest of the network in the management subnet/VLAN.

It's not the prettiest, but this achieves basically the same thing, especially if I make the subnet of VLAN666 as small as possible. This will also allow me to connect a laptop to en4 and always be able to access TrueNAS. Well, unless I screw up, which is always a possibility!

I'll work this out further and test it when I can get to it; for now work is keeping me distracted ;) I'll post as soon as I have more questions and/or results.
 

nemesis1782

One can wish for many things.

Fact: A TrueNAS host today (both CORE and SCALE) only has one routing table and one instance of what could be called a router.

Conclusion: The observed behavior of a TrueNAS host is perfectly in accordance with all standards and not in any way "messed up" or whatever some people call it.

Conclusion II: One might propose a feature request for policy based routing or isolation of the K3S IP stack/routing table from the host's. If the latter is possible in Linux. Chances of ever getting that implemented will depend on the question if iX see a business case, probably.

N.B. Actually CORE has got an advantage here with jails vs. the current state of Docker/containerd on SCALE, because VNET jails do have a completely separate virtual IP stack, can be connected by strictly layer 2 interfaces and communicate completely independent of the host's configuration.
I thought we were talking of the routing principle in this case and not the specific implementation of TrueNAS (which is not a router in any case as you already stated and I accepted, non verbally).

@Conclusion: Yes, I agree it seems to be in accordance with the design. Whether the design is sound is another matter and is difficult to determine without knowing what the actual requirements are.

@Conclusion II: Well, that is actually not a bad idea. But before I do anything of the kind I would need to be sure what the requirements would be and that I understand why things are as they are. There might be good reasons I do not understand ;) Or that aren't directly apparent.

@core: Yeah, I already gathered that from our first conversations. I'll probably set up two test systems to play with (one CORE and one SCALE). I was indeed assuming I would have this control over layer 2.
 

Patrick M. Hausen

@core: Yeah, I already gathered that from our first conversations. I'll probably set up two test systems to play with (one CORE and one SCALE). I was indeed assuming I would have this control over layer 2.
With SCALE only for VMs. With CORE for VMs and jails.

About your design in the post above: looks ok. Two things come to my mind:

1. I vaguely remember that the number of ports in an LACP bundle should be a power of 2. I might remember wrong.

2. If you plan to access apps (K3S, containerd, ...) across VLANs (so I assume with your firewall involved) you might need to NAT to the "app VLAN" to avoid asymmetric routing. Which seems to always start these discussions :wink:
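
A small Python sketch of why source NAT at the firewall can sidestep the asymmetric path. It assumes the apps are bound to 192.168.4.0/24, that the NAS default gateway sits in a different VLAN, and that the firewall owns 192.168.4.254 in the app VLAN; all of these addresses and function names are hypothetical.

```python
import ipaddress

# Assumed addressing, for illustration only.
APP_VLAN = ipaddress.ip_network("192.168.4.0/24")  # subnet the apps are bound to
FIREWALL_IN_APP_VLAN = "192.168.4.254"             # firewall's own address in that subnet


def source_seen_by_nas(client_ip, snat):
    """Source address on the packet when it reaches the NAS."""
    return FIREWALL_IN_APP_VLAN if snat else client_ip


def reply_path(source_ip):
    """The NAS picks the reply path purely from the reply's destination address."""
    if ipaddress.ip_address(source_ip) in APP_VLAN:
        return "direct on app VLAN"
    return "via default gateway"


client = "192.168.3.10"  # hypothetical client in another VLAN
print(reply_path(source_seen_by_nas(client, snat=False)))
print(reply_path(source_seen_by_nas(client, snat=True)))
```

Without NAT the reply heads for the default gateway, which under the stated assumption is on another interface than the one the request came in on. With source NAT the NAS only ever sees a neighbour in the app VLAN, so the reply goes back the same way.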

HTH,
Patrick
 

nemesis1782

1. I vaguely remember that the number of ports in an LACP bundle should be a power of 2. I might remember wrong.
Not sure tbh. I wanted to post that I tested with 1 and that works, but after realizing that's a power of two as well I regretfully have to refrain :P

Currently I have 2 cables so that's a power of two and should be fine regardless.

2. If you plan to access apps (K3S, containerd, ...) across VLANs (so I assume with your firewall involved) you might need to NAT to the "app VLAN" to avoid asymmetric routing. Which seems to always start these discussions :wink:
I was planning to terminate the "main" network in a VLAN. But in hindsight it might be more prudent to use it untagged directly on the LAG and add VLANs for VMs if I need them.

I'll sleep on it and I'll try to update tomorrow.

Which seems to always start these discussions :wink:
Why did I feel a cold chill while reading that? :P
 

Patrick M. Hausen

Not sure tbh. I wanted to post that I tested with 1 and that works, but after realizing that's a power of two as well I regretfully have to refrain :P

Currently I have 2 cables so that's a power of two and should be fine regardless.
I read your diagram as three interfaces in that lagg - that was the reason for my comment.
 

nemesis1782

Hi, just wanted to add that I have it running now in a configuration that I find workable and wanted to add the things I specifically found for SCALE.

- A dedicated management LAN is possible, although a bit janky:
-> 1. Use a VLAN or a physical LAN adapter in TrueNAS.
2. Make the network independent from ANY other network by using an unused subnet and VLAN/physical wiring. I would suggest a /31, which gives you 2 addresses.
3. Use another device to create a NAT or a PAT (I would recommend the latter, so only the port you want is available) from your management network to the address of TrueNAS in the /31 network.
4. Configure the web interface to be available only on the IP in the /31 subnet for management.
+ This has the added benefit that you can connect any device in place of the device that NATs, so you have direct access in the same subnet.
+ The interface that exposes your SMB and such will then not expose the WebUI.
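
The /31 arithmetic can be checked with Python's standard `ipaddress` module. The prefix 10.66.6.0/31 below is a made-up example, not the subnet used in this setup. A /31 is the point-to-point case described in RFC 3021; a device that does not accept it would presumably need a /30 instead.

```python
import ipaddress

# 10.66.6.0/31 is a made-up example prefix for the management link.
mgmt = ipaddress.ip_network("10.66.6.0/31")
addresses = [str(address) for address in mgmt]

# Fallback for devices that do not accept a /31 on an interface.
fallback = ipaddress.ip_network("10.66.6.0/30")

print(mgmt.num_addresses, addresses)
print(fallback.num_addresses)
```

One address goes to TrueNAS and the other to the NAT device (or to a laptop plugged in for direct access). The /30 carries four addresses, of which the network and broadcast addresses are conventionally not assigned to hosts.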

- If you get Kubernetes networking errors, they usually mention a CIDR mismatch. Go into the settings of Apps and fix the config. If the subnet changed, you'll need to reboot for the dockers to become available. Some dockers took a few reboots for some reason. I often find the advice to delete and recreate everything, ixsystems dataset and all; that seems to be unnecessary in most cases, actually.

- Kubernetes cannot be completely separated from TrueNAS itself (with the standard feature set). The interface you use has to have an IP configured on TrueNAS, and TrueNAS requires a CIDR for it to be configured.
- This brings me to my next point, which still requires more testing. Selecting the interface and IPv4 gateway in Kubernetes CAN AFFECT the behavior of TrueNAS.
-> A fun test to reproduce these issues is:
1. Set the default system gateway to 10.6.6.6.
2. Make sure you have an interface in this subnet and that you do not have DHCP enabled (I have not tested with DHCP yet).
--> I chose bond1 in this case.
3. Now go to Apps->Settings and set the interface to, say, bond2. Give it 10.6.6.6 as the gateway. Save.
? Since you set the interface, you would assume that this would be handled on layer 2 and that traffic would go straight out that interface. Of course, since it's an Apps setting, one would also assume that this would not affect the default system behaviour. Don't get me wrong, I understand why it breaks; knowing doesn't fix the inherent issue though.
4. TrueNAS is no longer reachable, because it sends all traffic to bond2, the Apps interface in a completely separate network, which just happens to share the same subnet.
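
The underlying ambiguity in that test can be spotted ahead of time by checking whether two interfaces claim overlapping subnets. A minimal Python sketch follows; the interface addresses are hypothetical (the post does not list them) and `overlapping_interfaces` is an illustrative helper, not a TrueNAS function.

```python
import ipaddress
from itertools import combinations

# Hypothetical addressing that mirrors the test: two bonds on physically
# separate networks that happen to use the same subnet.
INTERFACES = {
    "bond1": "10.6.6.10/24",
    "bond2": "10.6.6.20/24",
    "bond3": "192.168.2.42/24",
}


def overlapping_interfaces(interfaces):
    """Return pairs of interface names whose configured subnets overlap."""
    networks = {name: ipaddress.ip_interface(addr).network for name, addr in interfaces.items()}
    return [
        (first, second)
        for first, second in combinations(sorted(networks), 2)
        if networks[first].overlaps(networks[second])
    ]


print(overlapping_interfaces(INTERFACES))
```

With a single routing table, a gateway such as 10.6.6.6 looks directly reachable on both flagged interfaces, so whichever route is installed last can take over traffic that was meant for the other network.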
 