Scale unable to ping internal network from inside a pod

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

As the title says, whenever I am inside a pod I cannot ping my internal network, and because of this the pod cannot resolve my internal DNS.

I run an internal DNS server at home; all my devices and computers, including Scale, can resolve names through my AdGuard Home server.

However, when I am inside a pod, e.g. emby, and I want to resolve a name inside my home network, the pod cannot communicate with my internal DNS server.

Ping from inside the emby pod:

Code:
I have no name!@emby-555474f-5x2gd:/app$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=63 time=1.27 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=63 time=0.562 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=63 time=0.611 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=63 time=0.562 ms
^C
--- 192.168.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3039ms
rtt min/avg/max/mdev = 0.562/0.752/1.274/0.301 ms
I have no name!@emby-555474f-5x2gd:/app$ ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
^C
--- 192.168.1.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2037ms

I have no name!@emby-555474f-5x2gd:/app$ ping 192.168.1.253
PING 192.168.1.253 (192.168.1.253) 56(84) bytes of data.
^C
--- 192.168.1.253 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2050ms

I have no name!@emby-555474f-5x2gd:/app$ ping 192.168.1.60
PING 192.168.1.60 (192.168.1.60) 56(84) bytes of data.
^C
--- 192.168.1.60 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1031ms



Ping from the scale host:

Code:
root@truenas[~]# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.874 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.22 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.874/1.047/1.220/0.173 ms
root@truenas[~]# ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
64 bytes from 192.168.1.254: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=64 time=0.307 ms

--- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.201/0.254/0.307/0.053 ms

root@truenas[~]# ping 192.168.1.253
PING 192.168.1.253 (192.168.1.253) 56(84) bytes of data.
64 bytes from 192.168.1.253: icmp_seq=1 ttl=64 time=0.157 ms
64 bytes from 192.168.1.253: icmp_seq=2 ttl=64 time=0.250 ms

--- 192.168.1.253 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1032ms
rtt min/avg/max/mdev = 0.157/0.203/0.250/0.046 ms

root@truenas[~]# ping 192.168.1.60
PING 192.168.1.60 (192.168.1.60) 56(84) bytes of data.
64 bytes from 192.168.1.60: icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from 192.168.1.60: icmp_seq=2 ttl=64 time=0.110 ms
64 bytes from 192.168.1.60: icmp_seq=3 ttl=64 time=0.128 ms

--- 192.168.1.60 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.089/0.114/0.140/0.025 ms

The host is able to ping my local devices, but the pods are not.

Has anyone encountered this type of issue before? Is there a policy set up within k3s that stops pods from communicating with anything on the 192.168.1.0/24 subnet?

Regards,
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
I have. I had policy-based routing (PBR) on my router which grabbed traffic from the NAS and forwarded it to a VPN connection.

One problem, as far as I can tell, is that kube-router forwards all traffic out of the container network to the default router, which is then left to redirect the traffic back into the LAN if that's where it was destined. This means that if, like me, you were preventing the redirect (or rather sending the traffic elsewhere before the redirect), the traffic will never reach the LAN. It also means that the router has to deal with all traffic egressing from the containers, whether it is going to the other side of the router (which is correct) or to the LAN (which isn't).

IX have declined to fix this.

Whether this is what you are seeing is another matter.
From a pod (with ping enabled), can you ping a LAN host?
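If it helps, a rough way to do that from the Scale shell (the ix-emby namespace below is just my guess at how SCALE names the app namespace; adjust it to whatever "kubectl get pods -A" shows):

Code:
# find the pod you want to test from
kubectl get pods -A | grep emby

# open a shell in it (namespace and pod name here are examples only)
kubectl exec -it -n ix-emby emby-555474f-5x2gd -- /bin/sh

# then, from inside the pod, try a LAN host
ping -c 3 192.168.1.254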
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
I don't know what really happened with my NAS. When I first posted, I wasn't able to ping anything in my local network other than my gateway and my Scale host IP. Today I saw your post, so I went back and checked whether the issue was still the same, and now I am able to ping all the machines on my network.

The only thing I changed recently was creating a new bridge for my second ethernet card and configuring k3s to use that card instead of my primary one.
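For what it's worth, a quick way to sanity-check which physical port ended up in which bridge on the host would be something like this (just standard iproute2 commands run from the Scale shell):

Code:
# every interface with its state and assigned addresses
ip -br addr show

# which ports are enslaved to which bridge
bridge link show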

At the time I made this change, I wasn't able to ping anything.

I am very confused; you can see the ping output in the first post, and at that time nothing worked.

Anyway, thanks for your input.

I will update this post if anything changes.
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
The issue came back: I restarted the NAS and now can't ping my local network from a pod.

Code:
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:46475->192.168.1.253:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:55716->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:41012->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:35775->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:58381->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:55704->192.168.1.253:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:38761->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:36745->192.168.1.254:53: i/o timeout
[ERROR] plugin/errors: 2 432904515568677909.1526139681791497665. HINFO: read udp 172.16.9.93:45694->192.168.1.254:53: i/o timeout


CoreDNS is unable to communicate with my DNS servers.
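For anyone wanting to check the same thing, something like this should pull the CoreDNS log shown above (the k8s-app=kube-dns label is the usual k3s default; I'm assuming Scale keeps it):

Code:
# tail the CoreDNS logs and watch for upstream i/o timeouts
kubectl -n kube-system logs -f -l k8s-app=kube-dns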

Where can I open a support ticket?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
The simple answer is that you can't.
Your support is here on this forum, or on Reddit. IX will not respond to support tickets from a non-commercial (non-paying) customer.

Now, if you can prove it's a bug - that's a different matter. But this is likely a config issue.

From the pod / container - can you run a traceroute? If so, can you run a traceroute to the DNS server (LAN) and post the result please. Also a traceroute to 8.8.8.8 for comparison, please.
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

I can't run a traceroute from the pod because the tool is not installed. I can ping and curl, though.

Code:
/ $

--- FIRST PING: Able to PING Gateway
/ $ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
64 bytes from 192.168.1.1: seq=0 ttl=42 time=1.082 ms
64 bytes from 192.168.1.1: seq=1 ttl=42 time=0.268 ms
^C
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.268/0.675/1.082 ms

--- SECOND PING: Able to PING MAIN Interface
/ $ ping 192.168.1.10
PING 192.168.1.10 (192.168.1.10): 56 data bytes
64 bytes from 192.168.1.10: seq=0 ttl=42 time=0.055 ms
64 bytes from 192.168.1.10: seq=1 ttl=42 time=0.085 ms
64 bytes from 192.168.1.10: seq=2 ttl=42 time=0.065 ms
^C
--- 192.168.1.10 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.055/0.068/0.085 ms

The IPs below are my local DNS server and another VM hosted on Scale. I can't ping anything.

/ $ ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254): 56 data bytes
^C
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ $ ping 192.168.1.200
PING 192.168.1.200 (192.168.1.200): 56 data bytes
^C
--- 192.168.1.200 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss


If you know a toolbox pod I can use for traceroute, let me know. I am going to try to find one.
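Something like the netshoot image looks like it should work as a throwaway toolbox with traceroute, dig and tcpdump included (the image name and this exact invocation are just a sketch, not anything TrueNAS-specific):

Code:
# start a temporary troubleshooting pod and drop into a shell
kubectl run -it --rm netshoot --image=nicolaka/netshoot --restart=Never -- /bin/bash

# then, inside the pod
traceroute 192.168.1.254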
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

Here's my traceroute:

Traceroute from a pod:
Code:
/ # traceroute 192.168.1.254
traceroute to 192.168.1.254 (192.168.1.254), 30 hops max, 46 byte packets
 1  172.20.0.1 (172.20.0.1)  0.018 ms  0.009 ms  0.004 ms
 2  *  192.168.1.1 (192.168.1.1)  0.258 ms  *
 3  *  *  *
 4  *  *  *
 5  *  *  *
 6  *  *  *
 7  *  *  *
 8  *  *  *


Traceroute from the Scale host to the same server:
Code:
root@truenas[~]# traceroute 192.168.1.254
traceroute to 192.168.1.254 (192.168.1.254), 30 hops max, 60 byte packets
 1  192.168.1.254 (192.168.1.254)  0.301 ms  0.497 ms  0.568 ms
root@truenas[~]#


As you can see, the host is able to reach my DNS server, but the pod is not. This DNS server is located on the Scale node.

Traceroute to my second local DNS server from a pod:

Code:
/ # traceroute 192.168.1.253
traceroute to 192.168.1.253 (192.168.1.253), 30 hops max, 46 byte packets
 1  172.20.0.1 (172.20.0.1)  0.020 ms  0.008 ms  0.004 ms
 2  *  *  192.168.1.1 (192.168.1.1)  0.384 ms
 3  *  *  *


Traceroute to my second local DNS server from scale host:
Code:
root@truenas[~]# traceroute 192.168.1.253
traceroute to 192.168.1.253 (192.168.1.253), 30 hops max, 60 byte packets
 1  192.168.1.253 (192.168.1.253)  0.642 ms  0.656 ms  0.640 ms
root@truenas[~]#


Again, the pod is not able to reach it, but the host is. This DNS server is located on a different machine, outside of Scale.

Traceroute google from a pod:

Code:
 # traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
 1  172.20.0.1 (172.20.0.1)  0.009 ms  0.007 ms  0.005 ms
 2  192.168.1.1 (192.168.1.1)  0.596 ms  *  *
 3  *  *  *
 4  172.20.134.193 (172.20.134.193)  7.832 ms  9.777 ms  13.931 ms
 5  *  *  *
 6  *  *  *
 7  172.20.178.37 (172.20.178.37)  26.042 ms  172.20.178.33 (172.20.178.33)  18.162 ms  172.20.178.37 (172.20.178.37)  19.773 ms
 8  *  *  *
 9  172.20.170.158 (172.20.170.158)  25.807 ms  172.20.103.54 (172.20.103.54)  14.329 ms  172.20.170.154 (172.20.170.154)  20.345 ms
10  185.153.237.152 (185.153.237.152)  17.939 ms  *  *
11  185.153.237.153 (185.153.237.153)  19.895 ms  142.250.162.44 (142.250.162.44)  12.796 ms  185.153.237.155 (185.153.237.155)  12.725 ms
12  *  74.125.242.97 (74.125.242.97)  14.872 ms  *
13  142.250.215.125 (142.250.215.125)  17.158 ms  142.251.52.143 (142.251.52.143)  22.264 ms  dns.google (8.8.8.8)  14.035 ms


Traceroute google from host:
Code:
root@truenas[~]# traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.845 ms * *
 2  * * *
 3  172.20.134.193 (172.20.134.193)  18.214 ms  18.205 ms  18.265 ms
 4  * * *
 5  * * *
 6  172.20.178.37 (172.20.178.37)  19.275 ms *  23.274 ms
 7  * * *
 8  172.20.103.54 (172.20.103.54)  17.243 ms 172.20.170.146 (172.20.170.146)  17.245 ms 172.20.170.154 (172.20.170.154)  14.881 ms
 9  * * 185.153.237.152 (185.153.237.152)  19.922 ms
10  142.250.162.44 (142.250.162.44)  16.404 ms 185.153.237.155 (185.153.237.155)  8.421 ms 185.153.238.159 (185.153.238.159)  11.865 ms
11  * * *
12  dns.google (8.8.8.8)  16.374 ms  15.946 ms  15.968 ms


Please let me know if you need any other information.

Regards,
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
Are you running something like Policy Based routing on the gateway that might be intercepting the packets at the gateway?

Some comments below:
Traceroute from a pod: note that the traffic, despite being on-net, goes to the default gateway (.1). The packet is leaving the kube-router and going straight to the gateway. This is crappy routing, as it relies on the router to redirect back to the correct host.
Traceroute from the Scale host to the same server: - Correct routing - but it would be. The packet leaves the NAS interface and goes straight to the destination.
Traceroute to my second local DNS server from a pod:
Traceroute to my second local DNS server from scale host: -
Both of these show exactly the same crappy routing issue.
Traceroute google from a pod:
Traceroute google from host: -
These both work because the traffic is meant to go to the gateway, which then forwards it properly out of the external port.

Based on the above, the packets are leaving the pod through the kube-router, being directed to the network gateway, and stopping there. This implies that the router may not be redirecting traffic back to the LAN interface - either because something is intercepting the traffic, or because it isn't allowed to redirect for config or security reasons.
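One way to confirm that on the NAS itself would be to watch the LAN-facing interface while you ping from the pod (br0 below is just a guess at your bridge name, substitute your own):

Code:
# on the Scale host: watch ICMP to/from the DNS server while the pod pings it
tcpdump -ni br0 icmp and host 192.168.1.254

# if echo requests go out but replies never come back, the redirect at the router is failing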

What is the router? Make/Model/OS?
Are you running any form of conditional routing on the gateway that might grab the traffic? I spotted this on my setup because of some PBR rules on my router that were pushing traffic down a VPN.

Other thoughts:
You mentioned a second ethernet card and a bridge. Can you please explain the networking setup on the NAS: what ports are in which bridges, and what IP addresses are assigned to which interface (bridge or NIC) - let's make sure you aren't doing anything wrong there. Also your k3s setup, please. I am not sure you can do what I suspect you are trying to do - but the situation may have changed with Bluefin from when I tried it (on Angelfish).

Your LAN - 192.168.1.0/24 I assume.
Gateway - 192.168.1.1 - is this hardware or a virtual device on the NAS?
DNS - 192.168.1.254 & 192.168.1.253 - hardware or virtual devices on the NAS?
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

I don't restrict my egress traffic; there's no network policy that would block egress traffic at all. Last night I even re-initialized the k3s cluster with a different network configuration.

The cluster is clean:
Code:
root@truenas[~]# kubectl get netpol -A
No resources found
root@truenas[~]#


I changed the Cluster and Service CIDRs from the 172.16.0.0/16 and 172.17.0.0/16 networks to 172.20.0.0/16 and 172.21.0.0/16. I thought this might bounce the k3s network stack and give me a clean install. The issue is still the same.

Before I answer your questions: I never had issues with Scale until I upgraded to the latest version. My Scale apps, like emby, were able to connect to my internal network just fine.

What is the router? Make/Model/OS?
The router is a HUAWEI 5G CPE Pro with basic DHCP network configuration. No advanced routing configuration is set in this router. I can't even do that, to be honest :)

Other thoughts:
I used to have a second ethernet card connected via USB 3. That card was configured in bridge mode using DHCP (don't judge me please, I tried to assign a static IP but the UI didn't let me). The card is no longer connected to the NAS; I removed it because at first I thought the whole problem was that k3s was using this card and running into network issues. Eventually, I moved the k3s cluster and VMs to my main ethernet card, which is also configured in bridge mode.

Your LAN - 192.168.1.0/24 - my local network subnet.
Gateway - 192.168.1.1 - the gateway configured on the NAS; basically, my router's IP.
DNS - 192.168.1.254 & 192.168.1.253 - both virtual appliances; .254 is on the NAS, .253 is a VM on a different machine.
Scale IP - 192.168.1.10/24 - the IP of the Scale host.

What is also strange, though: after I removed the USB network card and rebooted my NAS, everything started to work well. I was able to ping my other devices and resolve DNS. However, last night I decided to move my NAS to a different location in the house, and when I brought it back up, the issue reappeared. I couldn't start some apps (they required DNS resolution).

I have a basic TP-Link TL-SG2008 switch. Did the cluster break just because I connected the NAS with a different network cable to a different port? That doesn't make any sense at all, as the host is still not affected.

This is why I have now configured the NAS with 9.9.9.9 as its DNS.

As you can see from the screenshots, I have a very basic configuration. No extra routing, VPNs, etc. configured.

Thanks for your help, and if I need to send you anything else let me know.

Regards,
 

Attachments

  • Screenshot_20230212_105446.png (392.5 KB)
  • Screenshot_20230212_105506.png (223.9 KB)
  • Screenshot_20230212_105513.png (209 KB)
  • Screenshot_20230212_105531.png (260.4 KB)

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
OK - Comment 1

I do know that for a VM (DNS on the NAS) to talk to the NAS, it has to be on a bridge - as you have done, and as far as I can tell, correctly. It's done in the same manner as mine, which works despite the crappy routing because my router does redirect traffic back to the LAN. I assume the opposite also applies: for the NAS to talk to a VM there has to be a bridge. However, as stated, this is done correctly (anyone else have a look - I don't see an issue).

Comment - 2

From the traceroutes, the traffic is leaving the kube-router onto the LAN, but it goes to the router and is then not directed back. TN is working as IX designed it - and they have declined to fix this issue, saying it's an upstream bug (which I am not convinced by, but have no evidence to rebut).
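A rough way to see how the host is handling pod egress (purely a diagnostic sketch; the exact chain names vary between releases):

Code:
# routes the host holds for the cluster networks
ip route show

# NAT rules kube-router has installed for pod egress
iptables -t nat -S POSTROUTING | grep -i kube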

I notice your location is London - may I ask why you are using a 5G connection rather than a proper cabled ISP - or are you? How is the Huawei connected, or are you using it dual-WAN, with a cabled ISP and 5G backup?
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

The internet in my area is not that great; I would get a 70 Mbps copper connection if I chose a normal ISP.

The 5G internet is way better: I get speeds of up to 500 Mbps in my area for a smaller price. My local devices are all connected via LAN cable from my router to my switch. That's all it is, nothing fancy or complicated.

I thought this could be reported as a bug, but since IX is declining to look at the network bug, I don't think it matters.

Thank you for your help!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
That's the bug I reported: NAS-117782.

IX's response was: "Please direct this inquiry upstream to kube-router project. If you find a better solution or they give some an answer/guidance, please let us know and we’ll consider it. We, unfortunately, don’t have resources to investigate every diminutive detail of a 3rd party project."

I don't believe this is a kube-router issue - but as I said, I have nothing to rebut IX's contention with other than vague implications. My router does redirect traffic back to the LAN, so it works here. Note that most of my containers are running on a VM under KVM; I only run a few using TN k3s, for just this reason, as I don't want my firewall getting bombarded with traffic that is destined for the LAN.

@groenator I don't see any other explanation for this behaviour based on what you have provided. Does the router provide any advanced routing configuration options? Or are there any annoying security options ticked that may prevent redirects?
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Hi,

Thank you for your support; you are the only one who tried to help me out, and I appreciate all your help.

My router doesn't support any advanced routing capabilities.

I don't think there's an issue on the router side, but I will try to investigate that part. Before I removed the USB ethernet card, the routing problem was there. Then I removed the card and set the DNS back to my local servers, and by some miracle everything worked fine until I decided to reboot my NAS. After that, the issue came back. The issue is more intermittent than permanent, and it's also hard to replicate on other machines.

I am trying to figure out why the issue came back after I rebooted my NAS. I usually run my containers in a VM too, but I have Emby and Unmanic running on Scale because these are big applications and they require more resources. I prefer these two applications to use the host's resources instead of the VM's.

Regards,
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
Put Emby on a VM - it only needs resources when scanning the media store.
Unmanic - leave it on the NAS directly.
 

groenator

Dabbler
Joined
Sep 21, 2021
Messages
44
Thanks, I plan to do that :)
I'll probably do the same for Nextcloud and stick it in a VM.
 