System suddenly cannot ping its own IPv4 address

Cheese_Echidna

Dabbler
Joined
Jan 16, 2022
Messages
13
Hi,
Some time in the last week my server suddenly stopped being able to ping itself; I haven't changed anything in that time.
I noticed because a notification that should have come through did not, and after some investigating I found that the bash scripts I had written to send me notifications had also stopped working.

When I went to the shell, I tried to curl my ntfy instance but found that it timed out. I have come to believe that this is not a problem with ntfy.


Here's what I know: If I ping the local IP of the server while on the same network it succeeds.
If I ping the local IP of another device on the network it succeeds.
If from any device I do an nslookup/dig of the server's domain name (hosted by Cloudflare and dynamically updated) it returns the correct IPv4 address of my server.
From a remote device (not on the same network) if I try to curl the ntfy instance or ping any of the domains associated, it succeeds.
If from the server itself I try to ping its domain name it fails.
If from the server itself I try to ping its IP address it fails.
If I do a traceroute from the server, it does not do any hops.

Running netstat -r:
Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
default         192.168.0.1     0.0.0.0         UG        0 0          0 enp3s0
172.16.0.0      0.0.0.0         255.255.0.0     U         0 0          0 kube-bridge
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 enp3s0
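
For anyone reading the table above, here is a rough Python sketch of how a destination is matched against those three routes (the most specific prefix wins). The route list is copied from the output; `pick_route` is just an illustrative helper name. It deliberately ignores the kernel's separate local table, which normally delivers traffic for the host's own addresses over loopback and is not shown by `netstat -r`.

```python
import ipaddress

# Routes copied from the netstat -r output; "default" is written as 0.0.0.0/0.
ROUTES = [
    ("0.0.0.0/0", "192.168.0.1", "enp3s0"),
    ("172.16.0.0/16", None, "kube-bridge"),
    ("192.168.0.0/24", None, "enp3s0"),
]

def pick_route(dest, routes=ROUTES):
    """Return (network, gateway, iface) of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gw, iface)
        for net, gw, iface in routes
        if addr in ipaddress.ip_network(net)
    ]
    # Longest prefix wins; the default route matches everything, so the list is never empty.
    return max(matches, key=lambda m: m[0].prefixlen)

# 203.0.113.10 is a documentation address standing in for "some public IP".
for dest in ("192.168.0.100", "172.16.3.7", "203.0.113.10"):
    net, gw, iface = pick_route(dest)
    print(f"{dest:15} -> {net} via {gw or 'direct'} dev {iface}")
```

By this table alone, anything outside 192.168.0.0/24 and 172.16.0.0/16, including the server's own public address, would be sent to the router at 192.168.0.1.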


After running df -h the highest Use% was my main pool at 72%.
I should say that I recently got a notification that my drive was 85% full, so I went through and deleted a bunch of files and now it's down to 72%. Hitting 85% happened before the problem started, but the problem started before I deleted the files.
I can't imagine how that could have caused it though.

Because of this, everything has stopped working because all of my apps use the domain names of each other in their API fields instead of the local IPs, and now none of these are resolving. Does anyone know what might be happening?
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Does rebooting fix it? Also, is there anything suspicious in the /var/log/syslog file?
 

nKk

Dabbler
Joined
Jan 8, 2018
Messages
42
If restarting does not fix the issue:
What is the state of the network interfaces? Are they UP and running?
You can use tcpdump to check whether the ping requests from another device actually reach your server, or whether some other network equipment is answering the pings so that the packets never reach the server.
 

Cheese_Echidna

Dabbler
Joined
Jan 16, 2022
Messages
13
Rebooting does not fix it. :(

I have checked the logs and I don't really know what I'm looking for.
There's quite a lot but nothing too weird, is there anything I could look for?

Looking at tcpdump I can see the pings coming in from other devices on the network to the server's local IP.

I can remember having similar issues where devices on the network would fail to access the server through its domain name, I believe it had to do with my ISP's NAT. I fixed it through a DNS rewrite in AdGuard Home (local DNS running on the server). Can't really do that here because the DNS server is running on the NAS.

However, this problem has never been an issue for me as I have always been able to ping the server from itself using its domain name; no problem.
When I remove my custom DNS server, I can no longer use the public domain names to access the server from my PC.

Do you think this could be caused by my reverse proxy or ISP or DNS?
 

nKk

Dabbler
Joined
Jan 8, 2018
Messages
42
To access network devices by name you need working DNS, and if you use Windows and have more than one DNS server in the network settings, weird things happen. Sometimes ping resolves DNS names correctly but other applications can't, and sometimes it's the reverse.

Another possible issue with names is using only the host name without the domain part when no domain is configured on the PC/server:
host1.test.org - Full DNS name (FQDN)
If you try to ping only "host1" and your device is not configured to use "test.org" as its domain, the name can't be resolved.

To check whether the problem is with the DNS server, you can manually add an IP/name pair for the server to the hosts file and then try to access it by name.
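
As a rough illustration of what such a hosts entry looks like and how it maps a name to an address, here is a small Python sketch that parses hosts-file text. The sample entry is hypothetical: `host1.test.org` is the example name from above and `192.168.0.50` is a made-up LAN address. `parse_hosts` is an illustrative helper, not part of any tool.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the first IP listed for it."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name; names are case-insensitive.
            mapping.setdefault(name.lower(), ip)
    return mapping

# Hypothetical hosts-file content for the test described above.
SAMPLE = """\
127.0.0.1      localhost
# temporary test entry: bypass DNS for this one name
192.168.0.50   host1.test.org host1
"""

hosts = parse_hosts(SAMPLE)
print(hosts.get("host1.test.org"))
print(hosts.get("host1"))
```

If access by name works with the entry in place and fails without it, that points at name resolution rather than at the network path.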

But independently of this, you should be able to ping the server's IP address from the server itself without problems.

Can you describe your network scheme:
- local IPs and real (public) IPs, if you use them
- domain names and DNS servers
 

Cheese_Echidna

Dabbler
Joined
Jan 16, 2022
Messages
13
Sure. (But I'm not going to doxx myself)

Locally:
- The server has IP address 192.168.0.100 and is running TrueNAS Scale
- The router has IP address 192.168.0.1 and is responsible for DHCP
- There is an AdGuard Home instance that is set as the primary DNS resolver on all of my devices (except the server) for my home network. This DNS server contains a custom rule to replace all of the outbound traffic to the public domain of the server with its local IP. (This is basically just a band-aid on what might be the same issue, however it never affected the server before)
- When I ping the local IP of the server (192.168.0.100) it succeeds from both my PC and the server

On the net:
- The ports 80 and 443 are forwarded to the server
- Domains are managed by Cloudflare and dynamically updated by the ddns-updater app. As per the app, there has been no change to my public IPv4 address in the last three weeks
- On Cloudflare, A records for the domain show the correct IPv4 address
- Only the two ports are exposed because everything is managed by a Traefik reverse proxy
- If the root domain was server.site.com then the TrueNAS Scale dashboard is exposed at root.server.site.com and my ntfy instance is hosted at ntfy.server.site.com. All publicly facing services are secured with HTTPS.
- When (either from the NAS itself or on my PC after removing my custom DNS server) I ping the public IP of the server it fails
 

nKk

Dabbler
Joined
Jan 8, 2018
Messages
42
Let's summarize:
Internally you can connect to the server using its private IP address (192.168.0.100).

Externally you can connect to the server using its domain name (root.server.site.com).

Internally you can't connect to the server using its domain name. When you ping/nslookup from an internal device, how is the domain name resolved: to the public or the private IP address?
Perhaps the AdGuard DNS rules are not working, or you have a second DNS server configured and the device is using it to resolve the domain name. If there is a second DNS server configured, you can try removing it and testing again.
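
One way to answer the "public or private" question on each device is a short Python check like the sketch below. It uses whatever resolver the system is configured with, so the result reflects the DNS servers that device really uses. The function names are illustrative only.

```python
import ipaddress
import socket

def classify(ip):
    """Label an address 'private' (RFC 1918, loopback, etc.) or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"

def resolve_and_classify(name):
    """Resolve name to IPv4 addresses with the system resolver and label each one."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addrs = sorted({info[4][0] for info in infos})
    return [(addr, classify(addr)) for addr in addrs]

# A literal address needs no DNS lookup; pass a real hostname to test resolution.
print(resolve_and_classify("192.168.0.100"))
```

Calling `resolve_and_classify("root.server.site.com")` (the placeholder name from the thread) on an internal device should report a private address if the AdGuard rewrite is being applied, and a public one if some other DNS server answered.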
 

Cheese_Echidna

Dabbler
Joined
Jan 16, 2022
Messages
13
Pretty much.

I have turned off the private DNS server to remove it as a variable. I am now seeing two distinct sets of behaviour.
I have switched to using curl to test if the server is found because ping was acting really weird.

Externally:
- Going to the domain of the server in a web browser works
- Going to the IP address of the server in a web browser gives you a 404 because the reverse proxy does not know what to do (expected)
- Using dig/nslookup with any DNS provider returns my IP address
(I double-checked this with https://www.whatsmydns.net/#A/ and nearly all servers returned my public IP address; the ones that didn't timed out)
- Doing a traceroute in Nmap returns a route to the server that looks all good


Internally (Shared by internal devices and the server itself)
- Using a dig/nslookup with any DNS provider still returns my IP address
- Going to the domain of the server in a web browser fails with a timeout
- Going to the IP address of server in a web browser fails with a timeout
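
Since ping was unreliable and the tests above were done with curl, a plain TCP connect can also separate "timed out" from "refused" from "connected" without involving HTTP or TLS. This is only a sketch: `probe_tcp` is an illustrative name and the 3-second timeout is an arbitrary choice.

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connect and report 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered and actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets were dropped somewhere.
        return "timeout"
    except OSError as exc:
        # DNS failure, no route to host, and similar.
        return f"error: {exc}"

# Example usage (hostname is the placeholder from the thread):
#   probe_tcp("ntfy.server.site.com", 443)
#   probe_tcp("192.168.0.100", 443)
```

A "timeout" against the public name together with "open" against 192.168.0.100 from the same machine would suggest the packets are being dropped on the way out to the public address and back, not by the server's services.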


However, while I was conducting these tests I noticed something strange. Here is the result from https://tools.keycdn.com/traceroute
Code:
Start: 2023-10-18T22:50:49+0000
                                   Loss   Snt   Last   Avg  Best  Wrst StDev
  1.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0
  2.|-- 10.90.2.16                 0.0%     4    0.4   0.5   0.4   0.8   0.2
  3.|-- 143.244.192.0              0.0%     4    0.3   0.5   0.3   0.7   0.2
  4.|-- 143.244.224.176            0.0%     4    0.7   0.7   0.6   0.7   0.1
  5.|-- 143.244.224.175            0.0%     4    0.4   0.4   0.3   0.6   0.1
  6.|-- 110.145.196.153            0.0%     4    0.4   0.5   0.4   0.8   0.2
  7.|-- 203.50.12.132              0.0%     4    2.4   2.2   1.1   2.9   0.8
  8.|-- 203.50.13.131              0.0%     4   11.5  12.1  11.5  12.7   0.5
  9.|-- 203.50.11.195              0.0%     4   12.0  12.0  11.9  12.2   0.1
 10.|-- 203.54.153.14              0.0%     4   12.5  12.6  12.5  12.9   0.2
 11.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0
 12.|-- 210.49.105.142             0.0%     4   12.6  12.8  12.6  13.0   0.2
 13.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0


Notice that it finishes at 210.49.105.142 and then just kinda dies, not reaching my server.
Well, an IP lookup tells me that that address belongs to my crappy ISP (Optus), see: https://check-host.net/ip-info?host=210.49.105.138
So I get a feeling that they might have something to do with this.
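
For reference, the "last hop that answered" can be pulled out of a report like the one above with a few lines of Python. This is a sketch only; the sample is the last four rows copied from the output, and the regular expression assumes that exact report layout. Keep in mind that a final hop showing 100% loss can simply mean the device does not answer traceroute probes, so on its own it does not prove packets are being dropped there.

```python
import re

# Last four rows copied from the traceroute report.
REPORT = """\
 10.|-- 203.54.153.14              0.0%     4   12.5  12.6  12.5  12.9   0.2
 11.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0
 12.|-- 210.49.105.142             0.0%     4   12.6  12.8  12.6  13.0   0.2
 13.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0
"""

# hop number, host, loss percentage (the % sign is optional in this report)
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+\d+")

def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops

def last_responding(hops):
    """Return the last hop that answered at least one probe, or None."""
    answered = [h for h in hops if h[1] != "???" and h[2] < 100.0]
    return answered[-1] if answered else None

print(last_responding(parse_hops(REPORT)))
```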
 

nKk

Dabbler
Joined
Jan 8, 2018
Messages
42
Cheese_Echidna said:
Internally (Shared by internal devices and the server itself)
- Using a dig/nslookup with any DNS provider still returns my IP address
- Going to the domain of the server in a web browser fails with a timeout
- Going to the IP address of server in a web browser fails with a timeout

Does nslookup return the real (public) IP address of the server?
If so, perhaps there is a problem with port forwarding on your router: the packets arrive at its internal NIC, and routers usually expect packets that need forwarding to arrive on the external interface. Depending on the router, you may be able to configure additional forwarding rules.
Otherwise, you should run your internal DNS with private IP addresses for the same domain names, and configure it on all internal devices as the primary and only DNS server.
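
To make the forwarding point concrete, here is a deliberately simplified toy model in Python of a router whose port-forward rule only matches packets arriving on the external interface. It is not how any particular router is implemented. `PUBLIC_IP` is a placeholder from the documentation range, and the `hairpin` flag stands in for whatever "NAT loopback" or "hairpin NAT" option a given router may or may not offer.

```python
from dataclasses import dataclass

PUBLIC_IP = "203.0.113.10"   # placeholder (documentation range), not the real address
SERVER_IP = "192.168.0.100"  # server's LAN address from the thread
FORWARDED_PORTS = {80, 443}  # the two ports forwarded to the server

@dataclass
class Packet:
    in_iface: str  # "wan" or "lan": the interface the packet arrived on
    dst_ip: str
    dst_port: int

def route(packet, hairpin=False):
    """Return the (ip, port) the toy router delivers to, or None if it drops the packet."""
    if packet.dst_ip != PUBLIC_IP or packet.dst_port not in FORWARDED_PORTS:
        return None
    # Without hairpin NAT the forwarding rule only matches WAN-side arrivals.
    if packet.in_iface == "wan" or hairpin:
        return (SERVER_IP, packet.dst_port)
    return None

outside = Packet("wan", PUBLIC_IP, 443)
inside = Packet("lan", PUBLIC_IP, 443)
print(route(outside))               # ('192.168.0.100', 443)
print(route(inside))                # None
print(route(inside, hairpin=True))  # ('192.168.0.100', 443)
```

In this model, external clients reach the server while internal clients (and the server itself) time out when they use the public address, which matches the two sets of behaviour described in the thread.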
 