I'm running TrueNAS-12.0-U8 on an Asus P8Z68 board. After a while, usually about a day, network throughput drops dramatically (to about 1-2 Mbps), as reported by iperf3. SSH sessions to the box also become laggy when typing, so something is definitely wrong with the network.
This is consistently happening after about 12-24 hours of uptime. What I have observed so far:
1. This happens with the built-in Intel LAN adapter (Asus P8Z68 v-pro/gen3 board).
2. This happens with an HP NC360T dual-port NIC, connected with a single port.
3. This happens with the NC360T in LACP.
4. This happens with a similar IBM card. At this point I'm convinced it has nothing to do with the NIC per se.
5. It does not matter whether the system was idle all that time or transferring massive amounts of files.
6. Replugging the LAN cable does not help it recover.
7. Resetting the network configuration via the console either does not help or leaves the network in a bad state (complete loss of connectivity).
8. Rebooting the whole NAS fixes it, for about a day.
9. There is nothing useful in /var/log/messages when that happens. It was working fine this morning, and wasn't after 11:20:
Code:
truenas% sudo tail /var/log/messages
Mar 1 00:00:00 truenas newsyslog[55520]: logfile turned over due to size>200K
Mar 1 00:00:00 truenas syslog-ng[1028]: Configuration reload request received, reloading configuration;
Mar 1 00:00:00 truenas syslog-ng[1028]: Configuration reload finished;
Mar 1 11:20:19 truenas kernel: Limiting open port RST response from 289 to 200 packets/sec
Mar 1 11:21:19 truenas kernel[1028]: Last message 'Limiting open port R' repeated 1 times, suppressed by syslog-ng on truenas.local
Is "Limiting open port RST response from" relevant here? From what I've read on this forum, this message is the result of someone knocking on a closed port. For lack of other ideas I can research in that direction, but it does not look like a likely culprit: it would be a bug if it were possible to bring down the NAS just by knocking on a closed port.
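To check whether these messages line up with the slowdowns at all, a small script could pull them out of the log along with their timestamps. This is only a sketch: the regular expression is written against the single log line shown above, and the names `RST_LIMIT`, `RstLimitEvent`, and `find_rst_limit_events` are made up for illustration. As far as I understand, the second number is the cap set by the `net.inet.icmp.icmplim` sysctl (200 by default), and the first is how many responses per second the kernel wanted to send.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern written against the one kernel line in the log excerpt above;
# other syslog layouts may need adjusting (assumption, not verified widely).
RST_LIMIT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel(?:\[\d+\])?:\s+"
    r"Limiting open port RST response from (?P<seen>\d+) to (?P<limit>\d+) packets/sec"
)


class RstLimitEvent(NamedTuple):
    stamp: str   # timestamp exactly as it appears in the log
    seen: int    # RST responses per second the kernel wanted to send
    limit: int   # rate the kernel capped them at


def find_rst_limit_events(lines: Iterable[str]) -> List[RstLimitEvent]:
    """Return every rate-limit event found in an iterable of log lines."""
    events = []
    for line in lines:
        match = RST_LIMIT.match(line)
        if match:
            events.append(
                RstLimitEvent(match["stamp"], int(match["seen"]), int(match["limit"]))
            )
    return events
```

Passing an open file handle for /var/log/messages to `find_rst_limit_events` gives a list of events. If the timestamps do not cluster around the moments throughput drops, that would support the guess that the message is unrelated.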
Questions:
1. What other avenues do you suggest I explore to triage this further? I cannot reproduce it on demand, but it happens on its own within a day.
2. I have the system in that state now. What OS state can I look at to see what's going on?
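Since the problem cannot be triggered on demand, one approach I am considering is to snapshot some OS state periodically and then compare the last good snapshot with the first bad one. Below is a minimal sketch of that idea. The commands are stock FreeBSD tools, but the choice of which ones to capture is my guess rather than a diagnosis, and the helper names (`run_command`, `take_snapshot`, `format_snapshot`) are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List

# Stock FreeBSD commands; which of them matter here is a guess.
COMMANDS: List[List[str]] = [
    ["netstat", "-m"],   # mbuf and cluster usage
    ["netstat", "-i"],   # per-interface packet, error, and drop counters
    ["netstat", "-s"],   # per-protocol statistics
    ["vmstat", "-i"],    # interrupt counts and rates
    ["ifconfig", "-a"],  # link state and negotiated media
]


def run_command(argv: List[str]) -> str:
    """Run one command and return its output, or a short failure note."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        return result.stdout + result.stderr
    except (OSError, subprocess.SubprocessError) as exc:
        return f"failed to run: {exc}\n"


def take_snapshot(
    commands: List[List[str]],
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Map each command line to the output it produced."""
    return {" ".join(argv): runner(argv) for argv in commands}


def format_snapshot(snapshot: Dict[str, str], stamp: str) -> str:
    """Render a snapshot as plain text that is easy to diff."""
    parts = [f"===== snapshot at {stamp} ====="]
    for name, output in snapshot.items():
        parts.append(f"--- {name} ---")
        parts.append(output.rstrip())
    return "\n".join(parts) + "\n"
```

Run from cron every few minutes with the output of `format_snapshot(take_snapshot(COMMANDS), stamp)` appended to a file, this would leave a record spanning the moment the network degrades, which can then be diffed by hand.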
At this point this appears to be a FreeBSD system issue rather than a TrueNAS-specific one (storage is not involved). I only found this vaguely relevant thread, which has no outcome: https://www.truenas.com/community/threads/weird-networking-problems-after-60-days-of-uptime.38175/
Any ideas are welcome!