Network randomly failing for short periods

Status
Not open for further replies.

grokem

Cadet
Joined
May 10, 2015
Messages
8
Has anyone seen a FreeNAS box randomly (0-4 times a day) lose all network traffic for 5-60 minutes?

I've been happily running this FreeNAS box for over two years, and this problem has slowly gotten worse since February this year. I'm at a total loss as to where to look.

The problem is that:
  • the box cannot be reached over the network (NFS and SSH get no response, and no pings are returned).
  • I have tested connectivity from multiple machines on network.
Fault-finding from the console (via a serial cable connected to the box):
  • I cannot ping out from the box.
  • tcpdump shows no network traffic.
  • ifconfig is normal (same as pre-incident) with the correct IP address set.
  • top shows 99% idle CPU, with the historic load averages all below 0.2.
  • No relevant errors in dmesg. (There are errors from upsmon not being able to contact the UPS, and from the freenas process failing in smtplib with "gaierror: [Errno 8] hostname nor servname provided, or not known".)
  • Nothing abnormal in /var/log/messages or debug.log.
  • ipfilter and ipfw are not enabled.
  • iostat shows very low to zero disk activity (since no NFS access from outside the box).
IF left alone, the box will come back and run (apparently) normally until the next incident. Rebooting also fixes the problem.
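Because the drops are intermittent, one low-effort option is to leave a small logger running from the serial console that records exactly when connectivity is lost and restored, and grabs a snapshot at the moment it fails. The function below is a minimal sketch, not a FreeNAS feature: the name `netwatch`, the log path, the `PROBE` override, and the placeholder target address are all assumptions, and `em0` is taken from the interface name used later in this thread. The default probe relies on FreeBSD's `ping -t` (overall timeout in seconds); on Linux the closest equivalent is `-W`.

```sh
#!/bin/sh
# netwatch TARGET [INTERVAL] [ITERATIONS] [LOGFILE]
# Hypothetical helper: probe TARGET every INTERVAL seconds and log each
# up->down / down->up transition with a timestamp. ITERATIONS=0 runs forever.
# PROBE may override the probe command (default: one FreeBSD ping, 2 s timeout).
netwatch() {
    target=$1
    interval=${2:-10}
    iterations=${3:-0}
    log=${4:-/var/tmp/netwatch.log}
    probe=${PROBE:-"ping -c 1 -t 2"}
    state=up
    n=0
    while [ "$iterations" -eq 0 ] || [ "$n" -lt "$iterations" ]; do
        # $probe is deliberately unquoted so it splits into command + flags.
        if $probe "$target" >/dev/null 2>&1; then new=up; else new=down; fi
        if [ "$new" != "$state" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $target $state -> $new" >> "$log"
            if [ "$new" = down ]; then
                # Snapshot mbuf and interface state at the moment of failure.
                { netstat -m; ifconfig em0; } >> "$log" 2>&1
            fi
            state=$new
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, `netwatch 192.0.2.1 10 0 /var/tmp/netwatch.log &` (with 192.0.2.1 replaced by your gateway) would give start and end times for each outage, which can then be lined up against cron jobs, scrubs, or NFS client activity.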

I've replaced the network switch and the cable, and installed an Intel PRO 82574L NIC, to eliminate the obvious. So I think I have eliminated hardware faults except for the motherboard and memory, which I think would show up a little more randomly?

The box is a Gigabyte GA-Z68A-D3H-B3 with an i5-2500 CPU. 3 sets of ZFS-mirrored WD Red NAS disks are connected directly to the motherboard.

I think I have eliminated a lot of causes. I'm happy to provide the output of any command or log.

Any suggestions would be really appreciated.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Well, your motherboard has all that extra crap you don't need: audio, etc. We know it can cause problems, which is why we don't recommend desktop hardware, but that's exactly what you are using. So I can't say I'm too surprised, and I'd bet that if you stuck with what is "tried and tested" you wouldn't have random unexplained problems.
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
The title of the forum says "Help & Support". Thanks for your unhelpful non-support.

Do you really think that the presence of an on-board audio chip causes the network stack to fail? After over two years of successful operation? This fault is either hardware that recently died or something caused by a software upgrade. I need to fault-find to determine which it is.

I was hoping for somebody knowledgeable to point out which tools would help narrow down my problem. I did review many other forum posts and noticed your constant trolling of other users. I was really hoping to avoid the old-man resident troll saying 'I only help if you do exactly the way I do'. Which is another way of saying, 'I don't know much but I got it working this one way once'.

** I don't think the FreeBSD guys would appreciate your claim that FreeBSD only works on a ridiculously narrow range of hardware. **

Obviously I have no choice but to follow my colleague's advice 'to junk FreeNAS and just use straight FreeBSD', like so many others.

PS. Forgive me for feeding the troll.....
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Don't mind cyberjock. He's just padding his post count. :smile:

Well, it sounded like a hardware problem to me, and I thought it might be external, but the fact that a reboot fixes it, plus the new NIC, cable, and switch, seems to rule that out.

So I'm left with 2 ideas:
1. Run a live CD of something like Ubuntu and run bi-directional pings, looking for timeouts.
2. Reinstall FreeNAS (save a backup of your config first).
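For the first suggestion, a long-running ping in each direction (client to NAS, and NAS to client from the console) can be saved to a file and scanned afterwards for dropped replies. This is a minimal sketch assuming the usual `icmp_seq=N` reply format printed by both FreeBSD and Linux `ping`; the function name `ping_gaps` is hypothetical. It reports runs of missing sequence numbers, so it works whether or not the platform also prints explicit timeout lines.

```sh
#!/bin/sh
# ping_gaps: read saved `ping` output on stdin and print every run of
# sequence numbers that never got a reply. Lines without "icmp_seq=N"
# (headers, summaries, FreeBSD's "Request timeout for icmp_seq N") are ignored.
ping_gaps() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters; the rest of the match is the number.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq > last + 1)
                printf "lost icmp_seq %d-%d\n", last + 1, seq - 1
            last = seq
            seen = 1
        }
    '
}
```

Usage: run `ping 192.0.2.20 > to_nas.log` on a client (placeholder address), stop it with Ctrl-C after an outage, then run `ping_gaps < to_nas.log`. The sequence number is a 16-bit field, so on runs longer than about 18 hours at one ping per second it wraps, and this simple check will not flag a gap that straddles the wrap.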
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
While you gave us a little bit of information about your system, you didn't say anything about the size of the hard disks, nor the amount of RAM.

Nor did you tell us which version of FreeNAS you are using. If you are using v9.3, what is the full version number?
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
I'm running FreeNAS-9.3-CURRENT-201503161938.

16 GB of RAM. Not ECC, but this is a networking problem.

The disks are
  • 2 x 2TB WD Red NAS - ZFS Mirror
  • 2 x 3TB WD Red NAS - ZFS Mirror
  • 1 x 2TB WD Red NAS - ZFS
During the network outages, arc_stat shows ZFS using about 11 GB, as it usually does. top shows memory is available and NO swap activity.

EDIT: I should have said that all disks are GELI encrypted.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So the system is responsive during these episodes, it's just the networking that is wonky? If it were me, I'd do a fresh install and import the backed up config.
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
@depasseg
I agree with both your suggestions but I'm hoping to not have to take the box offline for too long.

I did try your second suggestion of reinstalling, but ran into trouble with my disks. Because they are GELI encrypted, they aren't unlocked automatically, which led the new installation to drop my network shares because the ZFS pools didn't exist. Problems snowballed. If I have to try again, I think it might be easier to re-enter my settings rather than load the previously stored config DB.

Am I right in understanding that the saved config doesn't include the GELI disk keys? This led to all sorts of problems when I uploaded the config to the new installation.
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
Sorry, I was typing when you replied.

Yes, the system is fully responsive on the console. The networking is the only thing that has failed.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874

grokem

Cadet
Joined
May 10, 2015
Messages
8
Thanks for checking. I did keep my keys. My experiment was also done on a separate USB stick to enable a quick reversion.

One of my problems after I loaded the config came when I attempted to import the ZFS drives: FreeNAS would not show any disks in the drop-down box after I uploaded the GELI key. Do I need to export the disks from the running config before trying to import them into the new installation, despite the config DB being the same?

I should probably document the problems I encountered during my test more thoroughly, but I was hoping to find the time to try again first.

Anyway, I would prefer to stick to fault-finding the network problem. Where should I look? Is there any debug capability?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I would import and unlock the disks first and then upload the config file.

As for the networking issue, I'm afraid I'm at a loss.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
"Do you really think that the presence of on-board audio chip causes the network stack to fail?? After over two years of successful operation? This fault is either hardware that recently died or caused by a software upgrade. I need to fault find to determine which it is."

No freakin clue. Weird crap has happened. People would do an upgrade and find a new driver doing crap it hadn't done before.

Totally not expected, but very well documented to have happened in the past, and without a doubt will continue to happen to the poor souls that don't listen and buy appropriate hardware to start with.

"I was hoping for somebody knowledgeable to point out which tools would help narrow down my problem. I did review many other forum posts and noticed your constant trolling of other users. I was really hoping to avoid the old-man resident troll saying 'I only help if you do exactly the way I do'. Which is another way of saying, 'I don't know much but I got it working this one way once'."

"** I don't think the FreeBSD guys would appreciate your claim that FreeBSD only works on a ridiculously narrow range of hardware. **"

Yes, except if you go to the FreeBSD forums they'll make a charcoal brick out of you for doing stupid crap you can't troubleshoot yourself to "some" extent. So far the only evidence is "my crap is broken" and that would not fly on the FreeBSD forums. Even I don't post there because I don't want to be the next brick. :P

Have you even read their forum rules? Use proper terminology, and you'd better start a post with the proper info. Oh, you failed to include *all* of your hardware in the initial post? Expect to be banned without warning.

The FreeBSD forum is totally unforgiving and doesn't give a crap if you don't get an answer.

So your argument is foolish because YOU would already be banned from there, and you don't have the appropriate expertise or knowledge to post there and not be flamed to death.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Since you are using an older version of 9.3, you might want to peruse the changelogs in the newer versions and look at the bug fixes. Start here - http://download.freenas.org/9.3/STABLE/ and drill down into the directories (versions).
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
After some further searching, I found some people having problems with network buffer overflows. I have found that restarting NFS allows me to bring up the network interface successfully.

During an outage, if I try:

ifconfig em0 down
ifconfig em0 up

I get this error: "Could not setup receive structures"

When I restart nfs, the "ifconfig em0 up" command is successful and the network is restored.

Checking netstat -m during the outage (i.e. before running the above commands), I get:

[root@medusa] ~# netstat -m
1026/1599/2625 mbufs in use (current/cache/total)
0/526/526/262144 mbuf clusters in use (current/cache/total/max)
0/500 mbuf+clusters out of packet secondary zone in use (current/cache)
0/64/64/504941 4k (page size) jumbo clusters in use (current/cache/total/max)
1025/70/1095/149612 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84156 16k jumbo clusters in use (current/cache/total/max)
9481K/2337K/11819K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

This all looks normal to me.
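One way to read that output: with a 9000-byte MTU, the em(4) driver generally takes its receive buffers from the 9k jumbo cluster pool, so the "9k jumbo clusters" line is the one to watch. The sketch below simply pulls that line out of `netstat -m` and reports usage against the limit; the function name `jumbo9k_usage` is hypothetical, and it assumes the `current/cache/total/max` layout shown above. For the output posted here it reports well under 1% of the maximum allocated, which agrees that the pool limit itself is not being reached.

```sh
#!/bin/sh
# jumbo9k_usage: read `netstat -m` output on stdin and summarise the
# "9k jumbo clusters" line, whose first field is current/cache/total/max.
jumbo9k_usage() {
    awk '/9k jumbo clusters in use/ {
        split($1, f, "/")   # f[1]=current f[2]=cache f[3]=total f[4]=max
        printf "9k clusters: %d in use, %d cached, max %d (%.1f%% of max allocated)\n",
            f[1], f[2], f[4], (f[4] + 0 > 0 ? 100 * f[3] / f[4] : 0)
    }'
}
```

Usage: `netstat -m | jumbo9k_usage`, ideally once while healthy and once during an outage.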

Does this ring any bells for anybody??
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Are you using jumbo frames, or is that just a random log message? Also, when posting logs or CLI output, use code tags so the formatting stays readable.
 

grokem

Cadet
Joined
May 10, 2015
Messages
8
SweetAndLow: Yes, I do have jumbo frames set. I have tried MTU 1500 and smaller, to no effect. (Point taken about the formatting.)

I have found a link between NFS and the network dropping. If I restart the NFS daemon from the console, the network immediately comes back. Using netstat, I can't actually catch it running out of memory.
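If `netstat -m` never shows a denial, the per-zone counters from `vmstat -z` may also be worth capturing during an outage, since they include a cumulative FAIL count for each UMA zone (on FreeBSD 9.x/10.x the 9k cluster zone is listed as `mbuf_jumbo_9k`). A minimal sketch that prints the mbuf-related zones with a non-zero FAIL count; it assumes the `NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP` column order, so check the header line on your own box before trusting it. The function name `mbuf_zone_failures` is hypothetical.

```sh
#!/bin/sh
# mbuf_zone_failures: read `vmstat -z` output on stdin and print mbuf-related
# zones whose FAIL counter is non-zero. Assumes data lines look like
#   mbuf_jumbo_9k:  9216, 149612, 1025, 70, 123456, 0, 0
# i.e. NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP.
mbuf_zone_failures() {
    awk -F'[:,]' '$1 ~ /^mbuf/ && ($7 + 0) > 0 {
        printf "%s: FAIL=%d (USED=%d, LIMIT=%d)\n", $1, $7, $4, $3
    }'
}
```

Usage: `vmstat -z | mbuf_zone_failures`. Because the counters only grow until reboot, comparing a run taken while healthy with one taken during an outage shows whether any zone started failing in between.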

That's enough effort on this system. I now have enough time to do a completely new installation.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Be careful with jumbo frames. Most of the time people don't know how to actually configure them. And you should also know that they don't really have any benefit in today's world.
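A common way to check that jumbo frames actually work end to end is to ping with the don't-fragment bit set and the largest payload the MTU allows: the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. A tiny helper for the arithmetic, assuming IPv4 without options (the name `df_payload` is hypothetical):

```sh
#!/bin/sh
# df_payload MTU: largest ICMP echo payload that fits in one unfragmented
# IPv4 packet of size MTU (20-byte IPv4 header + 8-byte ICMP header).
df_payload() {
    echo $(( $1 - 20 - 8 ))
}
```

On FreeBSD, `ping -D -s "$(df_payload 9000)" 192.0.2.10` sends 8972-byte payloads with DF set (on Linux, `ping -M do -s 8972 192.0.2.10`); the address is a placeholder. If that fails while a 1472-byte payload (`df_payload 1500`) gets through, some device in the path is not passing 9000-byte frames.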
 