FreeNAS dies after a few hours

Status
Not open for further replies.

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
I've started having a strange problem a few days ago with my system. After a few hours (seems to be between 8 and 16), it becomes totally unresponsive: no GUI, no SSH, file shares inaccessible. I need to hard boot the system to get it working again. I've been unable to determine the cause, but I'm not sure where all to look. Not much has changed over the last few days other than I started playing with iocage and setting up a few jails.

I've been using FreeNAS for about a year without any issues, but this one makes the system unusable. Needless to say, troubleshooting is difficult in this case. Can anyone suggest a few things to check before I rebuild it? The last thing I see in /var/log/messages before it dies is:
Code:
Mar 12 02:53:39 freenas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x81639ac88>' failed: Call timeout
Mar 12 02:53:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (15 occurrences)
Mar 12 02:54:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 02:55:23 freenas daemon[3525]:	 2018/03/12 02:55:23 [WARN] agent: Check 'service:nas-health' is now warning
Mar 12 02:55:39 freenas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8174dac88>' failed: Call timeout
Mar 12 02:55:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 02:56:39 freenas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x81639ac88>' failed: Call timeout
Mar 12 02:56:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 02:57:26 freenas daemon[3525]:	 2018/03/12 02:57:26 [WARN] agent: Check 'service:nas-health' is now warning
Mar 12 02:57:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (50 occurrences)
Mar 12 02:58:40 freenas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8174dac88>' failed: Call timeout
Mar 12 02:58:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (16 occurrences)
Mar 12 02:59:29 freenas daemon[3525]:	 2018/03/12 02:59:29 [WARN] agent: Check 'service:nas-health' is now warning
Mar 12 02:59:40 freenas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x81639ac88>' failed: Call timeout
Mar 12 02:59:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (4 occurrences)
Mar 12 03:00:40 freenas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8174dac88>' failed: [Errno 48] Address already in use
Mar 12 03:00:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 03:01:40 freenas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x81639ac88>' failed: Call timeout
Mar 12 03:01:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 07:46:08 freenas syslog-ng[1713]: syslog-ng starting up; version='3.7.3'

After reboot, I see a lot of this in the log:
Code:
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (57 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (5 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
Mar 12 07:46:08 freenas kernel: sonewconn: pcb 0xfffff800338d1000: Listen queue overflow: 151 already in queue awaiting acceptance (61 occurrences)
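
To get a feel for how much overflow there is, the sonewconn lines can be tallied per socket. This is only a minimal Python sketch, assuming the message format matches the excerpts above. The function name summarize_overflows and the SAMPLE list are made up for illustration; the sample lines are copied from the first log excerpt.

```python
import re
from collections import defaultdict

# Pattern for the kernel's listen-queue overflow message, based only on
# the lines shown in this thread (other FreeBSD versions may differ).
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-fA-F]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance \((\d+) occurrences\)"
)


def summarize_overflows(lines):
    """Return {pcb: (largest queue length seen, total occurrences)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match is None:
            continue  # not an overflow message (e.g. an alert.py line)
        pcb = match.group(1)
        queue_len = int(match.group(2))
        occurrences = int(match.group(3))
        entry = totals[pcb]
        entry[0] = max(entry[0], queue_len)
        entry[1] += occurrences
    return {pcb: (entry[0], entry[1]) for pcb, entry in totals.items()}


# Three lines taken from the first log excerpt in this post.
SAMPLE = [
    "Mar 12 02:53:39 freenas /alert.py: [system.alert:400] Alert module "
    "'<update_check.UpdateCheckAlert object at 0x81639ac88>' failed: Call timeout",
    "Mar 12 02:53:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: "
    "Listen queue overflow: 151 already in queue awaiting acceptance (15 occurrences)",
    "Mar 12 02:54:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: "
    "Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)",
]

if __name__ == "__main__":
    for pcb, (queue_len, total) in summarize_overflows(SAMPLE).items():
        print(f"{pcb}: queue length up to {queue_len}, {total} overflows")
```

Every overflow line here carries the same pcb address, so a single listening socket appears to be affected. On FreeBSD, netstat -Lan lists the listen queue sizes per socket, which may help work out which service owns it.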

I am running
Code:
FreeBSD freenas.local 11.1-STABLE FreeBSD 11.1-STABLE #0 r321665+d4625dcee3e(freenas/11.1-stable): Wed Dec 13 16:33:42 UTC 2017

Thanks for any assistance
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Joined
Jan 18, 2017
Messages
525
FreeNAS 11.1-stable had a memory leak issue, IIRC. You should try updating to 11.1-U2 to see if that resolves the issue.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
FreeNAS 11.1-stable had a memory leak issue, IIRC. You should try updating to 11.1-U2 to see if that resolves the issue.
I tried the update and my system didn't come back up. All I could see as the log scrolled past was a lot of stuff about refused connections and components not talking to each other. When the log stopped and the menu presented itself and its 0.0.0.0 IP address, I became terrified and rebooted into 11.1-stable. All is well, so far. I'm not sure if anyone else has had that issue; seems like a bad one.

I will post back if this problem presents itself again, as I expect it will.
 
Joined
Jan 18, 2017
Messages
525
Did you happen to have LACP set up? I remember seeing an issue recently where it needed to be reconfigured after an update. I'm not sure which update did that, though.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
So, it seems my problem was likely a bunch of rogue connections from my PC to the FreeNAS GUI. That machine had been up for days and the browser was obviously in a bad state. I will post back if this re-occurs.

Now as for my failed U2 update, that remains a mystery...
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
Now as for my failed U2 update, that remains a mystery...
Problem solved by booting to 11.1-stable, removing the U2 update, and re-upgrading. Second time worked fine. It is possible my install could have been interrupted the first time; guessing that was the case.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
Well, this is not solved after all. I have eliminated the excess connections, but this seems to have been unrelated to my system going down. This is happening consistently every day. I don't see anything unusual in the CPU or memory graphs.

At this point I am just considering restoring my system from backup, unless anyone has some ideas for other things to look at.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Changing the title of the thread to something more descriptive might get more people looking at it.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
Changing the title of the thread to something more descriptive might get more people looking at it.
Not really sure what to change it to, as I still have no idea what is causing the problem. I have eliminated the queue overflow messages from my first post (seemingly unrelated to the hanging), but I still see the Samba and UpdateCheckAlert messages in my logs.

My symptoms are the same as what is described here, but people are reporting their issues are solved with U2. Not so for me; I am continuing to investigate. Looks like the system hangs around 3am every morning so I am suspecting SMART checks or other maintenance processes.
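
For anyone trying to pin down the hang time the same way, the gaps in /var/log/messages can be found with a short script. This is a rough Python sketch, not a tested tool: it assumes every line starts with a standard syslog timestamp such as "Mar 12 03:01:42", and since syslog omits the year, the year is passed in. The name find_gaps and the 30-minute threshold are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2018, threshold=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs where logging paused for
    at least `threshold`, which suggests a hang followed by a reboot."""
    gaps = []
    previous = None
    for line in lines:
        stamp = line[:15]  # e.g. "Mar 12 03:01:42"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# The last line before the hang and the first line after the reboot,
# copied from the log in the first post.
SAMPLE = [
    "Mar 12 03:01:42 freenas kernel: sonewconn: pcb 0xfffff800338d1000: "
    "Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)",
    "Mar 12 07:46:08 freenas syslog-ng[1713]: syslog-ng starting up; version='3.7.3'",
]

if __name__ == "__main__":
    for last_seen, next_seen in find_gaps(SAMPLE):
        print(f"log silent from {last_seen} until {next_seen}")
```

Running it over several days of logs should show whether the last entry before each gap really clusters around the same time of night.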
 
Joined
Jan 18, 2017
Messages
525
If you are on U2 and it's still hanging, I suggest filing a bug report.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
If you are on U2 and it's still hanging, I suggest filing a bug report.
I am leaning towards that, but I'm not sure how useful it will be without trying to isolate the problem. I am going to try disabling a backup replication task I have that runs at night; if that fails, I will see if replacing the boot drives and rebuilding my system solves the problem, but from what I can tell everything is healthy.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
3AM is when the nightly periodic reports are created. Some of these involve a find that runs over all filesystems and generates a fair load. If you posted system specifications earlier, I did not see them. Do you have at least 8G of RAM?
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
3AM is when the nightly periodic reports are created. Some of these involve a find that runs over all filesystems and generates a fair load.
Interesting, one more thing to check.

If you posted system specifications earlier, I did not see them. Do you have at least 8G of RAM?

I do; they are in my signature. Should have been clear about that.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
OK, I found the culprit. It is indeed my midnight replication task. I disabled it last night and the system stayed up. This is a task that I had intended to run nightly to back up my pool to an extra drive in my system, but as I mentioned in another thread, it has never worked correctly. That said, until now it has never caused problems aside from not completing.

I am going to disable this task until 11.3 and try again. Thanks everyone for your input.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
OK, I found the culprit. It is indeed my midnight replication task. I disabled it last night and the system stayed up. This is a task that I had intended to run nightly to back up my pool to an extra drive in my system, but as I mentioned in another thread, it has never worked correctly. That said, until now it has never caused problems aside from not completing.

I am going to disable this task until 11.3 and try again. Thanks everyone for your input.
I have told other people, but I don't know if I mentioned it to you. I have never been able to get a USB drive to work reliably enough to use in the way you are trying to use this. My experience with USB and FreeNAS was that it crashed the system. I would suggest finding another way to connect a backup drive that doesn't involve USB and see if the backup task works. I have a backup pool of four drives that are connected by SAS and make a twice weekly backup to them that works perfectly.
You might want to discuss your backup method.
 

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
I have not had good luck with the USB drive in this situation either, but it works fine as an extra share that I back my Windows PC up to using a third-party backup tool.

However, in this case I am actually trying to replicate to an extra *internal* drive I have in my FreeNAS system. I feel this should work?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
However, in this case I am actually trying to replicate to an extra *internal* drive I have in my FreeNAS system. I feel this should work?
The thing I saw when trying to use USB is that the interface would periodically reset under high, sustained transfer and that would cause the drive to suddenly 'disappear'. FreeNAS would not handle that well and would crash. I tried several times to make this work and finally gave up on it because of the crashes. I didn't want the USB drive to cause damage to the data in my FreeNAS system. That is why I connected my backup drive to a different interface. I first went with plugging the drive into a SATA port and that worked fine, but when I had more data than would fit on one drive, I went to a pool of drives and connected them to the SAS controller. It has evolved over the years. I don't use USB for anything on my FreeNAS system.
 