System hangs at night

Status
Not open for further replies.

selfsame

Cadet
Joined
Feb 25, 2016
Messages
8
Every night, sometime after midnight, my system becomes unresponsive to all network requests. I cannot ssh in to the system or view any of the web interfaces (FreeNAS webGUI, Plex, Transmission, etc.). A hard reboot fixes it, and it then works fine all day and evening until sometime after midnight again. My system is currently headless and I haven't yet tried interacting with the machine directly while it is in the 'hung' state, but I'll try that tomorrow if this persists.

At first I thought this was related to a hard drive integrity cron process since I had a hard drive that was reporting bad sectors. I replaced the drive but the nightly hanging continues.

I've examined the messages log and there's no obvious error or rogue cron job that seems to be the culprit. The hardware is all new as of last fall (except for the hard drives) and had been working fine until about 2 weeks ago (I'm happy to post HW specs if anyone thinks that would be helpful). The only change I can think of that might be relevant is that about a month ago I configured my router to use a dynamic DNS service so I could access Plex remotely, so my FreeNAS system is now exposed to the big bad internet. I plan on reviewing my firewall settings tonight to lock that down further.

Here are several days' worth of excerpts from my messages log just before the system hangs and is rebooted.

Code:
Mar  6 02:11:55 Kurgan sshd[31371]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:12:27 Kurgan sshd[31401]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:13:14 Kurgan sshd[31420]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:13:44 Kurgan sshd[31422]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:13:50 Kurgan sshd[31424]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:13:53 Kurgan sshd[31426]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:14:23 Kurgan sshd[31428]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:14:29 Kurgan sshd[31447]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar  6 02:14:34 Kurgan sshd[31449]: fatal: Read from socket failed: Connection reset by peer [preauth]
<REBOOT>
Mar  6 07:56:02 Kurgan syslog-ng[1745]: syslog-ng starting up; version='3.5.6'
...
Mar  7 00:00:00 Kurgan syslog-ng[1745]: Configuration reload request received, reloading configuration;
Mar  7 00:26:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Mar  7 00:26:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Mar  7 00:56:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Mar  7 00:56:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Mar  7 01:26:23 Kurgan smartd[2464]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Mar  7 01:26:23 Kurgan smartd[2464]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Mar  7 01:56:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Mar  7 01:56:22 Kurgan smartd[2464]: Device: /dev/ada1, 1 Offline uncorrectable sectors
<REBOOT>
Mar  7 08:54:19 Kurgan syslog-ng[1745]: syslog-ng starting up; version='3.5.6'
...
Mar  8 00:00:00 Kurgan syslog-ng[1758]: Configuration reload request received, reloading configuration;
Mar  8 00:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Mar  8 00:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Mar  8 00:41:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Mar  8 00:41:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Mar  8 01:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Mar  8 01:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Mar  8 01:41:11 Kurgan smartd[11394]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Mar  8 01:41:11 Kurgan smartd[11394]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Mar  8 02:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Mar  8 02:11:10 Kurgan smartd[11394]: Device: /dev/ada2, 1 Offline uncorrectable sectors
<REBOOT>
Mar  8 07:30:47 Kurgan syslog-ng[1756]: syslog-ng starting up; version='3.5.6'
...
Mar  9 00:00:00 Kurgan syslog-ng[1756]: Configuration reload request received, reloading configuration;
Mar  9 00:46:58 Kurgan sshd[41257]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar  9 00:46:58 Kurgan sshd[41259]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar  9 00:46:59 Kurgan sshd[41261]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar  9 00:47:00 Kurgan sshd[41263]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
<REBOOT>
Mar  9 07:45:59 Kurgan syslog-ng[1754]: syslog-ng starting up; version='3.5.6'
...
Mar 10 00:00:00 Kurgan syslog-ng[1754]: Configuration reload request received, reloading configuration;
Mar 10 00:48:01 Kurgan sshd[36134]: error: Received disconnect from 42.119.249.101: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar 10 00:48:08 Kurgan sshd[36136]: error: Received disconnect from 42.119.249.101: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar 10 00:48:20 Kurgan sshd[36138]: error: Received disconnect from 42.119.249.101: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar 10 01:29:00 Kurgan sshd[37119]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar 10 01:29:01 Kurgan sshd[37122]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar 10 02:32:24 Kurgan sshd[38685]: fatal: Read from socket failed: Connection reset by peer [preauth]
Mar 10 02:33:05 Kurgan sshd[38704]: fatal: Read from socket failed: Connection reset by peer [preauth]
<REBOOT>
Mar 10 07:44:11 Kurgan syslog-ng[1750]: syslog-ng starting up; version='3.5.6'
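
For anyone wanting to do the same triage on their own log, here is a minimal Python sketch (not part of any FreeNAS tooling) that pulls two things out of excerpts like the one above: the last timestamp logged before each hang, and how many failed pre-auth SSH attempts came from each source IP. It assumes the `<REBOOT>` lines are hand-added markers as in the excerpt (syslog does not write them). The names `summarize`, `LINE_RE` and `IP_RE` are made up for illustration. Lines such as "Connection reset by peer" carry no address, so they are not counted per IP.

```python
import re
from collections import Counter

# Syslog-style prefix: "Mar  6 02:14:34 Kurgan sshd[31449]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+([\w.-]+)\[\d+\]:\s+(.*)$"
)
# Source address as it appears in the "Received disconnect from" lines
IP_RE = re.compile(r"Received disconnect from (\d+\.\d+\.\d+\.\d+):")


def summarize(lines):
    """Return (last timestamp before each <REBOOT>, failed pre-auth count per IP)."""
    last_before_reboot = []
    failures_by_ip = Counter()
    last_stamp = None
    for line in lines:
        line = line.strip()
        if line == "<REBOOT>":  # hand-added marker, not a real syslog entry
            if last_stamp is not None:
                last_before_reboot.append(last_stamp)
            last_stamp = None
            continue
        match = LINE_RE.match(line)
        if not match:
            continue  # skips "..." and blank lines
        stamp, _host, program, message = match.groups()
        last_stamp = " ".join(stamp.split())  # collapse the padded day field
        if program == "sshd" and "[preauth]" in message:
            ip = IP_RE.search(message)
            if ip:
                failures_by_ip[ip.group(1)] += 1
    return last_before_reboot, failures_by_ip


# A few lines copied from the excerpt above
sample = """
Mar  9 00:46:58 Kurgan sshd[41257]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar  9 00:47:00 Kurgan sshd[41263]: error: Received disconnect from 74.208.144.85: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
<REBOOT>
Mar  9 07:45:59 Kurgan syslog-ng[1754]: syslog-ng starting up; version='3.5.6'
Mar 10 00:48:01 Kurgan sshd[36134]: error: Received disconnect from 42.119.249.101: 3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Mar 10 02:33:05 Kurgan sshd[38704]: fatal: Read from socket failed: Connection reset by peer [preauth]
<REBOOT>
"""

last_seen, failures = summarize(sample.splitlines())
print("Last log entry before each hang:", last_seen)
print("Failed pre-auth attempts by IP:", dict(failures))
```

The last-entry timestamps only bound when the hang started; the system may have kept running silently for some time after the final log line.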
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
1) Did you open ssh to the internet on its default port from Suzie Homemaker IP space? (Please say no, dear God, say no). OK, I just checked those IPs, and yes, you must have exposed yourself on default ports, because that 42.119.249.101 is a l33t-h4x0r IP address from Vietnam. What else have you exposed? If you MUST forward ssh, then, for the love of all that is holy, expose it on a NON-standard port, DISABLE password authentication, and force key-based authentication only. Odds are now definitely non-zero that someone got on your box.

2) Do you not see that drive ada2 may be failing? Let's have a look at the output of this, in "code" square brackets, please:
Code:
smartctl -x /dev/ada2
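
As a rough illustration of the SSH advice in point 1, the sketch below checks the text of an `sshd_config` for the three settings mentioned: a non-default port, password authentication off, and public key authentication on. `Port`, `PasswordAuthentication` and `PubkeyAuthentication` are real OpenSSH keywords with defaults of 22, yes and yes; everything else (function names, the sample configs, port 2222) is hypothetical. It is deliberately simplified: it takes only the first value of each keyword and stops at the first `Match` block. On FreeNAS these options are normally set through the SSH service settings in the web GUI rather than by editing the file by hand, so treat this as a way to reason about the settings, not as a tool to run on the box.

```python
def effective_settings(config_text):
    """Collect the first value seen for each keyword (simplified parser)."""
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        settings.setdefault(key, value)  # keep the first occurrence only
    return settings


def hardening_problems(config_text):
    """Return a list of findings; an empty list means all three checks pass."""
    s = effective_settings(config_text)
    problems = []
    if s.get("port", "22") == "22":  # OpenSSH default is 22
        problems.append("sshd listens on the default port 22")
    if s.get("passwordauthentication", "yes").lower() != "no":  # default is yes
        problems.append("password authentication is enabled")
    if s.get("pubkeyauthentication", "yes").lower() != "yes":  # default is yes
        problems.append("public key authentication is disabled")
    return problems


# Hypothetical configs for illustration only
exposed = "Port 22\nPasswordAuthentication yes\n"
hardened = (
    "# non-standard port, keys only\n"
    "Port 2222\n"
    "PasswordAuthentication no\n"
    "PubkeyAuthentication yes\n"
)

print(hardening_problems(exposed))
print(hardening_problems(hardened))
```

A non-standard port only cuts down on drive-by scans; disabling password authentication is the change that actually stops password guessing.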
 

selfsame

Cadet
Joined
Feb 25, 2016
Messages
8
I agree completely that leaving port 22 open was risky. Last night I updated the firewall settings to allow SSH only from a known IP range that I can reach over VPN. The other changes to the SSH settings (disabling password auth, changing the default port) will be made shortly.
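
To make the allow-list idea concrete, here is a small Python sketch of the check such a firewall rule performs: accept a source address only if it falls inside a known network. The range `10.8.0.0/24` and the names `ALLOWED_NETWORKS` and `is_allowed` are placeholders I made up, not my actual settings; the two rejected addresses are the ones from the log excerpt earlier in the thread.

```python
import ipaddress

# Hypothetical VPN range; substitute the real allowed network(s)
ALLOWED_NETWORKS = [ipaddress.ip_network("10.8.0.0/24")]


def is_allowed(source_ip):
    """True if source_ip belongs to any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


for ip in ("42.119.249.101", "74.208.144.85", "10.8.0.17"):
    print(ip, "allowed" if is_allowed(ip) else "blocked")
```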

@DrKK - regarding point #2 - I already replaced that drive as my first thought was "let's address obvious error messages". See this thread for more info.

As an update, this morning the system was hung again. I checked the terminal, which was unresponsive to keyboard input, so the machine was properly hung and not just off the network. The following was on the terminal before reboot:

Code:
Enter an option from 1-14: Mar 10 22:57:03 Kurgan afp[62197]: read: Operation timed out
Mar 10 22:57:03 Kurgan afp[62197]: dmi_stream_read: len:-1, Operation timed out


So the only thing I can think of there is that one of the Macs in the house didn't close a connection properly when reading remotely from the FreeNAS box. But that would point to a possible bug in FreeNAS's implementation of the AFP protocol, and it shouldn't have hung the whole system. I'll try disabling AFP tonight and see if the machine still hangs.
 

selfsame

Cadet
Joined
Feb 25, 2016
Messages
8
I disabled AFP and the system has been fine - no hanging. If this continues for a few days I'll close this thread as solved and open a bug report.
 