Frequent kernel panics

Status
Not open for further replies.

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
My FreeNAS box has been fairly stable until now, but it has started crashing randomly. Today I wiped my SSD and added it as L2ARC, and reimported a jail from the SSD to the hard drive array. I can't see how that would cause this though. Screenshot of the console: http://imgur.com/a/DXc8I.

Any suggestions?

Edit:
Specs are as follows:

  • FreeNAS-9.10.2 (a476f16)
  • Intel Xeon E3-1265L v3
  • 8GB ECC DDR3 1600MHz RAM
  • ASRock Rack E3C224D2I motherboard
  • 3TB WD Red HDD
  • 2TB Seagate HDD
  • 120GB Samsung SSD (L2ARC)
  • No expansion cards, using built in NIC

Edit 2: Post Thread Update
  • Problems stopped after disabling VIMAGE on jails
  • EVGA G2 550W power supply (tier 1) now installed in the NAS
  • Removed L2ARC as it wasn't useful
  • Looking into upgrading RAM to 16GB and storage to a 12TB or 15TB RAIDZ1 array (five or six 3TB Hitachi HDDs)
  • Looking into acquiring a UPS to prevent ungraceful shutdowns
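
As a rough check on the array sizes in the list above: RAIDZ1 spends about one disk's worth of space on parity, so usable capacity is approximately (number of disks - 1) x disk size. The Python sketch below is only an estimate under that assumption; the function name is made up for illustration, and ZFS metadata, padding and the TB/TiB difference are ignored.

```python
def raidz1_usable_tb(disk_count, disk_size_tb):
    """Approximate usable capacity of a RAIDZ1 vdev, ignoring ZFS overhead."""
    if disk_count < 2:
        raise ValueError("RAIDZ1 needs more than one disk")
    # One disk's worth of space is consumed by parity.
    return (disk_count - 1) * disk_size_tb


for disks in (5, 6):
    usable = raidz1_usable_tb(disks, 3)
    print(f"{disks} x 3TB in RAIDZ1: about {usable:.0f}TB usable")
```

Real-world usable space will come out somewhat lower than these figures.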
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Any suggestions?
You have not received any responses due to lack of information.
Please read and follow the Forum Rules and provide requested info
so you can be helped.
Thanks!
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Yes, @BigDave's suggestion will be helpful.

How have you configured your network? In particular, your firewall? Because it appears that outside users are trying to hack into your system via SSH, from IP addresses 13.152.36.69 and 5.57.220.230. This is bad... very bad. :eek:

Also, are you using LAGG or such? It looks like the crash had something to do with your Ethernet stack.

We might be able to help out, if you'll post your system information.
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I have updated the original post with my system specs.

My firewall is the one on the ISP's router, and it's turned off... I'll turn it back on. Port forwarding has been set up for SSH, SFTP and HTTP/S. SSH is configured to only allow public key authentication and not allow root login so I don't see the hacking being a problem.

I don't think I'm using LAGG. The two jails both use VIMAGE, if that could help. Here are the details of the network interfaces:
Code:
[*****@lead] ~% ifconfig
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
	ether d0:50:99:79:9a:c5
	inet 192.168.0.150 netmask 0xffffff00 broadcast 192.168.0.255
	inet6 fe80::d250:99ff:fe79:9ac5%igb0 prefixlen 64 scopeid 0x1
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether d0:50:99:79:9a:c6
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect
	status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:38:c2:98:c1:00
	nd6 options=1<PERFORMNUD>
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
			ifmaxaddr 0 port 6 priority 128 path cost 2000
	member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
			ifmaxaddr 0 port 5 priority 128 path cost 2000
	member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
			ifmaxaddr 0 port 1 priority 128 path cost 20000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:ff:20:00:05:0a
	nd6 options=1<PERFORMNUD>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:ff:20:00:06:0a
	nd6 options=1<PERFORMNUD>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active

Edit:
I think you might be right about someone trying to hack my system. I just checked the console log and there are lots of SSH auth errors (http://imgur.com/a/Z73Rg). Again, this shouldn't be a problem as I'm using a 4096-bit key with a passphrase that has been securely stored, and password login is disabled.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I believe your problems have to do with moving your jails. The console screenshot contains this error message:

lead kernel: ng_ether_ifnet_arrival_event: can't re-name node epair1b

This is followed by several lost-memory messages related to UDP and TCP networking, and then the system blows up spectacularly. You're using VIMAGE, which seems to involve bridging (as the bridge0 entry in your ifconfig output shows), and epair1b is the other end of epair1a, which is a member of the bridge.

You might be best served to destroy your jails, simplify your network stack, and then re-create the jails. A nuisance, I know...
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I was just in the process of doing 'warden export [jail]' when the system crashed again. The jail was stopped, and it was a different message: http://imgur.com/a/lCmMf

I'll go ahead and run that export again, and delete the jail to see if the crashes stop, but I'm not hopeful...
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I just got a similar crash to last time. I deleted the old jail and the template, and made a new one with a fresh standard template. This is the console output: http://imgur.com/a/Tcdc5

I'm going to run memtest86 to rule out RAM problems.

Edit:
@Spearfoot, I reread your post and saw that you suggested I 'simplify my network stack' — what do you mean by that? I only have one ethernet cable going in, and VLANs etc. aren't configured.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
The new error messages don't seem to be network-related; 'geom' makes me think something's up with the disk/drive subsystem(!)

By 'simplify', I just mean that I would minimize the complexity and get the simplest configuration up and running, making sure there are no vestiges left over from prior setups, i.e., no bridges or anything like that. Once you have the beast running reliably, then you can start installing jails.
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
The console window was constantly cycling through different error messages so perhaps the screencap that I took just happened to include that... I have online backups of the most important data, but losing the drives would be a pain.

memtest86 is currently running. Once it finishes (assuming no errors) I'll delete the new jail and take a look at the network interfaces.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
That's good.

BTW: I noticed you've configured your system with an L2ARC device... but you only have 8GB RAM. Until you've installed 64+GB of RAM, an L2ARC isn't going to help your performance, and in fact, the overhead of maintaining it will actually hurt the performance of a memory-constrained system like yours. In general, it's always better to install more RAM in lieu of an L2ARC device.
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I had just read, in the PowerPoint linked in @BigDave's signature, that the L2ARC would be useless. Once the system is rebooted I'll remove it and see if that conveniently solves my problems (they started around the same time as I added it). The SSD came with the system when I bought it, so I thought I'd put it to use. I'll sell it and buy some more RAM!
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
That's the spirit! :D
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I detached the L2ARC and tried starting the jail to see if it would crash. It did: http://imgur.com/a/WX5Gb

I'll remove the jail and see if the server is stable then.

Edit: memtest86 passed with no errors

Edit 2: As of just now, the system has been up for 2 hours without any errors. I'm going to try making a jail with VIMAGE disabled, as the errors seemed to be network-related.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
That still looks like RAM errors to me. Is the power supply big enough and high enough quality for that system? Even if it is, it still could be failing. Is the additional four-pin CPU power connector attached?
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
I just checked and the 4-pin is plugged in correctly. I have been using the server without a crash now for 7 hours — it seems disabling VIMAGE did the trick. That said, the power supply in the server is one of these (the 250W one), which is just about enough (this calculator gives me a max load of 203W). I bought the server second hand at a bargain price, and the original owner apparently didn't realise that FlexATX isn't compatible with the case; the PSU is attached only by one screw and lots of duct tape. He had also attached LED lighting (for a server...), HDD coolers and didn't send any of the extra HDD brackets for the case. I guess you get what you pay for...

I'll keep an eye on it, and if I still keep getting crashes I'll post back here. I'll also run memtest86 again overnight. Right now it seems to be fixed though. Thanks everyone for your help!
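
To put the two figures above side by side (a 250W PSU against an estimated 203W maximum load), here is a small Python sketch of the headroom arithmetic. The function name is hypothetical, and it only compares the nameplate rating with the calculator's estimate; it says nothing about PSU quality, rail limits or drive spin-up surges.

```python
def psu_headroom(psu_watts, peak_load_watts):
    """Return (spare watts, load as a fraction of the PSU rating)."""
    if psu_watts <= 0:
        raise ValueError("PSU rating must be positive")
    return psu_watts - peak_load_watts, peak_load_watts / psu_watts


# Figures from the post: 250W PSU, 203W estimated maximum load.
spare, load = psu_headroom(250, 203)
print(f"Spare capacity: {spare:.0f}W, load: {load:.0%} of rating")
```

With these inputs the PSU would be running at roughly four fifths of its rating at peak, which leaves little margin.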
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Uh oh! A 250W PSU isn't really adequate for your system; our "Proper Power Supply Sizing Guidance" thread suggests you need ~450W.
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
Looks like that NAS overhaul I was thinking of will come sooner rather than later. Thanks for the heads up!
 

microbug

Dabbler
Joined
Dec 14, 2016
Messages
44
HTTP/S was never exposed (I checked the router and I hadn't enabled it). That only leaves SSH exposed, which will only accept public key authentication. Plus, it's a jail with limited write access to the main system. That said, I'm thinking about setting up denyhosts as I'm getting lots of SSH auth attempts from random IPs.
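
Before setting up something like denyhosts, it can help to see how the attempts are spread across source addresses. The Python sketch below counts rejected logins per IPv4 address. The sample log lines and addresses are invented for illustration, and the "Invalid user ... from ..." wording is an assumption; the actual sshd messages on a given system may differ, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Hypothetical sample lines; real sshd log wording varies by version and config.
SAMPLE_LOG = """\
Dec 20 10:15:01 lead sshd[1234]: Invalid user admin from 203.0.113.5
Dec 20 10:15:04 lead sshd[1236]: Invalid user test from 203.0.113.5
Dec 20 10:16:10 lead sshd[1240]: Invalid user oracle from 198.51.100.7
Dec 20 10:17:00 lead sshd[1250]: Accepted publickey for user from 192.168.0.20 port 50022 ssh2
"""

# Assumed message format: "sshd[PID]: Invalid user NAME from A.B.C.D"
FAILED = re.compile(r"sshd\[\d+\]: Invalid user \S+ from (\d{1,3}(?:\.\d{1,3}){3})")


def count_failures(log_text):
    """Count rejected-login lines per source IPv4 address."""
    return Counter(match.group(1) for match in FAILED.finditer(log_text))


for ip, hits in count_failures(SAMPLE_LOG).most_common():
    print(ip, hits)
```

Addresses that show up many times are the ones a tool like denyhosts would end up blocking.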
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I'm thinking about setting up denyhosts as I'm getting lots of SSH auth attempts from random IPs.
If you change the public port to something non-standard, you'll eliminate a lot of the script kiddies. Which is not to say you shouldn't implement additional hardening.
 