WebUI and commandline menu become unresponsive...

Status
Not open for further replies.

shoon

Cadet
Joined
May 31, 2011
Messages
9
I have a Dell T710 spec'd with
  • Areca 1880ixl w/ 8x2TB SAS (Raid10)
  • 24GB RAM
  • (1) quad port Intel pro gigabit
  • (2) Broadcom NetXtreme dual port gigabit card
  • Internal 1GB SD card for FreeNAS (8.0-Release) boot

Configuration from default:
  • Single ZFS file system on RAID10 container
  • LACP NIC interface (2x2port NetXtreme, 4port Intel, 2 onboard ports)
  • ISCSI enabled

General description of the problem: No data exists on the file system yet. The web interface consistently becomes unresponsive, and the commandline menu on the terminal also stops responding. Hitting CTRL+ALT+DEL initiates a reboot, but the reboot hangs trying to kill a certain PID. I received an "Out of Memory" error a single time before the WebUI crashed.
 

torrin

Moderator
Joined
May 30, 2011
Messages
32
Are you running 64bit FreeNAS? If not, that might be the problem.

Also, you might want to ssh into the box and run tail -f /var/log/messages. That way, if you get some crazy error in the log, you will see it on the screen and it will not be lost during a reboot.
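One way to keep a copy of the log off the box, so nothing is lost when it hangs, is to follow it over ssh from another machine and save it locally. This is only a minimal sketch, assuming a POSIX shell on the client; the function name save_and_flag, the keyword list, and the host name in the usage line are illustrative assumptions, not anything FreeNAS ships:

```shell
#!/bin/sh
# save_and_flag FILE   (hypothetical helper, not part of FreeNAS)
# Reads log lines on stdin, appends every line to FILE (the local copy),
# and prints only the lines that match a few suspicious keywords.
# The keyword list is a guess; adjust it to taste.
save_and_flag() {
    tee -a "$1" | grep -Ei 'error|fail|out of|could not'
}
```

Usage would look something like: ssh root@freenas.example 'tail -f /var/log/messages' | save_and_flag freenas-messages.log (freenas.example is a placeholder host name). The full log stays in the local file even if the NAS stops responding.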
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
Are you running 64bit FreeNAS? If not, that might be the problem.

Yes, I'm running build: FreeNAS-8.0-RELEASE-amd64

Also, you might want to ssh into the box and run tail -f /var/log/messages. That way, if you get some crazy error in the log, you will see it on the screen and it will not be lost during a reboot.

I'll post back any weird messages I might see. Right now the only odd thing I see is "Could not setup receive structures" on some of the interfaces.
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
The failure occurred again. There were no entries in /var/log/messages prior to the crash, other than basic boot up / ntp.

The out of memory message (webui popup) dialog is titled:

Message from webpage

Out of memory at line: 189

Any other diagnostic hints?
 

esamett

Patron
Joined
May 28, 2011
Messages
345
My system was sluggish with 8 disks and 2GB RAM. It smoothed out with 4GB. You have lots of RAM. Interesting.
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
The crash happened again, but this time I was watching the lighttpd process and noticed it went into a 'keglim' state:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2738 www         1  44    0 21376K  4672K keglim  5   0:01  0.00% lighttpd

Killing the process (kill -9) took a considerable amount of time, and after the process was killed I was unable to restart /etc/rc.d/ix-httpd (it hangs trying to fetch something).

A quick search leads me to believe there is an issue with either the bce (Broadcom) driver or the interface options I'm using: metric X mtu 9000
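To catch this state without watching top by hand, top-style output can be filtered on the STATE column. A rough sketch, assuming the column layout shown above (STATE is the eighth field); the function name stuck_in_keglim is made up for illustration:

```shell
#!/bin/sh
# stuck_in_keglim   (hypothetical helper)
# Reads top-style process lines on stdin and prints "PID COMMAND" for each
# process whose STATE column (8th field in the layout above) starts with
# "keglim". The column header is skipped because its 8th field is "STATE".
stuck_in_keglim() {
    awk '$8 ~ /^keglim/ { print $1, $NF }'
}
```

On the FreeNAS shell this could be fed from top in batch mode, for example top -b | stuck_in_keglim (check top(1) for the options that make it list every process). Any output means at least one process is waiting in that state.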
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
I applied 8.0.1-BETA1 this morning (from FreeNAS-8.0.1-BETA1-amd64-GUI_Upgrade.xz, using the cli /root/update procedure) and I'm still seeing the lighttpd process go into a keglim state and fail to respond or shut down.

Last night I did see an error on the console before the console and ssh stopped responding. It was an istgt_lu.c error:
:1714: istgt_lu_add_unit : *** ERROR *** LU1: no LUN
:1863: istgt_lu_init: *** ERROR *** lu_add_unit() failed
:1665:main : *** ERROR *** istgt_lu_init() failed
... this might be related to me playing around with the iSCSI options and not related to the stability bug.

Any dev insight or any additional info you need into this critical stability bug?
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
Argh... how frustrating. If I don't leave the web UI open for any length of time, the system seems to be stable. Any other thoughts on things to check?
 

ixdwhite

Guest
Sounds like you're running out of mbuf clusters, which causes a deadlock. A WCHAN of 'keglimit' (top truncates it to 'keglim') is the telltale. I suspect all of the NICs in the system are chewing up the default mbuf cluster allocation. 10GbE cards are similarly hungry.
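A quick way to test that theory is to look at the mbuf cluster counters in netstat -m. The sketch below assumes the FreeBSD output line of the form "N/N/N/N mbuf clusters in use (current/cache/total/max)"; the function name mbuf_cluster_usage is hypothetical:

```shell
#!/bin/sh
# mbuf_cluster_usage   (hypothetical helper)
# Reads `netstat -m` output on stdin, finds the plain "mbuf clusters in use"
# line, and prints "total max percent", where percent = 100 * total / max
# truncated to an integer.
mbuf_cluster_usage() {
    awk '/[0-9] mbuf clusters in use/ {
        split($1, n, "/")   # n[1]=current n[2]=cache n[3]=total n[4]=max
        if (n[4] > 0) print n[3], n[4], int(100 * n[3] / n[4])
    }'
}
```

Run as netstat -m | mbuf_cluster_usage on the box. A total at or near the max would be consistent with cluster exhaustion.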

Add this to /boot/loader.conf from the shell (you will need to make / writable first by running "mount -uw /"), then reboot:

kern.ipc.nmbclusters="128000"

You may need to unplug all of the Ethernet cables so that DHCP doesn't try to run and lock things up on boot.
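If the setting needs to be applied more than once or on several boxes, the edit can be made repeatable. This is only a sketch under stated assumptions: set_loader_tunable is a made-up helper name, and it simply drops any existing line for the tunable and appends the new one:

```shell
#!/bin/sh
# set_loader_tunable FILE NAME VALUE   (hypothetical helper)
# Ensures FILE ends up with exactly one line NAME="VALUE": every existing
# line that starts with NAME= is dropped and the new setting is appended.
set_loader_tunable() {
    file=$1 name=$2 value=$3
    tmp="${file}.tmp.$$"
    # Keep the lines that do not already set NAME (literal prefix match).
    awk -v k="${name}=" 'index($0, k) != 1' "$file" > "$tmp" 2>/dev/null || true
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On the FreeNAS shell the sequence would then be: mount -uw / first, then set_loader_tunable /boot/loader.conf kern.ipc.nmbclusters 128000, then reboot. After the reboot, sysctl kern.ipc.nmbclusters should report the new value.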
 

shoon

Cadet
Joined
May 31, 2011
Messages
9
Thank you Doug. I have applied the loader.conf changes and will update this post after I put it through my application's tests.

Update: this seems to have addressed the problems. I have transferred about 2-3TB over the interfaces without seeing the lockup.

Thank you again!
 

JimGat

Cadet
Joined
Nov 14, 2011
Messages
2
I am not sure if this is related, but the fix from this thread has bought us several days between issues.

We are running FreeNAS 8.0.1 on a Supermicro chassis with a Core i7, 6GB RAM, two onboard Intel ports, and one Intel Pro quad-port card. That makes 6 gigabit ports using the igb driver.
We are currently using only one of the quad-port card's gigabit ports, for NFS to 3 VMware ESXi 4.1 hosts.

Before raising kern.ipc.nmbclusters to "128000" we were seeing the non-responsive management GUI and SSH console.

After adding the kernel parameter everything works very well and outperforms our other Linux-based NFS stores.

The problem we are having now is that the storage network card (the Intel quad-port) stops communicating, apparently on all 4 ports. From the FreeNAS console you can ping the card's IPs and nothing shows as down, but they are not pingable from the ESX boxes. Disconnecting the switch (an HP managed switch) has no effect. No packet errors have occurred. We are using jumbo frames (MTU 9000). NFS is still active on the management IP side; I was connected via SSH on the management IP. Downing just igb2 (the primary IP on the Intel card) seems to have allowed the entire card to work. I am going to post in another thread if I can find a more applicable one.

Thanks in advance for any help, and thanks to the team for such a great NAS.
 