Very slow system, shell and GUI

Status
Not open for further replies.

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
So my server is pretty beefy, I'd like to think, but it's been really sluggish lately. The GUI takes forever to load a tab, and from the shell it takes a minute or so to run a command. Has anyone had this issue before?
Here are my specs:

-FreeNAS 9.2.1.8 x64
-SuperMicro X10SLL-FO
-Intel Xeon E3 1220 v3
-16GB ECC RAM
-4x Hitachi 2TB
-4x WD RED 3TB
-4x Seagate 4TB
-Striped and mirrored
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Does it respond sluggishly if you SSH into it?
Does it respond sluggishly if you type commands physically at the computer (connect monitor and keyboard)?
Are you running any jails / plugins?
Any interesting entries in /var/log/messages?
What services are running?
What does "top" output look like?
Are you using LACP?
Is your FreeNAS server exposed to the internet?
What type of network do you have?
How is your FreeNAS server connected to the network?
Does the problem go away if you directly connect the workstation to the server?
Does the problem occur if you log into the webgui from a different computer?
Are you running any custom scripts or cronjobs?
Do you have SMART enabled?
What is the output of zpool status?
Do you have system email enabled?

Not quite 20 questions, but it will give you a start. :)
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
See below
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
Does it respond sluggishly if you SSH into it?
-Yes, also from the GUI shell

Does it respond sluggishly if you type commands physically at the computer (connect monitor and keyboard)?
-Haven't tried that yet; I'm using iKVM from the Supermicro board.

Are you running any jails / plugins?
- I was, but I've removed them all

Any interesting entries in /var/log/messages?
-I'll check this when I'm home later, and post

What services are running?
-I'll post later as well.

What does "top" output look like?
-I'll post later as well.

Are you using LACP?
-Yes

Is your FreeNAS server exposed to the internet?
-No

What type of network do you have?
-Flat network, no VLANs or anything

How is your FreeNAS server connected to the network?
-Modem -> Time Capsule -> Procurve Switch -> FreeNAS

Does the problem go away if you directly connect the workstation to the server?
-Did not try this, but they are both on the same switch. I tried both WiFi and hardwired.

Does the problem occur if you log into the webgui from a different computer?
-Yes

Are you running any custom scripts or cronjobs?
-No

Do you have SMART enabled?
-Yes

What is the output of zpool status?
-I'll post later as well.

Do you have system email enabled?
-Yes, it connects to my Gmail account
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
When you are doing your testing, try disabling LACP. I also once had a similar problem because I had configured LACP with one good cable and a few bad cables.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
How about:

Is SMART reporting any problems? (These should be in the footer of the WebGUI if there are any).
Do you do regular SMART tests on all of your disks?
Have you checked that your disks are in fact running the scheduled tests?
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
Result of messages on smartd:
Oct 21 15:13:03 freenas smartd[3084]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Oct 21 15:13:04 freenas smartd[3084]: Device: /dev/da4 [SAT], 496 Currently unreadable (pending) sectors
Oct 21 15:13:04 freenas smartd[3084]: Device: /dev/da4 [SAT], 496 Offline uncorrectable sectors
Oct 21 15:13:09 freenas kernel: done.
Oct 21 15:13:09 freenas ntpd[2812]: time reset -1.207315 s
Oct 21 15:14:12 freenas su: in pam_sm_authenticate(): (pam_group) neither luser nor ruser specified, assuming ruser
Oct 21 15:14:16 freenas su: alex to root on /dev/pts/0
Oct 21 15:18:45 freenas notifier: Stopping smartd.
Oct 21 15:18:46 freenas notifier: Waiting for PIDS: 3087.
Oct 21 15:18:46 freenas notifier: smartd not running? (check /var/run/smartd.pid).
Oct 21 15:18:46 freenas notifier: Starting smartd.
Oct 21 15:18:50 freenas smartd[4436]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Oct 21 15:18:53 freenas smartd[4436]: Device: /dev/da4 [SAT], 496 Currently unreadable (pending) sectors
Oct 21 15:18:56 freenas smartd[4436]: Device: /dev/da4 [SAT], 496 Offline uncorrectable sectors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
There ya go.. ada1 and da4 are failing. da4 is really bad, ada1 is just starting to fail.

Edit: I bet if you fail those two disks out of the pool your performance will be fine.

Failing disks cause ZFS to vomit all over itself and the cleanup takes time. Meanwhile you are pissed because your server is sucking in the performance arena.
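For anyone who wants to pull this out of /var/log/messages without eyeballing it, here is a rough Python sketch that summarizes the smartd bad-sector lines per device. It assumes the log lines look exactly like the ones posted above; the function name `summarize_smartd` and the regex are just for illustration and are not part of FreeNAS or smartmontools.

```python
import re
from collections import defaultdict

# Matches smartd lines shaped like the ones posted in this thread, e.g.
#   "... smartd[3084]: Device: /dev/da4 [SAT], 496 Offline uncorrectable sectors"
# Assumption: other smartd versions may word these messages differently.
SMARTD_LINE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>/dev/\S+?)(?: \[[^\]]+\])?, "
    r"(?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def summarize_smartd(log_text):
    """Return {device: {kind: highest count seen}} for bad-sector reports."""
    summary = defaultdict(dict)
    for line in log_text.splitlines():
        m = SMARTD_LINE.search(line)
        if m is None:
            continue  # not a bad-sector report (ntpd, su, notifier, ...)
        dev, kind, count = m["dev"], m["kind"], int(m["count"])
        summary[dev][kind] = max(count, summary[dev].get(kind, 0))
    return dict(summary)


# Sample input copied from the log excerpt earlier in the thread.
SAMPLE_LOG = """\
Oct 21 15:13:03 freenas smartd[3084]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Oct 21 15:13:04 freenas smartd[3084]: Device: /dev/da4 [SAT], 496 Currently unreadable (pending) sectors
Oct 21 15:13:04 freenas smartd[3084]: Device: /dev/da4 [SAT], 496 Offline uncorrectable sectors
Oct 21 15:13:09 freenas ntpd[2812]: time reset -1.207315 s
"""

if __name__ == "__main__":
    for dev, kinds in sorted(summarize_smartd(SAMPLE_LOG).items()):
        for kind, count in sorted(kinds.items()):
            print(f"{dev}: {count} {kind} sectors")
```

Any device that shows up in the result with a nonzero count is worth a closer look with a full SMART report before deciding whether to replace it.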
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
Top Output:

35 processes: 1 running, 34 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 163M Active, 66M Inact, 497M Wired, 1168K Cache, 108M Buf, 15G Free
ARC: 7947K Total, 628K MFU, 5562K MRU, 16K Anon, 188K Header, 1552K Other
Swap: 20G Total, 20G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
3230 root 6 20 0 287M 97036K usem 2 0:01 0.00% python2.7
3383 root 12 20 0 143M 10908K uwait 1 0:00 0.00% collectd
3900 root 1 21 0 158M 46420K ttyin 2 0:00 0.00% python2.7
3389 root 4 52 0 168M 46104K select 0 0:00 0.00% python2.7
4641 root 1 20 0 16552K 2532K CPU3 3 0:00 0.00% top
3328 nobody 1 20 0 9904K 2148K select 2 0:00 0.00% mdnsd
3078 root 1 20 0 275M 17616K select 3 0:00 0.00% smbd
3963 root 1 20 0 69520K 5844K select 0 0:00 0.00% sshd
3968 root 1 20 0 17516K 3384K pause 3 0:00 0.00% csh
3322 www 1 20 0 26040K 5332K kqread 3 0:00 0.00% nginx
3965 alex 1 20 0 69520K 6064K select 1 0:00 0.00% sshd
2524 root 1 20 0 12032K 1732K select 1 0:00 0.00% syslogd
3106 root 1 31 10 18588K 3232K wait 2 0:00 0.00% sh
2812 root 1 20 0 22216K 3872K select 3 0:00 0.00% ntpd
3967 alex 1 20 0 45280K 2292K wait 1 0:00 0.00% su
3075 root 1 20 0 208M 12508K select 3 0:00 0.00% nmbd
3081 root 1 20 0 259M 15020K select 1 0:00 0.00% winbindd
3085 root 1 20 0 259M 15520K select 2 0:00 0.00% winbindd
3966 alex 1 20 0 12284K 2748K wait 2 0:00 0.00% bash
4493 root 1 35 0 28208K 4336K nanslp 3 0:00 0.00% smartd
3756 root 1 22 0 14132K 1808K nanslp 2 0:00 0.00% cron
3902 root 1 52 0 12040K 1616K ttyin 3 0:00 0.00% getty
3901 root 1 52 0 12040K 1616K ttyin 0 0:00 0.00% getty
3907 root 1 52 0 12040K 1616K ttyin 3 0:00 0.00% getty
3904 root 1 52 0 12040K 1616K ttyin 0 0:00 0.00% getty
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
Ok I did an RMA for both drives.
da4 is on advanced replacement.
Wow, I did not know that failing disks would cause such a system slowdown. I thought that since all disks are mirrored and the OS is on a flash drive, it would only complain about the failing disks and still work.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yep. The behavior is very common and is the one downside to relying on mechanical disks without TLER. ;)

SMART almost always catches the errors before ZFS does, so I have SMART monitoring email my cell phone so I can be informed immediately about any problems. ;)
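To illustrate the idea only: FreeNAS sends these alerts itself once system email and the S.M.A.R.T. service are set up, so nobody needs to script this. The Python sketch below just shows what turning a few smartd warning lines into an email message could look like. The function name `build_alert` and the example.com addresses are made up for the example, and it builds the message without sending it.

```python
from email.message import EmailMessage


def build_alert(alerts, sender, recipient, hostname="freenas"):
    """Build (but do not send) a plain-text email from smartd warning lines."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{hostname}] SMART: {len(alerts)} warning(s)"
    msg.set_content("\n".join(alerts))
    return msg


if __name__ == "__main__":
    # Warning text taken from the log excerpt in this thread;
    # the addresses are placeholders.
    warnings = [
        "Device: /dev/ada1, 1 Offline uncorrectable sectors",
        "Device: /dev/da4 [SAT], 496 Currently unreadable (pending) sectors",
    ]
    print(build_alert(warnings, "nas@example.com", "admin@example.com"))
```

Actually delivering the message would go through an SMTP server (for example with `smtplib.SMTP(...).send_message(msg)`), and the server and credentials depend entirely on your mail provider.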
 