Hi.
This is the second time I've checked in on my FreeNAS box (running FreeNAS-9.2.1.2-RELEASE-x64) and found the CPU utilization at 100%...

...had been at 100% for several days...

Based on some of the other posts about this kind of problem, I've collected the following data:
Code:
badass@freenas_1:~ % zpool status
  pool: volume_0
 state: ONLINE
  scan: scrub repaired 0 in 2h41m with 0 errors on Sun Mar 23 06:41:20 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume_0                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6a557eb9-a100-11e3-827a-002590d97629  ONLINE       0     0     0
            gptid/6aaa0492-a100-11e3-827a-002590d97629  ONLINE       0     0     0
            gptid/6afdd223-a100-11e3-827a-002590d97629  ONLINE       0     0     0
            gptid/6b50e62f-a100-11e3-827a-002590d97629  ONLINE       0     0     0
            gptid/6ba1e0ae-a100-11e3-827a-002590d97629  ONLINE       0     0     0
            gptid/6bf4aa35-a100-11e3-827a-002590d97629  ONLINE       0     0     0

errors: No known data errors
Code:
badass@freenas_1:~ % sudo gstat
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3
    0      0      0      0    0.0      0      0    0.0    0.0| ada4
    0      0      0      0    0.0      0      0    0.0    0.0| ada5
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6a557eb9-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6aaa0492-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6afdd223-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6b50e62f-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6ba1e0ae-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6bf4aa35-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da0s1
    0      0      0      0    0.0      0      0    0.0    0.0| da0s2
Letting 'zpool iostat 1' run for a few seconds:
Code:
badass@freenas_1:~ % zpool iostat 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
volume_0    4.71T  6.16T      2     12   245K   319K
volume_0    4.71T  6.16T      0    210      0  1.17M
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0  12.0K
volume_0    4.71T  6.16T      0    180      0   847K
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0    180      0   847K
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
And the cause of the 100% CPU utilization (the full output is available here):
Code:
badass@freenas_1:~ % ps auxH
USER    PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED        TIME COMMAND
root  25427 100.0  0.6 127812 94084 ??  R    Wed04AM  2326:35.16 zpool history volume_0
root  27963 100.0  0.6 127812 94084 ??  R    Mon04AM  5205:01.55 zpool history volume_0
root  75495 100.0  0.6 127812 94084 ??  R     4:00AM   894:33.66 zpool history volume_0
root  76201  98.8  0.6 127812 94084 ??  R    Tue04AM  3765:31.39 zpool history volume_0
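If it helps, the next thing I was planning to try is peeking at what one of those stuck processes is actually doing (the PID below is just the first one from the ps output above; I haven't drawn any conclusions from this yet):
Code:
# Dump the kernel thread stack(s) of one of the runaway zpool history processes
sudo procstat -kk 25427
# Or watch its system calls live for a few seconds (Ctrl-C to detach)
sudo truss -p 25427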
The hardware specs are as follows:
CPU: Intel Core i3-4130
Motherboard: SUPERMICRO MBD-X10SLM-F-O
RAM: 16GB Crucial DDR3-1600 ECC Unbuffered SDRAM (CT2KIT51272BD160B)
HDD: 6x HGST/Hitachi Ultrastar 7K3000 2TB (HUA723020ALA641) in a RAIDZ2 configuration
I have ZFS scrubs scheduled for 4am on the 1st and 15th of the month, every month, on every day of the week except Friday and Saturday, with a 10-day threshold.
I have a recursive periodic snapshot task scheduled to run every 6 hours, allowed to run at any time of day, every day of the week. These snapshots are then replicated to another FreeNAS box (also 9.2.1.2, same hardware config, except with only 8GB of RAM and 6x 3TB WD Reds).
I have a long SMART test scheduled for all disks at 4am on the 8th and 22nd of the month, every month, on every day of the week. Short tests are scheduled for 2am on every other day of the month, every day of the week.
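For what it's worth, I still need to confirm exactly which scheduled jobs fire in the 4am hour; my plan is something along these lines (this assumes the FreeNAS scheduler writes its tasks into the system crontab, which I believe is the case on 9.2.1.2):
Code:
# Rough filter: field 2 of the system crontab is the hour, so list
# non-comment entries scheduled for the 4am hour (won't catch */N hour fields)
awk '!/^#/ && $2 == "4"' /etc/crontab
# Also check root's personal crontab, in case anything lives there instead
sudo crontab -l -u root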
So...any ideas what keeps causing this problem? My guess is that it's something related to the scrub, though I'm not sure what or why.
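As a stopgap (and a sanity check that the command normally finishes at all), I figure I can kill the runaway processes and then run the same command by hand, roughly like this (PIDs are the ones from the ps output above):
Code:
# Kill the stuck zpool history processes
sudo kill 25427 27963 75495 76201
# Then run the same command manually and see whether it completes on its own
sudo zpool history volume_0 | tail -n 20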