CPU Utilization 100% For Several Days

tjvanpat

Dabbler
Joined
Mar 2, 2014
Messages
16
Hi.

This is the second time I've checked into my FreeNAS box (running FreeNAS-9.2.1.2-RELEASE-x64) and found that the CPU utilization was at 100%...

[Screenshot: CPU utilization graph pegged at 100%]


...had been at 100% for several days...

[Screenshot: CPU utilization history showing 100% for several days]


Based on some other posts describing this kind of problem, I've collected the following data:


Code:
badass@freenas_1:~ % zpool status
  pool: volume_0
state: ONLINE
  scan: scrub repaired 0 in 2h41m with 0 errors on Sun Mar 23 06:41:20 2014
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        volume_0                                        ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/6a557eb9-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6aaa0492-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6afdd223-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6b50e62f-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6ba1e0ae-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6bf4aa35-a100-11e3-827a-002590d97629  ONLINE      0    0    0
 
errors: No known data errors


Code:
badass@freenas_1:~ % sudo gstat
dT: 1.001s  w: 1.000s
L(q)  ops/s    r/s  kBps  ms/r    w/s  kBps  ms/w  %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3
    0      0      0      0    0.0      0      0    0.0    0.0| ada4
    0      0      0      0    0.0      0      0    0.0    0.0| ada5
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6a557eb9-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p2
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6aaa0492-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6afdd223-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6b50e62f-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p1.eli
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6ba1e0ae-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/6bf4aa35-a100-11e3-827a-002590d97629
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da0s1
    0      0      0      0    0.0      0      0    0.0    0.0| da0s2


Letting 'zpool iostat 1' run for a few seconds:

Code:
badass@freenas_1:~ % zpool iostat 1
              capacity    operations    bandwidth
pool        alloc  free  read  write  read  write
----------  -----  -----  -----  -----  -----  -----
volume_0    4.71T  6.16T      2    12  245K  319K
volume_0    4.71T  6.16T      0    210      0  1.17M
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0  12.0K
volume_0    4.71T  6.16T      0    180      0  847K
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0    180      0  847K
volume_0    4.71T  6.16T      0      0      0      0
volume_0    4.71T  6.16T      0      0      0      0


And here's the cause of the 100% CPU utilization, several stuck 'zpool history' processes (the 'ps auxH' output below is trimmed to the relevant lines):
Code:
badass@freenas_1:~ % ps auxH
USER        PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED      TIME COMMAND
root      25427 100.0  0.6 127812  94084 ??  R    Wed04AM 2326:35.16 zpool history volume_0
root      27963 100.0  0.6 127812  94084 ??  R    Mon04AM 5205:01.55 zpool history volume_0
root      75495 100.0  0.6 127812  94084 ??  R    4:00AM  894:33.66 zpool history volume_0
root      76201  98.8  0.6 127812  94084 ??  R    Tue04AM 3765:31.39 zpool history volume_0
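
All four of those are the same command, 'zpool history volume_0', and each one was started at around 4am on a different day (see the STARTED column). As a stop-gap, something like the following could be used to spot and clear them; this is only a sketch, and it assumes the processes really are stuck spinning rather than doing useful work:

Code:
# list any zpool history processes still running (same view as the ps auxH above)
ps auxH | grep '[z]pool history'

# if they have clearly hung (hours of CPU time against an otherwise idle pool),
# kill them; assumption: nothing else is waiting on their output
sudo pkill -9 -f 'zpool history'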


The hardware specs are as follows:

CPU: Intel Core i3-4130
Motherboard: SUPERMICRO MBD-X10SLM-F-O
RAM: 16GB Crucial DDR3 SDRAM ECC Unbuffered DDR3 1600 (CT2KIT51272BD160B)
HDD: 6x HGST/Hitachi Ultrastar 7K3000 2TB (HUA723020ALA641) in a RAIDZ2 configuration

I have ZFS scrubs scheduled for 4am on the 1st and 15th of every month, allowed to run on any day of the week except Friday and Saturday, with a 10-day threshold.

I have a recursive periodic snapshot scheduled every 6 hours, allowed to run at any time of day, every day of the week. The snapshots are then replicated to another FreeNAS box (also 9.2.1.2, same hardware config, except with only 8GB of RAM and 6x 3TB WD Reds).

I have a long SMART test scheduled for all disks at 4am on the 8th and 22nd of every month, on any day of the week. The short tests are scheduled for 2am on every other day of the month, every day of the week.
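
If it helps to cross-check those schedules against what actually ran, the last scrub and SMART self-test times can be pulled up with something like this (just a sketch; ada1 is one of the pool member disks from the gstat output above):

Code:
# last scrub completion time (also visible in the zpool status output above)
zpool status volume_0 | grep scan:

# last long/short SMART self-tests recorded on a given disk
sudo smartctl -l selftest /dev/ada1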

So...any ideas what keeps causing this problem? My guess is that it's something related to the scrub, though I'm not sure what or why.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
A quick search found this thread...same issue.
 
jkh

Guest
tjvanpat said:
"Hi. This is the second time I've checked into my FreeNAS box (running FreeNAS-9.2.1.2-RELEASE-x64) and found that the CPU utilization was at 100%..."

Known problem. Have you tried installing 9.2.1.4-BETA? It addresses several CPU spin issues.
 

tjvanpat

Dabbler
Joined
Mar 2, 2014
Messages
16
Ah, seems my googling didn't work out as well as I'd have hoped. I haven't tried 9.2.1.4-BETA, but I suppose I can give that a shot if it's considered stable enough. Otherwise, checking in once a day and doing a reboot if necessary to clean things up isn't too big of a deal (compared to shifting to the backup and doing a restore if the beta hits a problem).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
jkh and I have exchanged emails... and 9.2.1.4 is supposed to be *the* 9.2.1 release. I'd definitely give it a whirl if you haven't yet.
 

tjvanpat

Dabbler
Joined
Mar 2, 2014
Messages
16
Ok, I'll give it a shot on my backup tomorrow night (I haven't done an update yet, and it's a bit late to start tonight), and let that run for a day or two, then try my main box next if all goes well.

Thanks for all of the quick help.
 

tjvanpat

Dabbler
Joined
Mar 2, 2014
Messages
16
I upgraded to 9.2.1.4-BETA around noon on Monday (4/14/14), and I'm still seeing the same problem:

[Screenshot: CPU utilization still at 100% after the upgrade]


Code:
last pid:  4657;  load averages:  1.09,  1.07,  1.03  up 1+10:03:27    22:14:48
70 processes:  2 running, 68 sleeping
 
Mem: 545M Active, 673M Inact, 13G Wired, 370M Buf, 1486M Free
ARC: 12G Total, 11G MFU, 546M MRU, 4843K Anon, 55M Header, 327M Other
Swap: 12G Total, 12G Free
 
 
  PID USERNAME    THR PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
63016 root          1 103    0  125M 94084K CPU3    3  18.2H 100.00% zpool
6031    816      33  20    0  314M  140M usem    0  4:01  0.00% python2.7
3553 root        12  20    0  146M 16752K uwait  0  1:57  0.00% collectd
7138    816      20  20    0  384M  229M select  1  1:32  0.00% python2.7
5462    972      13  35  15  305M  137M select  2  0:46  0.00% python
2822 root          6  20    0  311M  124M usem    1  0:38  0.00% python2.7
4925    972      16  20    0  206M 93072K uwait  1  0:18  0.00% Plex Media Server
6012    972      14  20    0  190M 45176K uwait  1  0:17  0.00% Plex DLNA Server
93232 root          1  20    0  330M 27996K select  2  0:09  0.00% smbd
14994 uucp          1  20    0 14232K  3024K select  0  0:07  0.00% usbhid-ups
5137 root          6  20    0  139M 28676K usem    3  0:05  0.00% python2.7
7564 root          6  20    0  139M 29168K usem    3  0:04  0.00% python2.7
6265 root          6  20    0  144M 29456K usem    3  0:04  0.00% python2.7
13702 root          1  20    0  275M 17476K select  0  0:03  0.00% smbd
2437 root          1  20    0 22220K  3896K select  3  0:02  0.00% ntpd
4542    972      10  20    0  223M 45620K select  3  0:01  0.00% python
14996 uucp          1  20    0 20352K  3800K select  1  0:01  0.00% upsd
4547    972      10  20    0  215M 45796K select  3  0:01  0.00% python


So...any ideas on this one? Is there any other data that would be useful for this?
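
If more data would help, a couple of standard FreeBSD commands could show where the stuck process is spending its time (a sketch; 63016 is the PID of the 'zpool' process from the top output above):

Code:
# dump the kernel-side stack of the stuck process
sudo procstat -kk 63016

# or attach and watch which system calls it is (or isn't) making
sudo truss -p 63016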
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Scrub in progress?
 

tjvanpat

Dabbler
Joined
Mar 2, 2014
Messages
16
Nope.

Code:
[root@freenas_1 ~]# zpool status
  pool: volume_0
state: ONLINE
  scan: scrub repaired 0 in 2h50m with 0 errors on Sun Apr 13 22:03:47 2014
config:

        NAME                                            STATE    READ WRITE CKSUM
        volume_0                                        ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/6a557eb9-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6aaa0492-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6afdd223-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6b50e62f-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6ba1e0ae-a100-11e3-827a-002590d97629  ONLINE      0    0    0
            gptid/6bf4aa35-a100-11e3-827a-002590d97629  ONLINE      0    0    0

errors: No known data errors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So.. beta2!?
 