11.2-BETA2: zfskern at 100% CPU, server not reachable


Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4
Twice a day, zfskern reaches 100% CPU for no apparent reason. The only fix we have found is to stop Samba and NFS for 10-15 minutes until the load on the server returns to its normal values.
We checked for hardware issues and didn't find any, so we are looking for help from someone with dtrace/debugging skills to extract a coherent flame graph that we could use to track down this annoying issue.

Thanks

Code:
   15 root	   -8   0	 0   256 S 100.  0.0  2h18:38 zfskern
69040 root	   20   0  209M  165M S  6.7  0.1 30:28.36 python3.6: middlewared
	0 root	  -16   0	 0 11072 S  0.7  0.0 21h07:27 kernel
   26 root	   16   0	 0	16 S  0.0  0.0 18:52.41 syncer
 
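A hedged sketch of how such a flame graph could be captured with the dtrace that ships with FreeBSD/FreeNAS, assuming Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph) are cloned locally; the PID is taken from the top output above, and the 60-second window and output paths are arbitrary choices for the example:

Code:
# Quick look at which kernel stacks the zfskern threads (PID 15 above) are sitting in
procstat -kk 15

# Sample kernel stacks at ~997 Hz for 60 seconds while the CPU spike is happening
dtrace -x stackframes=100 \
  -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
  -o /tmp/kern_stacks.out

# Fold the stacks and render the flame graph SVG
./stackcollapse.pl /tmp/kern_stacks.out > /tmp/kern_stacks.folded
./flamegraph.pl /tmp/kern_stacks.folded > /tmp/zfskern.svg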

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I can't help with this, but you may want to provide some basic information about your pool, how it's used, and the version of FreeNAS you're running. Then we would at least have a starting point.
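For reference, one way to pull that information together from a FreeNAS shell might look like the sketch below; the pool name `tank` is the one that appears later in this thread, and nothing here changes any settings:

Code:
cat /etc/version        # FreeNAS build, e.g. FreeNAS-11.2-BETA2
zpool status -v tank    # vdev layout and any device errors
zpool list -v tank      # per-vdev size, allocation, and fragmentation
zfs list -r -o name,used,avail,compression,recordsize tank   # dataset usage and key settings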
 

Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4

FreeNAS 11.2-BETA2

Code:
zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 06:16:18 with 0 errors on Sat Sep  1 06:16:22 2018
config:

	NAME											STATE	 READ WRITE CKSUM
	tank											ONLINE	   0	 0	 0
	  raidz2-0									  ONLINE	   0	 0	 0
		gptid/b720873a-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b78fb359-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b7facf85-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b8698a60-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-1									  ONLINE	   0	 0	 0
		gptid/b8dbf370-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b94920ca-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b9b1dcae-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/ba1e3d47-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-2									  ONLINE	   0	 0	 0
		gptid/ba92b975-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bb00915c-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bb6c8cfd-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bbd704ec-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-4									  ONLINE	   0	 0	 0
		multipath/disk1							 ONLINE	   0	 0	 0
		multipath/disk2							 ONLINE	   0	 0	 0
		multipath/disk3							 ONLINE	   0	 0	 0
		multipath/disk4							 ONLINE	   0	 0	 0
	logs
	  gptid/bc595311-f515-11e5-a594-0cc47a523364	ONLINE	   0	 0	 0
	cache
	  ada1										  ONLINE	   0	 0	 0
	spares
	  gptid/ee5b86c5-f515-11e5-a594-0cc47a523364	AVAIL
	  multipath/disk5							   AVAIL

errors: No known data errors
 

Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4
We downgraded to 11.1-U6 but still have the same problem.
FYI, when the load peaked (~30 for a 2-socket, 2-core machine) we noticed several processes consuming CPU: middlewared, autosnapshot.py (instances were stacking up like crazy and I had to kill them manually), and zfs upgrade (I don't know what could have triggered that command :( ).

Could it be related to the fact that our pool is not balanced? We added larger disks when we saw that we were running out of free space; a few read-only checks are sketched after the iostat output below.
Code:
zpool iostat -v tank
										   capacity	 operations	bandwidth
pool									alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
tank									25.4T  32.4T	  8	 38   317K  1.49M
  raidz2								6.91T   287G	  1	  6  62.0K   126K
	gptid/b720873a-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.6K  72.4K
	gptid/b78fb359-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
	gptid/b7facf85-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
	gptid/b8698a60-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
  raidz2								6.90T   294G	  2	  8  63.9K   144K
	gptid/b8dbf370-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
	gptid/b94920ca-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
	gptid/b9b1dcae-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.2K  83.4K
	gptid/ba1e3d47-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
  raidz2								6.90T   298G	  2	  8  65.1K   147K
	gptid/ba92b975-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.6K  85.2K
	gptid/bb00915c-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.5K  85.1K
	gptid/bb6c8cfd-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.4K  85.2K
	gptid/bbd704ec-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.5K  85.2K
  raidz2								4.73T  31.5T	  2	 11   126K   241K
	multipath/disk1						 -	  -	  0	  3  31.9K   137K
	multipath/disk2						 -	  -	  0	  3  31.8K   137K
	multipath/disk3						 -	  -	  0	  3  31.7K   137K
	multipath/disk4						 -	  -	  0	  3  31.8K   137K
logs										-	  -	  -	  -	  -	  -
  gptid/bc595311-f515-11e5-a594-0cc47a523364  1.12M   186G	  0	  3	  9   867K
cache									   -	  -	  -	  -	  -	  -
  ada1								  59.1G   418G	  0	  0	  0   488K
--------------------------------------  -----  -----  -----  -----  -----  -----
 
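On the balance question: ZFS generally steers new writes toward the vdev with the most free space, so an uneven layout like this one mostly shows up as the newer raidz2-4 vdev absorbing a larger share of the writes (as the iostat output above already suggests), and by itself it does not usually explain a kernel thread pegged at 100%. A hedged sketch of a few read-only checks, reusing the pool name and process names mentioned in this thread:

Code:
# Per-vdev capacity and fragmentation
zpool list -v tank

# See whether autosnap/zfs processes are stacking up (listing only, nothing is killed)
pgrep -lf autosnap
pgrep -lf "zfs upgrade"

# With no arguments these only report on-disk versions; they do not upgrade anything
zpool upgrade
zfs upgrade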