11.2-BETA2: zfskern at 100% CPU, server not reachable


Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4
Twice a day, zfskern reaches 100% CPU for no apparent reason. The only fix we have found is to stop Samba and NFS for 10-15 minutes until the load on the server returns to its normal values.
We checked for hardware issues and didn't find any, so we are looking for help from someone with dtrace/debugging skills to extract a coherent flame graph that we could use to track down this annoying issue.

Thanks

Code:
   15 root	   -8   0	 0   256 S 100.  0.0  2h18:38 zfskern
69040 root	   20   0  209M  165M S  6.7  0.1 30:28.36 python3.6: middlewared
	0 root	  -16   0	 0 11072 S  0.7  0.0 21h07:27 kernel
   26 root	   16   0	 0	16 S  0.0  0.0 18:52.41 syncer
 
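A hedged sketch of how such a flame graph could be captured with the dtrace that ships with FreeBSD/FreeNAS, assuming Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph) are cloned locally; the PID is taken from the top output above, and the 60-second window and output paths are arbitrary choices for the example:

Code:
# Quick look at which kernel stacks the zfskern threads (PID 15 above) are sitting in
procstat -kk 15

# Sample kernel stacks at ~997 Hz for 60 seconds while the CPU spike is happening
dtrace -x stackframes=100 \
  -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
  -o /tmp/kern_stacks.out

# Fold the stacks and render the flame graph SVG
./stackcollapse.pl /tmp/kern_stacks.out > /tmp/kern_stacks.folded
./flamegraph.pl /tmp/kern_stacks.folded > /tmp/zfskern.svg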

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I can't help with this, but you may want to provide some basic information about your pool, how it's used, and the version of FreeNAS you're running. Then we would at least have a starting point.
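For reference, one way to pull that information together from a FreeNAS shell might look like the sketch below; the pool name `tank` is the one that appears later in this thread, and nothing here changes any settings:

Code:
cat /etc/version        # FreeNAS build, e.g. FreeNAS-11.2-BETA2
zpool status -v tank    # vdev layout and any device errors
zpool list -v tank      # per-vdev size, allocation, and fragmentation
zfs list -r -o name,used,avail,compression,recordsize tank   # dataset usage and key settings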
 

Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4

FreeNAS 11.2-BETA2

Code:
zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 0 days 06:16:18 with 0 errors on Sat Sep  1 06:16:22 2018
config:

	NAME											STATE	 READ WRITE CKSUM
	tank											ONLINE	   0	 0	 0
	  raidz2-0									  ONLINE	   0	 0	 0
		gptid/b720873a-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b78fb359-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b7facf85-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b8698a60-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-1									  ONLINE	   0	 0	 0
		gptid/b8dbf370-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b94920ca-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/b9b1dcae-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/ba1e3d47-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-2									  ONLINE	   0	 0	 0
		gptid/ba92b975-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bb00915c-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bb6c8cfd-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
		gptid/bbd704ec-f515-11e5-a594-0cc47a523364  ONLINE	   0	 0	 0
	  raidz2-4									  ONLINE	   0	 0	 0
		multipath/disk1							 ONLINE	   0	 0	 0
		multipath/disk2							 ONLINE	   0	 0	 0
		multipath/disk3							 ONLINE	   0	 0	 0
		multipath/disk4							 ONLINE	   0	 0	 0
	logs
	  gptid/bc595311-f515-11e5-a594-0cc47a523364	ONLINE	   0	 0	 0
	cache
	  ada1										  ONLINE	   0	 0	 0
	spares
	  gptid/ee5b86c5-f515-11e5-a594-0cc47a523364	AVAIL
	  multipath/disk5							   AVAIL

errors: No known data errors
 

Wizermil

Cadet
Joined
Jul 13, 2017
Messages
4
We downgraded to 11.1-U6 but still have the same problem.
FYI, when the load peaked (~30 for a 2-socket, 2-core machine) we noticed several processes consuming CPU: middlewared, autosnapshot.py (instances were stacking up like crazy and I had to kill them manually), and zfs upgrade (I don't know what could have triggered that command :( ).

Could it be related to the fact that our pool is not balanced? We added larger disks when we saw that we were running out of free space; a few read-only checks are sketched after the iostat output below.
Code:
zpool iostat -v tank
										   capacity	 operations	bandwidth
pool									alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
tank									25.4T  32.4T	  8	 38   317K  1.49M
  raidz2								6.91T   287G	  1	  6  62.0K   126K
	gptid/b720873a-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.6K  72.4K
	gptid/b78fb359-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
	gptid/b7facf85-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
	gptid/b8698a60-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  15.7K  72.4K
  raidz2								6.90T   294G	  2	  8  63.9K   144K
	gptid/b8dbf370-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
	gptid/b94920ca-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
	gptid/b9b1dcae-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.2K  83.4K
	gptid/ba1e3d47-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.1K  83.4K
  raidz2								6.90T   298G	  2	  8  65.1K   147K
	gptid/ba92b975-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.6K  85.2K
	gptid/bb00915c-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.5K  85.1K
	gptid/bb6c8cfd-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.4K  85.2K
	gptid/bbd704ec-f515-11e5-a594-0cc47a523364	  -	  -	  0	  2  16.5K  85.2K
  raidz2								4.73T  31.5T	  2	 11   126K   241K
	multipath/disk1						 -	  -	  0	  3  31.9K   137K
	multipath/disk2						 -	  -	  0	  3  31.8K   137K
	multipath/disk3						 -	  -	  0	  3  31.7K   137K
	multipath/disk4						 -	  -	  0	  3  31.8K   137K
logs										-	  -	  -	  -	  -	  -
  gptid/bc595311-f515-11e5-a594-0cc47a523364  1.12M   186G	  0	  3	  9   867K
cache									   -	  -	  -	  -	  -	  -
  ada1								  59.1G   418G	  0	  0	  0   488K
--------------------------------------  -----  -----  -----  -----  -----  -----
 
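On the balance question: ZFS generally steers new writes toward the vdev with the most free space, so an uneven layout like this one mostly shows up as the newer raidz2-4 vdev absorbing a larger share of the writes (as the iostat output above already suggests), and by itself it does not usually explain a kernel thread pegged at 100%. A hedged sketch of a few read-only checks, reusing the pool name and process names mentioned in this thread:

Code:
# Per-vdev capacity and fragmentation
zpool list -v tank

# See whether autosnap/zfs processes are stacking up (listing only, nothing is killed)
pgrep -lf autosnap
pgrep -lf "zfs upgrade"

# With no arguments these only report on-disk versions; they do not upgrade anything
zpool upgrade
zfs upgrade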