FreeNAS locks up for a few seconds, then returns to normal functionality

Status
Not open for further replies.

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
Hello,

I have set up a FreeNAS system in ESXi with an HBA passed through. At times I see memory usage spike to consume all 35 GB of RAM (looking at the reporting tool), and during this time the system is completely unresponsive: SSH, HTTPS, network sharing, and shell access all stop working while the memory spikes. I can't determine the root cause, whether it is an issue with Samba or with FreeNAS, and I am not entirely familiar with FreeBSD. This is preventing some services (namely system backups and larger file transfers) from working. I have debugging set up on both SMB and syslog in an attempt to find more information.
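For what it's worth, these are roughly the commands I leave running in an SSH session (started before a spike hits, since the box is unresponsive during one) to try to capture the memory and ARC state. The sysctl names are what I see on 11.2, so treat this as a rough sketch:
Code:
# Biggest memory consumers, in batch mode so the output survives a laggy session
top -b -o res | head -n 20

# Current ARC size versus its configured maximum
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max

# Whether the VM itself is dipping into swap
swapinfo -h

# Per-disk latency and queue depth while the pool is busy (interactive, Ctrl-C to quit)
gstat -p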

Clients accessing this are three Ubuntu 18.04 machines, two Windows 10 machines, one Windows Server, and two macOS devices.

Configuration-wise:
Bare metal hardware:
Code:
CPU: AMD Ryzen 7 1700X
Motherboard: ASRock X470 Master SLI/ac
RAM: 64 GB Corsair DDR4 ECC
SSD1: Intel S3520
SSD2: OCZ Vertex 4
GPU: EVGA GT 710
HBA: LSI 9211-8i
PSU: Seasonic 400W Platinum
OS: ESXi 6.7


FreeNAS 11.2 "hardware":
Code:
4 CPU Cores
35 GB RAM
LSI 9211-8i (Passthrough)
6 x 6TB WD RED
1 x 120GB Intel SSD ZIL


zpool status:
Code:
pool: zfs_pool
 state: ONLINE
  scan: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        zfs_pool                                            ONLINE       0     0     0
          raidz2-0                                          ONLINE       0     0     0
            gptid/993c2ebc-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
            gptid/9b428677-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
            gptid/9d7ee4ea-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
            gptid/9f88b60b-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
            gptid/a1c84e0e-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
            gptid/a3e93bef-a1d5-11e8-97ef-000c294af712.eli  ONLINE       0     0     0
        logs
          gptid/a56f4e93-a1d5-11e8-97ef-000c294af712.eli    ONLINE       0     0     0

errors: No known data errors


zfs list
Code:
NAME														USED  AVAIL  REFER  MOUNTPOINT
freenas-boot												869M  10.3G	64K  none
freenas-boot/ROOT										   869M  10.3G	29K  none
freenas-boot/ROOT/Initial-Install							 1K  10.3G   866M  legacy
freenas-boot/ROOT/default								   869M  10.3G   867M  legacy
zfs_pool												   1.68T  19.9T   176K  /mnt/zfs_pool
zfs_pool/.system										   73.2M  19.9T   192K  legacy
zfs_pool/.system/configs-c6df7f068c2c4171917a68f70a853917   631K  19.9T   631K  legacy
zfs_pool/.system/cores									 4.10M  19.9T  4.10M  legacy
zfs_pool/.system/rrd-c6df7f068c2c4171917a68f70a853917	  19.4M  19.9T  19.4M  legacy
zfs_pool/.system/samba4									 647K  19.9T   647K  legacy
zfs_pool/.system/syslog-c6df7f068c2c4171917a68f70a853917   48.1M  19.9T  48.1M  legacy
zfs_pool/.system/webui									  176K  19.9T   176K  legacy
zfs_pool/home											   983K  19.9T   983K  /mnt/zfs_pool/home
zfs_pool/storage											441G  19.9T   411G  /mnt/zfs_pool/storage
zfs_pool/storage/games									 29.7G  1.97T  29.7G  /mnt/zfs_pool/storage/games
zfs_pool/media										 1.25T  2.78T  1.25T  /mnt/zfs_pool/media


zpool get all zfs_pool
Code:
NAME	  PROPERTY					   VALUE						  SOURCE
zfs_pool  size						   32.5T						  -
zfs_pool  capacity					   4%							 -
zfs_pool  altroot						/mnt						   local
zfs_pool  health						 ONLINE						 -
zfs_pool  guid						   10932289593781872771		   default
zfs_pool  version						-							  default
zfs_pool  bootfs						 -							  default
zfs_pool  delegation					 on							 default
zfs_pool  autoreplace					off							default
zfs_pool  cachefile					  /data/zfs/zpool.cache		  local
zfs_pool  failmode					   continue					   local
zfs_pool  listsnapshots				  off							default
zfs_pool  autoexpand					 on							 local
zfs_pool  dedupditto					 0							  default
zfs_pool  dedupratio					 1.60x						  -
zfs_pool  free						   30.9T						  -
zfs_pool  allocated					  1.58T						  -
zfs_pool  readonly					   off							-
zfs_pool  comment						-							  default
zfs_pool  expandsize					 -							  -
zfs_pool  freeing						0							  default
zfs_pool  fragmentation				  1%							 -
zfs_pool  leaked						 0							  default
zfs_pool  bootsize					   -							  default
zfs_pool  checkpoint					 -							  -
zfs_pool  feature@async_destroy		  enabled						local
zfs_pool  feature@empty_bpobj			active						 local
zfs_pool  feature@lz4_compress		   active						 local
zfs_pool  feature@multi_vdev_crash_dump  enabled						local
zfs_pool  feature@spacemap_histogram	 active						 local
zfs_pool  feature@enabled_txg			active						 local
zfs_pool  feature@hole_birth			 active						 local
zfs_pool  feature@extensible_dataset	 active						 local
zfs_pool  feature@embedded_data		  active						 local
zfs_pool  feature@bookmarks			  enabled						local
zfs_pool  feature@filesystem_limits	  enabled						local
zfs_pool  feature@large_blocks		   active						 local
zfs_pool  feature@sha512				 enabled						local
zfs_pool  feature@skein				  enabled						local
zfs_pool  feature@device_removal		 enabled						local
zfs_pool  feature@obsolete_counts		enabled						local
zfs_pool  feature@zpool_checkpoint	   enabled						local
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm making an assumption that you have not locked all 35GB of RAM for your FreeNAS VM, which you should do. I suspect that you have maxed out your RAM and are swapping RAM out under ESXi. Just a guess.
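An easy way to confirm or rule that out is esxtop on the host; I'm going from memory on the exact counter names, so double-check them:
Code:
# From an SSH session to the ESXi host, start esxtop and press 'm' for the
# memory view. For each VM, MCTLSZ is the balloon size and SWCUR is the amount
# of guest memory currently swapped by the host; anything non-zero for the
# FreeNAS VM would point at host-level memory pressure.
esxtop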
 

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
Hey Joeschmuck,

Unfortunately, I have done that. :(
upload_2018-8-19_15-2-56.png

The yellow triangle is due to vCenter - not any issues.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Are your other VMs locked as well? They typically don't need to be, but it still sounds like a memory swap issue. I'm not running ESXi 6.7 yet; maybe in a few months.
 

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
Are your other VMs locked as well? They typically don't need to be, but it still sounds like a memory swap issue. I'm not running ESXi 6.7 yet; maybe in a few months.
They are not; I still have ~16 GB of unused memory left on the host, as I have not finalized the entire build. Though I can give it a shot - it would require rebooting the hosts, AFAIK.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Are your other VMs locked as well? They typically don't need to be, but it still sounds like a memory swap issue. I'm not running ESXi 6.7 yet; maybe in a few months.
Reserved is reserved. If you lock all memory, the VM WILL NOT START, no matter how many times you try, until the VM's requirements are met.
it would require rebooting the hosts AFAIK.
The hosts or VMs? Please be concise and correct with your terminology. The hosts do not need a reboot when setting memory reservations.
The yellow triangle is due to vCenter - not any issues.
Please elaborate on the yellow triangle.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
Reserved is reserved. If you lock all memory, the VM WILL NOT START, no matter how many times you try, until the VM's requirements are met.

The hosts or VMs? Please be concise and correct with your terminology. The hosts do not need a reboot when setting memory reservations.

Please elaborate on the yellow triangle.

Correct, I mean guests/VMs; you would have to reboot the VMs for the memory reservation to take effect. The yellow triangle just states that any changes made outside of vCenter (because I logged into the ESXi console itself and not vCenter) would be reverted, so again, there is no issue with the yellow triangle being there.

When you were building this, did you look at the build that @Stux did? He gave a lot of details about how to configure the system.

Build Report: Node 304 + X10SDV-TLN4F [ESXi/FreeNAS AIO]
https://forums.freenas.org/index.ph...node-304-x10sdv-tln4f-esxi-freenas-aio.57116/

Also, that SATA SSD is going to limit your performance. Take a look at the testing that @Stux did:

Testing the benefits of SLOG
https://forums.freenas.org/index.php?threads/testing-the-benefits-of-slog-using-a-ram-disk.56561

I will take a look at that soon. To be fair, I am using DC-grade Intel SSDs, so the Intel SSD that I mentioned is a DC S3520.
 
Last edited:

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'm making an assumption that you have not locked all 35GB of RAM for your FreeNAS VM, which you should do. I suspect that you have maxed out your RAM and are swapping RAM out under ESXi. Just a guess.

IIRC, in order to do PCIe passthrough you need to lock down the memory.

ZFS will lock up like you see if you are slamming it with data faster than it can be written... this might happen if you have intra-ESXi networking between VMs and FreeNAS, which can easily hit 20 Gbps and which your 6-way RAIDZ2 pool will not keep up with.

There are some properties you can fiddle with if this is the case.
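The ones I'm thinking of are the ZFS write-throttle tunables. I haven't re-checked the exact names and defaults on 11.2, so take this as a sketch rather than a recipe:
Code:
# Current values (FreeBSD/FreeNAS 11.x sysctl names)
sysctl vfs.zfs.dirty_data_max vfs.zfs.dirty_data_max_percent
sysctl vfs.zfs.delay_min_dirty_percent vfs.zfs.txg.timeout

# Example: cap the dirty (not-yet-written) data ZFS will buffer at 4 GiB so
# incoming writes get throttled earlier instead of stalling the whole box.
# Depending on the version this may need to go in as a tunable
# (System -> Tunables) rather than a live sysctl.
sysctl vfs.zfs.dirty_data_max=4294967296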

The other thing is that I configured the "shares" priority on the FreeNAS instance higher than anything else... basically, if FreeNAS is wanting for CPU time etc., then everything else will block.
 
Last edited:

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
As far as the hardware/guest goes, here is a full page layout of esxi's UI:
upload_2018-8-19_20-18-52.png



I am using the default networking vSwitch layout. I am not sure the vSwitch group will be of any benefit right now, as a lot of this would also be limited by the drives, would it not? While I can see the major benefits of the vSwitch for NFS/iSCSI, I don't think the same would apply for Samba4, would it? Though I am willing to give it a shot on this ESXi server.

@Stux just to clarify are you talking about the settings below, resource pools (within ESXi), or something else on the FreeNAS system?
upload_2018-8-19_20-21-26.png


I did read the guide. I am not using NFS yet, but I will set up NFS in the way that you did. However, the rest of the (software) setup is relatively similar - apart from not using any software fan control (the case has a built-in hardware fan controller).

I might consider swapping out the pieces for something like this. However, that would mean spending three times what I spent on the motherboard and CPU combo.

In addition, hopefully this helps: here is a simplified layout of the devices and network connections.
upload_2018-8-19_23-20-43.png
 
Last edited:

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
The other thing is that I configured the "shares" priority on the FreeNAS instance higher than anything else... basically, if FreeNAS is wanting for CPU time etc., then everything else will block.
Not a terrible idea, but you should also set a limit then. Otherwise a rogue process on FreeNAS will DoS your other VMs.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Your Plex VM may be screwing you over. Unless you can prove that you NEED it to have 6 cores, drop that as low as you can. All VMs should be configured with the MINIMUM number of vCPUs needed to do the job required. Basically, for Plex to do ANYTHING it needs at least 6 cores free for that clock cycle. And that means FreeNAS NEEDS 4 but only has access to 2. This is a bit of a simplification, as there is hyperthreading and other black magic that VMware does with the CPU scheduler...
Try 3 cores for FreeNAS and 4 for Plex. Check your CPU ready values under load in vCenter before and after.
On the subject of vCenter, if esxi2 is a member of a cluster, why are you working in the host web client?
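If you want a quick look without digging through the vCenter charts, esxtop on the host shows the same thing; roughly:
Code:
# From an SSH session to the ESXi host: the default CPU view of esxtop lists
# %RDY per VM, i.e. time the VM was ready to run but could not get scheduled
# on a core. Sustained %RDY in the high single digits per vCPU is the usual
# rule of thumb for CPU overcommit.
esxtop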
 

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
Your Plex VM may be screwing you over. Unless you can prove that you NEED it to have 6 cores, drop that as low as you can. All VMs should be configured with the MINIMUM number of vCPUs needed to do the job required. Basically, for Plex to do ANYTHING it needs at least 6 cores free for that clock cycle. And that means FreeNAS NEEDS 4 but only has access to 2. This is a bit of a simplification, as there is hyperthreading and other black magic that VMware does with the CPU scheduler...
Try 3 cores for FreeNAS and 4 for Plex. Check your CPU ready values under load in vCenter before and after.
On the subject of vCenter, if esxi2 is a member of a cluster, why are you working in the host web client?

I brought up the CPU cores AFTER I was attempting to play a 4K video that Plex could not handle playing through to the NVIDIA Shield, although the desktop was working fine. After raising it to 6 cores it still had some stuttering (I think this is more the NVIDIA Shield than the Plex server), but at least it played at that point.

While I am not seeing a huge amount of ZIL usage, I am seeing some even when using SMB.
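(The numbers below come from the zilstat script that ships with FreeNAS; I'm sampling roughly like this, with the interval and count picked arbitrarily:)
Code:
# One-second samples, 30 of them, while a backup run is in progress
zilstat 1 30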

Code:
   N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
     47296      47296      47296     147456     147456     147456      4      0      0      4
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
     22144      22144      22144     147456     147456     147456      4      0      0      4
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
 150414000  150414000  150414000  150470656  150470656  150470656   1148      0      0   1148
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0
    131072     131072     131072     131072     131072     131072      1      0      0      1
     29368      29368      29368     262144     262144     262144      2      0      0      2
         0          0          0          0          0          0      0      0      0      0
         0          0          0          0          0          0      0      0      0      0




This is just from attempting to run the Windows Backup utility (which ultimately always fails for some reason - Windows states that the server (FreeNAS) is not responsive/inaccessible):
upload_2018-8-20_9-30-26.png


When this happens, I don't see anything useful in /var/log/samba4/log.smbd, and samba4 is not splitting out the log files on a per-client basis. The server is still up and functioning. In this case maybe the tool is the issue, but if I also try to use Time Machine following this guide, I run into the same issue where it says it can't access the share.
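In case it helps, these are the auxiliary parameters I'm planning to add to the SMB service to get per-client logs and more verbosity. They are standard Samba options, so I'm assuming they behave the same under FreeNAS 11.2:
Code:
# Services -> SMB -> Auxiliary parameters
log level = 2
log file = /var/log/samba4/log.%m   # %m = client NetBIOS name, one log per machine
max log size = 51200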
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
So it unfortunately locked up again while trying to delete files. What was happening is that if I left it alone, it would eventually become responsive again, then lock up a few minutes later. When pulling up the console window, I see:
upload_2018-8-20_11-46-31.png
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
FreeNAS 11.2 "hardware":
So it unfortunately locked up again trying to delete files,
This is actually a fault I have helped a couple of other people with who are running on bare metal. You need to go back to 11.1-U5 instead of running the BETA version of the OS.
 

lentil

Cadet
Joined
Aug 19, 2018
Messages
9
This is actually a fault I have helped a couple of other people with who are running on bare metal. You need to go back to 11.1-U5 instead of running the BETA version of the OS.

Is there a way to easily downgrade and keep the current configuration?

I could definitely re-install, I just don't want to have to set everything up all over again.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Is there a way to easily downgrade and keep the current configuration?

I could definitely re-install, I just don't want to have to set everything up all over again.
Your existing pool should import without a problem, but I don't think you can import the new config database into an older version of FreeNAS.
@Ericloewe is that a possibility, bringing the config from a newer version of FreeNAS to an older version?
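If the boot device still had an 11.1 boot environment on it, you could simply activate that and reboot; going by the zfs list in the first post there appear to be only the 11.2 environments, though, so it would be a clean 11.1-U5 install plus a pool import. A rough sketch of what that looks like (the GUI import is the supported route; the commands are just for checking what's there):
Code:
# See which boot environments exist on the boot pool
beadm list

# If an 11.1 environment were present, this would switch back to it:
# beadm activate <name-of-11.1-boot-environment>

# After a clean 11.1-U5 install, import the encrypted pool through the GUI
# (Storage -> Import Volume) using the saved geli key/passphrase. The 11.2
# config database itself won't import into 11.1, so shares and services
# would need to be set up again.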
 