FreeNAS and VMware issue


Pucked (Cadet)
So I'll preface this by saying that another admin and I are trying to resolve this issue. We didn't build the problem system this way, and we're new to FreeNAS; the original builder is gone, and we're just trying to see if we can fix it.

I spent most of last night and this morning trying to come up to speed on FreeNAS stuff and troubleshoot as much as I can, so please forgive me (and correct me) if I use some incorrect terminology.

The problem

We have a FreeNAS server that presents about 12 TB of storage to a vSphere environment over iSCSI. The pool filled up, even though the overlaid VMFS only sees about half of the space as used. We're not able to do anything with this NAS, as VMware receives out-of-space errors whenever it tries to write to it.

From all of my reading, I get why this is happening: VMware allocated all of this space during thin provisioning, but the underlying ZFS has no way to know that it's free when VMware is done with it. So from VMware's perspective, there's still plenty of space. However, when it tries to write to that space, the copy-on-write nature of ZFS prevents it from doing so, because the pool is full from the ZFS point of view.
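
In case it's useful, here's roughly how I've been comparing the two views from the FreeNAS shell (poolName below is just a placeholder for our actual pool name):

# Pool-level view: how full ZFS thinks the pool is
zpool list poolName

# Per-dataset breakdown of where the space is actually going
# (snapshots, the data itself, refreservations, child datasets)
zfs list -o space -r poolName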

My question: is there anything that we can do to fix this without having to migrate all of our VMs off and rebuild the NAS from scratch?

Things that we have tried

Deleting VMs: This obviously didn't work, since VMware isn't actually freeing the space in a way that ZFS knows about. However, I wanted to note that we tried.

Adding a thick VM: Someone suggested creating a thick provisioned (lazy zeroed) VM, on the theory that it would zero out the space that VMware isn't using. This failed; VMware received an error that there was no more space available.

UNMAP: Sending an UNMAP command from ESXi with "esxcli storage vmfs unmap -l <datastore name>".

This failed, saying "Devices backing volume <UUID> do not support UNMAP." My reading indicates that the volume must be sparse for this to work. I don't know if it was set up to be sparse, but I checked the refreservation and it was "none", as described in these instructions: https://forums.freenas.org/index.php?threads/how-to-check-if-a-zvol-is-sparse-thru-freenas.40624/

My counterpart noticed that the iSCSI extent is set to "file" instead of "device." Could that be why UNMAP is failing? My understanding is that FreeNAS 9.10.1-U4 does indeed support UNMAP.
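
For anyone following along, this is how I understand you can ask ESXi whether it thinks the LUN supports UNMAP at all (the naa identifier below is a placeholder for the actual device ID, and the exact output may vary by ESXi version):

# Find the NAA identifier of the FreeNAS-backed device
esxcli storage core device list

# Show VAAI primitive support for that device; as I understand it,
# the "Delete Status" line is the one that corresponds to UNMAP
esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx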

Setup

Our FreeNAS version is 9.10.1-U4

The volume is configured with 7 vdevs: 6 are mirrors, and 1 labeled "cache" is striped. The mirrors each have two 2 TB HDDs in them. The cache stripe has two 250 GB HDDs.

There is a single dataset (I think that's the right term) configured under this volume. It has all ~12 TB of the storage allocated to it.

This is shared over iSCSI as an extent of type "file" with a size of the full 12 TB.
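
For reference, this is roughly how we've been confirming that layout from the shell (poolName is again a placeholder for our pool name):

# Shows each vdev: the six mirrors, plus the two disks listed under "cache" (L2ARC)
zpool status poolName

# Shows the datasets under the pool and how much each is using
zfs list -r poolName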


Any help that anyone could provide is greatly appreciated. As I said: we didn't build it this way, and my reading of these forums indicates that this is very much a suboptimal configuration (e.g. no more than about 60% of the pool should ever be in use). We're just trying to get things working again.

Thank you very, very much for your time.
 

rs225 (Guru)
You can try zfs list -r -t snapshot poolName and see if you have snapshots taking up space. If so, you can try removing unneeded snapshots, and hopefully that will succeed. Be careful you are deleting only what you intend.
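
Something along these lines, assuming your pool is called poolName (the dataset and snapshot names below are made-up examples):

# List snapshots recursively, sorted by how much space each one is holding
zfs list -r -t snapshot -o name,used -s used poolName

# Destroy a snapshot you no longer need; triple-check the name first
zfs destroy poolName/dataset@snapName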
 

rs225 (Guru)
Oh, one other possibility. If you expand your pool by adding a new vdev, you may be able to resume normal operation. Obviously, it should probably be another set of mirrors. Another possibility is to replace the drives (one at a time) that are part of one set of mirrors with larger disks; once both are replaced, the pool can expand.
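
As a rough sketch of what that looks like from the command line (the device names are made up, and on FreeNAS you would normally do this through the Volume Manager in the GUI so the middleware stays in sync):

# Add another mirror vdev to grow the pool (da8 and da9 are placeholder disks)
zpool add poolName mirror da8 da9

# Or replace the disks of one mirror with larger ones, one at a time;
# with autoexpand on, the pool grows once both replacements finish
zpool set autoexpand=on poolName
zpool replace poolName da0 da10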

Can anyone confirm this would work on a full pool?
 

Stux (MVP)
Can anyone confirm this would work on a full pool?

That's really the key. And no, I can't :(

But it's worth a try.

Actually, replace should work. Adding a vdev might fail.
 

Pucked (Cadet)
You can try zfs list -r -t snapshot poolName and see if you have snapshots taking up space. If so, you can try removing unneeded snapshots, and hopefully that will succeed. Be careful you are deleting only what you intend.

Thanks for the tip. I get a "no datasets available" message when I try to check for snapshots with this command. I'm guessing that means that datasets weren't configured?

I think we're going to end up migrating workload to another NAS and rebuilding this one (probably to use a different filesystem/solution, to avoid these types of issues in the future).

Thanks for the help though!
 

Ericloewe (Server Wrangler, Moderator)
My counterpart noticed that the iSCSI extent is set to "file" instead of "device." Could that be why UNMAP is failing? My understanding is that FreeNAS 9.10.1-U4 does indeed support UNMAP.
I think that file-based extents don't really play well with UNMAP. Nobody has used them in production for quite a while, because the kernel iSCSI target has improved massively and device-based extents are now the faster option.
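
When you rebuild, the zvol-backed device extent route looks roughly like this from the shell (the zvol name and size are just examples, and you'd normally create both the zvol and the extent through the FreeNAS GUI; this is only to illustrate the idea):

# Create a sparse (thin) zvol to back a device extent instead of a file extent;
# -s makes it sparse, -V sets the presented size (kept well under the pool size)
zfs create -s -V 6T poolName/vmware

# Confirm it really is sparse, i.e. it carries no refreservation
zfs get refreservation,volsize poolName/vmware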


The volume is configured with 7 vdevs: 6 are mirrors, and 1 labeled "cache" is striped.
If it's labeled cache, it's not striped with the others, it's L2ARC.

The cache stripe has two 250 GB HDDs.
That seems like a rather silly idea. A cache that is slower than the main pool? And taking up valuable ARC in the process...

I think we're going to end up migrating workload to another NAS and rebuilding this one
You definitely want to migrate the data away, at least temporarily to restore a saner configuration. If you can live with the 50% free space recommendation, ZFS is a very viable option with unique data integrity guarantees (I say unique because we all know that btrfs is nowhere close to being stable).
 

kspare (Guru)
FWIW, this is why I avoid iSCSI and use NFS. The space is what it is with NFS, and it's also a native protocol to VMware.

Someone above advised you to migrate off, and I would highly suggest that as the safest way to secure your data. I'd then switch over to NFS.
 

Ericloewe (Server Wrangler, Moderator)
The space is what it is with NFS
No. It has absolutely no bearing on this matter. Block devices are block devices and NFS is not magically better.
 

jgreco (Resident Grinch)
No. It has absolutely no bearing on this matter. Block devices are block devices and NFS is not magically better.

Arguably incorrect; with NFS, if you go and delete a VM, ZFS is aware of the VMDK deletion and frees the space. As an iSCSI device, it may not, because it may have nothing to indicate to it that those blocks are now free. This is kind of what UNMAP was intended to address. If you're lacking UNMAP because you're using a file-based extent, and your pool fills, then you're all sorts of screwed and you have to tread very carefully. This is another very good reason you really want to keep pool utilization below 50%.
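
A quick way to tell which situation you're in (poolName/dataset is a placeholder): note the ZFS numbers, delete a VM or VMDK from the datastore in vSphere, and check them again.

# On NFS, or on iSCSI with working UNMAP, USED should drop after the deletion;
# on a file extent without UNMAP it will stay exactly where it was
zfs list -o name,used,avail poolName/dataset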
 

bigphil (Patron)
To give you a better recommendation on which storage protocol to use, we need more information about your VMware environment: which version of ESXi and what type of guest OSes are you running? The preferred method for VMware environments using FreeNAS storage would be iSCSI, as it offers the most features, but the trade-off is that you need to know how to manage it properly so you don't run into these types of issues.
 

Pucked (Cadet)
Thanks again to everyone for all of the responses and clarification.

@Ericloewe Thanks for some help with the terminology.

FWIW, this is why I avoid iSCSI and use NFS. The space is what it is with NFS, and it's also a native protocol to VMware.

Someone above advised you to migrate off, and I would highly suggest that as the safest way to secure your data. I'd then switch over to NFS.

Yeah, this is exactly what we're doing. To be honest, we don't have any real need for the data integrity, snapshots, etc. of ZFS. We'll probably migrate to a different OS (probably Windows Server, which is what we use for our other NAS servers in the cluster). More on that later.

If you're lacking UNMAP because you're using a file-based extent, and your pool fills, then you're all sorts of screwed and you have to tread very carefully. This is another very good reason you really want to keep pool utilization below 50%.

Yep, when I started troubleshooting I read all of your posts and very quickly realized that we were in for a world of hurt with pool utilization > 50%.

To give you a better recommendation on which storage protocol to use, we need more information about your VMware environment: which version of ESXi and what type of guest OSes are you running? The preferred method for VMware environments using FreeNAS storage would be iSCSI, as it offers the most features, but the trade-off is that you need to know how to manage it properly so you don't run into these types of issues.

We use 6.0 with a variety of guests, both Windows (client and server) and Linux (typically Ubuntu and CentOS, but there are others).

Our storage needs are fairly simple for this cluster. It's sort of like a big ephemeral lab configuration, so VMs are short-lived and not handling any "important" workload. We're thinking Windows Server with NFS shares is going to work well (and indeed it does work well for our other NAS servers that use it). While FreeNAS and ZFS are interesting to me, we just don't have the problems that ZFS solves, so we don't really need the additional administrative overhead of another technology.

Thankfully, we were able to clean up some of the ephemeral VMs and vMotion them off to another NAS while we rebuild this one. As I mentioned, this wasn't our configuration; my initial reading of the forums and documentation here made me realize pretty quickly that it was very sub-optimal, and we're lucky we have the vMotion swing space to shuffle things around.

Thanks again so much for the help everyone.
 