So I'll preface this by saying that another admin and I are trying to resolve this issue. We didn't build this system this way, we're new to FreeNAS, and the original system builder is gone. We're just trying to see if we can get it working again.
I spent most of last night and this morning trying to come up to speed on FreeNAS stuff and troubleshoot as much as I can, so please forgive me (and correct me) if I use some incorrect terminology.
The problem
We have a FreeNAS server that presents about 12 TB of storage to a vSphere environment over iSCSI. The pool filled up, even though the VMFS layered on top of it only shows about half of the space used. We can't do anything with this NAS now, because VMware gets out-of-space errors whenever it tries to write to it.
From all of my reading, I understand why this is happening: VMware allocated all of this space via thin provisioning, but the underlying ZFS has no way to know that the space is free again once VMware is done with it. So from VMware's perspective there is still plenty of space, but when it tries to write to that space, the copy-on-write nature of ZFS means every write needs fresh blocks, and there are none left because the pool is full from ZFS's point of view.
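For anyone following along, this is roughly how the discrepancy shows up from each side. This is just a sketch; "tank" and the datastore are placeholders for whatever the pool and datastore are actually named.

# On the FreeNAS shell: pool and dataset usage as ZFS sees it
zpool list tank
zfs list -o name,used,avail,refer -r tank

# On an ESXi host: datastore size and free space as VMFS sees it
esxcli storage filesystem list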
My question: is there anything that we can do to fix this without having to migrate all of our VMs off and rebuild the NAS from scratch?
Things that we have tried
Deleting VMs: This obviously didn't work, since VMware isn't actually freeing the space in a way that ZFS knows about. However, I wanted to note that we tried it.
Adding a thick VM: Someone suggested creating a thick provisioned (lazy zeroed) VM, the idea being to zero out the space that VMware isn't using. This failed; VMware reported that there was no more space available.
UNMAP: Sending an UNMAP command from ESXi using "esxcli storage vmfs unmap -l <datastore label>".
This failed with the error "Devices backing volume <UUID> do not support UNMAP." My reading indicates that the extent must be sparse for this to work. I don't know whether it was set up to be sparse, but I checked the refreservation and it was "none", following these instructions: https://forums.freenas.org/index.php?threads/how-to-check-if-a-zvol-is-sparse-thru-freenas.40624/ (the checks I ran are sketched below).
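For reference, this is roughly what I mean; the pool, dataset, and file names below are placeholders, not our real ones. Since our extent is file-backed rather than a zvol, I also compared the file's apparent size with the blocks it actually occupies on disk:

# refreservation on the dataset holding the extent file ("none" suggests no space was reserved up front)
zfs get refreservation tank/iscsi

# for a file extent: apparent size vs. blocks actually allocated on disk
ls -lhs /mnt/tank/iscsi/extent0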
My counterpart noticed that the iSCSI extent is set to "file" instead of "device." Could that be why UNMAP is failing? My understanding is that FreeNAS 9.10.1-U4 does support UNMAP. (One way to see what ESXi itself thinks about UNMAP support is sketched below.)
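In case it helps with diagnosis, ESXi can report whether it believes the backing device supports the Delete (UNMAP) primitive. The naa identifier below is a placeholder for whatever our iSCSI LUN actually shows up as:

# list devices to find the naa identifier of the iSCSI LUN
esxcli storage core device list | grep -i naa

# VAAI status for that device; "Delete Status: unsupported" would line up with the UNMAP error we saw
esxcli storage core device vaai status get -d naa.XXXXXXXXXXXXXXXX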
Setup
Our FreeNAS version is 9.10.1-U4
The volume is configured with 7 vdevs: 6 are mirrors, and 1, labeled "cache", is striped. Each mirror has two 2 TB HDDs in it; the cache stripe has two 250 GB HDDs.
There is one single dataset (I think that's the right term) configured under this volume, with all ~12 TB of the storage allocated to it.
This is shared over iSCSI as an extent of type "file" with a size of the full 12 TB. (A couple of commands to confirm this layout from the shell are sketched below.)
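For completeness, this is how I'd double-check the layout from the FreeNAS shell; again, "tank" is a placeholder for the actual pool name:

# vdev layout, including the mirrors and the cache devices
zpool status tank

# per-vdev capacity and how full the pool is
zpool list -v tank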
Any help that anyone could provide is greatly appreciated. As I said, we didn't build it this way, and my reading of these forums indicates that this is very much a suboptimal configuration (e.g. a pool backing iSCSI should never be filled past roughly 60%). We're just trying to get things working again.
Thank you very, very much for your time.