Full details below, but the short question: should I expect 15+ ms of write latency using ZFS exported over NFS to VMware due solely to the O_SYNC issue, regardless of underlying IO speed or ZIL speed?
In our testlab we have a bunch of servers with a few different enterprise storage arrays. Currently we use a VNXe to provide NFS datastores for VMware hosts. The VMs on these datastores are generally not IO-intensive and run basic Windows server roles (web, AD, VMM, DHCP, etc.); there are probably around 60 active at any given time. Anything that needs "good" IO goes on block storage over Fibre Channel. Using NFS for these workloads has the huge advantage of freeing up FC switch ports for other bandwidth-hungry applications. If SAN switches and FC HBAs were free, everything would use FC, but alas, they are not, so NFS is where I'm at. I'd prefer not to use iSCSI.
We are migrating most of our general purpose testlab storage to a VMAX 20k. I'd love to use something like FreeNAS on a VM that sits on a host with FC access to provide NFS storage to various VMware servers.
I've read a bunch of forum posts and followed many links posted out there by cyberjock (thanks) and others in an attempt to understand the best way to set things up. I've looked for other instances of people using enterprise storage behind FreeNAS and haven't found it. I know that I can buy an enterprise NFS filer head for the array. Let's just say a requirement here is not to spend money and leave it at that.
The physical host is a Cisco B420 (2.8 GHz Sandy Bridge); all hosts have dual 10 Gbit NICs. The VM currently has 2 vCPUs and 4 GB of RAM. I know that's low for an actual in-use FreeNAS appliance, but right now I'm just using a synthetic loadgen to test random write performance on a single 1GB file from a single VM; scaling the VM up to much more memory and CPU is not a problem and is expected in the future. The backing storage is a tiered pool on the VMAX comprised of EFD and 15k FC disks, but none of that should really matter, since all the writes get absorbed by the array cache.
I created a striped ZFS volume with a pair of 100GB disks, exported the volume, mounted it on a VMware host, and created a disk on an existing VM that also has a disk from a VNXe via NFS. Since the goal is comparable performance, I ran the same tests against both. Given what I read in the forums, I wasn't terribly surprised that writes had some latency: I'm seeing around 2,000 8K writes/s @ 3ms of latency -- not bad, but the same test on the VNXe got 4,000 8K writes/s at 3ms of latency. Bumping the number of threads in the test from 8 to 32, the VNXe jumps to 7,500 writes/s at 3ms of latency, while FreeNAS stays at 2,000 but latency jumps to 15ms.

iostat on the FreeNAS host shows IO service times around 1ms, so the 15ms seen by the guest VM isn't at the disk layer -- I'm guessing it's O_SYNC related. I've read that others have alleviated this by putting the ZIL on an SSD; given the huge amount of array cache, any LUN from the array is effectively an SSD from a write-latency standpoint. I tried presenting another LUN and designating it as a ZIL, but that didn't change anything (iostat did show that the ZIL was used and flushed to the data disks every few seconds). UFS is worse. Disabling sync on the volume drops latency to 1ms (or less) at 10,000 writes/s. I'd prefer not to rebuild a crap ton of VMs, so I'm guessing I shouldn't actually run with sync disabled, but it does help narrow down the issue.
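For reference, the SLOG experiment was essentially the following (a sketch -- "tank", "tank/nfs", and "da2" are placeholder names standing in for my actual pool, dataset, and the array LUN I presented as a log device):

```shell
# Attach a dedicated log device (SLOG) to the pool; ZIL traffic
# then lands on this device instead of the data vdevs.
zpool add tank log da2

# Confirm the log vdev is present and watch it absorb sync writes.
zpool status tank
zpool iostat -v tank 1

# Per-dataset sync behavior: "standard" honors O_SYNC/FSYNC requests
# (the default); "disabled" acknowledges writes before they reach
# stable storage -- fast, but unsafe for VM disks if the filer dies.
zfs set sync=standard tank/nfs
zfs get sync tank/nfs
```

These are administrative commands against a live pool, so they're shown for illustration rather than as something to paste in blindly.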
target             threads   writes/s   latency (ms)
VNXe                  8        4,000         3
FreeNAS               8        2,000         3
VNXe                 32        7,500         3
FreeNAS              32        2,000        15
FreeNAS (no sync)     8       10,000         1
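The sync/no-sync gap in the table can be reproduced in miniature from inside a guest with a micro-benchmark along these lines (a rough sketch, not the loadgen I actually used -- it just times 8K writes with and without O_SYNC on whatever filesystem backs the given path):

```python
import os
import tempfile
import time

def time_writes(path, n=200, size=8192, sync=True):
    """Time n sequential writes of `size` bytes to `path`.

    With sync=True the file is opened O_SYNC, so every write must
    reach stable storage before returning -- the same semantics ESXi
    forces on NFS datastore writes. Returns mean latency in ms.
    """
    flags = os.O_WRONLY | os.O_CREAT
    if sync:
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o600)
    buf = b"\0" * size
    t0 = time.perf_counter()
    for _ in range(n):
        os.write(fd, buf)
    elapsed = time.perf_counter() - t0
    os.close(fd)
    return elapsed / n * 1000.0

with tempfile.TemporaryDirectory() as d:
    sync_ms = time_writes(os.path.join(d, "sync.dat"), sync=True)
    buffered_ms = time_writes(os.path.join(d, "buffered.dat"), sync=False)
    print(f"O_SYNC:   {sync_ms:.3f} ms/write")
    print(f"buffered: {buffered_ms:.3f} ms/write")
```

Running this on the NFS-backed guest disk vs. locally is a quick way to see how much of the per-write cost is sync semantics rather than raw disk speed.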
I expect some latency just from the number of hops to the disk, but is this level of latency inherent to FreeNAS serving sync NFS writes? If not, what did I do wrong?