Slow NFS access from ESXi 5.1 - Everything else is fast

Status
Not open for further replies.

sike

Cadet
Joined
Dec 23, 2012
Messages
3
Hi

I have a fresh install of FreeNAS (FreeNAS-8.3.0-RELEASE-p1-x64) running on brand-new, fast hardware. I also have an ESXi 5.1 host running on an HP DL380 G7.

If I connect using CIFS or iSCSI, I max out the gigabit network with ease. But if I connect from the ESXi host using NFS, I am lucky to get 10 MB/s...

I even just upgraded my ESXi host. I found this: http://www.kendrickcoleman.com/index.php/Tech-Blog/synology-users-do-not-upgrade-to-vsphere-51.html
But can't find anything related to FreeNAS apart from this: http://forums.freenas.org/showthread.php?10444-ESXi-NFS-vs-iSCSI-performance

Thanks for any help!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ESXi likes to do synchronous NFS writes. It's a performance-killer. You can try using a ZIL, which might or might not work, or you can tell ZFS to disable sync writes on the afflicted volume, which is evil, and could probably cost you your data if there was a crash.

Here, everything's on redundant UPS's, and the fear of losing data is pretty minimal, and we use NFS for read-only mounts and backup partitions, so it was an easy call to just disable sync writes in ZFS, and suddenly, pow, NFS is real fast.

But again, read before doing, "cost you your data", "poof", "byebye", "we're only using it for backups so we don't care."
 

sike

Cadet
Joined
Dec 23, 2012
Messages
3
Thanks for the input.

We have a couple of servers running on a QNAP TS 859U-RP+ which is sharing via NFS. I am getting over 100MB/s on the device.

In terms of hardware and performance, the QNAP should be much, much slower than the FreeNAS machine I built. I can try putting 2 SSDs in for ZIL, but according to the info I have found in the forums and wiki, this will only boost iSCSI performance.
 

sike

Cadet
Joined
Dec 23, 2012
Messages
3
@jgreco: I tried disabling synchronous NFS writes using the command:

zfs set sync=disabled DATA01/DS04

NFS is now also blazing fast.
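For reference, the setting can be inspected and undone with the standard zfs subcommands. The following is only a minimal sketch: the dataset name comes from the command above, while the `show_sync`, `restore_sync` and `inherit_sync` helpers and the `ZFS_CMD` override are hypothetical names added so the commands can be dry-run without a pool.

```shell
# Dataset name from the post above; adjust to your own pool/dataset.
DATASET="${DATASET:-DATA01/DS04}"
# ZFS_CMD is a hypothetical override: set it to "echo" to print instead of run.
ZFS_CMD="${ZFS_CMD:-zfs}"

# Print the current value of the sync property (standard, always or disabled).
show_sync() { $ZFS_CMD get -H -o value sync "$DATASET"; }

# Go back to the default behaviour, where sync requests are honoured.
restore_sync() { $ZFS_CMD set sync=standard "$DATASET"; }

# Drop the local override so the dataset inherits sync from its parent.
inherit_sync() { $ZFS_CMD inherit sync "$DATASET"; }
```

Running `show_sync` after the change should report `disabled`; `restore_sync` puts the dataset back to the slow-but-safe behaviour.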

Does this mean I could lose all the data in Datastore 4 (DS04) or across the whole Dataset?

Why is the QNAP much faster? We have had power outages with the QNAP and have not lost a single byte?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
NFS is now also blazing fast.

Does this mean I could lose all the data in Datastore 4 (DS04) or across the whole Dataset?

You shouldn't "lose" any data. However, you might scramble some data (see below).

Read this.

Having read that, now listen carefully to what I'm about to say. ESXi is excessively paranoid about writes out to NFS; it wants, very much, to ensure that it is not the party responsible for mangling your data.

ZFS is also excessively paranoid and does not want to be that party either.

So in an odd way, ZFS/NFS/ESXi are actually awesome together because they're all working together to ensure data integrity. But it tanks performance to provide those safety guarantees.
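The cost of those guarantees can be roughly illustrated from a Linux client. This is a sketch under stated assumptions, not a benchmark: it assumes GNU dd (for `oflag=sync`), and `TARGET_DIR` is a hypothetical variable that you would point at a directory on the NFS mount (it falls back to a temporary directory). Opening a file for synchronous I/O only approximates what ESXi does over NFS.

```shell
# TARGET_DIR is an assumption: set it to a directory on the NFS export under test.
TARGET_DIR="${TARGET_DIR:-$(mktemp -d)}"

# Buffered write: dd returns as soon as the data is in the client's cache.
dd if=/dev/zero of="$TARGET_DIR/async.bin" bs=4k count=256 2>/dev/null

# Synchronous write: every 4k block must be acknowledged as stable
# before the next one is issued (GNU dd, oflag=sync).
dd if=/dev/zero of="$TARGET_DIR/sync.bin" bs=4k count=256 oflag=sync 2>/dev/null
```

Timing the two dd runs separately (for example with your shell's `time`) on a pool with sync=standard versus sync=disabled should show where the difference comes from.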

What you are doing by setting sync=disabled is telling ZFS to disregard the sync requests from ESXi. ZFS will still take reasonable care with your data. However, if, for any reason, you were to lose the NAS during the window where a VM has written something (and ESXi has written it to the filer) and before ZFS has flushed it out to disk, then when the NAS comes back, those changes won't be there. This could be a problem for your VM.

Why is the QNAP much faster? We have had power outages with the QNAP and have not lost a single byte?

The QNAP might have a battery backed write cache. Or you might have configured the QNAP to enable write caching without a BBU. Or it might even enable write caching without a BBU without telling you. In any case, you MIGHT not have lost a single byte. However, it is much more likely that you've lost data and just not realized it. Basically, if your guest VM writes data and your storage system confirms that it is written, and it isn't actually committed to nonvolatile storage of some sort (flash, spinny rust, etc), then you lose that data if the NAS crashes/loses power/etc. This is very basic.

So here's the thing. If your VM was largely idle and maybe only adding entries to log files, and the VM crashed at the same time as the NAS filer because the VM host lost power too, then when everything reboots, any "lost data" gets chalked up to the power loss because the reboot process will do file system consistency checks and everything comes back up dandy.

But if your VM is busy writing database file updates to your critical business database, and suddenly your NAS crashes, and a gig of write data is in memory but not flushed to disk (and your VM's been told that it's been stored), and the NAS comes back up, and your VM is still running, well, now, you have a Real Problem. Your on-disk database and its indexes are no longer consistent with what the VM reasonably expects to be out there, and so now as further writes happen, the on-disk database gets corrupted (remember a gig worth of updates "happened" but also didn't happen!) and as reads happen, your database app freaks out and takes a dump and you're left with data goulash.

Ok, so here's the thing. You need to understand that turning sync off shouldn't cause ZFS to lose the files in your datastore. However, those files represent virtual disks for your virtual machines. Turning off sync will increase performance, yes. But turning off sync also reduces the resiliency of the storage system to failures/crashes/power-outages/etc. Given what you are storing in the files, it is possible that a crash could cause havoc with your virtual machines.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Incidentally, I happened to be doing some related research and decided to go looking at the QNAP; their forums have some discussion about this. The QNAP defaults to write cache being enabled and you are supposed to DISABLE it for use with ESXi/NFS. Maybe the answer to your question of "why is the QNAP much faster" becomes "Because you didn't configure it the way QNAP says you need to." ;) ;)

What I did notice is that on page 3 of this document they say that starting in version 3.2.1, write cache is disabled by default, but previously it had been enabled by default. Of course, you might have changed this setting and who knows what firmware revision you have. Best to just look.

It seems clear to me that the QNAP guy in the forum either doesn't really understand the answer he's providing or doesn't care to detail why this is so important. For example, for us, here, it's completely sane and rational to disable sync because the way we do backups means that a corrupted vmdk is unlikely, and even if it happened we have other backup snapshots, and even if they all went bad they're only backups, the live images are stored on SAN storage elsewhere. But it would be a really bad idea to disable sync on a datastore with live VM's on it. One screwup/panic/crash of the NFS server and suddenly all your vmdk's are probably inconsistent with the running VM's, so you need to synchronously reboot your virtualization cluster (all the affected VM's!)... ick.
 