ESXi NFS vs iSCSI performance


hpnas

Dabbler
Joined
May 13, 2012
Messages
29
I have always noticed a huge performance gap between NFS and iSCSI when using ESXi. For example, I am installing Windows 2012 on two VMs at the same time - one on an NFS datastore and the other on iSCSI - and I see roughly a 10x difference in the milliseconds it takes to write to disk, with NFS being the slow one. My questions are:

1. Is there anything to improve the performance of NFS?
2. Is there any reason why I wouldn't want to create a file extent on my ZFS RAIDZ-1?
 

maglaubig

Dabbler
Joined
Sep 29, 2012
Messages
17
There are other threads on this in the forums if you haven't found them; I had the same issue, and my solution was to use iSCSI. If your heart is set on NFS, the recommended step is to get a dedicated ZIL (log) device for the ZFS pool on fast storage - SSD, or SAS if your board supports it. This is because ESXi issues sync writes over NFS and waits for confirmation that each one is on disk. The ZIL device should be mirrored as well; if you lose it, you risk losing the writes that were in flight. It doesn't need to be as large as the rest of your disks, just large enough to keep up with the writes ESXi sends it over NFS. Think of it sort of like a write cache.
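For reference, attaching a mirrored log device from the FreeNAS shell looks roughly like this - the pool and device names below are just placeholders for illustration, not anything from this thread:

# add a mirrored SLOG to an existing pool called "tank" (ada4/ada5 standing in for the SSDs)
zpool add tank log mirror ada4 ada5
# confirm the log vdev shows up
zpool status tank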

The other option is to disable the ZIL (i.e. stop honoring sync writes). Everyone, myself included, will tell you this is a bad idea: if the box crashes or loses power, writes ESXi believed were committed are gone, and you can end up with corrupted VMs.

My recommendation, if you don't want to spend any more money and have a few dedicated NICs across all your hardware, is to go the iSCSI route with a file extent. Depending on your number of disks, folks will recommend making one mirrored vdev and then adding more mirrored vdevs as stripes to expand the pool. RAIDZ1 takes a big performance hit on random reads since every read touches all the drives in the vdev, so the vdev delivers roughly the IOPS of a single drive. Here's a forum link.
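A rough sketch of that layout from the shell, again with made-up pool and disk names:

# create a pool of striped mirrors ("RAID10") from four disks
zpool create tank mirror ada1 ada2 mirror ada3 ada4
# grow it later by striping in another mirrored vdev
zpool add tank mirror ada5 ada6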

For iSCSI, just follow the setup here. Make sure to set up LUC, otherwise any time you make a change to the iSCSI LUNs it'll require restarting the service and likely dismounting your VMFS datastores. Be sure to use jumbo frames (MTU 9000) as well on 4.1 and later; it's a HUGE performance boost. The switch and NICs must all support it or you can't use it. Double-check if you're on an earlier version - jumbo frames were supported for VM traffic before they were supported for iSCSI.
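On the ESXi 5.x side, the MTU change is something like the following from the host shell; the vSwitch and vmkernel port names are assumptions for the example, and the FreeNAS interface has to be set to MTU 9000 as well:

esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000    # vSwitch carrying iSCSI
esxcli network ip interface set --interface-name=vmk1 --mtu=9000          # iSCSI vmkernel port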

Lastly - and I know this was true in 4.1 when using the software iSCSI initiator - if you want multipathing on iSCSI, the paths must be on separate subnets; the vmkernel isn't smart enough to separate out the traffic in the stack otherwise. It's a bit counterintuitive to set up: you shouldn't be using any bonding (LACP/FEC) type setups, but rather a 1-to-1 binding of each iSCSI vmkernel port to a physical NIC. Then set your LUN to use round-robin multipathing and you should be all set.
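For the round-robin piece, the esxcli form is roughly as follows; the naa. identifier is a placeholder for whatever your LUN shows up as:

esxcli storage nmp device list                                           # find the device ID for the LUN
esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_RR     # switch it to round robin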
 

hpnas

Dabbler
Joined
May 13, 2012
Messages
29
I am glad to hear that I am not alone with the NFS issues. I really do thank you for the verbose response; it was very helpful. I did turn on iSCSI and set up a file extent on the RAIDZ to start. However, I noticed the performance was not always very good. In fact, some VMs would have a hard time starting, and when looking at the logs I found:

Jan 4 06:15:05 freenas istgt[70805]: istgt_iscsi.c:4629:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Jan 4 06:15:05 freenas istgt[70805]: istgt_iscsi.c:5269:worker: ***ERROR*** iscsi_execute() failed on iqn.2011-03.example.org.istgt:vmtarget,t,0x0001(iqn.1998-01.com.vmware:localhost-4f95c65c,i,0x00023d000001)
Jan 4 06:15:09 freenas istgt[70805]: Login from iqn.1998-01.com.vmware:localhost-4f95c65c (192.168.15.106) on iqn.2011-03.example.org.istgt:vmtarget LU1 (192.168.15.100:3260,1), ISID=23d000001, TSIH=881, CID=0, HeaderDigest=off, DataDigest=off
Jan 4 06:15:09 freenas istgt[70805]: istgt_iscsi.c:3207:istgt_iscsi_op_scsi: ***ERROR*** CmdSN(11343945) ignore (ExpCmdSN=11343945, MaxCmdSN=11343944)
Jan 4 06:15:09 freenas istgt[70805]: istgt_iscsi.c:4629:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Jan 4 06:15:09 freenas istgt[70805]: istgt_iscsi.c:5269:worker: ***ERROR*** iscsi_execute() failed on iqn.2011-03.example.org.istgt:vmtarget,t,0x0001(iqn.1998-01.com.vmware:localhost-4f95c65c,i,0x00023d000001)
Jan 4 06:15:13 freenas istgt[70805]: Login from iqn.1998-01.com.vmware:localhost-4f95c65c (192.168.15.106) on iqn.2011-03.example.org.istgt:vmtarget LU1 (192.168.15.100:3260,1), ISID=23d000001, TSIH=882, CID=0, HeaderDigest=off, DataDigest=off

When I stopped trying to start the VM, the errors stopped. The last time I ran iSCSI it was as a device extent and the performance was great. I have SOHO equipment on the network, so I doubt it knows how to handle jumbo frames, but given that the device extent worked great and the file extent has the issues above - is this just a RAIDZ issue with iSCSI?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Check out bug #1531 for tuning suggestions. Do not just try random setting changes; read until you understand what they do (or you won't have much luck). If you don't understand it, come back and we'll set you right.

I did a quick-and-dirty test the other day of file-based vs. zvol-based ZFS iSCSI extents and didn't see a huge difference, but it was admittedly a superficial test.

RAIDZ is not great for performance and RAIDZ2 is worse. If you have not tuned your VMs to avoid superfluous writes where possible, you can rapidly descend into hell because of all the updates being made. A set of disks in RAIDZ accessed over the network is likely to be slower than a single disk attached directly to a server, and a bunch of VMs all sharing that RAIDZ is worse yet.

Help us help you by telling us a little bit about your FreeNAS setup, and also what sort of VMs you're dealing with.
 

hpnas

Dabbler
Joined
May 13, 2012
Messages
29
Thanks for the quick reply; I will check out bug #1531. However, do you believe the errors you see in my logs are indicative of performance issues that could be fixed by changing settings, or could this be a deeper issue? I was unable to find anyone else with these types of errors.

My setup is:

HP ProLiant MicroServer N40L
3 x 2 TB RAIDz
1 x 2 TB Drive
1 x 80 GB SSD
8 GB of RAM

Cisco e2000 Router (DD-WRT)
ZyXEL GS1100-24 1000Mbps 24-port LAN Switch
VMware ESXi 5

I only have 2 VMs running. The Windows 2012 VM is the one I have the most issues with, and it's just the base OS with no special apps. The other VM is Windows 2008 R2.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Simple way to tell: get it to happen again. On the FreeNAS box, while it is happening, see if the ZFS pool appears to be hung up. Even just a few seconds of hang can be problematic for iSCSI, and it appears your initiator is dropping and reconnecting. The particular error you quoted is not something I recall having seen; usually we were seeing something in iscsi_something_write indicating the connection had dropped. Still, responsiveness should be obvious: go over onto the FreeNAS volume in the shell and do trivial stuff like "ls" and "touch file". We were seeing dead hangs. If the response isn't within a second or two, "dat's bad."

You can use "zpool iostat 1" to inspect how busy the pool is.

You can use "gstat" to inspect how busy the disks are. If they're pegging out at 100%, even if it isn't always pegged out, that's a strong suggestion that the pool might be saturated. gstat will give you clues about writes and reads.

You should probably give us some idea of how fast your pool actually is as well.

My experience with Windows VMs is that they kind of suck. I've never seen Windows VMs be particularly fast on any iSCSI service, and they're slower yet when actually doing something more complex than running a screen saver. As far as I can tell, Windows is not particularly careful about how it uses resources and assumes it has an entire machine's resources all to itself to waste as it wishes. That makes for a bad VM neighbor, though I'm told by some of the virtualization gurus I know that there are "things" that can be done to make them work much better.
 

hpnas

Dabbler
Joined
May 13, 2012
Messages
29
[root@freenas ~]# zpool iostat 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
store       1.89T  3.54T      1    148   256K  2.34M
store       1.89T  3.54T      3      0   136K   128K
store       1.89T  3.54T      2      0   384K      0
store       1.89T  3.54T     13      0  1.63M      0
store       1.89T  3.54T     72      0  7.33M      0
store       1.89T  3.54T     45    272  4.08M  8.44M
store       1.89T  3.54T     88      0  10.8M      0
store       1.89T  3.54T     96      0  10.8M      0
store       1.89T  3.54T     66      0  7.18M      0
store       1.89T  3.54T    164      0  20.3M      0
store       1.89T  3.54T    278    161  34.6M  2.36M
store       1.89T  3.54T    502      0  62.0M      0
store       1.89T  3.54T    249      0  30.4M      0
store       1.89T  3.54T    581      0  71.6M      0
store       1.89T  3.54T    321      0  40.2M      0
store       1.89T  3.54T    341    133  42.6M  1.01M
store       1.89T  3.54T    308      0  38.6M      0
store       1.89T  3.54T     68      0  8.62M      0
store       1.89T  3.54T      0      0      0   128K
store       1.89T  3.54T      0      0      0      0
store       1.89T  3.54T      0    171      0  2.90M


Jan 4 13:55:28 freenas istgt[70805]: istgt_iscsi.c:4629:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Jan 4 13:55:28 freenas istgt[70805]: istgt_iscsi.c:5269:worker: ***ERROR*** iscsi_execute() failed on iqn.2011-03.example.org.istgt:vmtarget,t,0x0001(iqn.1998-01.com.vmware:localhost-4f95c65c,i,0x00023d000001)
Jan 4 13:55:31 freenas istgt[70805]: Login from iqn.1998-01.com.vmware:localhost-4f95c65c (192.168.15.106) on iqn.2011-03.example.org.istgt:vmtarget LU1 (192.168.15.100:3260,1), ISID=23d000001, TSIH=1125, CID=0, HeaderDigest=off, DataDigest=off
Jan 4 13:55:31 freenas istgt[70805]: istgt_iscsi.c:3207:istgt_iscsi_op_scsi: ***ERROR*** CmdSN(11622001) ignore


I couldn't match the output above to the exact times I get the iSCSI errors, but my assumption is that once the reads climb up into the MB/s range, the errors start coming out.

I am currently accessing my FreeNAS from the Win2k8 VM (I am remote right now) and ran gstat. I see the disks climb to about 70% busy and then the VM freezes, so I am not able to see whether they go above that, but I assume they do, hence the VM locking up because the disks are too busy. It may be safe to assume that the disks are becoming too busy while the VMs are in use and reads/writes are being dropped or delayed (this would explain why I sometimes have issues even booting the Windows 2012 VM).

What I am going to try next is a device extent on a single hard drive, to see whether the RAIDZ pool is causing the issues or something else is. My initial thought is that I will only see this when using RAIDZ.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If you look in your ESXi logs, you'll see some indication of trouble, I believe. It seems likely that your drives are being bombarded with enough I/O requests to cause a fair bit of latency (try looking under vSphere, Performance tab, Datastore view for the host). There's a fair bit of writing going on, which won't be helping matters, and it is entirely possible that something big gets written which takes the server "out to lunch" for a bit. ESXi basically has a rough time dealing with I/O subsystems that can't keep up with the I/O requests being generated for them, and if the FreeNAS box is going catatonic as a result, that can cause very poor behaviour.

Feel free to experiment with file extents versus zvol-based device extents for your iSCSI. Please report back on what you discover! I haven't yet done any serious tests to see which is the winner under FreeNAS 8.3. If that fixes it for you, then great.

But let's guess that maybe you need a faster pool.

The answer for reads is to build a pool that is capable of sustaining the read demand. ZFS is way cool for this because you can do things BESIDES just throwing more disk at the problem (the traditional answer). But if we look at the problem starting with the disks: for vdevs, throwing more disks at it is usually helpful; know that mirrors are faster than RAIDZ, RAIDZ1 is faster than RAIDZ2, and SSD is faster than all of it. Throw more memory at it too: more memory means a larger ARC, and ZFS is exceedingly good at using the ARC to reduce physical disk I/O. The N40L will take up to 16GB (we have an N36L here with 16GB, and yes, HP says 8GB is the max. They lie.) You can gain substantial additional read benefit by adding an SSD for L2ARC and making some tuning changes. L2ARC won't magically fix everything immediately, but it should become noticeable over time.
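If you go the L2ARC route, attaching the SSD is a one-liner from the shell; the device name below is hypothetical, but the pool name matches your iostat output:

zpool add store cache ada4   # add the 80GB SSD as a read cache (L2ARC)
zpool status store           # verify the cache vdev is listed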

The answers for writes are to eliminate as many stupid pointless writes as possible (difficult under Windows, I know) and then to look at bug 1531 as a way to maintain responsiveness under heavy write loads.

Ultimately, a pool that is too slow for your I/O load is always at risk of failing in painful ways, which is a generally frustrating aspect to iSCSI. This isn't unique to FreeNAS, sadly.
 

maglaubig

Dabbler
Joined
Sep 29, 2012
Messages
17
If you are overcommitting your storage, average device latency and SCSI bus resets will tell you if storage is telling ESX to wait; in that case your errors are I/O timeouts.

Reorganizing your pool as essentially RAID10, and potentially using an SSD for the ZIL, might help, but a second SSD would be preferred so the ZIL can be mirrored. If reads are your problem, that's tougher with the items you have on hand. Partition alignment shouldn't be an issue on your OS versions; you may just need faster drives, more of the slower ones to keep up, or fewer VMs.
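One quick way to watch that latency from the ESXi side is the stock esxtop tool (nothing FreeNAS-specific about this):

esxtop   # then press 'u' for the disk-device view
# DAVG/cmd is the latency attributable to the storage itself; sustained spikes there
# should line up with the istgt resets/timeouts showing in the FreeNAS log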
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you are overcommitting your storage, average device latency and SCSI bus resets will tell you if storage is telling ESX to wait; in that case your errors are I/O timeouts.

Reorganizing your pool as essentially RAID10, and potentially using an SSD for the ZIL, might help, but a second SSD would be preferred so the ZIL can be mirrored. If reads are your problem, that's tougher with the items you have on hand. Partition alignment shouldn't be an issue on your OS versions; you may just need faster drives, more of the slower ones to keep up, or fewer VMs.

You should read the thread about the ZIL. It isn't as useful as people think. I read somewhere about a year ago that if you aren't running a database it's virtually worthless, and the recent ZIL thread tends to support that. The ZIL is not just a "big write cache"; it serves a narrow function, but it can be extremely effective if you understand that function and how it works.

Attempting to throw more hardware at the problem doesn't really seem to help much.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If you are overcommitting your storage, average device latency and SCSI bus resets will tell you if storage is telling ESX to wait; in that case your errors are I/O timeouts.

That general idea is loosely correct, especially the first half, but I'm going to be a bit pedantic here. Storage isn't telling ESXi anything - storage gets busy and *stops* telling ESXi anything for short stretches of time. ESXi then interprets this as an I/O timeout, which I can see why it would do, but it is really important to understand that this isn't a FreeNAS I/O timeout; it's just large latency, which the software initiator decides to treat as "the device timed out, let's close and restart it."

There are scores of threads out there from people whose iSCSI servers are not up to the task of servicing the load, and it is helpful and instructive to look at them to see what people have done. My impression is that it is extremely difficult to guarantee that an iSCSI service will always be sufficiently responsive under any load that can be thrown at it. A production environment should fix the problem well enough that the tuning isn't needed, and then maybe make the adjustment anyway as a belt-and-suspenders fix, or at least make a note of it so that it can be applied hot if you suddenly run into issues.

Reorganizing your pool as essentially RAID10, and potentially using an SSD for the ZIL, might help, but a second SSD would be preferred so the ZIL can be mirrored. If reads are your problem, that's tougher with the items you have on hand. Partition alignment shouldn't be an issue on your OS versions; you may just need faster drives, more of the slower ones to keep up, or fewer VMs.

ZIL is not indicated as a likely fix to the problems, at least given the facts described by the OP to this point.

Fewer VMs is ... um, are you really suggesting that he go from two VMs down to one? At that point, it'd be better to advise "to heck with virtualization and network storage, throw a disk in a real PC and be done with it."
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
You should read the thread about the ZIL. It isn't as useful as people think. I read somewhere about a year ago that if you aren't running a database it's virtually worthless, and the recent ZIL thread tends to support that. The ZIL is not just a "big write cache"; it serves a narrow function, but it can be extremely effective if you understand that function and how it works.

Attempting to throw more hardware at the problem doesn't really seem to help much.

NFS with ESXi is one of the situations where moving the ZIL to a faster non-volatile device is supposed to improve performance. This is because files are opened with the O_SYNC flag, and ZFS won't return the call until the data has been committed to non-volatile storage.

This entry explains why: Solaris ZFS, Synchronous Writes and the ZIL Explained

As a bonus, Constantin explains how leaving the ZIL in the pool creates a "swiss cheese" effect that eventually fragments the zpool.
Like Victor pointed out below in the comments, ZFS doesn't use a special area of the disk for its ZIL ("the intent log is allocated from blocks within the main pool"), so the extra seeks for log writes are not an issue for ZFS. On the other hand, excessive use of the ZIL will create a "swiss cheese" effect and increase fragmentation of the disk's blocks, potentially hurting read and write performance for pools that are near the top of their capacity.
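For anyone following along at home, you can check whether the dataset backing the NFS export is actually honoring those sync requests from the FreeNAS shell; the dataset name below is just an example, and the per-dataset sync property is available on the ZFS v28 pools FreeNAS 8.3 uses:

zfs get sync store/vmstore
# "standard" means O_SYNC writes from ESXi go through the ZIL before the call returns,
# which is exactly the path a dedicated log device is meant to speed up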
 

hpnas

Dabbler
Joined
May 13, 2012
Messages
29
Here is an update -

I moved the VM to a single hard drive that is not in the RAIDZ volume and set up a device extent. However, I have the same issues as I did when I was using the file extent on the RAIDZ pool.

Jan 5 11:16:26 freenas istgt[70805]: istgt_iscsi.c:4629:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Jan 5 11:16:26 freenas istgt[70805]: istgt_iscsi.c:5269:worker: ***ERROR*** iscsi_execute() failed on iqn.2011-03.example.org.istgt:vmtarget,t,0x0001(iqn.1998-01.com.vmware:localhost-4f95c65c,i,0x00023d000001)

Previously, I had no issues of any kind with device extents. The only change I've made since then is moving to FreeNAS 8.3. It seems my issue is now with iSCSI in general and not with ZFS as I suspected.

Does anyone have any suggestions?
 

maglaubig

Dabbler
Joined
Sep 29, 2012
Messages
17
Depending on the workload, the ZIL idea may not help; from my understanding, it only helps with writes. An L2ARC would be what helps with read caching.

hpnas, any idea how many IOPS you're trying to drive in this config, and whether you're heavier on reads or writes? I've been using some of those green Caviar drives and have not been having the issues you've described with iSCSI.
 

hpnas

Dabbler
Joined
May 13, 2012
Messages
29
Thanks everyone for the great suggestions. However, after not seeing enough improvement I decided to give NAS4FREE a try and NFS has dramatically improved. I have not yet tried iSCSI performance with NAS4FREE.
 

James Snell

Explorer
Joined
Jul 25, 2013
Messages
50
Thanks everyone for the great suggestions. However, after not seeing enough improvement I decided to give NAS4FREE a try and NFS has dramatically improved. I have not yet tried iSCSI performance with NAS4FREE.


Well, that's kind of encouraging for those wanting to stick it out further with FreeNAS. If NAS4FREE has better NFS performance, then there are probably some parameters that could be tuned better. Or maybe NAS4FREE somehow forces async NFS writes, which would seem like a dangerous trajectory.
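The FreeNAS-side equivalent of forcing async would be the dataset sync property discussed earlier in the thread, with exactly the data-loss risk that implies; the dataset name here is illustrative:

zfs set sync=disabled store/nfs    # risky: ESXi believes writes are committed when they are not
zfs set sync=standard store/nfs    # revert to honoring sync requests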
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I thought someone had said it forces async NFS writes, but I couldn't find the post. Might have to go test it in a VM now...
 