jgreco and I have discussed the whole sync write fiasco to death. I'm sure someone is about to pop in here and start a fight over the iSCSI and NFS thing, but I'll say this...
NFS with VMware sucks solely because of the sync writes. NFS supports them and ZFS will, by design, honor them to the bitter end, whatever the consequences. In fact, you see the consequences when the pool becomes so slow it is useless.
iSCSI with VMware sucks much, much less because it doesn't have sync writes. iSCSI doesn't support them, so ZFS' design isn't important... too much. Except for one scenario (I'll discuss this one scenario in a second).
Normally every write is a sync write with VMware ESXi. Some say that is stupid. Others call it the only safe bet. I think it is the most conservative as well as the only safe bet. There is no mechanism in ESXi that determines which host writes are sync writes and chooses to pass only those through NFS as sync writes. I'd bet there isn't likely to be one for the foreseeable future unless the guest OS adds that feature and allows VMware to piggy-back on it.
Take your standard Windows machine. *Some* of your writes are sync writes. Most (damn near 100%) aren't (with some exceptions of course). We've all had that accidental power loss, bumped the power cord, etc. and had our desktop turn off. Sometimes we boot back up and keep going. Sometimes we boot back up and stuff is extremely broken. It's pretty obvious in the latter case that some write wasn't completed and resulted in file or file system corruption of some magnitude. Anyone remember when Windows 95, 98, and Me all would do a scandisk on bootup if you didn't do a proper shutdown? That was to try and mitigate the harm from that improper shutdown. How many of us had tons of files that ended up in the C:\FOUND.000 folder? Every single one of us saw that first hand or we didn't use Windows. That's how frequently it happened.
Now considering the previous paragraph, keep this in mind... do you *really* want to deal with the potential that 10+ VMs may have that kind of corruption, may or may not boot up, and that you may have to restore multiple VMs from backup at the cost of all data written since the last backup was performed?
1. Would *you* deliberately disable sync writes?
2. Would *you* want to force all writes to sync writes if you could?
3. Would *you* want to take the most conservative path with your data in terms of protecting it?
Now back to that one scenario I talked about. Normally all NFS writes are sync writes. This is good and is obviously the most conservative. But all iSCSI writes are non-sync. So you're looking at a situation that is only marginally more dangerous than choosing to unplug your desktop regularly. There are also new variables, because it's possible for 10 writes to come into the iSCSI extent but for ZFS to actually write them to the pool out of order, resulting in some weird whacked-out situation that normally wouldn't be possible. I'll ignore this scenario because I can't validate that it is even possible, since ZFS appears to handle the writes for a given file in order. So I'll assume this isn't possible.
Anyway, we all know that unplugging your computer with it powered on is bad, right? So naturally you want to, and should, mitigate this risk. For a pool that is strictly for NFS over ESXi, sync=standard means the same thing in the big picture as sync=always since all writes are sync, right? But for iSCSI, sync=standard effectively means the same as sync=disabled in the big picture since there is no such thing as sync writes for iSCSI.
So this is where things get *really* messy. After tons of research I'm convinced that doing NFS with sync=disabled is about the same danger level as using iSCSI with sync=standard or sync=disabled. Also, if you want to use iSCSI with approximately (keyword: approximately) the same level of protection as NFS with its sync writes, then you must set sync=always when using iSCSI. The reason for the "approximately" is that iSCSI may internally return the write as complete to the iSCSI initiator *before* issuing the write to ZFS. So there may be a chance that you could lose data even with sync=always. Keep in mind that in a properly operating system we are talking fractions of a millisecond, but that is the "write hole" that ZFS is supposed to prevent. So it's still not quite the same as using NFS with sync writes. The safest is to use NFS with its sync writes.
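To make that comparison concrete, here is a minimal Python sketch of the decision being described. It is a toy model, not ZFS code: the names `committed_before_ack`, `ESXI_NFS_REQUESTS_SYNC`, `ISCSI_REQUESTS_SYNC` and `summarize` are made up for illustration, and it simply takes this post's premise as given (ESXi asks for sync on every NFS write, iSCSI writes never arrive as sync).

```python
# Toy model of the ZFS "sync" dataset property. Illustration only, not ZFS code.
# The property accepts three values: standard, always and disabled.

def committed_before_ack(sync_property: str, write_requests_sync: bool) -> bool:
    """Return True if the write is on stable storage before it is acknowledged."""
    if sync_property == "always":
        return True                   # every write is treated as a sync write
    if sync_property == "disabled":
        return False                  # sync requests are ignored
    if sync_property == "standard":
        return write_requests_sync    # honor whatever the client asked for
    raise ValueError(f"unknown sync value: {sync_property!r}")

# Assumptions taken from this post, not measured facts:
ESXI_NFS_REQUESTS_SYNC = True    # ESXi over NFS marks every write as sync
ISCSI_REQUESTS_SYNC = False      # writes arriving via iSCSI never ask for sync

def summarize() -> dict:
    """Build a (protocol, sync value) -> committed-before-ack table."""
    table = {}
    for prop in ("standard", "always", "disabled"):
        table[("nfs", prop)] = committed_before_ack(prop, ESXI_NFS_REQUESTS_SYNC)
        table[("iscsi", prop)] = committed_before_ack(prop, ISCSI_REQUESTS_SYNC)
    return table

if __name__ == "__main__":
    for (protocol, prop), safe in sorted(summarize().items()):
        print(f"{protocol:5} sync={prop:8} -> committed before ack: {safe}")
```

Under those assumptions NFS with sync=disabled lands in the same row as iSCSI with sync=standard or sync=disabled, and iSCSI only matches NFS when the dataset is set to always. The model deliberately leaves out the caveat that the iSCSI target may acknowledge a write before handing it to ZFS.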
There are two things to realize about the sync=disabled situation:
1. This will tell the system to never honor sync writes (this is obviously not recommended, but you may want to take that risk). If you have other services that are dependent on that sync write being honored and performed, you may be in for serious trouble.
2. There is no significant possibility of harm to the pool if you recognize that every write to the file system is an atomic write. That is to say that any single write to the file system either hasn't been performed or has been performed and completed. There is no such thing as a "partial" transaction. ZFS will automatically discard any incomplete transaction when the pool is remounted after a "partial" transaction. Obviously if ZFS discards a partial transaction then that data is lost.
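The all-or-nothing behavior in point 2 can be sketched with a toy model. This is not how ZFS is implemented; `ToyPool`, `stage`, `commit` and `crash_and_reimport` are hypothetical names used only to show that an uncommitted transaction is thrown away whole and the pool falls back to its last consistent state.

```python
class ToyPool:
    """Toy all-or-nothing store. Illustration only, not ZFS internals."""

    def __init__(self):
        self.committed = {}   # last consistent state
        self.pending = {}     # writes staged in the open transaction

    def stage(self, key, value):
        # Staged writes are not part of the committed state yet.
        self.pending[key] = value

    def commit(self):
        # Build the new state on the side, then switch to it in one step.
        new_state = dict(self.committed)
        new_state.update(self.pending)
        self.committed = new_state
        self.pending = {}

    def crash_and_reimport(self):
        # Anything that never committed is discarded as a whole.
        lost = self.pending
        self.pending = {}
        return lost


pool = ToyPool()
pool.stage("a", 1)
pool.commit()                      # "a" is now safely committed
pool.stage("a", 2)                 # these two writes are only staged...
pool.stage("b", 3)
lost = pool.crash_and_reimport()   # ...when the power goes out
print(pool.committed)              # {'a': 1}
print(lost)                        # {'a': 2, 'b': 3}
```

The toy pool stays consistent, but both staged writes are gone. That is the trade being described: the pool survives, the most recent data may not.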
So, if so many people are using sync=standard with iSCSI and not having problems, why am I hesitant to turn around and start recommending setting sync=disabled? I just said they are the same thing, right? Well, a few reasons come to mind:
1. I could be wrong. I think this is extremely likely because when I ask ZFS gurus they will basically call you out as a fool if you do that.
2. I could be write (get it?) in that I've properly assessed the situation, but only when comparing these 2 exact situations and ignoring all other situations.
3. I could be both right and wrong, but not right and wrong for the right reasons. Think "dumb luck at getting to the right answer". People always say you shouldn't go around your elbow to get to your ass, right? But if you still get to your ass does it really matter how you got there? The destination is what matters, right?
4. I'm a major forum presence. I don't feel I should ever be making recommendations that aren't the most conservative. I shouldn't be making recommendations that I wouldn't do (or haven't done) myself. I also don't think I should be making recommendations that haven't been tested extensively by tons of users.
Let me expound on #4 for a minute.
I read almost every post that comes into and out of this forum. Because of this I can easily identify commonalities between problems and users. For example, the ECC RAM recommendation that I've recently introduced to the manual and a few other things I've added to the manual are all the result of finding common problems that have cost people their data. The forum's general opinion 2 years ago was not to really force ECC down people's throats. When I started in this forum in March 2012 ECC RAM was recommended, but there was very little evidence that it really mattered. I even built my best friend's first FreeNAS box with non-ECC RAM. He now runs ECC RAM though. The attitude towards RAM has changed in a major way, mostly because I've seen so many people that have ended up being victims of bad non-ECC RAM and I've seen the consequences. Every single user has a horrible story to tell that's so hard to believe I sometimes don't believe it. But when dozens of users have the same problems and they all tell the same story I have to assume they are actually telling me the truth and things really are that bad.
When you see the same problem causing data loss regularly you see that there's a gap in either the software, the manual, or the hardware that needs to be addressed. I can't fix hardware problems but I can recommend against some hardware, and I do so without hesitation. I can provide recommendations for the software and I do edit the manual. But, the one thing that I have as a continuous reader of the forums is seeing the patterns of what works and what doesn't. I don't even have to know why something works or doesn't. I can't solidly explain why AMD based systems statistically have more problems than Intel. I have seen tons of evidence of it, and there are definitely exceptions. But if you gave me 100 random Intel based machines and 100 random AMD based machines the statistics say more AMD machines won't work than Intel based machines. I just have to recognize the pattern enough to tell people not to do something. I do like to know why, and I definitely spend significant time trying to understand why so I can determine if I can preemptively identify other scenarios that might have the same problem that we don't know about yet. But if I tell you not to jump off that cliff and you don't jump off the cliff I've saved your life even if neither of us understood gravity enough to realize it would kill you. The saved life is what matters.
My unpaid job here is to keep people from losing data. 99.9% of users that come to FreeNAS want the stability and reliability that ZFS offers. It's not my place to then tell you how to do things that are stupid or even be an experimental guinea pig. If you want to be that stupid, you can. I won't help you do it though.
Anyway, to finish up my hate speech for sync=disabled: I am 99% certain that if everyone in this forum set sync=disabled right now most of us would probably never see any consequence of it. But that means that 1% of you reading this might suffer the consequences. I'm not going to take that 1% chance and tell people something that I can't really validate myself.
But I will tell you that if/when I build my next ESXi box I will probably try doing sync=disabled for my home use (definitely not at work) and experiment with it to figure out how dangerous it really is.
As my last paragraph to my wall of text that wasn't supposed to be this long I want to make it very clear. I do not condone, recommend, endorse or approve of someone setting sync=disabled ever. Even more so in a work environment. I *do* think you are a worthless bottom feeder and that this forum doesn't need you if you really want to do that. If you are doing that, you should be a ZFS ninja. If you were that much of a ninja you wouldn't even be poking around a forum because you are already so pro you have better things to do. If your company can't afford to do ZFS properly you shouldn't be using ZFS. PERIOD.