
Sync writes, or: Why is my ESXi NFS so slow, and why is iSCSI faster?

Peppermint

Neophyte
Joined
Jul 8, 2014
Messages
9
Hi, thanks for your reply.
I used IOmeter, h2benchw and HD Tune, and they average about 130 MB/s. Switching to iSCSI raised the read speed to about 215 MB/s (h2benchw). That's better, but surely this can't be the maximum?!
Is there a better way to measure read/write speed inside a VM? Maybe I am using unsuitable tools... There also seems to be an issue with the ixgbe Ethernet driver for ESXi 5.5; I read about that in the Intel community.

Also maybe you have "a time" enabled on pool ? That is bad ?
atime is enabled by default on the pool. Is that bad?

Thanks,
Peppermint
 

reqlez

Member
Joined
Mar 15, 2014
Messages
84
Disable atime on all datasets and post your results. If you are using NFS, for example, every time ESXi accesses the VMDK file, a filesystem update has to be written recording that the file was accessed. This is really bad for performance, and so far I have not found a use for atime anyway. About the 10GbE NIC issue: is this a separate NIC or one built into the board? Separate server NICs usually give you a lot more performance because they offload checksums and other work, while integrated NICs often just load your CPU with the task (I could be wrong, depending on the NIC). Somewhere I even heard it takes about 1 GHz of CPU to process 1000 Mbps of traffic, so you can see how a dedicated NIC with offloading can help here. I'm not saying buy another NIC; try turning off atime and let me know.
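
For anyone following along, turning atime off comes down to a single ZFS property. A minimal sketch, assuming a dataset called tank/vmware (a placeholder, not a name from this thread). As written it only prints the commands; set DRY_RUN=0 to actually run them on the FreeNAS box:

```shell
#!/bin/sh
# Sketch: turn off access-time updates on the dataset backing the NFS datastore.
# "tank/vmware" is a placeholder dataset name; substitute your own.
DATASET="${DATASET:-tank/vmware}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands, do not execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run zfs get atime "$DATASET"      # show the current value
run zfs set atime=off "$DATASET"  # stop recording access times
run zfs get atime "$DATASET"      # confirm the change
```

The property is inherited, so setting it on a parent dataset should also cover child datasets that have not overridden it.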

Also ... if you wanna run this in production DO NOT disable sync writes on dataset... Get 2 enterprise SSDs with high write speeds and use them as an SLOG in mirror. Your whole system ( as far as ESXi goes ) will be limited by the write speed of those SSDs, by the way... so for ultra performance you need to get those super expensive DRAM with battery drives instead of SSDs.... Like how many VMs is this SAN going to be supporting ? is it mostly databases ? what kind of load ?

But for testing purposes, we need to disable sync writes on the dataset you are mounting for NFS so that the speeds are not horrible for testing.

Oh, and one last thing: RAID 10 is the way to go for a SAN (not controller RAID 10 but ZFS software RAID 10, i.e. striped mirrors). RAIDZ2 carries a write performance penalty. I see you mentioned you had 6 drives in RAID 10; I think it would make more sense to have 8 or 4 drives in RAID 10, because you cannot split a 128KB ZFS record size (or a 1MB ESXi record size if you are using iSCSI) evenly between 3 stripes. Sometimes it's better to have 2 pools with 4 drives each than 1 huge pool with 8 drives, just because of seek times...

Regarding the NIC: you should be able to get better performance if you set "mtu 9000" in the auxiliary options on the FreeNAS NIC and also configure MTU 9000 on your ESXi vSwitch and vmkernel port (you have to do both to get MTU 9000 end to end). Make sure you connect the FreeNAS NIC directly to the ESXi NIC, or use a dedicated 10GbE switch that supports MTU 9000 (10GbE switches are pretty expensive, though).
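
A rough sketch of what the jumbo-frame setup could look like from the command line. The interface name ix0, the vSwitch name vSwitch1, the vmkernel port vmk1 and the address 10.1.0.1 are all assumptions for illustration. The script only prints the commands, since they belong on two different machines, and ifconfig changes do not survive a reboot (the "mtu 9000" auxiliary option in the GUI is the persistent route):

```shell
#!/bin/sh
MTU=9000

# Largest ICMP payload that fits in one frame:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Print the commands for each side; nothing is executed here.
cat <<EOF
# On FreeNAS (ix0 is an assumed interface name):
ifconfig ix0 mtu $MTU

# On ESXi (vSwitch1 and vmk1 are assumed names):
esxcli network vswitch standard set -v vSwitch1 -m $MTU
esxcli network ip interface set -i vmk1 -m $MTU

# From ESXi, check that full-size frames pass without fragmentation
# (10.1.0.1 is an assumed FreeNAS address):
vmkping -d -s $(icmp_payload $MTU) 10.1.0.1
EOF
```

If the vmkping with the don't-fragment flag fails while a plain vmkping works, some hop in the path is probably still at MTU 1500.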
 

Peppermint

Neophyte
Joined
Jul 8, 2014
Messages
9
Disabled atime on my pool now, but speed is still about 130 MB/s on average via NFS. The NICs are dedicated Intel X540-T2 cards.
 

reqlez

Member
Joined
Mar 15, 2014
Messages
84
Oh, by the way, I just benchmarked my system and am getting 250 MB/s average on NFS. Please note that my system is very limited: I have 2x mirrors of 5-year-old 15K RPM drives, and instead of dedicated hardware my FreeNAS is virtualized (passing the HBA controller through to the VM directly), and I'm sure it maxes out the CPU because of the "virtualized" part. I used HD Tune with default settings. Can you tell me how your system is set up physically? Maybe draw a pic or something (how ports are connected, etc.)? Like your whole set-up, from the FreeNAS box to the ESXi servers...
 

Peppermint

Neophyte
Joined
Jul 8, 2014
Messages
9
Very frustrating... :(
My FreeNAS box is directly connected to an ESXi server through two Cat 6 cables, using a different subnet on each wire. Both have the Intel X540 installed.

freenas box port 1 <-> esx port 1; 10GbE full-duplex on both sides; subnet 10.1.0.0/24
freenas box port 2 <-> esx port 2; 10GbE full-duplex on both sides; subnet 10.2.0.0/24
 

Adam Bise

Neophyte
Joined
May 12, 2014
Messages
10
Hello. I'm new on the forums and have just finished my first FreeNAS build.

I've read numerous posts warning about disabling sync on ZFS. As I understand it, sync writes ruin the performance benefits of ZFS and its caching. But there is one thing I'm unclear on.

On a local host, the typical solution to the problem is to get a RAID controller with write back cache and a BBU.
If this is the solution locally, then why couldn't a UPS be used for the entire system to achieve the same result?

Also, is it possible for ESXi to simply pass sync requests through to the system instead of writing everything sync? I realize that is a VMware topic, but I'm curious if anyone has looked into it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ESXi defaults to making every single write sync. There are only 2 ways to do ESXi writes: all sync or all with no sync. There is no in-between, which is the whole problem. If ESXi separated out the writes appropriately this problem would be less significant. Let's face it, in the real world not all writes are sync writes, so differentiating them via the appropriate system calls from the guest machine would be the savior. But the reality as I understand it is that most guest machines do not have that kind of system integration to allow VMware to do that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, and to answer your question about the BBU, no. Not the same. The BBU is useful because the write has already been committed by the system itself. The write is stored in the RAID controller's cache and protected by the BBU. If a system crashes, those writes in the RAID controller are still protected. If you use a UPS, on a system crash the writes that would be in system RAM would be lost. ;)
 

PacketLoss

Newbie
Joined
Oct 31, 2014
Messages
2
Sorry to Necro this thread, but I'm suffering from terrible NFS writes and I think, from this thread, I have two options available to me. I'd like opinions as to my thoughts.

1). I can add "sync=disabled" to the FreeNAS NFS configuration to force the sync writes off (ZIL off?). I can't tell if this is a bad thing (one guy says it is, one guy says it isn't) but I'm guessing there's a risk. All of my VMware hosts and FreeNAS boxes are on UPSes, but I'm still guessing there is a possibility of corruption and I don't want to spend a weekend rebuilding things.

2). Add a ZIL/SLOG to act as a cache. I'm not certain that this can be done now, as I've already created the dataset and populated it with data. Any guidance here? Recommendations for the SSD to purchase?
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,526
1) you can safely distrust the guy who says it's not a bad thing... it places your VM data at risk. Maybe not a ton, but the risk is there and it's insane to think there's no risk. OTOH you can make an intelligent and informed choice to go this route. Noncritical, backed up VM's? Probably just fine to do.

2) Easy to add SLOG. Just pick a good one if you go that route. Think the S3700 is favored right now.
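
To put "easy" in concrete terms: a log device can be added to a pool that already holds data. A hedged sketch that only builds and prints the command; the pool name tank and the device names da8 and da9 are placeholders, and on FreeNAS the web GUI is the usual way to do the same thing:

```shell
#!/bin/sh
# Build (but do not run) the command that adds a log vdev to an existing pool.
# One device gives a single log device; two or more give a mirrored log.
slog_add_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: slog_add_cmd POOL DEVICE [DEVICE...]" >&2
        return 1
    fi
    pool="$1"
    shift
    if [ "$#" -ge 2 ]; then
        echo "zpool add $pool log mirror $*"
    else
        echo "zpool add $pool log $*"
    fi
}

# Placeholder names: pool "tank", SSDs da8 and da9.
slog_add_cmd tank da8 da9
# Afterwards the device(s) should be listed under a "logs" section:
echo "zpool status tank"
```

Double-check the printed command before running it, since adding a vdev is not something you want to get wrong on a production pool.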
 

Ken Almond

Junior Member
Joined
May 11, 2014
Messages
19
I have been running FreeNAS for last 2 years, learned several things. One is performance with VMware / NFS.
For VMware, I run an i7-3930K (3.2GHz) on an Adaptec 3405 RAID controller with dual GigE Ethernet.
For FreeNAS I run FreeNAS-9.2.1.5 on a Quad CPU Q9400 @ 2.66GHz with 8GB RAM and GigE Ethernet on RAIDZ1 (5 x 3TB Seagate 7200 desktop class).
The system is completely UPS'ed, moderately fast, and NOT overloaded, so my performance metrics are not complicated by side issues like overload or multiple users.

In the beginning I was dismayed at 2-5MB/s write speed to NFS doing VMware clones, but also robocopy from my desktop to FreeNAS Windows share.
I tried iSCSI - no real difference.

The *issue* is the sync=always. Every write is synced. When I set sync=disabled my write performance jumps 10x-20x, into the 40MB/s to 80MB/s range. Based on my hardware (desktop-class) and network (1GbE), this is what I would expect. It does vary from 40 to 80, but it's clearly 10x (or more) faster with sync=disabled. Just experiment and you'll find this is THE key issue for FreeNAS write performance on a RAIDZ1.

What I do...
1) You can set sync= status per volume. I set sync=disabled on a specific vmware volume where I want performance and some risk is OK (e.g. I use Avamar to backup VMs nightly so I have recovery plan).
2) You can set sync=disabled / sync=always 'live' (takes 1 second) - no reboot or anything difficult required. So I set sync=disabled when I'm doing clones or large copies - e.g. anything more than 20GB. Then I set sync=always when finished.
3) I have things UPS'ed and I run scrub every week.
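
The toggle in 2) could be wrapped so the setting is put back even when the copy fails. A minimal sketch under stated assumptions: tank/vmware is a placeholder dataset, the value restored afterwards defaults to standard (set RESTORE_TO=always to match the routine above), and by default the zfs commands are only printed. Writes acknowledged while sync is disabled are exactly the ones at risk if the filer crashes:

```shell
#!/bin/sh
DATASET="${DATASET:-tank/vmware}"     # placeholder dataset name
RESTORE_TO="${RESTORE_TO:-standard}"  # value to put back afterwards
DRY_RUN="${DRY_RUN:-1}"               # default: print zfs commands only

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run a command with sync disabled, then restore the setting and
# pass the command's exit status back to the caller.
with_sync_disabled() {
    run zfs set sync=disabled "$DATASET"
    "$@"
    status=$?
    run zfs set sync="$RESTORE_TO" "$DATASET"
    return "$status"
}

# Stand-in for the real clone or bulk copy.
with_sync_disabled echo "bulk copy would run here"
```

This does not cover the script itself being killed mid-copy, so checking `zfs get sync` afterwards is still a good habit.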

So in general, I manage the sync=xxx status by volume and by operation and I'm pretty happy. I've never had a write failure with sync=disabled. I'm pretty sure of that because subsequent scrubs all came up clean and the VMs are OK. I have had several disk failures due to hard-drive failure, replaced the disk, resilvered, and all has been fine. I have NOT used sync=disabled during the hard-drive replacement/resilver procedure. I DO really pay attention to SMART reports/tests AND email alerts from FreeNAS - e.g. you can turn on email to be notified immediately of many SMART events, so you don't have to wait for a scrub or other failure to know a disk is failing.

But I do agree that running VMs on hard-disk RAIDZ1 with sync=always is 'doable' but 'a noticeable drag' on the VMs. So the 2-5MB/s is 'pretty slow', near the edge of unacceptable, for VMware VMs 'in general'. They work, but if you have intense operations like a nightly backup, you will get disk wait spikes, fire off Nagios alerts, and see other annoying behavior.

I just recently added a 500GB SSD, formatted it with ZFS, and the VMs (with sync=always) are now running 'normally' compared to the RAIDZ1 hard-drive performance. So maybe using an SSD instead of regular hard drives for a particular VM volume can be a solution.


Random Side Note: I built a 2nd FreeNAS with 6 disks (RAIDZ2) using NAS-class 3TB Seagates (instead of desktop 3TB). The NAS disks are 5200RPM, slower than the desktop 7200RPM disks, but reported to be much more reliable. The RAIDZ2 (2 disks out of 6 can fail) is also slower than RAIDZ1 (only 1 disk of 5 can fail). I believe the performance is about 20% slower. So if you have sync=always and 3MB/s, it will drop to 2.4MB/s. Or with sync=disabled, 50MB/s will drop to 40MB/s. This is anecdotal, so take it with a grain of salt.
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,526
The *issue* is the sync=always. Every write is synced.
No, the *issue* is your use of RAIDZ, the VMware inability to identify critical from noncritical data (therefore promotes everything to critical), and then expecting magical unicorns inside your NAS to fart rainbow colored nitrous oxide to make it go at some speed you deem acceptable without actually paying for the technical capability to do it correctly.

If you don't care about possible corruption or data loss, you can literally do whatever the hell you want and yes there are lots of ways to make it faster.

The moment you start to care about properly handling VM disk data to avoid the cases that can cause corruption or data loss, your options become highly constrained. And usually expensive (or trading off speed).

Backups are not the same thing as making sure that your VM's don't get corrupted to begin with. Making things like snapshot-based backups of busy systems with databases tends to end you up with nonviable copies of the database files; other systems have similar gotchas.

Please don't come into my informational threads and tell people that the issue is X when the issue is Y, and that's even clearly explained up front in the first post.
 

mattlach

Senior Member
Joined
Oct 14, 2012
Messages
280
disable atime on all datasets and post your results ... if you are using NFS, for example ... every time ESXi accesses the VMDK file a filesystem "update" has to be written saying that "you accessed the file". This is really bad for performance, and so far i have not found a use for atime anyways
For VMware and virtual disk images I would agree. Atime adds no real value. I also question how much performance impact adding 3 bytes of writes (typical time stamp size) might have, but that's beside the point.

The only way I have used atime data myself is with the find -atime command on a file server, when I need to locate files based on when they were last accessed.

Some of my automated backup scripts use this, but it is rare.
 

mattlach

Senior Member
Joined
Oct 14, 2012
Messages
280
ESXi defaults to making every single write sync. There are only 2 ways to do ESXi writes: all sync or all with no sync. There is no in-between, which is the whole problem. If ESXi separated out the writes appropriately this problem would be less significant. Let's face it, in the real world not all writes are sync writes, so differentiating them via the appropriate system calls from the guest machine would be the savior. But the reality as I understand it is that most guest machines do not have that kind of system integration to allow VMware to do that.
Well, you could store a minimal boot image in your VMWare datastore, and have the guest itself mount storage directly to FreeNAS. I presume this would mean that FreeNAS storage access would inherit sync and async properly on the portion mounted from the Guest.

The boot image itself we would probably want to be all sync writes anyway.
 

ibmg

Neophyte
Joined
Feb 2, 2012
Messages
7
If you care about the integrity of your VM disks, sync writes - and guaranteeing that the data is actually written to stable storage - are both mandatory.

These are the four options you have:

  1. NFS by default will implement sync writes as requested by the ESXi client. By default, FreeNAS will properly store data using sync mode for an ESXi client. That's why it is slow. You can make it faster with a SSD SLOG device. How much faster is basically a function of how fast the SLOG device is.

  2. Some people suggest using "sync=disabled" on an NFS share to gain speed. This causes async writes of your VM data, and yes, it is lightning fast. However, in addition to turning off sync writes for the VM data, it also turns off sync writes for ZFS metadata. This may be hazardous to both your VM's and the integrity of your pool and ZFS filesystem.

  3. iSCSI by default does not implement sync writes. As such, it often appears to users to be much faster, and therefore a much better choice than NFS. However, your VM data is being written async, which is hazardous to your VM's. On the other hand, the ZFS filesystem and pool metadata are being written synchronously, which is a good thing. That means that this is probably the way to go if you refuse to buy a SSD SLOG device and are okay with some risk to your VM's.

  4. iSCSI can be made to implement sync writes. Set "sync=always" on the dataset. Write performance will be, of course, poor without a SLOG device.
Ok my question here.
I have a Server with
2 Hex Cores 2,4 GHz
48 GB Ram
2 240GB PCIe SSD's
12x 1TB SAS 7.2K (Mirror)

I plan to use this as storage for ESX (3 nodes)

Since I'm an NFS fan: if I understand you correctly, using a SLOG (the two PCIe SSDs) with NFS will be the safer way compared to iSCSI?
 

Ken Almond

Junior Member
Joined
May 11, 2014
Messages
19
From Above:
Some people suggest using "sync=disabled" on an NFS share to gain speed. This causes async writes of your VM data, and yes, it is lightning fast. However, in addition to turning off sync writes for the VM data, it also turns off sync writes for ZFS metadata. This may be hazardous to both your VM's and the integrity of your pool and ZFS filesystem.

Comment: Best explanation I've read so far about NFS, iSCSI / integrity - thank you!

Question about ".....hazardous to both your VM's and the integrity of your pool and ZFS filesystem....".
Does a scrub detect the integrity issue mentioned here? (or do these silently occur - even if scrub says all is OK).

Thank you.
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
13,526
No. A scrub is primarily looking at blockwise consistency of the data, walking through all the blocks in use on the pool and checking that the parity/redundant information is intact.

So, think about this for a minute. Go install an OS on a computer with a removable drive. Start up some operation that's got lots of disk I/O. Now, without any other preparation, rip out the removable drive, go get a cup of coffee, then come back and put the removable drive back in.

That's what your VM sees when your NAS goes away.

Does it recover? Who knows. Maybe. You probably can't guarantee that never happens. Filers crash.

But we can try to make sure that we don't make the problem worse. What happens if the VM has "written" a bunch of stuff to its disks, the filer crashes and wipes out that in-RAM data, and then reboots? Now the VM thinks those blocks are written on the datastore, but they never made it. So a well-behaved VM that simply waited for its backing storage to return gets hosed up because of disk inconsistencies. If, instead, the filer has made sure that each write that was "written" will eventually make it to the pool, then that isn't a problem. That's your ZIL at work.
 

wreedps

Member
Joined
Jul 22, 2015
Messages
225
>o something like 4x the theoretical maximum for most server RAM throughput.
Hopefully you can see this screenshot of FreeNAS performance - Interface Traffic at 900,000,000 bps / 8 = 112,500,000 Bps / 1024 = 109,863 KBps / 1024 = 107.28 MB/s
The dip down to 400 Mbps is when some of the 5 x simultaneous VMware migrations finished. The uptick is when I added more.
In addition to this network traffic on FreeNAS, the VMware side showed 100+ MB/s in terms of Write Bytes on the NFS datastore.

How do you see all those metrics on one page?
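
The unit conversion in the quoted figures is easy to get wrong, so here is a small sketch that redoes it. It assumes the graph value is in bits per second and uses 1024-based KB and MB, as the quoted post does; the function name bps_to_mbs is just for illustration:

```shell
#!/bin/sh
# Convert a link rate in bits per second to bytes/s, KB/s and MB/s
# (1024-based units, matching the quoted calculation).
bps_to_mbs() {
    awk -v bps="$1" 'BEGIN {
        bytes = bps / 8
        printf "%d B/s = %.0f KB/s = %.2f MB/s\n", bytes, bytes / 1024, bytes / 1048576
    }'
}

bps_to_mbs 900000000
# prints: 112500000 B/s = 109863 KB/s = 107.29 MB/s
```

Rounding gives 107.29 where the quoted post truncates to 107.28; either way it is close to what a 1GbE link can carry.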
 

Ken Almond

Junior Member
Joined
May 11, 2014
Messages
19
Updated approach to my VMware NFS use of FreeNAS 9.2.1.7.....
I finally understand more about ZIL (SLOG) etc.. and so I bought
1) Dell XPS 8700 - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, with 32GB RAM.
2) 2 x 500 GB Samsung EVO SSDs - Samsung 850 EVO 500GB 2.5-Inch SATA III Internal SSD (MZ-75E500B/AM)
3) 1 x 120 GB Kingston for ZIL - Kingston Digital 120GB SSDNow V300 SATA 3 2.5 Solid State Drive (SV300S37A/120G)

Steps:
1) Used the web GUI to configure the 2 x 500 GB SSDs as a mirror zpool with lz4 (initial default compression 6.58x).
2) Used the web GUI to configure the 1 x 120GB Kingston as ZIL (Pool), and then used the GUI to detach it (leaving formatting in place). *I understand the ZIL does not need nearly 120GB, but it was only $50 and I have no other use for that SSD.
3) Used the command line to attach the ZIL to the zpool.

# zpool status
  pool: aeraidz
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        aeraidz                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/cd5048df-b7ec-11e6-8d53-6805ca4185e3  ONLINE       0     0     0
            gptid/cd6475c8-b7ec-11e6-8d53-6805ca4185e3  ONLINE       0     0     0
        logs
          ada1p2                                        ONLINE       0     0     0

4) I then set up an NFS share and mounted it on my ESXi 5.5.0 hosts, with sync=standard (the default)
[root@aenas3 /]# zfs get sync
NAME                    PROPERTY  VALUE     SOURCE
aeraidz                 sync      standard  default
aeraidz/.system         sync      standard  default
aeraidz/.system/cores   sync      standard  default
aeraidz/.system/rrd     sync      standard  default
aeraidz/.system/samba4  sync      standard  default
aeraidz/.system/syslog  sync      standard  default


5) Then I did a migration of a 160GB (Provisioned, 72GB Used Storage) VM and WOO HOO... I have 90MB/s write.
5a) FreeNAS network: almost fully saturated 1GbE link
upload_2016-12-1_15-49-27.png


5b) ada1 = ZIL. ada2/ada3 = Mirror zpool
upload_2016-12-1_15-49-54.png



5c) Here's the VMware write performance in KBps of the migration
upload_2016-12-1_15-48-48.png



6) I did additional migrations and experimented with sync=disabled, and sync=always but it didn't materially change the write performance on other VMware Migrations.

Another data point - I have another FreeNAS box
- Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz with 8GB RAM.
- Single (striped zpool) Samsung 850 EVO 500GB SSD.
- NFS to VMware

With sync=disabled (very risky) I get 90MB/s BUT only 7MB/s with sync=always. 7MB/s is just too slow for practical VMware use.

CONCLUSION:
- It looks like the ZIL really works for the VMware / NFS case. Amazing, after a couple of years of fooling with this, that a simple ZIL addition seems to have made the difference between usable and unusable performance for VMware on NFS.
- I think I have a fully 'sane' FreeNAS setup with sync=standard and reasonable VMware NFS performance. And with compression, that 500GB easily extends to 1TB+.... of space for my VMs.

I'd be interested in comments - particularly if I have not actually achieved an OK/Safe (e.g. ZFS meta data preserved, sync=standard) solution for an adequate level of VMware NFS performance.
 