ESXi NFS Datastore - What are Best Practices?

Status
Not open for further replies.

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Hello.

I had to post this because, after reading over two hours of random information online (mostly referencing Nexenta, Oracle, etc.), I was not able to come to a conclusion about the recommended configuration for using FreeNAS as an NFS datastore.

So as we know, FreeNAS defaults to an 8K ZVOL block size (when using iSCSI), but for datasets it defaults to a 128K ZFS recordsize.

1. Lots of Nexenta best-practice guides say to use an 8K or 16K recordsize for an NFS datastore used with ESXi 5.0+, because beginning with ESXi 5.0 they changed from 128K to 8K reads/writes.

2. People in MS SQL forums say to format SQL VMs' NTFS volumes with a 64K allocation unit and use a matching 64K ZFS recordsize.

3. Some random people online are saying that using a 128K recordsize is hard on your SSD SLOG and that performance will suck.


I'm not trying to make a "SPECIALIZED" NFS datastore for "X APPLICATION", but I would still like SQL Server performance not to suck, because most of the VMs will run MS SQL in some form. I would also not enjoy the SSD SLOG suffering (if that's even true) because of a large recordsize.

1. What is the general recommendation from the FreeNAS forum members for an NFS recordsize that will work with a mixed VM environment?

2. After setting the recommended one-size-fits-all "recordsize", is it even worth trying to adjust the NTFS block size in different VMs (like 64K for SQL, maybe 8K-32K for Exchange, etc.)?

3. Any other best practices with ESXi and NFS on FreeNAS?

Any comments / explanations are appreciated!
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
FreeNAS 9.3 starts using a 16K block size for ZVOLs by default, and even bigger sizes when working on top of wide RAIDZ pools.

Generally, block size is a tradeoff: increasing it reduces data fragmentation and CPU overhead on linear I/O, but causes more read-modify-write cycles for short/misaligned I/O. Really large block sizes like 128K may also create interface bottlenecks for SSDs -- if the initiator is doing 4K random I/O, the SSD has to do a 128K I/O for every one of those 4K requests.
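
If you want to choose the block size explicitly rather than take the default, it has to be set when the zvol is created; a rough sketch from the shell (the pool/zvol names here are just placeholders):

# Create a 100G zvol with an explicit 16K block size; volblocksize cannot be
# changed after the zvol is created:
zfs create -V 100G -o volblocksize=16K tank/vm-zvol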

As for file systems -- first make sure that partitions are aligned to the ZVOL block size. If a partition is misaligned, then any FS block size may cause read-modify-write. After that, if possible, setting the file system block size equal to the ZVOL block size should eliminate all read-modify-write cycles, giving the best performance. But sometimes that may not be acceptable for the initiator, depending on the specifics of the stored data and workload.
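
For example, if the zvol uses a 16K block size, matching it from the Windows guest side might look like this (drive letter, names, and sizes are just examples, not from this thread):

# On FreeNAS, check the zvol block size you need to match (placeholder name):
zfs get volblocksize tank/vm-zvol
# Inside the Windows guest, confirm partition alignment and format the data
# volume with a matching 16K allocation unit, for example:
#   wmic partition get Name,StartingOffset
#   format D: /FS:NTFS /A:16K /Q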
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The short answer is that the block size doesn't affect performance in a way that you will ever care about. There are so many other ways you'll probably screw yourself (such as not enough RAM, no SLOG if you need one, no L2ARC, etc.).

The long answer is that you're already in error on a few things, not the least of which is datasets: they do not default to 128KB records. That is the maximum allowed record size; the value does NOT specify the actual record size.

Things kind of go downhill from there, and while you could probably gain a few percentage points (we're talking 2-3%) from trying to optimize your setup, you'd be talking dozens and dozens of hours to get that 2-3%. It's really not worth your effort.

Also, NFS is a *terrible* choice for a datastore. If performance is your concern, NFS shouldn't even be considered. NFS alone *significantly* increases your hardware requirements.

Lastly (and you didn't provide hardware specs), I'd be willing to bet dollars to donuts that whatever hardware you are currently thinking about purchasing (if you don't already own it) is underspecced by a long shot. Everyone underspecs their box when running VMs, then screams bloody murder when they find out it will take 96GB of RAM, a fat L2ARC, etc.

Quite literally, if you buy TrueNAS we don't even recommend NFS at all anymore for datastores. It's iSCSI all the way. Right now, since 9.3 is out and the CTL iSCSI code works best with zvols, we recommend iSCSI zvols and that is it. If a customer wants to do NFS we will do it, but we'll warn them of the performance implications. Trying to change from NFS to iSCSI later is often impossible for many people and we don't want to be blamed for not doing what is best for our customers. ;)
 
Last edited:
  • Like
Reactions: rev

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
I can second cyberjock that each protocol has its own niche. For block storage it is better to use iSCSI. Doing block storage over NFS is possible, but it is more of a fallback for storage systems that cannot do iSCSI well. FreeNAS 9.3 can do iSCSI great.

NFS is good for file-level access -- if you need to store a million small 4K files, then NFS will write each one as such. Having knowledge about separate files is exactly where ZFS can benefit a lot from its variable block size feature, doing many things much more efficiently. But if you create one huge file and start "randomly" overwriting it over NFS, ZFS will have no idea what it is and will access all of it in fixed record-size chunks.

NFS indeed has some benefits in some situations. For example, if you delete your VM on an NFS datastore, space on the pool is released automatically, but iSCSI in FreeNAS 9.3 got UNMAP support to handle that. On an NFS datastore you may manually copy your VM image without transferring it over the network, but iSCSI in FreeNAS 9.3 got XCOPY support to handle that.
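
For the UNMAP side, on ESXi 5.5 and later the space reclamation still has to be triggered from the host; a rough sketch from the ESXi shell (the datastore label here is just a placeholder):

# List VMFS datastores and their labels:
esxcli storage vmfs extent list
# Reclaim blocks freed on the datastore back to the zvol (run off-peak):
esxcli storage vmfs unmap -l iscsi-datastore1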
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
@mav
I do not want to hijack the OT, but regarding:
But if you create one huge file and start "randomly" overwriting it over NFS, ZFS will have no idea
With a single iSCSI datastore on vSphere, ZFS will also have the same issue, aside from the block-storage advantages, doesn't it?
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Interesting comment about the SLOG SSD ...

"XCOPY support to handle that" so how would I take advantage of that from within ESXi for "moving VMs" ?

There is a very good reason I'm doing NFS, and it's mostly that I'm running the storage on the same server versus getting two servers. Why? Cost reasons (not many users will use the machine, but I don't want horrid SQL performance either).

Why couldn't I just run ESXi on a RAID controller with cache? Because I need the snapshot and replication features of ZFS.

Why use NFS instead of iSCSI? ESXi does a really crappy job starting up when it cannot reach an iSCSI datastore: it freezes, takes a while, and in the end the iSCSI datastore is not even available for the VMs to start from after the FreeNAS VM boots. I tried to remedy this before by adding scripts to ESXi itself for refreshing iSCSI, and it works "mediocre" at best. NFS is the easiest and most reliable way I found to run FreeNAS and other VMs on the same server while serving up VMs from FreeNAS storage.

Now, if 128K is the "maximum" and not the "set" size, that means VMware will issue its 8K or whatever sub-block reads/writes and I don't really need to care about setting a specific recordsize, I guess? Then my only worry becomes the SSD SLOG, but... why would the SLOG do a 128K write for a 4K request if the 128K recordsize is the "MAX" and not the "used" size?

Last but not least, with all this "iSCSI is the way to go" idea: as I understand it, iSCSI has some kind of "cache", and unless you set the datastore to sync=always you will use the cache and might lose writes from a VM. So is the general recommendation you give customers with iSCSI to enable sync=always? NFS does that by default, so isn't it the "safest" way?
 
Last edited:

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
"XCOPY support to handle that" so how would I take advantage of that from within ESXi for "moving VMs" ?

Using XCOPY you can vMotion and clone VMs within the same storage host without network traffic. Though in your case the network is virtual, so the difference is only in CPU overhead.

Now, if 128K is the "maximum" and not the "set" size, that means VMware will issue its 8K or whatever sub-block reads/writes and I don't really need to care about setting a specific recordsize, I guess? Then my only worry becomes the SSD SLOG, but... why would the SLOG do a 128K write for a 4K request if the 128K recordsize is the "MAX" and not the "used" size?

ZFS uses sub-maximal block sizes only when storing small files. If a file is bigger than 128K, all of its blocks have a size equal to the dataset record size at the time the file was created. That is where NFS would benefit if it stored separate files, but not when doing block I/O. I am not completely sure about the SLOG, whether ZFS writes only the modified data there or whole modified blocks. But this problem does exist for the L2ARC -- ZFS writes full blocks there and reads them back as such, which may become a bottleneck.
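
So if you do end up tuning recordsize on the dataset backing an NFS datastore, keep in mind it only affects files written after the change; for example (the dataset name is just a placeholder):

# Lower the maximum record size used for newly written files:
zfs set recordsize=16K tank/nfs-vmstore
# Check the current value:
zfs get recordsize tank/nfs-vmstore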

Last but not least, with all this "iSCSI is the way to go" idea: as I understand it, iSCSI has some kind of "cache", and unless you set the datastore to sync=always you will use the cache and might lose writes from a VM. So is the general recommendation you give customers with iSCSI to enable sync=always? NFS does that by default, so isn't it the "safest" way?

iSCSI has no separate cache, only the usual ZFS ARC. The difference from NFS is only in the policy used by the initiator (VMware). While modern NFS supports both sync and async writes, VMware sends all NFS writes as sync. iSCSI also supports both sync and async writes, but for some reason in that case VMware does not implicitly insist on syncing, though it hopefully passes sync requests through from the VM.

If your hardware is able to handle all writes as sync, and losing the last few seconds of writes in a crash would be critical for your data, then you can always set the sync=always policy, regardless of whether it is an NFS or iSCSI share.
 

deasmi

Dabbler
Joined
Mar 21, 2013
Messages
14
Interesting comment about the SLOG SSD ...
Why use NFS instead of iSCSI? ESXi does a really crappy job starting up when it cannot reach an iSCSI datastore: it freezes, takes a while, and in the end the iSCSI datastore is not even available for the VMs to start from after the FreeNAS VM boots. I tried to remedy this before by adding scripts to ESXi itself for refreshing iSCSI, and it works "mediocre" at best. NFS is the easiest and most reliable way I found to run FreeNAS and other VMs on the same server while serving up VMs from FreeNAS storage.

I was running an all-in-one setup of ESXi and FreeNAS on NFS for a long time, but recently switched to iSCSI and have no regrets.

Yes, it does take a minute or so longer to boot, but I find performance is much improved. I am not doing anything special with scripts, just a 500-second delay in the auto-startup list for the FreeNAS instance. That gives it plenty of time to come up and stabilise before the first of the VMs hosted on it starts. But you'd need to do something very similar with NFS anyway.

If you need to reboot the FreeNAS instance, ESXi copes fine and recovers within seconds of FreeNAS coming back. In my experience NFS took ages to recover, or didn't recover at all and required a reboot.

NFS is nicer to manage, I agree, and if you have NetApps I'd still use it, but with FreeNAS I think iSCSI has the clear edge.
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Hi All,

We might see a turn of events for NFS performance on ESXi 6.x and FreeNAS 9.3, as the new ESXi supports NFS 4.1 / pNFS. Getting iSCSI to work with multiple initiators takes time on the VMware, network, and FreeNAS sides.

Thanks,
Joe
 

trekuhl

Dabbler
Joined
Jun 3, 2013
Messages
10
As far as I know, FreeNAS 9.x currently only supports NFS up to v4.

NFS 4.1 should be in the FreeNAS 10.x+ release, so it may be a while before we can take full advantage. Once ESXi 6 hits GA in the next month or so, I don't think it will do a lot for FreeNAS NFS at this time.

Also, ESXi 6 supports multipathing, but it's not using pNFS according to the stuff I've read thus far: http://wahlnetwork.com/2015/02/02/nfs-v4-1/
 
Last edited:

adamjs83

Dabbler
Joined
Sep 3, 2015
Messages
40
If your hardware is able to handle all writes as sync, and losing the last few seconds of writes in a crash would be critical for your data, then you can always set the sync=always policy, regardless of whether it is an NFS or iSCSI share.

Where can you find the sync=always setting in the GUI?
 

adamjs83

Dabbler
Joined
Sep 3, 2015
Messages
40
Do you know of any resources covering how to set this on the command line? I have seen lots of references to the setting but no instructions.
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
Those commands are unified across all existing ZFS implementations. You may read the FreeBSD zfs man page (just run `man zfs` on the command line), or any other source documenting ZFS.

In short, just run `zfs set sync=always <dataset_name>`.
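
And to check it later or go back to the default (the dataset name here is just an example):

# Show the current sync policy for the dataset:
zfs get sync tank/vmstore
# Return to the default behaviour:
zfs set sync=standard tank/vmstore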
 

adamjs83

Dabbler
Joined
Sep 3, 2015
Messages
40
Thanks so much.
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
By the way... I just upgraded to ESXi 6 and the latest FreeNAS on my "all-in-one" FreeNAS / ESXi server, and there is a 75% chance on every boot that the NFS datastore never comes online. Not sure what the hell is going on, but it's annoying; restarting the NFS service in FreeNAS after it's booted makes the datastore available again. I even put a manual entry with the client IP and hostname into the hosts database in FreeNAS for reverse DNS, since some people commented that NFS needs reverse DNS working...
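
In case it helps anyone else, the state of the mount can also be checked from the ESXi shell, and removing/re-adding the datastore is another workaround (the names and IP below are placeholders, not my actual setup):

# Show NFS mounts and whether ESXi considers them accessible:
esxcli storage nfs list
# If the mount is stuck as unavailable, remove and re-add it (only when
# nothing on it is registered or running):
esxcli storage nfs remove -v nfs-vmstore
esxcli storage nfs add -H 192.168.1.10 -s /mnt/tank/vmstore -v nfs-vmstore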

Maybe I should be switching to iSCSI now... but I would really hate to lose the ability to extract VMDK files directly from the datastore via an NFS client on my PC :(
 

reqlez

Explorer
Joined
Mar 15, 2014
Messages
84
Great... converted to iSCSI... left the server rebooting overnight, and the iSCSI datastore never refreshed, even after hours. The only way is to "rescan" volumes manually after boot.
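
For reference, the manual rescan can also be done from the ESXi shell instead of clicking through the client; roughly:

# Rescan all storage adapters for iSCSI devices that showed up after boot:
esxcli storage core adapter rescan --all
# Then refresh the VMFS volumes on the discovered devices:
vmkfstools -V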
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I can definitely say that I don't know anyone else with ESXi 6.0 who is having the post-reboot issues you are having. So I'm guessing it's something unique to your setup. No clue what it would be, though. :(
 

wreedps

Patron
Joined
Jul 22, 2015
Messages
225
I am using iSCSI and ESXi 6 and it works just fine. Super fast too, and I am not even running recommended hardware on this test box.
 