ESXi, ZFS performance with iSCSI and NFS


Stylaren

Cadet
Joined
Feb 21, 2014
Messages
9
Discussed in depth quite well here:
http://forums.freenas.org/index.php...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/

Here's a pull-quote from jgreco on this:



For server VMs, I use Option #4 (#1 also works). Data there is crucial and rolling back to a snapshot or losing a txg isn't an option. For VMs that I don't care about and accept the risk of having to roll back to snapshot/backup in case of serious failure, I use #3.

I've been using option #1 in an enterprise production environment for 16 months now without issue. The key is to use a good SLOG device if you are serving VMs from ESXi.


Thank you HoneyBadger and pbucher, this is exactly what I wanted to hear. Answers based on real experience make me more comfortable in my decision about what I should do next to increase my performance. I think I'll go with RAID 10 and a mirrored SLOG SSD. Thank you all so much for the help and tips!

// Anthon
 

Stylaren

Cadet
Joined
Feb 21, 2014
Messages
9
I've been using option #1 in an enterprise production environment for 16 months now without issue. The key is to use a good SLOG device if you are serving VMs from ESXi.


By the way pbucher, are you using a mirrored SLOG device?
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
By the way pbucher, are you using a mirrored SLOG device?

No, I do not mirror my SLOGs. I did some performance testing back in the day, and when FreeNAS moved up to ZFS v28 it was no longer mission critical to mirror the SLOG. Prior to that, if you lost the SLOG you lost the pool, or at least had some less than straightforward recovery to do. Performance-wise, unless you have the SLOGs on separate paths, you are potentially causing congestion when writing to the mirror. The worst case is having the SLOG die and FreeNAS panic, lose power, etc. in the few-second window between when it wrote the last data to the SLOG and when it flushed it to disk. For most operations that risk is acceptable. Think of it as the same risk as the battery backup on your RAID card going bad and having a power failure before you get it replaced (though I bet the odds are much higher on the RAID card, because most folks never notice the battery failing).

I'm currently doing a stripe with a ZeusRAM and a partition on an Intel S3700 SSD. In the end, SLOG device choices are driven largely by need and budget, and you should expect to be disappointed in the raw performance you get from synthetic benchmarks on an ESXi / ZFS setup. I say disappointed because the devices you see used for SLOGs have great specs on their data sheets, but once you lower the block size to what you actually get with ESXi and a SLOG, that blazing fast SSD suddenly really blows at moving small blocks of data. I haven't ever seen anyone post benchmarks, but I see that STEC now makes an SSD specifically tuned to be a ZFS SLOG device.
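
For anyone following along at home, the SLOG layouts being discussed look roughly like this at the zpool level (the pool name "tank" and the da* device names are just placeholders for your own setup):

# single SLOG device
zpool add tank log da1

# striped SLOG: two separate top-level log vdevs (the layout described above)
zpool add tank log da1 da2

# mirrored SLOG: one log vdev that survives a single device failure
zpool add tank log mirror da1 da2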
 

leenux_tux

Patron
Joined
Sep 3, 2011
Messages
238
One other thing that has not been mentioned which might help (putting the cat amongst the pigeons here, I know), and this is of course specific to ESXi, not to FreeNAS, is to use "local host caching" for the VMs (see http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2058983).

So, in a FreeNAS/ESXi setup, when you don't have a SLOG on SSD, where will you get your best "bang for your buck"? SSDs in the ESXi box as local cache, or SSDs with the actual storage on the FN system? Or do guests REALLY fly when you have both?

Oh the agony of choice :smile:
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
leenux_tux: First, anything less than mirrored SLOGs is dangerous. It's your choice to follow the advice, or to tell us you know better and not do mirrored like pbutcher does.

Second, I don't think anyone with any experience around here has weighed in on that, because it's not something that's going to be recommended if you choose to do FreeNAS in a VM. As for your own VMs that may use FreeNAS as storage, where FreeNAS is a separate server, that's a totally separate question and beyond the scope of this forum (we don't generally support ESXi questions), so you might want to ask a VMware forum that question.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
leenux_tux: First, anything less than mirrored SLOGs is dangerous. It's your choice to follow the advice, or to tell us you know better and not do mirrored like pbutcher does.

CJ: First, I don't recall anyone saying anything about running FreeNAS in a VM, and second, get out of the past, man; mirroring SLOGs is a waste of good money unless you are running something a lot more important than I think anyone on this forum is running. Trust me, my stuff is pretty important, and I've got the budget to do mirrored SLOGs if I thought for a second I needed them. So let's leave the name calling behind, and maybe let's try a few facts, and not quote old, outdated Sun documentation either.

leenux_tux: "local host caching" can not replace the need for an SLOG, local host caching is read cache not a write cache.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Let's not get rustled in here everyone.

With sync writes and a SLOG, you expose yourself to a tiny but real risk. Let's say your motherboard or something goes while a txg is being committed. So you've got, we'll say, 2GB of data that was in RAM (gone) and is on your SLOG (which is OK).

But what if your SLOG device didn't handle that failure gracefully? Maybe you picked a SLOG that's not a good one - it has no capacitors to flush its own internal writes - or maybe your board going boom took the SLOG out with it. Whoops, SLOG is corrupted. There goes that 2GB of data into the ether.

Now, sometimes that isn't an issue. Maybe that wasn't 2GB of critical information. Roll back to the last snapshot/txg/etc. and you're OK; maybe you just have to redownload that pirated episode of Game of Thrones. But maybe it corrupts the files it was part of. Or the VMDK it wrote to. Or it was the most important 2GB that would ever be written on your SAN. A mirrored SLOG helps to close that loophole.

@pbucher - you're right in that most people on this forum probably don't have anything that critical. But @cyberjock is also right in that anything less than mirrored SLOG is a risk. A small risk, but a real one.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Let's not get rustled in here everyone.

With sync writes and a SLOG, you expose yourself to a tiny but real risk. Let's say your motherboard or something goes while a txg is being committed. So you've got, we'll say, 2GB of data that was in RAM (gone) and is on your SLOG (which is OK).

But what if your SLOG device didn't handle that failure gracefully? Maybe you picked a SLOG that's not a good one - it has no capacitors to flush its own internal writes - or maybe your board going boom took the SLOG out with it. Whoops, SLOG is corrupted. There goes that 2GB of data into the ether.

@pbucher - you're right in that most people on this forum probably don't have anything that critical. But @cyberjock is also right in that anything less than mirrored SLOG is a risk. A small risk, but a real one.

Well said. Though if you choose a SLOG device without a capacitor, having two of them mirrored isn't going to help much.

I do agree having it mirrored closes a hole; it's just such a small, small hole, and if you have data that important then you need something more than a mirrored SLOG device (or at least I'd highly recommend it).

In my case I snapshot my most critical dataset every hour and replicate it to a completely different pool and physical box. My ZFS replication is redundant to a replication that I do at the app level, which replicates to two other instances on average every 15 minutes (the second instance being offsite). Once I get more time in, I intend to do only the ZFS replication and probably increase its frequency, but I want more time before I trust this data solely to FreeNAS and ZFS (it's been 16 months now, and I average a test restore from the ZFS replication monthly - not so much to verify that it works, but because it's how I refresh the data in my development version of the app, and in that sense I test my backups).
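
A bare-bones sketch of that kind of hourly snapshot-and-replicate job is below. The pool, dataset, and host names are made up, it assumes an initial full send has already been done, and in practice FreeNAS would handle this through its Periodic Snapshot Tasks and Replication Tasks rather than a hand-rolled script:

#!/bin/sh
# snapshot the critical dataset and push the increment to a second box
NOW=$(date +%Y%m%d-%H00)
PREV=$(zfs list -H -t snapshot -d 1 -o name -s creation tank/critical | tail -1 | cut -d@ -f2)
zfs snapshot tank/critical@${NOW}
zfs send -i tank/critical@${PREV} tank/critical@${NOW} | \
    ssh backup-host zfs receive -F backup/critical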
 

leenux_tux

Patron
Joined
Sep 3, 2011
Messages
238
WOW !! Luckily for me I had my "flame" retardant underwear on when I re-opened this thread !!!

pbucher: Thanks for the information regarding the SLOG, succinct and to the point, no vitriol at all.

cyberjock: Please read my post again. I was not stating that FreeNAS should be run in a VM; we all know that running FreeNAS in a VM should be reserved for testing and education purposes only, and anyone who does otherwise is asking for trouble. Also, the title of the thread is "ESXi, ZFS performance with iSCSI and NFS". The advice given was intended to assist with this, hence the reason I provided a URL to VMware's web site for more information and reading. It's like saying someone can't mention anything about alternate backup solutions (robocopy, for example) because they are "not directly related to FreeNAS so should not be discussed on this forum".

You are getting quite abrasive with some of your responses to posts, which is a shame, as you are obviously very knowledgeable about FreeNAS. Any "noobs" joining this forum might think twice about asking a question for fear of getting berated. I thought the whole point of this forum was to assist, not hinder.

OK, I must now remove the flame retardant underwear as it is starting to chafe
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I didn't think you were running FreeNAS in a VM....

I'd have expected that the fact that I said....

As for your own VMs that may use FreeNAS as storage where FreeNAS is a separate server, that's a totally separate question and beyond the scope of this forum(We don't support ESXi questions generally) so you might want to ask a VMWare forum that question.


That should have been blatantly obvious... but whatever.

Good luck!

We've got like 50 of these threads complaining about ESXi performance on ZFS, so another one will just make it even harder to find good info.
 

datnus

Contributor
Joined
Jan 25, 2013
Messages
102
For ESXi,
- For iSCSI, you're required to use sync=always, as ESXi always sends its iSCSI writes async. Otherwise, data may be lost.
- For NFS, ESXi always sends its NFS writes as sync.

So the performance of:
- iSCSI (sync=always) is equivalent to NFS (sync=always or sync=standard)
- iSCSI (sync=standard) is equivalent to NFS (sync=disabled)
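
For reference, sync is a per-dataset/per-zvol property; setting it looks something like this (the pool and dataset names here are placeholders):

# force all writes to an iSCSI zvol to be treated as synchronous
zfs set sync=always tank/esxi-zvol

# check what an NFS dataset is currently using
zfs get sync tank/esxi-nfs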
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
For ESXi,
- For iSCSI, you're required to use sync=always, as ESXi always sends its iSCSI writes async. Otherwise, data may be lost.
- For NFS, ESXi always sends its NFS writes as sync.

So the performance of:
- iSCSI (sync=always) is equivalent to NFS (sync=always or sync=standard)
- iSCSI (sync=standard) is equivalent to NFS (sync=disabled)

That pretty much covers it.

Save yourself the grief and just use NFS. Though if you have ESXi 5.5, don't install Update 1; it's got some sort of nasty NFS bug. I haven't been hit by it, but it exists and they haven't fixed it yet; the current solution is to fall back to the original 5.5 release.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I disagree... (not surprising considering the quality of posts pbucher has had though). That's a different argument though.

I've found iSCSI to be loads superior to NFS...

Performance tests I've done comparing the two on identical setups: iSCSI won.
Multipath: iSCSI only. This is a very big deal for some players and is *the* deciding factor.
iSCSI provides user security settings; NFS doesn't. For some companies, that policy makes NFS a non-starter.

NFS does have one thing going for it, though: it's super easy to grab a VM and move it to another machine. With iSCSI you have to do the move from a machine that already has the iSCSI datastore mounted.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
I disagree... (not surprising considering the quality of posts pbucher has had though). That's a different argument though.

I've found iSCSI to be loads superior to NFS...

Performance tests I've done comparing the two on identical setups: iSCSI won.

Apparently CJ just can't agree to simply disagree....

Well, I've gone one further and tested both iSCSI and NFS on the same box and pool, and NFS easily beat iSCSI in my tests. That said, I think we all miss the fact that what works best in our own environments and workloads is not always the best setup for other people's environments and workloads. Next up, multipath: I do agree that it is an advantage for iSCSI for folks in the 1Gb network world, but when using 10GbE it's not that useful to do multipath unless you have a serious need for redundancy and the money to throw at it (10GbE enterprise switches are not cheap). Since I advocate virtualizing one's SAN on the same box as the hosted VMs, a 10GbE ultra-low-latency connection is very doable even for a home server.

This is as good a thread as any to write this:

The lack of respect for different views on this forum, and the complete "do it my way or don't bother to post" attitude, has brought me to the point that I'm ready to move on to greener pastures. I can't say the rapid-fire release of .1 updates over the last few months has helped the cause either. I've never mentioned this before, but I'm an IT architect with over two decades of experience (in multi-million dollar organizations that process millions of transactions a month), and none of my systems have ever lost any data over those two decades due to design issues, I'm proud to say (I can't help it when users delete stuff by accident or someone drops tables from the production database, which is why backups/snapshots are good). I'm not just some guy who reads a few blogs and hacked together a house server for fun one weekend, though I have done those things too. I really do appreciate the folks who put in tons of time to help out new users here and try to keep them on the right track, but this forum has become a rather poisonous place in the last year and a big turn-off to folks looking to use FreeBSD & ZFS. In fact, if I had read this forum in its current state when I was first looking at FreeNAS, I would never have bothered to even download and try it.

Now I've got some ZFS SAN solutions to evaluate for work (ones that provide paid support for setups I'd be flamed into the ground for even suggesting on this forum; in fact, one solution even heavily recommends a setup that would put CJ over the edge on an average day). Best of luck to all who continue to hang on and travel down this road, but I feel it's time for me to travel a new one. BTW: anyone interested in hearing about a 100%+ performance increase for serving ESXi VMs off of a ZFS pool in a commercial environment, feel free to PM me (and no, I'm not playing with tuning parameters or hacks either).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've done well over 500% on some designs.. big whoop on 100% bro. I'm sufficiently knowledgeable to know you really don't know what you are doing. I'm also definitely not the only one here that has that opinion.

Good luck in your travels.
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
I've done well over 500% on some designs.. big whoop on 100% bro. I'm sufficiently knowledgeable to know you really don't know what you are doing. I'm also definitely not the only one here that has that opinion.

Good luck in your travels.

I'm going to enjoy my travels. But you missed reading between the lines: I'm not making any design changes; I'm looking at an OS change and comparing identical pool designs across different NAS appliances.

For fun, I'll post a brief overview of my pool design. It's a Supermicro 45-slot JBOD unit with dual-channel SAS expanders set up for multipath I/O, hooked to an LSI PCIe 3.0 SAS HBA by two 4-channel SAS cables. The pool itself is made up of 24 Seagate Constellation SAS drives in a RAID 10 configuration, though I find I generally get better throughput configuring the pool as 3 RAIDZ2 arrays (which goes against conventional wisdom). For my SLOG device I stripe across a STEC ZeusRAM and an Intel S3500 SSD (the SSD was added recently to boost performance - I'd use a pair of S3500s in the future, but they didn't exist 2 years ago; some day a Fusion-io or other such device will get dropped in here, but I do have a budget), both of which are on separate controllers from the pool so as to avoid any kind of bus contention.

I then use a dedicated 10GbE SAN network with jumbo frames to feed my ESXi boxes and for replication between my SANs (which is why I don't care about the security of the NFS server vs. iSCSI; nothing but the SANs and the ESXi hosts have access to this network). Finally, the VM that powers FreeNAS has 32GB of RAM and 8 CPUs allocated to it. Just to show I care about protecting this stuff, it's got redundant power supplies, with each one plugged into a different 3000 VA UPS (that way a UPS failure won't take me down). Now, if you have some magic to increase the throughput of this by 500%, please speak up; I'm always willing to learn.
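
For the curious, the two pool layouts being compared there look roughly like this at creation time. Device names are placeholders, and the commands are shown with fewer disks than the real 24-drive pool just to keep them short:

# striped mirrors ("RAID 10" style): one mirror vdev per pair of disks
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    mirror da6 da7

# the RAIDZ2 alternative: several RAIDZ2 vdevs striped in one pool
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11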
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Hey PB, a striped SLOG doesn't seem to make sense, as it's a single-threaded operation. This means it will wait until the write is done to the first device before writing to the second.

While I'm not poo-pooing your idea, I'm curious if you have any numbers from before and after which show the performance benefits of a striped SLOG? Also, ZeusRAM specs are way above Intel's, so why mix the two?
 

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Hey PB, a striped SLOG doesn't seem to make sense, as it's a single-threaded operation. This means it will wait until the write is done to the first device before writing to the second.

While I'm not poo-pooing your idea, I'm curious if you have any numbers from before and after which show the performance benefits of a striped SLOG? Also, ZeusRAM specs are way above Intel's, so why mix the two?

Hey, some great questions there. I agree the ZeusRAM should blow the Intel away, but it doesn't in actual usage. For the small block sizes ZFS writes, the IOPS of the Zeus aren't that great :(

Below are some less than real-world tests, but I think they are a good indication of which design will work better under real workloads. The numbers come from a VM running on the same ESXi box as FN, using a vmxnet3 interface and 9k jumbo frames. FN is 9.2.1.5. The VM runs CentOS 6.4 and simply has a 40GB unformatted second HD stored on an NFS datastore exported by FN. Compression, dedup, and atime are off on the dataset. Compression is desirable most of the time, but since I'm writing all zeros it completely skews my benchmark. The command I'm using is "dd if=/dev/zero of=/dev/sdb bs=4k count=2000000", which writes about 8.2GB in 4k chunks. These numbers are for clean, newly created pools (I've parked my data elsewhere while I do some benchmarking and evaluations).
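
For anyone wanting to reproduce the setup, the dataset prep amounts to something like the following; the pool/dataset name is a placeholder, and the sync rows further down come from toggling the last property between runs:

# turn off the features that would skew an all-zeros dd test
zfs set compression=off tank/esxi-nfs
zfs set dedup=off tank/esxi-nfs
zfs set atime=off tank/esxi-nfs

# varied per run: standard / always / disabled
zfs set sync=standard tank/esxi-nfs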

RAIDZ2 config (4 arrays of 6 disks):
Intel S3500 Only: 130MB/s
STEC Only: 133 MB/s
Both: 133MB/s
STEC Only when running FN 9.2.x version on the bare metal connected to another ESXi box over a SAN only 10GbE network: 92.4 MB/s

Mirror config (12 arrays of 2 disks):
Intel S3500 Only: 132MB/s
STEC Only: 149 MB/s
Both: 145MB/s
Both with STEC in JBOD unit: 136MB/s
STEC Only in JBOD unit: 132MB/s
sync=disabled(for fun only-RAM cache is fun): 1.1GB/s
no SLOG(not fun at all): 7.4MB/s
local(async): 451MB/s
local(forced sync):20MB/s
local(async /w bs=128k): 941MB/s

Big thanks to aufalien for pointing out that the ZIL is single threaded. I missed that in my original testing because the SAN was under load and I had the STEC in the JBOD unit at the time (so the STEC-only numbers were much, much lower than above). I was getting better numbers with the stripe simply because the Intel had better numbers, and since I failed to test the Intel standalone, I mistakenly concluded that the stripe improved my numbers. Now that I've got the STEC on a different LSI card than the main array, with its own dedicated SAS channel, the Intel is actually dragging me down.

Final note (ready for flames): I need to confess that the Intel isn't dedicated to the FN array. It's my boot drive for ESXi, and I created a 10GB thick, eager-zeroed VMDK on it for my FN VM. As you can see from the above numbers, considering the cost of an S3500 vs. a ZeusRAM, the econo setup is pretty damn good. And no, I didn't lose the pool when I briefly converted to bare metal for testing; I just force-imported the pool and dropped the dead drive from the ZIL stripe, because I wasn't smart enough to drop it before I converted to bare metal. Noob note: don't do this if you haven't had a clean shutdown of the pool.
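
Roughly, the recovery described above comes down to something like this; the pool name and log device name are placeholders, and again, only attempt it after a clean shutdown:

# import the pool even though a log device is missing
zpool import -f -m tank

# drop the dead device from the log stripe (name or GUID from zpool status)
zpool remove tank da1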
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358

Regarding the Intel, what size S3500 did you use?

pbucher

Contributor
Joined
Oct 15, 2012
Messages
180
Regarding the Intel, what size S3500 did you use?

It is the 120GB model, which doesn't have the best performance in the series to start with, but I was buying it as a boot disk, not for a SLOG, at the time. For some lower-end boxes I've put together using the virtual SAN/ESXi combo, I've been using Seagate 600 Pro SSDs, which, if you get the under-provisioned model, beat the S3500 on specs and cost about the same.

Here's the same test I used above, run against my house server, which is using the econo SLOG method with a standard-provisioned Seagate 600 Pro (the under-provisioned model wasn't available when I bought the setup). This is also from a pool that's seen a good amount of usage, is under load, and is over half full. Disks are once again Seagate Constellation SAS drives, this time with no expander. The LSI controller in this box is the older PCIe 2.0 model. FN is 9.2.1.5 again.

Mirror config (3 arrays of 2 disks):
Seagate 600 Pro 120GB model in econo config: 92.7 MB/s

It's hard to get numbers on this box because it's under load, so the above was an actual number I got using the count parameter at 100000 and waiting for the load to drop down to a few hundred KB of write activity.

-Paul
 