VMware iSCSI tuning under 9.3 and 9.10

Status: Not open for further replies.

Bigtexun

Dabbler
Joined
Jul 17, 2014
Messages
33
So I found an old thread with some great answers from Jgreco talking specifically about tuning ZFS for VMware. I don't want to post to a really old thread, and a search did not find a more current thread. If there is one, please help an old grey-haired man find it...;)

In that thread from 2012, there was discussion of VMware losing its connection to the iSCSI filesystem due to ZFS flushes causing the iSCSI initiator to drop. This was under FreeNAS 8.0 and 8.2 back in 2012. My question is basically to revive that question for ESXi 6.0 and FreeNAS 9.3 and 9.10. I have seen one post from someone stating that 9.10 is causing this same problem since the update... VMware works for a while, then drops the filesystem.

The reason I am asking is that I am in the process of migrating my VMware systems off of a failing, really old SAN to a FreeNAS iSCSI "SAN". I moved one server over, and I really like the performance and the flexibility it is giving me. It was a big "win". But based on the other user's complaint, it seems like I should spend some time getting the system properly tuned... or at least researching the issue so that I don't create a problem for myself later.

My VMware systems do not have a huge amount of disk I/O. They are mostly utility systems such as DNS, DHCP, and other mostly idle applications. So I'm not creating a performance bottleneck, and I don't care if there is some contention for resources. I've got multiple 10G switch blades in large chassis layer 3 switches, and the 10G ports are mostly idle, with perhaps 2G of total network bandwidth across all of the switches combined. Most of the network is used for router QA, where the routers typically run at speeds below 100Mb. It is a fast, mostly idle network, with fast but mostly idle VMware servers.

So I was trying to research the tunables that Jgreco cited in 2012. One of them I found, and it already seems to default to the recommended value; the other I can't find at all.

Jgreco said "By far the most important caveat is that your ZFS *must* be tuned to be responsive, which almost certainly involves setting vfs.zfs.txg.timeout to 5, and setting write_limit_override to a value substantially lower than you might expect, to limit the amount of stuff ZFS has to flush out to disk. When ZFS is flushing, it may be unresponsive, and that can cause an iSCSI initiator to drop. "
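For reference, here is roughly where those knobs live on 9.3/9.10 as far as I can tell. The names below are my best guess at the current equivalents, so treat them as a starting point rather than gospel:

# vfs.zfs.txg.timeout is still there, and already defaults to 5:
sysctl vfs.zfs.txg.timeout

# write_limit_override seems to be gone since the write throttle was
# reworked; the closest current knobs appear to be the dirty data limits:
sysctl vfs.zfs.dirty_data_max
sysctl vfs.zfs.dirty_data_max_percent

# To make a change persistent, add it under System -> Tunables in the
# GUI (type "Sysctl", or "Loader" for boot-time tunables) rather than
# editing /etc/sysctl.conf by hand, so it survives upgrades.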

My main FreeNAS system has 64 gig of RAM and 30T of disk in a half-filled Supermicro server, with a RaidZ3 running on 11 drives. I have tried to do this one "right". This is not my first "big stack of drives" under FreeNAS, and I've done it in the distant past under FreeBSD... so I'm pretty comfortable with ZFS in general. However, this IS my first FreeNAS iSCSI SAN. I have no budget for going out and buying a new SAN of a size similar to what I can afford on FreeNAS. So once again I want to do it right, and I humbly submit a request for guidance. I want to do this before I cause a problem.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
I suspect the first bit of advice jgreco (or others) will give is to not use RaidZ3, but a stripe of mirrors. In your case I suppose five 2-disk mirrors, if you only have 11 disks. Simply for the increase in IOPS, as 5 vdevs would vastly outperform 1 vdev. (In general, of course, it depends on the workload. I mention it here because you stated you wanted to do it "right".)

Beyond that, I actually went with vfs.zfs.txg.timeout=1 to smooth out the writes on my system (five 2-drive mirrors, curiously enough, but only 16GB of RAM).
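If it helps to picture the layout, it would look something like the sketch below. The device names (da0 through da10) and the pool name "tank" are just placeholders, and on FreeNAS you would build this in the Volume Manager rather than at the CLI so the GUI stays in sync with the pool:

# Same disks, but five 2-way mirror vdevs instead of one 11-wide RaidZ3,
# so roughly five times the random IOPS of a single vdev:
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7 \
  mirror da8 da9
# The 11th disk could be kept as a hot spare:
zpool add tank spare da10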
 

Bigtexun

Dabbler
Joined
Jul 17, 2014
Messages
33
Yeah, setting vfs.zfs.txg.timeout=1 was something I was thinking about. I found an interesting blog from 2014 that suggested massive IOPS improvements were possible with an L2ARC on an SSD. My machine has room for a couple of internal SSDs, so I could consider that. The tuning article I found is here.
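As I understand it, the SSD would just get attached to the pool as a cache device; something like the following, with "tank" and ada1 standing in for whatever the pool and SSD end up being (in the GUI it would be Storage -> Volume Manager, adding the SSD as Cache):

# Attach the SSD as L2ARC:
zpool add tank cache ada1

# Then check whether it actually gets hits before celebrating; L2ARC
# only helps if the working set doesn't already fit in the 64GB ARC:
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses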

But to be fair, I was comparing my stock FreeNAS RaidZ3 performance to local Raid10 on the servers using SAS drives, and FreeNAS is very comparable to local Raid10 performance... Writes were slightly faster on FreeNAS, and reads were slightly faster on local disk. So benchmarking aside, the current performance is great... Of course I only tested sequential reads and writes, hardly a full benchmark, and no attention to iops at all. No real penalty for using iSCSI compared to local disk in sequential i/o. If the L2ARC advice is still valid, that looks like a potential big win for a small amount of investment.

And while this is all great strictly from the tuning angle, my real concern is simply stability. My apps are infrastructure support services that don't require high performance. Benchmarking those services so far has shown excellent performance compared to the previous generation of bare-metal servers. So while it may well be worth improving performance further, the law of diminishing returns is already threatening to make all of this an academic waste of time. For our performance-sensitive loads, where we simulate 3 million routers being issued firmware updates, configuration changes, and other simulated big-data activities, we spend money on "real" SANs, because that is what our customer is using. So before we try to make this as fast as possible, all I was really trying to do is avoid the instability or availability issues I heard reported by others. I don't want my DNS or DHCP servers to ever fall off the network just because I forgot to tune something.

So this is more of a reaction to an out-of-date statement made by an experienced user, and a legitimate concern brought up by a current user, both converging on a serious stability issue that may or may not really exist. I will look into this further and post the action items I settle on, along with the results.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
If what you want is stability, I recommend just using the default settings.

Hell, I want performance and I still use the default configs.
 

Bigtexun

Dabbler
Joined
Jul 17, 2014
Messages
33
diehard,
I'm sort of coming to the same conclusion. I think the tuning warning I read from 2012 was before the FreeNAS developers spent time tweaking iSCSI performance.

I've been running benchmarks from a VM on my first iSCSI-mounted server, and I'm seeing good numbers: IOPS running around 12,000, sequential writes averaging 715MB/sec, and sequential reads averaging 299MB/sec. The file-create numbers are all off the scale, so bonnie++ can't display them; I'll have to work out how to fix that before I can see all of the numbers. This is all without any tuning. I think the writes are artificially high because of the write cache, but a single bonnie++ run should exceed what I typically see in the real world from all machines combined, so I don't know that I care about that.
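For the record, the off-the-scale file-create columns seem to be bonnie++ finishing that phase too quickly to time it. From what I've read, the usual fix is to raise -n, and to keep -s at roughly twice the VM's RAM so the guest cache doesn't pad the numbers. Something like this, with the path and sizes just being what I'd start with on my setup:

# 16g data size (twice an 8GB VM's RAM), 512*1024 small files for the
# create/stat/delete phases; -u root is only needed when running as root:
bonnie++ -d /mnt/test -s 16g -n 512 -u root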

I'm a little confused about the IOPS; they look high to me. I was getting lower numbers at first, running in the 1,000 to 8,000 range, but when I researched the "correct" way to use bonnie++ to measure IOPS, it shot up to the 11,889 to 12,188 range. But even the lowest numbers blow away physical disks running hardware-based Raid10.

I think my next step is going to be to just run bonnie++ in a loop on two VMs per host, across several VMware hosts, really try to hit it hard, and see if anything blows up. Of course I expect the results of that to look a lot worse than these numbers, but I need that data as well to draw any conclusions about stability under load.
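The "hit it hard" part doesn't need to be fancy; I'm thinking of just leaving something like this running inside each test VM while I watch the FreeNAS reporting graphs and the ESXi event log for iSCSI drops:

# Keep bonnie++ running until interrupted:
while true; do
    bonnie++ -d /mnt/test -s 16g -n 512 -u root
done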

The FreeNAS box I built is very beefy; the only places I skimped were that I didn't get the more expensive SAS expander that would have doubled the width of the disk I/O path, and I didn't get drives designed to be fast... and of course I haven't added an SSD cache or done any tuning.

I ran bonnie++ on another VM that is on the same VMware server, but with local Raid10. The last time I did this I thought I got comparable numbers, but I wasn't using a proper tool. As I recall, the numbers were not too far apart from each other, but when I used bonnie++ to do the test, the physical disks got poor performance compared to the iSCSI mount from FreeNAS. That was rather unexpected, but bonnie++ measures more data points, so there is more detail to compare... Latency of the local disks was a bit better for many measurements, but not all, and where the FreeNAS write cache was involved it was much faster than spinning disk. Overall, FreeNAS running RaidZ3 across 11 drives is a LOT faster in general. That supports the idea that I probably don't need to do much tuning...

So yeah, I don't have an unused commercial SAN I can compare to, but FreeNAS is doing pretty well. I'm sure if I did everything I could, I could improve performance even more, but I don't seem to really need that.
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
Bigtexun said:
I'm a little confused about the IOPS; they look high to me. I was getting lower numbers at first, running in the 1,000 to 8,000 range, but when I researched the "correct" way to use bonnie++ to measure IOPS, it shot up to the 11,889 to 12,188 range. But even the lowest numbers blow away physical disks running hardware-based Raid10.

Are you using SSDs? If not, then this must be wrong. You are probably reading from ARC or something.
I'd guess that realistically your pool could reach about 200 IOPS with fast 7.2K nearline drives. From your 11-disk storage array with RaidZ you sure get lots of bandwidth, but only the IOPS of a single drive. As iSCSI use cases typically need lots of IOPS, RaidZ is not recommended unless it is feasible to create multiple narrow vdevs (e.g. 4 x 6 RaidZ2 or some such).
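Rough back-of-the-envelope numbers, using that ~200 IOPS per spindle figure (these are ballpark assumptions, not measurements):

1 x 11-wide RaidZ3 = 1 vdev  -> ~200 random write IOPS
5 x 2-way mirrors  = 5 vdevs -> ~1,000 random write IOPS
(random reads from mirrors can do better still, since either half of each mirror can serve them)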
 

Bigtexun

Dabbler
Joined
Jul 17, 2014
Messages
33
This is measured IOPS from a VM... so yes, of course it is using the ARC; I have 64 gig of RAM in the FreeNAS server. Obviously I am taking advantage of that.

But yes, I keep hearing it is not recommended. I read that over and over. But I read and hear LOTS of things that are not true in every real-world condition. Obviously I'm not an expert; my real experience is in Internet engineering, networking, and embedded engineering related to those things. But in this case, the server is not under-built. I was allowed to spend money, but I was required to keep each purchase below $1500, so no single part could cost more than $1385 plus tax. So it was built one part at a time, using good parts.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Two things will cause most benchmarks to be fairly useless when compared to a real-world VM scenario:

Data existing in ARC.
The pool being fresh, with no fragmentation.

Basically, use striped mirrors and leave plenty of free space to combat these. Just trust me on this one so I don't have to type a bunch more.
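If you want a quick way to keep an eye on both, zpool list shows how full and how fragmented the pool is ("tank" is a placeholder name; the FRAG column only appears on pools with the newer feature flags), and any benchmark data set should be sized well past the 64GB of RAM so you aren't just measuring ARC:

zpool list tank
# NAME   SIZE  ALLOC  FREE  ...  FRAG  CAP  DEDUP  HEALTH  ...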
 