iSCSI tuning and science

Status
Not open for further replies.

bahnburnr

Cadet
Joined
Mar 5, 2012
Messages
4
I'm having what looks to be an iSCSI tuning problem. I just built out a SAN box, and with a little dd if=/dev/zero action I can consistently get between 250 and 350 MB/sec locally, depending on the bs and count values. For a 5-disk RAID-Z, that feels appropriate and I'm not at all disappointed in those numbers. When I try to hit it over iSCSI, though, I'm having trouble getting more than 70 MB/sec of throughput so far.
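For reference, this is roughly the kind of local test I've been running (pool and file names here are just examples, and compression is off on the test dataset, otherwise the zeroes would skew the result):

  dd if=/dev/zero of=/mnt/tank/ddtest bs=1m count=10000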

Here's what I got:

Core i3-21xx (Sandy Bridge, dual core, hyperthreading)
8GB RAM, can't recall if ECC or not
5x750GB WDs
4 onboard NICs, Intel something, maybe 82574L

At first I was playing with a hypervisor currently in production. I initiated from there and couldn't regularly exceed 60 MB/sec. I gave up doing my testing on production machines and switched to an unused hypervisor, bonded its two NICs, bonded two of the SAN's NICs (FEC, not LACP), and have a nice blank slate to work with. After the NIC bonding on both sides, I've gotten up from 60-ish to 70-75-ish. I've experimented with zvol device extents versus file extents. I've experimented with different MTUs on each side (currently both 9000). I've adjusted, sometimes wildly, the target values that seem relevant to throughput. Nothing has made any difference except bonding the two NICs on each side, and unfortunately my hypervisor is fresh out of NICs to bond.

One other curiosity: if I make four targets and benchmark them simultaneously, they come out to around 40-45 MB/sec each. Still not as much as I had hoped for, but it does seem to imply that the bonded NICs can move the traffic, the switch can move the traffic, et cetera. 40x4 far exceeds a single gigabit line, so I feel confident that it's not a network issue, at least not until I can get up to around 200 MB/sec. The switch is an oldish but still fairly decent 3Com. Aside from no LACP support (only FEC), it's never let me down.
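For what it's worth, the bonding on the FreeNAS side was set up through the GUI, but as far as I can tell it boils down to something like this under the hood (interface names and address are examples, not my real ones, and "loadbalance" is what the lagg driver calls FEC-style balancing):

  ifconfig lagg0 create
  ifconfig lagg0 laggproto loadbalance laggport em0 laggport em1
  ifconfig lagg0 192.168.10.10/24 mtu 9000 up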

All the evidence I've found suggests an iSCSI issue, not a ZFS issue and not a network issue. Anyone disagree?

I just rebooted it with some of the suggestions in the ZFS tuning guide, but I don't expect them to be of much help in these tests. Any thoughts or suggestions? Something obvious that I've totally overlooked?

For the record, I also tried OMV, OpenFiler, and Windows as targets, and none of them did better than 30 MB/sec. FreeNAS is off to the best start, and my experience with pfSense has been very positive as well, so I want to stay in the BSD camp on this one. I just don't know the intricate details of iSCSI yet.

Cheers, everyone; appreciate any advice!
 

sabreofsd

Dabbler
Joined
Feb 8, 2012
Messages
11
Some quick notes that made a large difference for me:
Setting the MTU to 9000 (so that everything was using it) actually decreased my iSCSI performance.
Make sure to match up your first burst, max burst, and receive data segment lengths on both sides (sketched below)
Set max sessions and max connections to 64
Increase Max pre-send R2T to 32 and MaxOutstandingR2T to 64
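If it helps to see where those land in the config, I believe the GUI fields end up as istgt.conf [Global] entries along these lines (the path and the pre-send-R2T mapping are from memory, so double-check on your box; values shown are just the ones that worked for me):

  # /usr/local/etc/istgt/istgt.conf
  [Global]
    MaxSessions 64
    MaxConnections 64
    FirstBurstLength 65536
    MaxBurstLength 262144
    MaxRecvDataSegmentLength 262144
    MaxR2T 32
    MaxOutstandingR2T 64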

Windows 7 has a built-in iSCSI initiator utility, so if you're using that, perhaps connect to the iSCSI target and do some performance testing from there.

Hope that helps :)
 

bahnburnr

Cadet
Joined
Mar 5, 2012
Messages
4
Okay, here's what I have so far. Set sessions and connections to 64. Set pre-send R2T to 32 (it already was, actually) and outstanding R2T to 64 (it was 16). I did get a modest improvement from those two, maybe 8% or so.

Haven't adjusted the MTU yet. I'm remote right now and not convinced I can change the MTU without cutting myself off. None of this is in production, so it can be down until tomorrow if need be, but wanted to test what I could first. Will do those now and add results if I don't break anything.
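My plan for not cutting myself off is to queue a revert before touching anything, roughly like this (interface name assumed, and assuming at(1) jobs actually run on the box; if the session survives, I just atrm the pending job):

  echo "ifconfig lagg0 mtu 1500" | at now + 5 minutes
  ifconfig lagg0 mtu 9000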

Edit: Successfully adjusted MTUs on both sides to 6000 and then 4000. No change, still high 60s.

I did wonder if I should have the first burst, max burst, and receive sizes the same on both sides, but I don't see anywhere in the Windows (Hyper-V Server 2008 R2 SP1) initiator to change those, though I do seem to recall seeing it in earlier versions; maybe I hallucinated it. I've been casually assuming that the target tells the initiator what to do in such a circumstance, but I really have no idea. I know this isn't a Windows forum, so I won't ask... <Brando>The horror!</Brando> I am doing my testing on the bare-metal hypervisor right now, not passing through to any VMs yet.
 

bahnburnr

Cadet
Joined
Mar 5, 2012
Messages
4
Okay, umm... Just to be sure, I tried a different disk-benchmarking program, and now I'm getting substantially higher numbers, in some situations close to what two bonded gigabit lines should max out at. Looks like I've been chasing a ghost. Oddly, writing is substantially faster than reading, and I'm not sure why, but I think that's a puzzle for another day. Do appreciate your tips, sabre.
 

sabreofsd

Dabbler
Joined
Feb 8, 2012
Messages
11
Happy I could help!

Matching up the first burst, max burst, and receive sizes on both sides should help you get even higher :)

If you can, it would be interesting to see if you had the same results with MTU sizes as I have: 1500 seems to be faster than 9000. It makes no sense to me logically, but it's what I've observed.
 

bahnburnr

Cadet
Joined
Mar 5, 2012
Messages
4
I fished those parameters out of the registry; there's no exposure of them in the Windows initiator GUI that I can find. They all lined up on both sides already; not sure if that's accidental or because the target told the initiator what to use. Just to be sure, I set every value on both sides to 262,144.

I'm now getting a fairly consistent 300 MB/sec write and 100 MB/sec read regardless of size. The write speed is implausible, since a 2 Gbit link tops out around 250 MB/sec before protocol overhead, so some of it must be landing in a cache somewhere, but I think I'm fine with that. Pretty happy overall with just some minor tweaking from the stock configuration. I'll add in the other two NICs on the SAN at some point, but it won't help any individual hypervisor, since all of mine currently have only two NICs. After that I think it's just about spindle speed.
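For anyone else hunting for those registry values: I believe they sit under the iSCSI initiator's instance key, and something along these lines turned them up for me (class GUID and value name are from memory, so verify before editing anything):

  reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}" /s /f MaxBurstLength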

I have played with the MTU values quite a bit and I can't see any statistically significant variance among 1500, 4000, 6000, and 9000. I suppose I'll leave it at 9000 on general principle, but I can't say I have a good reason for that. I'm not sure what the situation is on the BSD side, but on the Windows side, with the Broadcom drivers, I'm running some offloading: TCP offload, IPv4 large send offload (LSO), and checksum offload (CO). I don't yet know how to do the same in BSD, whether it's even available with those NICs and drivers, what's appropriate for this sort of workload, or whether it changes the relevance of the MTU values, so I'll leave the super-tweaking for another day after a lot more reading.
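From a first skim of the ifconfig man page, the BSD-side knobs appear to be per-interface options along these lines (interface name is an example, and whether the driver for these NICs actually honours them is something I haven't checked):

  ifconfig em0                      # the options= line lists RXCSUM, TXCSUM, TSO4, etc.
  ifconfig em0 -tso                 # turn TCP segmentation offload off
  ifconfig em0 rxcsum txcsum tso4   # turn checksum offload and TSO back on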

Last on my mind for now: I turned off Hyper-Threading on the SAN while I was trying to diagnose. Any strong opinions out there regarding HT in a primarily ZFS/iSCSI scenario?
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi guys,

Anybody know where to list or adjust some of these settings in ESXi 5.0?

-Will
 

sabreofsd

Dabbler
Joined
Feb 8, 2012
Messages
11
I use 5.0 at work, and you can find the first burst, max burst, and receive sizes under the advanced settings of the iSCSI adapter. The MTU needs to be adjusted on the vSwitches and the VMkernel interfaces in vCenter to make jumbo frames work.
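If you prefer the command line, I believe the ESXi 5.0 equivalents are roughly these (adapter, vSwitch, and vmk names are examples, and the parameter key names are as I remember them, so check esxcli's own listing first):

  esxcli iscsi adapter param set --adapter=vmhba33 --key=MaxBurstLength --value=262144
  esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
  esxcli network ip interface set --interface-name=vmk1 --mtu=9000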

Hope that helps :)
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi sabreofsd,

Ahh, there we go....

Thanks for the tip!

-Will
 