SOLVED iSCSI MultiPathing slower than single path for writes


nasnuc

Cadet
Joined
Jul 31, 2014
Messages
9
The issue I'm having is with Round Robin on ESXi

One Zpool, 36 4TB disks, striped.

Server:
FreeNAS-9.2.1.6-RELEASE-x64 (ddd1e39)
SuperMicro SSG-6047R-E1R36L
SuperMicro X9DRD-7LN4F-JBOD
16 x DR316L-HL01-ER18 16GB 1866MHZ
Additional AOC-S2308L-L8E Controller
36 x Seagate ST4000NM023 SAS Disks
PCI Intel X540-T2 Dual 10Gb/s Network (iSCSI for Veeam)
PCI Intel I350-T4 Quad 1Gb/s Network (iSCSI for VMware)
Integrated Intel I350-T4 Quad 1Gb/s Network (iSCSI for VMware)

24 disks are connected to one controller (front), and the other 12 are on the second controller (rear)

We have 6 ESXi 5.1 hosts that we are planning to connect over dual 1Gb/s iSCSI using MPIO/Round Robin for performance.
This will be for archival data: old VMs that need to be moved off the production SAN for long-term storage. These 6 hosts already have software iSCSI set up to an EqualLogic array.
Currently designed with two 1Gb/s pNICs on one vSwitch, bound 1-to-1 to two vmkernel ports. Each vmkernel has an IP on the same subnet. (Dell EqualLogic BRP)
[Screenshot: existing vSwitch with the two 1Gb/s pNICs bound 1-to-1 to two vmkernel ports]

I've added two more vmkernel ports, each on a new subnet (10.99.2.101 and 10.99.1.101), and bound them to the iSCSI initiator.
The FreeNAS system is configured with two integrated and two PCI I350-T4 ports set up for iSCSI, in one portal with the 10.99.0.1, 10.99.1.1, 10.99.2.1 and 10.99.3.1 IPs.

Code:
igb0    iSCSI_0    Active    10.99.0.1/24 \
                                             --> phySwitch0 -> ESXi vmnic0
igb6    iSCSI_2    Active    10.99.2.1/24 /

igb1    iSCSI_1    Active    10.99.1.1/24 \
                                             --> phySwitch1 -> ESXi vmnic4
igb7    iSCSI_3    Active    10.99.3.1/24 /


ESXi Host1;
vmnic0 -> igb6 10.99.2.101
vmnic4 -> igb1 10.99.1.101
ESXi Host2;
vmnic0 -> igb0 10.99.0.102
vmnic4 -> igb7 10.99.3.102
ESXi Host3;
vmnic0 -> igb6 10.99.2.103
vmnic4 -> igb1 10.99.1.103
Etc..
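
For reference, the vmkernel-to-initiator binding and dynamic discovery behind this layout can be done with standard esxcli commands. A rough sketch (the adapter name vmhba38 and the vmk numbers are placeholders for illustration, not taken from the hosts above):

Code:
# Bind the two iSCSI vmkernel ports to the software initiator
# (check `esxcli iscsi adapter list` and `esxcfg-vmknic -l` for the real names)
esxcli iscsi networkportal add -A vmhba38 -n vmk2
esxcli iscsi networkportal add -A vmhba38 -n vmk3

# Point dynamic discovery at one of the FreeNAS portal IPs, verify, and rescan
esxcli iscsi adapter discovery sendtarget add -A vmhba38 -a 10.99.2.1
esxcli iscsi networkportal list -A vmhba38
esxcli storage core adapter rescan -A vmhba38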

The first three tests (starting around 13:15 and ending at 13:50) were run with Round Robin set; the last two (13:55 and 14:05) were with Fixed path set, each of the latter using a different vmkernel port.
[Graphs: FreeNAS interface throughput during the Round Robin and Fixed path tests]


I used SQLIO from a VM hosted on the ZFS datastore to test, with the following parameters (-kW/-kR = write/read, -s50 = 50-second runs, -f = sequential or random pattern, -o8 = 8 outstanding I/Os, -b = block size in KB):
sqlio -kW -s50 -fsequential -o8 -b512
sqlio -kW -s50 -fsequential -o8 -b256
sqlio -kW -s50 -fsequential -o8 -b128
sqlio -kW -s50 -frandom -o8 -b128
sqlio -kW -s50 -frandom -o8 -b256
sqlio -kW -s50 -frandom -o8 -b512
sqlio -kR -s50 -fsequential -o8 -b128
sqlio -kR -s50 -fsequential -o8 -b256
sqlio -kR -s50 -fsequential -o8 -b512
sqlio -kR -s50 -frandom -o8 -b512
sqlio -kR -s50 -frandom -o8 -b256
sqlio -kR -s50 -frandom -o8 -b128


So each path can sustain the full 128MB/s using Fixed path, but not when Round Robin is implemented.

I've played with:
zfs set atime=off stripe
zfs set sync=standard stripe
Enabling/disabling Delayed Ack
Setting the burst lengths equal on ESXi and FreeNAS

But none of it changed performance when using Round Robin.
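
One quick sanity check is whether Round Robin traffic is actually being spread across both FreeNAS interfaces during a test. A minimal sketch, assuming the igb interface names from the layout above:

Code:
# Per-second traffic counters for a single interface on the FreeNAS box
netstat -w 1 -I igb0

# Or watch all interfaces at once
systat -ifstat 1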

I've tested that the array can take the stress locally.
Max ARC size (~240GB):
/mnt/stripe# sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 240018381619

/mnt/stripe# iozone -a -s 500g -r 4096
    Auto Mode
    File size set to 524288000 KB
    Record Size 4096 KB
    Command line used: iozone -a -s 500g -r 4096
    Output is in Kbytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 Kbytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                             random   random     bkwd   record   stride
           KB  reclen    write  rewrite     read   reread      read    write     read  rewrite     read   fwrite frewrite    fread  freread
    524288000    4096  2268697   931022  2083094  2405500    115281  1958555   522214  5676487   885139  2097630   931524  1626351  2006373

In other words (500G file): write ≈ 2.16GB/s, rewrite ≈ 909MB/s, read ≈ 1.99GB/s, reread ≈ 2.29GB/s, random read ≈ 112.5MB/s, random write ≈ 1.87GB/s.


Before I rip out FreeNAS for another NAS solution, is there any quick thing to check/enable/etc?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Very well done, sir. You've covered all the bases. Pretty impressed for a "first post". Normally I'd be able to find at least one thing you failed on, but you have done very well for yourself. I wouldn't have striped 36 disks, as a failure of any disk means game over for the pool. Your MTTF is probably rated in days though. My guess is you did that just for testing, to rule out the hard drives being the bottleneck.

Anyway, I don't see anything wrong with what you tried to do. Have you checked your networking hardware to see if it is properly setup? This screams of improper network setup IMO (most likely in your switches).

I will say that you should be able to saturate 10Gb with that kind of metal. I've seen far lower-specced systems do 10Gb with ease.

Can you post the debug file for your server?
 

nasnuc

Cadet
Joined
Jul 31, 2014
Messages
9
You're correct, the stripe is strictly for eliminating the disk subsystem as a problem. The final design is one pool of 7 x RAIDZ1 vdevs of 5 disks each (2^2+1), with one hot spare in addition to the 4 cold spares we have (a sketch of that layout is below).
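
For reference, a minimal sketch of that final layout from the command line (FreeNAS would normally build this through the volume manager GUI; the pool name and da0-da35 device names are placeholders):

Code:
# One pool: 7 x RAIDZ1 vdevs of 5 disks each (35 disks) plus 1 hot spare
zpool create tank \
  raidz1 da0  da1  da2  da3  da4  \
  raidz1 da5  da6  da7  da8  da9  \
  raidz1 da10 da11 da12 da13 da14 \
  raidz1 da15 da16 da17 da18 da19 \
  raidz1 da20 da21 da22 da23 da24 \
  raidz1 da25 da26 da27 da28 da29 \
  raidz1 da30 da31 da32 da33 da34 \
  spare  da35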

Matrix of possible disk designs:
[Image: matrix of possible disk layout options]


Unfortunately, I've looked over the network setup too. Each single link from the ESXi hosts can hit 128MB/s, and it's only when Round Robin is implemented that the performance drops.
The same ESXi hosts are able to saturate both 1Gb/s iSCSI links to our EqualLogic system as well.

Debug coming
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Is there a problem with ESXi's implementation of round robin? I could be talking out my ass... just sat down with coffee... but I don't think you *want* to do round robin simultaneously with multipath. Each path is supposed to be independent and "in the clear" with relation to the other paths, no?
 

nasnuc

Cadet
Joined
Jul 31, 2014
Messages
9
Looks like there is some tuning of the VMware Round Robin Path Selection Policy that should be done:

Code:
esxcli storage nmp psp roundrobin deviceconfig set -d t10.FreeBSD_iSCSI_Disk______a0369f3d2b7c000_________________ -I 1 -t iops

esxcli storage nmp device list
t10.FreeBSD_iSCSI_Disk______a0369f3d2b7c000_________________
   Device Display Name: FreeBSD iSCSI Disk (t10.FreeBSD_iSCSI_Disk______a0369f3d2b7c000_________________)
   Storage Array Type: VMW_SATP_DEFAULT_AA
   Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba38:C0:T15:L0, vmhba38:C1:T15:L0
   Is Local SAS Device: false
   Is Boot USB Device: false



So the default RR policy only switches to the next path after 1,000 I/Os have been issued on the current one. The above command sets this to 1 I/O. I'll play around and find out what's ideal for my situation.
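
If a host sees several LUNs, the same change can be applied to every FreeBSD iSCSI device in one go. A rough sketch (the grep pattern is an assumption based on the device naming in the listing above):

Code:
# Apply the iops=1 Round Robin policy to every FreeBSD iSCSI device on this host
for dev in $(esxcli storage nmp device list | grep -o 't10\.FreeBSD_iSCSI_Disk[^ )]*' | sort -u); do
    esxcli storage nmp psp roundrobin deviceconfig set -d "$dev" -t iops -I 1
done

# Optionally make Round Robin the default PSP for this array type so new LUNs pick it up
esxcli storage nmp satp set --satp=VMW_SATP_DEFAULT_AA --default-psp=VMW_PSP_RR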

[Graphs: throughput after setting the Round Robin I/O operation limit to 1]


Some reference: VMware's "Managing Third-Party Storage Arrays" documentation.

I'm now going to watch what happens on the zfs side of things.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Please report back what seems to work best for future readers. I'm interested in seeing where this goes myself. ;)
 

nasnuc

Cadet
Joined
Jul 31, 2014
Messages
9
Ok, so reporting back.
I didn't like how the write speeds were not saturating the entire available bandwidth of the 4 x 1Gb/s cards, so I started researching and playing with the istgt settings.

All of these tests used two ESXi hosts, each with dual 1Gb/s links connecting to 4 interfaces on the FreeNAS system (as opposed to the one host from my first post, connecting to two interfaces on FreeNAS).

These are a sample of the tests I did, with the best outcomes:
[Graphs: throughput for the best-performing test configurations]


The biggest improvement came when I set FreeNAS's First burst length equal to the ESXi software iSCSI initiator's. Likewise with the Max receive data segment length, and enabling autotune (test 001e). Each change was implemented separately.
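
To find the values to match, the ESXi initiator's parameters can be read per adapter; a hedged sketch (vmhba38, the key name, and the example value are assumptions; check the `param get` output on your build, and note some parameters may only be adjustable from the vSphere Client):

Code:
# List the software iSCSI adapter's parameters (FirstBurstLength, MaxBurstLength,
# MaxRecvDataSegLen, DelayedAck, ...) and their current values
esxcli iscsi adapter param get -A vmhba38

# Alternatively, change the initiator side instead of the FreeNAS side
esxcli iscsi adapter param set --adapter=vmhba38 --key=FirstBurstLength --value=65536

The FreeNAS-side equivalents are under Services -> iSCSI -> Target Global Configuration.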

001a (11:20 - 11:45)
[Graph: throughput during test 001a]

001h (11:35 - 12:00)
[Graph: throughput during test 001h]


You may notice a slight decrease in read speeds; this was due to my colleague starting a VM clone on the same host I was using for testing... grrr

I then realized that I had created multiple dynamic targets on my ESXi hosts (one per IP on the FreeNAS unit) and undid that, keeping the 4 static targets (001h). This was done to ease management on the ESXi side if I ever need to change the iSCSI settings.

Also, I did not play with the Round Robin I/O setting further after setting it to 1 I/O before switching paths.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The one thing I find very odd about this whole thing is that I've never seen someone set up multipath from ESXi to FreeNAS (or TrueNAS) and actually need to change the settings. Even here at home I have a much crappier setup than you, and I can saturate both iSCSI links simultaneously without changing any settings.

I really have to wonder what setting change, network hardware, or whatever is making your setup so different from the rest of the world that "just works".
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Great post. I didn't know the vSphere software initiator and the default target settings in FreeNAS used different values.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
nasnuc said:
Ok, so reporting back. I didn't like how the write speeds were not saturating the entire available bandwidth of the 4 x 1Gb/s cards, so I started researching and playing with the istgt settings.
[...]
Also, I did not play with the Round Robin I/O setting after I got it set to 1 I/O before switching paths.


Nice work.. Thanks for sharing.

If you don't mind me asking, what does the underlying network infrastructure look like? Everything on a single 1G switch, a la EQL's best practices? (cough)
 

newlink

Dabbler
Joined
Oct 24, 2014
Messages
11
I experienced the same problem with MPIO and round robin enabled. It seems that FreeNAS is unable to max out the gigabit paths; maybe they are limited to about 95MB/s (~800Mbit/s) for some reason.
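
One way to narrow that down is to measure raw per-path TCP throughput separately from iSCSI, for example with iperf between the FreeNAS box and a test VM. A minimal sketch (the 10.99.2.1 address comes from the layout earlier in the thread; it assumes iperf is present on the FreeNAS side and the test VM sits on a port group using the same physical uplink/subnet):

Code:
# On FreeNAS: listen on one of the iSCSI interfaces
iperf -s -B 10.99.2.1

# From the test VM: 30-second run with two parallel streams
iperf -c 10.99.2.1 -t 30 -P 2

If this hits wire speed but iSCSI does not, the limit is more likely in the target/initiator settings than in the network.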
 