Poking the bear.... iSCSI vs NFS?

kspare

Guru
Joined
Feb 19, 2015
Messages
508
Reading the release notes, it shows there are quite a few improvements for both iSCSI and NFS.

I'm just wondering if anyone has spent some time looking into this to see if maybe there is a clear winner (using "winner" loosely....)

We use FreeNAS just for VMware storage via NFS, and I'm willing to do some testing for the community if anyone has suggestions.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Based on my reading, the VAAI extensions only exist for block (iSCSI) and not NAS (NFS) - the NAS ones were dropped in 11.1-U6.

Having UNMAP support for space reclamation is pretty important, especially for a copy-on-write system like ZFS.

This won't be something you can spot in a head-to-head benchmark on a clean system; it's one of those things that rears its ugly head after months or years of use, when you're puzzling over why your VMs are slow and/or why there's a bunch of space still used on your ZFS pool even though you've deleted all the in-guest data.

The multipathing through MPIO is also quite a bit more mature than pNFS support; although the latter has gotten better as well.

If we really want to throw gas on the fire here, let's bring Fibre Channel into the fight. ;)
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
We use FreeNAS just for VMware storage via NFS, and I'm willing to do some testing for the community if anyone has suggestions.

Love the idea. @HoneyBadger can you suggest real-world tests that would show MPIO and UNMAP advantages in action?
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
In my case, MPIO wouldn't even matter. I'm running 10Gb on all my hosts, and my FreeNAS boxes run 40Gb.

I do run NFS v3 right now and no iSCSI. I was hoping someone had been playing! lol
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
Love the idea. @HoneyBadger can you suggest real-world tests that would show MPIO and UNMAP advantages in action?

Definitely willing to do some tests as well as we put the meta drive through its paces.
 
Joined
Dec 29, 2014
Messages
1,135
In my case, MPIO wouldn't even matter. I'm running 10Gb on all my hosts, and my FreeNAS boxes run 40Gb.
Mostly the config here. My two primary ESXi hosts do have 40G, and the backup ESXi host has 10G. I am also on NFS v3. I would certainly be happy to participate in some testing. I can get short sustained 28G reads from FreeNAS during vMotion operations, but I seem to be capped at about 5G max write. I'd love to see that higher, but it does meet my needs. I can vMotion my 4 production VMs in 10-12 minutes, so that makes maintenance easy for me.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Love the idea. @HoneyBadger can you suggest real-world tests that would show MPIO and UNMAP advantages in action?

Definitely willing to do some tests as well as we put the meta drive through its paces.

MPIO is pretty easy to see - fire off some storage benchmarks with a single link, add another one through MPIO, change the path policy (cripple it with VMW_PSP_MRU, then switch to VMW_PSP_RR and watch it fly), and re-benchmark. Sequential/bulk copies will basically scale directly with link count.
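A minimal sketch of flipping the policy per device, with "naa.xxxx" as a placeholder for one of your actual iSCSI device IDs:

# list devices and their current path selection policy
esxcli storage nmp device list

# cripple a device with Most Recently Used (single active path)...
esxcli storage nmp device set --device=naa.xxxx --psp=VMW_PSP_MRU

# ...then switch it to Round Robin and re-run the benchmark
esxcli storage nmp device set --device=naa.xxxx --psp=VMW_PSP_RR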

For UNMAP you'll want to make sure you're using sparse ZVOLs, VMFS6 datastores, and thin-provisioned guests with TRIM/UNMAP support on top. Create a VM on the datastore, fill it with /dev/random or some big ISOs, and check your space usage. Then delete the in-guest data, wait a bit, and watch the used space drop as VMware does passive reclamation over time (at a rate of 25-50MB/s) - and ZFS likes free space for performance, so the more the better.
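A quick way to watch the reclamation actually happen - the datastore label "iscsi-ds01" and the zvol name tank/vmware-zvol below are just placeholders for your own:

# on the ESXi host: confirm automatic space reclamation is enabled on the VMFS6 datastore
esxcli storage vmfs reclaim config get --volume-label=iscsi-ds01

# inside a Linux guest: force a TRIM instead of waiting for the periodic timer
fstrim -av

# on FreeNAS: watch the zvol's referenced space shrink after the in-guest delete
zfs get used,referenced,volsize tank/vmware-zvol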
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In my case, MPIO wouldn't even matter. I'm running 10Gb on all my hosts, and my FreeNAS boxes run 40Gb.

So you're running 10Gbps single port? And MPIO doesn't matter?

It will be funny to see your face when your switch reboots and all your storage drops offline.

Redundancy FTW.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
So you're running 10Gbps single port? And MPIO doesn't matter?

It will be funny to see your face when your switch reboots and all your storage drops offline.

Redundancy FTW.

I have a dual-port 10Gb NIC using lagg in failover to dual 40Gb switches, and ESXi with failover... we actually run all NFS storage on one switch and all other traffic on the other to split it up, but if a switch failed you wouldn't even notice.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I have a dual-port 10Gb NIC using lagg in failover to dual 40Gb switches, and ESXi with failover... we actually run all NFS storage on one switch and all other traffic on the other to split it up, but if a switch failed you wouldn't even notice.

So if you're going to do iSCSI, ditch the LAG, go MPIO. MPIO really does matter as it is the iSCSI equivalent of SAS redundancy.

All the hypervisors here have quad uplinks, two to each vSwitch.

On each vSwitch, one vmnic is "primary" to one physical switch and the other is "failover" to the other -- except that iSCSI traffic is on the "failover" port, so it is not just sitting there pointlessly burning watts.

The first vSwitch primary/failover goes to switch0/switch1, while the second vSwitch primary/failover goes to switch1/switch0.

If a "primary" vmnic fails, VM traffic for that vSwitch fails over to the other switch.

If a "failover" vmnic fails, the iSCSI traffic moves to the other "failover" vmnic on the other vSwitch.

This is a very useful design because no single link outage or no single switch failure impacts operations.
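Roughly, in esxcli terms - the portgroup names ("VM Network", "iSCSI-A") and vmnic numbering here are just placeholders for whatever your environment uses:

# vSwitch0: VM traffic active on vmnic0 (to switch0), standby on vmnic1 (to switch1)
esxcli network vswitch standard portgroup policy failover set --portgroup-name="VM Network" --active-uplinks=vmnic0 --standby-uplinks=vmnic1

# same vSwitch0: the iSCSI portgroup pinned to the "failover" uplink so it isn't sitting idle
# (iSCSI failover itself happens via multipathing to the other vSwitch, not within this one)
esxcli network vswitch standard portgroup policy failover set --portgroup-name="iSCSI-A" --active-uplinks=vmnic1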
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
@jgreco wouldn't that config potentially cause mixed LAN/SAN traffic in a failover situation? Unless I'm misunderstanding here, the failure of "primary" vmnic0 in vSwitch0 (going to pSwitch0) would cause VMware to start shipping LAN traffic on vmnic1 in vSwitch0 (going to pSwitch1) - which is still funnelling iSCSI packets around (assuming the failure is limited to vmnic0 or the cable/port on pSwitch0, otherwise everyone fails over to pSwitch1.)

I might be missing something. I'd rig it up:

vSwitch0 - vmnic0 to pSwitch0, vmnic1 to pSwitch1 - LAN traffic here
vSwitch1 - vmnic2 to pSwitch0 - iSCSI 1/2
vSwitch2 - vmnic3 to pSwitch1 - iSCSI 2/2

Gives you some potentially "idle links" or "wasted bandwidth"; but unless you plan to go whole-hog into VMware NIOC, I like keeping LAN and SAN split. It also lets you delegate your permissions appropriately in a siloed work environment: "you're on Networking - great, leave these switches the hell alone, that's the Storage team's problem."
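Something like this, if you're building it by hand (vSwitch/portgroup names are made up):

# dedicated iSCSI vSwitch #1 on pSwitch0
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --uplink-name=vmnic2 --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-1 --vswitch-name=vSwitch1

# dedicated iSCSI vSwitch #2 on pSwitch1
esxcli network vswitch standard add --vswitch-name=vSwitch2
esxcli network vswitch standard uplink add --uplink-name=vmnic3 --vswitch-name=vSwitch2
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-2 --vswitch-name=vSwitch2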
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
@jgreco wouldn't that config potentially cause mixed LAN/SAN traffic in a failover situation? Unless I'm misunderstanding here, the failure of "primary" vmnic0 in vSwitch0 (going to pSwitch0) would cause VMware to start shipping LAN traffic on vmnic1 in vSwitch0 (going to pSwitch1) - which is still funnelling iSCSI packets around (assuming the failure is limited to vmnic0 or the cable/port on pSwitch0, otherwise everyone fails over to pSwitch1

Yeah....but it's a failover.....
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco wouldn't that config potentially cause mixed LAN/SAN traffic in a failover situation? Unless I'm misunderstanding here, the failure of "primary" vmnic0 in vSwitch0 (going to pSwitch0) would cause VMware to start shipping LAN traffic on vmnic1 in vSwitch0 (going to pSwitch1) - which is still funnelling iSCSI packets around (assuming the failure is limited to vmnic0 or the cable/port on pSwitch0, otherwise everyone fails over to pSwitch1.)

I might be missing something. I'd rig it up:

vSwitch0 - vmnic0 to pSwitch0, vmnic1 to pSwitch1 - LAN traffic here
vSwitch1 - vmnic2 to pSwitch0 - iSCSI 1/2
vSwitch2 - vmnic3 to pSwitch1 - iSCSI 2/2

Gives you some potentially "idle links" or "wasted bandwidth"; but unless you plan to go whole-hog into VMware NIOC, I like keeping LAN and SAN split. It also lets you delegate your permissions appropriately in a siloed work environment: "you're on Networking - great, leave these switches the hell alone, that's the Storage team's problem."

Yes, you're missing something: you're missing half of your VM LAN connectivity when you're grooming fiber, updating switch firmware, or when a transceiver fails on vmnic0/1.

I am perfectly fine with iSCSI and VM traffic being on the same port when something's amiss in the network. I run production networks and they're expected to avoid failing unnecessarily.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I don't know that I'd be comfortable without Network I/O Control or some method of traffic shaping, but I see the appeal. Every time I see LAN and SAN getting mixed without it, the two tend to get in slap-fights over bandwidth and neither one ends up happy. I'll have to play around with it and see how it behaves under a few failure scenarios and workloads.

We're digressing though. Speed differences and CPU utilization between software iSCSI and NFS are mostly academic. Any suggestions for other ways we can shake out the finer differences in an empirical/observable manner?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't know that I'd be comfortable without Network I/O Control or some method of traffic shaping, but I see the appeal. Every time I see LAN and SAN getting mixed without it, the two tend to get in slap-fights over bandwidth and neither one ends up happy. I'll have to play around with it and see how it behaves under a few failure scenarios and workloads.

If you're running that much steady state traffic, you're already oversubscribed for burst traffic and should have upgraded to 25 or 40G...
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
Something to keep in mind with VMware: by default it sets RR MPIO to switch paths after 1000 IOPS. I tried setting it to 1 IOPS per some forums/blogs/etc. and found that to go a bit overboard - in some cases it's a bit too much RRing and performance drops. The sweet spot for me seems to be 100 IOPS.



Below are a few lines to run on the ESXi host to get this set, or tweak to your desire...


== RR Default and Tuning ==

# make Round Robin the default PSP for ALUA-claimed devices
esxcli storage nmp satp set -s VMW_SATP_ALUA -P VMW_PSP_RR

# reboot the host so existing devices pick up the new default

# collect the naa IDs / current config
esxcli storage nmp device list

# set the RR path-switch limit to 100 IOPS on every matching device
# (naa.xxxx matches the first few characters of your iSCSI naa IDs)
for i in `esxcfg-scsidevs -c | awk '{print $1}' | grep naa.xxxx`; do esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=100 --device=$i; done
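
To confirm the limit took, the same loop with "get" instead of "set" will show the per-device config (same naa.xxxx placeholder as above):

for i in `esxcfg-scsidevs -c | awk '{print $1}' | grep naa.xxxx`; do esxcli storage nmp psp roundrobin deviceconfig get --device=$i; done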
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
Definitely required for gig links... I'm running 10 and 40 gig, so fortunately I don't need this.
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
If you're only running VMware with NFS, you can expose the snapshots to your backup system and back up all your VMDK files from FreeNAS to Bacula on your FreeNAS host. If you have FreeNAS connected to vCenter, your snapshots are coordinated between FreeNAS and vCenter for application-aware snapshots that can be backed up.... VMware 6.x and NFSv4 with multiple IP addresses work well.
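
A rough sketch of the FreeNAS side, with tank/vmware-nfs standing in for whatever your NFS dataset is actually called:

# make the .zfs directory browsable and snapshot the dataset holding the VMDKs
zfs set snapdir=visible tank/vmware-nfs
zfs snapshot tank/vmware-nfs@backup

# the backup job then reads the (read-only) VMDK files out of the snapshot directory
ls /mnt/tank/vmware-nfs/.zfs/snapshot/backup/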
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
The testing on iSCSI has been pretty successful... running 12b2.1 with an RMS-200 for log/ZIL, a P3700 for the meta drive, and a P3700 for L2ARC. It's easily handling the server load... much better than NFS.

I do notice that the L2ARC stays full and much more caching is going on, so that's good.

What I'm wondering now is about the sync setting. Currently we are running always, but with the latest version of iSCSI can you run standard safely?

In our case we do run a UPS and have generator backup as well.

So just assessing the risk vs. performance that could be had?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You can not run "standard" "safely".

You can run "standard" with some risk of losing data. If you never crash and you never have a power outage that takes out storage but not the hypervisors, then you probably have very little actual exposure to actually losing data.

Not sure if that makes it a good idea though. Lots of people disable sync on other types of NAS units because it is so frustratingly slow when sync is enabled.
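
For reference, the knob in question lives on the zvol backing the extent - tank/vmware-zvol below is just a placeholder:

# check what the extent is running now
zfs get sync tank/vmware-zvol

# sync=always: every write hits the ZIL/SLOG before being acknowledged (safe, slower)
zfs set sync=always tank/vmware-zvol

# sync=standard: only writes the initiator explicitly flags as synchronous are committed first
zfs set sync=standard tank/vmware-zvol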
 