Slow iSCSI VMware zvol

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Hi All,

I've configured a zvol with:
5x Samsung 860 Evo 250GB disks

I've put this into RAIDz1.

The networking is 4Gbps via Ubiquiti EdgeSwitches (high throughput), and the traffic doesn't leave the switch between the datastore and the hypervisors.

There is 1 VM running on this datastore - netdata says it's not doing anything in terms of w/r, but when I do the following write test:
sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync

The results are:
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 21.67 s, 49.5 MB/s

This seems really bad to me; I was expecting MUCH better write and read speeds.
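For reference, that works out to roughly 0.4Gbps, well under what even a single gigabit link should carry. If it helps, a more controlled test I could run in the guest would be something like this (assuming fio is installed in the VM; the job parameters are just a first guess on my part, not anything tuned):

Code:
fio --name=seqwrite --rw=write --bs=1M --size=1g --direct=1 \
    --ioengine=libaio --iodepth=16 --numjobs=1 --end_fsync=1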

Could someone help please? This is really driving me crazy.

CPU and RAM are absolutely fine, other than FreeNAS eating 90% of the 16GB of RAM constantly (but to my knowledge this is by design?).

I've attached a few screenshots.

Many Thanks,
Nick
 

Attachments

  • Screenshot 2019-03-29 at 11.56.41.png (256.4 KB)
  • Screenshot 2019-03-29 at 11.56.48.png (295.7 KB)
  • Screenshot 2019-03-29 at 11.57.08.png (1 MB)

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Please post full hardware specs. You may search the forum for iscsi vs nfs and sync writes. It may help you.

I tried NFS and got half the performance I'm currently getting; the best way to present VMware datastores is iSCSI or vSAN.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But VMware stores big vmdk files right? So I thought it would have been the better choice?

But just because someone stores something big doesn't make a random choice that you made correct.

RAIDZ is optimized for long sequential runs of contiguous data, ideally archival type data. A VMDK is a large file, but the instant you rewrite any block within that VMDK, it stops being sequential, because ZFS is a copy-on-write filesystem. RAIDZ with a single vdev also inherits the general performance characteristics of the slowest component disk. RAIDZ consumes a variable amount of space for parity and you can actually end up totally hosing yourself if you make the bad choices.

Mirrors are optimized for random reading and writing of data. A vdev becomes faster for reading as more components are joined, and because a four drive arrangement with mirrors is probably two vdevs striped, that translates to roughly twice the write speed and maybe up to as much as four times the read speed. Mirrors consume a fixed amount of space for redundancy and are harder to configure in ways that amount to a catastrophic mistake.

This is explained in that article I linked.
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Sorry I meant I have:
6x Samsung 860 Evo 250GB disks

Putting this into Mirror mode only gives me the capacity of 2 of these disks.

How do I get the full performance with VMWare for all 6 disks?
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
I tried Striped and this also sucked. Mirrored got me about 400Mb/ps, however, with 6 disks, I really want to reach their full potential.

It seems the best way to do this is RAID 10; however, I'm not 100% sure how to go about it with FreeNAS.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Sorry I meant I have:
6x Samsung 860 Evo 250GB disks

Putting this into Mirror mode only gives me the capacity of 2 of these disks.

Unless you specifically choose 3-way mirrors, you should get the capacity of three disks here.

I tried Striped and this also sucked. Mirrored got me about 400Mb/ps, however, with 6 disks, I really want to reach their full potential.

It seems the best way to do this is RAID 10; however, I'm not 100% sure how to go about it with FreeNAS.

If you mean 400MB/s then you're about at the practical limit of your 4Gbps networking. If that's 400Mbps then you aren't even hitting the limits of gigabit ethernet, and this might mean you've got another issue at play. What do the ESXi stats show as far as datastore performance?

"RAID10" in the ZFS world is mirrors. As far as "getting the performance of all six" the only way would be to go YOLO and stripe it, having zero redundancy. That's the point of something like ZFS or another RAID-style provider; sacrifice some capacity for redundancy.
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Like this?

Screenshot 2019-03-31 at 15.49.48.png


I assume that, other than going YOLO with striping, the above config will give me the best performance, with the most capacity and redundancy?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Is nobody going to mention sync writes? Granted, this person, with their extremely limited knowledge of storage systems, would do best to leave such settings alone.

Also, EXACTLY how is your 4Gb network achieved? Is this a LAGG to the switch? Are you simply using MPIO over 4 subnets? Are you running 5Gb Ethernet and reserving 1Gb of bandwidth on the interface in vSphere?

Please provide the output of lspci and ifconfig from your FreeNAS server.
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Is nobody going to mention sync writes? Granted, this person, with their extremely limited knowledge of storage systems, would do best to leave such settings alone.

Also, EXACTLY how is your 4Gb network achieved? Is this a LAGG to the switch? Are you simply using MPIO over 4 subnets? Are you running 5Gb Ethernet and reserving 1Gb of bandwidth on the interface in vSphere?

Please provide the output of lspci and ifconfig from your FreeNAS server.

The 4Gb networking is done via 4x 1Gb ports to a Ubiquiti EdgeMax 16-port switch. The hypervisors are on the same switch.

Code:
root@esxi-san[~]# lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d0)
00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
00:1c.5 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 6 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family Z97 LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)
04:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
04:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
06:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)


Code:
root@esxi-san[~]# ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:32:04:0c
    hwaddr 00:11:0a:32:04:0c
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:32:04:0c
    hwaddr 00:11:0a:32:04:0d
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    ether d8:cb:8a:7a:f9:d0
    hwaddr d8:cb:8a:7a:f9:d0
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect (none)
    status: no carrier
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
    ether 00:11:0a:32:04:0c
    hwaddr 6c:b3:11:1b:f5:b8
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
    ether 00:11:0a:32:04:0c
    hwaddr 6c:b3:11:1b:f5:b9
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
    inet 127.0.0.1 netmask 0xff000000
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    groups: lo
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:11:0a:32:04:0c
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect
    status: active
    groups: lagg
    laggproto lacp lagghash l2,l3,l4
    laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
vlan1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:11:0a:32:04:0c
    inet 192.168.100.253 netmask 0xffffff00 broadcast 192.168.100.255
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet autoselect
    status: active
    vlan: 1 vlanpcp: 0 parent interface: lagg0
    groups: vlan 
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
is nobody going to mention sync writes?
It's a lazy Sunday and I'm phoneposting. I was planning to get to it later. But we've found a more important issue.

Is this a LAGG to the switch?
Based on the ifconfig output, that's a bingo.

@npearson94 you will only get a max of 1Gbps here due to the use of a LAGG - you need to switch this to MPIO and use multiple non-overlapping subnets. How many links do you have from your hosts for iSCSI?
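As a rough sketch of what that looks like on the FreeNAS side (addresses are made up, and in practice you'd set them through the GUI rather than ifconfig; the ESXi side gets one vmkernel port per subnet, bound to the software iSCSI adapter):

Code:
# one non-overlapping subnet per physical link, no LAGG
ifconfig igb0 inet 10.10.10.2 netmask 255.255.255.0   # ESXi vmk1: 10.10.10.11
ifconfig igb1 inet 10.10.20.2 netmask 255.255.255.0   # ESXi vmk2: 10.10.20.11
# repeat for em0/em1 on two more subnets if you keep all four links

Each FreeNAS IP then gets added to the iSCSI portal, and the Round Robin path policy on the ESXi side spreads I/O across the paths.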

Also turn compression back on, LZ4 is basically free on any modern CPU.
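If the zvol lives at, say, tank/vmware (name made up), that's a one-liner from the shell, or just flip it back on in the GUI:

Code:
zfs set compression=lz4 tank/vmware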
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
@npearson94 you will only get a max of 1Gbps here due to the use of a LAGG - you need to switch this to MPIO and use multiple non-overlapping subnets. How many links do you have from your hosts for iSCSI?

Okay, I'd say this is beside the point at the moment though, right?

My write test only got up to 63.5MB/s.

Not getting even close to 1Gbps.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Okay, I'd say this is beside the point at the moment though, right?
Not entirely; your hosts/FreeNAS unit/switch will all be incurring a little bit of overhead to manage the LACP channel. Dropping that entirely and going to a single link removes a lot of potential places to introduce problems; once you're maxing out a single link we can add more.

What are the NICs in your hosts? You set sync=disabled on the ZVOL and your in-VM block size is 1M, so it should be able to max out a single link without too much effort. But if they're Realtek, then that might be a factor.
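You can double-check the zvol's settings from the FreeNAS shell while you're at it (the dataset name below is just an example):

Code:
zfs get sync,compression,volblocksize tank/vmware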

Testing the link itself with iperf would be prudent as well, but I don't know if there's a way to run that directly on a VMware host.
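If you have another Linux or FreeBSD box on the same VLAN, something like this would at least exercise the path to FreeNAS (assuming iperf3 is available on both ends; the address is the vlan1 IP from your ifconfig output):

Code:
# on FreeNAS
iperf3 -s
# on the client
iperf3 -c 192.168.100.253 -t 30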
 

npearson94

Dabbler
Joined
Mar 29, 2019
Messages
15
Ah crap. I just ran the write test while watching my switch, and I can see the problem.

The inbound and outbound throughput is reaching 1Gbps.

I guess I need to sort out my network links.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Okay, I'd say this is beside the point at the moment though, right?

My write test only got up to 63.5MB/s.

Not getting even close to 1Gbps.

It really is pretty close, given the topology as it exists.

https://www.ixsystems.com/community/threads/lacp-friend-or-foe.30541/

Read that, ... please!

So, a couple of things here.


The networking is 4Gbps

You mean 4x 1Gbps.

Ubiquiti EdgeSwitches (high throughput)

Is that a new product line? Their stuff is OK but it sure doesn't qualify as "high throughput." It'll get you probably 90-95% of the way there without a problem. The people who make real high throughput stuff will probably take issue with it though. :smile:

The usual problem with LACP is that you really need to get all your stuff lined up just right and just perfect and then it maybe works well enough to get you some benefit if you have a lot of clients all talking simultaneously. It also creates a bunch of pain points, some of which are wickedly difficult to diagnose. This specifically includes things like out-of-order packet delivery, which often present as poor performance.

If you want faster than 1Gbps, there are some ways to do it, but they're all somewhat complex and not always "reliably fast."
 