My Success with 9.3, ESXi + FreeNAS VM w/ iSCSI


lucid

Cadet
Joined
Nov 30, 2014
Messages
7
I thought I should share my success with FreeNAS with the community, since I have a somewhat unconventional setup. I was using 9.2, but I switched to the 9.3 beta for the new kernel iSCSI target, and it works MUCH better.

Hardware:
Asus P9D-C4/L
Xeon E3-1234 v3 - 3.3 GHz, 4C/8T
32 GB DDR3-1600
Asus PIKE 2008 (LSI 2008, integrated into the board)

Software
ESXi 5.1
FreeNAS 9.3

Drives
USB stick for ESXi boot
500 GB SSD for ESXi
Various HDDs for FreeNAS


Design Goals
When I set out to do this, I set a few goals for myself, and the biggest one was an "All-In-One" ESXi box: fileserver, router, firewall, webserver, TeamSpeak, VPN, etc. I have met all my goals, but it took me a while to get there. This design goal also means I must present datastores to ESXi from within ESXi, and that's where FreeNAS takes over.

The All-In-One build also led me to the Asus board I'm using. The two things I liked were the quad onboard gigabit NICs and the PIKE slot. The PIKE card was a gamble, but it's working great.

I've also tried NFS and iSCSI. I don't want to go into the details, but iSCSI is where it's at.


ESXi Setup
First, I can't stress this enough: in my opinion, it's almost mandatory to use an SSD with ESXi. There are a couple of reasons I do this: speed is the first (and biggest), and the second is memory overcommitment.

Whether you overcommit or not, be sure to check the "Reserve all guest memory (All locked)" box in the VM's properties, under the Resources tab. This is something you should do for all critical VMs (FreeNAS, firewall).
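
For what it's worth, you can confirm the reservation took effect from the ESXi shell by looking at the VM's .vmx file; as far as I can tell, the "All locked" checkbox maps to the sched.mem.pin key. The datastore/VM path below is just an example for my layout:

Code:
# Example path only -- substitute your own datastore and VM name.
grep -i "sched.mem" /vmfs/volumes/datastore1/freenas/freenas.vmx
# with "Reserve all guest memory (All locked)" checked, the output should
# include something like:  sched.mem.pin = "TRUE"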

By using overcommitment, you can let FreeNAS soak up HUGE amounts of RAM and still let ESXi, or even the guest, do some swapping, getting more out of your hardware. And if a VM swaps internally, it swaps to the FreeNAS iSCSI datastore, which is pretty fast.


FreeNAS Setup
FreeNAS 9.3 beta installed on an 8 GB virtual disk on the SSD
8 GB+ RAM
2 CPU cores
2 NICs: one E1000 and one VMXNET3
SCSI controller set to LSI Logic SAS
LSI SAS2008 / PIKE card passed through to the VM

How much RAM you give it is up to you. Currently I'm happy with 8 GB, but I've tried everything from 6 GB up to 16 GB. More RAM has a noticeable effect as the disk activity gets more intense.

The only modification I made to FreeNAS itself was to import the VMware modules from the VMware Tools CD image (for FreeBSD 9) into /boot/modules and add a loader tunable "vmxnet3_load" with a value of "YES". I copied over all the modules, even replacing the ones that were already there.
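
For anyone wanting to reproduce this, here's roughly what the import looks like from the FreeNAS shell. Treat the paths and file names as approximations -- the layout of the VMware Tools tarball varies between Tools versions -- and the tunable itself is added in the GUI (System -> Tunables, type "Loader"):

Code:
# Attach the FreeBSD VMware Tools CD to the VM in ESXi first. Paths are examples.
mount -t cd9660 /dev/cd0 /mnt
tar -xzf /mnt/vmware-freebsd-tools.tar.gz -C /tmp
# copy the prebuilt FreeBSD 9 modules (vmxnet3.ko and friends) into /boot/modules,
# overwriting the ones that are already there
cp /tmp/vmware-tools-distrib/lib/modules/binary/FreeBSD9.0-amd64/*.ko /boot/modules/
# add the loader tunable vmxnet3_load = "YES" in the GUI, reboot, then verify:
kldstat | grep vmxnet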

This VMXNET3 adapter is a beast. It is also essential to iSCSI's performance.


Final Config
After all that, iSCSI is set up on FreeNAS and ESXi. Nothing too special here, except an isolated vSwitch for SAN traffic.
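
For reference, the ESXi side can also be done from the CLI instead of the vSphere client, roughly like this; the vSwitch name, adapter name, and target address are placeholders for my setup, and the VMkernel port and port binding are created separately:

Code:
# vSwitchSAN, vmhba36 and 10.0.0.1 are placeholders -- substitute your own.
esxcli network vswitch standard add --vswitch-name=vSwitchSAN   # isolated SAN vSwitch
esxcli iscsi software set --enabled=true                        # enable the software iSCSI initiator
esxcli iscsi adapter list                                       # note the vmhba name it was given
esxcli iscsi adapter discovery sendtarget add -A vmhba36 -a 10.0.0.1:3260
esxcli storage core adapter rescan -A vmhba36                   # rescan to pick up the FreeNAS LUN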


Results
I did have an issue with NOP-out timeouts at 9000 MTU; the MTU was set in FreeNAS, on the VMkernel port, and on the vSwitch. It would only occur under very high load. 1500 MTU works flawlessly, however.
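
In case it helps anyone chasing the same issue, these are roughly the commands to check/set 9000 MTU end to end; the vSwitch, vmk, and vmx3f0 names plus the addresses are examples from my setup:

Code:
# ESXi side (names are examples)
esxcli network vswitch standard set --vswitch-name=vSwitchSAN --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000
# FreeNAS side (vmx3f0 is the VMXNET3 interface from the Tools module)
ifconfig vmx3f0 mtu 9000
# verify end to end with don't-fragment pings (8972 = 9000 minus 28 bytes of headers)
ping -D -s 8972 10.0.0.1      # from FreeNAS
vmkping -d -s 8972 10.0.0.2   # from the ESXi shell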

I did a few benchmarks:
In a Win7 VM (2 GB RAM, 1 vCPU, paravirtual HDD)
FreeNAS VM - 8 GB RAM, 4x 500 GB 5400 RPM HDDs, 30 GB iSCSI file LUN
(CrystalDiskMark benchmark screenshot: 2rdez9s.jpg)

Not bad eh?


Requests

VMware snapshots - It'd be nice to have a list of VMs and the ability to exclude some; VMs with PCI passthrough can't be snapshotted. Also, I'd like more info on how this works. I know FreeNAS tells VMware to take a snapshot of the VM before the ZFS snapshot, but does it work with zvols or file LUNs?

There are no longer any iSCSI tuning settings in FreeNAS 9.3, so it'd be nice to have client-side recommendations for burst lengths, R2T, etc.

----------------------------

All in all, I'm very happy with the results, and everything has been rock solid. The new iSCSI target is very good, and VMware gets along great with it.

Keep up the good work!
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
lucid said:
There are no longer any iSCSI tuning settings in FreeNAS 9.3, so it'd be nice to have client-side recommendations for burst lengths, R2T, etc.

iSCSI negotiates those parameters between the initiator and the target during the login process. That is why an exact match between initiator and target is not required there -- the connection will use the lowest common denominator of the two sides' capabilities. That is also why those options are not configurable in the new iSCSI target; it uses defaults that are reasonable for the present code implementation.
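
If you want to see what actually got negotiated from the ESXi side, the software iSCSI adapter exposes it (vmhba36 is an example adapter name):

Code:
# configured values on the initiator (FirstBurstLength, MaxBurstLength, ...)
esxcli iscsi adapter param get --adapter=vmhba36
# per-session values negotiated with the FreeNAS target
esxcli iscsi session list --adapter=vmhba36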
 

DaveFL

Explorer
Joined
Dec 4, 2014
Messages
68
Is this a legitimate test? Everything I have read on this forum says you should not be able to get good performance with ZFS and iSCSI without 128 GB+ of RAM. Could someone chime in on why this setup works and the rest don't?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Because every workload has unique parameters. A huge amount of RAM may be needed to handle huge data sets with complicated access patterns on a heavily fragmented pool. More RAM is always good, but if the active data set is small, the access pattern is simple, and the pool is not fragmented, then why wouldn't iSCSI work even with a small amount of RAM? Also, we don't know much about those benchmarks. For example, if they write all zeroes, then enabled compression may almost completely skip disk I/O except for some metadata updates. In such cases, with the new iSCSI target I am able to get a million IOPS and many gigabytes per second.

Also, don't forget that FreeNAS is evolving: 9.2.x got a new ZVOL implementation, 9.3 got a completely new iSCSI target, and the ZFS code is updated all the time.
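
A quick way to sanity-check the compression point on the FreeNAS side (tank/iscsi is an example dataset name): if a benchmark writes highly compressible data, the ratio below balloons and very little of it ever touches the disks.

Code:
# example dataset name -- substitute your own
zfs get compression,compressratio,used,logicalused tank/iscsi
# watch how much actually hits the spindles while the benchmark runs
zpool iostat -v tank 5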
 

lucid

Cadet
Joined
Nov 30, 2014
Messages
7
mav@ said:
Because every workload has unique parameters. A huge amount of RAM may be needed to handle huge data sets with complicated access patterns on a heavily fragmented pool. More RAM is always good, but if the active data set is small, the access pattern is simple, and the pool is not fragmented, then why wouldn't iSCSI work even with a small amount of RAM? Also, we don't know much about those benchmarks. For example, if they write all zeroes, then enabled compression may almost completely skip disk I/O except for some metadata updates. In such cases, with the new iSCSI target I am able to get a million IOPS and many gigabytes per second.

Also, don't forget that FreeNAS is evolving: 9.2.x got a new ZVOL implementation, 9.3 got a completely new iSCSI target, and the ZFS code is updated all the time.

Exactly what I was going to say, but better!

He's exactly right about the working set. If I were to present the whole 1 TB chunk to ESXi, the results would be much different, especially for random access. If I run the random test back to back, the second run is insanely quick.

Unless you have bazillions of GB of RAM and SSD caches (and even if you do, to some degree), my understanding is that your spindles will always be the limit (you can see that clearly in the write speeds from CrystalDiskMark). It's hard to get an accurate/consistent benchmark with conventional tools inside a VM because of all the caching, but caching is the point, isn't it?
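
The caching is easy to watch from the FreeNAS shell while a benchmark runs; these ARC sysctls should exist on FreeBSD 9.x (the awk one-liner is just a convenience):

Code:
# ARC size, ceiling, and hit/miss counters
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# rough hit ratio
sysctl -n kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses | \
  awk 'NR==1{h=$1} NR==2{m=$1} END{printf "ARC hit ratio: %.1f%%\n", 100*h/(h+m)}'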

Would it be worth seeing what results I get with VERY restricted RAM, like 4 GB? I know that's well below the recommended minimum, but that should give a better indication of raw drive performance, right?

If there's a test/configuration you want me to perform, just ask! :)
 

DaveFL

Explorer
Joined
Dec 4, 2014
Messages
68
lucid said:
Exactly what I was going to say, but better!

He's exactly right about the working set. If I were to present the whole 1 TB chunk to ESXi, the results would be much different, especially for random access. If I run the random test back to back, the second run is insanely quick.

Unless you have bazillions of GB of RAM and SSD caches (and even if you do, to some degree), my understanding is that your spindles will always be the limit (you can see that clearly in the write speeds from CrystalDiskMark). It's hard to get an accurate/consistent benchmark with conventional tools inside a VM because of all the caching, but caching is the point, isn't it?

Would it be worth seeing what results I get with VERY restricted RAM, like 4 GB? I know that's well below the recommended minimum, but that should give a better indication of raw drive performance, right?

If there's a test/configuration you want me to perform, just ask! :)

Since you are offering, I'd be curious to see if you get similar results to those shown with the bonnie test in the above link. It seems odd that that user's read performance took such a big hit. A 4 GB RAM example would be somewhat pointless, as for the most part users seem to run with a minimum of 16-32 GB.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
DaveFL said:
Is this a legitimate test? Everything I have read on this forum says you should not be able to get good performance with ZFS and iSCSI without 128 GB+ of RAM. Could someone chime in on why this setup works and the rest don't?

As with everything ZFS, it is workload-dependent. In the environment here, for example, we have lots of FreeBSD VMs that are carefully designed to avoid superfluous or trite writes, including such obvious low-hanging targets as the disabling of atime updates.
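
(For anyone curious, disabling atime is a one-liner per dataset or a mount option inside the guest; the names below are examples.)

Code:
# inside a FreeBSD guest on UFS: add "noatime" to the options column in /etc/fstab, e.g.
#   /dev/da0p2   /   ufs   rw,noatime   1   1
# on a ZFS dataset (guest or filer) the equivalent is:
zfs set atime=off tank/vms
zfs get atime tank/vms    # confirm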

The average VM user probably isn't putting a lot of engineering time into optimizing a VM for better I/O characteristics. Heavy writes in particular make VM performance suffer, both because interactive response is impacted (a realtime effect) and because fragmentation increases over time (a long-term effect). The traditional ZFS answer is to throw massive resources at the problem, which does in fact substantially reduce it: more spindles, larger spindles, and lots of RAM - once the ARC (and hopefully L2ARC) has warmed up, things are faster. This assumes sufficient ARC for the working set, or a large portion of it. Keeping lots of free space in the pool helps with fragmentation. Adding more vdevs to the pool increases IOPS. Increase these values to sufficient numbers and you generally get acceptable performance.
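
A sketch of what "throwing resources at it" looks like from the pool side; device and pool names are examples, and the FRAG column needs a reasonably recent ZFS:

Code:
# pool/device names are examples
zpool add tank cache gpt/l2arc0    # add an SSD as L2ARC
zpool add tank mirror da8 da9      # more vdevs = more IOPS
zpool list -v tank                 # CAP / FRAG columns show space pressure and fragmentation
zpool iostat -v tank 5             # per-vdev throughput while the VMs run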

But this is dependent on workload. If you happen to have a workload that is 99%+ read, for example, the variables change and could result in a workload that performs acceptably on a smaller resource footprint. This is essentially banning any fat guys from the buffet.

I am planning to build a VM storage appliance this year and give FreeNAS a chance to shine on it. Probably something like an SC216BE26-R920WB with an X10SRW-F, E5-1650v3, and 64GB of RAM for the base system. DDR4 memory prices are currently killer, so I definitely see 128GB as a future upgrade. A bunch of NAS-class 2.5" drives in mirrors to provide IOPS, and probably a 120GB-240GB SSD for L2ARC. My suspicion is that the working set here isn't really all that large, but VMware provides such awesome tools to help identify that... not. But I want something that can be lower power (2.5" drives, hexacore) yet keeps up when things get busy. I'm thinking of playing a bit with SLOG devices to figure out what makes the most sense. I can throw an LSI RAID BBU at the issue and be happy with that, since the VMs here are mostly FreeBSD and those are specifically not write-piggy.

The current inventory is 118 VMs, but the overall disk usage isn't that high, so I'm thinking I can probably introduce a nice iSCSI filer and get away with it, and probably even have lots of capacity to grow.
 

lucid

Cadet
Joined
Nov 30, 2014
Messages
7
New Debian 7 VM: 1 GB RAM, paravirtual controller, on a 100 GB file LUN
FreeNAS/ESXi unchanged since the initial post.

Command: (same but -u root for root user)
bonnie++ -m "iSCSI_K_L2_80G" -r 8192 -s 81920 -d /tmp/ -f -b -n 1 -u root >> bonnie.txt

Results:
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
iSCSI_K_L2_80G  80G           595279  32 188477   9           563770  14 165.3   2
Latency                         781ms     426ms               114ms     383ms
Version  1.96       ------Sequential Create------ --------Random Create--------
iSCSI_K_L2_80G      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1    31   0 +++++ +++    47   0    32   0 +++++ +++    48   0
Latency               762ms      48us     183ms     191ms      15us     178ms
1.96,1.96,iSCSI_K_L2_80G,1,1417812796,80G,,,,595279,32,188477,9,,,563770,14,165.3,2,1,,,,,31,0,+++++,+++,47,0,32,0,+++++,+++,48,0,,781ms,426ms,,114ms,383ms,762ms,48us,183ms,191ms,15us,178ms
 

DaveFL

Explorer
Joined
Dec 4, 2014
Messages
68
Thanks! Can you show your results with the non-experimental driver? Also, your results suggest that your test didn't take place across the LAN (I think).
 

DaveFL

Explorer
Joined
Dec 4, 2014
Messages
68
lucid said:
For iSCSI or ESXi paravirtual?

Well, if you have the time, I'd be curious to see both. I would also like to see the results from a machine connecting to the LUN over your LAN.

Your post has me re-thinking some things here now :)
 

lucid

Cadet
Joined
Nov 30, 2014
Messages
7
I can't change the iSCSI target in 9.3; it's stuck with the CAM Target Layer (CTL).

SAS controller instead of paravirtual:
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
iSCSI_K_L2_80G  80G           421471  20 180420   9           565200  15 159.1   3
Latency                         844ms     704ms               168ms     493ms
Version  1.96       ------Sequential Create------ --------Random Create--------
iSCSI_K_L2_80G      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1    36   0 +++++ +++    53   0    37   0 +++++ +++    51   0
Latency               824ms      41us     125ms     154ms      25us     141ms
1.96,1.96,iSCSI_K_L2_80G,1,1417811584,80G,,,,421471,20,180420,9,,,565200,15,159.1,3,1,,,,,36,0,+++++,+++,53,0,37,0,+++++,+++,51,0,,844ms,704ms,,168ms,493ms,824ms,41us,125ms,154ms,25us,141ms
 