BUILD Full Flash PoC - XenServer SR for RDS Servers


flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Background:
We want to start a Proof of Concept (PoC) to see whether the higher IOPS a FreeNAS box can deliver will improve the current user experience on our front-end Remote Desktop Servers. Our current front-end servers (30 of them) have the following specs:
  • OS: Windows 2008 R2 Datacenter edition
  • Middleware: Citrix XenApp 6.5
  • Application Virtualization: Microsoft App-V 5 SP2 hotfix 4
    Local package cache (globally published)
  • Local installed software: Microsoft Office 2013, SafeNet, SafeSign & PDF Creator 7.2
  • Updates: Most recent Windows and Office updates installed
  • CPUs: 4 vCPUs
  • RAM: 16GB
  • Disk: 1x 200GB (C:\ drive)
  • Number of concurrent users per RDS server (15~20)
  • Reboot behavior: twice a week
  • Servers are based on a template image, not provisioned by PVS or MCS.
    I know this means base copies and chained VHDs, which doesn't help performance.
    On the new SR we plan to make one main image and distribute it with Microsoft's MDT, so every RDS VM gets its own VHD, bypassing the "chained VHD" issue and possibly making deduplication more effective.
Our back-end:
  • Hypervisors: 9x HP DL360 Gen8, each with 32 logical CPUs (2x 8-core Intel Xeon E5-2650 @ 2 GHz) and 196GB of ECC memory
  • Storage: 3x NetApp
    • Acceptance (test) environment: FAS2050
    • Production Back-End servers: FAS2240-2
    • Production Front-End servers: FAS2240-2
For the front-end servers the aggregate consists of 20x 15K SAS disks in RAID-DP, offering roughly (20 - 2) * 170 = 3,060 random 4K writes/s in an ideal world and roughly (20 - 2) * 300 = 5,400 sequential writes/s, very roughly speaking.
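Put as a quick back-of-the-envelope calculation (the per-disk IOPS figures are rules of thumb, not measured values, so treat this as an order-of-magnitude estimate only):

  echo $(( (20 - 2) * 170 ))        # ~3060 random 4K writes/s for the whole aggregate
  echo $(( (20 - 2) * 300 ))        # ~5400 sequential writes/s
  echo $(( (20 - 2) * 170 / 30 ))   # ~102 random write IOPS per RDS server across 30 servers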

Comparing this math with what the NetApp performance graphs show, we strongly suspect that the number of delivered IOPS is simply not enough, resulting in CPU peaks, queueing and latency.

We also did some research into the behaviour of our users and found that every single RDS server generates 200~800 IOPS, with peaks of 1,500 triggered by users starting local as well as virtualized applications. That means roughly 500 x 30 = 15,000 IOPS possible at any moment during production hours (wishful thinking; real life will be worse, and the NetApp isn't going to cut it).

We used Login Consultants' Login VSI and Microsoft performance counters to gather these statistics.
We believe the bottleneck is the number of IOPS delivered by the NetApp appliance.

To prove this, and to offer a fast and effective solution, we want to build a PoC on a proper hardware box running FreeNAS with a full-flash array of 12, 16, 20 or 24 SSDs.
We want to attach this SR to our XenServer farm using NFS or possibly iSCSI.
Buying a NetApp full-flash array or an iXsystems TrueNAS solution is not possible within our current budget.

Planned improvements to achieve this goal:

Configuration
  • FreeNAS 9.3-STABLE
  • iSCSI or NFS storage repository for XenServer 6.5.4
  • 1x 128GB SSD for the OS (I know 128GB is overkill; the OS only needs a couple of GB)
  • Option 1: one aggregate of 12, 16, 20 or 24 SSDs of 400GB each (Samsung 845DC Pro 400GB, 372GB effective each)
  • Option 2: two aggregates of 12 SSDs each (Samsung 845DC Pro 400GB, 372GB effective each)
  • 2x 10GbE bonded via LACP (IEEE 802.3ad) for performance + failover, carrying storage traffic (NFS or iSCSI); see the sketch below the list
  • 4x 1GbE bonded via LACP (IEEE 802.3ad) for performance + failover, for management and future use (backup)
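FreeNAS builds these bonds through the web GUI (Network > Link Aggregations); under the hood it comes down to a FreeBSD lagg interface along these lines (interface names and the address are only examples, not our real config):

  ifconfig lagg0 create
  ifconfig lagg0 up laggproto lacp laggport ix0 laggport ix1   # LACP across the two 10GbE ports
  ifconfig lagg0 inet 10.0.0.10 netmask 255.255.255.0          # storage network address for NFS/iSCSI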
Hardware:
All our server hardware is HP (exceptions: NetApp, Barracuda & NetScaler), so we are somewhat tied to this brand.
If we had a free choice we would probably pick a Supermicro motherboard.

System board:
HP DL380p Gen8, 2x 8-core E5-2650 v2 / 32GB / P420i-2GB / 25x SFF
2x 750W PSU

NIC, 2x 10GbE + 4x 1GbE:
- 10GbE HP 533FLR-T FlexFabric 2-port adapter on board (LACP, 802.3ad)
- 4x 1GbE LOM in a trunked bond to the Cisco core switch (LACP, 802.3ad)

Memory 64GB:
8x 8GB (1x8GB) Dual Rank x4 PC3-14900R (DDR3-1866)

RAID controller:
- Onboard HP Smart Array P420i (disabled)
- We will replace this with an LSI 9211-8i card flashed to IT firmware (instead of the IR firmware it ships with).

Brackets:
We have enough HP 2.5" hot-swap SAS/SATA drive caddies.

Solid State Disk Drives:
1x Samsung 850 Pro 128GB SSD, 2.5" SATA-600, 7mm (boot drive)
24x Samsung 845DC Pro 400GB

Questions:
Before starting our PoC we would very much like some input from users who are experienced with full-flash arrays.
We did our research, of course, and based our hardware/configuration profile on those findings.

I especially want to stress the following aspects:
  1. Is the LSI 9200-8i with IT firmware the right choice?
  2. The Samsung 845DC Pro 400GB (datacenter edition) may or may not be the best choice for enterprise environments, but it meets the budget, is optimized for writes and uses 3D NAND, which is better than TLC chips in terms of endurance/lifespan.
  3. Our choice of RAID-Z2 (not too much performance loss, and we can lose 2 disks at the same time) seems fair to us. Furthermore, our RDS servers may be a core component of our infrastructure, but they can be replaced at short notice and individually are not that mission-critical.
  4. Last but not least, there is the NFS vs. iSCSI discussion. I've read a lot about it and both have their pros and cons.

    Performance-wise there is not much difference in my opinion, at least if you compare NFSv4 with iSCSI. iSCSI is a bit more complex to configure and maintain; on the other hand, I'm not sure NFSv4 is even possible between XenServer 6.5 and FreeNAS 9.3.
P.S.
My sincere apologies if I posted this thread in the wrong section; it seemed the right place to me.
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
freenas is the test and community version for truenas. you should call ixsystems (the company behind it) for your request. without deep zfs knowledge a proof of concept system does not make any sense.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,155
I'll have to echo the statement that this is best discussed with iXSystems. Your proposed build is way too big for the forum's experience.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
If your current system is IOPS- and latency-bound, then a 12-, 16-, 20- or 24-SSD-wide RAID-Z2 probably isn't the way to go. Of course you could do a test and see what kind of performance you get. You could do six 4-drive RAID-Z1s, or striped mirrors (sorta like RAID 10).
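For example (purely illustrative device names, and on FreeNAS you'd build either layout through the GUI rather than the shell):

  # one wide RAID-Z2 vdev: capacity-efficient, but random IOPS of roughly a single disk
  zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11
  # striped mirrors: half the usable capacity, but random IOPS scale with the number of vdevs
  zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7 mirror da8 da9 mirror da10 da11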

I think the all-flash element changes some of the standard thinking. However, the 9200-8i is probably a good choice. Keep in mind that depending on how you set up your devices, you may eventually be in the realm of maxing out the SAS2 bandwidth it can provide.
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Thanks for the opinions and advice so far. We certainly see this as a PoC, and if it proves itself we will consider a full-flash enterprise product, maybe from iXsystems or another vendor.
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Before we buy the full-flash hardware we are running some tests in our test environment.
Currently we are using an HP DL380 G7 with 8GB of RAM and 8 SAS disks in RAID-Z1.
The volume consists of 8x 400GB SAS disks in a RAID-Z1 configuration, with lz4 compression and deduplication enabled.
There is no dedicated SLOG device for the ZIL yet.
And 8GB of RAM is the bare minimum, I know, but we don't have more in stock yet.

We use NFSv3 to attach to XenServer 6.5 (NFSv4 is not supported yet).
Since NFS uses synchronous writes the performance is not that good: at first 80-120MB/s (almost a full gigabit), degrading quickly to 3-10MB/s (you would almost think everything was connected via 100Mb/s networking).
When I disabled sync (for testing) the throughput was a bit better, around 30-80MB/s, but still not great.
In both cases I monitored the disk array with gstat and saw the maximum possible I/O on the physical disks and high (red) %busy numbers, even though the network throughput was extremely low.
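For reference, this is roughly how I toggled sync and watched the disks (the dataset name is just an example):

  zfs get sync tank/nfs-sr            # default is "standard"; NFS requests sync writes
  zfs set sync=disabled tank/nfs-sr   # testing only - not safe for a production SR
  gstat -p                            # watch per-disk %busy while copying the .img file
  zfs set sync=standard tank/nfs-sr   # back to the safe default afterwards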


Now I have the following questions:
  1. Is this caused by too little RAM, so that writes end up being cached on the conventional spinning-disk array? We did not use a fast device for the ZIL.
  2. Even though we managed to reach higher throughput (up to 120MB/s), could this still be a network issue? (Our core switches and NICs are all 1Gb and all cables are Cat5e.)
  3. Since NFS uses synchronous writes and iSCSI doesn't, would it matter to switch over to iSCSI? Since XenServer only supports NFSv3, and NFS cannot use bonded interfaces for more bandwidth (only for failover), this seems like a good option to me.
  4. When we buy our full-flash PoC hardware, meaning the whole aggregate will be SSD, do we still need a separate fast device for the ZIL, or can it just reside on the same aggregate (containing the RAID-Z2 volume)?
* Note: it also seems that the maximum throughput (over NFS) is only reached during the first 2GB of a big file (a 16GB *.img file); after 2GB it degrades to about a quarter of the maximum.
With sync enabled: first 2GB at 10-20MB/s, after 2GB it degrades to 1-3MB/s.
With sync disabled: first 2GB at 90-120MB/s, after 2GB it degrades to 25-35MB/s.
Is this because only about 2GB of the 8GB of RAM is left over for write caching, or is it something else?
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
u need to learn to understand zfs. do not do the trial and error bs you are posting.



btw... dedup is... stupid in your situation. you just waste RAM for it.
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
My sincere apologies if I don't understand ZFS in relation to FreeNAS that well yet.
I should have read more of the literature. Don't get me wrong, but there is a lot of material floating around on the Internet regarding ZFS and FreeNAS that is not always true or well argued, which makes it a bit confusing for me.

I found some well-written material on the Dutch forum Tweakers.net and found the slideshow by "Cyberjock" very helpful in getting things straight.
Back to our situation; I managed to get a few things straight:

1. Since we are using less than the recommended RAM in the test environment, we cannot expect FreeNAS (with ZFS) to perform better than it does right now, right?
(We have 8GB; with the rule of thumb of 8GB plus roughly 1GB per TB of storage, we should have about 8 + 4 = 12GB for our roughly 4TB of storage space.)

2. The test environment consists of conventional spinning SAS disks making up one RAID-Z1 array of 8x 400GB, with no separate SLOG device for the ZIL. This causes the ZIL to be placed on the same volume (and thus the same platters) where the data resides, which means a huge performance hit because of the old "slow" disks, plus the fact that data I/O and ZIL I/O compete for the same disk array. So unless we add a separate SSD, or at the very minimum a separate conventional disk, as a SLOG (the device the ZIL lives on), there will be major performance issues (assuming the RAM issue from point 1 has been solved).
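If we do add one later, as far as I understand it comes down to attaching a log vdev to the existing pool, something like this (example device names; mirrored so a single SLOG failure cannot hurt the pool):

  zpool add tank log mirror da8 da9   # mirrored SLOG; only synchronous writes land here
  zpool status tank                   # the log devices show up as a separate "logs" section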

3. I think we will stay away from iSCSI on ZFS because of all the warnings I see everywhere about asynchronous writes being dangerous with respect to losing in-flight data.
Given that the use case is a storage repository for our hypervisors, losing VM metadata is simply not acceptable.
iSCSI being a protocol that basically sends over the network the raw commands/data that normally flow through your SAS cables feels a bit odd in this context.

Besides the asynchronous writes, there are also warnings about not using more than 50% of your volume when using iSCSI.

If I add up both arguments, plus the fact that iSCSI is a bit more complex to manage compared to NFSv3 (which is supported on XenServer), and the fact that performance-wise it's pretty much a dead heat...
Well, it seems common sense to me to stay with NFS.
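For completeness: from what I have read, the asynchronous default for an iSCSI zvol can be overridden per dataset, at the cost of performance (the zvol name below is hypothetical):

  zfs set sync=always tank/xen-zvol   # force every write through the ZIL, giving NFS-like safety over iSCSI

But that does not change the other arguments above.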

4. Your comment regarding deduplication may make sense for the test environment, but not necessarily for the PoC. We are planning to place around 30 RDS VMs with a footprint of 200GB allocated (effectively about 100GB) each, resulting in potentially 6TB of needed storage. If no deduplication is applied this costs a lot more storage space, and thus more SSDs, which are relatively expensive. Of course there are better solutions like PVS or MCS, but that's another discussion.
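Before enabling dedup on the PoC pool we can at least let ZFS estimate what the dedup table would look like, once a few RDS images are on the pool (the pool name is an example):

  zdb -S tank   # simulates dedup on the existing data and prints the DDT histogram plus the estimated dedup ratio
  # commonly cited rule of thumb: roughly 5GB of RAM per TB of deduplicated data for the dedup table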


After summarizing all of this, there really is only one question left for me.

This is a test environment to test various aspects of FreeNAS, such as:

  • How to install it.
  • Whether our hardware is supported.
  • How to configure services, default settings and Active Directory integration.
  • How to configure volumes, the ZIL, scrubs, etc.
  • Flashing the HBA to IT mode.
  • Testing iSCSI vs. NFS (we currently use NFS).
  • Basically how it works and how it performs in real-world situations; you simply can't work everything out in theory.
So we have not spent money on new hardware (newer server, SSDs, RAM) yet, but after this testing our Proof of Concept will follow.
For the Proof of Concept we will have one big array of 12/24 SSDs configured as RAID-Z2 or RAID-Z3. We are also thinking of 128GB of RAM instead of 64GB, but that can be added at a later point in time.

Now the question:
When using a full-flash array, is there still a need for a separate SLOG to hold the ZIL?
For example, does it make sense in this situation to place two SSDs in a mirror configuration just for the ZIL?

I have read up on it and I understand that it makes sense when you are using a big array of spinning disks, to speed things up, but when your whole array consists of SSDs, which do not share the major drawbacks of conventional disks... is it still recommended?

(For example, to prevent writes to the data volume and the ZIL from getting in each other's way, battling for resources like they do when they both reside on the same platters in conventional disks.)
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
I do not want to be rude, just too lazy to read all the text. raidzX is slow. it will also be slow with SSDs. if you want plenty of IOPS: do mirrored vdevs (similar to raid10). or if you want to get smiling: do a three-way mirror.

dedup almost never makes sense. there are some posts here regarding the amount of memory you will need.

I can only speak for myself, but the big text blocks in a single forum post do not make sense.

split your problems.


you are not talking to an IT consultant who gets paid to do the evaluation work for you.

most of your concerns can only be answered by people who deal with this kind of setup regularly. and that is iXsystems.


your production setup will also need support.

I know, everybody wants to save money. but your questions are typical pre-sales issues.



to bring it to the point: spend a lot of money and time to understand these things (and learn how everything will work in your environment), or call the big boys who sell TrueNAS and are the makers of FreeNAS.

talking to them at least gives you a direction you can go in.
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Thanks for your reply zambanini; we are seriously considering going with TrueNAS or another vendor for full-flash storage.
But before we do that, our FreeNAS PoC needs to succeed and prove its concept (lol).

I will dive a bit into mirrored vdevs vs. RAID-Zx.
Thanks
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
@flcn_ be sure to mention to iXsystems your full requirements but also your location, since shipping/import/taxes might also play a big role in the budget. But in any case, you will get a quote from them with a spec for a solution, which will make it a lot easier for you to evaluate.

If things with iXsystems don't go as expected, I'd suggest contacting a consultant (freelancer and/or company) that can provide maintenance and support as well...
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Thanks for all the tips thus far!

I will contact iXsystems to see if they can provide something that fits the budget.
Nevertheless, we will build a PoC to see how it performs compared to our current spinning-disk array.

We are considering the following changes to our PoC scope:
  • One pool consisting of 24 SSDs (Samsung 845DC Pro 400GB) in a mirrored configuration, so 12 vdevs of 2 SSDs each. If I'm correct, striping will occur evenly over all the vdevs in the zpool.
    The command for this would be something like: "zpool create tank mirror sda sdm mirror sdb sdn mirror sdc sdo mirror sdd sdp mirror sde sdq mirror sdf sdr mirror sdg sds mirror sdh sdt mirror sdi sdu mirror sdj sdv mirror sdk sdw mirror sdl sdx", assuming the SSDs will be named sd*.
  • We will test both NFS and iSCSI against our XenServer hypervisors, measure the performance and weigh it against ease of management and data safety before deciding whether to go for iSCSI or NFS.
  • We are now convinced that deduplication is rather expensive in I/O and RAM usage; the downside of leaving it off is that we need more disks (even though in our situation of almost 30 near-identical VMs it could come in handy).
  • Since the impact of lz4 compression is not that big on modern hardware, we are going to enable it (and of course we will also test both ways).
  • To connect all the drives we will use two LSI 9201-16i cards and spread the drives evenly over them, so every mirror has one drive on each card, making it more resilient (to both drive and card failure).
  • We will leave out a dedicated SLOG for the ZIL since the whole array is made up of SSDs. I have read about cases where mirrored SSDs used as a SLOG became the bottleneck on full-SSD arrays. Since it's a PoC we can still change this.
  • We will install a minimum of 128GB of DDR3 ECC memory instead of 64GB:
    16x 8GB rather than 8x 16GB (favouring memory bandwidth over future expansion headroom).
Is there any indication of what kind of performance to expect from such a storage array?

Simply summing the total possible IOPS of 50% of the SSDs is of course not realistic: we have to deal with the maximum network throughput at 10Gb/s, the type of data (cached or not), CPU speed, PCI bottlenecks and the maximum speed of the LSI HBA at 6Gb/s per SAS2 lane. But still, is there some kind of reference/indication?
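For our own raw numbers we will probably start with something simple on the FreeNAS box itself before measuring from the XenServer side (just a sketch; the pool/dataset path is an example):

  dd if=/dev/zero of=/mnt/tank/test/ddfile bs=1M count=16384   # sequential write, bypassing the network
  # note: /dev/zero compresses away to nothing under lz4, so repeat with real VM images for an honest number
  zpool iostat -v tank 5   # per-vdev throughput and IOPS while the XenServer VMs generate load
  gstat -p                 # per-disk %busy on the physical SSDs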

If it performs much better, as we expect it will, we can decide to buy an enterprise device to replace our current storage (NetApp). Alternatively, if the budget is not sufficient and the risk is acceptable, we may put the PoC into production for the RDS servers only (i.e. servers that are not mission-critical and can be replaced/rebuilt if needed).
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
1. The drives won't be sd* (FreeNAS isn't linux)
2. You won't run any commands, you'll do it all through the GUI unless you want to make a bigger clusterfuck than you already have.

You are stating/making so many mistakes that it's hard to believe you've read any of the good documentation out there (or at least understood it); you really need to do some more research.

1. Personally I'd split the drives into two zpools to limit the failure domain (however failure may happen), although all in one pool would be OK, I guess; it's just that if anything happens to that single pool you have lost all the data.
2. Dedup is a non-starter barring exceptional circumstances, and this isn't one of those. (Again, if you'd researched well you'd already know this.)
3. Try a SLOG either way; I'd expect you not to need it, but see how you get on.
4. The more RAM the better (up to a point).

How fast will it go and how many IOPS? Too many variables to give an indication; only your own testing with your working set will answer that.

Sorry to be blunt, but it's painful to read the same errors being repeated over and over. And the current system you have is underpowered and completely incorrectly configured.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just wondering, did you ever get a TrueNAS system? You know that iXsystems has a "try before you buy" program for TrueNAS, right?
 

flcn_

Cadet
Joined
Jul 28, 2015
Messages
7
Hi Cyberjock,

Sorry for my "late" response, we are travelling USA for a month (now in SanFrancisco).

We contacted sales @ TrueNas for a solution to our business case with the same background story. The offer we got had a to high price for our budget , certainly a reasonable offer but more compared to the kind of budget we have when we replace our enterprice wide storage. This is only for our RDS environment.

The sales representative replied and asked for our budget. On which I answered. He couldn't offer something better for that budget and we conclude that a self build best suites the case.
 