Building a high-performing datastore for ESXi hosts.

Status
Not open for further replies.

VSXi-13

Dabbler
Joined
Sep 17, 2014
Messages
15
Primary Goal: To build a FreeNAS box and serve out shared storage over a dedicated 10Gbps storage network to a dual-socket E5-2670 system. This system will be running ESXi 6.0 and will primarily be used for some very abusive SQL Server testing scenarios. This is educational for me, as my workplace does not have a dedicated test environment for this sort of testing.

Secondary Goal: This server will become my primary FreeNAS box at my house. It will still need to act as a NAS in addition to its SAN duties. My existing system (AsRock C2750, 4x4 TB, 16 GB RAM) will likely be relegated to a local ZFS send target or a secondary datastore for my ESXi hosts.

Questions/Thoughts:

Given the amount of abuse this datastore will take, I am assuming that I will need to go with a fairly high-performing build in order to get the results I'm looking for and to make sure the datastore provided by FreeNAS is not the bottleneck.

Here are my initial thoughts (nothing is set in stone):
  • Processor:
    • Xeon D-1537/D-1541 8-core SoC.
      • Pros: All-in-one; most come with a controller that should be flashable to IT mode for a FreeNAS build. Low power draw. Up to 128 GB of DDR4 RAM.
      • Cons: I don't know how CPU-intensive FreeNAS is when there is heavy data I/O going on, so the processor, despite having 8 cores, may be a little weak. More information is needed on this.
    • Xeon E5-1650
      • Pros: 6 cores, extremely fast (3.5 GHz), plenty of PCIe lanes for future upgrades. Can take up to 256 GB of RAM, and 128 GB is more cost-effective here than 128 GB on a Xeon D.
      • Cons: Uses more power. Cost is similar to the Xeon D once you add in a motherboard.
    • Xeon E3 v5:
      • Cheaper, can take up to 64 GB ECC.
      • Limited expandability compared to other processors.
  • RAM
    • Uncertain how much to go with. Obviously more is better, but it would be good to know what others are seeing for particular workloads at particular amounts of RAM. I'm thinking 128 GB should be sufficient to drive the I/O I want, but once again I'm uncertain; maybe 64 GB would be enough. If 64 GB can work, is it better to go with an E3 v5 build instead?
  • Motherboard:
    • Going with some manner of SuperMicro.
  • SLOG
    • Given that I will be serving content via iSCSI and want to make sure that sync writes are on, I will need a very low-latency SLOG. Based on what I've read on these forums and on Reddit, it appears a DC S3500 or DC S3700 is the best option here. The SLOG doesn't have to be huge.
  • Storage Media
    • I would like to go with an SSD-based pool and keep the spinners for backups and media. I know there are concerns about ZFS burning out SSDs, but given that I'll have a backup strategy in place, I will be fine.
    • I am uncertain which SSDs would work well here. Any suggestions?
  • L2ARC
    • Given that I want to use an SSD-based pool, I am uncertain how much an L2ARC will gain me, but I am definitely up for creating one.
    • Once again, which SSD would work well here is another question.
  • Configuration of zpool and vdevs.
    • Based on everything that I have read, I have decided to go with a raidz3... just kidding -- I will be using mirrored vdevs. That said, the number of disks to use in each vdev is apparently a debated subject, but my plan is to go with four. I believe this makes it a striped/mirrored layout, which should give better performance (please correct me if I'm wrong). I decided to test this out in VMware Workstation, so I created the following:
      • [screenshot: creating the new volume]
    • To add a vdev to the zpool, I chose the existing volume name to extend and added my new disks to it:
      • [screenshot: extending the existing volume with the new disks]
    • Is this how two vdevs of four disks should look in the output of zpool status? (Ignore the spinning rust.)
      • [screenshot: zpool status output showing the resulting layout]
    • I just want to make sure that I'm creating this correctly, as my existing system is simply a single RAIDZ2 vdev. (A rough sketch of the equivalent shell commands follows this list.)
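For reference, here is a minimal sketch of what I believe the equivalent commands would be from the shell. The pool name "tank", the zvol name "esxi-zvol", and the device names (da0-da3 for the SATA SSDs, nvd0/nvd1 for NVMe) are all placeholders; on FreeNAS I would normally do this through the GUI rather than by hand.

Code:
# Create the pool with one two-way mirror vdev
zpool create tank mirror da0 da1

# Extend the pool with a second mirror vdev; writes now stripe across both vdevs
zpool add tank mirror da2 da3

# Optional devices: a small low-latency SLOG and an L2ARC cache device
zpool add tank log nvd0
zpool add tank cache nvd1

# Force sync writes on the iSCSI zvol so the SLOG is actually exercised
zfs set sync=always tank/esxi-zvol

# Verify the layout: two mirror vdevs plus the log and cache devices
zpool status tank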
I am interested to hear any feedback from those who have attempted to set up something similar or have experience with it. I've researched this to the best of my ability, but I'm at the point where I would normally start bouncing ideas off of others. I appreciate any feedback the FreeNAS community can give me.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I'm not qualified to give any real feedback on this type of system, yet I'm looking forward to learning from this thread :]]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There's no such thing as a striped vdev. You can do a four disk mirror, but that's simply mirroring a disk four ways. A striped mirror of four disks can only possibly be two vdevs.
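In zpool terms (device names are just placeholders), the difference looks like this:

Code:
# One vdev: a single four-way mirror -- capacity of one disk, every write hits all four
zpool create tank mirror da0 da1 da2 da3

# Two vdevs: two two-way mirrors, striped -- the "striped mirror" layout
zpool create tank mirror da0 da1 mirror da2 da3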

I've been running both the Intel 535 and Samsung 850 Evo's for cheap (not bottom of the barrel) storage. My theory is that 150TBW for the $150 500GB model is sufficiently large that if I do burn it out, it'll be in a year or three, when the cost of these has dropped to $75, and at these price points, they're nearly disposable. All you really have to do is remember that it was only a few years ago that $1/GB was a good price for crap-grade SSD. And that's the thing... move towards a mindset where you're recognizing that these things ought to be viewed as disposable. What's likely to happen is that the SSD's will actually outlive the 150TBW and life'll go on. Anyways. :)

The E5-1650v3 is nice. The Xeon D is probably nice.
 

VSXi-13

Dabbler
Joined
Sep 17, 2014
Messages
15
There's no such thing as a striped vdev. You can do a four disk mirror, but that's simply mirroring a disk four ways. A striped mirror of four disks can only possibly be two vdevs.
Ok, I think I get it, but just to clarify: in the screenshots from my initial post, am I essentially creating two vdevs at once, each one consisting of a mirror?

I've been running both the Intel 535 and Samsung 850 Evo's for cheap (not bottom of the barrel) storage. My theory is that 150TBW for the $150 500GB model is sufficiently large that if I do burn it out, it'll be in a year or three, when the cost of these has dropped to $75, and at these price points, they're nearly disposable. All you really have to do is remember that it was only a few years ago that $1/GB was a good price for crap-grade SSD. And that's the thing... move towards a mindset where you're recognizing that these things ought to be viewed as disposable. What's likely to happen is that the SSD's will actually outlive the 150TBW and life'll go on. Anyways. :)

The E5-1650v3 is nice. The Xeon D is probably nice.

Thanks for the info on the SSDs. Any recommendations on the amount of RAM, based on your experience? (I've seen you on a lot of the threads where people are trying to do higher-end builds.)
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
  • SLOG
Given that I will be serving content via iSCSI and want to make sure that sync writes are on, I will need a very low-latency SLOG. Based on what I've read on these forums and on Reddit, it appears a DC S3500 or DC S3700 is the best option here. The SLOG doesn't have to be huge.
I'm interested to see how this drive performs if you end up getting it. Are there any product comparisons available for it and a P3700?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Thanks for the info on the SSDs. Any recommendations on the amount of RAM, based on your experience? (I've seen you on a lot of the threads where people are trying to do higher-end builds.)

As much RAM as you can afford. Typically, 128 GB of RAM is a very good starting place if you plan to do heavy stuff (like SQL stuff). SQL-type workloads are very ugly to virtualize, so more RAM would be strongly recommended. ;)
 

VSXi-13

Dabbler
Joined
Sep 17, 2014
Messages
15
I'm interested to see how this drive performs if you end up getting it. Are there any product comparisons available for it and a P3700?
I was looking at that last night. I saw a couple of reviews where the P3700 seems to handily outperform the S3700. The reviewers pointed to the lower latency from the P3700 being on the PCIe bus, along with its higher overall throughput, as the major reasons it does so well. I believe both are capacitor-backed. The P3700 is pricey, but for the testing I'm looking at doing, it may be necessary.

As much RAM as you can afford. Typically, 128 GB of RAM is a very good starting place if you plan to do heavy stuff (like SQL stuff). SQL-type workloads are very ugly to virtualize, so more RAM would be strongly recommended. ;)

Based on the amount of troubleshooting I've had to do for customers who virtualize their database server without telling us and then complain when their application runs like garbage, I agree. End goal would be to have most of the database either in ARC or L2ARC on the storage side.
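Once it's running, I'm assuming I can spot-check how much of the database is actually being held in cache via the ZFS sysctls (these are the FreeBSD kstat names as I understand them):

Code:
# Current ARC size and ceiling, in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max

# L2ARC size plus ARC hit/miss counters, to see how well the cache is working
sysctl kstat.zfs.misc.arcstats.l2_size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses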

After researching the joys of the DC S3700 and P3700 yesterday, I was thinking about running a P3500 for L2ARC, as the P3700 review noted that if you're not using the drive for write-intensive 4K work, the reads perform almost as well as the P3700's, which seems ideal for L2ARC.

@cyberjock, do you think the processor in this case is overkill? Could I step down to a 4-core E5-1620 v3? In searching the forums and Google, I haven't been able to find much on processor requirements for ZFS when it's handling OLTP-type transactions. I was also thinking about stepping all the way down to the Xeon D line and going with either a 4- or 8-core variant, but I need to make sure there are enough PCIe lanes to handle potentially two PCIe x4 drives.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The P3500 is kinda pricey for L2ARC. The Intel 750's cheaper with similar performance. The Samsung 950 Pro is somewhat slower but offers more space at half the cost of the P3500.

As for the processor, the 1620's fine, until you start really pushing it. The 1650's an extra ~$300(?), but when you start looking at the cost of the overall system, ...

The Xeon D is going to be problematic in this sort of system because none of the current offerings are built to handle lots of add-on hardware. The right Xeon D would have sufficient CPU without any question.

There's not a lot of difference between OLTP and a VM datastore, so I can tell you that I'm vaguely regretting our E5-1650v3 for VM storage, but one of our other forum regulars, @titan_rw, has managed to burn up all six cores.

Overall I think it isn't worth the risk of running out of CPU.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Do you actually want an L2ARC in front of an SSD pool? If you have multiple vdevs, you'd have to have an incredibly fast L2ARC to actually be faster than going straight to the pool.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Do you actually want an L2ARC in front of an SSD pool? If you have multiple vdevs, you'd have to have an incredibly fast L2ARC to actually be faster than going straight to the pool.
It can still make sense. Say, heavy reads and writes simultaneously. Pool can service the writes while ARC/L2ARC does most of the reading (assuming roughly independent workloads).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Do you actually want an L2ARC in front of an SSD pool? If you have multiple vdevs, you'd have to have an incredibly fast L2ARC to actually be faster than going straight to the pool.

To expand upon @Ericloewe's point... the pool may well consist of low-grade SATA SSDs while the L2ARC could be fed from an Intel 750/Samsung 950 Pro with a gigabytes-per-second NVMe interface, which gives you the benefits of lots of pool IOPS. In the right environment it could make sense.

I've got so much L2ARC on our VM filer here that the entirety of the pool activity seemed to be just writes. I finally managed to ruin that by running updates on several dozen Windows VMs...
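If you want to watch that split yourself, per-vdev I/O statistics break the cache and log devices out separately from the data vdevs (pool name is a placeholder):

Code:
# Per-vdev I/O, refreshed every 5 seconds
zpool iostat -v tank 5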
 

VSXi-13

Dabbler
Joined
Sep 17, 2014
Messages
15
The reason for the L2ARC is what @jgreco is referring to. The SSD pool will likely be made of Samsung 850 Evos or maybe Intel 535s.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The reason for the L2ARC is what @jgreco is referring to. The SSD pool will likely be made of Samsung 850 Evos or maybe Intel 535s.

Ah, two of my favorite drives. The thing you're going to have to be careful about is that there'll be some balance point for L2ARC vs pool that "makes sense". It depends on your intended use, to be sure, but it isn't entirely clear to me where the balance point might be. When you look at a large FreeNAS system, let's say 2 x 10GbE, that's 20Gbit of I/O capacity, and a mere four mirror vdevs of SSD would exceed that (barely?) in capacity. As you give these devices breathing room, their ability to do simultaneous reads and writes improves. So a pair of Samsung 950 Pro NVMe's (~600K IOPS) or a pair of Intel 750 NVMe's (~860K IOPS) ... well, if you do the math, 860K IOPS and an average size-per-IO of 4K, that's 3.5GBytes/sec or about 28 gigabits of capacity there. So we've calculated "max practical L2ARC" (at least for a 2 x 10GbE system).

But the thing is, with the 850 Evo, which by the way many people seem to find is only reliable for about 50K IOPS, if you make an array of 24 of them (12 x mirrors), even at the pessimistic value of 50K IOPS you're talking nearly 20Gbps of write capacity. For reads, since ZFS can read from any of them, you've got somewhere between ~20Gbps and ~40Gbps of read capacity. It isn't clear to me that L2ARC would be meaningful on that kind of system.
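Spelled out as back-of-the-envelope shell arithmetic, using the rough figures above and assuming 4 KiB per I/O:

Code:
# A pair of Intel 750s as L2ARC: ~860K IOPS combined at 4 KiB each
echo $(( 860000 * 4096 ))                        # ~3.5 GBytes/sec
echo $(( 860000 * 4096 * 8 / 1000000000 ))       # ~28 Gbit/sec

# 24x 850 Evo as 12 mirror vdevs, pessimistic 50K IOPS per vdev for writes
echo $(( 12 * 50000 * 4096 * 8 / 1000000000 ))   # ~19-20 Gbit/sec of writes

# Reads can be serviced by any of the 24 drives
echo $(( 24 * 50000 * 4096 * 8 / 1000000000 ))   # ~39-40 Gbit/sec of reads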
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
But the thing is, with the 850 Evo, which by the way many people seem to find is only reliable for about 50K IOPS, if you make an array of 24 of them (12 x mirrors), even at the pessimistic value of 50K IOPS you're talking nearly 20Gbps of write capacity. For reads, since ZFS can read from any of them, you've got somewhere between ~20Gbps and ~40Gbps of read capacity. It isn't clear to me that L2ARC would be meaningful on that kind of system.

Even 50K seems optimistic; Anandtech's review shows about 26K for random reads and around 73K for QD1 random writes.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Even 50K seems optimistic; Anandtech's review shows about 26K for random reads and around 73K for QD1 random writes.

They're fulla crap. They're only doing three simultaneous I/Os. You're well into the overall system latency department when you do something stupid like that.



It's like punching yourself in the privates and then complaining that you can't run fast.

For the average workload on a hypervisor datastore, there's going to be plenty of parallelism going on, both because the device isn't being accessed by a single VM and because ZFS can be way aggressive about things like aggregating writes and prefetching. If you're running one single-threaded program on one VM on one hypervisor and it is the only thing going on with the datastore, maybe then you've got a problem.
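If anyone wants to see the difference for themselves, something along these lines (fio is installed separately; the target path is a placeholder) drives a deeper queue and several workers, which is much closer to what a busy datastore sees than a QD1/QD3 benchmark:

Code:
# 4K random reads at queue depth 32 with 4 workers
fio --name=randread --ioengine=posixaio --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --size=4g --runtime=60 --time_based \
    --filename=/mnt/tank/fiotest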
 