BUILD Virtualization/File server build

Status
Not open for further replies.

clicks

Dabbler
Joined
Jan 29, 2015
Messages
11
I'm restructuring our company's storage and virtualization systems and came up with a new setup. I hope to hear your comments on it, and I have some questions:

VM Storage (VS):
1x SYS-2028R-E1CR24L
2x Intel® Xeon® Processor E5-2623 v3 (10M Cache, 3.00 GHz)
2x 16GB LRDIMM DDR4-2133, ECC
3x M1015
8x Samsung SSD PM853T (Mirror)

Backup Storage (BS):
1x Supermicro SC826E16-R500LPB (X9SRL-F)
1x Xeon E5-2603
1x 24 GB DDR3, ECC
12x WD Red 3 TB (Mirror)
1x Intel SSD 750 (SLOG) (if required)
1x Intel SSD 750 (L2ARC) (if required, yes I know more RAM needed as well)

External Backup (EB):
1x SYS-6048R-E1CR24L
2x Intel® Xeon® Processor E5-2623 v3 (10M Cache, 3.00 GHz)
2x 16GB LRDIMM DDR4-2133, ECC
1x M1015
12x 4 TB HDD (RaidZ2)

The VS will offer shared storage via NFS to the virtualization hosts (not listed above). We will start with 8 of the 24 slots populated to leave some reserve for future growth (and to keep the budget on track :))
The BS, on the other hand, will offer SMB shares for network drives (user files etc.). This way I can offer fast SSD-based storage for virtualization, which sees a lot of random access, and HDD-based file services, which are more sequential and don't have to be as fast.
Furthermore, BS will serve as a backup system for VS in case it breaks. Hence I plan to replicate the SSD pool to the HDD pool of BS. Of course this would be much slower, but in case of a system failure we would be up and running (or rather, walking) within a short time frame.

As VS and BS will be in the same rack, I also planned an additional, physically separated external backup system (EB), which will collect replications from both VS and BS (and again could be used to mitigate hardware failures of the primary systems).
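To make the replication chain concrete, here is a minimal sketch of the snapshot-and-send cycle I have in mind (pool, dataset and host names are placeholders; in practice FreeNAS's periodic snapshot and replication tasks would drive this):

  # on VS: snapshot the VM dataset and push the increment to BS
  # (the very first run is a full send, i.e. without -i)
  zfs snapshot ssdpool/vms@auto-20150801
  zfs send -i ssdpool/vms@auto-20150731 ssdpool/vms@auto-20150801 | \
      ssh backup-storage zfs receive -F hddpool/vms-replica

  # on BS: forward the same snapshots to the external backup box (EB)
  zfs send -i hddpool/vms-replica@auto-20150731 hddpool/vms-replica@auto-20150801 | \
      ssh external-backup zfs receive -F ebpool/vms-replica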

In case you are wondering why BS uses such old parts compared to the VS and EB systems: it is simply an existing server that is young enough to be repurposed.

a) Does this system make sense? Do you see any pitfalls?
b) Is it possible to use the LSI 3008 integrated in the Supermicro boards?
I have seen jgreco using a system with the 3108 chips.
c) Does it make a difference to us whether the 8 mirrored drives form 1 vdev or 4 vdevs?
Adding new drives later on would presumably be done in 2-drive batches.
d) Can I mix drives in a mirrored setup for future expandability?
e) The 10Gbit Intel X540 cards are supported by FreeNAS, aren't they?
f) Memory too low? (I think I know the answer...)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Your BS server has zero use for L2ARC or SLOG as a backup server.

As for the VS server, you need more RAM than just 32GB. I wouldn't try less than 64GB, and I wouldn't try without an L2ARC. Also, if you plan to use NFS for the VMs, an SLOG is mandatory unless you are cool with 5MB/sec throughput for your VMs.
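To put that in context: the NFS clients in most hypervisors issue sync writes, so every write waits on the ZIL. A minimal sketch of what checking and fixing that looks like, assuming a pool named tank and an NVMe SSD showing up as nvd0 (the usual FreeBSD device node):

  # see how sync writes are handled on the VM dataset (dataset name is an example)
  zfs get sync tank/vms

  # attach the NVMe SSD as a dedicated log (SLOG) vdev
  zpool add tank log nvd0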

The 3008 is not well supported, so no you can't use it if your data is important. Search the forum for the gravestones of people before you.

Mirrors are the only way to get good performance with VMs, period. You also want as many vdevs as possible. Generally speaking, if you plan to run more than 3-4 VMs in anything except low-workload conditions, you'll probably need more than 4 vdevs. iXsystems doesn't really sell servers for VMs with anything less than 5 vdevs, and even then it is made clear that with that few vdevs you run the risk of not having "great performance". It just depends on what your storage requirements will be for your VMs, as that will dictate your hardware needs.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
VS:
More RAM. I wouldn't even try with 64GB if you need that much performance. 8x 16GB DIMMs may offer a more cost-effective entry with higher throughput (quad-channel instead of single-channel). Also, I'd rather go with the E5-2620 v3 (same DIMM speed, lower TDP, more overall performance) or the 2637 v3 (highest DIMM speed possible atm, even higher single-core speeds).

EB:
Same goes for this box. Do the HDDs already exist as well? If not, I'd go for 7200rpm enterprise-grade drives.

BS:
The SLOG would rather be for VS. SATA SSDs are kind of slow for sync writes - not 5MB/s slow, but an NVMe-based SSD would be about 2-2.5 times faster. L2ARC seconded, but a Samsung 850 or Crucial BX100 would be enough for that.

b) I would never use crossflashed used cards in production systems. Order Supermicro AOC-S2308L-L8e cards with those systems, which are LSI 2308-based HBAs. 12Gbps offers too many pitfalls for now.
c) Each mirror set is its own vdev (see the zpool sketch below).
d) Yup.
e) jpaetzel reported that there are issues with the X540. A Chelsio T420-BT may be a safer option for 10Gbase-T.
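To illustrate c) and d), a rough sketch of how a pool of mirror pairs is built and later grown two drives at a time (device names are examples; you can mix drive sizes, but each mirror is limited by its smaller disk):

  # 8 SSDs as 4 mirror vdevs - every "mirror x y" group is one vdev
  zpool create tank \
      mirror da0 da1 \
      mirror da2 da3 \
      mirror da4 da5 \
      mirror da6 da7

  # later expansion in 2-drive batches: just add another mirror vdev
  zpool add tank mirror da8 da9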

My recipe for this:
VS:
1x Supermicro SC216BA-R920LP
1x Supermicro X10DRi or X10DRi-T if you want to risk the X540 NIC
2x Intel E5-2620 v3 or E5-2637 v3
8x Samsung M393A2G40DB0-CPB 16GB DIMMs
3x Supermicro AOC-S2308L-L8e
2x Supermicro SSD-DM064-PHI for boot
1x Intel 750 or P3700 400GB for SLOG
In any case, this is also a candidate for the Ultraserver barebone lineup. The price difference isn't all that big, but you can choose an onboard NIC option of 4x 1Gbase-T, 2x 10Gbase-T, 4x 10Gbase-T or 2x SFP+. The backplane is direct-attached as well.

EB:
1x Supermicro SC846BE16-R920
1x Supermicro X10DRi or X10DRi-T if you want to risk the X540 NIC
1x Intel E5-2620 v3
2-4x Samsung M393A2G40DB0-CPB 16GB DIMMs
1x Supermicro AOC-S2308L-L8e
2x Supermicro SSD-DM064-PHI for boot

BTW, iXsystems also ships to Europe. If you aren't bound to sysgen due to locality or existing contracts, it might be an interesting option, especially since an expansion to Europe is planned. While someone working alone might be able to set up a working solution, their team knows the specifics of FreeNAS and can plan systems to your satisfaction.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have gotten a pretty good laugh at the people pushing quad channel memory. Yes, it's faster. But let's put it in perspective.

DDR3-1333 (which you probably wouldn't buy for a server these days anyway) does 10.7GB/sec of throughput per channel. So just two channels is 21.4GB/sec. I don't know about you, but most people I know, even higher-end users, don't hit speeds that would make dual channel slow. The real performance gain comes from having more RAM, not necessarily from having more channels.

At DDR3-1600 (which is what many people buy these days), you are now talking 12.8GB/sec of throughput... for a single channel.
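For reference, the back-of-the-envelope math behind those per-channel numbers (a 64-bit channel moves 8 bytes per transfer):

  DDR3-1333: 1333 MT/s x 8 bytes = ~10.7 GB/sec per channel
  DDR3-1600: 1600 MT/s x 8 bytes = ~12.8 GB/sec per channel
  DDR4-2133: 2133 MT/s x 8 bytes = ~17.0 GB/sec per channel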

Still think triple and quad channel is important? I don't....

Pretty funny that every time I disappear for a few weeks somehow, somewhere, this falsehood starts spreading like wildfire. I clarify it and people are all "oh.. yeah.. I forgot" and I have no doubt that if I took another vacation in 2 months that people would be arguing for quad channel all over again. :P

Even if you consider theoretical versus actual numbers, you'd be pretty disappointed by how little the number of channels affects ZFS, while the quantity of your ARC plays a massive role in actual zpool performance.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
Then there's still the cost and latency of LRDIMMs versus RDIMMs. LRDIMMs only gain with 3DPC setups... and that usually means 12 DIMMs per CPU, aka 768GB of RAM in a single dual-socket host. Which is a little over the top, even for ZFS. In case you didn't notice, this is going to be a system running 24 direct-attached SSDs - so I wouldn't build it with performance pitfalls that even cost more money.

At 2DPC, RDIMMs are cheaper and faster. Then there's the amount of RAM: 64GB is a starting point for such a massive filer, so realistically you're at 128GB. That's either 4x 32GB LRDIMMs, which are slower, more expensive (and, with only 4 sticks, populate fewer channels), or 8x 16GB RDIMMs, which utilize all 4 channels at a lower price and with lower latency.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I just want to clarify a few things:

1. I'm not arguing that 8x16GB is worse than 4x16GB or 2x16GB. We all agree that more RAM can't be a bad thing.
2. I would go with a configuration that leaves some RAM slots unused. Populating every DIMM slot from the start means that any later upgrade requires swapping in larger, more expensive DIMMs instead of simply adding more 16GB sticks.
3. If you have 8 slots, then you have 4 channels. So 4 DIMMs (one in each channel) will yield full quad-channel for those interested. You only need 1 DIMM populated per channel to activate that channel.

Sure, even with SSD based zpools you don't want to have performance pitfalls. But let's be honest with a few points:

  • Rarely does anyone run more than a single 10Gb network card. So no matter how much bandwidth you have internally, you definitely aren't pushing more than 10Gb. I know people that have 2+ 10Gb links, and they either do iSCSI multipath or LACP. Virtually nobody ever sees more than a couple hundred MB/sec, even when moving VMs; there just isn't enough iops available for anyone I know to be upset about that. Even the all-SSD pools I've worked with that are pretty darn large don't do more than about 500MB/sec, because other VMs take away some of the IO of the zpool. I have no doubt someone can come up with an exception, but generally you start hitting other bottlenecks. For example, SSDs see a marked drop in throughput and iops if you connect them through a SAS expander in a second chassis, because of the extra distance and signal path in between. This is why the TrueNAS Z35 only allows L2ARC and SLOGs to be in the head unit; those devices are very sensitive to small amounts of latency. I'm not an iXsystems salesperson, but I believe that if you buy a Z35 you cannot put spinning disks in the head unit with the default designs. Not that they aren't compatible, but the head unit slots are reserved for SSDs.
  • Generally SSDs are used for VM storage, and IO is the limiting factor, not throughput. I have yet to see a system where throughput is the limiting factor. People generally don't care about getting 200MB/sec of throughput; they *do* care about getting 200 iops versus 200,000 iops. Relatively modest throughput generally doesn't affect VM performance, but if your iops sucks, you will know. I've seen people do 18+ vdevs, 256GB of RAM, and over 512GB of L2ARC across 2+ SSDs and still find that ultimately they were IO-bound. Even watching zpool iostat -v (see the example below), their throughput wasn't really high second-for-second, but their iops was the limiting factor, because you only have so much of it to use at any given moment in time.
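If you want to see for yourself whether a pool is ops-bound or bandwidth-bound, watching per-vdev activity is enough (pool name is just an example):

  # operations and bandwidth per vdev, refreshed every 5 seconds
  zpool iostat -v tank 5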
I just look at the whole picture. As long as your bottleneck, whatever that may be, isn't something you are going to hit, then who cares?

It's like the people that are upset over PCIe 2.0 SAS controllers and feel they need a PCIe 3.0 SAS controller. At x8 you are talking 500MB/sec times 8 lanes (4GB/sec) with 2.0 versus almost 1GB/sec times 8 lanes (almost 8GB/sec) with 3.0. Even at 4GB/sec, I don't know anyone that would:

- Be upset that their zpool was bottlenecked by the PCIe bus at "just" 4GB/sec.
- Even be capable of making an argument that 4GB/sec is "too slow".

And consider what it would take to even hit that ceiling (the rough numbers are below):

- You are guaranteed to be talking about an all-SSD zpool to get 4GB/sec.
- You are guaranteed to be talking about a minimum of 4+ 10Gb cards just to saturate that kind of throughput over your network.
- You are guaranteed to be talking about a server built primarily for performance (size will be a secondary or lower priority), so you won't have hundreds of TB of data to scrub. Even with 50TB of data, at 4GB/sec (PCIe 2.0 saturation) you can scrub it all in less than 4 hours.
- Even if everyone in this forum had 10Gb NICs tomorrow, you'd be amazed at how hard it is to actually saturate a 10Gb LAN with traffic. I know lots of people that can't do it and found that trying to get there (which they were hoping for) was extremely cost prohibitive. I can only saturate 10Gb when testing from a RAM drive.
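The rough numbers behind those bullets, for anyone who wants to check them:

  PCIe 2.0 x8: 8 lanes x ~500 MB/sec = ~4 GB/sec
  PCIe 3.0 x8: 8 lanes x ~985 MB/sec = ~7.9 GB/sec
  50 TB scrubbed at 4 GB/sec: 50,000 GB / 4 GB/sec = 12,500 sec = ~3.5 hours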

So I'm betting on 2 things:

1. Anyone that falls outside what I just described is probably building lots of machines for redundancy and such, so a scrub taking 4 hours (or more) is unlikely to be a major problem for them.
2. Anyone that has these kinds of "bottlenecks" is definitely not going to be coming to the forums. Nobody in their right mind is going to drop what could easily be a $50k+ server at self-built prices and rely on a few of us to provide them with support. They're going to want a support contract with some big company - iXsystems, NetApp, QNAP, etc. - and will happily pay for someone to do the required tuning of the OS, ZFS, etc. to get those speeds, and to have someone to hold accountable if they aren't getting their money's worth from their 6- or 7-figure investment in their file server.

So I see no justifiable use case where someone has a valid argument that they are going to have problems with bottlenecks on PCIe 2.0 (or from missing out on a RAM channel or two). It's all about perspective and whether it really and truly is a problem.

To be honest, I bet that if a bug in FreeNAS suddenly dropped all PCIe links to 1.0 speeds, very few of us would even be able to prove we were bottlenecked. I know I couldn't. :P

To me, some of these alleged bottlenecks are things that nobody is going to see with anything that is currently available on the market; they won't have the necessary resources to saturate those links anyway, nor will they care that they aren't saturating some aspect of their system because of other limitations. I can't saturate Samba over 10Gb no matter what. I can saturate NFS over 10Gb, but only when using a RAM drive (not even multiple SSDs in a RAID will do it, due to micro-latencies). And anyone doing iSCSI won't saturate 10Gb, because ZFS can't efficiently cache iSCSI devices. I think too many people suffer from e-penis syndrome.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
The dual-CPU board I listed features 16 DIMM slots, which equals 256GB of RAM with 16GB RDIMMs. As far as I know, the TrueFLASH Z50 is sold with 256GB RAM as a seemingly fixed configuration (it doesn't come with expansion options either; the Z35 seems to be a "standard style" TrueNAS which would also run with HDDs and offers expansion options).

With PCIe throughput there's the pitfall of ZFS overhead. It's not a RAID controller, so if you do 2- or 3-way striped mirrors, you land at below 2GB/s of theoretical write throughput.

Last but not least: why buy the old, slow clunker for more money when you can have a shiny new car for less? That's what I was getting at. Would you bet on a used M1015, crossflashed to something it isn't, in a production system? I would not. I would buy the LSI HBA that comes with HBA firmware and doesn't need crossflashing. Also, SAS2008 HBAs are kind of EOL now; the 2308 has been sold for quite a while, in lots of systems. The price differences are white noise. Again, why pick the older, slower clunker here? It's a production system, not your average home user whose wife will leave him if he spends $1 more than necessary... :P
 

ALFA

Explorer
Joined
Aug 23, 2014
Messages
53
This is a very informative thread; I like all the information that you give us. Many thanks to @cyberjock and @marbus90.

@marbus90, one question (and sorry to hijack your thread, @clicks :p):

Is 128GB of RDIMMs (8x 16GB) better than 256GB of LRDIMMs (8x 32GB), for example? And what is the difference between LRDIMMs and RDIMMs (besides LRDIMMs being more expensive, using fewer channels, and having higher latency)? I mean, why are they more expensive? Is there any real benefit to using LRDIMM-type memory over RDIMMs?

thanks guys, and thanks to this great community ;)
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
If you've got the money and expect you'll actually use 256GB rather soon, go LRDIMMs. If you compare 8 RDIMMs with 8 LRDIMMs you get the same channel usage; the thing is, I'm comparing total amounts of memory, not 8x16GB with 8x32GB. LRDIMM only gains its latency advantage over RDIMM in 3DPC setups, which means 384GB of RAM per CPU. So the main reason for LRDIMMs is density. But you'd have to start with a fair number of DIMMs anyway, because with 24 DIMM slots across 2 CPUs you can already reach 384GB using 16GB RDIMMs.

So: If you think you'll need 768GB RAM in the near future, go for it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
marbus90 said:
With PCIe throughput there's the pitfall of ZFS overhead. It's not a RAID controller, so if you do 2- or 3-way striped mirrors, you land at below 2GB/s of theoretical write throughput.

Yes, but you are totally ignoring the fact that you are almost certainly running 10Gb of throughput on the networking side, max. 2GB/sec is well beyond what a 10Gb network pipe can push (roughly 1.25GB/sec). Even with 2x 10Gb, you are going to be IO-bound before you are bandwidth-bound. I thought I explained, very well, in my previous post that there is much more to this than saturated throughput over PCIe.

I'm sure we won't agree on this. And that's fine. I just know where my smart money would go. ;)
 

clicks

Dabbler
Joined
Jan 29, 2015
Messages
11
Thanks for your feedback. I'm glad to hear that there seems to be no major misconfiguration or misconception.

OK, more RAM for the VS (128GB) and EB (64GB) is set.
I initially chose the E5-2623 for its higher clock rate, as I read somewhere in this forum that it is important for SMB file sharing.
As the Supermicro boards come with integrated X540 ports, I would try to use those first; Chelsio cards would be the backup solution if the X540 turns out to be unusable.
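A quick sanity check once the box is up, assuming the Intel ixgbe driver attaches (the X540 ports then show up as ix0/ix1):

  # list PCI network devices and the drivers bound to them
  pciconf -lv | grep -B3 -i ethernet

  # confirm the first 10Gb port is up
  ifconfig ix0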
I'm a bit surprised that the SSD-based VS system would profit from an SLOG device. I will get one Intel 750 400GB and test it.
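A crude before/after test along these lines should show whether the SLOG earns its slot (pool, dataset and device names are placeholders):

  # throwaway dataset that forces every write through the ZIL
  zfs create tank/slogtest
  zfs set sync=always tank/slogtest
  zfs set compression=off tank/slogtest   # so the zeros below aren't compressed away

  # run once without a log device, then again after adding the Intel 750:
  #   zpool add tank log nvd0
  dd if=/dev/zero of=/mnt/tank/slogtest/test.bin bs=128k count=20000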
NFS is slow for VM storage? Will iSCSI be faster? From what I have read in this forum, iSCSI is the more problematic option (fragmentation etc.).

Actually, I already contacted iXsystems and asked for a quote on a TrueNAS system, but they are a bit too "enterprisy" for our needs. High redundancy is nice but also very costly. And honestly, I don't feel good investing in fast enterprise SAS spinners when even "cheap" enterprise SATA SSDs blow them out of the water. On the other hand, the TrueFlash is too expensive for us.
Hence I prefer to use more "commodity" hardware, with the option to replace it in case of failure, and to strike a good balance between features and price.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, they are enterprisy. That's what they sell. ;)

For some people, though, being enterprisy has saved their butts. And SAS does provide some value; there are things that SAS does that SATA just doesn't.

I've known quite a few people that, as a result of a single failing hard drive in their zpool, had every VM that was stored on their FreeNAS server go offline. This was because IO was stalled, which caused all the VMs to disconnect. I have yet to see that on SAS.

There's other pitfalls too that come with supporting your own hardware. What if the NIC suddenly has a compatibility issue? Will you be able to prove it is the driver? Will you even know how to fix it? Will you even be able to fix it? Are you okay with potentially taking downtime measured in days while the bug is worked out, fixed, and pushed out officially on FreeNAS?

Yes, SAS can be more expensive, and support contracts for hardware can be more expensive. But for many people it's easier to pay someone else and make it their problem when the NIC suddenly stops working, when the zpool suddenly goes offline and you don't know if the data is gone forever or can be fixed with a one-line command, or when the system starts acting erratically and you have no clue where to go.

I remember one person who called iX for a quote. He got the quote and decided to build and support it himself to save some cash. Well, things went badly for him, and he ended up calling us for a one-time contract. Those kinds of things are handled on a best-effort basis, and it takes time for Sales to draw up the contract, collect the money, etc. Well, this guy was flipping out. He was upset because 15 minutes had gone by and Sales hadn't contacted him yet (it wasn't even normal working hours, so Sales wasn't in the office). Ultimately he threw out the big card: "we're losing over $15k/hour in lost revenue while we are down". Suddenly the savings he thought he was going to enjoy by supporting it himself were gone in about 2 hours, and he ended up being down for 2 or 3 days if I remember correctly. He happily bought a TrueNAS HA and a 24x7 support contract after losing a boatload of money. He is actually pretty cool, and when he puts in a ticket I happily help him. He's no slouch, but he doesn't have the time or resources to deal with issues directly the way a real support contract can.

Just keep in mind that if you do it yourself, you are also accepting responsibility for it. You never know what might go badly, and if time = lots of money, then you should definitely factor that in. I can pretty much guarantee you that at some point in the first year you will see some kind of problem. Whether it's a hardware failure, a software bug, something you misconfigured by accident, or something that changed in the software that you weren't aware of, it *will* come. The questions will be: how will you determine what the problem is, how will you recover from it, how long will it take to recover, and how much money will be lost while you are down? ;)

I do wish you luck in whatever choice you make. :)
 

clicks

Dabbler
Joined
Jan 29, 2015
Messages
11
I completely agree with you, and I didn't want to imply that iXsystems' offer was expensive or over the top.
I know the advantages of SAS over SATA and understand the price gap between them.
But on the other hand my budget is limited (like everyone else's), and I have to find a way to balance hardware and software features against cost. Buying a "proprietary" solution might give me a solid (mid-sized) storage base, but it leaves no room for decent virtualization hosts and/or proper backup storage.

Although it is a shame to admit, even the (self-built) solution will be a huge leap forward in performance and security compared to the current situation.
 

trumee

Explorer
Joined
Jun 29, 2015
Messages
68
@cyberjock @marbus90 Sorry to post in this old thread. I am planning which RAM to buy for the same motherboard (X10DRi-T). This motherboard has 16 slots for two CPUs. The motherboard specifications say 'Up to 2TB ECC 3DS LRDIMM, 1TB ECC RDIMM'.

To start, I will only populate a single CPU socket. Also, I don't need more than 64GB at present, hence I was thinking of buying denser RAM and thus keeping more slots free for future upgrades. I was planning to start with 2x 32GB for my single CPU.

32GB costs:
RDIMM, 1x32GB Samsung M393A4K40BB0-CPB, 2R, costs $157
LRDIMM, 1x32GB Samsung M386A4G40DM0-CPB, 4R, costs $192

16GB costs:
RDIMM, 1x16GB Samsung M393A2G40DB0-CPB, 2R, costs $74.98
LRDIMM: couldn't find a 16GB Micron/Crucial stick


Since the cost difference between the 32GB RDIMM and the 32GB LRDIMM is not significant, which of the two is the better choice?

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you plan to go very large with RAM (32GB+ sticks in every RAM slot, fully populated), LRDIMMs are the better choice.
 