Build for ESXi - All SSDs - About 9K

Chris Kuhn · Aug 3, 2017

Here is what I am considering for the build:

SuperServer 1028R-WTRT, 1U, Intel C612, 10x SATA, 16x DDR4, Dual 10Gb Ethernet, 700W Rdt PSU
2 x Xeon E5-2609 v4 Eight-Core 1.7GHz, LGA 2011-3, 6.4 GT/s QPI, 20MB L3 Cache, DDR4, 14nm, 85W, Retail Processor
64GB (8 x 8GB) Dual-Rank, DDR4 2400MHz, CL17, ECC Registered Memory
Arctic Silver 5, 12g, High-Density Polysynthetic, Silver Thermal Compound
2 x SSD-DM128-SMCMVN1 SuperDOM 128GB SATA Disk on Module, 520/180 MB/s, Retail
9 x 1TB 850 Pro 7mm, 550 / 520 MB/s, 3D V-NAND, SATA 6Gb/s, 2.5-Inch Retail SSD
400GB DC P3500 1/2 Height, 2200 / 1000 MB/s, MLC, PCIe 3.0 x4 NVMe, AIC OEM SSD
10/40Gbps Ethernet Converged Network Adapter, X710-DA2, (2x SFP+)

I was planning on using the 2x SSD SuperDOMs for FreeNAS (Mirrored).
8 of the Samsung SSDs for the primary data store, at the highest level on ZFS for redundancy (So whatever the RAID6 or RAID10 equivalent is) expecting to only use about 2 TBs after de-dup.
1 of the other Samsung SSDs for ZIL.
400GB DCP3500 for the SLOG.
One 10GBPS Fiber link as primary and one as backup in case of failure.

I am open to any suggestions on changing this as long as it stays around the 9k price-tag this is currently at.

Thanks for your help and suggestions in advance.

Spearfoot · Aug 3, 2017

Chris Kuhn said:
Here is what I am considering for the build:

SuperServer 1028R-WTRT, 1U, Intel C612, 10x SATA, 16x DDR4, Dual 10Gb Ethernet, 700W Rdt PSU

2 x Xeon E5-2609 v4 Eight-Core 1.7GHz, LGA 2011-3, 6.4 GT/s QPI, 20MB L3 Cache, DDR4, 14nm, 85W, Retail Processor

64GB (8 x 8GB) Dual-Rank, DDR4 2400MHz, CL17, ECC Registered Memory

Arctic Silver 5, 12g, High-Density Polysynthetic, Silver Thermal Compound

2 x SSD-DM128-SMCMVN1 SuperDOM 128GB SATA Disk on Module, 520/180 MB/s, Retail

9 x 1TB 850 Pro 7mm, 550 / 520 MB/s, 3D V-NAND, SATA 6Gb/s, 2.5-Inch Retail SSD

400GB DC P3500 1/2 Height, 2200 / 1000 MB/s, MLC, PCIe 3.0 x4 NVMe, AIC OEM SSD

10/40Gbps Ethernet Converged Network Adapter, X710-DA2, (2x SFP+)

I was planning on using the 2x SSD SuperDOMs for FreeNAS (Mirrored).
8 of the Samsung SSDs for the primary data store, at the highest level on ZFS for redundancy (So whatever the RAID6 or RAID10 equivalent is) expecting to only use about 2 TBs after de-dup.
1 of the other Samsung SSDs for ZIL.
400GB DCP3500 for the SLOG.
One 10GBPS Fiber link as primary and one as backup in case of failure.

I am open to any suggestions on changing this as long as it stays around the 9k price-tag this is currently at.

Thanks for your help and suggestions in advance.

Welcome to the Forums!

Just to be clear, 'ZIL' and 'SLOG' in this context are synonyms... you're talking about using a Samsung 850 Pro and/or an Intel DC P3500 as ZIL SLOG devices. These wouldn't make a good match if you plan on mirroring them; they differ too much in capabilities. Also, neither device is an optimal choice for a ZIL SLOG device. Quoting myself from another thread:

"Regarding ZIL SLOG devices: these need to have particular characteristics: power protection, low latency, fast write speed, and high durability. Though they'll 'work', your DC S35xx's aren't well-suited for this purpose. The S3500-series SSDs are optimized for reads; the S3700-series are optimized for writes, which makes them a better choice as a SLOG device. The good, better, and best Intel selections for SLOG devices run sorta like this:

Good: Intel SSD DC S3700 Series SATA, inexpensive, 10 drive writes per day
Better: Intel SSD 750 Series PCIe, mid-level price, 70GB writes per day
Best: Intel SSD DC P3700 Series PCIe, costly, 17 drive writes per day

All three have the power protection, low latency, fast writes, and high durability you need in a SLOG device."

Or do I misunderstand your intent?

On the subject of pool design: you will want to use mirrors when providing block storage to ESXi for virtual machines. Mirrors provide the most IOPS for any given number of disks, because IOPS scale by vdevs. Your 8 SSDs configured as 4 mirrored-pair vdevs will have 4 times the IOPS of the same 8 SSDs configured as a RAIDZ2 array. If you're really worried about redundancy, you can use 3-way mirrors... but these are horribly space-inefficient (33 1/3%). And that's probably over-kill, as SSDs aren't quite as failure-prone as spinning rust.

What's more... you need to keep capacity utilization low or performance will suffer. This means you shouldn't use any more than ~50% of your pool's capacity. So, 8 x 1TB SSDs layed out as 4 mirrored pairs would give you a total maximum usable capacity of ~2TB. Given your described use-case, you wouldn't have any spare capacity for future needs. If you were to use 9 x 1TB SSDs in a 3-vdev ,3-way mirror configuration, you'd only have ~1.5TB of usable space. This is daunting, I know...

Deduplication is expensive in terms of memory requirements. Given the small total data size you describe, 64GB may be enough RAM... or it may not. Perhaps someone more experienced with deduplication can chime in about this.

There is a tremendous amount of reading matter here on the forum regarding ZIL SLOG setup, block storage (iSCSI) configuration for ESXi use, etc. The old saying "Search is your friend" applies here. Look for posts by user @jgreco; of all the forum members, he probably has the most real-world experience on these subjects.

Good luck!

Chris Kuhn · Aug 3, 2017

Spearfoot said:
Welcome to the Forums!

Just to be clear, 'ZIL' and 'SLOG' in this context are synonyms... you're talking about using a Samsung 850 Pro and/or an Intel DC P3500 as ZIL SLOG devices. These wouldn't make a good match if you plan on mirroring them; they differ too much in capabilities. Also, neither device is an optimal choice for a ZIL SLOG device. Quoting myself from another thread:

"Regarding ZIL SLOG devices: these need to have particular characteristics: power protection, low latency, fast write speed, and high durability. Though they'll 'work', your DC S35xx's aren't well-suited for this purpose. The S3500-series SSDs are optimized for reads; the S3700-series are optimized for writes, which makes them a better choice as a SLOG device. The good, better, and best Intel selections for SLOG devices run sorta like this:

Good: Intel SSD DC S3700 Series SATA, inexpensive, 10 drive writes per day

Better: Intel SSD 750 Series PCIe, mid-level price, 70GB writes per day

Best: Intel SSD DC P3700 Series PCIe, costly, 17 drive writes per day

All three have the power protection, low latency, fast writes, and high durability you need in a SLOG device."

Or do I misunderstand your intent?

On the subject of pool design: you will want to use mirrors when providing block storage to ESXi for virtual machines. Mirrors provide the most IOPS for any given number of disks, because IOPS scale by vdevs. Your 8 SSDs configured as 4 mirrored-pair vdevs will have 4 times the IOPS of the same 8 SSDs configured as a RAIDZ2 array. If you're really worried about redundancy, you can use 3-way mirrors... but these are horribly space-inefficient (33 1/3%). And that's probably over-kill, as SSDs aren't quite as failure-prone as spinning rust.

What's more... you need to keep capacity utilization low or performance will suffer. This means you shouldn't use any more than ~50% of your pool's capacity. So, 8 x 1TB SSDs layed out as 4 mirrored pairs would give you a total maximum usable capacity of ~2TB. Given your described use-case, you wouldn't have any spare capacity for future needs. If you were to use 9 x 1TB SSDs in a 3-vdev ,3-way mirror configuration, you'd only have ~1.5TB of usable space. This is daunting, I know...

Deduplication is expensive in terms of memory requirements. Given the small total data size you describe, 64GB may be enough RAM... or it may not. Perhaps someone more experienced with deduplication can chime in about this.

There is a tremendous amount of reading matter here on the forum regarding ZIL SLOG setup, block storage (iSCSI) configuration for ESXi use, etc. The old saying "Search is your friend" applies here. Look for posts by user @jgreco; of all the forum members, he probably has the most real-world experience on these subjects.

Good luck!

Thanks for the advice - I have been reading and reading, just wanted to post my build for input. Appreciate it! I switched out the Intel SSD DC to the P3700 as recommended

I also bumped the RAM to 128 GB to be on the safe side.

I bumped the SSDs up to 10 instead of 9 - What would you recommend in terms of fault tolerance while leaving 2 TB usable?

Spearfoot · Aug 3, 2017

Chris Kuhn said:
Thanks for the advice - I have been reading and reading, just wanted to post my build for input. Appreciate it! I switched out the Intel SSD DC to the P3700 as recommended

I also bumped the RAM to 128 GB to be on the safe side.

I bumped the SSDs up to 10 instead of 9 - What would you recommend in terms of fault tolerance while leaving 2 TB usable?

You're very welcome!

128GB? Sounds good... it's hard to have too much memory! :D

With 10 SSDs I'd set up a pool with 5 mirrors. That would give you 5TB of total capacity = ~2.5TB of usable space without exceeding 50% utilization.

I'm hoping someone else with dedup experience will lend us the benefit of their knowledge. There's a good possibility you don't really need to use deduplication, but I'm sure this depends on your actual usage.

Chris Kuhn · Aug 3, 2017

Spearfoot said:
You're very welcome!
I'm hoping someone else with dedup experience will lend us the benefit of their knowledge. There's a good possibility you don't really need to use deduplication, but I'm sure this depends on your actual usage.

Right now on our current NetApp system for the volume that we are looking to move, this is where we are currently at for de-dup.

jgreco · Aug 3, 2017

Spearfoot said:
You're very welcome!

128GB? Sounds good... it's hard to have too much memory! :D

With 10 SSDs I'd set up a pool with 5 mirrors. That would give you 5TB of total capacity = ~2.5TB of usable space without exceeding 50% utilization.

I'm hoping someone else with dedup experience will lend us the benefit of their knowledge. There's a good possibility you don't really need to use deduplication, but I'm sure this depends on your actual usage.

With SSD, there's still fragmentation issues to consider ("50% utilization"), but it may be somewhat less of a killer issue. It might be reasonable to go with four two-drive mirrors and leave two as spares. They can be added to the pool later as an additional vdev, but if it turns out you don't need them, then having warm spares available is very comforting.

I am not super-convinced that there is value in a dual CPU system for most filers. Supermicro also makes the 1018R-WC0R, based on the X10SRW, which pairs very nicely with the much faster E5-1650 v3 (that's what we have). The biggest downside is the lack of LRDIMM support on the E5-16xx (RDIMM only). Overall the power usage can be lower. The 2x2609's will give you 16 slow nonthreading 1.7GHz cores (27.2GHz) with 40GB of cache while the 1650v4 is 6 fast 3.6-4.0GHz cores (21.6GHz) with 15GB of cache. Tradeoffs. The WTRT would at first glance seem to have a significant advantage in terms of the onboard 10GbE, but the WC0R allows you to use the RHS HH PCIe, so you can add a card of your choosing there, which is nice since FreeNAS vaguely prefers Chelsio. I'm not suggesting either course of action here, just throwing out information.

If your data is compressible, and most VM data is, you might not have as much data as you think. Dedup works fine if you resource it properly, but you will need to have L2ARC. ZFS stores the DDT in ARC and flushes out to L2ARC table entries that aren't active. So I think your first step would be to try building a pool and putting all your data on it compressed. A janky desktop with simple HDD would be totally fine for this. Compression is better than dedup. Dedup creates additional sharp pointy edges that you are better off avoiding.

So what you might want to think about is that if you're loading up a DC P3700 and an X710, you're out of slots with the 1028R-WTRT. No good options to place an L2ARC SSD in that configuration, without burning a drive bay. However, if you went the 1018R-WC0R way, you can add an AOC-SLG3-2M2 with either one or two Samsung SM951's.

So let me ask you a different question. Why are you crushing this into 1U? It totally locks you into a bunch of bad-to-awful choices. You could do something like a CSE216BE16-R920WB with an X10SRW to get yourself five PCIe slots and up to 26(!) bays. Only downside for an SSD deployment would be the SAS expander. You could also obviously do something like a CSE216BA-R920WB with X10SRW or CSE216BA-R920LPB with X10SRL and have to figure out some cabling and HBA issues. The big thing is that this is much less limiting in terms of where do you go if it turns out that you guessed wrong and didn't get enough SSD, or want to expand in a year or two.

Chris Kuhn · Aug 3, 2017

First, I want to thank you for taking the time to provide me with so much information and guidance. You have given me a lot of food for thought! :) Even more than I already had from browsing these forums and others.

jgreco said:
With SSD, there's still fragmentation issues to consider ("50% utilization"), but it may be somewhat less of a killer issue. It might be reasonable to go with four two-drive mirrors and leave two as spares. They can be added to the pool later as an additional vdev, but if it turns out you don't need them, then having warm spares available is very comforting.

In my use case scenario, I need to have a large enough drive that I can utilize Desktop Layering (Unidesk), which splits up all of the component Applications, OS, and User data. The cachepoints of which I will need several (each consuming about 100-200 GB). All of the disks that all of the VMs are using are shared and this server will probably be backing about 150 - 200 desktops. My IOPS typically run more write heavy then read and my overall average for this on our other system is about 4500 iops, with it peaking at about 13-15k iops. With this extra information, would you still recommend the four two-drive mirrors? If I understand everything that I have read so far, splitting them into multiple vdev's will give me higher IOPS performance (correct?)

jgreco said:
I am not super-convinced that there is value in a dual CPU system for most filers. Supermicro also makes the 1018R-WC0R, based on the X10SRW, which pairs very nicely with the much faster E5-1650 v3 (that's what we have). The biggest downside is the lack of LRDIMM support on the E5-16xx (RDIMM only). Overall the power usage can be lower. The 2x2609's will give you 16 slow nonthreading 1.7GHz cores (27.2GHz) with 40GB of cache while the 1650v4 is 6 fast 3.6-4.0GHz cores (21.6GHz) with 15GB of cache. Tradeoffs. The WTRT would at first glance seem to have a significant advantage in terms of the onboard 10GbE, but the WC0R allows you to use the RHS HH PCIe, so you can add a card of your choosing there, which is nice since FreeNAS vaguely prefers Chelsio. I'm not suggesting either course of action here, just throwing out information.

I was really hemming and hawing about this as well - I am going to do some more research with all of this in mind and get back to you. Should I focus on raw CPU power or newer more efficient processors? Or does it really not matter for what I am trying to accomplish?

jgreco said:
If your data is compressible, and most VM data is, you might not have as much data as you think. Dedup works fine if you resource it properly, but you will need to have L2ARC. ZFS stores the DDT in ARC and flushes out to L2ARC table entries that aren't active. So I think your first step would be to try building a pool and putting all your data on it compressed. A janky desktop with simple HDD would be totally fine for this. Compression is better than dedup. Dedup creates additional sharp pointy edges that you are better off avoiding.

To this point - we are not currently employing compression, as we had lots of issues with compression and the desktop layering. Our NetApp device (FAS2552) is probably partly to blame as I know it doesn't have the beefiest processor. Thoughts?

jgreco said:
So let me ask you a different question. Why are you crushing this into 1U? It totally locks you into a bunch of bad-to-awful choices. You could do something like a CSE216BE16-R920WB with an X10SRW to get yourself five PCIe slots and up to 26(!) bays. Only downside for an SSD deployment would be the SAS expander. You could also obviously do something like a CSE216BA-R920WB with X10SRW or CSE216BA-R920LPB with X10SRL and have to figure out some cabling and HBA issues. The big thing is that this is much less limiting in terms of where do you go if it turns out that you guessed wrong and didn't get enough SSD, or want to expand in a year or two.

This is really a test for feasibility, and I wanted to approach these systems as "nodes" for a storage array. We have limited rack space where I am deploying this and I wanted to be as space conscious as possible. My other thought is IOPS and network related on the larger chassis - if we do end up expanding, I don't want to run into an IOPS or link saturation issues. I am fairly certain that for the next 5 years the use is going to stay about the same, as we have migrated a majority of everything else to the large cloud providers. If we end up growing it is going to be in different facilities and not at this location. In our business, it is not feasible to grow above a certain size due to employment saturation rates.

Stux · Aug 3, 2017

Much less crush in 2U than one U and you can still get the HH AIC.

I agree with the advise to think about the e5-1650. And then you can get 128GB of ram in 32GB modules, leaving 4 slots free.

I have no experience with all flash arrays, but I do wonder if mirrors, which are to avoid hd seek time, still applies as a valid criteria on an all flash array?

jgreco · Aug 4, 2017

Stux said:
Much less crush in 2U than one U and you can still get the HH AIC.

Well actually with the 2U WIO parts I suggested, you get up to *five* slots and four are full height full length. Sticking with the L/LPB gives you a variety of HH.

I agree with the advise to think about the e5-1650. And then you can get 128GB of ram in 32GB modules, leaving 4 slots free.

It isn't entirely clear. I usually make the E5-16xx argument as a point to people who are looking at two mid-tier E5-26xx's, where the extreme price jump doesn't make sense. The low-end E5-26xx's are actually fairly reasonable pricewise, but there's a price jump to go with the dual board, and a power tax as well. However, you do get a lot of memory capacity and if you go 2U then you also can get 7 slots in WIO IIRC. The additional L2 cache on the E5-26xx is "nice" but maybe not too meaningful on a filer.

I have no experience with all flash arrays, but I do wonder if mirrors, which are to avoid hd seek time, still applies as a valid criteria on an all flash array?

Mirrors are not only to increase IOPS. As far as IOPS go, yes, they're probably the most effective IOPS-increasing thing because you only involve two or three drives, so you have more vdevs, and you get 1x IOPS for write (parallel write) and up to 2x or 3x on the read side (independent read). However, it is important also to realize that RAIDZ space allocation for ZFS blocks is variable. The ZFS record size therefore becomes important.

This is a RAIDZ on a 5-drive system. This illustrates the basic problem, but this gets more complex (trending worse) with RAIDZ2 and RAIDZ3. For the first orange 32K record, it fits perfectly into the ZFS stripe size and has the "expected" 20% overhead. If you look at a single 4K record, such as the brown block, you'll notice that there's 100% overhead to store it, which isn't what a lot of people are expecting. An 8K record, the sky blue, includes a padding sector in order to prevent allocation of an odd number of sectors. But even the 16K record, the purple immediately before it, does too. So you might at first be tempted to think "well then I better set my recordsize to 32K". That works, except it doesn't, because once you turn on compression, well, data compresses.

So the thing is, you can just remember it this way: RAIDZ is best for storing large files and sucks at small files, databases, VM storage, and other similar small-record storage. Mirrors offer consistent space allocation and are better suited to this type of task, but with the obvious space penalty.

jgreco · Aug 4, 2017

Chris Kuhn said:
In my use case scenario, I need to have a large enough drive that I can utilize Desktop Layering (Unidesk), which splits up all of the component Applications, OS, and User data. The cachepoints of which I will need several (each consuming about 100-200 GB). All of the disks that all of the VMs are using are shared and this server will probably be backing about 150 - 200 desktops. My IOPS typically run more write heavy then read and my overall average for this on our other system is about 4500 iops, with it peaking at about 13-15k iops.

So ... you're saying 4500 IOPS, mostly write, no one ever says what their I/O size is so let's guess 8K, that works out to more than 1000 IOPS per second to each SSD in a hypothetical 8 SSD's-in-mirror, or 32MB/sec, or 772GB/day. The drive is rated for 300TB endurance, so, "dead in a year."

Obviously these numbers are not actually correct for your actual workload, but the napkin math says you should take a closer look just to be safe. Plus the observed endurance on these things is lots higher.

With this extra information, would you still recommend the four two-drive mirrors? If I understand everything that I have read so far, splitting them into multiple vdev's will give me higher IOPS performance (correct?)

I don't think mirrors are optional. The discussion in the previous reply will cover why. You could absolutely do RAIDZ on a large enough system that gave you sufficient vdev's to handle your load, but your 1U system definitely isn't that.

I was really hemming and hawing about this as well - I am going to do some more research with all of this in mind and get back to you. Should I focus on raw CPU power or newer more efficient processors? Or does it really not matter for what I am trying to accomplish?

It isn't clear to me.

As you may have figured out, I've seen a lot of users come in here with a big beefy system and say "oh this'll make a great home fileserver." The thing about a home user is that their needs tend towards Samba, and Samba is singlethreaded for most of the important stuff, so with your E5-2609 you're talking 1.7GHz versus my suggested E5-1650 turboing up to 4.0GHz. Most home users will never be using cores 9-16 on their second E5-2609.

I've also seen a lot of people come in here who go something like "I heard Samba's a total thread pig so I picked a dual E5-2637v4". Or the even more hilarious "How about an X10SRL and an E5-2637v4". Add up the prices for those options, then check the X10SRL plus E5-1650v4 price to understand the joke.

But for a large busy filer, the answer is less clear. There are advantages to extra PCIe lanes, which you won't be using (1U). There are advantages to extra cache, which I'm skeptical of in a fileserver environment. More non-hyperthreaded cores means lots of capacity, but you're *probably* not compute bound. (The E5-1650v3 here has done 6 gigabits per second at 85% idle with a HDD based pool.) The cost thing isn't a huge issue as the E5-2609v4's are dirt cheap, so the two solutions ballpark around the same cost for CPU+MB. The TDP numbers speak for themselves. The additional RAM slots are nice.

It isn't clear.

To this point - we are not currently employing compression, as we had lots of issues with compression and the desktop layering. Our NetApp device (FAS2552) is probably partly to blame as I know it doesn't have the beefiest processor. Thoughts?

Compression FTW. The speed at which a modern CPU can do compression exceeds the I/O speed of a typical HDD, and even for SSD I'd expect very good results. You can change compression options on the fly, and future data written will use the new selection. The compression algorithms typically use more CPU for write, with decompress-on-read being unnoticeable, so even if you choose a different algorithm later (or even disable it) you aren't hosed.

This is really a test for feasibility, and I wanted to approach these systems as "nodes" for a storage array. We have limited rack space where I am deploying this and I wanted to be as space conscious as possible. My other thought is IOPS and network related on the larger chassis - if we do end up expanding, I don't want to run into an IOPS or link saturation issues. I am fairly certain that for the next 5 years the use is going to stay about the same, as we have migrated a majority of everything else to the large cloud providers. If we end up growing it is going to be in different facilities and not at this location. In our business, it is not feasible to grow above a certain size due to employment saturation rates.

Okay, just askin', 'cuz it felt like there was a tradeoff here.

Chris Kuhn · Aug 4, 2017

Thanks for all of your feedback. I cancelled the system I was ordering and I am going back to the drawing board and contacted IX Systems as well.

Going to look into the other options.

I really appreciate everyone who has contributed on this thread!! You all have been extremely helpful and I have learned a lot.

Once I get a new system configuration to work with I will post it here. I have amended the budget for the system to 11K. So everyone will get to know what I ended up with - I will also post some system stats after we have it in production. Numbers are always helpful.

IceBoosteR · Aug 4, 2017

Hi,

I want to put my 5 cents here too.
I order to CPU usage - we got an Oracle ZFS at work - 4x Xeon E7 with 10 cores each I think. We got Infiniband connected (40GBit) and I was running tests with different compression algorithms. In short: LZJB (in FreeNAS LZ4) is the was to go, not much overhead. Everything with GZIP will use more CPU of course and only with much cores I would recommend anything higher. As you are using SSDs, you will not be bottlenecked by transfer speeds to plan carefully the needs.
With deduplication enabled, as you are using right now on your NetAPP system, you want to have strong CPUs to handle the load. Deduplication is CPU intensive, demending on the I/O coming over the network.
For example for my ZVOL at home using for ESXi I use deduplication on my E3-1225v3 with lz4 comprerssion also enabled. As compression is only using ~5% CPU, with dedup its going up to 35% in a gigabit network.
On the other hand, my second FreeNAS host, with two Xeon L5630 (4cores, 2,13Ghz) are up to 45% CPU usage, as they are older and they got more disks to handle.
My experience sofar:
- got a lot of ram
- if RAM is out use L2ARC
- if synchronous writes are performed by NFS; use a SLOG device
- think about compression and deduplication
- if you got the possibility to go with a 8 core with 3Ghz each core OR a 10 core with 2Ghz each core; go with the 8 core as some processes do perform only single threaded and even if they're multithreaded, single core performance matters in my eyes.

Also FreeNAS is deduplicating in action, when the file is accessed or written. With DataONTAP this can be sceduled e.g. at night, so the end-user will not see that CPUs are running at 100% as there comparing all the blocks (this was the fact with cDOT 8.3) ;)

Maybe this information can help you a bit with your decisions. And I would go for the 2U version, maybe you want to add something in the future, 1U is more to a closed system.

jgreco · Aug 4, 2017

Yeah, the different compression algorithms give different performance and have different system impact and space savings. However, it is worth noting that for reads, they are all generally very fast, so if you run into a problem with slow writes and a specific algorithm, even though that data is already written with the problematic algorithm, you'll probably be fine reading it back from the pool with that algorithm. Don't worry about trying to "rewrite" it.

I'm not sure what in dedup would cause such high CPU utilization.

IceBoosteR · Aug 5, 2017

jgreco said:
Yeah, the different compression algorithms give different performance and have different system impact and space savings. However, it is worth noting that for reads, they are all generally very fast, so if you run into a problem with slow writes and a specific algorithm, even though that data is already written with the problematic algorithm, you'll probably be fine reading it back from the pool with that algorithm. Don't worry about trying to "rewrite" it.

I'm not sure what in dedup would cause such high CPU utilization.

Hopefully I got not missunderstood here. For read operation, dedup does not hit CPU performance in any case. For my examples, I wrote "up to", normally its something below the mentioned percent, but peaks should be covered. Just checked my setup again for detailed graphs (E3 system) and usually we talk about 15-20%. On the other hand, the 100% CPU usage of dedup is something, which is related to the DataONTAP. Mostly at night it does a block by block comparison which really hits performance (so this is while the night is a good place to do it) and the CPU in FAS-series head are not that powerful, what I have seen. It also depends on how many disk shelves you have, and the transfer speed (FC or SAS).

With all that said, I just want to make sure that OP does not buy something to slow, as with 10/40Gbit you can see that CPU performance got into place. Compression with LZ4 can be used in any way, that fine - but with GZIP-5 for example its defenetly worth mentioning that this can slow down the whole array.

Important Announcement for the TrueNAS Community.

Build for ESXi - All SSDs - About 9K

Chris Kuhn

Cadet

Spearfoot

He of the long foot

Chris Kuhn

Cadet

Spearfoot

He of the long foot

Chris Kuhn

Cadet

jgreco

Resident Grinch

Chris Kuhn

Cadet

Stux

MVP

jgreco

Resident Grinch

jgreco

Resident Grinch

Chris Kuhn

Cadet

IceBoosteR

Guru

jgreco

Resident Grinch

IceBoosteR

Guru

Similar threads

Important Announcement for the TrueNAS Community.

Build for ESXi - All SSDs - About 9K

Cadet

He of the long foot

Cadet

He of the long foot

Cadet

Resident Grinch

Cadet

MVP

Resident Grinch

Resident Grinch

Cadet

Guru

Resident Grinch

Guru

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Build for ESXi - All SSDs - About 9K"

Similar threads