First large build - sanity check

Status
Not open for further replies.

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
I've got a bit of experience in the smaller ZFS range (we run five 8-disk Supermicros), but it's time to grow and build a larger/faster system. Workload will be iSCSI to ESXi, so sync writes all day long. We currently have about 75 VMs with growth planned, and would really like to increase performance above all else (though more capacity is also welcome).

My current thoughts are building this:
System: SuperMicro SSG-6049P-E1CR24H https://www.supermicro.com/products/system/4U/6049/SSG-6049P-E1CR24H.cfm
CPU: 2x Intel® Xeon® Bronze 3104 Processor 6-core 1.70GHz 8.25MB cache (85W) (not married to this, it's just the cheapest option. Not really sure how to gauge what CPU I need, would love feedback)
Controller: Board ships with 1 Broadcom 3108 AOC for 8 SAS ports. Is this a decent card to run in JBOD for ZFS? I have little experience here.
RAM: 12x 16GB for 192GB ECC
NIC: 1x Intel X710-DA4 for 10GbE (+ onboard 1GbE for management/failover)

Disk pool: 8x 4TB Hitachi SAS in 1 pool, striped and mirrored for 16TB, probably ~14TB usable?
SSD pool: 4x 1.6TB Intel S3520 in a striped mirror for 3.2TB or ~3TB usable?
L2ARC: 1x 960GB Intel DC S3520 (came to this via 192GB RAM x5 for L2ARC size, based on https://forums.freenas.org/index.php?threads/formula-for-size-of-l2arc-needed.17947/#post-97362); would only apply L2ARC to the spinning pool. Quick sizing check below.
ZIL: 2x 150GB Intel S3520, one per pool.
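Rough back-of-the-envelope math behind those numbers (just a sketch: decimal-TB drive sizes, two-way mirrors, ZFS/partitioning overhead ignored):

```python
# Rough sizing check for the proposed pools (drive sizes in decimal TB,
# two-way mirrors, ZFS/partitioning overhead ignored).

def mirror_usable_tib(num_drives, drive_tb):
    """Striped two-way mirrors: half the raw space, converted to TiB."""
    mirrored_tb = num_drives * drive_tb / 2
    return mirrored_tb * 1e12 / 2**40

print(round(mirror_usable_tib(8, 4.0), 1))   # ~14.6 -> the "~14TB usable" spinner pool
print(round(mirror_usable_tib(4, 1.6), 1))   # ~2.9  -> the "~3TB usable" SSD pool
print(192 * 5)                               # 960 GB L2ARC at the 5x-RAM rule of thumb
```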

I *think* I've got the pools designed and sized right for what we need, but would really love any input on the chassis/CPU/SAS controller. I would not mind if there was a way to get a 12-disk rust pool, something like striping two 6-disk RAIDZ2 vdevs. I'm trying to build as much resilience as possible, hence the S3520s all around. Any better options, or should I cheap out a bit on the SSD pool and let ZFS handle the resilience? I could also do M.2 PCIe Samsung 960 Pros for ZIL or L2ARC (though I don't think they have supercaps?)

Obviously I have some idea what's going on, but am not 100% confident. Any help is appreciated!
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Geez, aren't we getting a bit ahead of ourselves? Intel's new platform has been out for a day and you want to jump in head first? Sure, early adopters are nice, but your application hardly seems to need the new and shiny.

Controller: Board ships with 1 Broadcom 3108 AOC for 8 SAS ports. Is this a decent card to run in JBOD for ZFS? I have little experience here.
Usable, but hardly ideal. An SAS3008-based controller is a much better idea (and cheaper).

Disk pool: 8x 4TB Hitachi SAS in 1 pool, striped and mirrored for 16TB, probably ~14TB usable?
SSD pool: 4x 1.6TB Intel S3520 in a striped mirror for 3.2TB or ~3TB usable?
https://forums.freenas.org/index.php?resources/zfs-raid-size-and-reliability-calculator.49/

L2ARC: 1x 960GB Intel DC S3520
Decent, I guess.

ZIL: 2x 150GB Intel S3520, one per pool.
Nope, for serious applications, the P3700 is the way to go.
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
Geez, aren't we getting a bit ahead of ourselves? Intel's new platform has been out for a day and you want to jump in head first? Sure, early adopters are nice, but your application hardly seems to need the new and shiny.


Usable, but hardly ideal. An SAS3008-based controller is a much better idea (and cheaper).


https://forums.freenas.org/index.php?resources/zfs-raid-size-and-reliability-calculator.49/


Decent, I guess.


Nope, for serious applications, the P3700 is the way to go.


Thanks for the reply, I suppose. I did not realize that this came out 'yesterday.' We're starting to budget so I hit supermicro.com and they have them for sale. Obviously (since I am here, not at supermicro.com with my CC pulled out) I came here for advice.

Regarding the 3008 vs 3108, why the preference for an older model? Also, the Supermicro comes with a SAS expander preconfigured - are those still to be avoided? Would I be better off going more DIY with 2x 3008s/3108s vs one HBA plus an expander? Supermicro's site is awful, and the FreeNAS hardware guide is almost a full year out of date at this point (Oct. '16).

I was hoping for a bit more help or insight other than 'guesses' and unfounded preference for older versions.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Regarding the 3008 vs 3108, why the preference for an older model?
It's not older, it's the same family. The 3008 does IT mode with no RAID cruft. The 3108 doesn't, though it supports direct-attach drives using the mrsas driver, which should be the default these days (instead of the older, crap mfi driver).
IT mode has at least one order of magnitude more hours of experience behind it than running a RAID controller with direct-attach disks, so there's that, too.

Also, the Supermicro comes with a SAS expander preconfigured - are those still to be avoided?
Avoided? No, certainly not when used with SAS3 HBAs.

FreeNAS hardware guide is almost a full year out of date at this point (Oct. '16).
The title is
Hardware Recommendations Guide Rev 1e) 2017-05-06
and the guide is most certainly not out of date. Even if it was missing new but reasonably well-known hardware, the general information in it would still be valid.

I was hoping for a bit more help or insight other than 'guesses' and unfounded preference for older versions.
If you want help, ask questions and drop the snark. If you ask for help, the least you can do is respect what is being said.
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
It's not older, it's the same family. The 3008 does IT mode with no RAID cruft. The 3108 doesn't, though it supports direct-attach drives using the mrsas driver, which should be the default these days (instead of the older, crap mfi driver).
IT mode has at least one order of magnitude more hours of experience behind it than running a RAID controller with direct-attach disks, so there's that, too.


Avoided? No, certainly not when used with SAS3 HBAs.


The title is
and the guide is most certainly not out of date. Even if it was missing new but reasonably well-known hardware, the general information in it would still be valid.


If you want help, ask questions and drop the snark. If you ask for help, the least you can do is respect what is being said.

Before warning me to 'drop the snark,' please re-read your first reply. This forum is notorious for being caustic and unwelcoming to newcomers, and seems to live up to it with a roll of the dice. If my questions do not intrigue you, please, feel free to not reply. I come here for expertise and friendly correspondence. Having the first reply immediately start with sarcasm and a lack of specifics while making me 'feel stupid' does not help. Obviously I need help - that is the entire reason for this forum, right? What I don't need is someone saying "Geez, aren't you getting ahead of yourself?" "Decent, I guess." What level of discourse is that? I am not sure how I am supposed to "respect what is being said" with content such as that. Maybe it passes at the water cooler, or for generic discussion, but I have specific questions about a specific build.

The information regarding 3108 vs 3008 is helpful, thank you. SAS expanders seem to be avoided with many fast SSDs, but I do not know exactly what this means in practice. Obviously the more expansion done, the less bandwidth available per device. "The picture changes for SSD, and expanders may not be a good idea for use with large numbers of SSD's if you are expecting high throughput." (https://forums.freenas.org/index.ph...-sas-sy-a-primer-on-basic-sas-and-sata.26145/). I am hopeful someone has a bit of real-world experience and guidelines with this. 8 or 12 SAS spinners and 7 SSDs seems (to a complete layman) like it would saturate the bandwidth available to one 8-port HBA.
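To put that worry in numbers, here's a rough estimate (the per-drive throughputs are my own best-case sequential assumptions, not measurements):

```python
# Ballpark: can one 8-lane SAS3 HBA + expander feed 12 spinners and 7 SSDs?
# Per-drive throughputs are assumed best-case sequential figures.

lane_GBps = 12 * 0.8 / 8        # SAS3 lane: 12 Gb/s line rate, 8b/10b encoding -> ~1.2 GB/s
wide_port = 8 * lane_GBps       # 8-lane wide port to the expander: ~9.6 GB/s
pcie3_x8  = 7.9                 # rough usable limit of the HBA's PCIe 3.0 x8 host link

demand = 12 * 0.25 + 7 * 0.5    # ~250 MB/s per spinner + ~500 MB/s per SATA SSD = 6.5 GB/s
print(demand, "GB/s needed vs", min(wide_port, pcie3_x8), "GB/s ceiling")
```

So on paper it looks tight but not saturated for sequential bursts, and random VM I/O would be nowhere near those figures anyway - but I'd still love real-world confirmation.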

And yes, the guide was updated recently (minor revision). I have read it fully, many times, including the (also outdated) link to SAS expanders. Its general information is very good and very helpful, but does not answer specifics (especially with new SAS3 + much faster SSDs, which is where a lot of my questions lie).

The 3700 does look like a much better solution - not sure why my google-fu came up with the 3520 first.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I've got a bit of experience in the smaller ZFS range (we run five 8-disk Supermicros), but it's time to grow and build a larger/faster system. Workload will be iSCSI to ESXi, so sync writes all day long. We currently have about 75 VMs with growth planned, and would really like to increase performance above all else (though more capacity is also welcome).

My current thoughts are building this:
System: SuperMicro SSG-6049P-E1CR24H https://www.supermicro.com/products/system/4U/6049/SSG-6049P-E1CR24H.cfm
CPU: 2x Intel® Xeon® Bronze 3104 Processor 6-core 1.70GHz 8.25MB cache (85W) (not married to this, it's just the cheapest option. Not really sure how to gauge what CPU I need, would love feedback)
Controller: Board ships with 1 Broadcom 3108 AOC for 8 SAS ports. Is this a decent card to run in JBOD for ZFS? I have little experience here.
RAM: 12x 16GB for 192GB ECC
NIC: 1x Intel X710-DA4 for 10GbE (+ onboard 1GbE for management/failover)

Disk pool: 8x 4TB Hitachi SAS in 1 pool, striped and mirrored for 16TB, probably ~14TB usable?
SSD pool: 4x 1.6TB Intel S3520 in a striped mirror for 3.2TB or ~3TB usable?
L2ARC: 1x 960GB Intel DC S3520 (came to this via 192GB RAM x5 for L2ARC size, based on https://forums.freenas.org/index.php?threads/formula-for-size-of-l2arc-needed.17947/#post-97362); would only apply L2ARC to the spinning pool.
ZIL: 2x 150GB Intel S3520, one per pool.

I *think* I've got the pools designed and sized right for what we need, but would really love any input on the chassis/CPU/SAS controller. I would not mind if there was a way to get a 12-disk rust pool, something like striping two 6-disk RAIDZ2 vdevs. I'm trying to build as much resilience as possible, hence the S3520s all around. Any better options, or should I cheap out a bit on the SSD pool and let ZFS handle the resilience? I could also do M.2 PCIe Samsung 960 Pros for ZIL or L2ARC (though I don't think they have supercaps?)

Obviously I have some idea what's going on, but am not 100% confident. Any help is appreciated!
Do you have a good estimate of the space you'll need for VMs?

The reason I ask is because best practice in designing block storage for virtualization is to plan on using no more than 50% of the storage pool capacity. So your disk pool would only be good for ~8TB of capacity and the SSD pool for ~1.6TB. Shocking, I know...
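As a quick illustration of that math (just a sketch, using the mirrored capacities from your proposed layout):

```python
# ~50% occupancy guideline for block storage (iSCSI zvols), applied to the
# mirrored capacities of the proposed pools.

def vm_budget_tb(mirrored_tb, max_fill=0.5):
    return mirrored_tb * max_fill

print(vm_budget_tb(16.0))   # spinner pool: ~8 TB of VM data
print(vm_budget_tb(3.2))    # SSD pool:    ~1.6 TB of VM data
```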

Also, you may want to consider a better ZIL SLOG device than the Intel 3520 SSD series. These are read-optimized, whereas you want a write-optimized device for SLOG. A better choice would be something from the Intel P3700 line.

Good luck!
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
Do you have a good estimate of the space you'll need for VMs?

The reason I ask is because best practice in designing block storage for virtualization is to plan on using no more than 50% of the storage pool capacity. So your disk pool would only be good for ~8TB of capacity and the SSD pool for ~1.6TB. Shocking, I know...

Also, you may want to consider a better ZIL SLOG device than the Intel 3520 SSD series. These are read-optimized, whereas you want a write-optimized device for SLOG. A better choice would be something from the Intel P3700 line.

Good luck!

Thanks Spearfoot. My original designs (our older FreeNAS boxes) did not account for the 50% pool capacity scenario, and I discovered it 'the hard way.' We currently have 15TB of VMs spread across 4 supermicros, with probably 5TB of that being rather 'cold' data that is very rarely accessed. My goal was to bring about 4TB of that over as tier2 (rather snappy) storage, and about 1TB over to the SSD pool (for our DBs). So I've sort of accounted for that, but not left as much room for future growth as I'd hoped. I might bump up the spinners to 6TBs and I plan on leaving room for vdev additions to the SSD pool (as well as empty bays). We don't really have the budget to go wild with the SSDs right now, but adding a vdev in 6mo or a year seems much more likely.

I've been reading recommendations for NVMe ZIL, but the cost and sizing (can't seem to find one smaller than 400GB) seem prohibitive. Any experience with that, or with loading a single HBA with as many fast SSDs as I plan?

Thanks!
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
The 5x rule is no longer up-to-date; with the FreeBSD 11 base now in FreeNAS you can get away with 8x or more for L2ARC.
Awesome, thank you very much. Looks like the next step up in size is 1.6TB anyway, so that fits perfectly with the 1.536TB needed.
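For anyone following along, the arithmetic (assuming the 8x figure quoted above):

```python
ram_gb = 192
print(ram_gb * 8)   # 1536 GB of L2ARC at the newer ~8x-RAM rule of thumb -> a 1.6 TB drive fits
```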
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Nice thing with mirrors is you can expand/grow them.

You could use cheaper Samsung drives for the SSD pool.

Definitely use 32GB RDIMMs (half as many) rather than 16GB DIMMs; otherwise you halve your maximum RAM capacity.

I'd stick with tried-and-tested Broadwell rather than Skylake-SP at the moment.

PCIe NVMe from Intel for ZIL.
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
Nice thing with mirrors is you can expand/grow them.

You could use cheaper Samsung drives for the SSD pool.

Definitely use 32GB RDIMMs (half as many) rather than 16GB DIMMs; otherwise you halve your maximum RAM capacity.

I'd stick with tried-and-tested Broadwell rather than Skylake-SP at the moment.

PCIe NVMe from Intel for ZIL.

What are your thoughts on putting both ZILs on one device? I have trouble buying two 400GB NVMe drives at $350 each (Intel 750) for ZIL, knowing they'll use just a tiny fraction of the space, when I could have gotten the 100GB SATA S3700 for $140 each. Am I missing a product somewhere? Is the latency difference really worth that much? I understand that if budget were no concern it would be the best bet, but I'd have to sacrifice two PCIe slots and eat a $400+ difference in price.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
What are your thoughts on putting both ZILs on one device? I have trouble buying two 400GB NVMe drives at $350 each (Intel 750) for ZIL, knowing they'll use just a tiny fraction of the space, when I could have gotten the 100GB SATA S3700 for $140 each. Am I missing a product somewhere? Is the latency difference really worth that much? I understand that if budget were no concern it would be the best bet, but I'd have to sacrifice two PCIe slots and eat a $400+ difference in price.
Well based on what you stated in your first post...
so sync writes all day long. We currently have about 75 VMs with growth planned, and would really like to increase performance above all else (though more capacity is also welcome).
You need to be concerned most with endurance and latency, so you should be looking at the NVMe Intel P series. Don't use one device for both pools; you will end up with sync queue contention... Not good. Performance above all else comes with a price $$
 

beezel

Dabbler
Joined
Nov 7, 2016
Messages
33
Well based on what you stated in your first post...

You need to be concerned most with endurance and latency, so you should be looking at the NVMe Intel P series. Don't use one device for both pools; you will end up with sync queue contention... Not good. Performance above all else comes with a price $$
Thanks. Just trying to work on the justification. What about the 750 vs the P3700? It's a boatload less expensive, and not that much worse a performer.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Thanks. Just trying to work on the justification. What about the 750 vs the P3700? It's a boatload less expensive, and not that much worse a performer.
The difference is the endurance: the latency is comparable, but the P series has an order of magnitude better endurance.

Edit: the 400GB 750 is rated for 70GB of writes per day. The 400GB P3700 is rated for 4,000GB of writes per day.
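Put another way, in drive writes per day (using the per-day ratings above for the 400GB models):

```python
# Endurance in drive-writes-per-day (DWPD) for the 400GB models.
for name, gb_per_day in [("Intel 750", 70), ("Intel P3700", 4000)]:
    print(name, round(gb_per_day / 400, 3), "DWPD")   # 0.175 vs 10.0
```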

How busy are your VMs? Are you happy to spend less now, knowing you'll probably be replacing the 750s at some point?
 
Last edited:

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Thanks. Just trying to work on the justification. What about the 750 vs the P3700? It's a boatload less expensive, and not that much worse a performer.
The 750 will work. On another thread I posted this:

"Regarding ZIL SLOG devices: these need to have particular characteristics: power protection, low latency, fast write speed, and high durability. Though they'll 'work', your DC S3520s aren't well-suited for this purpose. The S3500-series SSDs are optimized for reads; the S3700-series are optimized for writes, which makes them a better choice as a SLOG device. The good, better, and best Intel selections for SLOG devices run sorta like this:
All three have the power protection, low latency, fast writes, and high durability you need in a SLOG device."

To be honest... "Besterer" would be a ZeusRAM device. But they're crazy expensive.

You can get by with the DC S3700 SSD series -- I do (see 'my systems' below) -- though they don't have the performance of the PCIe-connected units. The larger-capacity S3700s are faster than the smaller-capacity versions, so try to get the largest you can afford if you decide to go this route.

How well the more entry-level devices will work for you depends on how write-intensive your VMs are.

Here's a good article at STH on the subject: "Top Picks for FreeNAS ZIL/SLOG Drives".

Here's another related article from iXsystems that covers calculating the ZIL SLOG minimum size: "To SLOG or not to SLOG: How to best configure your ZFS Intent Log". Turns out you only need ~6.25GB for a 10Gb connection like yours, so you can over-provision the SLOG device to something like 8-16GB. I run 10G here and I always over-provision my ZIL SLOG devices to a size of 8GB, following these instructions at Thomas-Krenn: "SSD Over-provisioning using hdparm".
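If you want to sanity-check that figure yourself, the article's reasoning boils down to this (sketch only, assuming the default ~5-second transaction group interval):

```python
# A SLOG only has to absorb what can arrive between transaction-group commits.
link_gbps   = 10     # 10GbE
txg_seconds = 5      # assumed default ZFS txg interval
print(link_gbps / 8 * txg_seconds)   # 1.25 GB/s * 5 s = 6.25 GB -> over-provision to 8-16 GB
```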

You're installing enough RAM that you may not need an L2ARC cache drive. But, again, it depends on your workload.
 