BUILD What can I build for 150k?

Status
Not open for further replies.

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I've built four 36-bay Supermicro FreeNAS boxes (6 x 6-disk RAIDZ2) over the years for tier-3 storage. Raw performance wasn't a primary design consideration for those, but now we're looking for a storage solution for another DR site. Preliminary pricing from the commercial storage vendors would eat up most of the DR budget.

Here are the criteria:
200 TB usable (ideal case, but could be less to start)
Fast enough to run a production environment on (a mixed bag of AD, Exchange, about a dozen SQL servers, 100 terminal servers, and maybe another 100-200 VMs of different types).
Storage will most likely be presented via iSCSI.

I'm assuming that spinning disks cannot be made performant enough, even with mirrored pairs, ZIL/L2ARC, lots of memory, etc. Is this a safe assumption?

The Supermicro E1CR48L looks interesting. With 4 TB SSDs I'd get 192 TB raw and realistically about 70 TB usable. I'd probably be looking at 110k or so for that, plus maybe 15-20k for a replication target box with 3.5" drives.
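
For what it's worth, here's the back-of-the-envelope math behind that ~70 TB figure; the mirrored layout and the ~75% utilization cap for iSCSI zvols are my own assumptions:

```
# Rough usable-capacity sketch for the E1CR48L build.
# Assumptions (mine): 2-way mirrors and keeping iSCSI zvols at ~75% of pool capacity.
DRIVES = 48
DRIVE_TB = 4.0                          # TB per drive

raw_tb = DRIVES * DRIVE_TB              # 192 TB raw
mirrored_tb = raw_tb / 2                # 96 TB after 2-way mirrors
utilization = 0.75                      # headroom for block storage / fragmentation
usable_tb = mirrored_tb * utilization   # ~72 TB, i.e. roughly the 70 TB above

print(f"raw {raw_tb:.0f} TB -> mirrored {mirrored_tb:.0f} TB -> planned usable {usable_tb:.0f} TB")
```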

Is there a better solution out there that would get me more space and run at or near Nimble speeds for maybe 130k all-in?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Fast enough to run a production environment on (a mixed bag of AD, Exchange, about a dozen SQL servers, 100 terminal servers, and maybe another 100-200 VMs of different types).
Storage will most likely be presented via iSCSI.
What are your peak IOPS requirements? Can you determine this from your current environment or are you already IO bound?
I'm assuming that spinning disks cannot be made performant enough, even with mirrored pairs, ZIL/L2ARC, lots of memory, etc. Is this a safe assumption?
That all depends on your working set size and IOPS requirements. See my first question.

Let's do some real capacity planning...
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
What are your peak IOPS requirements? Can you determine this from your current environment or are you already IO bound?
That all depends on your working set size and IOPS requirements. See my first question.

Let's do some real capacity planning...
Our main production SAN is currently at 13k IOPS, with a peak of 30k IOPS within the last 24 hours.
I don't see anything exceeding that going back 30 days.
Our normal average appears to be 15k-20k IOPS.
Our production VMware environment has about 350 VMs but we could probably get by with 250 or so in a DR situation.
We do have some direct-attach iSCSI storage for a few physical boxes, but 98% of the storage is consumed by VMware.
We're currently at about 150 TB active, in-use storage on our production SAN.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
So this sounds like a hot DR site for primary services. Assuming you are not IOPS bound and you don't have any microbursting (you would never see this just by looking at IO; you would need high-resolution queue monitoring), a target of 37k-40k IOPS should be reasonable. The actual design, however, will depend largely on your working set. What we would need to know is how much unique data is read and appended per day. This is not the best way to get the working set, but it's a start that gives us a baseline for guessing the size of the L2ARC and ARC. Once we know more about the caching, we can size RAM.
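
To make that concrete, here's the kind of back-of-the-napkin math I'd run once you have a daily unique-read/append figure; every number below is a placeholder, and the cache ratios are rules of thumb, not measurements:

```
# Cache sizing from a rough daily working-set estimate.
# All inputs are PLACEHOLDERS to show the method, not measured values.
daily_unique_read_gb = 800.0   # unique data read per day (placeholder)
daily_append_gb = 200.0        # new data written per day (placeholder)
hot_days = 3                   # days of churn to keep cache-warm (assumption)

working_set_gb = (daily_unique_read_gb + daily_append_gb) * hot_days

# Rules of thumb (assumptions, tune to taste):
#  - size ARC (RAM) for the hottest ~15% of the working set
#  - let L2ARC cover the rest, keeping it small enough that its
#    header overhead doesn't eat a big chunk of RAM
arc_gb = working_set_gb * 0.15
l2arc_gb = working_set_gb - arc_gb

print(f"working set ~{working_set_gb:.0f} GB -> "
      f"ARC ~{arc_gb:.0f} GB of RAM, L2ARC ~{l2arc_gb:.0f} GB")
```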
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I don't believe we are IO bound, since latency stays under 3 ms with rare exceptions, and peak load doesn't exceed 5 ms from what I can see. This is planned to be a hot DR site, as you surmised. We will not be running every VM that we have, but I do not have the exact numbers yet regarding what is mission-critical and what is not. I would guess we could get by on 1/2 to 2/3 of the current production workload.
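
To put rough numbers on that guess (the 1/2 to 2/3 fraction is mine, applied to the production figures I posted above):

```
# Rough DR IOPS target: scale the production numbers by the fraction of the
# workload I expect to actually run at the DR site (my guess: 1/2 to 2/3).
prod_avg_iops = (15_000, 20_000)   # normal production range
prod_peak_iops = 30_000            # 24-hour peak
low, high = 0.5, 2 / 3             # assumed DR fraction

print(f"DR average: {prod_avg_iops[0] * low:,.0f}-{prod_avg_iops[1] * high:,.0f} IOPS")
print(f"DR peak:    {prod_peak_iops * low:,.0f}-{prod_peak_iops * high:,.0f} IOPS")
```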

I attached the config so far. I've used Chelsio 520 NICs in the past, but from what I can tell the Intel XL710 NICs have decent support now. And I'm assuming ZIL/L2ARC is a waste of time...

The quote doesn't specifically state it but I believe this is the chassis: http://www.supermicro.com/products/system/2U/2028/SSG-2028R-NR48N.cfm

Addendum: Current reads/writes.

Average read (MiB/s)
SEQ 317
RAND 132

Average write (MiB/s)
SEQ 31
RAND 51
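
For what it's worth, here's the quick conversion I did to convince myself that raw throughput is a non-issue on the network side (treating the 317 MiB/s sequential read as the worst case):

```
# Sanity check: convert the observed peak throughput to line rate.
# Assumes the 317 MiB/s sequential-read average is the heaviest stream.
seq_read_mib_s = 317

gbit_per_s = seq_read_mib_s * 2**20 * 8 / 1e9   # MiB/s -> Gbit/s
print(f"{seq_read_mib_s} MiB/s is about {gbit_per_s:.2f} Gbit/s")  # ~2.66 Gbit/s

# Well under even a single 10GbE link, so the quad-port NIC is there for
# redundancy/multipathing rather than raw bandwidth.
```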
 

Attachments

  • nvme_hotness-040518.png

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
And I'm assuming ZIL/L2ARC is a waste of time
You will always have a ZIL with ZFS; it's just a matter of where the ZIL lives. You must be thinking of the SLOG (separate intent log), and you may find you still want one. Perhaps Optane? Something like the Intel SSD DC P4800X, with 500,000 random write IOPS.

I see you listed 48 4 TB NVMe drives (the stuff of dreams), but that will only give you 192 TB of raw space. How do you plan to lay out your vdevs? Normally I would suggest 2-drive mirrors providing N/2 space, but with all flash you could probably get away with RAIDZ1 and 4-drive vdevs, so... about 144 TB usable before compression. Just have a spare on hand.
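
For comparison, here's the quick math behind those two layouts (raw space only, before compression and before whatever free-space headroom you leave for block storage):

```
# Usable-space comparison for 48 x 4 TB drives (before compression and
# before any free-space headroom for block storage).
DRIVES, DRIVE_TB = 48, 4

# 2-way mirrors: 24 vdevs, half the raw space.
mirror_usable_tb = DRIVES * DRIVE_TB / 2                        # 96 TB

# 4-wide RAIDZ1: 12 vdevs, each loses one drive to parity.
vdev_width = 4
raidz1_vdevs = DRIVES // vdev_width                             # 12 vdevs
raidz1_usable_tb = raidz1_vdevs * (vdev_width - 1) * DRIVE_TB   # 144 TB

print(f"mirrors: {mirror_usable_tb:.0f} TB, 4-wide RAIDZ1: {raidz1_usable_tb:.0f} TB")
```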
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I'm also looking at a second system with spinning media, which will be a replication target for the critical stuff on the NVMe box and will also host lower-resource VMs.

This also has the XL710 4-port NIC, like the currently proposed NVMe box. I'm going to trawl through the forums a bit more to verify that this card is on par with the Chelsios I've used in the past. It does seem that the drivers have improved in FreeNAS 11.

I wanted NVMe cards for ZIL/L2ARC, but all the vendor has in stock right now is 2 TB cards. Obviously 2 TB for the ZIL is not ideal, so I'd like to partition these drives to provide 100 GB each for the ZIL and use the rest for L2ARC. I remember reading that while it isn't possible to do this in the GUI, you can partition drives from the command line and they will be picked up by the GUI. Is this still the case in FreeNAS 11? I tried to search, but I'm getting empty pages currently.
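
In case it helps frame the question, this is roughly the split I had in mind. It's only a sketch: the device name (nvd0), pool name (tank), and labels are placeholders, and the gpart/zpool steps shown are the generic FreeBSD way rather than anything FreeNAS-blessed:

```
# Sketch of partitioning one 2 TB NVMe device into a 100 GB log slice plus
# the remainder for L2ARC, then attaching both to the pool.
# Device name, pool name, and labels are placeholders.
import subprocess  # only needed if you uncomment the run() call below

DEV, POOL = "nvd0", "tank"

cmds = [
    ["gpart", "create", "-s", "gpt", DEV],                  # GPT scheme on the device
    ["gpart", "add", "-t", "freebsd-zfs", "-a", "1m",
     "-s", "100G", "-l", "slog0", DEV],                     # 100 GB slice for the SLOG
    ["gpart", "add", "-t", "freebsd-zfs", "-a", "1m",
     "-l", "l2arc0", DEV],                                  # rest of the device for L2ARC
    ["zpool", "add", POOL, "log", "gpt/slog0"],             # attach log vdev
    ["zpool", "add", POOL, "cache", "gpt/l2arc0"],          # attach cache vdev
]

for cmd in cmds:
    print(" ".join(cmd))                 # dry run: just show the commands
    # subprocess.run(cmd, check=True)    # uncomment to actually execute
```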
 

Attachments

  • e1cr36l.png

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
You will always have a ZIL with ZFS; it's just a matter of where the ZIL lives. You must be thinking of the SLOG (separate intent log), and you may find you still want one. Perhaps Optane? Something like the Intel SSD DC P4800X, with 500,000 random write IOPS.

I see you listed 48 4 TB NVMe drives (the stuff of dreams), but that will only give you 192 TB of raw space. How do you plan to lay out your vdevs? Normally I would suggest 2-drive mirrors providing N/2 space, but with all flash you could probably get away with RAIDZ1 and 4-drive vdevs, so... about 144 TB usable before compression. Just have a spare on hand.
Yes, sorry, I know there's always a ZIL and I did mean a dedicated SLOG device. The P4501 NVMe drives are rated for 360,000/46,000 random read/write IOPS, and it's not obvious to me that a separate SLOG would be of measurable benefit.

WRT layout, I was thinking of doing mirrored vdevs for both the NVMe box and the E1CR36L replication target listed above, so that the replication target will be able to run lower-resource VMs acceptably.

I don't trust RAIDZ1 particularly, though mirrors aren't very redundant either. If I had more budget I would use 3-way mirrors.
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I wanted NVMe cards for ZIL/L2ARC, but all the vendor has in stock right now is 2 TB cards. Obviously 2 TB for the ZIL is not ideal, so I'd like to partition these drives to provide 100 GB each for the ZIL and use the rest for L2ARC. I remember reading that while it isn't possible to do this in the GUI, you can partition drives from the command line and they will be picked up by the GUI. Is this still the case in FreeNAS 11? I tried to search, but I'm getting empty pages currently.
It looks to me like partitioning devices for SLOG/L2ARC is really frowned upon, so I will research this more. Going with lower-capacity NVMe cards would also delay the project, since the 2 TB cards are all the vendor has in stock currently.
 