Example recommended builds?

Status
Not open for further replies.

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
Is there a document showing recommended builds?
I'm looking to put together an array that can serve approximately 2 GB/s, or possibly two arrays at 1 GB/s each, to a pair of 16Gb FC switches.
The host application needs approximately 2 GB/s for running simulations.
The sims need about 120 TB and will be running for approximately 6 months to complete a single sim, so reliability is important.
Hosts are HP DL380 Gen9 boxes with 2x E5-2690 v4 processors and 192 GB of RAM.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Hardware thread. It's in my signature.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Have a look through the threads in my sig. They are a collection of what I deem the most important reading on the forum to get a good grasp of the hardware and of setting up a FreeNAS box.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The sims need about 120 TB and will be running for approximately 6 months to complete a single sim.
You need 6 months uninterrupted uptime? If so, you need to call iXsystems and ask for a TrueNAS quote.
 

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
You need 6 months uninterrupted uptime? If so, you need to call iXsystems and ask for a TrueNAS quote.
6 months doesn't seem like any big deal. Most Linux systems stay online for years before needing any reboots. That's not the hard part here; getting the 2 GB/s seems to be the tricky part.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
6 months doesn't seem like any big deal.
It's more a question of unforeseen circumstances, like a random motherboard failure. Everything fails eventually, and if a failure means you lose five and a half months of work (I'm guessing your environment isn't that sensitive, but you never know), that's a pretty crappy day, and one that could have been avoided with High Availability.

If you're feeling lucky, make sure to burn in the hardware for a very long time.

getting the 2 GB/s seems to be the tricky part.
Eeh, that's the part you can control by throwing hardware at the problem. FC makes things more complicated (it's not very popular with FreeNAS/TrueNAS), but 10GbE or 40GbE would both be viable, well-known solutions.

You'd probably be looking at triple-mirror vdevs for this, possibly SSDs. Need more performance? Just add more vdevs (and make sure you stay below 50% full, otherwise IOPS will drop - a lot).

I assume this would need SLOG, which means you need it in a form factor that can be replaced with the server running - that means a U.2 backplane for an Intel P3700 or two.

The biggest problem is the 120 TB... Just what kind of workload is the storage going to see? Because if you need SSD speeds from all 120 TB, your life is not going to be pleasant. If accesses focus on a sort of sliding window, with lots of repeated accesses, ARC and L2ARC can significantly speed up reads.
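For a rough sense of scale, here's a back-of-the-envelope sketch of what triple mirrors plus the 50%-full rule would imply. The 8 TB drive size and ~150 MB/s per-drive figure are assumptions for illustration, not a recommendation, and real numbers will be lower once ZFS overhead, sync writes and fragmentation enter the picture.

```python
# Back-of-the-envelope for triple mirrors with the pool kept below ~50% full.
# Assumed numbers (not from this thread): 8 TB drives, ~150 MB/s sustained each.
import math

DATASET_TB = 120        # working data set
MAX_FILL = 0.5          # keep the pool under ~50% full to preserve IOPS
DRIVE_TB = 8            # assumed drive size
DRIVE_MBPS = 150        # assumed sustained throughput per drive
MIRROR_WIDTH = 3        # triple mirror: 3 drives per vdev, capacity of 1 drive

pool_tb_needed = DATASET_TB / MAX_FILL         # 240 TB of pool capacity
vdevs = math.ceil(pool_tb_needed / DRIVE_TB)   # 30 vdevs
drives = vdevs * MIRROR_WIDTH                  # 90 drives
write_ceiling = vdevs * DRIVE_MBPS             # ~4500 MB/s, ideal sequential writes

print(f"{vdevs} vdevs, {drives} drives, ~{write_ceiling} MB/s sequential write ceiling")
```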
 

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
Our thinking is that FreeNAS would run on server-grade HP DL380 Gen9 or DL560 Gen9 boxes.
They use redundant SAS ports to the storage trays. I've never had an HP board go south in over 7 years of working with them. They are pretty fault tolerant and allow for hot swap of many parts, but not (obviously) the CPU, RAM or motherboard.
The thought is that we were going to run the client-side simulation software under ESXi, which allows us to tune performance dynamically between VMs. The "arrays" would be using HW RAID so that I can keep the workload off the CPUs and also allow for dynamic rebuilds of failed drives.
We are not planning to use SSDs; rather, we are looking at the 6 TB or 8 TB SAS 7.2k enterprise drives from Seagate. From the description, each drive should be able to support 100 MB/s, so a RAID-5 7+1 should achieve at least 500 MB/s. Using 4 RAID groups on separate storage trays gets us to the 2 GB/s throughput target while staying cost effective. We are still thinking about using RAID-6 instead, to get rebuilds without the headache of a second drive failure, brought on by the rebuild workload, taking the array down.
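Roughly, the napkin math I'm working from looks like this (illustrative only; it ignores the RAID-5/6 write penalty and IOPS limits):

```python
# Napkin math behind the 4-group RAID-5 layout above (illustrative only;
# it ignores the RAID-5/6 write penalty and random-IOPS limits).
PER_DRIVE_MBPS = 100    # vendor sustained figure for a 7.2k SAS drive
DATA_DRIVES = 7         # RAID-5 7+1: seven data drives plus one parity
GROUPS = 4              # one RAID group per storage tray

per_group_best_case = PER_DRIVE_MBPS * DATA_DRIVES   # 700 MB/s on paper
per_group_assumed = 500                              # padded down to "at least 500 MB/s"
total_target = per_group_assumed * GROUPS            # 2000 MB/s = the 2 GB/s target

print(per_group_best_case, per_group_assumed, total_target)
```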
 
Joined
Feb 2, 2016
Messages
574
The raw performance required is certainly available through FreeNAS given appropriate hardware.

What is your working data set size, Ron?

I understand your total data set is 120TB but how much of that is touched inside, say, a four-hour period? Is your data set primarily random or are you reading and writing large streams? I'm trying to figure out how much of your data is going to come out of cache as opposed to having to be read from storage.

Fibre Channel instead of Ethernet makes me a bit nervous given your uptime requirements. Ethernet is much better supported and tested. When you say six months to complete a simulation, certainly it can be interrupted? If interrupted unceremoniously, what are the ramifications?

SSDs are sweet but, with enough spindles, conventional disks should be able to provide the throughput required at a much lower price point. (If your storage were five times faster, could you finish in a third of the time and make twice the money? If so, SSDs are delicious.)

How are you accessing the data? NFS? CIFS? iSCSI?

Do you need snapshots and replication?

Cheers,
Matt
 

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
The simulator works mostly in RAM, but will be reading through the entire dataset and rewriting the entire dataset each iteration. Since the dataset doesn't fit in RAM, the reads and writes overlap: it moves in a portion of the data, recomputes, then writes it back out again, moving through the dataset as a whole.
The 100 TB is actually more like 120 TB. It ramps up to about 50% of that size pretty quickly, within a few days, then will be reading/writing that for the duration of the simulation. At this point I'm estimating about 6 months, but depending on the disk throughput it may be faster or slower. This is the reason for the high throughput requirement.
I could run this on a crappy storage system, but then it would take 1-2 years to complete the simulation. If I can get around 2 GB/s then I should be closer to the 6-month estimate.
Naturally, if I could get hold of 100 TB of RAM, this wouldn't be an issue and I could probably finish the simulation in a few days. The speed of the I/O (in MB/s) will be the major factor in the runtime of the simulation.
I've already run this simulation on smaller datasets to get a feel for how it works. Basically it has a cache built into it, so the reads and writes are not synchronized. It will fill the buffer with a large chunk of data and let the CPU recompute, which marks the data as dirty and forces it to get flushed back to disk.
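As a very rough model of why throughput dominates the runtime (the pass count below is hypothetical, chosen so that 2 GB/s lines up with the 6-month estimate):

```python
# Rough model of runtime vs. throughput. The pass count is hypothetical,
# picked so that 2 GB/s lands on the ~6-month estimate above.
DATASET_TB = 120
PASSES = 130            # hypothetical number of full read+rewrite iterations

def months_of_io(throughput_gb_per_s):
    bytes_moved = DATASET_TB * 1e12 * 2 * PASSES   # each pass reads and rewrites everything
    seconds = bytes_moved / (throughput_gb_per_s * 1e9)
    return seconds / (30 * 24 * 3600)

for gbps in (0.5, 1.0, 2.0):
    print(f"{gbps} GB/s -> ~{months_of_io(gbps):.1f} months of pure I/O")
```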
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The "arrays" would be using HW raid
That's not going to work with FreeNAS.
so that I can keep the workload off the CPUs
What else are they there for? The simulation is running elsewhere.
and also allow for dynamic rebuilds of failed drives.
What on earth is a "dynamic rebuild"? Whatever it is, ZFS does it better.
From the description, each drive should be able to support 100 MB/s, so a RAID-5 7+1 should achieve at least 500 MB/s.
That's an excessively simplistic assumption. IOPS are probably going to be a much bigger issue.
 

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
I wanted to use HW RAID to provide the reliability factor and the ability to keep the LUN active while the simulation is running, even with a failed drive. I've tested software RAID on Linux (md) and found that I could easily saturate the CPUs and slow things down.
As I understand RAID-Z/ZFS, it also provides some protection, and if I layer that on top of HW RAID then it should be even more reliable.
After all, how would the host know the difference between a physical disk and a LUN? Both come out of the controller the same way, just one is more reliable than the other. If I totally rely on ZFS and have any issues, I'm stuck. However, if I use HW RAID under ZFS, then either could fail and the other would protect me.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
As I understand RAID-Z/ZFS, it also provides some protection
Not "some" protection, all the protection that can be reasonably provided.
and if I layer that on top of HW RAID then it should be even more reliable.
No, the opposite is true.

ZFS is designed from top to bottom to interface as directly as possible with disks. That means absolutely no RAID controller. Off the top of my head, here's a sampling of everything that will go wrong, otherwise:
  • Conflicting, uncoordinated caching mechanisms will absolutely destroy performance.
  • Errors detected by ZFS are impossible to correct because ZFS doesn't have any redundancy to work from.
  • Should the array stall for whatever reason, ZFS will not be happy and, at the very least, make the stall longer. Using ZFS properly avoids this scenario because ZFS will simply drop any offending drives and carry on normally.
  • FreeNAS has exactly zero facilities for dealing with HW RAID controllers and whatever stupid software they might need. Disk failure notifications? Nope, you'd better hope someone notices the red light that the HW RAID controller hopefully turned on.
After all, how would the host know the difference between a physical disk and a LUN? Both come out of the controller the same way
Wrong, a physical disk has SMART data, for one. Good luck monitoring the drives without proprietary crap.
If I totally rely on ZFS and have any issues, I'm stuck.
ZFS doesn't randomly have issues. It's an immensely tested filesystem.
However, if I use HW RAID under ZFS, then either could fail and the other would protect me.
Wrong, either fails and you're screwed. You just added points of failure and complexity to the system, with no advantage.
 
Joined
Feb 2, 2016
Messages
574
each drive should be able to support 100 MB/s, so a RAID-5 7+1 should achieve at least 500 MB/s.

Seems optimistic. At 80% reads, it might be seeing 500 MB/s. As soon as you get into a 50:50 mix of reads and writes, it'll be more along the lines of 320 MB/s according to an online RAID calculator.

I've laid out the disks a half dozen different ways and the only conclusion I've come to is that you're going to need a lot of spindles. I'd go with 12 RAIDZ1 groups of six drives each. With that many drives, my back-of-a-napkin math says you'll meet your throughput and space requirements (with 3TB or 4TB drives). IOPS won't be great - under 2,000 - but you haven't listed that as a requirement.
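The napkin math, spelled out (the per-drive throughput and IOPS figures are rough assumptions, not measurements):

```python
# Napkin math for 12 x 6-drive RAIDZ1 (rough per-drive assumptions).
VDEVS = 12
DRIVES_PER_VDEV = 6
DRIVE_TB = 4            # 4 TB drives; 3 TB drops capacity proportionally
DRIVE_MBPS = 100        # sustained sequential, 7.2k spinner
DRIVE_IOPS = 150        # rough random-IOPS figure for a 7.2k spinner

data_drives = DRIVES_PER_VDEV - 1              # RAIDZ1: one drive of parity per vdev
raw_tb = VDEVS * data_drives * DRIVE_TB        # 240 TB before ZFS overhead
stream_mbps = VDEVS * data_drives * DRIVE_MBPS # ~6000 MB/s, ideal sequential
pool_iops = VDEVS * DRIVE_IOPS                 # a RAIDZ vdev does roughly one drive's IOPS: ~1800

print(raw_tb, stream_mbps, pool_iops)
```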

Also of note, that's raw disk speed in the best case scenario. Protocol overheads and reality are going to slow it further. How are you attaching your processor nodes to FreeNAS? NFS? CIFS? iSCSI?

Cheers,
Matt
 

Ron Watkins

Dabbler
Joined
Oct 27, 2016
Messages
13
The FreeNAS node(s) will be connected via dual 8Gb or 16Gb FC ports. If the numbers are as bad as you suggest, a pair of 8Gb ports per FreeNAS node will be more than sufficient, as each will yield only 800 MB/s max. Do you really think that if I use 6 boxes, each with 2 RAIDZ1 sets of SSDs, they will only pump out around 2 GB/s total across all 6? That seems odd and quite a bit lower than the manufacturer suggests. They seem to think that you can push 500 MB/s per SSD.
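Here's the arithmetic I'm questioning, spelled out (the per-port and per-SSD figures are nominal/vendor numbers, not measurements):

```python
# The arithmetic in question (per-port and per-SSD figures are nominal/vendor numbers).
NODES = 6
FC_PORTS_PER_NODE = 2
FC8_MBPS = 800          # usable throughput of one 8 Gb FC port
SSD_MBPS = 500          # manufacturer's sequential figure per SSD

fc_per_node = FC_PORTS_PER_NODE * FC8_MBPS   # 1600 MB/s ceiling per node
fc_total = fc_per_node * NODES               # 9600 MB/s across all 6 nodes
ssd_vdev = 5 * SSD_MBPS                      # one 6-wide RAIDZ1 of SSDs: ~2500 MB/s on paper

print(fc_per_node, fc_total, ssd_vdev)
```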
Also, I'm not that familiar with SuperMicro, but our HP rep said they are pretty bad compared to HP (naturally, since he's the HP rep).
He suggested using a pair of HP DL380 Gen9 boxes with dual E5-2690 v4 chips and 128 GB of RAM, each with 24 of the 3.82TB enterprise SSDs.

I'm assuming HP boxes are supported, along with the QLE 2562 controllers?
 

Simon Sparks

Explorer
Joined
May 24, 2016
Messages
57
24 of the 3.82TB enterprise SSDs will NOT meet your storage requirements unless you are going to be using compression and deduplication on FreeNAS, which will need HBAs and direct disk access to work correctly. NOT a RAID card. NEVER a RAID card.
 

Simon Sparks

Explorer
Joined
May 24, 2016
Messages
57
1 x 8 Gigabit per second Fibre Channel port, after the 8b/10b encoding, provides roughly 800 Megabytes per second
1 x 16 Gigabit per second Fibre Channel port, after the 64b/66b encoding, provides roughly 1600 Megabytes per second
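Spelled out, using the nominal FC line rates (8.5 and 14.025 Gbaud, which are slightly higher than the marketing "8" and "16"):

```python
# The encoding arithmetic above, spelled out.
# 8GFC runs at 8.5 Gbaud with 8b/10b; 16GFC at 14.025 Gbaud with 64b/66b.
def usable_mb_per_s(line_rate_gbaud, efficiency):
    return line_rate_gbaud * 1e9 * efficiency / 8 / 1e6   # bits -> bytes -> MB

print(round(usable_mb_per_s(8.5, 8 / 10)))      # ~850 MB/s (commonly quoted as ~800)
print(round(usable_mb_per_s(14.025, 64 / 66)))  # ~1700 MB/s (commonly quoted as ~1600)
```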

Should you lose one of your 8 Gigabit per second Fibre Channel connections, you will have destroyed your predicted timescale for the simulation to complete.

It is always best to over-provision the links in case of failure, because you should be designing for High Availability, NOT load balancing.
 