BUILD New high-end build with FreeNAS


notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
I am going to be building a large storage array for the company and wanted to get some advice from you guys, as I will be using FreeNAS to power the build.

The main requirements for me are data throughput, both reads and writes, and making sure that it is a rock-solid build and fully redundant.

I am considering two systems with identical specs; the specs I'll be listing are per server. I plan on using ZFS send/receive to keep these machines in sync.

The platform I've picked is the Supermicro chassis. Here is a link to it: https://www.supermicro.com/products/system/4U/6048/SSG-6048R-E1CR60L.cfm
Specs of what will be in each server:

Memory: 256GB DDR4 2400MHz ECC Registered
CPU: Intel Xeon, 6 cores, 1.7GHz, 15MB cache
Storage drives: 30x 6TB drives (every two 6TB drives paired as a mirror; the mirrored pairs together make up the overall storage pool)
ZIL drives: 2x 1TB MLC SSDs (mirrored)
L2ARC drives: 2x 1TB MLC SSDs
HDD connectivity: 2x LSI 3008 HBAs in IT mode (JBOD)

Networking: 1 NIC w/ 2x 10Gb ports

Is there anything I might be missing? Anything I should be concerned about, or anything you guys would change about the build?

Thanks in Advance!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Using a 1TB ZIL is way overkill. Plus, you want something that has both power-loss
protection and a high write-endurance (writes per day) rating. The actual size is based on
a formula which goes something like, (don't quote me :), 2 x maximum throughput
for 5 seconds. Also, the need for one is highly application specific, meaning some
uses don't benefit from one at all.
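To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch (my own illustrative math, assuming the network line rate is the fastest the pool can ingest sync writes; the 5-second window and 2x factor are just the folklore figures above):

```python
# Rough SLOG sizing sketch based on the "2 x max throughput for 5 seconds" folklore.
# Assumption: the network is the ingest bottleneck for synchronous writes.

def slog_size_gib(link_gbps, seconds=5.0, factor=2.0):
    """Rough SLOG size in GiB for a given ingest rate in gigabits per second."""
    bytes_per_second = link_gbps * 1e9 / 8
    return factor * bytes_per_second * seconds / 2**30

# Two 10Gb ports flat out: roughly 23 GiB -- nowhere near 1TB.
print(round(slog_size_gib(20), 1))
```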

It's better to maximize RAM than to add an L2ARC, since some RAM is used for the
directory of the L2ARC anyway.

The term you are looking for in regards to using a pair of disks is a Mirrored vDev.
ZFS uses the term vDev, (virtual device), for the sub-units of a pool. A pool made up
of Mirrored vDevs tends to work better for VM storage and applications that need
high IOPS. Each vDev gives roughly the IOPS of its slowest component, (disk).
Thus, a pool of 15 vDevs has the IOPS equivalent of 15 disks, pretty good.
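As a rough illustration of that IOPS point (the per-disk figure is just a typical ballpark for 7200rpm drives, not a measurement):

```python
# Ballpark pool IOPS for mirrored vDevs: each vDev contributes roughly the
# IOPS of a single disk, and the pool sums its vDevs.

disk_iops = 150      # assumed ballpark for one 7200rpm SATA drive
vdevs = 15           # 30 drives arranged as 15 mirrored pairs

print(vdevs * disk_iops)   # ~2250 IOPS for the pool, vs ~150 for one wide RAIDZ vdev
```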

You might want to mention your use case, and your network topology. Some people think
that aggregating 2 network ports on a FreeNAS server will double throughput for a
single client; it won't.

PS. The real name for the ZIL drives is SLOG, (Separate intent LOG). All ZFS pools
come with a ZIL inside the pool unless a SLOG is configured. Using the word
SLOG automatically implies a separate device, unlike ZIL.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
What the Half-Elven lady said... we really do need to know your intended use of the server.

If I understand you correctly, your plan is to use 15 mirrored pairs in your pool. That may or may not be appropriate... mirrors generally give the best performance, but are horribly space-inefficient -- 50%! So your 30 x 6TB drives will only yield the storage capacity of 15 x 6TB drives. And bear in mind that you will lose your data if both drives in any particular vdev fail. ZFS does provide for 3-way mirrors, which would increase the safety factor - but at the cost of even less space efficiency - 33 1/3%!

If you're planning on using the server for general-purpose file sharing, you may want to consider using RAIDZ2 or RAIDZ3, as these are both more space-efficient. Some examples:
  • Pool = 5 x 6-disk RAIDZ2 vdevs (2 parity drives per vdev): 66 2/3%
  • Pool = 3 x 10-disk RAIDZ3 vdevs (3 parity drives per vdev): 70%
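Here's the same comparison as quick napkin math, assuming 30 identical 6TB drives and ignoring ZFS overhead:

```python
# Usable-space sketch for 30 x 6TB drives under the layouts discussed above.
drives, size_tb = 30, 6

layouts = {
    "15 x 2-way mirrors":  (15, 2, 1),   # (vdevs, disks per vdev, redundant disks per vdev)
    "5 x 6-disk RAIDZ2":   (5, 6, 2),
    "3 x 10-disk RAIDZ3":  (3, 10, 3),
}

for name, (vdevs, width, redundancy) in layouts.items():
    usable = vdevs * (width - redundancy) * size_tb
    print(f"{name}: {usable} TB usable ({usable / (drives * size_tb):.0%})")
# -> 90 TB (50%), 120 TB (67%), 126 TB (70%)
```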
@Arwen is right about the L2ARC; you don't need one unless you've installed the maximum memory supported by your motherboard and still have performance issues. And you probably won't need a SLOG device unless you're providing block storage. Typically this is to one or more hypervisors for running virtual machines.

Good luck!
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
Thanks for the replies...This is exactly the sort of thing that I am looking for!

Workloads:
The storage array is mainly going to be used to work with a lot of heavy VR videos. We might be rendering, encoding, or otherwise working directly off this storage array. I can easily see a few hundred Mbps being sustained between each of the editors and/or encoding machines. Right now we will have 1 video editor and 3 machines accessing the array. I can easily see that jumping up to 2 or 3 editors and 5 or 6 encoding machines. We might actually end up using even more network bandwidth if we start building encoding machines with dual GPUs and dual CPUs. There will be a lot of writing and reading taking place at the same time.

Networking:
The dual-port NIC is just because. I wasn't planning on using port spanning, and I knew that it would not double the speed that the storage array can achieve.

Storage Configuration:
I know that by using the above configuration I am losing 50% of the space. The reason for this configuration is that if a drive does fail, it is easily replaced without any sort of RAID rebuild process taking place, which would hinder the operation by slowing everything down. Sure, it'll have to re-sync the data to the new drive, but that's a lot less intensive a task than resilvering a parity array and takes a lot less time. Supermicro provisions with a minimum of 30 drives. That'll give me 90TB worth of storage to start, with another 90TB of storage available later (or more if I use bigger drives to create the new vdevs). I don't plan on using the dedupe option, so that should help out with the memory. Would 256GB of memory be a good starting point for this configuration?

I don't foresee block storage being needed yet, but who knows about the future.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Would 256GB of memory be a good starting point for this configuration?
As with so very many things in life... 'it depends'. :smile: But yes, sir, 256GB is a good start. The rough rule-of-thumb is "16GB + 1GB for every 1TB of storage", which you meet handily. However, given your workload, you may need more RAM. The savvy thing to do would be to leave yourself plenty of memory slots so that you can add additional memory in the future, i.e., don't fully populate the board with smaller-capacity RAM modules.
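For what it's worth, here's that rule of thumb applied to this build (a sketch only, counting the usable mirrored capacity as 'storage'):

```python
# "16GB + 1GB per TB of storage" rule-of-thumb check for 15 x 6TB mirrored pairs.
usable_tb = 15 * 6             # 90 TB usable
suggested_gb = 16 + usable_tb  # ~106 GB suggested
print(suggested_gb, "GB suggested; 256GB leaves plenty of headroom")
```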

You've obviously given this design some thought... kudos!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
You might plan ahead and get a card with dual 10Gbps & dual 25Gbps ports. Even
if you don't use the 25Gbps ports to a switch, you can use one of them on each server
as the inter-connect. Simply set up a private subnet and a private host name for each host,
then use that for the replication. This also keeps the replication traffic off the main port.

Of course, you could implement such a scheme with your second 10Gbps port instead.

Plus, if you ever do need more performance, you can use the 2nd 25Gbps port to a newer
switch as FreeNAS's main port.

If I have the choice, I never buy an under-provisioned card, (Fibre Channel, network or
SAS), if I can help it. So even if you don't get a network card with dual 25Gbps ports,
you could get a 4 x 10Gbps card instead. That gives you the option of multiple subnets
that serve your different needs, (editors versus encoders), and a free port for the inter-connect
to your other server.
 
Joined
Feb 2, 2016
Messages
574
Would 256GB of memory be a good starting point for this configuration?

Absolutely, yes. The entire configuration looks solid. You did your research.

As others have noted, examine the SSDs you're considering and do the math to see if they have the latency, endurance and power characteristics necessary. Using a much larger SSD than needed for SLOG is fine if you're doing it for over-provisioning and the enhanced write endurance, not because you actually expect to use all that space for SLOG.

L2ARC may not be much use if your most recently used or most frequently used files are all in excess of its size. The more testing and experience I have with L2ARC, the less useful I find it for the data sets I've thrown at it. Of course, I've never thrown video its way and don't have nearly as much RAM as you do. Still, I'd do performance testing without it first and see if it meets your needs.

For RAM, if raw performance is the overriding goal, you'll need to look at your active data sets. How much data is actually moving around? Are you editing a minute at a time? Five minutes? Grading an entire movie? If you can keep your entire active data set or often-used elements in ARC by jumping from 256GB to 512GB, that's the first upgrade I'd make. On the other hand, if 2TB of RAM still wouldn't be enough to keep it all in ARC, I'm not sure it would be worth upgrading to even 512GB.

Cheers,
Matt
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
Lots of good advice so far, and this is a bit out of my experience level, but one thing I can recommend (and I believe @MatthewSteinhoff will chime in on this) is that you don't need a similarly spec'd backup target. Save your money and invest more in the main machine if need be.

Replication (built into FreeNAS, and the best way to back up from one ZFS system to another) is very efficient and extremely light on the CPU. If the replication target machine is solely going to be used for that, you can get by with minimal specs, letting the backups run overnight without the machine breaking a sweat.
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
Absolutely, yes. The entire configuration looks solid. You did your research.

As others have noted, examine the SSDs you're considering and do the math to see if they have the latency, endurance and power characteristics necessary. Using a much larger SSD than needed for SLOG is fine if you're doing it for over-provisioning and the enhanced write endurance, not because you actually expect to use all that space for SLOG.

L2ARC may not be much use if your most recently used or most frequently used files are all in excess of its size. The more testing and experience I have with L2ARC, the less useful I find it for the data sets I've thrown at it. Of course, I've never thrown video its way and don't have nearly as much RAM as you do. Still, I'd do performance testing without it first and see if it meets your needs.

For RAM, if raw performance is the overriding goal, you'll need to look at your active data sets. How much data is actually moving around? Are you editing a minute at a time? Five minutes? Grading an entire movie? If you can keep your entire active data set or often-used elements in ARC by jumping from 256GB to 512GB, that's the first upgrade I'd make. On the other hand, if 2TB of RAM still wouldn't be enough to keep it all in ARC, I'm not sure it would be worth upgrading to even 512GB.

Cheers,
Matt

Thanks for the thoughtful reply. Video files can be anywhere from 10GB to 50GB. The main editor and I have been discussing things, and we might end up not using videos at all, but rather rendering the videos to image sequences. Each video is 60fps and each frame is about 50MB, give or take, so there will be millions of files. I suspect there might be a hit on the initial caching, but if the person is reusing those image sequences then it should be fine. Maybe 512GB of memory would be worthwhile to help cache all these images/videos.

I definitely like your approach, though. I'll start off with 256GB of memory and see how that works out first. I believe it should be enough to hold the data the editors will be working with in memory; if not, I'll upgrade to 512GB and see how much of an improvement I gain. After that, it's either another CPU + memory or an L2ARC cache.
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
Lots of good advice so far, and this is a bit out of my experience level, but one thing I can recommend (and I believe @MatthewSteinhoff will chime in on this) is that you don't need a similarly spec'd backup target. Save your money and invest more in the main machine if need be.

Replication (built into FreeNAS, and the best way to back up from one ZFS system to another) is very efficient and extremely light on the CPU. If the replication target machine is solely going to be used for that, you can get by with minimal specs, letting the backups run overnight without the machine breaking a sweat.

The backup is pretty important. We need to be able to replace one machine with the other in the event of a failure. We're willing to accept some downtime/data loss, so the replication doesn't need to be real time; applying snapshots to the backup machine every 30 minutes should be fine. A lot of $$$ is spent on people working and on the content. The storage array being offline for a day costs us in many ways.
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
So, I've been doing some more research and I came across this site with some recommendations about ZIL/SLOG devices.

I am thinking that the Intel DC P3700 drive that has power loss protection would be the way to go for the ZIL. I'll hold off on SLOG until I've reached memory capacity of the motherboard.

In case any of you are interested, the article can be found here: https://www.servethehome.com/buyers...as-servers/top-picks-freenas-zil-slog-drives/
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
So, I've been doing some more research and I came across this site with some recommendations about ZIL/SLOG devices.

I am thinking that the Intel DC P3700 drive that has power loss protection would be the way to go for the ZIL. I'll hold off on SLOG until I've reached memory capacity of the motherboard.

In case any of you are interested, the article can be found here: https://www.servethehome.com/buyers...as-servers/top-picks-freenas-zil-slog-drives/
The Intel DC P3700 has been mentioned around here as a decent SLOG.

The SLOG has little to do with the memory capacity; that's the L2ARC, (aka the Cache pool entry). It's
confusing, I know, and it will get worse since OpenZFS is introducing metadata cache devices, (so your
directory and file attributes can live on SSD).

Think of it this way: SLOGs are write-only unless you have a server crash. Then you want something
that survived the crash so you can pick up any synchronous writes that still need to be written to the normal disks.
Thus the desire to have one with power-loss protection. To be practical, it should be noticeably faster
than your normal devices, (generally disks). So SSDs come into play, with both low latency and high
write speeds.

L2ARC / Cache devices are the opposite; they are read-mostly. Only when data is read a lot and can't
fit into RAM do you want it written to a higher-speed, lower-latency device. There is a directory
entry in RAM for each L2ARC / Cache entry, thus the desire to max out RAM before using an L2ARC.
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
Yeah, I always referred to it as ZIL but then found out it was SLOG and got confused since they're the same!
 
Joined
Feb 2, 2016
Messages
574
To be practical, it should be noticeably faster than your normal devices, (generally disks). So SSDs come into play, with both low latency and high write speeds.

Read that a couple times then do the math, @notjoe: noticeably faster. With a 15-wide striped mirror of 7200rpm drives, you're going to have 15 * 175 MB/s, give or take, of throughput: 2,625 MB/s aggregate. The Intel P3700 has 1,900 MB/s of throughput. If you SLOG your pool with the P3700, you could actually have lower performance, give or take.

Your machine - and requirements - are full beast. Rules of thumb are going to break down at that level.

For your replication host, @nojohnny101 is correct; I believe you can really cut corners there, depending on your service-level requirements. If the primary node fails, it may be quicker to pull parts out of the replication target and put them in the primary to get it back up and running than it would be to fail over all services to the secondary. When I look at what cascade of failures would be required to cause the primary to fail, those failures are so unlikely that I'm happy to limp along on an underpowered secondary while waiting for parts for the primary to arrive.

Cheers,
Matt
 

notjoe

Explorer
Joined
Nov 25, 2015
Messages
63
Read that a couple times then do the math, @notjoe: noticeably faster. With a 15-wide striped mirror of 7200rpm drives, you're going to have 15 * 175 MB/s, give or take of throughput: 2,625 MB/s aggregate. The Intel P3700 has 1,900 MB/s of throughput. If you SLOG your pool with the P3700, you could actually have lower performance, give or take.

Your machine - and requirements - are full beast. Rules of thumb are going to break down at that level.

For your replication host, @nojohnny101 is correct; I believe you can really cut corners there, depending on your service-level requirements. If the primary node fails, it may be quicker to pull parts out of the replication target and put them in the primary to get it back up and running than it would be to fail over all services to the secondary. When I look at what cascade of failures would be required to cause the primary to fail, those failures are so unlikely that I'm happy to limp along on an underpowered secondary while waiting for parts for the primary to arrive.

Cheers,
Matt

I am starting to see the light! Taking it a step further, if we assume that I will consume the other 30 drive bays, for a total of 60, then the math would be 30 * 175 MB/s, give or take, of throughput: 5,250 MB/s aggregate, and a ZIL drive would most definitely kill performance. So, ZIL/SLOG is out of the question then? Whether the data gets redistributed across newly added disks, or those newly added disks get used more until they hit a capacity similar to the disks in the rest of the raid array, is beyond me. I might be overthinking this.

As for the backup machine: the expectation is that with a failed server we can be operational within minutes using the backup. I'd imagine it should only take a few minutes to switch IP addresses over. Personally, I'd probably just stock some replacement parts and call it a day, but the requirement is a 1:1 mirror of the primary. I'll happily spend money on kit though ;)

Thanks for saving me ;)
 

ChriZ

Patron
Joined
Mar 9, 2015
Messages
271
Hello..
Just one thing.
In your first post you mentioned your CPU specs, from which I understand it is an E5-2603 v4.
Don't know if this CPU is good enough for your intended workload.
Don't quote me on this, though... Let's see what others think about it.
 
Joined
Feb 2, 2016
Messages
574
So, ZIL/SLOG is out of the question then?

Short answer: maybe, I don't know.

Long answer: Not out of the question. Just a more complicated answer. If one is too slow, maybe two in a stripe would work? You're way beyond what I've built. Here's my napkin math...

With a 30-wide, 60-disk mirrored stripe, you're moving up to 10,500 MB/s, (5,250 MB/s per side of the mirror, assuming 175 MB/s per disk). Your LSI 3008 is limited to 6,000 MB/s, right? Further, PCIe 3.0 is just 985 MB/s per lane: an 8x card such as the LSI SAS 9300 is limited to 985 * 8, or 7,880 MB/s.

So, while the 5,250 MB/s you could be moving is less than the 7,880 MB/s you have available, you're running harder than most. You also haven't accounted for everything else chewing up PCIe lanes, such as the 10G NIC.

Your NIC, by the way, will likely be the bottleneck: 10G is 1,250 MB/s. You'd need 40G of network to reach 5,000 MB/s of disk.
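Putting those rough ceilings side by side (all figures are the approximate ones quoted in this thread, not measurements):

```python
# Napkin comparison of the bandwidth ceilings discussed above, in MB/s.
ceilings = {
    "single 10GbE port":        1250,        # 10 Gb/s on the wire
    "Intel P3700 SLOG":         1900,        # quoted sequential write throughput
    "pool, 30 striped mirrors": 30 * 175,    # ~175 MB/s per 7200rpm drive
    "one LSI 3008 HBA":         6000,        # quoted SAS-side limit
    "PCIe 3.0 x8 slot":         985 * 8,     # ~985 MB/s per lane
}
for name, mb_s in sorted(ceilings.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} ~{mb_s} MB/s")
# The network is the first wall you hit, then the SLOG, well before the platters.
```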

Whether the data gets redistributed across newly added disks or those newly added disks get used more until they hit a capacity similar to the disks in the rest of the raid array is beyond me.

If your data is not sedentary and your original pool isn't 100% full, rebalancing should happen magically as data is written. You should receive improved performance along with the additional space as soon as you bring the VDEVs into the pool. Further, since you're adding the same number of drives as was in the original pool, the worst case scenario is that the performance won't change.

(Where you would see a problem is if you had a ten-spindle pool that was really close to being full and then added two more spindles. Until the data rebalanced, those two relatively empty spindles would be hammered while the original ten spindles would not see nearly as much use. If your performance was 1,000 MB/s initially, you might choke down to 200 MB/s. Your IOPS, too, would suffer. But, like I said, that isn't your scenario. An archival pool where data is added and then never deleted or changed is harder to rebalance, since nothing is naturally moving around.)

an E5-2603 v4... Don't know if this CPU is good enough for your intended workload.

An E5-2603 v4 does seem slow. For most situations, any CPU will have plenty of performance as long as you're not running applications on the FreeNAS host itself (most notably, Plex). In this case, I'd throw a bit more horsepower at the server.

Cheers,
Matt
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Uh, let's not forget that SSDs have MUCH lower latency. Meaning I could write data and get
an acknowledgement that it's written in a millisecond, but a hard disk might take 8 milliseconds
just to seek to the right point on the disk, then wait a bit more until the disk spins to the right block.

Further, the pool can also be busy with normal reads, backups, scrubs, re-silvers, async writes
and SLOG write-backs, (which come from RAM). But a SLOG is a dedicated device, (or
generally should be).

Remember, in the case of synchronous writes, what matters is how fast you can get the data into secure
storage, (the pool's built-in ZIL, or an external SLOG). Because until that happens, the write is not
complete and the writing program is waiting.

That's why it's application specific. If the application does not need or want synchronous writes,
the NAS does not need a SLOG.
 
Joined
Feb 2, 2016
Messages
574
You are correct, @Arwen. For many applications, latency would be paramount.

I discounted latency in my analysis because video file writes strike me as being larger than any reasonable SLOG - you're going to end up waiting on the spinning platters anyway. Large, streaming writes instead of bursty writes.

At least, that's how I'm imagining it. Truth is, most of my clients don't do video. They are still photographers with hefty archives of largish (35MB - 200MB) images. For their workflow, an SSD SLOG is sweet.

Cheers,
Matt
 
