BUILD Looking to build kicka.. NFS server


vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Hi all,

I've been using FreeNAS on and off for the last few years and I am currently in the process of starting a hosting company. For this, I am looking to build an NFS server that is capable of high IOPS while being simple to manage.

I found FreeNAS pretty promising because of the easy web GUI (yet still able to do advanced stuff), because of ZFS (I've used it on Ubuntu for a long time and am pretty satisfied with it), and because of the monitoring possibilities. I will be replicating pools to a second box at some point as well.

This NFS server will be handling a lot of PXE-booting systems which will mainly be hosting Minecraft servers, so a lot of 4K I/O. It will also be hosting a single 1TB datastore for my two ESXi boxes. I've heard lots of bad stories about FreeNAS + NFS + ESXi, so I suppose I will go with iSCSI. I've already conducted my own tests with NFS to ESXi and was far from impressed; I have yet to try iSCSI. I'm planning to have a dataset for each purpose so things stay separated as well. Now, I already have some decent hardware which will be running this box, but I'm seeking guidance from experts on how to set it up correctly. Here's my current hardware:

SuperMicro X9DRi-F (dual-socket LGA 2011 board, dual Gbit LAN)
Xeon E5-2603 quad-core processor
64GB DDR3 ECC memory
4 x 2TB 7.2K RPM SATA HDDs
Extra Intel dual-port server adapter (2x Gbit LAN)
Redundant 620W power supply

I want this system to be as fast as it can be, so I've thought about throwing in a couple of NVMe SSDs to accelerate the pool. My question is, how much is sufficient? I am especially looking at write performance here, particularly 4K-wise. The Minecraft servers will be writing and loading a lot of 4K blocks, so it's important to me that the performance is there. I've been looking at the Samsung 950 Pro 256GB drive. I already have one of those, so I wouldn't have a problem chippin' in another. My idea is to partition each of them with 2x 100GB partitions and leave the rest for overprovisioning, then mirror partition 1 on the SSDs for the ZIL and mirror partition 2 for L2ARC. For L2ARC though, would it make more sense to chip in an extra 64GB of memory? I've seen conflicting information here as well: some sources tell me that more RAM gives better writes, while others say it does not. What's the deal here?

I hope some of you might be able to guide me in the direction of the best-performing FreeNAS box. :) MB/s does not matter to me; the only thing I'm after is raw 4K/8K IOPS.

Thanks in advance,
Chris
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Wow.

Okay, so, the bad... your 2603 is one step below what I've been known to call the most contemptible processor ever seen in Xeonland (the 2609). ZFS is a software implementation of what is normally done by a hardware RAID controller in silicon. You do yourself no favors with a bad CPU. The dual CPU board is also kinda hurty if you don't need duals, because it eliminates options like the X9SRL and the E5-1650 v2 (or the newer X10SRL and E5-1650 v3).

There's nothing bad about NFS or iSCSI except that ZFS is a CoW filesystem and therefore you have the related issues there. We know how to mitigate those issues. For writes, keep gobs of free space on the pool. For reads, L2ARC. I write about this approximately daily so it shouldn't be hard to find threads on these forums about this.
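If you want a quick way to keep an eye on the free-space side of that, something like the sketch below works. It's illustrative only: "tank" is a placeholder pool name, and the 50% threshold is the usual rule of thumb around here for block storage, not a hard number.

```python
import subprocess

# Sketch: report pool utilization, since write performance on a CoW pool
# drops off as free space shrinks. "tank" is a placeholder pool name.
out = subprocess.check_output(
    ["zpool", "list", "-Hp", "-o", "name,size,alloc", "tank"], text=True)
name, size, alloc = out.split()
used_pct = int(alloc) / int(size) * 100
print(f"{name}: {used_pct:.1f}% used")
if used_pct > 50:
    print("getting full for block storage; expect write speeds to suffer")
```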

The Samsung 950 Pro is a nice NVMe drive but is not suitable for SLOG use. Where did you get the idea that this was a good drive for that?

Splitting a device to use part as SLOG and part as L2ARC gives you the worst of both worlds, in any case. Don't do that. Your 64GB RAM probably gives you enough RAM to use a 256GB 950 Pro as L2ARC without problems, but before adding another 256GB 950 Pro, you need to add another 64GB RAM. This keeps the ratio of ARC:L2ARC around 1:4, which is very safe. You can also analyze your existing system once it is in production and see if maybe a 512GB 950 Pro would be a better addition when you add a second 64GB of RAM. You need to make sure you're not pressuring the ARC in a bad way. For L2ARC, don't worry about overprovisioning.
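To put rough numbers on that ratio, here's a back-of-envelope sketch. Treating ARC as roughly equal to system RAM is a simplification, the ~70 bytes of ARC header per L2ARC record is an approximation that varies by OpenZFS version, and 8K is an assumed average record size for a small-block NFS/iSCSI workload.

```python
# Back-of-envelope numbers for the 1:4 ARC:L2ARC guideline above.
ram_gb = 64
max_l2arc_gb = ram_gb * 4            # keep ARC:L2ARC around 1:4

l2arc_gb = 256                       # one 256GB 950 Pro
avg_record_kb = 8                    # assumed average record size
records = l2arc_gb * 1024 * 1024 // avg_record_kb
header_gb = records * 70 / 1024**3   # ARC memory consumed just to index the L2ARC

print(f"~{max_l2arc_gb} GB of L2ARC is a sane ceiling with {ram_gb} GB of RAM")
print(f"a {l2arc_gb} GB L2ARC of {avg_record_kb}K records eats ~{header_gb:.1f} GB of ARC")
```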

As for your disks, be sure you're doing mirrors, not RAIDZ, and you'd be better off with about twice as many disks. If you can get larger disks, this works better because it creates more free space on the pool, which directly translates to better write speeds.
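The reason mirrors win for this kind of workload is that random IOPS scale roughly with the number of top-level vdevs, while a whole RAIDZ vdev behaves more or less like a single disk for small random I/O. Very roughly, and ignoring caching entirely (the 150 IOPS per 7.2K SATA disk is an assumed ballpark figure):

```python
# Rough illustration: pool random-write IOPS scale with top-level vdev count.
per_disk_iops = 150                     # assumed ballpark for a 7.2K SATA disk

def mirror_pool_iops(disks):
    return (disks // 2) * per_disk_iops  # one vdev per 2-way mirror pair

def raidz_pool_iops(disks):
    return per_disk_iops                 # a single RAIDZ vdev ~ one disk

for n in (4, 8):
    print(f"{n} disks: mirrors ~{mirror_pool_iops(n)} IOPS, "
          f"single RAIDZ vdev ~{raidz_pool_iops(n)} IOPS")
```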
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Hello jgreco,

Thanks for your input! I did have a feeling that the 2603 might bite the dust, but I bought the board and CPU (+ an 8GB DIMM) as a cheap bundle; that's why I want to use the dual board. I use Intel S2600s for my other app servers. I have an E5-2620 at my disposal (I just have to remove it from a server currently in a staging environment), so I guess that would give a bit more firepower. To be honest, the 2603 does look to perform a bit slowly. :)

I was thinking of the 950 Pro because of its high 4K IOPS (both read and write), and mirroring a couple of partitions just seemed logical to me, but I also have a couple of 100GB Intel DC S3700s I can use instead. I suppose this would make more sense, although the IOPS wouldn't be as good. Then it would make sense to use the NVMe as L2ARC, yes. I wouldn't need more space than the 256GB device can provide right now, but perhaps a 512GB upgrade could be viable (or just RAID0'ing 2x 256GB) when the need is there.

Yes, I will certainly be doing mirrors, not RAIDZ. I suppose this will also stress the CPU less. So it would actually be better with 8x 2TB drives in the same striped-mirror (RAID10-style) pool? I do have the ports and hot-swap bays, but I was thinking of creating a second pool with the 4 additional drives whenever I needed it. I'll try it out, thanks for your advice!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The whole purpose of the SLOG is to ensure data integrity. The 950 Pros can't do that because they lack power loss protection. The IOPS of your SLOG device aren't relevant if the SLOG isn't providing the function it is supposed to; it is better to just omit the SLOG and turn off sync writes and magically double your speed than it is to use a bad choice of SLOG.
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Thanks for your input again; you are right about the power loss protection. I'm not sure power loss will be an issue: the server will be colo'ed at Interxion, which runs two phases to my rack (both on separate UPS systems), and I'll connect each phase to a plug on the redundant power supply. The chance is very small, but I get your point, it's still there. So I might as well just go with the S3700 SSDs; I guess they'll do pretty badass anyway. :) Isn't it the case that some of the ZIL data is also first stored in memory? I believe I read that in a PowerPoint presentation posted somewhere else on this forum.
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Thanks, I've looked at this. I've built the system now, though without the NVMe L2ARC drive so far (Intel's UEFI currently has an issue with 950 Pro NVMe drives). For now I don't have any L2ARC, and I have a couple of 100GB DC S3700s in a mirror as SLOG. I am getting pretty decent read performance, but write-wise I was maybe hoping for more. I have a couple more of the DC S3700 SSDs; would it make sense to add those as a mirror as well (resulting in a striped mirror) for better performance? Or would the SLOG then be too big? I am not really looking for a bigger pool, just some more IO on the write end.
 

Attachment: anvil.PNG (benchmark screenshot)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Write performance with a SLOG is always a lot less than it is without. Here's some idea of what's happening under the hood when you request a sync write:

A sync write starts at the client, and has to make a very complicated round trip, in lockstep, for EACH WRITE REQUEST. The "sync" part of "sync write" means that the client is requesting that the current data block be confirmed as written to disk before the write() system call returns to the client. Without sync writes, a client is free to just stack up a bunch of write requests and send them over a slowish channel; they arrive when they can. Look at the layers:

Client initiates a write syscall
Client filesystem processes request
Filesystem hands this off to the network stack as NFS or iSCSI
Network stack hands this packet off to network silicon
Silicon transmits to switch
Switch transmits to NAS network silicon
NAS network silicon throws an interrupt
NAS network stack processes packet
Kernel identifies this as a NFS or iSCSI request and passes to appropriate kernel thread
Kernel thread passes request off to ZFS
ZFS sees "sync request", sees an available SLOG device
ZFS pushes the request to the SAS device driver
Device driver pushes to LSI SAS silicon
LSI SAS chipset serializes the request and passes it over the SAS topology
SAS or SATA SSD deserializes the request
SSD controller processes the request and queues for commit to flash
SSD controller confirms request
SSD serializes the response and passes it back over the SAS topology
LSI SAS chipset receives the response and throws an interrupt
SAS device driver gets the acknowledgment and passes it up to ZFS
ZFS passes acknowledgement back to kernel NFS/iSCSI thread
NFS/iSCSI thread generates an acknowledgement packet and passes it to the network silicon
NAS network silicon transmits to switch
Switch transmits to client network silicon
Client network silicon throws an interrupt
Client network stack receives acknowledgement packet and hands it off to filesystem
Filesystem says "yay, finally, what took you so long" and releases the syscall, allowing the client program to move on.

That's what happens for EACH sync write request.

The trick to increasing write performance is to optimize as many of these steps as possible. One of the better optimizations possible is to get rid of SAS from the equation and go straight to NVMe, which eliminates some handling in the middle. For your particular case, with that 2603 CPU, another optimization is to get a better CPU that allows the NAS to process things faster. It isn't clear to me HOW much faster that would make it, just that it's a thing that affects performance.
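If you want to see roughly what that round trip costs, something like the crude sketch below makes the point. It is not a real benchmark: the path is a placeholder for a file on the pool or datastore you care about, and O_SYNC from a local client stands in for what NFS/iSCSI sync writes do over the wire. The takeaway is that a single sync-writing client gets about 1/latency IOPS no matter how fast the disks are in aggregate.

```python
import os, time

# Sketch: time small synchronous writes to see how per-request round-trip
# latency caps sync-write IOPS. PATH is a placeholder; 4K blocks match the
# workload described in this thread.
PATH = "/mnt/tank/synctest.bin"   # hypothetical path on the pool under test
BLOCK = b"\0" * 4096
COUNT = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
start = time.time()
for _ in range(COUNT):
    os.write(fd, BLOCK)           # O_SYNC: each write must be stable before returning
os.close(fd)
elapsed = time.time() - start
os.remove(PATH)

lat_ms = elapsed / COUNT * 1000
print(f"avg sync-write latency: {lat_ms:.2f} ms -> ~{COUNT / elapsed:.0f} IOPS")
# e.g. 1 ms of round-trip latency limits a single sync-writing client to ~1000 IOPS.
```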
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Hello,

Yes, the 2603 is already out the window; instead I now have a 2620 in place. For now I suppose I will have to make do with the SATA SSDs as SLOG, but I will try to scout out a couple of P3600s.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why not just try a single P3600 to see if it does what you need?
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Since it's a SLOG, wouldn't it be better to mirror the writes? I don't want to risk data loss; that wouldn't be too nice.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Since you don't know if it is going to give you the performance you are hoping for, wouldn't it be better (for your wallet) to try just one for now? The likelihood of a new Intel SSD failing in the first month is very small.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Since you don't know if it is going to give you the performance you are hoping for, wouldn't it be better (for your wallet) to try just one for now? The likelihood of a new Intel SSD failing in the first month is very small.
And it'd have to be coupled with a system failure to actually cause data loss (instead of just painful performance).
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Hey guys, thanks for all the input. I did indeed consider just getting a single P3600 to start with. Now I have purchased a couple of 750 800GB SSDs (they were discounted by 35%... :oops:). 800GB would indeed be a bit too much for the L2ARC, so now I'm considering actually building a complete mirrored ZFS pool from just those two drives. They are so powerful that I would need no cache or anything, not even with dedupe or compression turned on. Is that an insane idea, or does it actually make sense? I still have a couple of DC S3700s I could use for SLOG, but I don't even know if that would make sense either, since they are slower.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Doesn't hurt to give it a whirl and see what happens.
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Yeah, true. I'll try it out when I receive them and will report back. :) Thanks for all your advice so far.
 

yourmate

Contributor
Joined
Apr 4, 2016
Messages
105
Okay, so, the bad... your 2603 is one step below what I've been known to call the most contemptible processor ever seen in Xeonland (the 2609). ZFS is a software implementation of what is normally done by a hardware RAID controller in silicon. You do yourself no favors with a bad CPU. The dual CPU board is also kinda hurty if you don't need duals, because it eliminates options like the X9SRL and the E5-1650 v2 (or the newer X10SRL and E5-1650 v3).

Sorry to chip in without being able to add any value to the discussion, but I'm just shocked, as I've just bought a SuperServer 7047R-TRF, which is basically the above-mentioned board with a good Platinum-rated PSU and a 4U server case, and I was thinking about buying two E5-2609 CPUs to power it...

Did I just make a very bad decision by spending £400 (~$560) on the chassis, PSU & the X9DRi-F board?!?
Although mine is just a home server...
 

vrod

Dabbler
Joined
Mar 14, 2016
Messages
39
Why not just pop in a 2620 or similar? That works pretty well.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Or head on over to Amazon etc. for the blowout special on E5-2670s, around $70. These appear to be retired datacenter CPUs. Even I'm thinking about that...

Put it this way, buying two of those would net you an unreasonably fast box at a bargain-basement price.
 