Help with a new system

Griwa

Cadet
Joined
Jun 7, 2017
Messages
5
Hi everyone.

I don't usually post; I normally search for answers and make my own decisions. For the last 2 years I have been successfully running a FreeNAS system for the company I work at, with a second server as a remote backup. In the last year or so we have moved heavily into processing photogrammetry surveys, and this data is quickly filling up our current server, so it is time to add another server for point cloud and photogrammetry work.

The reason I would like some help is that the data I process is accessed simultaneously by a processing cloud of 13 compute nodes (the current count). We typically run 5-10 projects at a time over 2 weeks, then keep the useful data and delete the working files.

A project starts out as the images from our survey, typically 8,000-15,000 images at 10MB each (80-150GB). Each node reads the images and creates large processing chunks, typically 200 chunks of 0.5-2GB each; a finished project is about 1TB. In the later stages these chunks are read again for further processing. Each node is connected by 10Gb/s fiber, and the server has a LAGG of 2x 10Gb/s fiber links. This appears to be sufficient for now, since the nodes spend longer processing than transferring and bandwidth does not seem to be the bottleneck, but we may upgrade it in the future. Traffic through the LAGG currently peaks at 14Gb/s during processing.
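The figures above can be turned into a quick back-of-the-envelope check. This is a sketch using only the numbers quoted in the post, assuming decimal units (1GB = 1000MB). One general caveat about LAGG: with LACP each flow is hashed onto a single member link, so one node-to-server connection still tops out at 10Gb/s and only the aggregate can approach 20Gb/s.

```python
# Back-of-the-envelope sizing using the figures quoted in the post.
# Decimal units are assumed (1 GB = 1000 MB).

IMAGE_SIZE_MB = 10
IMAGE_COUNT_RANGE = (8_000, 15_000)
CHUNK_COUNT = 200
CHUNK_SIZE_GB_RANGE = (0.5, 2.0)

NODE_COUNT = 13
NODE_LINK_GBPS = 10
LAGG_LINKS = 2
LAGG_LINK_GBPS = 10
OBSERVED_PEAK_GBPS = 14


def image_set_gb(count: int) -> float:
    """Size of one project's raw image set in GB."""
    return count * IMAGE_SIZE_MB / 1000


def chunk_set_gb(chunk_gb: float) -> float:
    """Total size of the processing chunks in GB for a given chunk size."""
    return CHUNK_COUNT * chunk_gb


def lagg_utilisation(peak_gbps: float) -> float:
    """Fraction of the aggregate LAGG bandwidth used at a given peak."""
    return peak_gbps / (LAGG_LINKS * LAGG_LINK_GBPS)


def client_to_server_ratio() -> float:
    """How many times the combined node links exceed the server's LAGG."""
    return (NODE_COUNT * NODE_LINK_GBPS) / (LAGG_LINKS * LAGG_LINK_GBPS)


if __name__ == "__main__":
    lo, hi = IMAGE_COUNT_RANGE
    print(f"images: {image_set_gb(lo):.0f}-{image_set_gb(hi):.0f} GB")
    c_lo, c_hi = CHUNK_SIZE_GB_RANGE
    print(f"chunks: {chunk_set_gb(c_lo):.0f}-{chunk_set_gb(c_hi):.0f} GB")
    print(f"LAGG at peak: {lagg_utilisation(OBSERVED_PEAK_GBPS):.0%}")
    print(f"node links vs LAGG: {client_to_server_ratio():.1f}x")
```

Run as a script, this prints 80-150GB of images, 100-400GB of chunks, 70% LAGG utilisation at the observed peak, and 6.5x more node-side link capacity than server-side capacity. That last figure is why the LAGG could become the limit if the nodes ever stop being CPU-bound.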

Because of the large amount of random reads (images) and concurrent writes and reads (working chunks), I would like to pick a pool layout that won't throttle much. Would RAIDZ3 be up to the task? I am looking at a 22-disk chassis, possibly filled with a pool of 2 vdevs, each made of 11 x 16TB disks. Any suggestions on an HBA controller?
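The trade-off between RAIDZ3 and mirrors for this chassis can be put in rough numbers. The sketch below relies on two common rules of thumb rather than measurements: usable space ignores ZFS metadata, padding and free-space headroom, and random IOPS scale roughly with the number of vdevs, each RAIDZ vdev giving about one disk's worth of random IOPS.

```python
# Rough comparison of two layouts for a 22-bay chassis of 16TB disks.
# Rules of thumb only: usable space ignores metadata, padding and free-space
# headroom; random IOPS are approximated as one disk's worth per vdev.
# Mirrors can also spread reads over both disks, so this understates
# mirror read IOPS.

from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int
    disks_per_vdev: int
    redundancy_per_vdev: int  # parity disks (RAIDZ) or extra copies (mirror)

    def data_disks(self) -> int:
        return self.vdevs * (self.disks_per_vdev - self.redundancy_per_vdev)

    def usable_tb(self, disk_tb: float) -> float:
        return self.data_disks() * disk_tb

    def relative_random_iops(self) -> int:
        # One vdev ~ one disk's worth of random IOPS (rule of thumb).
        return self.vdevs


DISK_TB = 16
layouts = [
    Layout("2 x 11-wide RAIDZ3", vdevs=2, disks_per_vdev=11, redundancy_per_vdev=3),
    Layout("11 x 2-way mirrors", vdevs=11, disks_per_vdev=2, redundancy_per_vdev=1),
]

if __name__ == "__main__":
    for layout in layouts:
        print(f"{layout.name}: ~{layout.usable_tb(DISK_TB):.0f} TB usable, "
              f"~{layout.relative_random_iops()}x single-disk random IOPS")
```

Under these assumptions the RAIDZ3 layout gives about 256TB (decimal) but only around 2 disks' worth of random IOPS. The mirror layout gives about 176TB and around 11 disks' worth.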

To help with frequently accessed data, I am thinking of adding an L2ARC. Is this a good or bad idea for this kind of workload? If it's a good idea, what would you recommend?

Our current server runs on an E3-1245 with 64GB of RAM. Would a CPU with similar specs be sufficient?

Any help would be appreciated, and if more info is required I would be happy to provide it. Thank you.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If you need IOPS, you need mirrors, no way around that. Probably SSDs, too. I'm partial to having a fast SSD pool and automatically moving stuff over to spinning rust on a different pool, whenever that's viable. The spinning rust can then be RAIDZ2/3, since it's doing very sequential I/O.
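The "fast SSD pool, then move to spinning rust" idea can be sketched as a small script run periodically, for example from cron. This is a minimal illustration, not a specific FreeNAS feature. It assumes files not modified for a chosen number of days are cold, and the pool paths and cutoff are hypothetical.

```python
# Minimal sketch of tiering from a fast pool to a bulk pool.
# Assumption: a file not modified for max_age_days is cold and can move.
# The paths passed in are hypothetical; empty directories are left behind.

import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def migrate_cold_files(fast_root: Path, bulk_root: Path, max_age_days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Move files not modified for max_age_days from fast_root to bulk_root,
    keeping their paths relative to the pool roots. Returns the new paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    moved = []
    for dirpath, _dirnames, filenames in os.walk(fast_root):
        for name in filenames:
            src = Path(dirpath) / name
            if src.stat().st_mtime < cutoff:
                dst = bulk_root / src.relative_to(fast_root)
                dst.parent.mkdir(parents=True, exist_ok=True)
                # Across pools (separate filesystems) this is a copy + delete.
                shutil.move(str(src), str(dst))
                moved.append(dst)
    return moved
```

Usage would look something like `migrate_cold_files(Path("/mnt/fast/projects"), Path("/mnt/bulk/projects"), 14)`, where both mount points are hypothetical. In practice you might prefer rsync or ZFS replication for large moves, but the policy itself is this simple.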

Griwa said: "To help with frequent access of data I am thinking of adding L2ARC, is this a bad/good idea for such a workload? If so what would be recommended?"
The decision tree is pretty simple. Do you need more performance? If yes, are the ARC ghost lists getting a substantial number of hits? If so, add RAM and/or an L2ARC.
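Here is a sketch of that decision tree fed from ZFS's ARC statistics. The MRU and MFU ghost lists remember recently evicted blocks, so hits on them suggest a bigger cache would have helped. On Linux (OpenZFS) the counters are in /proc/spl/kstat/zfs/arcstats. On FreeBSD/FreeNAS the same names appear under the sysctl tree kstat.zfs.misc.arcstats. The 5% threshold is an arbitrary illustration, not an official guideline.

```python
# Sketch of the L2ARC decision tree using ARC ghost-list hit counters.
# The parser handles the Linux kstat text format (two header lines, then
# "name type data" rows). The 5% threshold is an illustrative assumption.

from typing import Dict


def parse_arcstats(text: str) -> Dict[str, int]:
    """Parse /proc/spl/kstat/zfs/arcstats-style text into a dict."""
    stats = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3:
            name, _kind, value = parts
            stats[name] = int(value)
    return stats


def ghost_hit_ratio(stats: Dict[str, int]) -> float:
    """Ghost-list hits as a fraction of all ARC lookups (hits + misses)."""
    ghost = stats["mru_ghost_hits"] + stats["mfu_ghost_hits"]
    lookups = stats["hits"] + stats["misses"]
    return ghost / lookups if lookups else 0.0


def l2arc_advice(need_more_performance: bool, stats: Dict[str, int],
                 threshold: float = 0.05) -> str:
    if not need_more_performance:
        return "leave it alone"
    if ghost_hit_ratio(stats) >= threshold:
        return "consider more RAM and/or an L2ARC"
    return "a bigger cache is unlikely to help; look elsewhere"
```

On Linux, `parse_arcstats(open("/proc/spl/kstat/zfs/arcstats").read())` would feed it. On FreeNAS you would build the dict from the sysctl values instead. The counters are cumulative since boot, so sampling twice and comparing the differences gives a better picture of the current workload.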

Griwa said: "Our current server runs on E3-1245 with 64GB of ram, would a similar spec CPU be sufficient?"
You'll probably want a better platform to allow for more RAM and connectivity, but the CPU itself would probably do okay.
 

Griwa

Thank you for the response. After a few internal discussions, I think we are going to split the long-term storage and the rendering storage into two additional servers.

I'm not fond of mixing SSDs and HDDs in a single server, since the bandwidth used by processing on the SSDs would hinder the performance of the HDD storage for our typical workloads. We intend to add more nodes, which will load up the render server quite a bit. I did a basic calculation based on the rate we are expanding, and it appears we will need substantially more data storage.

My current idea is to build a 40+ disk RAIDZ2-based system and migrate our current storage pools to it. This will free up our current server, which is RAID 10 (striped mirrors) based, for rendering. Later, when bandwidth becomes a limiting factor, we will add an SSD-based render server and turn the current server into an additional RAIDZ2 backup. I'm not sure if this is optimal, but it will buy us time before we have to make large expenditures.

That said, with an SSD-based render server, is there any practical use for an L2ARC? I feel like the speed of the SSDs would match any L2ARC setup I could add, so simply having a large ARC would be sufficient. Is this correct?

I know this may have been answered somewhere, but I am trying to clarify it for myself: is the L2ARC populated with the most recently used or the most frequently used files coming from the ARC? Or is it more complicated than that? As a simplified example, say we had 3 files on the server and only 2 could fit in the L2ARC. Files A and B have been used frequently, and then file C is uploaded. Which files would sit in the L2ARC: A+B because of how often they are used, or B+C because C recently passed through the ARC while being uploaded?
 

Ericloewe

Griwa said: "That said, with a SSD based render server, is there any practical use of having L2ARC? I feel like the speed from the SSD's would match any L2ARC setup I can add, so simply having a large ARC would be sufficient. Is this correct?"
Well, not all SSDs are insanely fast. Of course, there's a difference between "could be faster" and "needs to be faster".

Griwa said: "Not sure if this is optimal, but it will buy us time from making excessive expenses."
It's valid.

Griwa said: "I know this may have been answered somewhere, but I am trying to clarify for myself; is L2ARC populated by most recent or most used files coming from ARC? Or is it more complicated than that? For instance, dumbed down example, if we had 3 files on the server and only 2 could fit on the L2ARC, if file A and B have been used frequently, and file C is uploaded. Which files would sit in the L2ARC, A+B due to use case, or B+C as C was recently parsed through the ARC while being uploaded?"
First of all, not files, blocks. The pool layer has no understanding of files, that all happens at the dataset layer.
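To make "blocks, not files" concrete: a file larger than the dataset's recordsize is split into recordsize blocks, and those blocks are what the ARC and L2ARC cache. A partly read image can therefore be partly cached. The sketch below assumes the default 128K recordsize, ignores compression, and uses sizes in binary units (MiB/KiB) for round numbers.

```python
# How many records (blocks) ZFS uses for a file, ignoring compression.
# Assumes the default 128K recordsize unless another value is passed.
# Files no larger than recordsize are stored as a single, smaller block.

import math

KIB = 1024
MIB = 1024 * KIB


def records_for_file(file_bytes: int, recordsize: int = 128 * KIB) -> int:
    if file_bytes == 0:
        return 0  # an empty file has no data blocks
    if file_bytes <= recordsize:
        return 1
    return math.ceil(file_bytes / recordsize)
```

A 10MiB survey image is 80 blocks at the default recordsize, or 10 blocks on a dataset with recordsize=1M.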
As for your question, the answer is "whatever was about to be evicted from the ARC". The ARC keeps track of most-recently used and most-frequently used blocks, and the L2ARC grabs stuff that's about to fall off the lists.
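A toy model of the A/B/C example shows why the answer is "whatever was about to be evicted" rather than a ranking by use. The simplifications are deliberate and large. The ARC here is a plain LRU of whole items, while the real ARC tracks blocks on separate MRU/MFU lists plus ghost lists. Items are copied to the L2ARC at the moment they leave the ARC, while the real feed thread writes from the tail of the lists ahead of eviction. The L2ARC is modelled as a ring buffer that overwrites its oldest entries when full.

```python
# Toy model: the L2ARC is filled with whatever falls out of the ARC.
# Simplified: LRU of whole items instead of MRU/MFU block lists, and
# copy-on-eviction instead of the real tail-scanning feed thread.

from collections import OrderedDict, deque


class ToyCache:
    def __init__(self, arc_slots: int, l2arc_slots: int) -> None:
        self.arc: "OrderedDict[str, None]" = OrderedDict()
        self.l2arc: deque = deque(maxlen=l2arc_slots)  # ring buffer
        self.arc_slots = arc_slots

    def access(self, item: str) -> None:
        if item in self.arc:
            self.arc.move_to_end(item)  # refresh recency
            return
        self.arc[item] = None
        if len(self.arc) > self.arc_slots:
            evicted, _ = self.arc.popitem(last=False)  # least recently used
            if evicted not in self.l2arc:
                self.l2arc.append(evicted)  # full deque drops its oldest entry
```

With room for 1 item in the ARC and 2 in the L2ARC, accessing A, B, A, B and then C leaves C in the ARC and A+B in the L2ARC, because A and B were the ones pushed out. A further new item D would push C out, and the ring buffer would overwrite A, leaving B+C.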

I'll link a video about the ARC; it's an interesting perspective.
 