New to FreeNAS - Need Help Assessing what I have inherited


onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Hello all,

First, sorry for the wall of text, I wanted to give enough information to be useful but not put you to sleep!

I recently started working with a company that purchased two huge NAS boxes from a company called 45 Drives back in December of 2017, and each of them is running FreeNAS 11.1-U5. From what I understand, these NAS boxes were designed for FreeNAS. I have spent the better part of the last couple of weeks reading through these forums and watching videos on YouTube. I even built a Prometheus backend that scrapes the netdata information and then puts it into Grafana for me so that I have some historical data to work with in the future (I'll make a guide for this if someone wants me to).
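For anyone curious, the wiring is simple: netdata already exposes a Prometheus endpoint, so the scrape job just points at /api/v1/allmetrics with format=prometheus on netdata's default port 19999. A quick sanity check from the Prometheus host looks roughly like this (the hostname is a placeholder for one of my boxes):

curl -s 'http://apollo.example.local:19999/api/v1/allmetrics?format=prometheus' | head -n 20   # should print netdata metrics in Prometheus text format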

Now the problem: we are constantly seeing bottlenecks and I can't quite put my finger on the cause. So I wanted to reach out to the community for feedback, because you all are the experts, not me. Below is all the information I currently have, as well as some statistical data. If there is anything else you might want to see, just let me know and I can get it. What I am looking for is a general evaluation of what I have and, hopefully, some pointers to make my two boxes run better.

  • Both boxes have 10GbE adapters, with 10GbE switches and NICs in every production machine.
  • On average, about 60 users hit the boxes daily.
  • Right now we are in the process of moving everything from one box (Apollo) to the other (Gemini) to reconfigure the zpool, so Gemini is under heavier I/O load.

About their use: We are a GIS firm, and these primarily store a massive amount of images and .gdb (geodatabase) files. There are millions (I'm not kidding) of tiny files and, from what I can tell, a lot of random I/O. Because these two NASes are separate, I cannot pool the storage, and 45 Drives does not offer expansion shelves like iXsystems does with their TrueNAS. So for now we are storing different datasets on each server. One is hosting the majority of project data used by ArcGIS (Map and Desktop) and Simactive, and the other is housing a warehouse for another application (GeoCue). We do use other applications, but none heavy enough to cause any large spikes in I/O.


There are two NAS boxes that are configured identically hardware-wise, but the pools are configured differently. The reason: my predecessor just took delivery of the NAS boxes, plugged them in, and started dumping data onto them. So we have been meticulously dancing around production to move all the data off the servers and reconfigure the pool (because of bad performance). The shipped configuration was a zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives. The reconfiguration changed it to 10 vdevs of 6 drives. I am interested to know what the difference and advantage of doing that is. My company paid a consultant to reconfigure the Gemini zpool from the 4x15 to the 10x6 vdev configuration.

My immediate thought was to add an L2ARC. We have two PCIe slots available and we could purchase a couple of Optane 905P or P4800X drives (which have power-loss protection) to go there, but I am unsure because of the amount of random data. Honestly, though, could it hurt to try? From what I read, a SLOG will not help us as we have sync writes turned off. Should we turn them back on and provision a part of the SSD as a SLOG? (I saw a post on how to do this.)

Through reading the forums I have found some useful commands that you fine folks have asked others to run and post the output of. Along with the useful commands post, I have compiled some screenshots of those commands run on both boxes. Let me know if I can provide anything else!

45 Drives Purchase and hardware
Apollo arc_summary.py
Apollo Tunables
Apollo zpool status -v
Gemini arc_summary.py
Gemini Tunables
Gemini zpool status -v
 
Joined
Feb 2, 2016
Messages
574
zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives. The reconfiguration changed it to 10 vdevs of 6 drives.

The more VDEVs you have making up a pool, the better the performance, both in terms of bandwidth and IOPS; random IOPS scale roughly with the number of VDEVs. Going from four VDEVs to ten should improve performance by about 150% (10/4 = 2.5x).

What little I know about GIS tells me it needs IOPS. With millions of tiny files floating around, IOPS may be more important than bandwidth.

It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.

You can add an expansion chassis to your server if you have spare PCIe slots, which it sounds like you do. Just pick up an LSI card with external SAS ports and a drive tray. If you can't add more RAM, though, adding more drives may be counterproductive.

If you want to do the data reorganization with minimal downtime and fairly quick performance, buying an external SAS HBA and drive tray may still be worthwhile. You can create your new pool, structured for performance, on the external drives and get it how you like by copying across the internal SAS connection. When the new drives are working as you like and their performance meets your requirements, shut everything down, move the drives to the internal slots, and pull the external storage. Or keep the external bays online for archive data that isn't often accessed, so it isn't filling your ARC.

At the scale you're discussing, you really need an expert with both FreeNAS experience and GIS experience using your specific tools to get your storage dialed-in. Best of luck finding that person. I'm sure they are out there if not here.

Cheers,
Matt
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.

That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.
 
Joined
Dec 29, 2014
Messages
1,135
That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.

That is certainly true about RAM. Whether you need an SLOG (dedicated ZFS intent log) or an L2ARC is a different story. If you are doing something that does synchronous writes (like ESXi accessing FreeNAS via NFS), an SLOG would help performance a lot. See this thread.
https://forums.freenas.org/index.php?threads/slog-benchmarking-and-finding-the-best-slog.63521

L2ARC is more helpful if you are constantly reading the same file(s). It all depends on what data you are accessing, and the protocol with which you access it.
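If you want to check how your datasets are handling sync requests right now, the sync property is the place to look; roughly (with 'tank' standing in for your actual pool or dataset name):

zfs get sync tank            # 'standard' honors client sync requests, 'disabled' ignores them
zfs set sync=standard tank   # go back to honoring sync requests (needed for a SLOG to matter)
zfs set sync=always tank     # force every write through the ZIL/SLOG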
 
Joined
Feb 2, 2016
Messages
574
When you add L2ARC, you take away from system RAM because the L2ARC's index is kept in system RAM. Adding L2ARC takes away from ARC. Even a lightning-fast SSD is going to be slower than actual RAM.

The rule of thumb is you lose 1G of RAM for every 5G of L2ARC added. In CyberJock's guide, it is noted that 'L2ARC shouldn't be bigger than about 5x your ARC size'.

So, there is a trade-off to adding L2ARC. Without knowing a lot more about your dataset and tools, I can't tell you if L2ARC would be helpful.
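For what it's worth, the arcstats sysctls on the FreeNAS box let you watch that RAM trade-off directly; for example:

sysctl kstat.zfs.misc.arcstats.size          # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.l2_size       # data held in L2ARC once a cache device is attached
sysctl kstat.zfs.misc.arcstats.l2_hdr_size   # RAM consumed just to index that L2ARC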

Are you able to effectively benchmark your application? If so, I'd baseline the system as-is. Then I'd add a 500GB or 1TB SSD - whatever you have lying around is fine - as L2ARC and benchmark your performance again. (Make sure to warm it up first; maybe run the benchmark a few times before and after adding the SSD.)

If you see a meaningful bump in performance, look around for an L2ARC solution. By testing with an off-the-shelf SSD (a Samsung EVO, for example), you're not out a lot of money if it doesn't improve performance. If it does work, you can throw an Intel Optane or other high-performance device at FreeNAS and know it will do even better than in your testing.

I've only touched on L2ARC as I'm guessing you're reading a lot more often than writing. If you think you have a write performance issue, I'd advise mostly the same testing procedure with a SLOG: benchmark, add a cheap SSD, benchmark, and, if it works, replace the cheap SSD with a high-endurance, power-loss-protected device.
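For reference, both of those experiments are reversible from the command line without taking the pool offline; a rough sketch, assuming the pool is named 'tank' and the test SSD shows up as da60:

zpool add tank cache da60   # attach the test SSD as L2ARC
zpool iostat -v tank 5      # watch the cache device fill and serve hits while you benchmark
zpool remove tank da60      # detach it again if it doesn't help

zpool add tank log da60     # same idea for the SLOG test (with sync enabled on the dataset)
zpool remove tank da60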

Cheers,
Matt
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I even built a Prometheus backend that scrapes the netdata information and then puts it into Grafana for me so that I have some historical data to work with in the future (I'll make a guide for this if someone wants me to).
That would be very helpful. I am sure there are others here that would like to do something similar.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
We are relatively intensive on both reads and writes. We are not running any VMs off the zpool; it is purely storage. All traffic is SMB.

The biggest problem with doing anything to these boxes is that the software here takes days or longer to render maps and point clouds, so I cannot just shut down and reboot willy-nilly.

Here is data from the last week (6 days to be exact, as I set up Prometheus last Thursday).
We are currently writing asynchronously (sync disabled), as I thought that would give better performance, but I am not opposed to forcing sync writes and getting a SLOG.


That huge spike over the weekend was the Scrub on Gemini.

[attached screenshot: Prometheus throughput graph for the week]
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The shipped configuration was a zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives.
That is how the one I have came also, except it is the 'Turbo' model and came with the dual socket system board.
https://www.45drives.com/products/storinator-xl60-configurations.php
My company paid a consultant to reconfigure the Gemini zpool from the 4x15 to the 10x6 vdev configuration.
We would have told you for free.
My immediate thought was to add an L2ARC.
You might benefit from that for things that are accessed repetitively or by multiple users. Another thing that might help is a SLOG, though that only comes into play for sync writes.
What little I know about GIS tells me it needs IOPS. With millions of tiny files floating around, IOPS may be more important than bandwidth.
We do ArcGIS and file geodatabases where I work, and this is absolutely true. IOPS, because the files are tiny; many are only 1 KB.
It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.
That is the single socket board and maxes out at 1TB of LRDIMM memory. It would be very expensive to go there. It might be more 'bang for the buck' to get around 512GB of RAM and add the SLOG and L2ARC?
http://www.supermicro.com/products/motherboard/xeon/c600/x10srl-f.cfm
That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.
That is not maxed out. Not by a long, long shot.
http://www.supermicro.com/manuals/motherboard/C612/MNL-1585.pdf
On page 2-15 of the manual, it says that you can put 128GB LRDIMM modules in each of the 8 memory slots for a total of 1TB of RAM.
 
Joined
Feb 2, 2016
Messages
574
Do you have a spare SAS/SATA port and power? You can add (and remove) L2ARC without taking the pools offline. It's non-destructive.

Cheers,
Matt
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
That is how the one I have came also, except it is the 'Turbo' model and came with the dual socket system board.
https://www.45drives.com/products/storinator-xl60-configurations.php

We would have told you for free.

It is a new era at my company. I am the type of person that likes to learn how things work and do what I can myself. My predecessor paid others to just do it and pointed fingers away from himself.

That is the single socket board and maxes out at 1TB of LRDIMM memory. It would be very expensive to go there. It might be more 'bang for the buck' to get around 512GB of RAM and add the SLOG and L2ARC?
http://www.supermicro.com/products/motherboard/xeon/c600/x10srl-f.cfm
That is not maxed out. Not by a long, long shot.
http://www.supermicro.com/manuals/motherboard/C612/MNL-1585.pdf
On page 2-15 of the manual, it says that you can put 128GB LRDIMM modules in each of the 8 memory slots for a total of 1TB of RAM.

Thanks for pointing this out to me. The systems were purchased with 64GB of RAM each, and then they decided to upgrade to 256GB because of the terrible performance. Again, my predecessor just pointed to 45 Drives and said 'it should have come configured right the first time'. It was then that they upgraded the memory. I will look into what it will take to upgrade that RAM, because if it is what I think it is (8x32GB sticks), then we will be sitting on a lot of VERY expensive RAM that we cannot repurpose and would need to replace with 8x64GB or 8x128GB modules.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It is a new era at my company. I am the type of person that likes to learn how things work and do what I can myself. My predecessor paid others to just do it and pointed fingers away from himself.
That is how I am; I would want to build the system from parts and understand how it works. My boss would rather buy a solution from NetApp and have someone else to blame if it doesn't work.
The 45 Drives system was already here when I started this position, because one of the department heads got tired of waiting for my boss and went around him to buy it.
We need more space, so we are working on getting 2 more systems; this time they will be Supermicro systems. It will be an interesting time copying all the data over to the other two systems.
I was only able to get my boss to compromise on a factory server with FreeNAS as the operating system because he will still have me to blame for the situation if it doesn't work. Well, that and the fact that the quotes from vendors like Dell, HP, and others were over $100k each. We wanted to get 3 systems, but there was no way to fit even one of those into the budget.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I know a few big GIS companies that are building Supermicro systems for their environment.
This is the kind we ordered, 2 of them, and they should be in later this year:

[attached screenshot: the Supermicro systems we ordered]


This is the page on Supermicro's site that talks about the expansion chassis:
http://supermicro.com/products/chassis/4U/946/SC946SE1C-R1K66JBOD
and
http://supermicro.com/products/chassis/4U/946/SC946SE2C-R1K66JBOD
the second one there is the dual-expander backplane model that can be set up with redundant management nodes, but you won't need that with FreeNAS.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Isn't it an option to use an SSD-based pool for the very small files?
There is no way to have small files go to one pool and large files go to another. The geodatabase that ArcGIS uses can be huge, with millions of files; the vast majority of those files will be tiny, but there are also files that are not. The server I have is storing about 290TB of data, and there are some significant problems with the way my predecessor configured it: they set up the storage as if it were an archive (cold storage) and were not aware of how it would actually be accessed. It has gzip-9 compression, for one thing, which makes it slower just because of the way gzip compression works. It also has the same vdev configuration issue that @onigiri had, which slows the system down. I have arranged for the people I work for to obtain new servers so the pool can be configured in a better way and the data can be transferred over. Then the existing server can be used as an archive, the way it was set up to be used to begin with.
 
Joined
Feb 2, 2016
Messages
574
It has gzip-9 compression, for one thing, which makes it slower

You know you can change that live, right? It won't decompress what is there but whatever is written going forward will be in your new compression format.

Whenever we build a new server and migrate large amounts of data, we set the pool to gzip-9 while filling. Then, once everything has been copied, we switch to the recommended lz4. Static data that never changes - and there always seems to be more than you think - will sit around as gzip-9 taking up the minimum amount of space. Meanwhile, data that is actively written will be laid down in the performance-positive lz4. Since gzip decompresses quickly and efficiently, there doesn't seem to be a performance hit on reading the highly-compressed gzip data.
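The switch itself is just a property change, something along these lines (with 'tank' standing in for your pool):

zfs get -r compression tank       # see what each dataset is using now
zfs set compression=gzip-9 tank   # bulk-load phase: squeeze the static data hard
zfs set compression=lz4 tank      # flip it back before normal production writes resume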

Cheers,
Matt
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Whenever we build a new server and migrate large amounts of data, we set the pool to gzip-9 while filling. Then, once everything has been copied, we switch to the recommended lz4. Static data that never changes - and there always seems to be more than you think - will sit around as gzip-9 taking up the minimum amount of space. Meanwhile, data that is actively written will be laid down in the performance-positive lz4. Since gzip decompresses quickly and efficiently, there doesn't seem to be a performance hit on reading the highly-compressed gzip data.
That is an interesting idea. I knew about the new writes being lz4 after switching the compression, but I had not considered doing gzip-9 and then switching to lz4 as a way to make data that is infrequently used take less space.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
This is my problem:

Right now, production is in full swing and we have several applications with very long-running processes, and if these fail for any reason, they need to be restarted. When something that takes 2 weeks to process fails after 5 days, a domino effect happens and pushes back production.

Here is my CrystalMark during production:

[attached screenshot: CrystalDiskMark results]



Those Q8 and Q32 random write numbers hamstring everything we do.

I have confirmed that I am getting near max line speed (9.7Gbps) for network performance.


While the sequential numbers are amazing and outperform the local SSD by almost 80%, the poor random write performance is cataclysmic.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You have 60 heavy GIS users hitting these two servers, the workload pattern on both is a sustained mix of small-block reads and writes, your working set is way bigger than your ARC based on your cache hit %, and you're using spinning disks in parity vdevs.

(I think this is where the kids use that "OOF" meme.)

Quite simply, I believe you're asking too much from those poor disks on the back end of your pool. I imagine that with async you're able to absorb a small amount of incoming writes into a transaction group, but it very quickly chokes itself trying to actually flush it out to your overworked disks.

Pop an SSH session, run gstat -p, and have a look at your ms/r and ms/w (milliseconds per read/write) columns, as well as the throughput numbers.
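If you'd rather grab a shareable snapshot than watch it live, gstat has a batch mode; for example (the 5-second interval and file name are just examples):

gstat -p -b -I 5s > gstat_sample.txt   # one averaged sample over 5 seconds, physical devices only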

Your only shot at better performance with this hardware is mirrors in a 30x2 setup, but that gives you only about 263TiB of usable space at the end of things. A relevant question, I suppose, is: how much data do you have, and how full are these pools?

Beyond that, I don't have much to offer except that your workload is a poster child for solid state.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Thanks for your response HoneyBadger.

Here is a screengrab of gstat -p
It is constantly changing but there are always 1-2 drives in the red during refresh.

To respond to your question about how much data is in the pools: right now we are moving data from one server to the other to reconfigure the drives from the 4x15 to the 10x6 vdev config. All of the data, roughly 380TB, does not fit on one server. We have had to put some data on another server, an EMC Isilon; the data living on the Isilon that shouldn't be there right now is about 60TB.

[attached screenshot: gstat -p output]
 