New to FreeNAS - Need Help Assessing what I have inherited


onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Hello all,

First, sorry for the wall of text, I wanted to give enough information to be useful but not put you to sleep!

I recently started working with a company that purchased two huge NAS boxes from a company called 45 Drives back in December of 2017, and each of them is running FreeNAS 11.1-U5. From what I understand, these NAS boxes were designed for FreeNAS. I have spent the better part of the last couple of weeks reading through these forums and watching videos on YouTube. I even built a Prometheus backend that scrapes the netdata information and then puts it into Grafana for me so that I have some historical data to work with in the future (I'll make a guide for this if someone wants me to).
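For anyone curious, the wiring is simple: netdata already exposes a Prometheus endpoint, so the scrape job just points at /api/v1/allmetrics with format=prometheus on netdata's default port 19999. A quick sanity check from the Prometheus host looks roughly like this (the hostname is a placeholder for one of my boxes):

curl -s 'http://apollo.example.local:19999/api/v1/allmetrics?format=prometheus' | head -n 20   # should print netdata metrics in Prometheus text format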

Now the problem: we are constantly seeing bottlenecks and I can't quite put my finger on the cause. So I wanted to reach out to the community for feedback, because you all are the experts, not me. Below is all the information I currently have, as well as some statistical data. If there is anything else you might want to see, just let me know and I can get it. What I am looking for is a general evaluation of what I have and, hopefully, some pointers to make my two boxes run better.

  • Both boxes have 10GbE adapters, with 10GbE switches and NICs in every production machine.
  • On average, about 60 users hit the boxes daily.
  • Right now we are in the process of moving everything from one box (Apollo) to the other (Gemini) to reconfigure the zpool, so Gemini is under heavier I/O load.

About their use: We are a GIS firm, and these primarily store a massive amount of images and .gdb (geodatabase) files. There are millions (I'm not kidding) of tiny files and, from what I can tell, a lot of random I/O. Because these two NASes are separate, I cannot pool the storage, and 45 Drives does not offer expansion shelves like iXsystems does with their TrueNAS. So for now we are storing different datasets on each server. One is hosting the majority of project data used by ArcGIS (Map and Desktop) and Simactive, and the other is housing a warehouse for another application (GeoCue). We do use other applications, but none heavy enough to cause any large spikes in I/O.


There are two NAS boxes that are configured identically hardware-wise, but the pools are configured differently. The reason: my predecessor just took delivery of the NAS boxes, plugged them in, and started dumping data onto them. So we have been meticulously dancing around production to move all the data off the servers and reconfigure the pool (because of bad performance). The shipped configuration was a zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives. The reconfiguration changed it to 10 vdevs of 6 drives. I am interested to know what the difference and advantage of doing that is. My company paid a consultant to reconfigure the Gemini zpool from the 4x15 to the 10x6 vdev configuration.

My immediate thought was to add an L2ARC. We have two PCIe slots available and we could purchase a couple of Optane 905P or P4800X drives (which have power-loss protection) to go there, but I am unsure because of the amount of random data. Honestly, though, could it hurt to try? From what I read, a SLOG will not help us as we have sync writes turned off. Should we turn them back on and provision a part of the SSD as a SLOG? (I saw a post on how to do this.)

Through reading the forums I have found some useful commands that you fine folks have asked others to run and post the output of. Along with the useful commands post, I have compiled some screenshots of those commands run on both boxes. Let me know if I can provide anything else!

45 Drives Purchase and hardware
Apollo arc_summary.py
Apollo Tunables
Apollo zpool status -v
Gemini arc_summary.py
Gemini Tunables
Gemini zpool status -v
 
Joined
Feb 2, 2016
Messages
574
zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives. The reconfiguration changed it to 10 vdevs of 6 drives.

The more VDEVs you have making up a pool, the better the performance, both in terms of bandwidth and IOPS; random IOPS scale roughly with the number of VDEVs. Going from four VDEVs to ten should improve performance by about 150% (10/4 = 2.5x).

What little I know about GIS tells me it needs IOPS. With millions of tiny files floating around, IOPS may be more important than bandwidth.

It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.

You can add an expansion chassis to your server if you have spare PCIe slots, which it sounds like you do. Just pick up an LSI card with external SAS ports and a drive tray. If you can't add more RAM, though, adding more drives may be counterproductive.

If you want to do the data reorganization with minimal downtime and fairly quick performance, buying an external SAS HBA and drive tray may still be worthwhile. You can create your new pool, structured for performance, on the external drives and get it how you like by copying across the internal SAS connection. When the new drives are working as you like and their performance meets your requirements, shut everything down, move the drives to the internal slots, and pull the external storage. Or keep the external bays online for archive data that isn't often accessed, so it isn't filling your ARC.

At the scale you're discussing, you really need an expert with both FreeNAS experience and GIS experience using your specific tools to get your storage dialed-in. Best of luck finding that person. I'm sure they are out there if not here.

Cheers,
Matt
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.

That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.
 
Joined
Dec 29, 2014
Messages
1,135
That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.

That is certainly true about RAM. Whether you need an SLOG (dedicated ZFS intent log) or an L2ARC is a different story. If you are doing something that does synchronous writes (like ESXi accessing FreeNAS via NFS), an SLOG would help performance a lot. See this thread.
https://forums.freenas.org/index.php?threads/slog-benchmarking-and-finding-the-best-slog.63521

L2ARC is more helpful if you are constantly reading the same file(s). It all depends on what data you are accessing, and the protocol with which you access it.
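If you want to check how your datasets are handling sync requests right now, the sync property is the place to look; roughly (with 'tank' standing in for your actual pool or dataset name):

zfs get sync tank            # 'standard' honors client sync requests, 'disabled' ignores them
zfs set sync=standard tank   # go back to honoring sync requests (needed for a SLOG to matter)
zfs set sync=always tank     # force every write through the ZIL/SLOG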
 
Joined
Feb 2, 2016
Messages
574
When you add L2ARC, you take away from system RAM because the L2ARC's index is kept in system RAM. Adding L2ARC takes away from ARC. Even a lightning-fast SSD is going to be slower than actual RAM.

The rule of thumb is you lose 1G of RAM for every 5G of L2ARC added. In CyberJock's guide, it is noted that 'L2ARC shouldn't be bigger than about 5x your ARC size'.

So, there is a trade-off to adding L2ARC. Without knowing a lot more about your dataset and tools, I can't tell you if L2ARC would be helpful.
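For what it's worth, the arcstats sysctls on the FreeNAS box let you watch that RAM trade-off directly; for example:

sysctl kstat.zfs.misc.arcstats.size          # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.l2_size       # data held in L2ARC once a cache device is attached
sysctl kstat.zfs.misc.arcstats.l2_hdr_size   # RAM consumed just to index that L2ARC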

Are you able to effectively benchmark your application? If so, I'd baseline the system as-is. Then I'd add a 500GB or 1TB SSD - whatever you have lying around is fine - as L2ARC and benchmark your performance again. (Make sure to warm it up first; maybe run the benchmark a few times before and after adding the SSD.)

If you see a meaningful bump in performance, look around for an L2ARC solution. By testing with an off-the-shelf SSD (a Samsung EVO, for example), you're not out a lot of money if it doesn't improve performance. If it does work, you can throw an Intel Optane or other high-performance device at FreeNAS and know it will do even better than in your testing.

I've only touched on L2ARC as I'm guessing you're reading a lot more often than writing. If you think you have a write performance issue, I'd advise mostly the same testing procedure with a SLOG: benchmark, add a cheap SSD, benchmark, and, if it works, replace the cheap SSD with a high-endurance, power-loss-protected device.
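For reference, both of those experiments are reversible from the command line without taking the pool offline; a rough sketch, assuming the pool is named 'tank' and the test SSD shows up as da60:

zpool add tank cache da60   # attach the test SSD as L2ARC
zpool iostat -v tank 5      # watch the cache device fill and serve hits while you benchmark
zpool remove tank da60      # detach it again if it doesn't help

zpool add tank log da60     # same idea for the SLOG test (with sync enabled on the dataset)
zpool remove tank da60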

Cheers,
Matt
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I even built a Prometheus backend that scrapes the netdata information and then puts it into Grafana for me so that I have some historical data to work with in the future (I'll make a guide for this if someone wants me to).
That would be very helpful. I am sure there are others here that would like to do something similar.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
We are relatively intensive on both reads and writes. We are not running any VMs off the zpool; it is purely storage. All traffic is SMB.

The biggest problem with doing anything to these boxes is that the software here takes days or longer to render maps and point clouds, so I cannot just shut down and reboot willy-nilly.

Here is data from the last week (6 days to be exact, as I set up Prometheus last Thursday).
We are currently writing asynchronously (sync disabled), as I thought that would give better performance, but I am not opposed to forcing sync writes and getting a SLOG.


That huge spike over the weekend was the Scrub on Gemini.

[attached screenshot: Prometheus throughput graph for the week]
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The shipped configuration was a zpool configured as RAIDZ2 consisting of 4 vdevs of 15 drives.
That is how the one I have came also, except it is the 'Turbo' model and came with the dual socket system board.
https://www.45drives.com/products/storinator-xl60-configurations.php
My company paid a consultant to reconfigure the Gemini zpool from the 4x15 to the 10x6 vdev configuration.
We would have told you for free.
My immediate thought was to add an L2ARC.
You might benefit from that for things that are accessed repetitively or by multiple users. Another thing that might help is a SLOG, though that only comes into play for sync writes.
What little I know about GIS tells me it needs IOPS. With millions of tiny files floating around, IOPS may be more important than bandwidth.
We do ArcGIS and file geodatabases where I work, and this is absolutely true. IOPS, because the files are tiny; many are only 1 KB.
It looks as though your machines were purchased with 64G RAM and upgraded to 256G? Is that the maximum amount of RAM supported by the motherboard? If not, RAM before L2ARC.
That is the single socket board and maxes out at 1TB of LRDIMM memory. It would be very expensive to go there. It might be more 'bang for the buck' to get around 512GB of RAM and add the SLOG and L2ARC?
http://www.supermicro.com/products/motherboard/xeon/c600/x10srl-f.cfm
That is correct, we are currently maxing out each MOBO at 256GB of RAM and it is all ECC RAM too. I did learn through my readings that RAM is king in ZFS.
That is not maxed out. Not by a long, long shot.
http://www.supermicro.com/manuals/motherboard/C612/MNL-1585.pdf
On page 2-15 of the manual, it says that you can put 128GB LRDIMM modules in each of the 8 memory slots for a total of 1TB of RAM.
 
Joined
Feb 2, 2016
Messages
574
Do you have a spare SAS/SATA port and power? You can add (and remove) L2ARC without taking the pools offline. It's non-destructive.

Cheers,
Matt
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
That is how the one I have came also, except it is the 'Turbo' model and came with the dual socket system board.
https://www.45drives.com/products/storinator-xl60-configurations.php

We would have told you for free.

It is a new era at my company. I am the type of person that likes to learn how things work and do what I can myself. My predecessor paid others to just do it and pointed fingers away from himself.

That is the single socket board and maxes out at 1TB of LRDIMM memory. It would be very expensive to go there. It might be more 'bang for the buck' to get around 512GB of RAM and add the SLOG and L2ARC?
http://www.supermicro.com/products/motherboard/xeon/c600/x10srl-f.cfm
That is not maxed out. Not by a long, long shot.
http://www.supermicro.com/manuals/motherboard/C612/MNL-1585.pdf
On page 2-15 of the manual, it says that you can put 128GB LRDIMM modules in each of the 8 memory slots for a total of 1TB of RAM.

Thanks for pointing this out to me. The systems were purchased with 64GB of RAM each, and then they decided to upgrade to 256GB because of the terrible performance. Again, my predecessor just pointed to 45 Drives and said 'it should have come configured right the first time'. It was then that they upgraded the memory. I will look into what it will take to upgrade that RAM, because if it is what I think it is (8x32GB sticks), then we will be sitting on a lot of VERY expensive RAM that we cannot repurpose and would need to replace with 8x64GB or 8x128GB modules.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It is a new era at my company. I am the type of person that likes to learn how things work and do what I can myself. My predecessor paid others to just do it and pointed fingers away from himself.
That is how I am; I would want to build the system from parts and understand how it works. My boss would rather buy a solution from NetApp and have someone else to blame if it doesn't work.
The 45 Drives system was already here when I started this position, because one of the department heads got tired of waiting for my boss and went around him to buy it.
We need more space, so we are working on getting 2 more systems; this time they will be Supermicro systems. It will be an interesting time copying all the data over to the other two systems.
I was only able to get my boss to compromise on a factory server with FreeNAS as the operating system because he will still have me to blame for the situation if it doesn't work. Well, that and the fact that the quotes from vendors like Dell, HP, and others were over $100k each. We wanted to get 3 systems, but there was no way to fit even one of those into the budget.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I know a few big GIS companies that are building Supermicro systems for their environment.
This is the kind we ordered, 2 of them, and they should be in later this year:

[attached screenshot: the Supermicro systems we ordered]


This is the page on Supermicro's site that talks about the expansion chassis:
http://supermicro.com/products/chassis/4U/946/SC946SE1C-R1K66JBOD
and
http://supermicro.com/products/chassis/4U/946/SC946SE2C-R1K66JBOD
the second one there is the dual-expander backplane model that can be set up with redundant management nodes, but you won't need that with FreeNAS.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Isn't it an option to use an SSD-based pool for the very small files?
There is no way to have small files go to one pool and large files go to another. The geodatabase that ArcGIS uses can be huge, with millions of files; the vast majority of those files will be tiny, but there are also files that are not. The server I have is storing about 290TB of data, and there are some significant problems with the way my predecessor configured it: they set up the storage as if it were an archive (cold storage) and were not aware of how it would actually be accessed. It has gzip-9 compression, for one thing, which makes it slower just because of the way gzip compression works. It also has the same vdev configuration issue that @onigiri had, which slows the system down. I have arranged for the people I work for to obtain new servers so the pool can be configured in a better way and the data can be transferred over. Then the existing server can be used as an archive, the way it was set up to be used to begin with.
 
Joined
Feb 2, 2016
Messages
574
It has gzip-9 compression, for one thing, which makes it slower

You know you can change that live, right? It won't decompress what is there but whatever is written going forward will be in your new compression format.

Whenever we build a new server and migrate large amounts of data, we set the pool to gzip-9 while filling. Then, once everything has been copied, we switch to the recommended lz4. Static data that never changes - and there always seems to be more than you think - will sit around as gzip-9 taking up the minimum amount of space. Meanwhile, data that is actively written will be laid down in the performance-positive lz4. Since gzip decompresses quickly and efficiently, there doesn't seem to be a performance hit on reading the highly-compressed gzip data.
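The switch itself is just a property change, something along these lines (with 'tank' standing in for your pool):

zfs get -r compression tank       # see what each dataset is using now
zfs set compression=gzip-9 tank   # bulk-load phase: squeeze the static data hard
zfs set compression=lz4 tank      # flip it back before normal production writes resume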

Cheers,
Matt
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Whenever we build a new server and migrate large amounts of data, we set the pool to gzip-9 while filling. Then, once everything has been copied, we switch to the recommended lz4. Static data that never changes - and there always seems to be more than you think - will sit around as gzip-9 taking up the minimum amount of space. Meanwhile, data that is actively written will be laid down in the performance-positive lz4. Since gzip decompresses quickly and efficiently, there doesn't seem to be a performance hit on reading the highly-compressed gzip data.
That is an interesting idea. I knew about the new writes being lz4 after switching the compression, but I had not considered doing gzip-9 and then switching to lz4 as a way to make data that is infrequently used take less space.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
This is my problem:

Right now, production is in full swing and we have several applications with very long-running processes, and if these fail for any reason, they need to be restarted. When something that takes 2 weeks to process fails after 5 days, a domino effect happens and pushes back production.

Here is my CrystalMark during production:

[attached screenshot: CrystalDiskMark results]



Those Q8 and Q32 random write numbers hamstring everything we do.

I have confirmed that I am getting near max line speed (9.7Gbps) for network performance.


While the sequential numbers are amazing and outperform the local SSD by almost 80%, the poor random write performance is cataclysmic.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You have 60 heavy GIS users hitting these two servers, the workload pattern on both is a sustained mix of small-block reads and writes, your working set is way bigger than your ARC based on your cache hit %, and you're using spinning disks in parity vdevs.

(I think this is where the kids use that "OOF" meme.)

Quite simply, I believe you're asking too much from those poor disks on the back end of your pool. I imagine that with async you're able to absorb a small amount of incoming writes into a transaction group, but it very quickly chokes itself trying to actually flush it out to your overworked disks.

Pop an SSH session, run gstat -p, and have a look at your ms/r and ms/w (milliseconds per read/write) columns, as well as the throughput numbers.
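If you'd rather grab a shareable snapshot than watch it live, gstat has a batch mode; for example (the 5-second interval and file name are just examples):

gstat -p -b -I 5s > gstat_sample.txt   # one averaged sample over 5 seconds, physical devices only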

Your only shot at better performance with this hardware is mirrors in a 30x2 setup, but that gives you only about 263TiB of usable space at the end of things. A relevant question, I suppose, is: how much data do you have, and how full are these pools?

Beyond that, I don't have much to offer except that your workload is a poster child for solid state.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Thanks for your response HoneyBadger.

Here is a screengrab of gstat -p
It is constantly changing but there are always 1-2 drives in the red during refresh.

To respond to your question about how much data is in the pools: right now we are moving data from one server to the other to reconfigure the drives from the 4x15 to the 10x6 vdev config. All of the data, roughly 380TB, does not fit on one server. We have had to put some data on another server, an EMC Isilon; the data living on the Isilon that shouldn't be there right now is about 60TB.

[attached screenshot: gstat -p output]
 