Hello all,
First, sorry for the wall of text; I wanted to give enough information to be useful but not put you to sleep!
I recently started working with a company that purchased two huge NAS boxes from a company called 45 Drives back in December of 2017, and each of them is running FreeNAS 11.1-U5. From what I understand, these NAS boxes were designed for FreeNAS. I have spent the better part of the last couple of weeks reading through these forums and watching videos on YouTube. I even built a Prometheus backend that scrapes the netdata metrics and feeds them into Grafana, so that I have some historical data to work with in the future (I'll write up a guide for this if anyone wants it).
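In case anyone wants to replicate the monitoring piece, the key part is that netdata can expose its metrics in Prometheus format and Prometheus just scrapes that endpoint. A quick sanity check from any machine looks something like this (the hostname is a placeholder for one of the boxes, netdata on its default port 19999):
Code:
# Confirm netdata is serving Prometheus-format metrics
curl 'http://apollo:19999/api/v1/allmetrics?format=prometheus' | head -n 20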
Now the problem: we are constantly seeing bottlenecks, and I can't quite put my finger on the cause. So I wanted to reach out to the community for feedback, because you all are the experts here, not me. Below is all the information I currently have, plus some statistical data; if there is anything else you might want to see, just let me know and I can get it. What I am looking for is a general evaluation of what we have and, hopefully, some pointers to make my two boxes run better. A few details about the environment:
- Both boxes have 10GbE adapters, there are 10GbE switches in between, and every production machine has a 10GbE NIC (see the quick throughput check after this list).
- On average, about 60 users hit the boxes daily.
- Right now we are in the process of moving everything from one box (Apollo) to the other (Gemini) so we can reconfigure Apollo's zpool, which means Gemini is under a heavier I/O load than usual.
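To help rule the network in or out as the bottleneck, a raw throughput test between a production machine and each box is probably worth posting too; something along these lines (hostname is a placeholder, and iperf3 ships with FreeNAS):
Code:
# On the NAS:
iperf3 -s
# From a 10GbE client, 4 parallel streams for 30 seconds:
iperf3 -c apollo -P 4 -t 30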
About their use: we are a GIS firm, and these boxes primarily store a massive amount of imagery and .gdb (geodatabase) files. There are millions (I'm not kidding) of tiny files and, from what I can tell, a lot of random I/O. Because the two NAS boxes are separate, I cannot pool the storage, and 45 Drives does not offer expansion shelves the way iXsystems does with TrueNAS. So for now we are storing different datasets on each server: one hosts the majority of the project data used by ArcGIS (Map and Desktop) and SimActive, and the other houses a warehouse for another application (GeoCue). We do use other applications, but none of them are heavy enough to cause any large spikes in I/O.
The two NAS boxes are configured identically hardware-wise, but their pools are configured differently. The reason: my predecessor just took delivery of the NAS boxes, plugged them in, and started dumping data onto them. So we have been meticulously dancing around production to move all the data off the servers and reconfigure the pools (because of bad performance). The shipped configuration was a zpool of 4 RAIDZ2 vdevs of 15 drives each; the reconfigured layout is 10 RAIDZ2 vdevs of 6 drives each. I am interested to know what the difference is and what advantage that change should bring. My company paid a consultant to reconfigure Gemini's zpool from the four 15-wide vdevs to the ten 6-wide vdevs.
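If it helps with comparing the two layouts, the per-vdev view while the boxes are under load is easy to capture; the pool name below is just a placeholder:
Code:
# Show the vdev layout (4 x 15-wide RAIDZ2 on Apollo vs 10 x 6-wide RAIDZ2 on Gemini)
zpool status -v tank
# Watch per-vdev IOPS and bandwidth at 5-second intervals while under load
zpool iostat -v tank 5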
My immediate thought was to add an L2ARC. We have two PCIe slots available, and we could purchase a couple of Optane 905P or P4800X drives (PLP) to go there, but I am unsure given the amount of random data. Honestly though, could it hurt to try? From what I have read, a SLOG will not help us since we have sync writes turned off. Should we turn sync back on and provision part of the SSD as a SLOG? (I saw a post on how to do this.)
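For context on what I am considering, a rough sketch of the relevant commands; the pool and device names are placeholders, and on FreeNAS this would normally be done through the GUI rather than the shell:
Code:
# Check the current sync setting (currently disabled on our datasets)
zfs get sync tank
# Add an NVMe SSD as L2ARC (cache); it can be removed later if it doesn't help
zpool add tank cache nvd0
# If sync writes were re-enabled, a mirrored SLOG would look roughly like this
zpool add tank log mirror nvd1 nvd2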
Reading through the forums, I found several commands that you fine folks have asked others to run and post the output of. Using those, along with the useful-commands post, I have compiled screenshots of that output from both boxes. Let me know if I can provide anything else!
45 Drives Purchase and hardware
Apollo arc_summary.py
Apollo Tunables
Apollo zpool status -v
Gemini arc_summary.py
Gemini Tunables
Gemini zpool status -v