FreeNAS on ESX

Status
Not open for further replies.

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
I know this has been discussed back and forth. We are running FreeNAS on ESX 5.5, booting locally into a Fibre Channel drive array. We mirror the unit to another ESX server where the FC array is also attached, for failover. It's tested and has worked every time.

What we do notice is that over time (maybe a few weeks) the SAN becomes extremely slow. Disabling sync makes a huge difference for the first few days, but after that nothing seems to speed it up. In the past a reboot has brought performance right back to normal.

Right now we have 4 vCPUs and 32GB of RAM assigned. We had disabled swap on the disks based on some performance tweaks we were reading about, so I'm not sure if that's part of the issue or not. Once the device starts crawling it's almost unusable, and I cannot get the disk I/O back to anything normal.

Currently, with sync=disabled, I am getting 36/35 MB/s peak. Right after a reboot with sync disabled I will hit 350/300 MB/s easily.

I'm running 12x 1TB SATA drives in 4 RAIDZ1 vdevs, with a 300GB SAS cache and a 300GB SAS log. I tried removing the log/cache and it doesn't seem to make a difference.


NAME                                              STATE     READ WRITE CKSUM
SAN40                                             ONLINE       0     0     0
  raidz1-0                                        ONLINE       0     0     0
    gptid/eabfb9ca-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/eb7edba4-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/ec6fe31b-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
  raidz1-1                                        ONLINE       0     0     0
    gptid/ed288400-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/edd0e41b-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/ee87b10d-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
  raidz1-2                                        ONLINE       0     0     0
    gptid/ef3605f6-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/efeddb61-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/f096d748-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
  raidz1-3                                        ONLINE       0     0     0
    gptid/f15170d0-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/f20c7366-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
    gptid/f2b85a7a-5ce5-11e7-8c44-005056a58f4f    ONLINE       0     0     0
logs
  gptid/f4266076-882b-11e7-9bcc-005056a58f4f      ONLINE       0     0     0
cache
  gptid/fc4a0dbc-882b-11e7-9bcc-005056a58f4f      ONLINE       0     0     0
spares
  gptid/f36d4bb7-5ce5-11e7-8c44-005056a58f4f      AVAIL
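
For reference, the sync toggle and the log/cache removal I mentioned amount to roughly the following from the shell (the GUI does the equivalent; the gptids are the ones from the status output above):

zfs get sync SAN40                                               # check the current sync setting
zfs set sync=disabled SAN40                                      # what I mean by "disabling sync"
zpool remove SAN40 gptid/f4266076-882b-11e7-9bcc-005056a58f4f    # pull the log device
zpool remove SAN40 gptid/fc4a0dbc-882b-11e7-9bcc-005056a58f4f    # pull the cache device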
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
You mention ESX but didn't clearly say whether the pool is used as an ESX datastore or for something else. If you are using it as an ESX datastore, then the disk topology you're using isn't the best-performing one.

Can you enlighten us on how you're using the pool?
 

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
Hi, sorry. The FreeNAS boot is a VM on ESX, and the volume itself is presented as an NFS share back to the network on a dedicated 10GbE NIC. We passed the Fibre Channel card hardware straight through to the VM and then mounted all the drives. Like I mentioned, it actually seems to work fine for days, or a week, and then slowly gets slower and slower. I'm going to shut down about 12 VMs on it tonight and reboot, and performance will probably go back to what we expect to see. I'll put up ifstat info, which I know is network, but that's how I am monitoring it; disk usage and busy time seem within reason. (I came from Nexenta, so we always watched %b/%w for overworked disks or possible hardware failures. I don't know if it's the same with FreeNAS, but I would assume it's just a ZFS thing.)
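
In case it helps, the commands I'm watching with look roughly like this (the 10GbE interface name below is just a placeholder; substitute whatever the NIC is called in the VM):

ifstat -i <10gbe-nic> 1    # per-second traffic on the NFS interface
gstat -p                   # per-disk %busy, the closest analogue I know of to Nexenta's %b
iostat -x 5                # extended per-device stats (including %b) every 5 seconds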
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
I sort of had an "ah ha" moment while thinking about this. I suspect you're exhausting the RAM because of the L2ARC device. A week sounds about right for the L2ARC device to kick in.

In order for that L2ARC device to perform its function, pointers to what is stored on the L2ARC are kept in RAM. My theory is that you don't need an L2ARC device and the pool will perform adequately without it.

My suggestion is to remove the L2ARC from the pool and see if you experience any kind of performance issues a week down the road.

It's safe to remove the L2ARC device from the pool, and it can be done via the GUI.

This is a perfect example of why the community does not recommend an L2ARC until you start getting into the 100GB+ RAM ballpark.
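
If you want to see the header overhead before you pull the device, these sysctls (standard FreeBSD ZFS arcstats, just a pointer rather than anything FreeNAS-specific) show roughly what's going on:

sysctl kstat.zfs.misc.arcstats.size           # total ARC size in bytes
sysctl kstat.zfs.misc.arcstats.l2_hdr_size    # RAM consumed by L2ARC headers
sysctl kstat.zfs.misc.arcstats.l2_size        # data actually sitting on the L2ARC device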
 

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
Hmm, okay, I will definitely try it. I'm going to kick it out tonight; do you think I need to reboot? I'm still on 9.1, BTW, if it matters.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Next time, or before you restart, post some pictures of the memory usage reports (from the Reporting tab).
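
If screenshots are awkward, the shell gives roughly the same picture (flags from memory, stock FreeBSD tools on 9.x):

top -b -d 1 | head -n 12                 # header lines include Mem (Wired/Free) and the ARC summary
sysctl kstat.zfs.misc.arcstats.size      # current ARC size in bytes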
 

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
I didn't get to restart it last night; we are in the midst of hurricane prep as well :) yippee! But I plan to do it today/tonight, so I will grab the memory reports before I do.
 

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
Here it basically shows memory at full. I guess one of the techs moved it down to 24GB a few weeks ago; I thought it was 32GB, but clearly it's maxed out.
 

Attachments

  • memory-maxed.png (102.4 KB)

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
What drive is the SAS log device?

How sustained is the traffic to the NAS?

I know you mentioned that you'd tried removing the L2ARC, but you should try to correlate L2ARC fullness with performance (see the sketch at the end of this post). It could be that too much L2ARC is slowing you down.

As an example, I found that using 128GB of L2ARC with a 24GB-RAM FreeNAS instance caused performance issues, but 64GB did not.

You could try removing the L2ARC and seeing whether, after a reboot (to discount anomalies), the problems still return. If they do return, that fully discounts the L2ARC as the cause.

(I mention the reboot because I've seen strange performance regressions when removing L2ARC etc. without one.)
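
For the fullness-vs-performance correlation, something as simple as this, left running while you watch throughput, gives a usable picture (pool name taken from your earlier output):

zpool iostat -v SAN40 5    # per-vdev and cache-device allocation plus read/write ops, every 5 seconds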
 

BenFL

Cadet
Joined
Jun 27, 2017
Messages
8
It is a 300GB 15K RPM SAS drive. For some reason we originally put 73GB SSDs there and kept getting errors; I was thinking it's because of SMART? Not sure, but FreeNAS definitely doesn't like them, and yet they work fine in other devices.

I kicked out the L2ARC two days ago and performance really hasn't changed. I am scheduled to restart tonight; we have some production items there that can't be down, so I had to migrate them off first. After the reboot I will monitor again. I've been using ifstat to keep an eye on it; the actual disk busy/wait/performance on the NAS is pretty low. Here is my ifstat since I took the L2ARC off; as you can see, it's not doing much over the past 2 days.

ifstat-before-reboot.png
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
I sat down with Allan Jude and discussed this very thread. Your L2ARC is the most likely cause: you're exhausting RAM with L2ARC pointers and leaving nothing for the ARC's actual read cache.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
I also noticed in the screenshot provided that swap has been turned off. Any reason for doing that?
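
For reference, the quick way to confirm from the shell (plain FreeBSD tool, nothing FreeNAS-specific):

swapinfo    # lists configured swap devices and usage; no devices listed means swap is off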
 
Joined
Mar 22, 2016
Messages
217
A 300GB 15K RPM SAS HDD for L2ARC? I'm not a ZFS guru, but that seems like it could be the problem. One HDD, even a 15K one, can't be faster or provide more I/O than your whole pool, I would think.

I'm not sure exactly what the pool consists of, but I'm sure it can push out more than one 15K HDD can. If your ARC is full and reads start coming from the L2ARC on that HDD, wouldn't that be a pretty massive hit to performance? From what I've gathered and experienced, your L2ARC needs to be as fast as possible. After a while the L2ARC also fills your ARC with headers, and performance will likely suffer: instead of serving reads from super-fast RAM, your system starts serving them from the HDD-backed L2ARC and gets gutted.

Though I could be entirely wrong. Just casual observations.
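
Rough back-of-the-envelope numbers, all assumed rather than measured: a 7,200 RPM SATA drive handles on the order of 75-100 random read IOPS, and a 15K SAS drive maybe 175-200. Four RAIDZ1 vdevs give you very roughly 4 × 75-100 ≈ 300-400 random read IOPS from the pool itself, so a single 15K disk serving L2ARC hits is at best in the same ballpark as the pool it is supposed to accelerate, and for sequential reads it is far slower than 12 spindles.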
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
@BenFL, just a follow up. How'd it go with removing the L2ARC from the pool? Did you notice any improvement?
 