Carl Thompson
Dabbler
- Joined
- May 22, 2017
- Messages
- 15
Hello, this is my first post on the forum. Please be gentle. I have several questions and need some advice on performance tuning. Thank you in advance. While this is my first post I've read many, many posts here as well as all over the internet so I think I have a reasonable grasp of the fundamentals of FreeNAS and ZFS. Thanks @jgreco, @cyberjock and all others who have posted here as I have found your posts absolutely invaluable. You guys rock.
I have several identical FreeNAS servers which are used almost exclusively as backend storage for VMs in a test lab with 200+ users (VM creators). The servers were built from older enterprise hardware we've acquired extremely cheaply because cost is an issue for the lab. Here's what the hardware in each of the servers looks like:
- Tyan S7012 motherboard (LGA1366)
- 2x Intel Xeon L5630 CPUs (2.13GHz, 4 cores each)
- 144GB ECC DDR3 RAM (Mixed dual-rank from a variety of sources)
- 2x LSI 9211 HBAs in IT mode (or 2x Dell H200s HBAs cross-flashed to LSI 9211 IT firmware)
- 14x HGST Ultrastar 7K4000 4TB enterprise SATA 7200RPM mechanical drives
- 2x Intel S3710 400GB enterprise SSDs
- Intel X540-T2 10 gigabit ethernet card on dedicated 10 gigabit storage network
- 16 bay 3U chassis
- 2x Redundant 700W power supplies
- FreeNAS 9.10
- 6x 4TB drive mirror VDEVs (12 drives total in 6 mirror VDEVs)
- 2x 4TB drive hot spares
- 2x 20GB partitions from SSDs mirrored for SLOG (20GB total)
- 2x 280GB partitions from SSDs striped for L2ARC (560GB total)
- Compression is ON (~1.5 ratio)
- Dedup is ON (~2.0 ratio)
- Autotune is ON (no other tuning done yet)
- Currently using NFS to serve clients
- Using sync=always on VM datasets
- Most use is from a VMware 6.5 cluster (~10 hosts)
- Some use is from an oVirt 4.1 cluster (5 hosts)
- Data is snapshotted every 4 hours and replicated to a bigger, slower FreeNAS server
- All servers are under 40% of total capacity (at most 8TB allocated)
- ARC hit rate is 93% to 99% currently (depending on server)
- L2ARC hit rate is 25% to 40% currently (depending on server)
- Fragmentation is between 45% and 75% currently (depending on server)
So does all this work? Mostly.
The big problem is that once in a while (every couple of weeks or so) a server will decide to take a vacation for a few minutes. The server always comes back, but in the meantime VMware freaks out and vSphere HA reboots all the VMs using the datastore after a few minutes. Even if the VMs don't get rebooted, most of them are Linux based and Linux acts pretty stupid when its disk goes away. (I can't write to my disk for a few minutes? I'll just remount it read-only forever!) I believe this is the same problem that @jgreco has posted about several times. Aside from these episodes the VMs generally "feel" OK, but my latency graphs are spikier and have more outliers than I'd like.
So what can I do about it? I've obviously made some mistakes in the initial design and this may be a good time to rectify those as I put new servers in production and rebuild older ones.
I wonder if 144GB of RAM just isn't enough with dedup on and 560GB of L2ARC. From what I understand, only 1/4 of the ARC is allocated to metadata caching, and I've read that both the dedup tables and the L2ARC indices count as metadata. My total ARC size on these servers is about 120GB, but if I'm calculating correctly the dedup table on one of the servers is about 40GB by itself. So I'd guess I either need to bump up the memory significantly or change the percentage of the ARC that can be used for metadata. What's the best way to do that? And is changing this ratio a good idea?
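For anyone who wants to check my math, here's the back-of-the-envelope calculation I'm using. The per-entry sizes (~320 bytes per in-core DDT entry, ~70 bytes of ARC header per L2ARC record) and the 64K average block size are ballpark assumptions I've seen quoted around here, not exact figures for any particular ZFS release:

```python
# Rough estimate of ARC metadata pressure from dedup tables and L2ARC
# headers. Per-entry sizes are ballpark figures, not exact values.

DDT_ENTRY_BYTES = 320        # assumed in-core size of one dedup table entry
L2ARC_HEADER_BYTES = 70      # assumed ARC header per L2ARC record

allocated_bytes = 8 * 2**40  # ~8 TB allocated (worst case on my servers)
avg_block_bytes = 64 * 2**10 # assumed 64K average block size

# Dedup table: one entry per unique block.
ddt_entries = allocated_bytes // avg_block_bytes
ddt_bytes = ddt_entries * DDT_ENTRY_BYTES

# L2ARC index: one header in ARC per record cached on the SSDs.
l2arc_bytes = 560 * 2**30
l2arc_headers = (l2arc_bytes // avg_block_bytes) * L2ARC_HEADER_BYTES

# Default metadata limit is 1/4 of the ARC.
arc_bytes = 120 * 2**30
arc_meta_limit = arc_bytes // 4

print(f"DDT estimate:       {ddt_bytes / 2**30:.1f} GiB")
print(f"L2ARC headers:      {l2arc_headers / 2**30:.2f} GiB")
print(f"ARC metadata limit: {arc_meta_limit / 2**30:.1f} GiB")
```

Under those assumptions the dedup table alone (~40 GiB) already overflows the default metadata limit (~30 GiB), which would fit what I'm seeing. If I'm reading things right the relevant tunable is `vfs.zfs.arc_meta_limit` (in bytes), but I'd appreciate confirmation from someone who's actually adjusted it.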
Another thought is that I may be beyond what NFS is capable of with FreeNAS and VMware. I've heard (here) that VMware's NFS implementation is a second-class citizen and that iSCSI is the way to go for VMware. I'm ready to switch things to iSCSI if the consensus is that its performance and consistency are better for VMware (though this will involve some pain). If so, then I'm also looking for best-practices advice for iSCSI (I'm an iSCSI n00b). I'm also considering the iSCSI switch for the VAAI integration benefits.
Would it be advisable to combat the server "vacations" by tuning the maximum transaction group size or timeout lower? Wouldn't that increase fragmentation? Is fragmentation an issue I really need to worry about?
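Concretely, these are the knobs I think are involved (FreeBSD sysctl names as I understand them for FreeNAS 9.10; the old write_limit tunables were apparently replaced by the dirty-data throttle, so please correct me if I have the names wrong):

```shell
# Check the current values first (and their descriptions with sysctl -d):
sysctl vfs.zfs.txg.timeout vfs.zfs.dirty_data_max

# Force a txg to close at least every N seconds (5 is the default I see;
# lowering it should mean smaller, more frequent flushes).
sysctl vfs.zfs.txg.timeout=5

# Cap the dirty data ZFS will buffer before throttling writers (bytes);
# a smaller cap trades peak throughput for steadier latency. 4GiB here
# is just an illustrative value, not a recommendation.
sysctl vfs.zfs.dirty_data_max=4294967296
```

My worry is exactly the tradeoff in the question above: smaller, more frequent txgs might smooth out the latency spikes but write data in smaller chunks, which sounds like more fragmentation over time.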
This post is already too long so I'll end it here. I'll post more information from the servers and more questions in later posts.
Thank you,
Carl Thompson