msmollin
Cadet
Joined: May 11, 2014 · Messages: 5
Hey everyone. I'm delurking here with a question regarding ESXi + FreeNAS. Really it's about advice and direction with how to increase the performance of my system. Here's some background:
For the better part of 2 years I was using ESXi 4.1 with FreeNAS 9.0. I upgraded FreeNAS to 9.2.0 when that release dropped, with absolutely no issues. I really love this NAS software. The ESXi host mounts a share off the FreeNAS box via NFS for its virtual machines without issue, and I've got ZFS doing hourly snapshots with replication to an attached USB hard drive. Everything was and continues to work well.
Recently I upgraded the host to ESXi 5.1 (yes, I realize 5.5 is out, but some weirdness with licensing plus dropping support for the C# client pushed me in the direction of 5.1 for now). The old host only had 8GB of RAM, so it got congested once Minecraft wanted to eat 3GB all by itself. The new host has 32GB of RAM, so now I can give all my machines at least 2GB if not 4GB (I run a Zimbra instance, which also eats RAM), and everything is now chugging along very happily.
What I've noticed since moving to the new host hardware is that the datastore write latency seems higher than I'd like - an average of 150ms. Read latency is great, normally below 5ms with spikes to 75ms (I'm guessing when replication bangs on the FreeNAS processor - more on that in a bit). I'd really like to get the average write latency under 100ms, as I notice on some of my VMs that SSH is slow to open a session, sometimes slow to respond at the terminal, and slow to write small files - and this is over the local network, so it's not an internet-related problem. It's also fairly consistent - that 150ms doesn't spike too often.
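For what it's worth, here's roughly how I've been sanity-checking that number outside of the vSphere performance charts. This is just a sketch - the target path is made up, so point it at something on the NFS-backed storage (e.g. from inside one of the VMs) before running it. It times small writes followed by fsync(), which is about what ESXi's NFS client forces on the datastore:

Code:
# Rough sync-write latency probe (sketch, not my production setup).
# The target path below is made up - point it at the NFS-backed storage.
import os, time

TARGET = "/mnt/esxi-datastore/latency-test.bin"   # hypothetical path
BLOCK = b"\0" * 4096                              # 4 KiB writes, like small VM metadata updates
SAMPLES = 100

fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT, 0o644)
latencies = []
for _ in range(SAMPLES):
    t0 = time.time()
    os.write(fd, BLOCK)
    os.fsync(fd)                                  # force the write to stable storage
    latencies.append((time.time() - t0) * 1000.0)
os.close(fd)
os.remove(TARGET)

latencies.sort()
print("fsync'd 4K writes: avg %.1f ms, p95 %.1f ms" % (
    sum(latencies) / len(latencies), latencies[int(len(latencies) * 0.95)]))

If the average from something like that lands near the 150ms the chart shows, at least the two measurements agree and it's really the sync write path that's slow.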
So here's the current setup:
ESXi Host:
Supermicro X7DBR-3 2x Quad-Core Xeon E5450 3.0GHz w/32GB RAM
No drives in box
ESXi v5.1
FreeNAS Server:
HP MicroServer Gen7 AMD Turion w/ 8GB RAM
4x 500GB 2.5" WD Blacks in a ZFS "RAID10" aka two mirrored VDEVs striped together, effectively 1 TB in capacity
FreeNAS 9.2.0
Network:
8-port TRENDnet Gig-E unmanaged switch, with the NAS and the ESXi management interface of the host plugged in, plus the uplink to the rest of the network. The VMs are internet-facing inside a DMZ, so they go through the other physical interface on the Supermicro box.
So yeah, there are a few ways I could go here. I've read cyberjock's n00b manual front to back, and read the FreeNAS manual (although some of the stuff in there needs updating - like the part about root != administrator, which is no longer the case). I've also read several bug reports about ESXi + ZFS, which seem to have mostly been resolved in the latest ZFS updates - and I am as up to date as 9.2.0 will let me go. Here are the paths forward I see:
1. Obviously the MicroServer is a little underpowered on processor and RAM. The replication to USB is done via the SSH subsystem in the GUI, so anytime replication kicks in, both little cores go to 100%, which correlates with the read latency spikes. Yeah, I could fix this by setting up the replication job in the CLI, but I really don't want to go behind the GUI's back if I can avoid it. Also, read latency really isn't an issue 90% of the time. Should I go for something with more RAM / a faster processor? Would that help the write latency at all?
2. The disks are 7200 RPM consumer drives, which at 2.5" is a bit better than similar 3.5" drives, but still not great. I could move to 3.5" WD Reds or 2.5" WD Raptors. Would that help? If I got an IBM M1015 I could go to 15K SAS drives, but that kind of cost outlay for a setup that doesn't really earn me any income would be tough to justify.
3. I've read that a dedicated ZIL (SLOG) device on a small SSD could be beneficial for ESXi loads. I'm running 5 VMs currently, and the Zimbra and Minecraft VMs are fairly busy. The downside here is that I'm out of SATA ports on that MicroServer, so I'd have to get an HBA to support anything - probably an IBM M1015. Thoughts here? (There's a quick sync-write comparison sketched right after this list.)
4. I could upgrade to the 9.2.1.x series - there are some ZFS updates in it - but I'd really prefer not to go to 9.2.1.x, as it seems a little unstable in the Samba department, which I do turn on occasionally to move ISOs onto the datastore for ESXi to use.
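On #3, here's the kind of comparison I figure would tell me whether a SLOG is worth buying the HBA for: run the same small writes twice, once buffered and once with fsync() after each write. If the fsync'd pass is dramatically slower, most of the latency is the pool handling sync writes through the in-pool ZIL, which is exactly what a dedicated log device should absorb. Again, just a sketch with a made-up path; it could be run against the NFS-backed storage or directly on a dataset on the MicroServer:

Code:
# Sketch: buffered vs fsync'd small writes on the same path.
# A big gap between the two passes points at sync-write (ZIL) overhead,
# which a dedicated SLOG SSD should largely absorb. Path below is made up.
import os, time

TARGET = "/mnt/tank/sync-test.bin"   # hypothetical path
BLOCK = b"\0" * 4096
SAMPLES = 200

def run_pass(do_fsync):
    # Time SAMPLES small writes; fsync after each one if do_fsync is True.
    fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    t0 = time.time()
    for _ in range(SAMPLES):
        os.write(fd, BLOCK)
        if do_fsync:
            os.fsync(fd)
    elapsed = time.time() - t0
    os.close(fd)
    return (elapsed / SAMPLES) * 1000.0   # ms per write

buffered = run_pass(False)
synced = run_pass(True)
os.remove(TARGET)
print("buffered: %.2f ms/write, fsync'd: %.2f ms/write" % (buffered, synced))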
Anything I'm missing? I could also get a real Intel NIC for the HP server, but my experience in enterprise networking tells me that's usually the last thing you look at. The Broadcom NIC in that server is OK - not as bad as a Realtek, but certainly not a good Intel chip either. I'm not having throughput issues either - I can easily pull 100 MB/s and push close to that. It's this latency that has me scratching my head and wondering if I can do better. It could be that the buffer on the NIC is getting saturated, but that seems unlikely.
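To convince myself the wire itself isn't the problem, I figure something as dumb as this is enough - time a handful of TCP connections from a VM (or the management network) to the FreeNAS box. The hostname is made up; substitute the real one. If these come back well under a millisecond or two, the 150ms is being spent in the storage stack, not on the network:

Code:
# Sketch: rough network round-trip check to the FreeNAS box.
# Times TCP connection setup to the NFS port; hostname below is made up.
import socket, time

HOST = "freenas.example.lan"   # hypothetical hostname/IP of the FreeNAS box
PORT = 2049                    # NFS
SAMPLES = 20

times_ms = []
for _ in range(SAMPLES):
    t0 = time.time()
    s = socket.create_connection((HOST, PORT), timeout=2)
    s.close()
    times_ms.append((time.time() - t0) * 1000.0)

print("TCP connect to %s:%d - min %.2f ms, avg %.2f ms" % (
    HOST, PORT, min(times_ms), sum(times_ms) / len(times_ms)))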
I appreciate any direction anyone can give. Thanks!
--EDIT--
Fixed my dyslexic RAID10 description. 'Tis what I get for late-night typing.