Jason Keller
Hi folks! Been a while since I was last on the forums, and I have another stumper. Hopefully @jgreco and @mav (and all of you!) have some ideas, as it's quite strange.
So I've got a pool on a storage head (HP DL380 G6, 2x E5649, 96GB memory, 16x Intel 320 series 120GB SSDs, 2x LSI 9207-8i, Chelsio T520-CR). Basic settings on the pool (LZ4 compression on, no dedup, no sync, no atime). I built a slightly overprovisioned ZVOL (1TB) and exported it via iSCSI, with IP multipathing on the Chelsio card out to the two 10Gbit BNT switches in our BladeCenter H chassis. The blades (2x E5-2670, 96GB memory, Broadcom 57711) use Software iSCSI in vSphere 6.5 across both fabrics with IOPS=1 round-robin multipathing, and the LUN is consumed as a VMFS6 datastore.
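For reference, here's roughly what that setup looks like from the CLI; the pool/zvol names and the naa device ID are placeholders (and the ZFS properties can of course be set from the FreeNAS GUI instead):

```
# FreeNAS side: pool/zvol properties (dataset names are placeholders)
zfs set compression=lz4 tank
zfs set atime=off tank
zfs set dedup=off tank
zfs set sync=disabled tank
zfs create -s -V 1T tank/vmfs6    # -s = sparse, i.e. overprovisioned

# ESXi side: round-robin pathing with IOPS=1 on the iSCSI LUN
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set \
    --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1
```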
Some interesting bits... jumbo frames and TOE have been tried on and off with minimal impact (TOE seems to make throughput steadier).
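For anyone wanting to replicate those toggles, this is roughly what I mean; the interface, vSwitch, and vmkernel names (cxl0, vSwitch1, vmk1) are whatever your hardware enumerates as, so treat them as placeholders:

```
# FreeBSD side (T5 ports usually show up as cxl0/cxl1)
kldload t4_tom               # Chelsio T4/T5 TOE module
ifconfig cxl0 toe mtu 9000   # enable TOE + jumbo frames
ifconfig cxl0 -toe           # turn TOE back off for comparison

# ESXi side: jumbo frames on the vSwitch and the iSCSI vmkernel port
esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000
```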
So here we get to the meat of the issue... when I'm running IOmeter in one VM on one blade, I'm hitting 980MB/s at 120,000 IOPS with an average 260us latency on a 100% random 8K read run. CPU sits around 17%, and I'm only hitting about 4.5Gbit/s per 10Gbit lane.
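(The numbers are self-consistent, by the way: 120,000 IOPS x 8KiB ≈ 980MB/s.) If anyone wants to sanity-check against a non-Windows load generator, a roughly equivalent fio job would be something like the below, assuming a Linux guest with a dedicated test disk; the device name and queue depths are placeholders to tune:

```
# inside a test VM, against a dedicated virtual disk (device is a placeholder)
fio --name=8k-randread --rw=randread --bs=8k --direct=1 \
    --ioengine=libaio --iodepth=32 --numjobs=4 --group_reporting \
    --runtime=60 --time_based --filename=/dev/sdb
```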
So let's add another VM/blade into the equation! I fire up IOmeter in a second VM (on a different blade) and start that one up as well. My latencies double, and IOPS and bandwidth get cut in half across the two VMs. Ready for the worst part? CPU spikes up to 75% and stays there for the duration.
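For what it's worth, here's how I've been watching the storage head while both runs are going; all stock FreeBSD tools, nothing exotic:

```
# on the FreeNAS head, while both IOmeter runs are active:
top -SHP          # per-CPU view with kernel threads, to see where the 75% lands
vmstat -P 1       # per-CPU user/system/idle breakdown at one-second intervals
systat -ifstat 1  # per-interface throughput on the two 10Gbit lanes
gstat -p          # per-disk busy% across the 16 SSDs
```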
And this is where I'm puzzled - did something change in FreeNAS? I hadn't run my IOmeter instances in quite a while, and now suddenly this is happening. I've read the posts in the 10Gbit thread about only being able to reach ~4Gbit, but that was VM to VM; this is bare metal FreeNAS to ESXi. My previous setup had two ESXi heads direct-attached to a similar storage head configuration, one 10Gbit lane per host, and with both hosts hitting the storage simultaneously I was able to get 120k IOPS per host, around 970MB/s throughput, and 350us latency per 8K read using Chelsio's hardware-offload iSCSI in ESXi (yes, I was scraping a quarter-million IOPS in 8K reads!). CPU never lifted above 40% or so (granted, this was late 2015 or so).
I've tried 9.10.2-U3 as well as 11-RC, with no change in behavior.
If anyone has any ideas I'm all ears.