FreeNAS 9.3, RAIDZ2, iSCSI slow read

Status
Not open for further replies.

Hobbel

Contributor
Joined
Feb 17, 2015
Messages
111
The nasty part with I/O latency is it doesn't take much to go from a situation where you have ms of latency to 30+ seconds. If you get enough latency you *will* start corrupting VMs. I've seen it PLENTY of times and I can tell you it really happens and it really sucks. ESXi gives you a warning at something like 20 seconds of latency, and it's funny because people assume it means ms of latency and not actual seconds.
...
This is just meant as a warning. You might not care about performance (I didn't care about performance with my "play" VMs). But when they are constantly corrupting themselves that gets old and the whole purpose for the VMs goes out the window.

Lesson learned. Thanks. :)
Moved to 2 mirrored vdevs and now iSCSI ZVOL performance is as expected. But now and then I get latency warnings on the ESXi side, which didn't occur before (e.g. a jump from 3,892 to 103,082 microseconds). I'll have to keep an eye on that.
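To keep an eye on it, I've started watching per-disk service times on the FreeNAS box while the VMs run. A minimal sketch, assuming gstat's default column layout on FreeBSD; the 100 ms trigger point is just my own choice, nothing official:

```python
#!/usr/bin/env python3
"""Poll gstat and flag disks whose service times spike.

Rough sketch for FreeNAS/FreeBSD. The threshold and the field
positions (taken from gstat's default column order) are assumptions;
check `gstat -b` output on your own box first.
"""
import subprocess
import time

WARN_MS = 100.0  # illustrative trigger point, not an official number

def sample():
    # -b: batch mode (one snapshot, then exit); -I 1s: 1-second window
    out = subprocess.run(["gstat", "-b", "-I", "1s"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        fields = line.split()
        # Device rows have 10 columns:
        # L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy name
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        name, ms_r, ms_w = fields[9], float(fields[4]), float(fields[7])
        if ms_r > WARN_MS or ms_w > WARN_MS:
            print(f"{time.strftime('%H:%M:%S')} {name}: "
                  f"ms/r={ms_r:.1f} ms/w={ms_w:.1f}")

if __name__ == "__main__":
    while True:
        sample()  # gstat itself blocks for the 1-second window
```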
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just so you know, if you start getting warnings on ESXi, that means you are basically in conditions where you may start losing writes. Unfortunately there's no way for you to know you are losing writes until they are actually lost.

So you should conservatively consider any latency warning as "I've just lost data" because it's trivial to go from a situation where the warning is just a warning to the warning also including data loss.

Now you see why I say it's a "do it right or don't do it". "The" warning that you are going to start losing data *is* the latency warning on ESXi. ;) So you're already in a bad way with ESXi.
 

Hobbel

Contributor
Joined
Feb 17, 2015
Messages
111
Little update:
Moved to vSphere 6 a week ago. No more warnings. FreeNAS performance is good. :)

My next FreeNAS build will have more performance ;) (many thx @cyberjock)
 

wreedps

Patron
Joined
Jul 22, 2015
Messages
225
The nasty part with I/O latency is it doesn't take much to go from a situation where you have ms of latency to 30+ seconds. If you get enough latency you *will* start corrupting VMs. I've seen it PLENTY of times and I can tell you it really happens and it really sucks. ESXi gives you a warning at something like 20 seconds of latency, and it's funny because people assume it means ms of latency and not actual seconds.

Here at home I have a couple of "play" VMs. I tried to run them on a RAIDZ2 with 32GB of RAM (my main system.. in my sig). The problem: they kept getting corrupted because of excessive latency leading to discarded writes. ESXi will only cache writes for so long before deciding to discard them. Once you hit that threshold, life goes over the cliff pretty quickly.

It's *really* hard to swallow, but ZFS needs RAM, L2ARC, etc. to get good performance (and end up with something that isn't a hop, skip, and a jump away from trashing your VMs). Trying to do VMs on ZFS is a "do it right or don't do it" because of the nastiness that can result. To make things worse, I've seen some people who thought they'd win by doing lots of snapshots. Well, when you have to combine snapshots someday, that creates lots of I/O. I've seen quite a few people who had a VM with a dozen or more snapshots, and the entire VM went up in smoke because partway through a snapshot merge process some writes were lost and the end result was a non-viable VM.

I've even got a system with 48GB of RAM and 3 vdevs. I run 2 VMs. A Windows 7 VM and a Linux Mint VM. Both are "appliances" for me and neither does much of anything except sit around. But I've had some times where I could hit 10+ seconds just trying to update one VM while the other was idle. When you view the ESXi latency chart it might sit at 2-5ms for days, and suddenly it's a vertical line to 10+ seconds.

This is just meant as a warning. You might not care about performance (I didn't care about performance with my "play" VMs). But when they are constantly corrupting themselves that gets old and the whole purpose for the VMs goes out the window.


I have been running 60 Windows Server 2012 R2 VMs on the following setup for 4 months with no issues. What is the difference between my setup and yours?
X9SRL with Xeon 2609
32GB Memory
1x Sandisk L2ARC
6x Ultrastars 4TB in 3 mirrors
2x Intel 1GbE NICs in LAG
FreeNAS 9.3

Is it the mirrors instead of RAIDZ2?
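For what it's worth, my own back-of-the-envelope reasoning for why the mirrors would matter: for small random I/O each vdev contributes roughly one disk's worth of IOPS, no matter how many disks are inside it, so one RAIDZ2 vdev acts like one disk while three mirrors act like three. A toy calculation (the per-disk IOPS figure is a made-up ballpark for a 7200 rpm drive, not a measurement):

```python
# Rule of thumb: each vdev contributes about one disk's worth of
# random IOPS, regardless of how many disks are in the vdev.
DISK_IOPS = 150  # assumed ballpark for a 7200 rpm drive

def pool_random_iops(vdevs: int, per_disk: int = DISK_IOPS) -> int:
    return vdevs * per_disk

# The same 6 disks, laid out two ways:
print("6 disks as 1x RAIDZ2 vdev: ~", pool_random_iops(1), "IOPS")  # ~150
print("6 disks as 3x mirrors:     ~", pool_random_iops(3), "IOPS")  # ~450
```

That difference alone could explain why my pool keeps up where a single RAIDZ2 vdev struggled.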
 