Finding suitable ZIL and L2ARC drives

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
scotch_tape, you are correct that you don't need to underprovision the new Intel drives, but assigning more than 1GB or 2GB to the ZIL makes no sense because you couldn't possibly use more than 2GB of ZIL. The recommendation is to size the ZIL at no more than 5-10 seconds' worth of your maximum transfer rate. For those with 1Gb/sec LAN that comes out to about 1GB. So even if you go "overboard" with a 2GB ZIL, you are grossly oversizing it.
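As a rough back-of-the-envelope check (assuming a single saturated gigabit link and the usual 5-10 second transaction group window):

1 Gb/s ≈ 125 MB/s
125 MB/s × 5 s ≈ 0.6 GB
125 MB/s × 10 s ≈ 1.2 GB

So even a 2GB SLOG is roughly double what the pool could ever need to flush.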
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
So to get back to my question ;) what's possibly holding back performance of sync writes over NFS?
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
Don't think that's the issue. Running dd returns a transfer rate of 360MB/s and disabling the ZIL returns a write rate of 90+MB/s.

[root@freenas] time dd if=/dev/zero of="16G.bin" bs=1024k count=16384 && sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 45.484006 secs (377712315 bytes/sec)
45.77 real 0.08 user 6.66 sys
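For reference, one way to exercise the sync write path locally rather than over NFS is to force sync writes on the dataset for the duration of the test. The dataset name below is only an example, and note that /dev/zero will give inflated numbers if compression is enabled:

[root@freenas] zfs set sync=always tank/nfs      # "tank/nfs" is a placeholder; use your dataset
[root@freenas] time dd if=/dev/zero of="16G.bin" bs=1024k count=16384
[root@freenas] zfs inherit sync tank/nfs         # restore the default sync behaviour afterwards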
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
A couple of things. The DC S3700 series is much more heavily underprovisioned from the factory than other drives, removing the need to underprovision it further. Secondly, the drive does super-aggressive real-time garbage collection that is so effective that underprovisioning the drive will not give you any performance gains at all.

Have a looky at this: http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/3

Yeah, no, you're mistaken.

The underprovisioning we seek is different from what you're thinking of. A major goal is to guarantee the availability of a complete flash block (avoiding a read/update/write cycle). To get that purely from factory overprovisioning, a 100GB flash device would need overprovisioning on the order of 8x or 16x, and I'm pretty certain that Intel is not sticking 800GB or 1.6TB of flash into its 100GB devices. Because a SLOG device's workload is 100% writes, and the garbage collection feature you believe is "so effective" relies on the controller being given a break now and then to actually do it, we can look at the underlying technology to understand how to accomplish what we want.

Let's take a hypothetical 100GB flash device. It presents 100GB of space but contains 200GB of flash (generous factory overprovisioning) and uses a 4KB page size. Now, I'm going to start writing 512 byte sectors to it, sync, which is arguably one of the most challenging possible workloads. And this write stream happens at full speed, leaving the device no time to do background garbage collection. So a 4KB page is allocated for each 512B sector, and this proceeds at full speed fifty million times, at which point all 200GB of flash pages have been used once, but only 25GB of data has been written to that SSD.
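Spelling out the arithmetic for that hypothetical device:

200 GB of flash / 4 KB per page ≈ 50 million pages
50 million writes × 512 B each ≈ 25 GB of data

Every physical page has been consumed once even though the drive is only a quarter "full" from the host's point of view.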

So now, if you do not underprovision my way and instead are busy believing the marketing hype, this is where technical reality comes along and beats you over the head.

Because if you had underprovisioned my way, the sectors you are writing would have already wrapped around twice by this point, and so the drive would still be seeing a large pool of free-but-dirty pages out there.
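For example, assuming only ~12GB of the 100GB of LBA space had been partitioned for SLOG (any similarly small slice gives the same effect):

25 GB written into a ~12 GB partition → every LBA overwritten roughly twice
→ most of the 200 GB of flash pages now hold stale data the controller can reclaim cheaply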

But your way, you've got a train wreck: there are no erased pages left, so the drive has to go into degraded mode.

Now, fortunately for you, you're probably not capable of making a setup that's stressy enough that the drive never gets a chance to do garbage collection, so you get "saved" that way. However, garbage collection still results in a read/update/write cycle, which involves a write, which reduces endurance.

So from my point of view, I'm going to pick the technique that does a better job of guaranteeing page availability and also doesn't require superfluous writes. You can keep telling yourself that you don't need to do it. Doesn't bother me, but thanks for not trying to "correct" me in the future unless you have something better than marketing hype.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Hi, I wanted to follow up on this thread. Leonroy, I am building a very similar system; were you able to find out what the performance bottleneck was in your system and resolve it? I'd love to hear anything else you learned!
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
I posted a topic with the same subject on a different forum (I'm sure you can find it), but the long and short of it is that network latency + SSD latency means I'm unlikely to hit above 46 MB/s sync. If the latency of the SSD is ignored, max throughput over gigabit would be in the region of 88 MB/s. There are no SSDs with zero latency, so perhaps with a striped (RAID 0) ZIL I might improve my speeds, but you're going to see diminishing returns.
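Very roughly, since each sync write has to wait for a network round trip plus the SLOG write (the per-op numbers below are only illustrative, not measured):

throughput ≈ write size / (network round-trip time + SLOG write latency)
e.g. 64 KB / (0.7 ms + 0.7 ms) ≈ 46 MB/s
64 KB / 0.7 ms ≈ 90 MB/s (SSD latency ignored)

The point is that the two latencies add, which roughly halves the ceiling.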

In any case I think throughput is a red herring, since VMware guests are unlikely to hit the max speed very often. The more useful figures are IOPS and sustained performance at small queue depths. If you learn anything more, post back!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There are too many variables to pin down a specific number; 46 is definitely just pulled out of thin air. Sizing of NFS buffers, size of the transactions pushed, latency of the SSD device (including availability of pages), any latency associated with the controller, etc. Each of these will significantly impact the result.

Two things that have the potential to help are jumbo frames with suitable buffer sizes (which can reduce network latency) and avoiding the use of an SSD for SLOG. It is possible to abuse the write cache on a RAID controller instead, though sadly it'll want some disks attached to back that up. Heh.
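For the jumbo frames part, a sketch (the interface name below is just a placeholder, and every host and switch port in the path has to agree on the MTU):

[root@freenas] ifconfig igb0 mtu 9000      # igb0 is a placeholder; use your actual interface

plus the matching 9000-byte MTU on the ESXi vmkernel port and the switch ports in between.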
 