Finding suitable ZIL and L2ARC drives

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
scotch_tape, you are correct that you don't need to underprovision the new Intel drives, but assigning more than 1GB or 2GB to the ZIL makes no sense because you couldn't possibly use more than 2GB of ZIL. The recommendation is to size the ZIL at no more than 5-10 seconds' worth of your maximum transfer rate. For those with 1Gb/sec LAN that comes out to about 1GB. So even if you go "overboard" with a 2GB ZIL, you are grossly oversizing it.
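As a rough back-of-the-envelope check (assuming a single saturated gigabit link and the usual 5-10 second transaction group window):

1 Gb/s ≈ 125 MB/s
125 MB/s × 5 s ≈ 0.6 GB
125 MB/s × 10 s ≈ 1.2 GB

So even a 2GB SLOG is roughly double what the pool could ever need to flush.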
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
So to get back to my question ;) what's possibly holding back performance of sync writes over NFS?
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
Don't think that's the issue. Running dd returns a transfer rate of 360MB/s and disabling the ZIL returns a write rate of 90+MB/s.

[root@freenas] time dd if=/dev/zero of="16G.bin" bs=1024k count=16384 && sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 45.484006 secs (377712315 bytes/sec)
45.77 real 0.08 user 6.66 sys
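For reference, one way to exercise the sync write path locally rather than over NFS is to force sync writes on the dataset for the duration of the test. The dataset name below is only an example, and note that /dev/zero will give inflated numbers if compression is enabled:

[root@freenas] zfs set sync=always tank/nfs      # "tank/nfs" is a placeholder; use your dataset
[root@freenas] time dd if=/dev/zero of="16G.bin" bs=1024k count=16384
[root@freenas] zfs inherit sync tank/nfs         # restore the default sync behaviour afterwards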
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
A couple of things. The DC S3700 series is much more heavily underprovisioned from the factory than other drives, removing the need to underprovision it further. Secondly, the drive does super-aggressive real-time garbage collection that is so effective that underprovisioning the drive will not give you any performance gains at all.

Have a looky at this: http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/3

Yeah, no, you're mistaken.

The underprovisioning we seek is different from what you're thinking of. A major goal is to guarantee the availability of a complete flash block (avoiding a read/update/write cycle). To get that purely from factory overprovisioning, a 100GB flash device would need overprovisioning on the order of 8x or 16x, and I'm pretty certain that Intel is not sticking 800GB or 1.6TB of flash into its 100GB devices. Because a SLOG device's workload is 100% writes, and the garbage collection feature you believe is "so effective" relies on the controller being given a break now and then to actually do it, we can look at the underlying technology to understand how to accomplish what we want.

Let's take a hypothetical 100GB flash device. It presents 100GB of space but contains 200GB of flash (generous factory overprovisioning) and uses a 4KB page size. Now, I'm going to start writing 512 byte sectors to it, sync, which is arguably one of the most challenging possible workloads. And this write stream happens at full speed, leaving the device no time to do background garbage collection. So a 4KB page is allocated for each 512B sector, and this proceeds at full speed fifty million times, at which point all 200GB of flash pages have been used once, but only 25GB of data has been written to that SSD.
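Spelling out the arithmetic for that hypothetical device:

200 GB of flash / 4 KB per page ≈ 50 million pages
50 million writes × 512 B each ≈ 25 GB of data

Every physical page has been consumed once even though the drive is only a quarter "full" from the host's point of view.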

So now, if you do not underprovision my way and instead are busy believing the marketing hype, this is where technical reality comes along and beats you over the head.

Because if you had underprovisioned my way, the sectors you are writing would have already wrapped around twice by this point, and so the drive would still be seeing a large pool of free-but-dirty pages out there.
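For example, assuming only ~12GB of the 100GB of LBA space had been partitioned for SLOG (any similarly small slice gives the same effect):

25 GB written into a ~12 GB partition → every LBA overwritten roughly twice
→ most of the 200 GB of flash pages now hold stale data the controller can reclaim cheaply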

But your way, you've got a train wreck: there are no erased pages left, so the drive has to go into degraded mode.

Now, fortunately for you, you're probably not capable of making a setup that's stressy enough that the drive never gets a chance to do garbage collection, so you get "saved" that way. However, garbage collection still results in a read/update/write cycle, which involves a write, which reduces endurance.

So from my point of view, I'm going to pick the technique that does a better job of guaranteeing page availability and also doesn't require superfluous writes. You can keep telling yourself that you don't need to do it. Doesn't bother me, but thanks for not trying to "correct" me in the future unless you have something better than marketing hype.
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Hi, I wanted to follow up on this thread. Leonroy, I am building a very similar system; were you able to find out what the performance bottleneck was in your system and resolve it? I'd love to hear anything else you learned!
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
I posted a topic with the same subject on a different forum (I'm sure you can find it), but the long and short of it is that network latency + SSD latency means I'm unlikely to hit above 46 MB/s sync. If the latency of the SSD is ignored, max throughput over gigabit would be in the region of 88 MB/s. There are no SSDs with zero latency, so perhaps with a striped (RAID 0) ZIL I might improve my speeds, but you're going to see diminishing returns.
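Very roughly, since each sync write has to wait for a network round trip plus the SLOG write (the per-op numbers below are only illustrative, not measured):

throughput ≈ write size / (network round-trip time + SLOG write latency)
e.g. 64 KB / (0.7 ms + 0.7 ms) ≈ 46 MB/s
64 KB / 0.7 ms ≈ 90 MB/s (SSD latency ignored)

The point is that the two latencies add, which roughly halves the ceiling.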

In any case I think throughput is a red herring, since VMware guests are unlikely to hit the max speed very often. The more useful figures are IOPS and sustained performance at small queue depths. If you learn anything more, post back!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There are too many variables to pin down a specific number; 46 is definitely just pulled out of thin air. Sizing of NFS buffers, size of the transactions pushed, latency of the SSD device (including availability of pages), any latency associated with the controller, etc. Each of these will significantly impact the result.

Two things that have the potential to help are jumbo frames with suitable buffer sizes (which can reduce network latency) and avoiding the use of an SSD for SLOG. It is possible to abuse the write cache on a RAID controller instead, though sadly it'll want some disks attached to back that up. Heh.
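For the jumbo frames part, a sketch (the interface name below is just a placeholder, and every host and switch port in the path has to agree on the MTU):

[root@freenas] ifconfig igb0 mtu 9000      # igb0 is a placeholder; use your actual interface

plus the matching 9000-byte MTU on the ESXi vmkernel port and the switch ports in between.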
 