2x 256GB SSD L2ARC with 32GB/64GB RAM

LubomirZ · Oct 22, 2015

guys, in our TESTING lab : 28GB physical RAM (soon to be expanded to 64GB), 2x dual-port 10Gbit Brocade BR1020 NICs in ethernet mode, each interface with IP on different TCP/IP subnet. iSCSI to five ESX6.0 hosts load balanced with MPIO, FreeNAS 9.3.1 fully updated. Six 3TB Hitachi 7200rpm drives (three RAID1 zvols due to VAAI combined together to single volume, yes I know this is not best for redundancy, default LZ4 compression, atime off, no dedup). No NFS at all. Asus X99-S motherboard X99, i7-5820k. At the moment I'm booting from SSD, but I'll change it to USB (mirrored).

I have two SSDs, 256GB Samsung 850Pro and 256GB Crucial BX100, which I'd like to use as L2ARC. Samsung punches 90.000+ random I/O 4kB read IOPS, with 25% overprovisioning 40.000 random 4kB write IOPS. I'm not worried about endurance. I know this is not PCI-e NVMe performance, but that level of performance would be excellent for our usage.

Soon we will be running virtual machines with primary focus on VMware infrastructure things - this is our VMware lab, so random I/O pattern. No SQL databases, no big data, no movies, nothing sequential. I don't expect to be too much write intensive there, just regular stuff. VMs will be backed up to 6TB disk in external box every night so I'm not too worried about loosing the pool - if it happens, yeah it happens. No customers, no SLAs/contracts. Break-it-and-fix-it lab.

My intention is to accelerate HDD layer which obviously is sssslow by default. I have six disks, providing 100 random IOPS each at best. Our working dataset size is unknown at this moment because we are just building this, but definitely much more than 64GB [RAM] I will have. That drives me to L2ARC usage.

My humble question for now is L2ARC sizing. I feel I saw the "5x RAM = L2ARC max" rule, so in my case that would be around 300GB. Can I break that rule ? My idea is "let's sacrifice SOME more RAM for much bigger SSD caching tier (2x 256GB if possible) ". I know RAM is the primary performance factor, but our stuff won't fit in so I'd be more than happy to use 250GB SSD read-cache, possibly both SSDs so 2x 250GB SSD read cache. I believe to be able to use two physical SSDs as L2ARC for HDD tier - can I ? [RTFM, ZFS Primer, yes I can ; the question is now the capacity which will be 500GB]

Calculation : iSCSI, 4kB blocks, 180 bytes RAM used per block (saw it somewhere here on forum). If I use 250GB as L2ARC, I'd consume approx. 11.25GB RAM for L2ARC index. Is this calculation correct ? With 64GB RAM, I will gladly kill 11.25GB RAM to have 250GB read accelerated SSD tier.

I would HAPPILY consume additional 11.25GB RAM if I can add additional 250GB SSD read-cache and double total L2ARC capacity from 250GB to 500GB (2x 250GB) so we will have to get to spinning disks less and less. That might be a huge benefit for us performance wise. If my numbers are right, out of 64GB RAM approx. 2GB is used for system, 2x 11.25GB for L2ARC index, leaving about 39GB RAM for ARC.

Scenario ............. | Total RAM | L2ARC size .| expected max. ARC size |
no L2ARC ...........| 64GB ....... | 0 GB ............ | ~62GB |
1x 250GB L2ARC | 64GB ....... | ~250 GB ...... | ~ 50GB |
2x 250GB L2ARC | 64GB ....... | ~ 500 GB ..... | ~ 39GB | this is my preferred variant. Less ARC, but as much SSD L2ARC as I possibly can.

Be gentle with me, please, I'm FreeNAS greenhorn. Are my expectations totally off ?

Robert Trevellyan · Oct 22, 2015

LubomirZ said:
Asus X99-S motherboard X99, i7-5820k.

Before you start asking about RAM vs L2ARC, you need to select more suitable base hardware.

Mlovelace · Oct 22, 2015

Increasing you l2arc to 500gig will hurt your system performance.

cyberjock · Oct 22, 2015

First, don't even consider an L2ARC until you have 64GB of RAM, period. End of that short discussion. ;)

Even at 64GB, your ARC is going to be around 50GB or so (maybe a little less), which puts you at about 250GB. So one of those SSDs.

Also, thinking that having 10x the ARC is a great idea because you've got some justification for it in your head is totally wrong. If you starve the ARC of usable space (that is, space that isn't for the L2ARC index) you'll end up with the ARC churning through data and evicting stuff you'll need a short time later, which tanks the performance of your zpool.

As we've said 1000x times, 5x the ARC. Don't think that you're going to find a logical way to justify going with more. It doesn't work that way, it's not designed to work that way, and you won't be able to get it to work that way. ;)

If you want to have 500GB of L2ARC then upgrade to 128GB of RAM.

Just trust us on this.

LubomirZ · Oct 22, 2015

sure I will listen to your advices && big thanks guys.

It's pretty easy to compare performance without L2ARC and with 250GB L2ARC - we are without any pressure so I can easily and peacefully run both configurations and see what works better for us. Also, statistics are going to be our friend but as I say, we are not running workloads yet so I don't have any at this moment. I fully understand the story behind ECC and yes, we don't have them.

Sharing your knowledge and experience is tremendous source of help for all of us here. Thank you !

cyberjock · Oct 23, 2015

Well, if you aren't running workloads yet, you cannot "test" the L2ARC. The L2ARC is something that MUST be tested with real world workloads. So you've basically just told me that your tests won't really tell you how much better L2ARC will be. :P

This is probably the biggest mistake people make with benchmarking. L2ARC introduces a hybrid storage system that is a combination of flash and spinning disk. It is extremely difficult to accurately provide values without having extremely (and I stress 'extremely') detailed information about your system and its workload. Being that this is your first jump into ZFS, I can virtually guarantee you that you have no clue how to even determine those values. You don't need to feel bad about it. 99.9% of iXsystems customers that call have a hard time even quantifying their performance needs. Most overestimate by an order of magnitude or more. The only value that customers can usually be fairly accurate on is how many TB of storage they need. Ask about iops and you get numbers that are totally unrealistically high, or so low you question where the number came from. This is one of the reasons that iXsystems does so well. They are pretty good at figuring out what you actually need and rightsizing the system for your needs. :P

jgreco · Oct 23, 2015

Cyberjock said all the right words but the explanation might not have been understandable to mere mortals. ;-)

The problem with having far-too-little RAM is that ZFS never gets a good opportunity to retain blocks in the ARC long enough to quantify how useful they are. For example, if your VM that is booting up reads two blocks of data, one of which happens to be from an often-read library or DLL file, and another of which happens to be part of the boot splash screen, only one of those is worth caching out to L2ARC (or even retaining in the ARC). However, there's a torrent of blocks flowing through the system at a hellish pace, and it is easy for that DLL file block to get flushed out of the ARC before it is seen to be accessed a second time.

Now in theory you can tune the L2ARC to soak all that flushed data up. Yay, great. However, if that block gets hit while in L2ARC, then it has to be read back into the ARC for delivery to the client, which in turn causes the ARC to flush some other bit of data. You're causing a lot of stress on the L2ARC device because the only real hope this has of working is to move a hell of a lot of data both to and from the L2ARC very quickly, and you're introducing a lot of pain and thrashing in the ARC as well. It can kind-of be made to work, if the numbers aren't too obscenely out of range, and you're careful about tuning, and you don't mind burning P/E cycles on yout L2ARC device, but the better choice is to have sufficient ARC to the task.

It turns out that what you want in practice is to have enough RAM that ZFS can often-to-usually-and-preferably-almost-always identify blocks that would be useful to store in L2ARC while those blocks are still residing in ARC from their first access. And it turns out the process is awful-messy, for many of the same reasons I talk about in this post: https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/

LubomirZ · Oct 23, 2015

absolutely understand what you are saying. I'm 3Par admin running some 10000 series, 7400, 20850 all flash, that kind of stuff. I deal with configurations counting 400+ spinning drives, dozens of SSDs and 100k+ IOPS daily including Sundays (unfortunately).

My real problem is lack of experience with FreeNAS/ZFS. We needed something REALLY cheap and we needed that fast (can't wait three weeks for SuperMicro to deliver and it's nowhere near the price I paid), heh so we built it. I know it's not the best in the world and everybody here accepts it. Yes, it doesn't have ECC.

>> Well, if you aren't running workloads yet, you cannot "test" the L2ARC. The L2ARC is something that MUST be tested with real world workloads.

totally agree, no discussion about it. That's why I openly said from the very first beginning : no workloads at this second, that's why I wrote "statistics ARE GOING TO BE our friend". For sure I know ~600 IOPS achievable by six consumer-class SATA drives is not going to positively surprise me :) and I won't even have those 600 in reality. It's only maturity of your product, caching and caching and more heavy caching that is going to save us.

When we have our virtual machines installed and kicking next week, I will be able to start THOROUGHLY comparing performance with and without SSD L2ARC. I hope L2ARC will do miracles.

Until then, nothing else just collecting info and listening to knowledgeable guys. I really appreciate your info, effort, time and dedication - you are tremendous value for this forum !

jgreco · Oct 23, 2015

Okay, so we won't re-emphasize the ECC thing (haha, lied, I did anyways!) but the next best thing you can do for your ZFS based system serving VM's is to heap gobs and gobs of memory on it. That link at the end of reply #7 explains a lot of what goes on. It's not a deep dive but even if you're a knowledgeable technical person it is a good reminder of what is and isn't happening within the storage system, and why the gobs of memory solution is the usual fix.

tvsjr · Oct 23, 2015

Keep in mind you also don't have 600 IOPS there. A 7.2K SATA drive is lucky to do 75 IOPS... with two vdevs, that's 150 IOPS. I'd suggest that you need to fix the underlying issues (properly size and configure the array, proper motherboard with ECC memory, etc.) before you start trying to do advanced stuff like L2ARC.

cyberjock · Oct 23, 2015

LubomirZ said:
We needed something REALLY cheap and we needed that fast (can't wait three weeks for SuperMicro to deliver and it's nowhere near the price I paid), heh so we built it. I know it's not the best in the world and everybody here accepts it. Yes, it doesn't have ECC.

Underline emphasis is mine.

Famous last words from people that have later lost their zpools. :/ You absolutely get what you pay for when it comes to building a server. If history is any indicator with others before you, I'd start polishing that resume of yours. This isn't going to end well for you.

LubomirZ · Oct 23, 2015

nobody's job is in stake... check out fourth and fifth word of #1 please. [yes, those may be famous words, too : nobody's job...]

mav@ · Oct 23, 2015

Speaking about performance for multiple VM servers, I would recommend to use 3 mirrored vdevs instead of 2 RAIDZ1. By the cost of loosing only 25% of capacity, you will get 3x read IOPS and 1.5x write IOPS. No matter how good will your L2ARC work, after reboot it will be empty, and you would definitely want to have all possible raw disk IOPS at that point.

By the way, due to more efficient small blocks allocation on mirrors then on RAIDZs you may actually save significant part of those 25% by better compression rates.

jgreco · Oct 23, 2015

Hey mav@ has there been any OpenZFS progress on persistent L2ARC or any of the other attractive changes from Oracle? ( https://blogs.oracle.com/roch/entry/it_is_the_dawning_of )

LubomirZ · Oct 23, 2015

three mirrored vdevs is what I have - wrote it at the top, just incorrectly stated "RAID1" instead of mirror. Good ol'times...

mav@ · Oct 23, 2015

jgreco said:
Hey mav@ has there been any OpenZFS progress on persistent L2ARC or any of the other attractive changes from Oracle?

Unfortunately we should forget about Oracle. They are in different universe now.

Persistent L2ARC work is periodically recalled on OpenZFS community. AFAIK so far work is not complete, or at least it is not committed, but hopefully process goes.

From other improvements, FreeBSD stable/10 recently got bunch of them, catching up present Illumos state. In particular, it got patch reducing RAM usage by L2ARC headers by almost 50% (with 8KB blocks from ~3% to ~1.5% of size). I have no plans to merge that to FreeNAS 9.x so far, but at least 10.x will have it all, when released.

jgreco · Oct 23, 2015

Well I know the work has forked and Oracle isn't contributing, but the laundry list of improvements is lust-worthy. A patch to reduce ARC usage by L2ARC headers would be awesome too.

cyberjock · Oct 23, 2015

Last I heard the hotness with OpenZFS development was with L2ARC compression, then L2ARC persistence.

jgreco · Oct 23, 2015

L2ARC compression is less meaningful to me than reducing the L2ARC header space required, but that's mostly because my applications are typically block oriented and therefore gain less from compression and more from a larger L2ARC.

Ericloewe · Oct 24, 2015

cyberjock said:
Last I heard the hotness with OpenZFS development was with L2ARC compression, then L2ARC persistence.

It's actually generalized ARC compression.

Apparently, it is planned to only work on already-compressed blocks, so pools have to be compressed. This is done to minimize overhead.

jgreco said:
L2ARC compression is less meaningful to me than reducing the L2ARC header space required, but that's mostly because my applications are typically block oriented and therefore gain less from compression and more from a larger L2ARC.

Good news for you: since compressed ARC/L2ARC removes the need for an additional checksum, the ratio of metadata to data improves from the 1.56% quoted by mav@ to 1.07%

https://drive.google.com/file/d/0B5hUzsxe4cdmbEh2eEZDbjY3LXM/view

Important Announcement for the TrueNAS Community.

2x 256GB SSD L2ARC with 32GB/64GB RAM

Dabbler

Pony Wrangler

Guru

Inactive Account

Dabbler

Inactive Account

Resident Grinch

Dabbler

Resident Grinch

Guru

Inactive Account

Dabbler

iXsystems

Resident Grinch

Dabbler

iXsystems

Resident Grinch

Inactive Account

Resident Grinch

Server Wrangler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "2x 256GB SSD L2ARC with 32GB/64GB RAM"

Similar threads