BUILD L2ARC or Enough Memory


BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
Hello,
I have a question about L2ARC and whether I should use it with my build.

motherboard: Tyan S7012
CPU: 2 x E5620
Mem: 148GB
Controller: 2 x IBM M1015
Case: Norco RPC-2212
Drives: 12 x 7.2K SATA III

Would I really get any advantage dropping two disks for L2ARC (400GB SSD; 1GB of memory for every 10GB of L2ARC... if I remember correctly), considering the amount of memory on board?
Thanks
B
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes. No. Maybe.

Why would you ask a question like that without providing even the most basic information about what you're using the filer for, how busy it is, etc?
 

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
Yes. No. Maybe.

Why would you ask a question like that without providing even the most basic information about what you're using the filer for, how busy it is, etc?

It is going to be used in my lab for an ESXi environment running approximately 25-30 VMs: infrastructure, backup, media, non-transactional databases, etc., mainly Linux but some Windows.

Can you explain the situations where each of those answers would apply, considering how different use cases affect L2ARC's usefulness?
Thanks
B
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
L2ARC would be almost completely useless on a server soaking up backups (dedup exempted).

L2ARC would be mostly useless on a general purpose fileserver with light to moderate usage patterns.

L2ARC becomes fairly useful on a very busy departmental fileserver where the pool is being hit hard enough to be >80% busy for substantial periods.

ARC and L2ARC are usually very important on a highly fragmented VM datastore pool.

Absent any sort of context about what the purpose of the filer is, I can give you any answer on the spectrum from yes to no and be totally correct for some use model that is not yours. I don't care to waste my time explaining all the possible scenarios, because I could write a small book trying to cover them all.

By the way, how'd you end up with 148GB of RAM? 128 + 16 + 4 ...?

For an environment with 25-30 semi-active VM's, what you probably want to do is to figure out an estimate of how much data is actually being accessed on a regular basis. This is called the "working set." You can define "regular basis" as desired, that could mean "once an hour" or "once a day." And of course it is more than just a bit of a guess. You can take a little bit of guidance from the solid state hybrid disk guys for things like booting a Windows box, where they figured that 8-24GB of flash was sufficient to make their devices feel really fast. The difference with ZFS is that because it is a COW filesystem, what might seem to be linear read requests of the VM disk file are actually highly fragmented, and L2ARC is the mitigation strategy for that, so the SSHD numbers are probably a lowball estimate of what you want per VM.
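A rough way to gauge that working set on the box you already have is to watch the ARC hit ratio over a typical day; a minimal sketch, assuming the standard FreeBSD/FreeNAS arcstats sysctls are available:

# Run on the existing filer; these are cumulative counters since boot.
sysctl kstat.zfs.misc.arcstats.size      # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.hits      # reads served from ARC
sysctl kstat.zfs.misc.arcstats.misses    # reads that had to go to the pool
# hit ratio = hits / (hits + misses); if it stays low even with lots of RAM,
# the working set is bigger than ARC and L2ARC would likely see real use.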

So, what you want for your pool are mirror vdevs, and disks that are large enough to make sure your pool is only ~25-50% full when loaded up with the stuff you want to use it for. Keeping pool utilization low boosts write speeds by allowing contiguous space allocation. For example, if you want 8TB of datastore storage, expect to be buying 32-64TB of raw hard disk, depending on how zippy you want it to be.

If you had 8TB of datastore storage, you might figure that as much as 1/8th of that is your working set, so you definitely want to go with at least 500GB of L2ARC, but you may even have enough RAM to go out to 1TB of L2ARC, depending on how you configure your pool. It's very important to note that your options are contingent on your design choices. If you go putting 1TB of L2ARC on a 128GB RAM system with iSCSI and a 4K volblocksize, you will probably melt down, for example.
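The "melt down" comes from L2ARC header overhead: every record cached in L2ARC needs a header kept in ARC, so the RAM cost scales with record count, not capacity. Back-of-envelope numbers, assuming the ~180 bytes per record commonly quoted for this era of ZFS (the exact figure varies by version):

# 1TB L2ARC at 4K volblocksize  : ~268 million records x ~180B = ~48GB of RAM in headers
# 1TB L2ARC at 64K volblocksize : ~17 million records  x ~180B = ~3GB of RAM in headers
# Small volblocksize plus huge L2ARC is exactly how a 128GB box gets into trouble.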

But, yes, unless you really want to be filling most reads from your pool storage, you do want L2ARC for VM storage. You want as much as you can safely support and reasonably afford, in most cases.
 

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
To answer the easy question it is actually 144GB (typo).

Thanks for the clarifying examples; that helps out. I think the major part of my design will be limited by budget. I am trying to repurpose my existing memory and spinning disks as much as possible.

For the working set, that is more difficult. My current VM environment's storage is pushing approximately 12.6MB/s (according to FreeNAS reports). That said, this is only half the size I intend to grow to once I get more space/performance. So, based on what the environment is currently doing and the future growth, I would estimate a 100-125% increase over these numbers (there will be some lightly taxed VDI and monitoring systems to add to the environment).

So I did read this article (https://clinta.github.io/FreeNAS-Multipurpose-SSD/), which seems like a good source on ZIL and L2ARC.

So considering what has been discussed so far, and that I am looking for a combination of IOPS and space (I know this is a bit of a loaded question, but...):
(I read this to try to get a better understanding https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/)
  1. Use 2 bays for 2 x 250-500GB SSDs (L2ARC) & 10 bays for 10 x 3TB SATA (2 x 5-disk vdevs, 2 zpools)
  2. Use 2 bays for 2 x 250-500GB SSDs (L2ARC) & 10 bays for 10 x 3TB SATA (5 x 2-disk vdevs, 5 zpools)
  3. No L2ARC & use all 12 bays for 3TB SATA (even though you mentioned L2ARC is very helpful for VM stores)
Basically, if this is what you had to work with, how would you set it up?

Any advice is appreciated.
Thanks
B
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So I did read this article (https://clinta.github.io/FreeNAS-Multipurpose-SSD/), which seems like a good source on ZIL and L2ARC.

Oh dear lord, no, all the bloggers who suggest this suffer from rectal-cranial inversion. Your SLOG device needs to be low latency, high performance SSD - or just don't bother with sync writes at all, if it is a lab box. Your L2ARC needs to be cheap high capacity SSD, which is constantly seeing a low volume of write traffic. These two things are opposites, and interfere with each other.
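In other words, if you run both, put them on two separate devices matched to their jobs; a minimal sketch with hypothetical pool and device names (the FreeNAS GUI does the equivalent when you add log and cache devices):

zpool add tank log nvd0     # small, fast, power-loss-protected SSD/NVMe for SLOG
zpool add tank cache ada6   # big, cheap SSD for L2ARC; losing it never loses data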


So considering what has been discussed so far, and that I am looking for a combination of IOPS and space (I know this is a bit of a loaded question, but...):
(I read this to try to get a better understanding https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/)

8 bays of 4TB HDD as mirror vdevs for your iSCSI/NFS VM pool, a 16TB pool of which you should only use 4...8TB of. 4 bays of 6TB HDD as RAIDZ2 for a 12TB datastore to hold backups. Take a 500GB SSD and use Velcro tape to stick it somewhere inside your chassis for L2ARC. Skip the SLOG, and remember that it's a lab grade setup. Be sure to shut down your VM's before rebooting the filer.
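For reference, that layout's rough CLI shape, with hypothetical pool and device names (on FreeNAS you would build this through the volume manager rather than by hand):

# VM pool: four mirrored pairs of 4TB drives, plus the velcro-mounted 500GB SSD as L2ARC
zpool create vmpool mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7
zpool add vmpool cache da12
# Backup pool: four 6TB drives in RAIDZ2
zpool create backup raidz2 da8 da9 da10 da11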
 

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
OK, it has been a few weeks; I had to get a few projects off my plate before circling back to this.
So I have changed some of the hardware to take into account some of your suggestions (wifey not happy that the budget went out the window).

Motherboard: Tyan S7012
CPU: 2 x E5620
Mem: 144GB
Controller: 3 x IBM M1015 (16 x SATA III & 1 SSD)
Case: Norco RPC-2212
Drives: 16 x 7.2K SATA III (3TB)
1 x SSD (240GB L2ARC)
1 x Intel SSD 750 (400GB SLOG)

Before anything is said: I have 16 x 3TB SATA III drives because that is what I had access to without spending another $2K. I know you recommended some different drives, but this is what I have to work with.

  1. With this setup would you still recommend mirroring my vdevs?
    1. If yes:
      1. Would I create the volume via the "manual volume manager" (select the 10 drives and mirror, 1 drive as spare, SSD for cache, Intel for log)?
      2. Or would I create the volume via the "manual volume manager" (select 2 drives at a time and create 5 volumes)?
        1. If option 2 is selected, how do I present the log & cache to multiple volumes?
  2. Would I set up 1 volume for the ESXi pool (10 disks ZFS with 1 spare, cache, & log) & 1 volume for the backup pool (5 disks ZFS)?
Thanks
B
 
nightshade00013

Joined
Apr 9, 2015
Messages
1,258
wifey not happy that the budget went out the window


ROFLMAO, I had wife aggro as well with my build. It was a constant "How much more money is this going to cost?"

1. For VM storage you will still want a pool with multiple vdevs that are mirrors. Start with a single mirrored pair when you begin building the pool, then keep extending it with more mirrors; you just select the drop-down box at the top with the original volume you want to extend in the volume manager. You should end up with five vdevs in a single pool (the CLI equivalent is sketched below).

2. For the backup pool: the consensus is that a mirrored-vdev pool should never be over 50% used, so 5 mirrored vdevs (10 drives) @ 3TB per mirror = ~15TB, with 50% of that being 7.5TB, not including ZFS overhead. The backup pool of five 3TB drives in a RAIDZ2 would give approx 9TB of space, not including ZFS overhead, so your plan seems like it would work out fairly well.
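The CLI equivalent of that GUI extend step, sketched with hypothetical pool and device names:

zpool create vmpool mirror da0 da1    # start with a single mirrored pair
zpool add vmpool mirror da2 da3       # each GUI "extend" adds another mirror vdev
zpool add vmpool mirror da4 da5       # ...repeat until there are five mirrors in the pool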
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
For VM storage, mirrors are really the only sensible way to go for most uses. There are some exceptions where low performance is acceptable and high quantities of space are required.

I create my pools with the manual volume manager, using a procedure similar to what @nightshade00013 outlines. It's easier to force things like three way mirrors that way.
 

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
So I want to clarify this setup before I move on (since I have not worked with FreeNAS SLOG/cache before).

Does this look correct now? (attachment)

I also remember reading about configuring ESXi to perform sync writes so it can take advantage of the SLOG. Can you point me back to that article? I cannot find it and I lost my URL list.

Thanks
B
 

Attachments

  • ESXiPool_Setup.png

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
I usually find what I am looking for right after I post (like finding the mate to the single sock you had in your drawer for 3 months, after you throw it out).

I have only this test volume on my new NAS for now.

[root@cdanas002] ~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
CDANAS002_Backup01 2.38M 10.5T 153K /mnt/CDANAS002_Backup01
CDANAS002_Backup01/.system 1.32M 10.5T 166K legacy
CDANAS002_Backup01/.system/configs-2c252f0045f7400bba077e5a4017475a 153K 10.5T 153K legacy
CDANAS002_Backup01/.system/cores 153K 10.5T 153K legacy
CDANAS002_Backup01/.system/rrd-2c252f0045f7400bba077e5a4017475a 153K 10.5T 153K legacy
CDANAS002_Backup01/.system/samba4 243K 10.5T 243K legacy
CDANAS002_Backup01/.system/syslog-2c252f0045f7400bba077e5a4017475a 479K 10.5T 479K legacy
CDANAS002_VOL01 2.03T 11.1T 96K /mnt/CDANAS002_VOL01
CDANAS002_VOL01/PRODT2-CDANAS002-LUN-204 2.03T 13.2T 12.8G -
freenas-boot 870M 13.1G 31K none
freenas-boot/ROOT 857M 13.1G 25K none
freenas-boot/ROOT/9.10-STABLE-201606072003 857M 13.1G 597M /
freenas-boot/ROOT/Initial-Install 1K 13.1G 482M legacy
freenas-boot/ROOT/default 43K 13.1G 483M legacy
freenas-boot/grub 12.7M 13.1G 6.33M legacy


[root@cdanas002] ~# zfs set sync=always CDANAS002_VOL01
[root@cdanas002] ~# zfs list -o sync CDANAS002_VOL01
SYNC
always
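Once VMs are writing to that LUN, a quick way to confirm the SLOG is actually absorbing the sync writes is to watch per-vdev I/O:

zpool iostat -v CDANAS002_VOL01 1   # the log device should show steady write activity under VM load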


 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think I found the article about ESXi and sync writes.
https://forums.freenas.org/index.ph...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/

  1. Is this info still valid?
  2. Any updated articles to this?
    1. If this is still valid info

Yes, generally speaking we don't keep things as stickies once they lose relevance. The sync write problem isn't likely to change anytime soon. It's kind of inherent in the whole virtualization thing.... a hypervisor doesn't have a reliable method to understand the importance of any given write made by a VM. So if they're important VM's doing important things, you are very likely to want sync writes, because it'd suck if you were a bank and your filer crashed half a second after someone did a wire transfer out of their account, and it never got recorded back to the balance in their account properly. And if they're unimportant VM's like home lab VM's doing useless learny things, then you really probably don't need sync writes. Most of the world falls somewhere in between.
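The knob itself is per dataset or zvol, so you can make that call per workload; for example, on the zvol from the earlier listing (these are the three standard values of the ZFS sync property):

zfs set sync=always CDANAS002_VOL01/PRODT2-CDANAS002-LUN-204     # every write goes through the SLOG
zfs set sync=standard CDANAS002_VOL01/PRODT2-CDANAS002-LUN-204   # honor only what the initiator requests
zfs set sync=disabled CDANAS002_VOL01/PRODT2-CDANAS002-LUN-204   # lab-grade: fastest, recent writes at risk on a crash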
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
Controller: 3 x IBM M1015 (16 x SATA III & 1 SSD)

IBM 1015 docs:
"The adapter has two internal mini-SAS connectors to drive up to 16 devices..."

Why do you need (3) 1015s?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
"The adapter has two internal mini-SAS connectors to drive up to 16 devices..."
The only way an M1015 is going to drive more than 8 devices is using a SAS expander, which can be expensive. It may be more economical to buy 3 M1015s than one plus an expander.
 

BTW

Dabbler
Joined
Feb 1, 2014
Messages
33
So I had 3 factors behind why I have 3 controller cards. This is just my case and not necessarily a recommendation; each scenario is a little different.

  1. Connectivity (danb35 hit the point)
    1. My case has 16 drives laid out in 4 x 4 hot-swap shelves. Each shelf has one mini-SAS port shared between its drives, so the first 2 cards (4 ports) handle the main storage drives.
      1. The 3rd card only has the L2ARC drive connected to it via a breakout cable, mainly because my MB only has SATA II connectors. It was simpler (from a config and troubleshooting perspective) to keep all drives connected through the same type of controller.
  2. PCIe slots (rough math sketched after this list)
    1. My MB has PCIe 2.0 x8 slots, which means theoretically they could push ~4GB/s each. (https://en.wikipedia.org/wiki/PCI_Express)
      1. If I connect 8 SATA III drives to one card, in theory they could demand approximately 4.8GB/s, and the PCIe slot becomes a bottleneck. This is more of a concern if you are running SSDs in these slots.
    2. Intel 750 SLOG
      1. Someone might say "but your Intel 750 wants a PCIe 3.0 x4 slot." While that is the recommended slot for the card, it will still work in a slower slot; you can potentially create a bottleneck between the card and the MB on heavily used systems (I will not be able to max out the SLOG's performance with this setup, but it will still be very good). I have taken that into account for my situation, and if I upgrade my MB in the future, the Intel card will be good to go.
  3. Upgrades
    1. I see the M1015 as a card that will outperform my needs for some time. If I upgrade to a different MB or swap some of the spinning disks for SSDs, I am already good for throughput.
    2. FreeBSD driver support for the M1015 will more than likely outlast the support for my MB. Basically, I wanted the MB to be, as much as possible, a FRU.
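Rough math behind the PCIe point (theoretical per-direction numbers; real-world throughput is lower):

# PCIe 2.0 x8 slot : 8 lanes x ~500MB/s = ~4GB/s
# 8 x SATA III     : 8 x ~600MB/s       = ~4.8GB/s if every drive streamed at line rate
# 7.2K spinning disks top out around 150-200MB/s each, so 8 of them (~1.2-1.6GB/s)
# never come close to saturating an x8 slot; it only matters once SSDs go in.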
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I like anyone who can provide a well-reasoned summary like this. I'll point out, however, that some of this falls into irrelevancy because we're talking about a NAS platform, which has inherently limited I/O due to the network.
 