Trying to clarify best config for using ZFS box for VMs

Status
Not open for further replies.

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The functional solution is what I just said: large amounts of free space and lots of ARC.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
The functional solution is what I just said: large amounts of free space and lots of ARC.

I think the question was: what do you do if you have exceeded the suggested capacity limit and everything has ground to a crawl? Will increasing capacity and memory resolve the situation or do you have to wipe the pool and start again from scratch?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Increasing ARC will help immediately, and increasing free space will also help, though the stuff that is already fragmented will continue to read slowly.

The basic mechanism for addressing read performance issues in ZFS is to keep content in the ARC, so you want the VM working set to fit in ARC. Write performance is best when free space is plentiful. Note that when I say "ARC" that includes L2ARC.
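For example, one quick sanity check is whether the working set is actually being served out of ARC. A minimal sketch, assuming a FreeBSD-based FreeNAS box where the arcstats counters are exposed under these sysctl names:

Code:
# Current ARC size and cumulative hit/miss counters
sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.hits
sysctl kstat.zfs.misc.arcstats.misses

If misses stay high relative to hits while the VMs feel slow, the working set probably isn't fitting.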
 

Tancients

Dabbler
Joined
May 25, 2015
Messages
23
Does FreeNAS follow the Solaris default ARC config of using 7/8ths of the total memory of the system? I also thought L2ARC was essentially standby data that was on the drive, ready for instant reading, but that all writes would hit the drives and the L2ARC cache at the same time. That's how the system can lose the L2ARC drive and only impact performance, not data integrity.
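(For what it's worth, here's how I'd check what the box is actually capped at rather than guessing; these are the FreeBSD sysctl names as I understand them, so treat them as an assumption:)

Code:
# ARC ceiling vs. physical memory, in bytes
sysctl vfs.zfs.arc_max
sysctl hw.physmem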
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're a little confused: writes don't hit the L2ARC. The ARC (Adaptive Replacement Cache) is a read cache, and recently written blocks can remain in the ARC as long as they seem possibly valuable. When the ARC is under modest pressure, rather than merely evicting blocks, those can get pushed out to L2ARC instead. Since ZFS goes to some trouble to identify what to hold on to in ARC, the idea is that the things in L2ARC are things that ZFS kinda wanted to keep in ARC but didn't have room for.
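If you decide RAM alone isn't enough and want an L2ARC, it's just a cache vdev on the pool. A minimal sketch; "tank" and "da6" are placeholder pool/device names:

Code:
# Add an SSD as a cache (L2ARC) device to an existing pool
zpool add tank cache da6
# Watch whether it actually gets warmed up and used
zpool iostat -v tank 5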
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As for the 1Gbps, is that 200MB/s read/write for a single port or for both using MPIO? I was seeing numbers citing a limitation of 200MB/s using dual gigabit multipath, but haven't had the hardware set up for a nice test run yet.

Each 1Gbps link is theoretically capable of the ~100MB/s value in both directions, since "full duplex" means it can send and receive that amount simultaneously.
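A rough way to see that on your own links before building out the MPIO config is a bidirectional iperf run; IPs are placeholders and this assumes iperf is available on both ends:

Code:
# On the FreeNAS box:
iperf -s
# On the client, test both directions at once for 30 seconds:
iperf -c 192.168.1.10 -d -t 30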

I meant less of it being a bug and more whether there was a functional solution for it. I've got a few ZoL configs at work but they haven't been used for active storage, so I've never really been concerned about functional performance. Though if I'm going with RAID 10 (instead of three-way mirrors) I'll have enough space for a while, so I shouldn't have to worry about it, and it'd be easy to expand the zpool up to 24 drives. Out of curiosity, is there a recommended way of counteracting or recovering from it? Or is the only way to basically migrate the data off, rebuild the array, and then move it back?

There's no "zfs defrag", so you have to do as you said: migrate the data off and then move it back. Rebuilding the array isn't needed, since the old blocks will get freed up, but since you're totally emptying it out, you can destroy the zvol/pool/etc. FreeNAS 9.3 has support for SCSI UNMAP from ESXi as well, so if you use zvols I believe you can just issue the command from a VMware host CLI to zero the blocks and come back without having to reconfigure your extents/targets/etc. (I say "believe" because I technically haven't done that part myself yet.)
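For reference, the ESXi side of that is roughly the following on 5.5+; the datastore label is a placeholder and, as above, I haven't run this against FreeNAS myself:

Code:
# Ask ESXi to send SCSI UNMAP for dead space on a VMFS datastore
esxcli storage vmfs unmap -l MyDatastore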
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Increasing the free space helps both immediately and in the long run with the fragmentation performance issue. Immediately, the system stops having to work so durn hard to find a contiguous run of blocks for the data being written. In the longer term, you may start to reclaim locality benefits, especially on a highly fragmented pool.

When doing things like OS updates, where the VM is writing out a new file, if you're writing 1MB and ZFS allocates contiguous blocks for that, that's very good, even though those blocks are obviously not contiguous within the context of the blocks making up the virtual machine disk. The reality is that a VM is not terribly likely to read the blocks immediately preceding that file and then continue on to read those blocks (a la sequential traversal of all disk blocks); that would probably incur a seek, because the newly updated blocks are elsewhere, not contiguous. But this doesn't really matter, because the file itself is contiguous, so when someone runs that program or opens the file, minimal seek activity gets you that data quickly!
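You can watch both halves of this from the pool properties; "tank" is a placeholder, and the fragmentation number only appears on pools with the newer feature flags enabled:

Code:
# How full the pool is and how chopped up the free space is
zpool get capacity,fragmentation tank
zpool list tank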

The real problem with VM service is that you can get a lot of shred-dy behaviours. Think of inodes. If you leave on the option to update file atime, every time your VM reads a file, the inode for that file gets updated, and that means (at the ZFS level) a small block of data needs to be allocated and another small block is freed. Especially for things like source code trees, this means that a real mess is made on disk, especially as the files are read many times and updated and all that. Sooner or later you end up with lots of little blocks allocated all throughout the available free space on the pool, and no large chunks of space to allocate. So then when you need to write a large file, it involves more than one region of space. And when that file is ultimately freed, you're not left with a single large chunk of space, but still two smaller ones. Meanwhile the file that replaced the large file struggled to find space, and got broken up into three chunks, and in the meantime the two smaller chunks of space that were freed got allocated to some other smallish chunks of data. This merry go round doesn't stop until the system finds an unhappy balance of some sort.
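The knob being described there is the guest filesystem's atime updates. As a sketch, on a Linux guest you can remount with noatime, and the dataset backing the VM storage can have atime turned off as well (names are placeholders):

Code:
# Inside a Linux guest: stop atime-update writes on every read
mount -o remount,noatime /
# On the FreeNAS side, for the dataset holding the VM storage
zfs set atime=off tank/vmstore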

So to maintain write performance, we basically pull a nasty trick on it: we make sure it has PLENTY of space for contiguous writes. This doesn't actually eliminate fragmentation, but it does mean that you're less likely to go madly seeking all over to get a run of blocks that the VM is likely to request (such as a file).
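One way to keep yourself honest about that headroom is to fence part of the pool off with a reservation you never write into; this is just one approach, and the size/name are placeholders:

Code:
# An empty dataset whose reservation keeps ~2T of a 10T pool free
zfs create -o reservation=2T tank/headroom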
 

Tancients

Dabbler
Joined
May 25, 2015
Messages
23
Ahh, that makes a bit more sense then. Thank you a bunch for the detailed explanation!
 