Compression or Deduplication for storing VMs

Which is better for storing VMs?

  • Deduplication
  • Both
Status
Not open for further replies.

SLam

Cadet
Joined
Jul 3, 2013
Messages
1
Hi,

I'm setting up an NFS share in FreeNAS strictly for storing VMs (VMDKs). Would it be better to use compression, deduplication, or both? I want to save space, but I don't want to sacrifice too much performance either.

My thinking is that deduplication would be useful because I'm storing VMDKs, which have a lot of shared bits among multiple VMs.

Any advice or suggestions?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You can run the test to see what dedup would get you. In general, dedup doesn't save as much as you'd expect and eats awesome quantities of resources.
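
If I'm reading this right, "the test" is ZFS's built-in dedup simulation, which estimates what dedup would save without actually turning it on. A sketch (`tank` is a placeholder pool name):

```shell
# Simulate deduplication on an existing pool without enabling it.
# This walks all data and builds an in-memory DDT, so it can take a
# long time and use a lot of RAM on a large pool.
zdb -S tank
# The summary line reports an estimated "dedup = N.NNx" ratio; if it
# comes back much below 2x, dedup is usually not worth the resources.
```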
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The only thing I'd have voted for is "neither", as neither one is a very "good" decision. There's a time and a place for both compression and dedup, but I wouldn't consider VMs to fall into either category.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, that's maybe excessively negative. A VM environment offers interesting challenges, some of which are well-suited to ZFS.

When installing something like Windows, if you look at what ends up installed, a lot of it is indeed data that could theoretically be dedup'd if there are multiple instances, and it may take a significant amount of space with crap that'll never be accessed again in the lifetime of the VM. For example, Windows Server 2012 has a requirement for 32GB of disk. Having a dozen such images suggests almost half a terabyte of disk. Now in theory, most of the bits that get installed are the same from system to system, but in practice it often doesn't work out quite that way. ZFS dedup works on ZFS blocks, which can be up to 128KB in size, so what you need is for data to be written to the exact same locations in each VMDK; otherwise the ordering of the disk blocks written by the VM will be "different" and the contents will not dedup well.
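
Since dedup matching happens at the ZFS block level, the dataset's recordsize is the knob that controls how big those blocks can get. A sketch, assuming a placeholder dataset name `tank/vms`:

```shell
# Check the maximum block size ZFS will use for this dataset; blocks
# only dedup when their contents match exactly, block for block.
zfs get recordsize tank/vms
# A smaller recordsize raises the odds that identical guest data lands
# in identical ZFS blocks, at the cost of a larger dedup table.
# Note: this only affects data written after the change.
zfs set recordsize=16K tank/vms
```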

The crap that will never be accessed is well-positioned to be compressed. The crap that will be frequently accessed, well, that's less desirable to compress, because you will end up with reduced performance every time you read that data. ZFS does allow choices such as the use of compression only of runs of zeroes, which may be a useful tradeoff in a vmdk scenario, but VMware thin provisioning already does something similar for you.
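
Enabling that zero-run-only compression is a one-line property change. A sketch, with `tank/vms` as a placeholder dataset:

```shell
# ZLE (zero-length encoding) only compresses runs of zeroes, so the
# CPU cost of reading back real data is negligible.
zfs set compression=zle tank/vms
# Verify the setting and watch the achieved savings over time.
zfs get compression,compressratio tank/vms
```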

Dedup eats lots of resources, and as the FreeNAS team has noted, may provide unexpected challenges at the worst times if your system is not sufficiently resourced. I would say that the majority of deployments will tend to be under-resourced.

But there's another way to look at this problem. As an administrator, you *know* that the image being installed could be virtually identical from VM to VM. You can get dedup-like results by using ZFS clones. Make your VM install image, snapshot, clone, deploy. Instantly you have a second VM that shares all its disk blocks with the first VM, but thanks to the ZFS COW design, any writes result in just that data being updated and diverging. Better, it does not require the immense resources of dedup and DDT tables to accomplish this, and it guarantees 100% block sharing, even between hundreds of VM's. The problem with this strategy is that it does not address what happens when clients start writing a substantial fraction of their disks, such as when OS updates are downloaded and applied. Suddenly space requirements get out of control.
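
The snapshot/clone deployment described above looks roughly like this (dataset and snapshot names are placeholders):

```shell
# 1. Build your golden VM image in its own dataset, then freeze it.
zfs snapshot tank/vms/golden@installed

# 2. Each clone starts out sharing 100% of its blocks with the
#    snapshot; thanks to COW, only blocks a VM rewrites consume
#    additional space.
zfs clone tank/vms/golden@installed tank/vms/vm01
zfs clone tank/vms/golden@installed tank/vms/vm02

# 3. See how little extra space each clone actually uses: USED is the
#    clone's unique data, REFER is what it shares with the snapshot.
zfs list -o name,used,refer tank/vms/vm01 tank/vms/vm02
```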

So. Compression, dedup, and cloning each have their pros and cons.

In a busy VM environment, the best answer is probably to use a lightweight compression like ZLE, and combine that with ZFS clone based VM installs, coupled with dedup to try to minimize growth in the future. If you can reasonably cram your VM environment into a handful of TBs, then a 32GB fileserver plus the combination of these features will likely make for good space recovery.
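
Putting that recipe together at dataset-creation time might look like the following. This is a sketch, not a blanket recommendation; the dataset name is a placeholder and the dedup setting carries all the resource caveats mentioned in this thread:

```shell
# Lightweight zero-run compression plus dedup on the VM dataset;
# the VMs themselves come from golden-image clones as shown earlier.
zfs create -o compression=zle -o dedup=on tank/vms
# Sizing caveat from this thread: plan on the order of 32GB of RAM
# for a handful of TBs of deduped VM data.
```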

On the flip side, my cynical nature says that the majority of people who ask this question are looking to do it all "on the cheap" and won't provide a suitably-resourced server. Most of the places where this sort of thing is deployed to advantage already know all this... so what I'd say is, don't toy with the dedup unless you're prepared to do it right. The compression and the clones, have at it but take the caveats to heart.
 