Deduplication and I/O


Jason Keller

Explorer
Joined
Apr 2, 2015
Messages
61
Just throwing this out there since I noticed it (and it largely caught me off guard). I enabled dedup on a dataset (yes, I have enough memory for this test, and the DDT fits comfortably in the 25% metadata space in the ARC)...and then I kicked off some clone operations in VMware on the ZVOL...

Code:
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
dpool                                   36.2G  1.59T      0   181K      0   724M
  mirror                                6.02G   272G      0  28.8K      0   115M
    gptid/8c26088f-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    919      0   115M
    gptid/8cd38744-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    957      0   120M
  mirror                                6.03G   272G      0  32.0K      0   128M
    gptid/8d642fac-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0   1023      0   128M
    gptid/8dce09c1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0  1.25K      0   159M
  mirror                                6.03G   272G      0  30.1K      0   121M
    gptid/8e39b615-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0  1.18K      0   151M
    gptid/8eaba4d9-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    964      0   121M
  mirror                                6.04G   272G      0  27.6K      0   111M
    gptid/8f3efbad-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0  1.12K      0   144M
    gptid/8ff4afa1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    884      0   111M
  mirror                                6.02G   272G      0  31.4K      0   126M
    gptid/907bf327-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0  1.21K      0   155M
    gptid/c02ca290-d8b8-11e4-96f0-5cf3fc4c16c0      -      -      0   1004      0   126M
  mirror                                6.02G   272G      0  31.0K      0   124M
    gptid/c47779b9-dc60-11e4-908f-5cf3fc4c16c0      -      -      0    992      0   124M
    gptid/b6e90ead-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0  1.18K      0   151M
cache                                       -      -      -      -      -      -
  gptid/351cd1d2-dcb0-11e4-908f-5cf3fc4c16c0  32.5G  27.1G      0      0      0      0
--------------------------------------  -----  -----  -----  -----  -----  -----


                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
dpool                                   38.5G  1.59T      0    766      0  5.82M
  mirror                                6.41G   272G      0    130      0  1018K
    gptid/8c26088f-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    112      0  1018K
    gptid/8cd38744-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    105      0  1018K
  mirror                                6.42G   272G      0    137      0  1.00M
    gptid/8d642fac-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    137      0  1.00M
    gptid/8dce09c1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    137      0  1.00M
  mirror                                6.42G   272G      0    179      0  1.35M
    gptid/8e39b615-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    179      0  1.35M
    gptid/8eaba4d9-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    177      0  1.35M
  mirror                                6.44G   272G      0    108      0   866K
    gptid/8f3efbad-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    108      0   866K
    gptid/8ff4afa1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    108      0   866K
  mirror                                6.42G   272G      0     73      0   587K
    gptid/907bf327-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0     73      0   587K
    gptid/c02ca290-d8b8-11e4-96f0-5cf3fc4c16c0      -      -      0     71      0   571K
  mirror                                6.42G   272G      0    135      0  1.05M
    gptid/c47779b9-dc60-11e4-908f-5cf3fc4c16c0      -      -      0    135      0  1.05M
    gptid/b6e90ead-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0    134      0  1.04M
cache                                       -      -      -      -      -      -
  gptid/351cd1d2-dcb0-11e4-908f-5cf3fc4c16c0  32.5G  27.1G      0      0      0      0
--------------------------------------  -----  -----  -----  -----  -----  -----



                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
dpool                                   35.1G  1.59T      0      0      0      0
  mirror                                5.84G   272G      0      0      0      0
    gptid/8c26088f-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8cd38744-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.84G   272G      0      0      0      0
    gptid/8d642fac-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8dce09c1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.84G   272G      0      0      0      0
    gptid/8e39b615-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8eaba4d9-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.85G   272G      0      0      0      0
    gptid/8f3efbad-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8ff4afa1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.85G   272G      0      0      0      0
    gptid/907bf327-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/c02ca290-d8b8-11e4-96f0-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.85G   272G      0      0      0      0
    gptid/c47779b9-dc60-11e4-908f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/b6e90ead-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
cache                                       -      -      -      -      -      -
  gptid/351cd1d2-dcb0-11e4-908f-5cf3fc4c16c0  32.5G  27.1G      0      0      0      0
--------------------------------------  -----  -----  -----  -----  -----  -----


                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
dpool                                   32.6G  1.60T      0      0      0      0
  mirror                                5.44G   273G      0      0      0      0
    gptid/8c26088f-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8cd38744-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.43G   273G      0      0      0      0
    gptid/8d642fac-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8dce09c1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.42G   273G      0      0      0      0
    gptid/8e39b615-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8eaba4d9-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.45G   273G      0      0      0      0
    gptid/8f3efbad-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/8ff4afa1-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.44G   273G      0      0      0      0
    gptid/907bf327-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/c02ca290-d8b8-11e4-96f0-5cf3fc4c16c0      -      -      0      0      0      0
  mirror                                5.46G   273G      0      0      0      0
    gptid/c47779b9-dc60-11e4-908f-5cf3fc4c16c0      -      -      0      0      0      0
    gptid/b6e90ead-d8a9-11e4-8f6f-5cf3fc4c16c0      -      -      0      0      0      0
cache                                       -      -      -      -      -      -
  gptid/351cd1d2-dcb0-11e4-908f-5cf3fc4c16c0  32.5G  27.1G      0      0      0      0
--------------------------------------  -----  -----  -----  -----  -----  -----


Currently using SHA256 without verify. My intention was to see whether dedup would lessen the writes to the pool by not committing duplicate data, both to increase performance and to save write cycles (very useful for an all-SSD pool where write endurance is at a premium). As this will mainly be a swift recovery/test environment using template-deployed clones in VMware, I am seeing very high (3.25:1+) dedup ratios, with combined dedup+compression ratios exceeding 4.88:1. So dedup is potentially useful for me on both the space-saving and write-reduction fronts.

I am a bit puzzled as to where these pool writes are coming from, since this is entirely duplicate data. I'm not sure if this is by design or a bug. My understanding was that ZFS doesn't commit writes to the pool for blocks that match existing entries in the DDT.
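My mental model of the dedup write path is roughly this (a toy sketch of the idea, not ZFS source, assuming SHA256 checksums and one refcount per unique block):

```python
import hashlib

# Toy model of a dedup write path: hash the block, consult the DDT,
# and only allocate new space on a miss. Real ZFS is far more involved.
ddt = {}  # checksum -> reference count

def dedup_write(block: bytes) -> bool:
    """Return True if the block had to be physically written."""
    key = hashlib.sha256(block).digest()
    if key in ddt:
        ddt[key] += 1   # duplicate: bump the refcount, no new allocation
        return False
    ddt[key] = 1        # unique: allocate and write the block
    return True

first = dedup_write(b"x" * 8192)   # True  - first copy gets written
second = dedup_write(b"x" * 8192)  # False - duplicate only bumps refcount
print(first, second)
```

Under that model, a clone of already-deduped data should generate essentially no data writes, which is why the iostat numbers above surprised me.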
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Three things:

1. Cloned environments give "okay" dedup (see #2 for why this will also bone you later). I don't consider 3:1 and 4:1 ratios to be particularly good for dedup, since dedup quickly turns into a CPU bottleneck once you have lots of blocks.
2. Dedup works at the block level. The blocks for iSCSI are 512 bytes, while the zvol blocks are something else (8KB or 16KB, I believe). Do NOT try to set them both to the same value; you'll be sorry. Anyway, unless the cloned environment happens to align, the zvol blocks won't be dedupable, and you'll end up with a massive DDT that holds no duplicate entries.
3. The more you use the environments, the more they'll deviate from the cloned environment. Simple disk accesses get logged to the VM's file system, and that not only causes a deviation in the "new" data being written, but since the old data is now missing a small piece, that block becomes a deviation too. So you'll find that after you've used those VMs for a little while, the dedup ratios suck.
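To illustrate #2, here's a toy sketch (assumed 8 KiB blocks and SHA-256 checksums; not ZFS internals) of why identical data at different offsets shares no block checksums:

```python
import hashlib
import os

BLOCK = 8192
payload = os.urandom(4 * BLOCK)  # the "identical" guest file data

def block_hashes(stream: bytes) -> set:
    """Checksums of each fixed-size block, as block-level dedup sees them."""
    return {hashlib.sha256(stream[i:i + BLOCK]).digest()
            for i in range(0, len(stream), BLOCK)}

aligned = block_hashes(payload)
# The same data landing 512 bytes later inside the zvol: every block
# now straddles a different boundary, so every checksum changes.
shifted = block_hashes(b"\x00" * 512 + payload)

print(len(aligned & shifted))  # 0 shared blocks despite identical payload
```

Same bytes, zero dedupable blocks: that's the alignment problem in a nutshell.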

Your writes are because of what I described in #3. Say a file is accessed, so the file system inside the VM is updated to record the new date/time stamp. That block was probably an 8KB block, and even if only 200 bytes changed, you now have to write either:

1. A whole new 8KB block that will certainly not match your other VMs (the date/time stamp won't match).
2. Multiple smaller blocks (which will almost certainly not match your other VMs, again because of the date/time stamps).

And we didn't actually write anything to the file itself; this was just filesystem metadata. Then add in the fact that writes to files don't necessarily align to the zvol block size, and you've got a 1-in-8+ chance it won't align, even if the file is the same on all of the VMs you have!
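The metadata point is easy to demonstrate: change a couple hundred bytes inside an 8 KiB block (a hypothetical timestamp field; the offset and size here are made up for illustration) and the whole block's checksum changes, so the whole block must be rewritten and will never dedup against the untouched copies:

```python
import hashlib

BLOCK = 8192
original = bytearray(b"A" * BLOCK)

# The guest FS touches ~200 bytes of metadata inside this one block
# (hypothetical timestamp field at an arbitrary offset).
modified = bytearray(original)
modified[64:264] = b"\x01" * 200

h1 = hashlib.sha256(bytes(original)).hexdigest()
h2 = hashlib.sha256(bytes(modified)).hexdigest()
print(h1 == h2)  # False: 200 changed bytes cost a full new 8 KiB block
```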

See how fast dedup becomes a lost cause? There's a reason I tell people not to use it. It doesn't work out well beyond the very short term. Usually it works just well enough that people wrongly conclude that their short-term test environment proves it works great, but run the VMs in production for two weeks and you'll find your dedup ratio stinks pretty badly.

Add to this the fact that the more blocks you have, the larger the DDT and the more CPU resources are needed to traverse it (never mind the RAM required), and you have a recipe that kills VM performance later. I've seen one or two people whose performance got so bad they couldn't even migrate VMs off the zpool, because the zpool was so slow that ESXi couldn't move the VMs off to let them get rid of dedup (remember, you can't just "turn off" dedup for data that's already deduped if you later realize it was a terrible mistake).
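The RAM cost is easy to ballpark. Assuming roughly 320 bytes of memory per unique block (a commonly quoted rule-of-thumb figure, not an exact ZFS number), the worst case for a pool like the one above looks like this:

```python
# Back-of-the-envelope DDT sizing, assuming ~320 bytes of RAM per
# unique block (a rough rule of thumb, not an exact ZFS figure).
DDT_ENTRY_BYTES = 320

def ddt_ram_gib(pool_bytes: float, block_bytes: int) -> float:
    """RAM to keep the whole DDT resident, worst case (all blocks unique)."""
    return pool_bytes / block_bytes * DDT_ENTRY_BYTES / 2**30

# 1.6 TiB of unique 8 KiB zvol blocks:
print(round(ddt_ram_gib(1.6 * 2**40, 8192), 1))  # 64.0 GiB
```

Small zvol blocks make this brutal: halving the block size doubles the entry count, and once the DDT spills out of ARC every write turns into random reads against the pool.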

So thanks for proving that we still know what we're talking about. Yes, dedup sounds promising, but the actual payoff still isn't there, and probably never will be, especially since the cost of disk space is falling much faster than the cost of RAM.
 

Jason Keller

Explorer
Joined
Apr 2, 2015
Messages
61
While you make some valid points about block modification (and the huge additional CPU usage from SHA256), this doesn't answer my question of why the writes hit the zpool before the data is deduped (and it does get deduped). The writes you see above are clone operations, so the data being written is all exact duplicates (and it shows, because the consumed space falls right back down once all the TXGs clear). Or is zpool iostat showing writes to the pool that never actually get committed to disk?

Also, I would never use deduplication in production (because, as noted in your #2, it will indeed bone you later). This environment is purely a "scratchpad," if you will. The only reason I was investigating dedup is that since I'll be deploying many clones many times, I thought it might lessen the writes to the SSDs I'm eventually looking to use as pool drives. From the behavior I've seen so far, though, that won't pan out, as the writes seem to hit the disks before everything is deduped. So it looks like it saves me exactly zero write cycles to the drives, making it pointless for me. Compression, however, I always leave on (LZ4), and I'm still getting 1.7:1 on average at 500MB/s (it could probably go faster with better disks).
 