Academic Curiosity - ZFS Dedup Read Performance

jason.rohm · May 28, 2013

I'm still new to the FreeNAS community, but I had some non-practical questions about ZFS dedup in a VMware environment.

Assuming an obscene amount of RAM, what would the READ performance impact be for a large dedup environment where a non-trivial amount of dedup'd data is regularly accessed (such as a virtual-desktop environment where dozens of copies of nearly identical virtual desktops are regularly cloned, used, and destroyed)?

Does FreeNAS/ZFS maintain a single read cache for all copies or does the cache happen higher up in the stack (resulting in multiple copies)?
How would an SSD L2ARC impact this performance? Same as above, would the L2ARC hold a single copy of the dedup'd data or multiple copies?

Thanks for your thoughts.

Jason Rohm

Current Environment:

FreeNAS 8.3.1
Dell 2950 III
16GB RAM
PERC5i 6x 2TB WD RED
iSCSI/CIFS

titan_rw · May 28, 2013

Everything is just blocks to ZFS. So if you have one block that's referenced 10 times by 10 different VMs, there should only be one copy of the block in ARC / L2ARC. Assuming you have sufficient RAM for dedupe, read performance should, if anything, increase with dedupe: once a block has made it into the ARC / L2ARC by a request from vm1, a request from vm2 for the same block should be served from the ARC / L2ARC. (ZFS knows the two requests are for the same block by referencing the dedupe tables, or DDTs. Hence the need for the DDTs to fit in RAM. If they don't, then not only will performance tank, but you may be unable to import the pool if it's ever marked dirty.)
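
To illustrate the "one cached copy, many references" idea, here is a toy Python sketch. It is not ZFS code and makes no claim about how the ARC is implemented internally; the class `ToyBlockCache`, the address `0x1000`, and the VM names are all made up for illustration. It only assumes that the cache is keyed by the on-disk block rather than by the VM that asked for it.

```python
class ToyBlockCache:
    """Toy read cache keyed by physical block address (illustration only, not ZFS)."""

    def __init__(self):
        self.blocks = {}   # physical address -> cached data
        self.hits = 0
        self.misses = 0

    def read(self, physical_addr, disk):
        # A miss fetches the block from "disk" once; later reads reuse the cached copy.
        if physical_addr in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[physical_addr] = disk[physical_addr]
        return self.blocks[physical_addr]


# One deduped on-disk block, referenced by 10 hypothetical VMs.
disk = {0x1000: b"shared OS block"}
vm_block_pointers = {f"vm{i}": 0x1000 for i in range(1, 11)}

cache = ToyBlockCache()
for vm, addr in vm_block_pointers.items():
    cache.read(addr, disk)

print(len(cache.blocks), cache.misses, cache.hits)  # prints: 1 1 9
```

In this model the first VM pays for the disk read and the other nine are served from the single cached copy.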

Keep in mind the general rule of thumb for memory requirements with dedupe: 5 GB of RAM for every TB of deduped data, plus 6 GB of RAM for the system. Both dedupe and L2ARC consume ARC to 'work', so you definitely want lots of RAM.
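
As a rough worked example of that rule of thumb, a minimal Python sketch follows. The function name `dedup_ram_estimate_gb` is hypothetical, and the 5 GB per TB and 6 GB figures are simply the rule-of-thumb values quoted above, not measured numbers.

```python
def dedup_ram_estimate_gb(deduped_tb, per_tb_gb=5.0, system_gb=6.0):
    """Estimate RAM in GB: per_tb_gb for each TB of deduped data, plus system_gb."""
    if deduped_tb < 0:
        raise ValueError("deduped_tb must be non-negative")
    return deduped_tb * per_tb_gb + system_gb


if __name__ == "__main__":
    # Example sizes only; substitute the amount of deduped data in your own pool.
    for tb in (1, 4, 8):
        print(f"{tb} TB deduped -> about {dedup_ram_estimate_gb(tb):.0f} GB RAM")
```

By this estimate, 4 TB of deduped data would call for roughly 26 GB of RAM, which is already more than the 16 GB in the system described above.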

jason.rohm · May 28, 2013

Thanks. That was my assumption based on my understanding of the architecture, but it wouldn't be the first time my assumptions were wrong.

Obviously the real-world applications for this are limited, but in my lab environment it isn't unusual for me to have six copies of CallManager (a Linux virtual appliance) running and another two or three dozen stored inactive, all 90-95% the same, so I could see dedup making my clone process less time-consuming.
