Academic Curiosity - ZFS Dedup Read Performance

Status
Not open for further replies.

jason.rohm

Dabbler
Joined
May 7, 2013
Messages
25
I'm still new to the FreeNAS community, but I have some non-practical questions about ZFS dedup in a VMware environment.

Assuming an obscene amount of RAM, what would the READ performance impact be in a large dedup environment where a non-trivial amount of dedup'd data is regularly accessed (such as a virtual-desktop environment where dozens of copies of nearly identical virtual desktops are regularly cloned, used, and destroyed)?

Does FreeNAS/ZFS maintain a single read cache for all copies, or does the caching happen higher up in the stack (resulting in multiple copies)?
How would an SSD L2ARC affect this performance? Same question as above: would the L2ARC hold a single copy of the dedup'd data or multiple copies?

Thanks for your thoughts.

Jason Rohm

Current Environment:

FreeNAS 8.3.1
Dell 2950 III
16GB RAM
PERC5i 6x 2TB WD RED
iSCSI/CIFS
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Everything is just blocks to zfs. So if you have one block that's referenced 10 times by 10 different vm's, there should only be one copy of the block in ARC / l2arc. Assuming you have sufficient ram for dedupe, read performance should, if anything, increase with dedupe: once a block has made it into the arc / l2arc by a request from vm1, a request from vm2 for the same block should be served from the arc / l2arc. (zfs knows the two requests are for the same block by referencing the dedupe tables, or 'ddt's. Hence the need for the ddt's to fit in ram. If they don't, then not only will performance tank, but you will be unable to import the pool if it's ever marked dirty.)
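
To make the "one block, ten references" point concrete, here is a toy model in Python. It is only a sketch and not how the ARC is actually implemented: `ToyBlockCache`, the `disk` dict, the address `0x1000`, and the per-VM block maps are all made up for illustration. The one assumption it encodes is that the cache is keyed by the on-disk block rather than by the VM that asked for it.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Tiny LRU cache keyed by on-disk block address (a stand-in for ARC)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # address -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address, disk):
        if address in self.blocks:
            # Already cached: refresh its LRU position and count a hit.
            self.blocks.move_to_end(address)
            self.hits += 1
            return self.blocks[address]
        # Not cached: fetch from "disk", cache it, evict the oldest if full.
        self.misses += 1
        data = disk[address]
        self.blocks[address] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return data


# Hypothetical pool: one deduped on-disk block at a made-up address.
disk = {0x1000: b"shared OS block"}

# Ten VM images whose logical block 0 all point at that same on-disk block.
vm_block_maps = {f"vm{i}": {0: 0x1000} for i in range(1, 11)}

cache = ToyBlockCache(capacity=4)
for vm, block_map in vm_block_maps.items():
    cache.read(block_map[0], disk)

print(len(cache.blocks), cache.misses, cache.hits)  # prints: 1 1 9
```

In this model vm1 pays for the disk read and the other nine requests are cache hits on the same single cached copy, which is the behaviour described above.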

Keep in mind the general rule of thumb for memory requirements with dedupe: 5 GB of ram for every TB of deduped data, plus another 6 GB of ram for the system. Both dedupe and l2arc consume ARC to 'work', so you definitely want lots of ram.
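
As a quick worked example of that rule of thumb, here it is as arithmetic in Python. The function names and the sample sizes are just for illustration; the 5 GB per TB and 6 GB system figures are the rule-of-thumb numbers from this post, not measured values, and real DDT size depends on block size and how well the data dedupes.

```python
def estimate_dedup_ram_gb(deduped_tb, gb_per_tb=5, system_gb=6):
    """Rule-of-thumb RAM estimate: 5 GB per TB of deduped data + 6 GB base."""
    return deduped_tb * gb_per_tb + system_gb


def max_deduped_tb(ram_gb, gb_per_tb=5, system_gb=6):
    """Invert the rule: how many TB of deduped data a given RAM size covers."""
    return max(ram_gb - system_gb, 0) / gb_per_tb


# Sample sizes are arbitrary illustration values.
for tb in (1, 2, 4, 8):
    print(f"{tb} TB deduped -> ~{estimate_dedup_ram_gb(tb)} GB RAM")

# The 16 GB box listed in the original post:
print(f"16 GB RAM -> ~{max_deduped_tb(16)} TB deduped")
```

By this estimate, a 16 GB machine covers roughly 2 TB of deduped data, and 8 TB would call for something like 46 GB of ram.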
 

jason.rohm

Dabbler
Joined
May 7, 2013
Messages
25
Thanks. That was my assumption based on my understanding of the architecture, but it wouldn't be the first time my assumptions were wrong.

Obviously the real-world applications for this are limited, but in my lab environment it isn't unusual for me to have six copies of CallManager (a Linux virtual appliance) running and another two or three dozen stored inactive, all 90-95% the same, so I could see dedup making my clone process less time-consuming.
 