AgentZero
Dabbler
Joined: Jan 7, 2013
Messages: 24
Before saying that dedupe is a resource hog and should never be used in ZFS, let's assume that the volume in question is no larger than 1TB and that the server has ample RAM - 64GB. Let's also assume that it is storing virtual machines that are for the most part very similar, and that the expected dedupe ratio based on testing is at least 1.9.
Further, let's assume the volume is composed of mirrored pairs of 15K SAS drives with SSDs for L2ARC and ZIL, and is being presented to VMware via NFS.
Now the hypothetical question: deduplication works by storing pointers when duplicate blocks are detected. If two blocks are identical in raw form, they would also be identical once compressed. My question is really about which operation occurs first in the I/O stream: does the dedupe engine checksum the block first, then compress it and write it to disk if it is found to be unique? Or is the block compressed first and then checksummed against the DDT?
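To make the ordering question concrete, here is a minimal Python sketch of the dedupe idea as I understand it - not ZFS code, just an illustration; the table, function name, and SHA-256/zlib choices are placeholders. The flag shows the two orderings I'm asking about: checksumming the raw block versus compressing first and checksumming the compressed block.

```python
import hashlib
import zlib

# Hypothetical, simplified dedup table: checksum -> block pointer.
ddt = {}
next_block_ptr = 0

def write_block(raw_block: bytes, compress_first: bool = True) -> int:
    """Store a block, deduping on its checksum.

    compress_first=True  -> checksum is taken over the compressed block
    compress_first=False -> checksum is taken over the raw block
    """
    global next_block_ptr
    data = zlib.compress(raw_block) if compress_first else raw_block
    key = hashlib.sha256(data).hexdigest()
    if key in ddt:
        # Duplicate detected: just reference the existing block.
        return ddt[key]
    # Unique block: allocate it and record it in the table.
    ptr = next_block_ptr
    next_block_ptr += 1
    ddt[key] = ptr
    return ptr

# Two identical raw blocks dedupe to the same pointer either way,
# since identical raw data also compresses to identical output.
a = write_block(b"x" * 4096)
b = write_block(b"x" * 4096)
assert a == b
```

Either ordering catches the duplicates in this toy example; what I'm after is which ordering ZFS actually uses in its write pipeline.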