From what dscape mentions in Post #4, it seems like I'm doing something similar.
I have an ESXi box with local storage. I run a backup script locally on that box that mounts an NFS share (FreeNAS), snapshots each VM in turn, and copies the (now static) .vmdk / .vmx files to the NFS share. It then deletes the VM snapshot, returning things to normal.
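In case it helps, here's roughly what that looks like on the ESXi side. This is just a simplified sketch, not my actual script: the NFS hostname, share path, datastore name, and copy steps are placeholders, and the real thing has more error handling.
Code:
#!/bin/sh
# Sketch of a nightly VM backup run on an ESXi host (all names/paths are placeholders).
NFS_HOST="freenas.local"               # FreeNAS box exporting the backup share
NFS_SHARE="/mnt/nas2pool/vmbackup"     # hypothetical dataset path on the NAS
DEST_DS="backupnfs"                    # name the share gets as an ESXi datastore

# Mount the NFS share as a datastore (skip if it's already mounted)
esxcli storage nfs add --host "$NFS_HOST" --share "$NFS_SHARE" --volume-name "$DEST_DS"

for VMID in $(vim-cmd vmsvc/getallvms | awk 'NR > 1 {print $1}'); do
    # Snapshot the VM so the base .vmdk stops changing
    vim-cmd vmsvc/snapshot.create "$VMID" backup "nightly backup" 0 0

    # Copy the (now static) .vmx and clone the .vmdk to the NFS datastore.
    # Real paths depend on the VM; shown here only as an outline:
    #   cp /vmfs/volumes/datastore1/<vm>/<vm>.vmx /vmfs/volumes/$DEST_DS/<vm>/
    #   vmkfstools -i /vmfs/volumes/datastore1/<vm>/<vm>.vmdk /vmfs/volumes/$DEST_DS/<vm>/<vm>.vmdk

    # Drop the snapshot, returning the VM to normal
    vim-cmd vmsvc/snapshot.removeall "$VMID"
done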
These are, by definition, full backups every day. But with dedupe I save all the duplicated space, so each 'full' backup only really consumes disk for what has changed in the .vmdk image.
I tried lz4 vs gzip-9 and settled on gzip-9. lz4 was only slightly (15% or so) faster, and was network-bound. With gzip-9 the job actually becomes CPU-bound, but just barely; it's only a bit slower than lz4.
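For anyone curious, those are just per-dataset ZFS properties on the FreeNAS side; something like the following (the dataset name here is made up, and the sha256 checksum is what the DDT lines below show):
Code:
# Hypothetical dataset used only for VM backups; the rest of the pool stays dedup-off
zfs create nas2pool/vmbackup
zfs set dedup=on nas2pool/vmbackup            # dedup with the default sha256 checksum
zfs set compression=gzip-9 nas2pool/vmbackup  # what I settled on over lz4
zfs get dedup,compression nas2pool/vmbackup   # verify the settings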
As to whether the values are meaningful or not, I have no idea. That's the output from "zdb -D", though. Here's the raw output if it matters:
Code:
root@nas2 ~ # zpool list
NAME       SIZE   ALLOC    FREE  CAP   DEDUP  HEALTH  ALTROOT
nas2pool  32.5T   9.36T   23.1T  28%  18.49x  ONLINE  /mnt
root@nas2 ~ # zdb -Dv nas2pool
DDT-sha256-zap-duplicate: 579601 entries, size 1309 on disk, 211 in core
DDT-sha256-zap-unique: 206626 entries, size 1658 on disk, 267 in core
DDT histogram (aggregated over all DDTs):
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     202K   25.2G   3.76G   4.60G     202K   25.2G   3.76G   4.60G
     2    56.0K   7.00G   1.76G   1.99G     130K   16.2G   4.06G   4.59G
     4    39.7K   4.97G   1.58G   1.70G     202K   25.2G   8.10G   8.75G
     8    28.3K   3.54G   1017M   1.09G     312K   39.1G   10.5G   11.6G
    16     438K   54.8G   34.9G   35.8G    9.60M   1.20T    781G    799G
    32    3.65K    467M    198M    210M     167K   20.8G   8.82G   9.38G
    64      256   31.9M   12.2M   13.1M    18.2K   2.27G    864M    930M
   128        6    768K     24K   48.0K      908    114M   3.55M   7.09M
   256        3    384K     12K   24.0K    1.13K    145M   4.52M   9.04M
   512        1    128K      4K   7.99K      736     92M   2.88M   5.74M
    4K        1    128K      4K   7.99K    7.32K    938M   29.3M   58.5M
   16K        1    128K      4K   7.99K    16.9K   2.11G   67.4M    135M
 Total     768K   96.0G   43.2G   45.4G    10.6M   1.33T    817G    839G
dedup = 18.50, compress = 1.67, copies = 1.03, dedup * compress / copies = 30.00
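Those ratios on the last line fall straight out of the Total row: dedup is referenced vs. allocated DSIZE, compress is referenced LSIZE vs. PSIZE, and copies is referenced DSIZE vs. PSIZE. Quick sanity check with the numbers hand-copied from the table (rounding accounts for the small differences):
Code:
echo "scale=2; 839 / 45.4" | bc            # dedup    ~18.48 (zdb reports 18.50)
echo "scale=2; (1.33 * 1024) / 817" | bc   # compress ~1.66  (zdb reports 1.67)
echo "scale=2; 839 / 817" | bc             # copies   ~1.02  (zdb reports 1.03)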
If I'm calculating things right, the DDTs should be using (768,000 * 320 bytes) of ARC. That's only about 234 MB. I would have had the pool space to do this without dedupe, but I wanted to see how it worked. It's a backup NAS, so I'm not terribly worried about having dedupe RAM issues. I totally accept that I might lose the pool due to insufficient RAM. This is kind of my 'experiment' box. It's only got 12 gigs of RAM, which is already a bit low for the pool size, plus the little bit of dedupe I'm doing. Also note that dedupe is only enabled on the dataset I use for VM backup. The rest of the pool would dedupe extremely poorly.
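That's just the block count from the Total row (768K ≈ 786,000 entries, which matches the two DDT entry counts at the top: 579,601 + 206,626) times the commonly quoted ~320 bytes of ARC per in-core DDT entry:
Code:
echo $(( 768000 * 320 / 1024 / 1024 )) MB              # ~234 MB with the round number I used
echo $(( (579601 + 206626) * 320 / 1024 / 1024 )) MB   # ~239 MB with the exact entry count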
Anyway, I'm not trying to derail the OP's thread, I just wanted to point out it's quite possible to have dedupe ratios in excess of 2.0.