I need to create a pool with dedup enabled. The manual states that you need approximately 5GB of RAM per 1TB of deduplicated space.
So if I have 50GB that I need to save 20 times on the dedup pool, does that count as 50GB x 20 = 1TB, or as 50GB for these calculation purposes?
Also, say the system needs an extra 20GB of RAM for dedup purposes. Plus the 16GB minimum, I'd need, let's say, 36GB of RAM.
What happens to the dedup table once I shut down FreeNAS?
The dedup tables are 20GB, but my FreeNAS OS drive is only 16GB.
Do I also need a comparably big OS install drive where the dedup table is dumped before a shutdown?
Because that would mean I'd need to create a 36GB install partition as well, not just give it more RAM.
Please advise.
If you have a 1TB pool and enable dedup, expect to need at least 5GB of extra RAM. If you have a 10TB pool and enable dedup, expect to need at least 50GB of extra RAM. However, please note that the RAM-to-disk ratio is a function of the average block size in use. The number suggested by the manual assumes you leave the defaults alone. If you set a smaller block size, the RAM requirements skyrocket quickly.
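To make that concrete, here's a back-of-the-envelope sketch. It assumes roughly 320 bytes of RAM per unique block tracked in the dedup table, which is a commonly cited ballpark rather than an exact figure, so treat the output as a rough estimate only:

```python
# Back-of-the-envelope DDT memory estimate. ASSUMPTION: roughly 320
# bytes of RAM per unique block tracked in the dedup table -- a commonly
# cited ballpark, not an exact figure.

BYTES_PER_DDT_ENTRY = 320  # assumed average in-core size of one DDT entry

def ddt_ram_gib(unique_data_bytes: int, avg_block_bytes: int) -> float:
    """Estimated RAM (GiB) to hold the DDT for the given unique data."""
    unique_blocks = unique_data_bytes / avg_block_bytes
    return unique_blocks * BYTES_PER_DDT_ENTRY / 2**30

TIB = 2**40
for block_kib in (4, 64, 128, 1024):
    ram = ddt_ram_gib(TIB, block_kib * 2**10)
    print(f"{block_kib:>5} KiB blocks: {ram:6.1f} GiB of RAM per TiB")
```

At a 64KiB average block size this lands right around the manual's ~5GB-per-TB figure; at 4KiB blocks it balloons to roughly 80GB per TB. That's the "skyrocket" case.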
You should consider dedup memory requirements to be ON TOP OF any existing memory requirements.
The dedup table is stored in the pool. When accessed, it is loaded into ARC as metadata. It is eligible to be evicted to L2ARC. However, it works poorly if evicted to L2ARC. Therefore you should be generous with RAM so that it has a better chance of remaining in ARC.
When you turn off your NAS, the ARC vanishes because it is in RAM. The L2ARC is rendered useless because the L2ARC pointers are stored in the ARC.
When you turn on your NAS, the dedup table will be fetched from the pool on demand and stored in ARC like other metadata. This means post-reboot write performance is somewhat worse until the ARC warms up with the dedup data. This sucks. Try to avoid reboots when using dedup.
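Here's a crude model of why that warm-up hurts. This is not how ZFS is actually implemented; it's a toy (the class and numbers are made up for illustration) that just shows that with a cold cache, every dedup write forces a lookup against the on-disk table:

```python
# Toy model of dedup writes against a cold vs. warm cache.
# Not real ZFS internals -- purely illustrative, names are made up.

class DedupWriter:
    def __init__(self):
        self.ddt_on_disk = {}   # checksum -> refcount (lives in the pool)
        self.arc = set()        # DDT entries currently cached in RAM
        self.pool_reads = 0     # lookups that had to touch disk

    def write_block(self, checksum: str) -> None:
        if checksum not in self.arc:    # cold: fetch the DDT entry from the pool
            self.pool_reads += 1
            self.arc.add(checksum)
        self.ddt_on_disk[checksum] = self.ddt_on_disk.get(checksum, 0) + 1

w = DedupWriter()
blocks = [f"blk{i % 1000}" for i in range(10_000)]  # 1000 unique blocks, rewritten
for b in blocks:
    w.write_block(b)
print("pool reads, cold cache:", w.pool_reads)  # 1000 -- the warm-up cost
w.pool_reads = 0
for b in blocks:
    w.write_block(b)
print("pool reads, warm cache:", w.pool_reads)  # 0 -- everything is in ARC
```

On real hardware those cold lookups are small random reads scattered across the pool, which is exactly what spinning disks are worst at.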
Your boot device size has nothing to do with dedup. The table lives in the pool itself and is never dumped to the boot device at shutdown.
In general, the community feels that dedup is a poor strategy, and you are better off with compression, snapshots, higher level deduplication such as that provided by many data backup products, etc.
I disagree somewhat -- I think dedup has a specific valid role and is useful in some cases. If you have a datastore where you are storing backup images, for example, with relatively small deltas, you can store a HUGE number of full backups using dedup. You need a lot of memory to do this successfully, but it's 2019 and 256GB of used DDR3 is now under $500.
The problem is that what people are THINKING is "oh, I have 500 Windows desktops (pc000-pc499) and they all have the common Windows files." You would think this would dedup great, but it doesn't. Modern OSes tend to be nondeterministic when installing, and do not install the exact same blocks in the exact same places. If you could dedup 512-byte blocks this wouldn't be a problem, but ZFS simply can't do that and remain practical and performant. So you need to look at much larger blocks. 1MB blocks dedup well in principle, but if you have 500 desktops, the contents of a given 1MB window on those disk images will generally result in 500 different layouts. So you still get 500 different blocks and 0% dedup.
What *does* dedup well is when you back up pc203 one day, then back pc203 up the next day and it writes 99% of the same blocks, with only a 1% delta. ZFS will do smashingly well on that and you will save all that duplicated space. But pc203 and pc167 are not likely to share much overlap, except for the all-zeroes block; pc167's images will mostly just dedup against other images of pc167.
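You can see both effects with a toy simulation. This is not ZFS's actual checksumming pipeline, just the same alignment math: block-level dedup only matches when identical bytes land at identical block-aligned offsets (all names here are made up for illustration):

```python
import hashlib
import os

# Toy block-level dedup: a block is shared only if the SAME bytes sit
# at the SAME block-aligned offset. (Illustrative; not ZFS internals.)

def block_hashes(data: bytes, block_size: int) -> list[bytes]:
    """Checksum each fixed-size block, the way block-level dedup sees data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def shared_ratio(a: bytes, b: bytes, block_size: int) -> float:
    """Fraction of b's blocks already present among a's blocks."""
    seen = set(block_hashes(a, block_size))
    hashes_b = block_hashes(b, block_size)
    return sum(h in seen for h in hashes_b) / len(hashes_b)

BLOCK = 4096
base = os.urandom(64 * BLOCK)            # yesterday's pc203 image

# Same machine, next day: identical except for one changed block.
next_day = base[:10 * BLOCK] + os.urandom(BLOCK) + base[11 * BLOCK:]
print("same machine, 1-block delta:", shared_ratio(base, next_day, BLOCK))

# Same data, but shifted by 512 bytes (a different install layout):
# every block boundary moves, so essentially nothing matches.
shifted = os.urandom(512) + base[:-512]
print("same data, shifted layout:  ", shared_ratio(base, shifted, BLOCK))
```

The first case prints roughly 0.98 (63 of 64 blocks dedup); the second prints 0.0 even though it's almost the same data byte for byte. That's the pc203-versus-pc167 story in miniature.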
The big knobs here are the block size (to increase the odds of overlap) and the amount of ARC reserved for metadata; you can tune both for your use case. What really drives memory consumption is the number of unique disk blocks ZFS has to track: the more there are, the more memory it takes. That also answers your original question: 50GB of data written 20 times only needs DDT entries for roughly 50GB's worth of unique blocks, not 1TB's worth. The sad reality, though, is that dedup doesn't work as well as most people would like.
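Plugging your scenario into the same rough arithmetic as before (again assuming ~320 bytes per DDT entry, which is a ballpark, not a spec):

```python
# Poster's scenario: 50GB of unique data written 20 times.
# The DDT tracks UNIQUE blocks only, so the 20 copies are irrelevant.
unique_bytes = 50 * 2**30               # 50 GiB of unique data
entries = unique_bytes / (128 * 2**10)  # assuming 128 KiB average blocks
print(entries * 320 / 2**30)            # ~0.12 GiB of DDT -- nowhere near 20GB
```

So for this workload the RAM cost is driven by the 50GB of unique data, not the 1TB of logical writes.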