Turning dedup off at a later time

Status
Not open for further replies.

nanda

Explorer
Joined
Jun 9, 2013
Messages
56
Hello, I have a 4 TB zvol and 16 GB RAM.

I'm currently using only a tiny fraction of the zvol, certainly < 1 TB.

Question: Am I right in thinking that dedup can be turned on at this stage, and then turned off when the zvol is more full? I.e., will I get all the RAM back?
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
No, once a block is deduped it stays deduped. To "undedupe" data, you can copy it off somewhere and then copy it back (after disabling dedup on the dataset).
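The copy-off-and-back idea can be illustrated with a toy Python model. This is nothing like real ZFS internals; the `ToyPool` class and its fields are invented purely to show why the dedup table only empties once the old data is rewritten with dedup off:

```python
import hashlib

class ToyPool:
    """Toy model of a pool with an optional dedup table (DDT).
    Purely illustrative -- real ZFS is far more involved."""

    def __init__(self):
        self.dedup = True
        self.ddt = {}       # hash -> refcount (deduped blocks)
        self.blocks = []    # plain, non-deduped blocks

    def write(self, data: bytes):
        if self.dedup:
            h = hashlib.sha256(data).hexdigest()
            self.ddt[h] = self.ddt.get(h, 0) + 1
        else:
            self.blocks.append(data)

    def delete_all(self):
        # Freeing every deduped block drops all DDT refcounts to zero,
        # so (in this toy model) the whole table goes away.
        self.ddt.clear()
        self.blocks.clear()

pool = ToyPool()
for chunk in [b"a", b"a", b"b"]:   # two of the three blocks are duplicates
    pool.write(chunk)
assert len(pool.ddt) == 2          # DDT holds 2 unique hashes

pool.dedup = False                 # analogue of disabling dedup
old = [b"a", b"a", b"b"]           # copy the data off somewhere...
pool.delete_all()                  # ...delete the originals...
for chunk in old:
    pool.write(chunk)              # ...and copy it back, now undeduped
assert len(pool.ddt) == 0          # table only empties after the rewrite
```

Merely flipping `pool.dedup` to `False` would not have shrunk the table; the delete-and-rewrite is what frees the entries.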
 

nanda

Explorer
Joined
Jun 9, 2013
Messages
56
No, once a block is deduped it stays deduped. To "undedupe" data, you can copy it off somewhere and then copy it back (after disabling dedup on the dataset).


I understand that deduped blocks on disk will remain deduped, but what is the point of keeping the hashes in RAM once no new data is being written with dedup enabled?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Because the data that is still on the disk is still deduped!

When you turn a feature like compression or dedup on or off, it only applies to new data. Old data is left in its current state.

Could you imagine if you had a 100TB pool, disabled compression, and the server threw up a big "please wait" for a few weeks while it decompressed all the data on the pool? See how stupid it would be to go back and change the setting for all currently existing data? ZFS had to take pools of enormous size into consideration. While you and I don't think about the consequences of large systems, the ZFS engineers did (thank god).
 

nanda

Explorer
Joined
Jun 9, 2013
Messages
56
Because the data that is still on the disk is still deduped!

When you turn a feature like compression or dedup on or off, it only applies to new data. Old data is left in its current state.

Could you imagine if you had a 100TB pool, disabled compression, and the server threw up a big "please wait" for a few weeks while it decompressed all the data on the pool? See how stupid it would be to go back and change the setting for all currently existing data? ZFS had to take pools of enormous size into consideration. While you and I don't think about the consequences of large systems, the ZFS engineers did (thank god).


Maybe I wasn't clear enough when asking.

I know deduped data will remain deduped when dedup is turned off, while new data won't be deduped.

My question was about RAM: will RAM still be used to store the block hashes after dedup is turned off, and if so, why?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Ok, so you and I both agree that if you turn off dedup on a pool that has dedup on, then only new data will bypass the dedup table. But all the old data is still in that table, so the table must still exist and is still used until none of your data goes through it anymore. You can expect the dedup table to slowly shrink over time as the old data is deleted from the server, but it's unlikely to ever reach zero unless you take action to empty it: destroying the pool and restoring from backup, moving all of the data to a dataset that isn't deduped and then back to its original location, or something else that makes the old data disappear and causes new copies of it to be written.

If that dedup table weren't kept in RAM, all of the deduped data would suddenly be inaccessible, because there would be no link between your files and the deduped blocks.

Get it? ;)
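The indirection described in this post can be sketched in a few lines of Python. This is a toy model of the claim above, not real ZFS data structures; all the names and addresses are made up:

```python
# File records hold block hashes; a dedup table maps each hash to an
# on-disk block. If the table is dropped while deduped data is still
# referenced, reads can no longer resolve hashes to blocks.
disk = {0x10: b"hello", 0x20: b"world"}          # address -> raw block
ddt = {"h1": 0x10, "h2": 0x20}                   # hash -> block address
files = {"a.txt": ["h1", "h2"], "b.txt": ["h1"]} # file -> block hashes

def read(name: str) -> bytes:
    # Every read of a deduped file goes through the dedup table.
    return b"".join(disk[ddt[h]] for h in files[name])

assert read("b.txt") == b"hello"

ddt.clear()                # drop the table while data is still referenced...
try:
    read("a.txt")          # ...and reads of deduped files break
except KeyError:
    pass                   # no way to map "h1" back to address 0x10
```

In this model the file entries never stored raw block addresses, which is why the table has to stay around for as long as any deduped data does.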
 

nanda

Explorer
Joined
Jun 9, 2013
Messages
56
I can accept that it works the way you suggest, but I find this implementation very strange.

To me the file system has three relevant components:
- Data blocks with addresses.
- Hashes of used data blocks.
- Something like FAT combining filename and data block address, essentially a DNS for the volume. I'm guessing it is implemented something like a tree of shared_ptr.

Files are deleted by deleting FAT records. When all pointers to a block disappear from the FAT, the block is garbage collected.

When dedup is turned on, new files are hashed blockwise, compared against a hash table in RAM, and deduped if a match is found.

When dedup is turned off, the hash table is taken out of RAM and maybe stored on the volume, as it is not needed.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
yeah, not the case. it doesn't work like that.

If 3 files use the same data block and are deduped, then the file entries point to the hashes in the DDT, and the DDT points to the blocks on the disk.

The dedup table is a construct that links hashes of data to the actual data blocks on the disk. Without that construct you have no way to make the link.

I don't have any good links handy, but you should google the topic. You seem interested in understanding this (I know I was), and it's a bit much for me to try to explain right now. Got a computer to work on today or I'd try to find my dedup bookmark for you.
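The "3 files use the same data block" case above can also be sketched with simple refcounting. Again a toy model with invented names, just to show why a table entry (and its RAM cost) survives until the last referencing file is gone:

```python
# Several files point at the same dedup-table entry; the entry is only
# freed when its reference count drops to zero.
from collections import Counter

ddt_refs = Counter()                 # hash -> number of referencing files
files = {"f1": "h_abc", "f2": "h_abc", "f3": "h_abc"}
for h in files.values():
    ddt_refs[h] += 1
assert ddt_refs["h_abc"] == 3        # one entry, three references

for name in ["f1", "f2"]:            # delete two of the three files
    ddt_refs[files.pop(name)] -= 1
assert ddt_refs["h_abc"] == 1        # entry still needed by f3

ddt_refs[files.pop("f3")] -= 1       # delete the last file
ddt_refs = +ddt_refs                 # unary + drops non-positive counts
assert "h_abc" not in ddt_refs       # entry freed with the last reference
```

This matches the "table slowly shrinks as old data is deleted" behavior described earlier in the thread.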
 