Dumping/Exporting ZFS Dedup Tables


zeamaize

Cadet
Joined
Feb 12, 2013
Messages
4
Hello all,

I'm a researcher at a non-profit institution, and we're studying the way our data gets deduplicated. I was wondering if anyone knows a way to dump/export the deduplication table in ZFS. What we'd really like to do is look at which files reference which chunks, the size of those chunks, and so on, to fully profile our data for an NSF-funded study we're conducting.

I've been over the ZFS documentation but haven't found anything that shows how to do this, though it seems like it must be possible. Does anyone have any pointers?
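
For what it's worth, the closest thing I can think of is capturing whatever aggregate statistics zdb exposes (zdb -D / -DD print DDT summaries) and post-processing them offline. A rough sketch of what I have in mind, assuming a pool named "tank"; as far as I can tell this only gives aggregate statistics, not which files reference which chunks:

Code:
#!/usr/bin/env python3
# Sketch: capture zdb's dedup-table (DDT) statistics for offline analysis.
# Assumes a pool named "tank"; the output format isn't parsed here because it
# varies between ZFS releases.
import subprocess

POOL = "tank"  # assumption: replace with the pool under study

def dump_ddt_stats(pool: str) -> str:
    """Return the raw text of zdb's DDT summary/histogram."""
    result = subprocess.run(
        ["zdb", "-DD", pool],  # -D prints DDT stats; extra D's add detail
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    text = dump_ddt_stats(POOL)
    with open(f"ddt-stats-{POOL}.txt", "w") as fh:
        fh.write(text)
    print(f"Saved {len(text.splitlines())} lines of DDT statistics")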
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I doubt it's possible. I've never heard of anyone trying to dump the table, and I'm not sure why it would help you or be worth the trouble. You do need a system with a lot of RAM.

I've said this several times regarding dedup, but I'll say it again for your benefit. The cost of the RAM almost outweighs the cost of buying more hard drives and not using dedup. Consider all options and remember that if you run out of RAM you WILL be locked out of your zpool until you mount it on a system with enough RAM. Basically, if the system crashes because the dedup table is too big for 8GB of RAM, you can expect the data to be inaccessible until you put the zpool in a system with more RAM. There is no Hail Mary or forgiveness if you don't have enough RAM for the dedup table; the zpool just won't mount. We've already had someone locked out of their zpool until they upgraded their system RAM.

Generally, I consider using dedup borderline irresponsible because there is no upper limit on how much RAM you'll need, and disk space is so cheap these days.
 

zeamaize

Cadet
Joined
Feb 12, 2013
Messages
4
Well first off, we're not typical home users. As I said, I'm actually a scientist, and we're spending ~$20,000 on our setup. We're also directly researching deduplication.

What we want to do is study which chunks end up referenced in our data and how that relates to some metrics of interest.
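
In the meantime, one rough way we could approximate this outside of ZFS is to chunk our files at a fixed record size and count duplicate checksums ourselves. A minimal sketch, assuming 128 KiB records (the default ZFS recordsize) and SHA-256 checksums, and ignoring compression, so it's only a first-order estimate of what ZFS would actually dedup:

Code:
#!/usr/bin/env python3
# Minimal sketch: approximate block-level dedup by chunking files into fixed-size
# records and counting how many files reference each identical chunk.
# Assumptions: 128 KiB records (the ZFS default recordsize), SHA-256 checksums,
# and no compression, so this is only a first-order estimate.
import hashlib
import os
import sys
from collections import defaultdict

RECORDSIZE = 128 * 1024  # bytes; assumption matching the ZFS default

def profile(root: str):
    refs = defaultdict(set)    # chunk digest -> set of files referencing it
    counts = defaultdict(int)  # chunk digest -> total references
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while chunk := fh.read(RECORDSIZE):
                        digest = hashlib.sha256(chunk).hexdigest()
                        refs[digest].add(path)
                        counts[digest] += 1
            except OSError:
                continue  # skip unreadable files
    return refs, counts

if __name__ == "__main__":
    refs, counts = profile(sys.argv[1])
    dup = {d: n for d, n in counts.items() if n > 1}
    print(f"{len(counts)} distinct chunks, {len(dup)} referenced more than once")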
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
You'd probably be better off visiting the Oracle site where ZFS was developed and posting your question in their forums.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Well first off, we're not typical home users. As I said, I'm actually a scientist, and we're spending ~$20,000 on our setup. We're also directly researching deduplication.

What we want to do is study which chunks end up referenced in our data and how that relates to some metrics of interest.

That's fine. Let's say you build a system with 128GB of RAM. If you follow the rule of thumb of 5GB of RAM per 1TB of disk space, you'd only be able to create a pool of about 25TB. But that isn't a hard-and-fast rule; you may need only 2GB of RAM per TB of disk space, or you might need 20GB per TB.
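
Just to put numbers on that rule of thumb (the GB-per-TB ratios are rough estimates, not hard limits):

Code:
# Back-of-the-envelope sizing for the rule of thumb above.
# The GB-per-TB ratios are rough estimates, not hard limits.
ram_gb = 128
for gb_per_tb in (2, 5, 20):
    max_pool_tb = ram_gb / gb_per_tb
    print(f"{gb_per_tb} GB of RAM per TB -> about {max_pool_tb:.1f} TB of deduped pool")
# At 5 GB/TB, 128 GB of RAM works out to roughly 25 TB, as stated above.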

I'm curious though: even if you could get a printout, how would that help you? Dedup is block-level, so if two blocks are identical except for a single bit, they won't dedup and you won't save any disk space. From a scientific perspective I don't know how that would help without you knowing with 100% certainty that your data aligns to block boundaries.
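
As a tiny illustration (assuming SHA-256, which is the checksum dedup uses by default): flip one bit in an otherwise identical 128 KiB block and the checksums no longer match, so nothing is shared.

Code:
import hashlib

# Two 128 KiB blocks that differ by a single bit.
block_a = bytes(128 * 1024)
block_b = bytearray(block_a)
block_b[0] ^= 0x01

# Dedup matches on the whole-block checksum, so one flipped bit means no sharing.
print(hashlib.sha256(block_a).hexdigest() == hashlib.sha256(bytes(block_b)).hexdigest())  # False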
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Actually you can choose file, block, or bytes.

Can you provide any documents explaining how to change it? I just did some Googling and all I see is block level.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
To the OP -

If you really expect to spend $20k on your setup, you should hire someone who knows what they are doing to build it for you. The FreeNAS project really requires you to "know your stuff" or you can lose your data without realizing what you did wrong. Someone commented yesterday that if you don't know your stuff, don't expect to keep your data for long. That's absolutely true, and there are plenty of examples of prior posters who didn't know what was wrong until one of the senior posters told them what happened.

Quite bluntly, if you aren't willing to spend 50-100 hours of research and experimentation figuring everything out and making sure you aren't making a data-losing mistake, you should hire someone to build you one. I know iXsystems can build you one (they are the corporate entity behind FreeNAS).
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Can you provide any documents explaining how to change it? I just did some Googling and all I see is block level.

Actually I thought the link below explained it, but after checking back it doesn't. If you look at the link you'll see why I thought it was selectable.

I thought the second link might be useful to the OP, but if he's serious about researching ZFS, there's a LOT of stuff to read and better places to post questions than here.

https://blogs.oracle.com/bonwick/entry/zfs_dedup

ASPECTS OF DEDUPLICATION
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I was a bit surprised, because ZFS uses an I/O pipeline in which data blocks pass through these stages in order: compression -> checksum -> dedup -> RAID. By the time a block reaches the dedup stage there is no longer any notion of files, only blocks. Trying to change this process would require almost a complete rewrite of ZFS.
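
Very roughly, the ordering looks like the toy sketch below (zlib and SHA-256 just stand in for ZFS's compression and checksumming; the dict is a toy model, not real ZFS internals):

Code:
# Toy model of the write-path ordering described above; not real ZFS internals.
import hashlib
import zlib

ddt = {}      # checksum -> [compressed block, refcount]
storage = []  # stand-in for the RAID-Z / mirror layer

def write_block(data: bytes) -> str:
    compressed = zlib.compress(data)                 # 1. compression runs first
    digest = hashlib.sha256(compressed).hexdigest()  # 2. checksum of the *compressed* block
    if digest in ddt:                                # 3. dedup keys purely on that checksum
        ddt[digest][1] += 1
    else:
        ddt[digest] = [compressed, 1]
        storage.append(compressed)                   # 4. only new blocks reach the vdev layer
    return digest

d1 = write_block(b"A" * 4096)
d2 = write_block(b"A" * 4096)    # identical data: dedup hit, refcount bumps
print(len(storage), ddt[d1][1])  # -> 1 block stored, referenced twice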
 