Dedup RAM reqs

Status
Not open for further replies.

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
I have a question about the RAM requirements for dedup.
Is the amount of RAM required based on the size of the volume, or on the amount of data being placed on it?

For example, let's say I have a 4TB volume and, thanks to dedup, 8TB of data sits on it.
What about 100TB of data on that same volume because of the dedup, etc.?


Hope that makes sense.
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
Unless you have a lot of RAM and a lot of duplicate data, do not change the default deduplication setting of "Off". The dedup tables used during deduplication need ~8 GB of RAM per 1TB of data to be deduplicated. For performance reasons, consider using compression rather than turning this option on.
After re-reading, it appears to be per TB of data, although I still don't know if that means TB of deduped data or of expanded data.
I have a server with 100TB of files; Windows can dedupe it down to 10TB.
On FreeNAS, am I going to need 80GB of RAM or 800GB? That is quite a difference, on top of the fact that Windows can do it with 6GB (although not live).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
It's technically per unique data block, so in essence it scales with your quantity of unique data.

So 1TB of entirely unique data will need a much larger dedup table than 1TB of data that is really only 200GB of unique data.

The size of the pool doesn't really matter except that it allows you to easily balloon your dedup tables out of control.
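
To put rough numbers on that (a sketch only: the ~320 bytes per in-core DDT entry and the 64K average block size are assumptions, and rules of thumb like the ~8GB per 1TB quoted above simply bake in a different assumed average block size):

# unique blocks  = unique data / average block size
# DDT RAM needed = unique blocks * ~320 bytes
# e.g. 10TB of unique data at a 64K average block size:
echo "10 * 1024^3 / 64 * 320 / 1024^3" | bc -l   # ~50 GiB of RAM just for the DDT

So for the 100TB-that-dedupes-to-10TB example, it is the ~10TB of unique data that sets the size of the dedup table, not the logical 100TB.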
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
Is there any way to check on dedup ratios?

For example, on my Windows box I can type:
PS C:\Windows\system32> Get-DedupStatus | fl


Volume : E:
VolumeId : \\?\Volume{62a68939-8f4a-43af-b66b-b1198a9bd0aa}\
Capacity : 40.87 TB
FreeSpace : 15.42 TB
UsedSpace : 25.45 TB
UnoptimizedSize : 122.46 TB
SavedSpace : 97.01 TB
SavingsRate : 79 %
OptimizedFilesCount : 15688058
OptimizedFilesSize : 109.88 TB
OptimizedFilesSavingsRate : 88 %
InPolicyFilesCount : 19262819
InPolicyFilesSize : 122.64 TB
LastOptimizationTime : 8/23/2013 12:40:27 PM
LastOptimizationResult : 0x8056533D
LastOptimizationResultMessage : The operation was cancelled.
LastGarbageCollectionTime : 8/19/2013 2:44:54 AM
LastGarbageCollectionResult : 0x8056533D
LastGarbageCollectionResultMessage : The operation was cancelled.
LastScrubbingTime : 8/10/2013 9:08:13 PM
LastScrubbingResult : 0x00000000
LastScrubbingResultMessage : The operation completed successfully.

And one more follow-up question: why can't an SSD step in for RAM to help with these crazy high requirements?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
SSDs can, sort of. But since every read and write has to be matched against the dedup table, it would tank performance badly.

But you must have enough RAM for the dedup table, because on a zpool import of an unclean pool the entire dedup table MUST fit in RAM. So if you don't have enough RAM, your data is locked away until you do.

If you have a pool that actually has dedup enabled you can check it from the command line, but there really is almost no reason for dedup unless your duplicate data happens to align at the block level. Compression will generally give you the same or better savings without the potential to be locked out of your pool.
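
If you want to experiment with the "sort of" option anyway, an SSD added as an L2ARC cache device can hold part of the DDT, since the DDT is pool metadata (pool name tank and device ada2 below are placeholders):

# add an SSD as an L2ARC cache device
zpool add tank cache ada2
# optionally limit the SSD cache to metadata, which is where the DDT lives
zfs set secondarycache=metadata tank

That only helps once the cache is warm, and it does not change the RAM requirement for importing an unclean pool.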
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
Also, can someone explain why this is happening?
I have a dedup volume.
I placed a 23.4 GB file on it and took a peek at used space: 14.5 GiB (perfect, dedup is working).
I copied the same 23.4 GB file to another location and took a peek at used space: 29.0 GiB.
I copied the file a third time: 43.6 GiB.
Something isn't right with the dedup. Why is a duplicated file not being deduped out better than this? The file is 100% identical.
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
My reasoning for this line of questions is that deduplication is working great for us on Windows. Just the thought of inline deduplication on FreeNAS is tempting.
I posted stats from my Windows box above: over 122TB of files deduped down to 25TB. The RAM requirements for doing that here just put it out of the realm of the realistic.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Remember how I said block level and not file level? Do not assume that if you have 10 copies of a file they will dedup; they might not save anything. They'll dedup only if they align to your blocks.

Windows doesn't do dedup at the block level, from what I've read, and it doesn't do it on the fly as the file is being copied. It does it in the background long after the file has been written (usually you set it to some value like 20 days, so that current data isn't deduped, since it's likely to change and would only create more overhead for the OS to deal with). Windows' dedup doesn't work anything like ZFS' does. It's not inline. It looks that way to you, the user, but it doesn't if you look at anything besides the "used space" and "free space" on the disk. If Windows had wanted to do on-the-fly dedup they'd have to maintain big dedup tables just like ZFS. That's why they didn't do it on-the-fly. ;)

As for why it isn't deduping, I don't know. I generally consider people crazy if they try to use it, because the consequences are so abrupt and quite irreversible, and generally any money you think you'll save on storage space you WILL spend on RAM (sometimes an order of magnitude more). Once you need more RAM, you need more RAM, period. No warning message. No "undo". You buy more RAM or you kiss the pool goodbye. And if you aren't using ECC RAM, well, that's just another vector to kiss your data goodbye. You have to enable it on the pool before you start copying files, and only new files (and new/changed blocks) will be included in the dedup table. Likewise, if you turn dedup off, any data blocks that are already deduped will stay deduped until all of those data blocks expire.

Dedup is one of those things that can be amazing with a very small pool (like 5TB or less) without an expensive investment in RAM. It also helps with ZFS' prefetch caching performance. But as soon as you start talking about large storage arrays, you are definitely not using dedup without very expensive RAM purchases.

Your dedup ratio can be checked with "zpool list", and compression savings (if enabled) can be checked with "zfs get all | grep compress".
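
A couple of more targeted variants, if you only want those numbers (pool name tank is a placeholder):

zpool get dedupratio tank                # same value as the DEDUP column in "zpool list"
zfs get compression,compressratio tank   # compression setting and overall savings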
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
I did check "zpool list" as I added files, and it appears to be working right; however, the "used" space doesn't show the same number.
Here are the stats as I added the 23.5GB file each time.
[root@freenas ~]# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
ZFSvol 14.5T 47.9G 14.5T 0% 1.02x ONLINE /mnt
[root@freenas ~]# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
ZFSvol 14.5T 48.0G 14.5T 0% 2.04x ONLINE /mnt
[root@freenas ~]# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
ZFSvol 14.5T 48.2G 14.5T 0% 3.07x ONLINE /mnt
[root@freenas ~]#


The thing is, I know my data is perfect for dedup. It just might not be right for ZFS and inline dedup.

BTW thank you for your time.

*EDIT*
I still don't get why ZFS can't do some magic to mount a dedup volume with an SSD acting as RAM or something. Sure, performance would tank, but at least it would let you get your volume online.
I am doing a small run with 1TB of data; I will post stats Monday :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
One thing you will learn when you start doing things with dedup, compression, and/or snapshots is that used space, free space, allocated space, etc. will be reported differently by different tools. Many will flat-out report a different value than you are expecting.
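
A quick way to see that in practice on a deduped pool (tank is a placeholder pool name):

zfs list -o name,used,avail,refer tank   # dataset view: each copy is charged its full size
zpool list tank                          # pool view: ALLOC reflects space actually used after dedup

That is why the dataset "used" grew by roughly the full file size for each copy above, while the pool ALLOC barely moved and the DEDUP ratio climbed.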

Good luck!
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
UGH, I think the block alignment is killing me. I haven't got the stats from Windows dedup yet, but my ZFS deduplication is abysmal.
[root@freenas ~]# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
ZFSvol 14.5T 1.19T 13.3T 8% 1.08x ONLINE /mnt
I am going to run Server 2012 dedup on the exact same set of data when I get a chance, but I expect way, way better results.
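
If you want to see where the duplicates are (or aren't) at the block level, zdb can dump the dedup table for the pool (output format varies by version):

zdb -DD ZFSvol   # DDT histogram: how many blocks are referenced once, twice, etc., plus the DDT size

Blocks that only ever show up with a reference count of 1 are unique as far as ZFS is concerned, no matter how similar the files look.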
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
No surprise. I told you that dedup might not work well because it's block-level. The dev team was serious when they said in the release notes to use compression instead.
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
No surprise. I told you that dedup might not work well because it's block-level. The dev team was serious when they said in the release notes to use compression instead.


I know, I know... sometimes you just have to see stuff with your own eyes, though.
I am going to post my stats from two other methods of dedup on the exact same data here in a few, just for future reference.
 

justusiv

Dabbler
Joined
Feb 8, 2013
Messages
23
As promised, here are the stats for the other forms of deduplication.
Server 2012
Data Deduplication Savings Evaluation Tool
Copyright (c) 2012 Microsoft Corporation. All Rights Reserved.

Evaluated folder: F:\ddptest
Evaluated folder size: 1.01 TB
Files in evaluated folder: 788

Processed files: 356
Processed files size: 1.01 TB
Optimized files size: 166.72 GB
Space savings: 866.99 GB
Space savings percent: 83

Optimized files size (no compression): 328.21 GB
Space savings (no compression): 705.51 GB
Space savings percent (no compression): 68

Files excluded by policy: 432
Small files (<32KB): 432
Files excluded by error: 0

In my searches I found a program called eXdupe, and here are the stats for that.
ORIGINAL DATA: 1.00 TB (1,109,943,599,078 bytes)
eXdupe: 134 GB (143,925,492,869 bytes)
Dedupe rate: 7.711931895819617
 