Should I dedupe on my setup?

Status
Not open for further replies.

SwisherSweet

Contributor
Joined
May 13, 2017
Messages
139
I have finally gotten all my parts to build a Mac Pro FreeNAS server. The basic hardware is:
  • Mac Pro with 2 x 6 Core Xeon "Westmere" 3.46GHz processors
  • 64GB ECC 1,333MHz RAM
  • 6 x 3 TB Toshiba P300 drives
  • 120GB SSD Boot Drive (can't boot FreeNAS from USB on Mac)
I have a wildcard component that I might use. It's a Kingston HyperX Predator 240GB PCIe Flash. It gets about 1,200 MB/s read/write speeds. I plan on using multiple iSCSI targets for Windows VMs, which I understand do a lot of synchronous writes. While this card doesn't have a battery backup, it's super fast and is a spare component.

But to the point of the post. I am a data pack-rat. I spend very little time organizing my data and it's very likely I have hundreds of gigs of duplicate data buried deep in folders. I read about FreeNAS deduplication and I would love to save some space if possible, but I don't want to make my NAS a dog. However, I think my hardware (perhaps with the PCIe flash card) would have no problem keeping up with demand of the 5 to 7 clients/workstations that will be hitting it.

Other things the server will be used for:
  • Plex media storage and transcoding
  • Windows VMs
So my questions are:
  1. Is my hardware good enough to use deduplication without a significant performance penalty?
  2. Would I benefit from the PCIe flash as an L2ARC?
  3. Can I turn deduplication on or off later if I change my mind (or I clean up my duplicate data)?
Thank you.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Some thoughts...

(1) While I can't answer the performance question directly as asked, the recurring issue with dedup is the dedup table growing too large in memory. That either hurts performance (the table uses RAM and the ARC shrinks), slows the system to a crawl (RAM exhausted and now swapping), or causes a panic (all memory exhausted).

If the rule of thumb is 5GB RAM per TB of storage for dedup, you'd need potentially quite a bit of RAM. Depending on the data, upwards of 45GB. So I would say it's not a good idea with a 64GB system.
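To make that arithmetic concrete, here is a rough sketch of the rule-of-thumb calculation. The 5GB-per-TB figure and the 64GB of RAM come from this thread; the function names and the two usable-capacity figures (9TB and 12TB) are my own assumptions for illustration, since the pool layout has not been stated. Real dedup table size depends on block size and how unique the data is, so treat the output as a ballpark only.

```python
# Rule-of-thumb estimate only. The 5 GB/TB figure is the forum rule of thumb
# quoted above, not a measured value for any particular pool.

def dedup_ram_gb(pool_tb, gb_per_tb=5.0):
    """RAM the dedup table might need, by the rule of thumb."""
    return pool_tb * gb_per_tb


def ram_left_gb(total_ram_gb, pool_tb, gb_per_tb=5.0):
    """RAM left over for ARC and everything else after the dedup table."""
    return total_ram_gb - dedup_ram_gb(pool_tb, gb_per_tb)


if __name__ == "__main__":
    total_ram_gb = 64  # from the original post
    # Usable capacities below are ASSUMED layouts for 6 x 3 TB drives.
    for label, usable_tb in [("9 TB usable (assumed)", 9),
                             ("12 TB usable (assumed)", 12)]:
        need = dedup_ram_gb(usable_tb)
        left = ram_left_gb(total_ram_gb, usable_tb)
        print(f"{label}: ~{need:.0f} GB for dedup, ~{left:.0f} GB left")
```

Under the 9TB assumption the table alone could want about 45GB, which leaves very little of the 64GB for ARC, VMs, and Plex.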

Why don't you see what you can get using lz4 compression? Unless your files are already compressed (e.g. media) you'll probably get a good ratio. And with the 5-7 clients hitting the server I think you'd be better served with as much ARC as you can get. I guess it depends on how many VMs you are running? If you have 1s or 10s of them, I don't think the savings would be worth the risk of a performance hit or a panic. If you have 100s or 1000s of them, dedupe starts to get interesting, especially with Windows VMs. But then you probably need a larger system. :)

(2) Depends on the data pattern. I would run the system with no L2ARC to start and see what the ARC performance is, then make a call based on that data.

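(3) No, not for data that has already been written. Turning the property off only affects new writes, so to undo dedup you would need to turn it off, migrate the data to a different dataset, then delete the original dedup dataset. https://forums.freenas.org/index.php?threads/turning-dedup-off-at-a-later-time.19361/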
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Is my hardware good enough to use deduplication without a significant performance penalty?
No. De-duplication is going to give you nothing. You're better off using compression over de-duplication. Your system simply does not meet the memory requirements.
Would I benefit from the PCIe flash as an L2ARC?
Unlikely. Keep in mind an under-utilized L2ARC can cause performance issues. Also note that L2ARC pointers are stored in RAM, leaving less space for ARC.
Can I turn deduplication on or off later if I change my mind (or I clean up my duplicate data)?
The property can be turned on/off at will. If you turn it on for an already populated dataset, the existing data is not de-duplicated until it is rewritten. I will refer you to my first answer.
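Since the original question mentions cleaning up duplicate data, one option is to measure how much of it there actually is before deciding anything. The sketch below is only an illustration: it finds whole-file duplicates by size and SHA-256 hash, which is not the same as ZFS dedup (ZFS works at the block level), and the function names are my own.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Bytes that would be freed by keeping one copy of each identical file."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry: skip it

    wasted = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # only files sharing a size can be identical
        copies = defaultdict(int)
        for path in paths:
            try:
                copies[file_digest(path)] += 1
            except OSError:
                continue
        for count in copies.values():
            wasted += (count - 1) * size
    return wasted
```

Calling `duplicate_bytes("/mnt/tank/data")` (a hypothetical path) returns a byte count; divide by 1e9 for GB. If the result is a small fraction of the pool, deleting the duplicates by hand is probably cheaper than paying the RAM cost of dedup.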
 