Advice on a large FreeNAS setup (using dedup)


David Buchanan

Dabbler
Joined
Dec 19, 2014
Messages
10
Hi All,

Let me start off by saying that I've tried to do my research, so hopefully my post comes across as such and not as a new user asking stupid questions. :)

I'll give you a little background as to why I'm looking into FreeNAS and using de-dup.

We have an HP BladeSystem backed by a Fibre Channel P2000 G3 with 140TB of storage. We have one bare-metal blade running Server 2012 R2 with 70TB of storage running Windows Data De-duplication (post-processing). This server stores around 130TB of Veeam backups on this 70TB volume.

We are at the stage where Veeam is producing a very large amount of I/O, as is the Windows post-processing de-dup.

Our long-term strategy is to move to something like an HP StoreOnce. However, that is 6-12 months off, so I'm looking into alternatives in the meantime. This has led me to FreeNAS (which I currently use at a number of smaller sites).

So, here is my FreeNAS plan:

HP BL460c G7 blade with 2 x Intel Xeon E5645 CPUs (6 cores each with HT, 24 threads total) and 196GB of memory.
Currently the server boots from a 4GB SD card; however, I believe it is now recommended to replace this with at least an 8GB one.

Because of the volume of data, and the fact that so much of it is similar, we need to run de-dup (although I'm open to suggestions on compression). Just to confirm: FreeNAS's de-duplication is in-line, correct?

Based on my research, the recommendation is somewhere between 2GB and 10GB of memory per 1TB of data. What I'm not sure of is whether this is per TB of written data (eg: 70TB) or per TB of data before de-dup (eg: 130TB)?

If it's 70TB, then 196GB would be enough going by the 2GB-per-TB figure; that puts me somewhere around 2.5GB per TB (leaving some memory for other system services). If it's 130TB, then I have less than 1.5GB per TB.

My biggest concern here is that if we run out of memory, the volume can't be mounted. However, based on this article (https://blogs.oracle.com/bonwick/entry/zfs_dedup) it appears that some of the de-dup table can be spilled to L2ARC and then to disk at the cost of performance. Is this correct for FreeNAS? I'd rather take a hit on performance than lose access to the data.
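As a side note for anyone following along: the dedup table (DDT) holds one entry per unique block actually written, and ZFS can both report and simulate it. A rough sketch of what I plan to check in a POC (the pool name tank is a placeholder):

# simulate dedup on an existing pool that does NOT have dedup enabled,
# printing a projected DDT histogram and dedup ratio (walks the whole pool)
zdb -S tank

# on a pool with dedup enabled, show the DDT entry count and per-entry
# in-core size; total RAM needed is roughly entries x "in core" bytes
zpool status -D tank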

The next part, which I'm a little less sure about, is the best way to set up FreeNAS with the P2000. I know that FreeNAS likes direct access to the disks; however, I'm not sure how that's possible with a SAN. I assume I'd just create a LUN and assign it to the server as I would in any other set-up, and FreeNAS would use it without direct access to the drives. Is this correct?

For a system with 100TB+ of data, would it be recommended to use L2ARC?

I welcome any advice on the above design, good or bad. Hopefully someone with more experience with FreeNAS in the enterprise space can provide some insight.

Thanks,
David.
 
Joined
Jul 3, 2015
Messages
926
I think most people on this forum would agree that turning on de-dup is not something to be taken lightly. I don't have any real experience with de-dup on FreeNAS, because whenever I've researched it, most people consider it a long-term risk to the stability of the system. Personally, I would explore compression first and really do your homework and testing on de-dup for a while before putting it into production. Even iXsystems doesn't suggest it unless there is a real pressing need.

I can't help you with the second part of the question, I'm afraid.

Regarding L2ARC, I think it's generally suggested never to have an L2ARC larger than about 5x your RAM, so for you about 1TB. You can always add an L2ARC later, so you may want to see what your ARC hit ratio looks like before deciding whether it's a good idea; see the sketch below.
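On FreeBSD/FreeNAS the raw numbers behind that hit ratio are exposed via sysctl. A quick sketch (the device name in the last line is just an example):

# lifetime ARC hits and misses since boot; hit ratio = hits / (hits + misses)
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# if the ratio suggests the ARC is overflowing, a cache device can be added later:
zpool add tank cache da5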

PS: Regarding your boot media, I would go larger than 8GB because of OS snapshots, perhaps 32GB or 64GB, and personally I'd mirror them.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
I've never used a system that big, but one thing to consider with L2ARC is that it consumes part of the ARC for its own bookkeeping, which of course then cuts into the RAM available for dedup. Can you get the server to 256GB?

I too would suggest some POC testing with dedupe before you put it in production.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Compression is useful to reduce the size of on-disk data; CPU is relatively cheap. Do this no matter what.
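For concreteness, both compression and dedup are per-dataset ZFS properties, and both apply in-line to new writes only (existing data is not rewritten). A minimal sketch, assuming a pool tank with a dataset veeam:

# lz4 is cheap on CPU and safe to enable across the board
zfs set compression=lz4 tank/veeam
# dedup, by contrast, should only be enabled after sizing the DDT
zfs set dedup=on tank/veeam
# later, check what compression is actually buying you
zfs get compressratio tank/veeam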

If the contents of your VM images aren't changing greatly on a daily basis, then dedup has the potential to be a significant win.

The rules for RAM aren't so much "rules" as they are general guidelines. Should you use 70TB? Yes. Should you use 130TB? Yes. Just like the normal fileserver RAM sizing guidelines don't really specify whether it is 1GB per TB of {usable, available, raw, etc} space, both answers could be fine. 5GB per TB of space being dedup'ed is a typical starting point, but your workload might be able to survive on less, or might require twice that.
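To put numbers on that guideline: each DDT entry is commonly cited at roughly 320 bytes in core, and there is one entry per unique block, so the figure depends heavily on your average block size (both numbers below are assumptions, not measurements):

1TB unique data / 128KB avg block =  8,388,608 entries x 320B = ~2.5GB per TB
1TB unique data /  64KB avg block = 16,777,216 entries x 320B = ~5.0GB per TB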

ZFS dedup information is stored on disk. You don't want it to stay on disk, because that results in lower performance. You ideally want it in ARC. Having it in L2ARC is a second-best choice.

ZFS can run on top of a SAN. However, it will not be able to repair corruption, because the underlying hardware does not expose redundancy to FreeNAS. Also, FreeNAS very much prefers certain controllers that are well-known to work with FreeBSD; most of the RAID controllers out there are a little bit finicky. You may have some troubles.
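If you do put ZFS on a SAN LUN anyway, the pool ends up as a single vdev with no ZFS-level redundancy. A minimal sketch (the da1 device name is an assumption):

# ZFS will still detect corruption via checksums, but with a single
# SAN-backed vdev it has no second copy to repair from
zpool create tank da1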
 

David Buchanan

Dabbler
Joined
Dec 19, 2014
Messages
10
Thanks for the replies everyone.

We will be doing a POC before we move this into production; I just need to make sure I understand FreeNAS well enough before putting any work into the build. Going to 256GB is doable if needed; it's just that the server has 196GB as it stands now.

My thoughts on the L2ARC were that, in the event the dedup table can't fit into memory, an SSD-based L2ARC would be better than going to disk. My main concern is that once the table is too big for memory, the pool can't be used. Is this correct, or does FreeNAS now spill to L2ARC or disk before the pool becomes unusable? In other words, will I always have access to our data, or am I viewing things wrong here? We can live with slower performance while we order and install more memory; we can't live with no access to this data.

Seeing as Veeam already compresses its images, is it necessary/advisable to enable compression again on the pool? I've had cases where other products have had problems with this double-compression set-up.
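One low-risk way to answer the double-compression question empirically during the POC; a sketch (the dataset name and file path are made up):

# scratch dataset with lz4; copy a couple of sample Veeam files in,
# then see whether there is anything left to squeeze
zfs create -o compression=lz4 tank/ctest
cp /path/to/sample.vbk /mnt/tank/ctest/
zfs get compressratio tank/ctest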

Part of the way we use Veeam is that it creates a full image for each client every month, and we have this happening for a large number of clients. This is around 15-20TB of data, yet 80-90% of it is duplicate data. Hence the need for de-dup.

I figured as much with FreeNAS accessing the P2000 drives, thanks for clarifying jgreco.

Regards,
David.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've seen a few TrueNAS customers use dedup. I will tell you that as you store more and more data, the dedup table gets so large that CPU usage basically maxes out and performance nosedives. I've seen people unable to do 30MB/sec because walking the dedup table was single-handedly maxing out all 16 cores in their system. These were systems with less than 20TB of data post-dedup.

IMO, attempting to dedup 70TB is outside the realm of possibility unless you're about to tell me you're buying a 4-CPU system where every CPU is a 16-core/32-thread part. Regardless of RAM, you're going to find yourself CPU-bound before you get 70TB of data on the server, and what will you do then? Once you've killed the zpool's performance, moving the data off will be just as painful. At that point you may be deciding whether the disk-space savings are actually worth the additional cost of gobs of processing power and RAM. Almost everyone who comes here looking to use dedup finds they are better off skipping it and simply buying additional storage, especially given the potential killer of not being able to mount the zpool without enough RAM.
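If you do run the POC, it is worth watching CPU and pool throughput side by side so you can see that cliff coming; a sketch:

# per-thread view including kernel ZFS threads (FreeBSD top)
top -SH
# pool throughput, refreshed every 5 seconds
zpool iostat -v tank 5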
 