Ultra slow snapshot replication to a secondary FreeNAS

XStylus

Dabbler
Joined
Nov 22, 2017
Messages
20
Hi there,

I have two ridiculously beefy and identically configured FreeNAS 11.2-U3 boxes equipped with eight 12TB drives in a RAID1+0 config (or the ZFS equivalent of such) in both units. Both are connected via 10GbE with jumbo frames (MTU 9000) enabled. The primary FreeNAS unit has 30TB of data residing on it.

I have set a replication task to do a weekly snapshot replication from the primary FreeNAS unit to the secondary FreeNAS unit. The task takes several days to complete.

Is this normal??


---
Config (x2):
Xeon E5-1650v3
512GB memory
SuperMicro X10SRL-F Motherboard
Seagate Exos 12TB SAS HDD x8
Intel X520 SFP+ 10GbE Network Adapter
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
It depends.
If it is the first replication, it will take some time, but with 10GbE it should be snappy, even though I have never had the chance to handle such hardware.
How do you run the replication?
What kind of HHD?
 

XStylus

Dabbler
Joined
Nov 22, 2017
Messages
20
It depends.
If it is the first replication, it will take some time, but with 10GbE it should be snappy, even though I have never had the chance to handle such hardware.

First replication took three solid days. Subsequent replications have taken about as long.

How do you run the replication?

What in detail would you like to know?

I mainly followed the how-to instruction video by Lawrence Systems, and also consulted the FreeNAS manual. Started by creating a periodic snapshot task for the dataset, and configured Replication Tasks to replicate to the secondary FreeNAS on weekends. It seems pretty simple and hard to screw up.

What kind of HHD?

HHD? Do you mean HDD? As stated in my original post, I'm running enterprise-grade Seagate Exos 12TB SAS drives.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What in detail would you like to know?

The configuration you're using for the replication job... all details. I guess the compression and encryption would be the most interesting points, as these can be the bottleneck.

HHD? Do you mean HDD? As stated in my original post, I'm running enterprise-grade Seagate Exos 12TB SAS drives.

I suspect seeing the output of zpool list -v will help here, just to put everyone on the same page.

It seems pretty simple and hard to screw up.

But even harder to help if we don't have the details with which to see where you may have gone off-track.

Based on your story so far, all I can confirm is:

You aren't crazy, 30TB should be able to copy over 10Gbit in fewer than 3 days.

The first things I would look at would be the config of the replication job and the network configuration... perhaps some component isn't handling the jumbo frames and you have re-tries on every frame.
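
A quick way to verify that every hop really passes 9000-byte frames end to end (the address below is just an example) is a don't-fragment ping at the maximum jumbo payload, run as root from one FreeNAS box to the other:

Code:
# 8972 bytes payload = 9000 MTU - 20-byte IP header - 8-byte ICMP header
ping -D -s 8972 192.168.1.20

If that ping fails while a normal ping works, something in the path is dropping or fragmenting jumbo frames.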
 
Joined
Jul 3, 2015
Messages
926
How long are you telling the weekly snapshots to last? It sounds a bit like the snapshots are expiring and it's having to start again from scratch instead of sending just the increments.
 

XStylus

Dabbler
Joined
Nov 22, 2017
Messages
20
The configuration you're using for the replication job... all details. I guess the compression and encryption would be the most interesting points, as these can be the bottleneck.


Perhaps it'd be best if I provide a screenshot of the configuration, then.

Screen Shot 2019-05-01 at 2.59.30 AM.png




I suspect seeing the output of zpool list -v will help here, just to put everyone on the same page.


Can do.

Code:
NAME                                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Aincrad                                 43.5T  28.1T  15.4T        -         -     8%    64%  1.00x  ONLINE  /mnt
  mirror                                10.9T  7.07T  3.80T        -         -     8%    65%
    gptid/6c1afa54-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
    gptid/6d0046e5-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
  mirror                                10.9T  7.02T  3.85T        -         -     8%    64%
    gptid/6f219eb3-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
    gptid/712e0568-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
  mirror                                10.9T  7.01T  3.87T        -         -     8%    64%
    gptid/73440fe5-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
    gptid/7434cba1-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
  mirror                                10.9T  7.01T  3.86T        -         -     8%    64%
    gptid/7523feac-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
    gptid/761f60bc-cea6-11e8-af54-001b21c9af68      -      -      -        -         -      -      -
freenas-boot                             232G  2.24G   230G        -         -      -     0%  1.00x  ONLINE  -
  mirror                                 232G  2.24G   230G        -         -      -     0%
    ada0p2                                  -      -      -        -         -      -      -
    ada1p2   




The first things I would look at would be the config of the replication job and the network configuration... perhaps some component isn't handling the jumbo frames and you have re-tries on every frame.


I've confirmed that the network switch is configured for jumbo frames, as are both FreeNAS units.

Screen Shot 2019-05-01 at 3.02.00 AM.png
Screen Shot 2019-05-01 at 3.02.42 AM.png



Let me know if there's any other information I can provide that would be of help. Thank you!
 
Joined
Jul 3, 2015
Messages
926
Ok, so you are snapshotting every day and keeping those snaps for 3 days. I'm still thinking that maybe your replication is chasing its tail and the snapshots are expiring before it's up to date. I'd be tempted to keep your daily snapshots for a week, or better still two weeks, and see if that sorts things out.
 
Joined
Jul 3, 2015
Messages
926
Can you actually see how fast replication is running in Mbps?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What kind of data?
Is this for block storage?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I would also consider turning off the compression, since you're not trying to save bandwidth here.

You could think about how acceptable it would be for you to not use encryption (seems to be a LAN thing, so how "shared" is your LAN?). Set it to none if you can.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
First replication took three solid days. Subsequent replications have taken about as long.



What in detail would you like to know?

I mainly followed the how-to instruction video by Lawrence Systems, and also consulted the FreeNAS manual. Started by creating a periodic snapshot task for the dataset, and configured Replication Tasks to replicate to the secondary FreeNAS on weekends. It seems pretty simple and hard to screw up.



HHD? Do you mean HDD? As stated in my original post, I'm running enterprise-grade Seagate Exos 12TB SAS drives.
Hard to screw up is a matter of perspective.
With ZFS, a lot of simple things will hit the fan if you miss a step or make assumptions without understanding the underlying nature and behaviour of ZFS.
As per your info, the 3-day lifespan of your snapshots is your issue.
As pointed out by @Johnny Fartpants, the snapshots don't live long enough on the source server to be found and compared against on the remote server.

I can think of two scenarios; as I am not familiar with the GUI-based replication, I can't really confirm which applies:

With a one-week replication task, you are telling the replication task to start every 7 days.
If your snapshots have a 3-day lifespan, and since the snapshot generation is automated, I doubt a hold is placed on the snapshots. That would be nice to have for such a scenario, but it is not without issues.
So if, for example, you have a 5-minute interval for snapshot creation, then the way snapshots work is as follows:
You need to remember that each dataset has its own snapshots.

From the very first day the snapshots are taken, the number of snapshots per dataset will increase, 1 snapshot every 5 minutes. Over 1 hour you will have about 12 snapshots.
Over 3 hours, you will have 36 snapshots. If you were to specify a lifespan covering a year's worth of snapshots, the number of snapshots would keep growing at a rate of 1 every 5 minutes and you could end up with thousands of them.
But because, in this example, the lifespan is only 3 hours, snapshots older than 3 hours will be destroyed by ZFS on the source server.
So the maximum number of snapshots per dataset shouldn't exceed 36 with such an automatic snapshot task.


On the first replication, the source only sees those 36 snapshots and starts sending them to the remote server. Lots of data at first, but this is fine. As replication takes place, those snapshots are going to be replicated.
When replication takes place, I believe ZFS places a temporary hold on the snapshot being replicated. This means neither that snapshot nor the snapshots after it will be deleted, even once they have lived longer than their expected lifespan. This is a safety net from ZFS. But as soon as the replication is complete, the temporary hold will be removed and the old snapshots destroyed.




1) If you have "Delete Stale Snapshots...." enabled, the replication process will force snapshots that have outlived their expected lifespan to be deleted, but the replication should still start properly, because ZFS, when looking for an incremental snapshot, needs to find the youngest common snapshot present on both source and destination.
In your case, possibly all the snapshots that exist on the destination no longer exist on the source, and as a result, I suspect the replication task is forcing a full replication and destroying the content of the replicated data in order to start over.

2) Depending on the number of snapshots, it is possible the replication task is scanning the source and the destination server to look for common snapshots, and maybe this is a slow process because it can't reconcile the snapshots easily.

The reason I asked about the HDD (not HHD, as you pointed out) was to look for SMR drives. Those can add significant delays to replication when snapshots are being destroyed, and that could have accounted for longer-than-normal operations. It seems I missed that part of your original post.

I suspect case 1) is the reason for your problem.
I think the replication is looking for a snapshot it can't find, and as a result it wipes the data on the destination in order to start fresh.


The answer would be to increase the lifespan of the snapshots, but I would not make it only 2 weeks either; I would make it a few weeks or months to be safe.
Or, if you don't, you can add another automatic snapshot task with a longer lifespan but a longer interval, such as every 12 hours with a lifespan of 6 months.
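
A quick way to check whether the two boxes still share a common snapshot (the dataset name below is a placeholder for whatever you replicate) is to list the most recent automatic snapshots on each side and compare the names:

Code:
# Run on both the source and the destination
zfs list -t snapshot -o name,creation -s creation -r Aincrad/yourdataset | tail

If the newest snapshot on the destination no longer exists on the source, the next run has no common snapshot to send an increment from and has to start over.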
 

XStylus

Dabbler
Joined
Nov 22, 2017
Messages
20
I would also consider turning off the compression, since you're not trying to save bandwidth here.

Not a bad idea. Done.


Can you actually see how fast replication is running in Mbps?

Here's a screenshot, after turning off encryption. Only traffic happening is between the two FreeNAS units.

Screen Shot 2019-05-01 at 12.42.13 PM.png
Screen Shot 2019-05-01 at 12.46.44 PM.png


That still seems slow to me. On an 8-drive RAID1+0 array (or whatever the ZFS name for that is), I'd expect transfer speeds to be north of 500 MB/s at worst. Is there something I've overlooked?

What kind of data?
Is this for block storage?

It's an SMB share.


Hard to screw up is a matter of perspective.
The answer would be to increase the lifespan of the snapshots, but I would not make it only 2 weeks either; I would make it a few weeks or months to be safe.
Or, if you don't, you can add another automatic snapshot task with a longer lifespan but a longer interval, such as every 12 hours with a lifespan of 6 months.

This is a proof-of-concept system that will eventually grow to 300TB in its final form, and will have very large video files moving on and off the system regularly. Having snapshots with very long lifespans will tie up large amounts of data for long amounts of time.

I've increased the snapshot lifespan to one week, but that's as long as I'd dare go.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
This is a system that will eventually grow to 300TB, and will have very large video files moving on and off the system regularly. Having snapshots with very long lifespans will tie up large amounts of data for long amounts of time.

I've increased the snapshot lifespan to one week, but that's as long as I'd dare go.
If you don't want to retain the files for very long, then you want to replicate them as soon as possible, rather than waiting a week to go by.
I have the feeling you think a snapshot contains the actual files. Is that the case?
If so, you want to read up on snapshots.
I think you are still better off extending the lifespan of the snapshots, especially if you plan on growing to 300TB.
At your current rate it would take around 12 days to replicate the entire content of the pool if it was full.

I think, in your case, replication may not be the right thing to do.
Maybe rsync or the like could help in this area, with a script deleting the files on the source as soon as they have been synced to the remote server.

Unless you misunderstood the concept of snapshots and you do want to retain the files on the source as well.
 

XStylus

Dabbler
Joined
Nov 22, 2017
Messages
20
If you don't want to retain the files for very long, then you want to replicate them as soon as possible, rather than waiting a week to go by.

A week is way too long. I set it to replicate nightly. If I had said weekly, I misspoke. My apologies.


I have the feeling you think a snapshot contains the actual files. Is that the case?

My understanding is that a snapshot (on a single machine) is a block-level snapshot of the contents of the system at a specific moment in time. Subsequent snapshots are just diffs of the blocks that have changed in relation to the previous snapshot.

When replicated to a secondary machine, my understanding is that it must transfer everything on the first replication, and subsequent replications are just copying over the diffs.

Is my understanding correct?


I think you are still better off extending the lifespan of the snapshots, especially if you plan on growing to 300TB.

We're a post-production house. We have projects that are often several terabytes in size, and we're juggling a hundred of them at any one time. When a project is done, we offload it to LTO so as to reclaim storage space.

If I extend the lifespan of the snapshots, removing a project from the FreeNAS won't actually reclaim the space it occupied until the snapshots have expired.


At your current rate it would take around 12 days to replicate the entire content of the pool if it was full.

To which I ask, why is it replicating so slowly?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
A week is way too long. I set it to replicate nightly. If I had said weekly, I misspoke. My apologies.
This is what you wrote:
I have set a replication task to do a weekly snapshot replication from the primary FreeNAS unit to the secondary FreeNAS unit. The task takes several days to complete.
With ZFS and FreeNAS, replication should occur as soon as snapshots have been created, so there is no need to wait for a weekly replication. But you can specify time windows to limit when the replication task runs. This is useful if you want to delay replication to late at night when users are not accessing the files.

My understanding is that a snapshot (on a single machine) is a block-level snapshot of the contents of the system at a specific moment in time. Subsequent snapshots are just diffs of the blocks that have changed in relation to the previous snapshot.
Somewhat correct. Snapshots are just pointers to blocks. They describe which blocks are used, so snapshots themselves don't take much space. However, the blocks they point to contain the content of your files and whatnot.
By themselves, a large number of snapshots will have very little effect on storage consumption. A longer snapshot lifespan, however, also means longer preservation of blocks: even if a file was deleted long ago, its blocks will still be there until they are no longer pointed to by any snapshot.
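
If you want to see how much space snapshots are actually pinning, something like this (dataset name is a placeholder) shows the breakdown per dataset and per snapshot:

Code:
zfs list -o name,used,usedbysnapshots,usedbydataset -r Aincrad
zfs list -t snapshot -o name,used,referenced,creation -r Aincrad/yourdataset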

When replicated to a secondary machine, my understanding is that it must transfer everything on the first replication, and subsequent replications are just copying over the diffs.

Is my understanding correct?
Correct, but if the first replication takes more than 3 days to complete and the lifespan of the snapshots is less than 3 days, then when it is time to perform the incremental replication, both snapshots must still exist for ZFS to be able to proceed. If it can't find them, it might destroy the remote snapshots and start from scratch.

To know whether that is the case, you should monitor the pool capacity on the remote side to see if the dataset being replicated drops to 0.
If the graph shows a sawtooth-like shape, replication is starting from scratch each time, meaning that the data is being sent over again.
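
One way to watch for that sawtooth from the command line (dataset name is a placeholder) is to log the destination's usage at intervals:

Code:
# Run on the destination box: one timestamped sample every 10 minutes
while true; do
    date
    zfs list -o name,used,available Aincrad/yourdataset
    sleep 600
done >> /tmp/replication-usage.log

If the used figure repeatedly drops back toward zero, the replication is starting over rather than sending increments.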

We're a post-production house. We have projects that are often several terabytes in size, and we're juggling a hundred of them at any one time. When a project is done, we offload it to LTO so as to reclaim storage space.

If I extend the lifespan of the snapshots, removing a project from the FreeNAS won't actually reclaim the space it occupied until the snapshots have expired.
Then I would maybe consider assigning each project its own dataset.
This is more maintenance on your part, but it will allow you to increase the snapshot lifespan, and when the project is completed and replication has been finalized, you can then destroy the dataset on the source. ZFS will then free all the blocks locked by the snapshots for that dataset.
The benefit of having a dataset per project is that you can recover or replicate projects individually as needed, as opposed to replicating an entire pool just to recover a few files.
This is still debatable in your case, but it should open up some options. A rough sketch of that workflow is below.
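
The dataset and project names here are made up for illustration:

Code:
# One dataset per project, each with its own snapshot/replication task
zfs create Aincrad/projects
zfs create Aincrad/projects/project_x

# ...work on the project, snapshot and replicate it on its own schedule...

# Once the project is archived to LTO and the replica is confirmed,
# destroy the dataset and all of its snapshots to reclaim the space immediately
zfs destroy -r Aincrad/projects/project_x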


To which I ask, why is it replicating so slowly?
Well, it all depends on what you define as slow.
Is it the actual transfer rate of about 335MB/s that is the issue, or the fact that incremental replication seems to take as long as the first replication?
The answers to your questions can be found in my replies above.
In your case, if you are dealing with terabyte-sized files across 100 projects, and those files are being modified, you might end up with huge block changes. I don't really believe that to be the case, as your pool is only half full. But, just for the sake of understanding the issue: it is possible you are making continuous changes to the various project files, and those changes are being held by snapshots for up to 3 days. At the same time, because the lifespan of your snapshots is only 3 days, every snapshot past that deadline is destroyed and its blocks freed, provided they are not locked by another snapshot.
So it is conceivable that what should have been a small incremental replication still requires a huge amount of data to be transmitted, because of the data referenced by the newer snapshots.

You can verify this by running an incremental replication with verbose output, but for that you need to run it from the CLI. I don't know whether replication done through the GUI shows you the expected amount of data to be sent out.
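
For example, a dry-run send reports how much data the next increment would transfer without actually sending it (the snapshot names below are placeholders for your real auto-snapshots):

Code:
zfs send -nv -i Aincrad/yourdataset@auto-20190428 Aincrad/yourdataset@auto-20190501

If the estimated size is close to the whole dataset rather than to a day's worth of changes, the incremental chain is effectively broken.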

To monitor traffic, your original approach is not really the most useful.
What you want is to use Netdata (under Services) to monitor metrics over the last 24 hours. You want to be able to monitor dataset capacity usage as well as network activity over that period. This will give you more useful information than the network bandwidth graph on the dashboard.
You can also look at the RRD graphs in the old GUI, which are much better than the new Angular-based GUI.
 
Joined
Jul 3, 2015
Messages
926
So to start with, 335MB/s is not bad going even with 10Gb networking. I have several systems with dual 10Gb replicating daily and never see more than 500MB/s, although anything between 250 and 500 is normal. You will find that your limiting factor is either sshd and/or compression. Next time replication starts, take a look at your system and see what resources are being used and by what. If you have a search on the forums there is a guide on how to improve replication speed using netcat; although interesting, it's still not something I do, as the default works fine for me.
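
For reference, the netcat approach pipes the send stream over a raw TCP connection instead of ssh, which removes the encryption overhead entirely. A minimal sketch, with placeholder host, port, dataset and snapshot names; note the stream is unencrypted:

Code:
# On the destination: listen and receive (-F rolls the target back to the last common snapshot)
nc -l 8023 | zfs receive -F Aincrad/yourdataset

# On the source: send the increment to the destination
zfs send -i Aincrad/yourdataset@auto-old Aincrad/yourdataset@auto-new | nc 192.168.1.20 8023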

I think replication would work fine for what you are wanting to do. I have many systems, up to 500TB each, with TBs added every day, and have used replication as my backup/second copy of the data for years. The important part is getting your snapshot schedule right along with your replication times. Snapshots consume very little space unless you have users adding TBs of data and then deleting it all within a day or two. If keeping snapshots for a week or more is not acceptable to you, then forget it, as it's not going to work. I have four snapshot schedules: hourly kept for a day, daily for two weeks, weekly for a month and monthly for a year. The only one that catches me out sometimes is monthly for a year, and all I do is manually purge some of those as and when required, but in your case you could just leave that schedule out.

It really does seem that your issue is simply snapshots expiring too soon, and I think with a small tweak you will get exactly what you are after.

Best of luck
 