Replication is sending 95GB of Data for a 33MB snapshot

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
Good Day Ladies, Gentlemen,

We kept our replication server on the local network for a long time, so that it could use the gigabit network to replicate a few terabytes.
Then we were confident we could move it out of the office for secure replication over a 50Mbps link.
Replication sends only changes, right? That should be quite small... well, 33MB is small enough.

Now what! How come the replication task is sending 95GB of data for a 33MB snapshot...
All replication tasks were created with 11.2 many months before moving the target server.

Here's the Snapshot:
GE1-Snap.JPG

Here's the Replication Task running :
GE1-Repl.JPG


The remote server was in sync before we moved it...

I just don't get it... The command "zfs list -o space -r Vol1" lists all the snapshots, and even if replication had to send all of them, the total is less than 1GB.

Vol1/ge1-home@auto-20201123.1200-1d - 33.0M - - - -
Vol1/ge1-home@auto-20201123.1300-1d - 15.3M - - - -
Vol1/ge1-home@auto-20201123.1400-1d - 42.7M - - - -
Vol1/ge1-home@auto-20201123.1500-1d - 44.1M - - - -
Vol1/ge1-home@auto-20201123.1600-1d - 32.8M - - - -
Vol1/ge1-home@auto-20201123.1700-1d - 21.5M - - - -
Vol1/ge1-home@auto-20201123.1800-1d - 61.6M - - - -
Vol1/ge1-home@auto-20201123.2000-1w - 2.51M - - - -
Vol1/ge1-home@auto-20201123.2031-1m - 2.51M - - - -
Vol1/ge1-home@auto-20201124.0800-1d - 41.3M - - - -
Vol1/ge1-home@auto-20201124.0900-1d - 39.2M - - - -
Vol1/ge1-home@auto-20201124.1000-1d - 31.0M - - - -
Vol1/ge1-home@auto-20201124.1100-1d - 58.1M - - - -
Vol1/ge1-home@auto-20201124.1200-1d - 24.0M - - - -
Vol1/ge1-home@auto-20201124.1300-1d - 24.9M - - - -
Vol1/ge1-home@auto-20201124.1400-1d - 25.2M - - - -
Vol1/ge1-home@auto-20201124.1500-1d - 29.2M - - - -
Vol1/ge1-home@auto-20201124.1600-1d - 18.6M - - - -
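
One way to check the "less than 1GB" figure is to add up the USED column of the listing above. This is a minimal Python sketch, assuming the `zfs list -o space` column order (NAME, AVAIL, USED, ...) and binary units; the function names are made up for illustration. Keep in mind that a snapshot's USED counts only the blocks unique to that snapshot, so the sum is a rough indicator and not the size of the replication stream.

```python
# Sketch: total the USED column for the snapshots in a `zfs list -o space` listing.
# Assumption: sizes use binary suffixes (K, M, G, T), as ZFS prints them.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

LISTING = """\
Vol1/ge1-home@auto-20201123.1200-1d - 33.0M - - - -
Vol1/ge1-home@auto-20201123.1300-1d - 15.3M - - - -
Vol1/ge1-home@auto-20201123.1400-1d - 42.7M - - - -
Vol1/ge1-home@auto-20201123.1500-1d - 44.1M - - - -
Vol1/ge1-home@auto-20201123.1600-1d - 32.8M - - - -
Vol1/ge1-home@auto-20201123.1700-1d - 21.5M - - - -
Vol1/ge1-home@auto-20201123.1800-1d - 61.6M - - - -
Vol1/ge1-home@auto-20201123.2000-1w - 2.51M - - - -
Vol1/ge1-home@auto-20201123.2031-1m - 2.51M - - - -
Vol1/ge1-home@auto-20201124.0800-1d - 41.3M - - - -
Vol1/ge1-home@auto-20201124.0900-1d - 39.2M - - - -
Vol1/ge1-home@auto-20201124.1000-1d - 31.0M - - - -
Vol1/ge1-home@auto-20201124.1100-1d - 58.1M - - - -
Vol1/ge1-home@auto-20201124.1200-1d - 24.0M - - - -
Vol1/ge1-home@auto-20201124.1300-1d - 24.9M - - - -
Vol1/ge1-home@auto-20201124.1400-1d - 25.2M - - - -
Vol1/ge1-home@auto-20201124.1500-1d - 29.2M - - - -
Vol1/ge1-home@auto-20201124.1600-1d - 18.6M - - - -
"""


def parse_size(text):
    """Convert a ZFS human-readable size such as '33.0M' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # plain byte count, no suffix


def total_snapshot_used(listing):
    """Sum the USED column (third field) over all snapshot rows."""
    total = 0.0
    for line in listing.splitlines():
        fields = line.split()
        # Snapshot rows are the ones whose name contains '@'.
        if len(fields) >= 3 and "@" in fields[0]:
            total += parse_size(fields[2])
    return total


total = total_snapshot_used(LISTING)
print(f"{total / 1024**2:.2f} MiB")  # prints 547.52 MiB
```

So the 18 snapshots listed add up to a bit more than half a gigabyte of unique data, nowhere near 95GB.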

Can anyone enlighten me on this!??
 
Joined
Jan 7, 2015
Messages
1,155
It might be snapshots.
 

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
Good Day John ...

Even sending all the snapshots for the dataset... the total is well below 1GB...
It's sending 95GB!!!

After the transfer succeeds... the snapshot on the remote server is indeed 33MB!

I must be doing something wrong... I can't believe this functionality is not working properly... :(
 
Joined
Jan 7, 2015
Messages
1,155
Hmm. Not sure then, someone else will chime in on this.
 

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
I'm still puzzled... ;) Maybe it's catching up on old snapshots for some reason... But the GUI and command line gave no clue... :(

Thanks John !! Be safe, keep being Wise! ;-)
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
Running into the same issue as the OP, with an off-site replication task. It is sending a snapshot that shows "used" as 0 and "referenced" as 30 GB. That snapshot is dated a day after another one that also shows "used" 0 and "referenced" 30 GB. Yet replication is seemingly sending all 30 GB over again, even though the original 30 GB was previously sent and the snapshot is literally not taking any space, and then the destination is seemingly deleting the data.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
It's difficult to diagnose from here.... 33MB referencing 3.19TB is fairly extreme. Is this what you think is the situation with that dataset?
Is there much read or write activity?

I notice you are running FreeNAS 11.3-U3.1 and then replicating to TrueNAS 12. It would be better to have them at the same level, since most testing is done between systems of the same software version. If the next replication is also slow, you could report a bug.
 

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
Good Day Captain!! :)

We're actually sending a snapshot every hour... So for this dataset, 33MB is OK... for 1 hour.
And I noticed that MOST replications send a huge amount of data compared to the actual snapshot...
As it's a built-in functionality... I first thought I was doing something wrong, or that just switching subnets could have caused this...
I use rsync with other Linux boxes, and it's quite stable and fast...
I guess I will report this as a bug.
I may end up trying Syncthing... We have had our share of trouble with replication!
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks... I'd recommend you update to 11.3-U5 first and see if the issue still exists. That version is the most mature and tested and should have few, if any, major issues remaining.
 

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
Hello all!

I may have found something on this...

I suppose that there was some snapshot mismatch (no idea why) and the replication tried to resync from an old common snapshot.
And although the GUI and command line list a specific snapshot with a specific size, it was transferring data going way back to that common snapshot... way back, I mean a few months...
How come? I don't know... we just moved the server to a remote location, changed the IP/subnet and tried to replicate.

Thanks for your help! It does replicate fine now, although still day time... :(
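
To illustrate what I mean by the "old common snapshot": an incremental replication starts from the newest snapshot that exists on both sides, so everything written since that point has to cross the link, not just the latest hourly snapshot. This is only a sketch of the idea in Python, not the actual FreeNAS replication code. The function names are hypothetical, and so is the months-old snapshot name `manual-20200815`.

```python
# Sketch: find the incremental base and what has to be sent after it.
# Assumption: both lists hold snapshot names ordered oldest to newest.

def latest_common_snapshot(source_snaps, dest_snaps):
    """Return the newest snapshot name present on both sides, or None."""
    on_dest = set(dest_snaps)
    for name in reversed(source_snaps):
        if name in on_dest:
            return name
    return None


def snapshots_to_send(source_snaps, dest_snaps):
    """Return the source snapshots newer than the common base.

    With no common snapshot at all, everything must be sent (a full send).
    """
    base = latest_common_snapshot(source_snaps, dest_snaps)
    if base is None:
        return list(source_snaps)
    return source_snaps[source_snaps.index(base) + 1:]


# Hypothetical situation: the destination only shares a months-old snapshot.
source = [
    "manual-20200815",          # hypothetical old snapshot
    "auto-20201123.1200-1d",
    "auto-20201123.1300-1d",
]
dest = ["manual-20200815"]

print(latest_common_snapshot(source, dest))  # prints manual-20200815
print(snapshots_to_send(source, dest))
```

In a case like this the stream carries every change made since `manual-20200815`, which could explain a transfer far larger than the size shown for the newest snapshot.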
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for the update..... I can see why one update may require the old snapshot data as well. However, I'm surprised it happens every replication. If it does not self-correct, I would view that as a bug. However, it might be a result of the different versions of zfs between 11.3 and 12.0. We'd need to test after updating to 11.3-U5 and then 12.0-U1.
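
If you want to see how much a given replication would send before it runs, `zfs send` has a dry-run mode: `-n` sends nothing, `-v` prints the estimated stream size, and `-I` includes the intermediate snapshots between the two named ones. Here is a small Python sketch that only builds that command line. The helper name is made up, and the two snapshot names are simply taken from the listing earlier in the thread as an example pair.

```python
import shlex


def dry_run_send_command(dataset, base_snap, target_snap):
    """Build a `zfs send` dry-run command as an argument list.

    Nothing is executed here; run the printed command on the source
    system to get the estimated size of the incremental stream.
    """
    return [
        "zfs", "send",
        "-n",  # dry run: do not generate a stream
        "-v",  # verbose: print the estimated size
        "-I", f"{dataset}@{base_snap}",  # base, with intermediate snapshots
        f"{dataset}@{target_snap}",
    ]


cmd = dry_run_send_command(
    "Vol1/ge1-home",
    "auto-20201123.1200-1d",
    "auto-20201124.1600-1d",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Comparing that estimate against the newest snapshot common to both systems would show whether the large transfer is real data or something odd in the replication task.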
 

GEOptic

Dabbler
Joined
Apr 18, 2018
Messages
42
UPDATE:

Upgrading to TrueNAS 12.0-U1 on all systems cleared the replication issues. ;)
 