Replication Enhancement - Progress reporting - for community comment

Status
Not open for further replies.

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
I have been working on numerous enhancements to replication (for 9.1 release code) with the aim of getting them incorporated upstream; however, before I open a pull request I would appreciate community code review/testing, especially of error situations.

The following changes have been tested for several weeks and appear to be working as intended:
- Replication status included on the Storage/ZFS Snapshots tab, distinguishing between replica, latest, new and in-progress transfers (part of ticket #778)
- Replication progress for in-flight transfers shown as a % of the data to transfer (see the sketch after this list)
- If replication fails (for whatever reason), replica snapshots are no longer ALL auto-expired on the replication server (if it is also performing snapshots), which previously forced a full resync (#2115)
- Expired replica snapshots on the replica server (excluding the latest one) can be removed by running autosnap.py from cron. This is an alternative to keeping the primary and replica servers in sync (#388 appears to be a legacy ticket which can be closed)
- Also in testing: the ability to have multiple snapshot schedules on the same dataset is working as expected (e.g. hourly retained for 12 hours, daily for 5 days, etc.) (ticket #1646). NB: I think this was due to upstream changes, though I also stopped referencing a DB field which may become invalid in this use case
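
For the in-flight progress figure, the general shape of the calculation is: ask zfs send for a dry-run size estimate, then count bytes as the real stream is relayed. The following is a minimal Python 3 sketch of that idea, not the patch itself; -n (dry run) and -P (parsable output) support varies by ZFS version, and all names are illustrative.

Code:
# Hypothetical sketch: derive percent complete from a dry-run size estimate.
import subprocess

def estimate_stream_size(snapshot, from_snapshot=None):
    """Return the estimated size in bytes of the send stream."""
    cmd = ['/sbin/zfs', 'send', '-nP'] + \
          (['-i', from_snapshot] if from_snapshot else []) + [snapshot]
    res = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # The parsable dry run ends with a line such as: "size<TAB>439048871216"
    for line in (res.stdout + res.stderr).splitlines():
        fields = line.split('\t')
        if fields[0] == 'size':
            return int(fields[1])
    raise RuntimeError('no size estimate in zfs send -nP output')

def relay_with_progress(snapshot, sink, from_snapshot=None):
    """Pipe the send stream into `sink` (e.g. the stdin of an ssh process
    running `zfs receive`), reporting percent complete as it goes."""
    total = estimate_stream_size(snapshot, from_snapshot)
    cmd = ['/sbin/zfs', 'send'] + \
          (['-i', from_snapshot] if from_snapshot else []) + [snapshot]
    sent = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(1 << 20)  # relay 1 MiB at a time
            if not chunk:
                break
            sink.write(chunk)
            sent += len(chunk)
            # The real feature would record this for the GUI; printing
            # is enough to show the calculation.
            print('%.1f%% (%d/%d)' % (100.0 * sent / total, sent, total))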

Items outstanding:
- Amend the function which deletes a periodic ZFS replication so that it also removes the 'replica' status on replication-server snapshots (essential)
- Update the ZFS snapshot screen every x seconds, to show replication progress as it updates (desirable)


The code is on the https://github.com/noprobs/freenas/tree/repl-progress branch. The following files have been changed (and should be copied to a test server):
gui/common/__init__.py
gui/freeadmin/api/resources.py
gui/middleware/notifier.py
gui/middleware/zfs.py
gui/templates/storage/snapshots.html
gui/tools/autosnap.py
gui/tools/autorepl.py


Feedback appreciated!

FYI, other development in progress/under consideration:
1) Allow an option to make a replication inactive (rather than deleting it)
2) Enable the primary and replication servers to keep snapshots for different periods, e.g. keep for 1 day on the primary server and 5 days on the replica server
3) Improve control over replication bandwidth, to allow e.g. 1 Mbps at any time and 10 Mbps overnight
4) Remove snapshots which have zero size (see fracai's script and the sketch after this list)
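
For item 4, the idea behind fracai's script can be sketched as follows; this is my own illustration, not his code, and it assumes zfs list supports -p for exact byte counts. Note that a USED of zero only means the snapshot holds no space exclusively at that moment.

Code:
# Hypothetical sketch of pruning zero-size snapshots. Dataset name is
# illustrative. -H strips headers, -p prints exact byte counts.
import subprocess

def prune_empty_snapshots(dataset, dry_run=True):
    out = subprocess.check_output(
        ['/sbin/zfs', 'list', '-H', '-p', '-t', 'snapshot',
         '-o', 'name,used', '-r', dataset], text=True)
    for line in out.splitlines():
        name, used = line.split('\t')
        # USED == 0 means no blocks are held exclusively by this snapshot,
        # so destroying it frees nothing but reduces clutter.
        if int(used) == 0:
            if dry_run:
                print('would destroy', name)
            else:
                subprocess.check_call(['/sbin/zfs', 'destroy', name])

prune_empty_snapshots('tank/Storage1')  # defaults to a dry run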


To achieve the above I will need to create a new field in the SQLite DB. I have yet to investigate how to do this without breaking version upgrades - comments appreciated.
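
On the schema question: the FreeNAS 9.x GUI is Django-based and at that time used South for schema migrations, so the usual route is a migration that adds the column with a default, which keeps existing databases upgradeable. A hypothetical sketch (the table and column names are illustrative, not the real FreeNAS schema):

Code:
# Hypothetical South migration sketch; table/column names are illustrative.
from south.db import db
from south.v2 import SchemaMigration
from django.db import models

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Adding the column with a default means rows created by older
        # versions stay valid, so version upgrades keep working.
        db.add_column('storage_replication', 'repl_enabled',
                      models.BooleanField(default=True),
                      keep_default=False)

    def backwards(self, orm):
        db.delete_column('storage_replication', 'repl_enabled')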
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm not a developer, so don't take my comments as official input...

Why not put a ticket in at support.freenas.org and include the files? Let them get incorporated into the official alpha build and then it'll get tested as the beta and release candidates are tested.
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
OK, I have hopefully finished my coding, and I have been testing in my setup for a few weeks.

I am nearly ready to open a pull request on GitHub (hopefully for inclusion in the release after 9.1); however, given the extent of the changes (and therefore the potential for error), I would welcome feedback from the community/iXsystems.

This is my branch https://github.com/noprobs/freenas/compare/repl-enhance

This is a summary of my changes, with references to (partly) related tickets:

- freenas:state values extended to show 'Replicated', 'Replica', 'Queued', 'In_Progress' and 'Latest_Replica'. This is shown as the replication status on the main snapshots template (part of ticket #778)
- Show the progress of an individual replication as % complete, again on the status screen. NOTE: you have to refresh the screen to watch it increment
- Added an option to not preserve the filesystem path on replication (zfs receive -e), i.e. enable Local/Data/Subdata to replicate to Remote/Repl/Subdata (#1851, #2062)
- Changed the method of determining which snapshots need to be sent: the list is now built from the difference between the local and remote filesystems (see the sketch after this list). Numerous improvements:
- Replication does not fail if some snapshots are removed from the remote or the LATEST flag is removed/changed
- Can now replicate to multiple remote servers (#1455)
- Better recovery when remote and local differ, without forcing a total resync
- Support for multiple different snapshot schedules on the same filesystem, e.g. every 5 mins for 1 hour, every 1 hour for 24 hours, etc. (NOTE: this was mainly a result of a sorting change in 9.1 rather than my code) (ticket #1646)
- Fixed a bug where, if replication fails for a period (e.g. a network issue), in some circumstances all replica snapshots could be expired, causing replication to fail and a full re-send to be needed (ticket #2115)
- Added a new recursion option on replication which performs the recursion via a loop. This forces a check that all child replications worked correctly, sets freenas:state, and also allows additional non-recursed child snapshots to be replicated (#1999, #2360)
- Fixed a bug (#2359) by changing the initialise-once routine to destroy all snapshots (vs. deleting the remote dataset) unless there is a clone.
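
On the 'difference between local and remote' point above: the core of the idea can be sketched as a set comparison of the two snapshot lists, followed by incremental sends from the newest common snapshot. A minimal, hypothetical illustration, not the actual autorepl.py code; host/dataset names are illustrative and error handling is omitted.

Code:
# Hypothetical sketch: decide what to send by diffing local and remote
# snapshot lists, rather than trusting a LATEST marker.
import subprocess

def snapshots(dataset, remote=None):
    """Return snapshot names (oldest first), optionally over ssh."""
    cmd = ['/sbin/zfs', 'list', '-H', '-t', 'snapshot', '-o', 'name',
           '-s', 'creation', '-d', '1', dataset]
    if remote:
        cmd = ['/usr/bin/ssh', remote] + cmd
    out = subprocess.check_output(cmd, text=True)
    return [line.split('@', 1)[1] for line in out.splitlines()]

def plan_sends(local_ds, remote_host, remote_ds):
    """Return a list of (from_snap, to_snap) send steps; from_snap of
    None means a full (non-incremental) send is needed."""
    local = snapshots(local_ds)
    remote = set(snapshots(remote_ds, remote=remote_host))
    common = [s for s in local if s in remote]
    if not common:
        return [(None, local[-1])]   # no common base: full send
    base = common[-1]                # newest snapshot both sides have
    idx = local.index(base)
    # Incremental sends for everything newer than the common base.
    return [(local[i], local[i + 1]) for i in range(idx, len(local) - 1)]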

Thanks in advance
 

mstrent

Dabbler
Joined
Oct 11, 2012
Messages
21
I want to give you a huge THANKS! I have been testing FreeNAS 9, hoping to purchase a TrueNAS and use a FreeNAS box as a backup. The replication issues you are addressing are at the top of my list of concerns before implementing FreeNAS/TrueNAS in my business production environment.

Here's hoping iXsystems and yourself can get this stuff merged ASAP. I'll be eagerly watching this. Thanks again!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
noprobs -

Excellent work. Did you put a ticket in for this at support.freenas.org? If you don't, nobody will know you've done all this great work. :)
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
mstrent - yes, this was my commit to GitHub, and I have a pull request (actually 3) open to iXsystems. Hopefully they will get a chance to review and provide feedback soon. It was quite a sizable change, so I am guessing they will have quite a few suggestions for code improvements before it is ready for merging. One other niggle is that it is not a clean merge, due to other changes since I checked out the code. I am hoping to get it into master soon, to give time for wider validation.

I also want to refresh my repository on 9.1 release to start working on the next enhancement: different retention periods for replicated and local snapshots.
 

Joe H

Cadet
Joined
Sep 9, 2013
Messages
1
Noprobs, I want to add my thanks here too. I've been thinking about how to refactor the replication and snapshot features. Looking forward to seeing these features merged.
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
Curious to know if this is moving along. A graph for replication status would be awesome.
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
I had a working version (in dev); however, after some feedback on the code (potential issues I had not considered), as well as planned progress reporting native to ZFS, I backed away from the progress reporting, though the other elements were 'near complete'. I was under the impression that the replicator code was mid-rewrite by iXsystems, so I stopped work. Via a separate email to the dev list I will try to get a view from iXsystems on whether I should dust it down and make it available.

The other feature I have coded for replication, and have working in my own production, is compression of the replication data streams.
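
For anyone curious, stream compression usually amounts to wrapping the send/receive pipeline in a compressor on the wire. A hypothetical sketch of the shape of it, assuming gzip on both ends; this is an illustration, not the actual code, and all names are illustrative.

Code:
# Hypothetical sketch of a compressed replication pipeline:
#   zfs send | gzip | ssh remote "gunzip | zfs receive"
import subprocess

def replicate_compressed(snapshot, remote_host, remote_ds):
    send = subprocess.Popen(['/sbin/zfs', 'send', snapshot],
                            stdout=subprocess.PIPE)
    gzip = subprocess.Popen(['/usr/bin/gzip', '-3'],   # low level: cheap CPU
                            stdin=send.stdout, stdout=subprocess.PIPE)
    send.stdout.close()  # let zfs send see SIGPIPE if gzip dies
    # -F (force rollback on the receive side) is one common choice here.
    recv = subprocess.Popen(
        ['/usr/bin/ssh', remote_host,
         'gunzip | /sbin/zfs receive -F %s' % remote_ds],
        stdin=gzip.stdout)
    gzip.stdout.close()
    recv.communicate()
    return send.wait() == 0 and gzip.wait() == 0 and recv.returncode == 0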
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, this was integrated in 9.2.1.5, I believe. However, it is broken on all 9.3.1 and 9.10 releases as far as I know.

Someone really should put in a bug ticket for it, as it should be fixed.
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
Turns out you can see the progress in the shell:
Code:
% ps aux | grep "zfs:"
root          84090    7.4  0.0  48608   3724  -  S     8:05AM     5:03.85 zfs: sending tank1/Storage1/VMLinks@auto-20160615.1800-100y (84%: 371758245800/439048871216) (zfs)
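
If you want that number programmatically rather than by eyeballing ps, the percentage can be scraped from the same process title. A quick hypothetical sketch:

Code:
# Hypothetical sketch: scrape the percent-complete that FreeBSD's zfs send
# reports in its process title, as shown in the ps output above.
import re
import subprocess

def send_progress():
    """Yield (snapshot, percent) for every in-flight zfs send."""
    out = subprocess.check_output(['ps', 'axww', '-o', 'command'], text=True)
    for m in re.finditer(r'zfs: sending (\S+) \((\d+)%:', out):
        yield m.group(1), int(m.group(2))

for snap, pct in send_progress():
    print('%s is %d%% sent' % (snap, pct))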
 