Site-to-site replication

RichR

Explorer
Joined
Oct 20, 2011
Messages
77
Hi.

My goal is site to site replication. I'm replacing current storage with new FN-cert hardware. I'm also totally changing the pool layout.

Background - I have a bunch of FN servers, and still administer 8 of them daily, but I've always run daily rsync scripts (well, cron runs them), which have never failed in 8 years. The great advantage of rsync for me is that I've always had remote duplicate filesystems ready to go if needed. The disadvantages are: 1) rsync is not atomic and takes a long time to run. I have millions of files, and even though I run several scripts simultaneously on different directories, it still takes a long time over a 10Mbit dedicated MAN connection. Realistically I can only run it daily, which shoots the new goal of < 1 hour RPO and RTO out the window, and means I could lose up to 24 hours' worth of data, which is not acceptable. 2) This is really just an extension of #1: there is no possible way I could effectively run rsync every 5 minutes, which is what I want to do with snapshots.
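Just to make it concrete, the kind of cron-driven rsync job I'm talking about looks roughly like this (hostnames and paths are made up for illustration, not my real layout):
Code:
# /etc/crontab - kick off one of the sync scripts at 1am every night
0 1 * * * root /root/scripts/sync_projects.sh

# /root/scripts/sync_projects.sh - push one directory tree to the remote box over ssh
#!/bin/sh
rsync -aH --delete --numeric-ids \
    /mnt/tank/projects/ \
    backup@dcb-nas:/mnt/tank/projects/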

****************************
For this discussion, assume I have 2x FN 11.2 servers in two different data centers. Also assume, and this is really the big one in my opinion, that I do not need to go back in time to pull out old files^^^. I want a replica in data center B (DCB) of what I have in data center A (DCA), or at least the ability to get to one fairly easily.

I'm thinking of having snapshot/replication tasks being:
  • every 5 minutes 8am - 9pm M-F
  • hourly 8am - 9pm Sat, Sun
  • hourly 10pm-7am daily
which basically gives me hourly snapshots, except during busy hours M-F (8am - 9pm) when I have 5-minute snapshots.

I've tested snapshots and replication (but not with this exact schedule) and find that at any time, I can go to the latest snapshot @ DCB, clone the snapshot, then promote it, and I seem to have a replica working filesystem of what's at DCA (which is what I want).
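For reference, the test on the DCB side boils down to a couple of commands (the pool/dataset/snapshot names here are just examples):
Code:
# newest replicated snapshot of the dataset on the DCB box
zfs list -r -t snapshot -o name -s creation tank/dataset1 | tail -1

# clone it to a writable filesystem, then promote the clone so it no
# longer depends on the snapshot it was created from
zfs clone tank/dataset1@auto-20190202.0900 tank/dataset1-restore
zfs promote tank/dataset1-restore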

Questions (round 1) (please keep ^^^ from above in mind)

1) Does a snapshot's diff refer to the previous snapshot based solely on the time it was taken, or to the previous snapshot of the same increment? For example, does the "hourly" snapshot taken at 10pm Friday night reference the previous 5-minute snapshot from 9pm, or the previous hourly one that was taken at 7am? I'm asking because I don't understand things like "keeping dailies for a week, weeklies for a month, monthlies for a year". If, for example, you're taking daily snapshots, is there really such a thing as a monthly snapshot?

2) What happens when you manually delete a snapshot, or when it's removed automatically? In some testing @ DCB, I had snapshots replicated automatically from DCA to DCB, with about 3GB in the first one, and then I didn't change any files @ DCA. I waited a while for a bunch of other snapshots (pretty much all 0's) to replicate from DCA to DCB, stopped the replication task @ DCA, then deleted all of them @ DCB except the last one, so all I had left was the last one. I cloned it, then promoted it. I'm trying to figure out how it seemed I had 3GB of data in the dataset. I figured if I deleted the first snapshot, the data would be gone.

This leads to figuring out "Snapshot Lifetime" in the Periodic Snapshot Task, and "Deleting Stale Snapshots" in Replication Tasks.

3) Ok then, so how long should I keep them? I'll have roughly 179 snapshots per day during the weekdays. I'm trying to understand the "maintenance" that I need to do. What do I need to do to keep things manageable, streamlined, and easy to "convert" if needed?

4) Does cloning, then promoting a snapshot "add" it to an existing filesystem if I have one? Meaning... let's say things are rolling along, and for some reason I have to clone then promote a snapshot. Fine. Then I get another snapshot. When I clone and promote that one, does it add to what's in the current dataset?

5) Once a clone is promoted, is there a reason to keep any of the old snapshots around?

6) Are there any maintenance "gotchas" that I have not thought about?

My goal is to test all of this out for the next couple of weeks on the two new systems before I start copying real data.

Many, many thanks.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Whether you do this by rsync or by snapshots is not really a difference in that respect; in either case, you are only moving the amount of data that has changed.
How much data is changing on a daily basis?
Does a snapshot's diff refer to the previous snapshot based solely on the time it was taken, or to the previous snapshot of the same increment?
I will try to answer with an example. If you take a snapshot at 9PM and another at 10PM, and nothing has changed in between, the amount of data 'used' by the second snapshot will be zero. A snapshot's space usage is only the difference from the previous snapshot.
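You can see that for yourself with a quick test; the dataset and snapshot names below are just examples:
Code:
# take a snapshot, change nothing, then take another one
zfs snapshot tank/dataset1@test-2100
zfs snapshot tank/dataset1@test-2200

# the second snapshot shows 0 in USED because nothing changed between them
zfs list -r -t snapshot -o name,used,refer tank/dataset1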
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What happens when you manually delete a snapshot, or when it's removed automatically?
Regardless of how the snapshot is deleted, deleting it frees that snapshot's reference to the blocks on disk. If any other snapshot (or the live filesystem) still references a block, the disk space is not yet freed. Once nothing references a block any more, it returns to free space.
I'm trying to figure out how it seemed I had 3GB of data in the dataset. I figured if I deleted the first snapshot, the data would be gone.
As long as there is a snapshot that has a reference to a data block, the data block is still allocated.
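If you want to know ahead of time how much space deleting a particular snapshot would actually give back, ZFS will tell you without touching anything (the snapshot name is just an example):
Code:
# dry run (-n) with verbose output (-v): prints how much space would be reclaimed
zfs destroy -nv tank/dataset1@auto-20190202.0500

# total space currently held only by snapshots of this dataset
zfs get usedbysnapshots tank/dataset1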
Ok then, so how long should I keep them? I'll have roughly 179 snapshots per day during the weekdays. I'm trying to understand the "maintenance" that I need to do. What do I need to do to keep things manageable, streamlined, and easy to "convert" if needed?
I keep snapshots, but I use them for a different purpose. I use rsync, run on an hourly schedule, to keep systems updated, but I don't know the details of your environment. Personally, I would think five minutes is too often.
I have a setup like this:
task - make a snapshot every hour and keep 24.
task - make a snapshot every day and keep 30.
task - make a snapshot every month and keep 12.
I also make the occasional manual snapshot. These give me the ability to go back and get a user's file if someone makes an accidental deletion or change that they later realize was a mistake. It has really saved my bacon a few times.
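In FreeNAS those three schedules are just three Periodic Snapshot Tasks in the GUI. If it helps to picture what they do, the rough command-line equivalent would be something like the cron entries below (the naming scheme is only illustrative, not what FreeNAS actually generates, and the GUI also handles expiring old snapshots for you):
Code:
# hourly, on the hour
0 * * * *   root  zfs snapshot tank/dataset1@hourly-`date +\%Y\%m\%d.\%H\%M`
# daily, offset a few minutes so it doesn't collide with the hourly
15 0 * * *  root  zfs snapshot tank/dataset1@daily-`date +\%Y\%m\%d`
# monthly, on the 1st
30 0 1 * *  root  zfs snapshot tank/dataset1@monthly-`date +\%Y\%m`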
Does cloning, then promoting a snapshot "add" it to an existing filesystem if I have one? Meaning... let's say things are rolling along, and for some reason I have to clone then promote a snapshot. Fine. Then I get another snapshot. When I clone and promote that one, does it add to what's in the current dataset?
You can mount a snapshot read-only without making a clone, so I wouldn't make a clone unless you really need a writable volume. I am not sure about your use of terminology, so please forgive me if I go into more detail than you need. If you make a clone, it can be mounted and is a writable volume with the same initial content as the snapshot. The clone and the snapshot actually share the same data blocks on disk, so initially the clone consumes no additional space. That changes when data is edited or added, because those blocks then differ between the original and the clone and take up space. You can also take snapshots of the clone. It gets complicated. I don't do this, because it is possible to have changes happen in both places at the same time, and in my mind that is just asking for problems.

If you promote the clone, the dependency is reversed: the snapshot the clone was created from now belongs to the clone, the clone's content becomes the active version of the file system, and the old original can then be deleted. We are still talking about links to blocks on disk, so promotion does not copy any data that is the same between the original and the clone. When the "original" is deleted, its links to blocks on disk are removed, and any block with no remaining links is marked as free space.
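To be clear about the "mount a snapshot read-only" part: every ZFS filesystem exposes its snapshots under a hidden .zfs directory, so you can usually just browse a snapshot without cloning anything. Assuming a dataset mounted at /mnt/tank/dataset1 (names are just an example):
Code:
# optionally make the .zfs directory show up in listings
# (it is reachable even while hidden)
zfs set snapdir=visible tank/dataset1

# read-only view of the dataset exactly as it was at that snapshot
ls /mnt/tank/dataset1/.zfs/snapshot/auto-20190202.0900/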
Once a clone is promoted, is there a reason to keep any of the old snapshots around?
No.
Are there any maintenance "gotchas" that I have not thought about?
It gets terribly complicated with clones. The reason for doing that is so you can have a live version of the data for production and another "live" version for testing and any changes made in the testing version are separate from the live data. Then if you decide to throw away the test version, you can just delete the clone. Being able to promote the clone is for the occasion when the test environment needs to become the primary environment and the 'old' environment can be deleted. I can see how it might be useful, but I don't see many uses for it.

I hope this was helpful.
 

RichR

Explorer
Joined
Oct 20, 2011
Messages
77
I keep snapshots, but I use them for a different purpose. I use rsync, run on an hourly schedule, to keep systems updated, but I don't know the details of your environment. Personally, I would think five minutes is too often.
I have a setup like this:
task - make a snapshot every hour and keep 24.
task - make a snapshot every day and keep 30.
task - make a snapshot every month and keep 12.
This schedule makes sense to me, but I want to dig a little further to clarify some points. To me, this seems like the minimum one would want in order to always have an hourly backup available with the smallest number of snapshots on hand. This kind of comes back to my question: "1) Does a snapshot's diff refer to the previous snapshot based solely on the time it was taken, or to the previous snapshot of the same increment?"
"If" incremental snapshots of the same dataset are performed regardless of increment, what do they use as their previous reference? Let's put aside your monthly for a minute, and say all of this has to do with snapshots for tank/dataset1, and you take your daily at 2am. If using the FN GUI, you'd set up 2 tasks (not including the replication task), one for hourly snapshots keeping for 1 day, and one for daily snapshots, keeping for 30 days.

My main questions are: [1] does your daily @ 2am use the previous daily as its reference for the diff, or the hourly from 1am, and is this determined by the dataset being snapshotted (both being tank/dataset1)?

[2] Let's say we're starting from scratch, and for example purposes are only doing hourly backups with a life of 24 hours.... (sorry, I'm going to write this out visually), and we're sending all these to a remote system.

4 am - tank/dataset1@auto-20190202.0400 empty - no data added
5 am - tank/dataset1@auto-20190202.0500 empty - write 1GB
6 am - tank/dataset1@auto-20190202.0600 empty - no data added
7 am - tank/dataset1@auto-20190202.0700 empty - no data added
8 am - tank/dataset1@auto-20190202.0800 empty - no data added
9 am - tank/dataset1@auto-20190202.0900 empty - no data added
10 am - tank/dataset1@auto-20190202.1000 empty - no data added
11 am - tank/dataset1@auto-20190202.1100 empty - no data added
12 pm - tank/dataset1@auto-20190202.1200 empty - no data added
1 pm - tank/dataset1@auto-20190202.1300 empty - no data added
2 pm - tank/dataset1@auto-20190202.1400 empty - no data added
3 pm - tank/dataset1@auto-20190202.1500 empty - no data added
4 pm - tank/dataset1@auto-20190202.1600 empty - no data added
5 pm - tank/dataset1@auto-20190202.1700 empty - no data added
6 pm - tank/dataset1@auto-20190202.1800 empty - no data added
7 pm - tank/dataset1@auto-20190202.1900 empty - no data added
8 pm - tank/dataset1@auto-20190202.2000 empty - no data added
9 pm - tank/dataset1@auto-20190202.2100 empty - no data added
10 pm - tank/dataset1@auto-20190202.2200 empty - no data added
11 pm - tank/dataset1@auto-20190202.2300 empty - no data added
12 am - tank/dataset1@auto-20190203.0000 empty - no data added
1 am - tank/dataset1@auto-20190203.0100 empty - no data added
2 am - tank/dataset1@auto-20190203.0200 empty - no data added
3 am - tank/dataset1@auto-20190203.0300 empty - no data added
4 am - tank/dataset1@auto-20190203.0400 empty - no data added
5 am - tank/dataset1@auto-20190203.0500 empty - no data added / tank/dataset1@auto-20190202.0400 is deleted
6 am - tank/dataset1@auto-20190203.0600 empty - no data added / tank/dataset1@auto-20190202.0500 is deleted

Ok, so 1GB of data was written to the dataset over 24 hrs ago, and the snapshot from that hour has now aged out and been deleted. We've added nothing since. What should the remote system have?

Thanks,

Rich
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
My main questions are: [1] does your daily @ 2am use the previous daily as its reference for the diff, or the hourly from 1am, and is this determined by the dataset being snapshotted (both being tank/dataset1)?

It probably helps to understand that ZFS has no concept of "hourly" or "daily" snapshots. Those are things for those dumb humans who use it. :smile:

So, for space accounting and for incremental replication, each snapshot is effectively a differential against whatever snapshot came immediately before it on that dataset. What you happened to call it is irrelevant.
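When replication runs, it is just an incremental send from whatever snapshot both sides already have to the newest one on the source, whatever those snapshots happen to be named. Roughly (names and hosts are only examples):
Code:
# send only the blocks that changed between the 1am snapshot and the 2am snapshot
zfs send -i tank/dataset1@auto-0100 tank/dataset1@auto-0200 | \
    ssh dcb-nas zfs recv -F tank/dataset1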
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
This is the way I understand it, but my understanding may not be perfect even though it works for me. I have been reading about it and using it for years, but I know that it is possible to use something and not have a perfect understanding of how it works. For example, I mostly understand the principles of the operation of my car, but I am not 100% on every aspect. That said, I encourage you to read and study more about ZFS.
My main questions are: [1] does your daily @ 2am use the previous daily as its reference for the diff, or the hourly from 1am, and is this determined by the dataset being snapshotted (both being tank/dataset1)?
It is kind of funny how snapshots work, because snapshots are really just collections of links (references) to blocks on disk, and as long as those blocks are referenced, they can't be marked as free. The 'size' of a snapshot is the amount of referenced data that is different from the previous snapshot, but the snapshot also holds links to all the other data that is the same.
"If" incremental snapshots of the same dataset are performed regardless of increment, what do they use as their previous reference?
I hope I can more clearly explain with an example.
I offset my schedules slightly so the hourly, daily, and monthly snapshots are not all taken at exactly the same moment. The next snapshot is always based on the previous snapshot, but each snapshot can actually stand alone. That means, for example, that if a snapshot from the Monthly schedule is made 3 minutes after the snapshot from the Daily schedule, the Monthly contains the difference not from the last Monthly, but from that Daily. Yet if that Daily gets deleted, the Monthly still links to all the data blocks that were referenced, so those blocks stay locked and are not freed. The amount of data shown as used by the Monthly snapshot might change when the Daily is deleted, because the Daily isn't locking those blocks any more. Blocks on disk are not freed until there are no more references to them.
Ok so we had a dataset created with 1GB of data over 24hrs ago, now it's deleted. We added nothing since. What should the remote system have?
All the snapshots taken after the snapshot that had the 1GB of data will also contain a reference to that 1GB of data, so if you delete the snapshot that 'contains' the 1GB, that reference will move to another snapshot. Deleting the snapshot only deletes the link, not the data.
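If you want to see where that 1GB is actually being held on the remote side, a couple of read-only commands will show it (the dataset name is just an example):
Code:
# space breakdown for the dataset: USEDSNAP is the space held only by snapshots
zfs list -o space tank/dataset1

# how much new data has been written since the most recent snapshot
zfs get written tank/dataset1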
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Here is a list of some of the snapshots on my system:
Code:
root@Emily-NAS:~ # zfs list -t snapshot
NAME                                                                                   USED  AVAIL  REFER  MOUNTPOINT
-snip-
Emily/BigPond@auto-20180916.1543-7d                                                   14.5M      -  9.22T  -
Emily/BigPond@auto-20180917.1543-7d                                                   12.0G      -  9.23T  -
Emily/BigPond@auto-20180918.1543-7d                                                    512K      -  9.25T  -
Emily/BigPond@auto-20180919.1543-7d                                                       0      -  9.25T  -
Emily/BigPond@auto-20180920.1543-7d                                                       0      -  9.25T  -
Emily/BigPond@auto-20180921.1543-7d                                                       0      -  9.25T  -
Emily/BigPond@auto-20180922.1543-7d                                                       0      -  9.25T  -
Emily/BigPond@manual-23Sep2018                                                        4.83G      -  9.25T  -
Emily/BigPond@manual-19Aug2018exit                                                    8.89G      -  9.93T  -
-snip-
freenas-boot/ROOT/11.1-U7@2017-09-07-18:48:48                                          736M      -   737M  -
freenas-boot/ROOT/11.1-U7@2017-09-26-18:05:27                                          725M      -   726M  -
freenas-boot/ROOT/11.1-U7@2017-12-13-22:17:24                                          727M      -   728M  -
freenas-boot/ROOT/11.1-U7@2018-01-18-22:10:58                                          825M      -   826M  -
freenas-boot/ROOT/11.1-U7@2018-02-22-18:40:19                                          825M      -   826M  -
freenas-boot/ROOT/11.1-U7@2018-03-24-00:24:12                                          832M      -   833M  -
freenas-boot/ROOT/11.1-U7@2018-06-01-00:02:38                                          836M      -   837M  -
freenas-boot/ROOT/11.1-U7@2018-09-09-15:28:31                                          838M      -   849M  -
freenas-boot/ROOT/11.1-U7@2018-12-27-05:41:49                                          838M      -   849M  -
freenas-boot/ROOT/11.1-U7@2019-01-27-11:03:32                                          780M      -   792M  -
freenas-boot/ROOT/9.10.2-U5@2016-10-04-05:53:17                                        164M      -   638M  -
freenas-boot/ROOT/9.10.2-U5@2016-11-07-21:58:46                                        160M      -   636M  -
freenas-boot/ROOT/9.10.2-U5@2016-11-12-11:13:47                                        160M      -   637M  -
freenas-boot/ROOT/9.10.2-U5@2017-03-11-00:51:47                                        165M      -   636M  -
freenas-boot/ROOT/9.10.2-U5@2017-04-22-14:12:05                                        652M      -   653M  -
freenas-boot/ROOT/9.10.2-U5@2017-06-10-20:40:24                                        653M      -   653M  -
freenas-boot/grub@Pre-Upgrade-FreeNAS-8bc815b059fa92f1c8ba7c7685deacbb                6.77M      -  6.79M  -
freenas-boot/grub@Pre-Upgrade-Wizard-2016-07-16_00:56:31                                18K      -  6.33M  -
freenas-boot/grub@Pre-Upgrade-9.10.1                                                    18K      -  6.33M  -
The way I understand this, the USED column is the space that is unique to that snapshot, i.e. what would be freed if only that snapshot were deleted; for snapshots taken one after another it often looks like "what changed since the previous snapshot", but it isn't exactly that.
I hope this helps.
 

RichR

Explorer
Joined
Oct 20, 2011
Messages
77
Here is a list of some of the snapshots on my system:
Code:
root@Emily-NAS:~ # zfs list -t snapshot
NAME                                                                                   USED  AVAIL  REFER  MOUNTPOINT
-snip-
The way I understand this, the USED column is the space that is unique to that snapshot, i.e. what would be freed if only that snapshot were deleted; for snapshots taken one after another it often looks like "what changed since the previous snapshot", but it isn't exactly that.
I hope this helps.

Thanks - so a little after 11am, I added some data to the lone dataset... I figured "USED" would show "something" for the snapshot taken at 12:00....
Why is "USED" @ 1200 "0" when data was clearly added? I'd like to think I'm not getting more confused, but it sure looks that way.

Code:
root@pod-12pri[~]# zfs list -t snapshot
NAME                                            USED  AVAIL  REFER  MOUNTPOINT
freenas-boot/ROOT/default@2018-12-18-18:01:08  1.82M      -   760M  -
pool1/backups/dataset2@auto-20190202.2209-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0300-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0400-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0500-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0600-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0700-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0800-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.0900-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.1000-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.1100-1d       0      -  4.18G  -
pool1/backups/dataset2@auto-20190203.1200-1d       0      -  5.41G  -
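Looking at the output again, REFER does jump from 4.18G at 11:00 to 5.41G at 12:00, so the new data was clearly captured by the 12:00 snapshot. From what I've been reading, USED on a snapshot only counts blocks that are unique to that snapshot (what would be freed if it alone were destroyed), and since the new data is still referenced by the live dataset, it shows up in REFER rather than USED. If I have this right, the 'written' properties are the way to see change-since-a-snapshot directly:
Code:
# data written to the dataset since the 11:00 snapshot was taken
zfs get written@auto-20190203.1100-1d pool1/backups/dataset2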
 