How can I verify that replication is working correctly?

Status
Not open for further replies.

indivision

Guru
Joined
Jan 4, 2013
Messages
806
I have set up snapshots on one FreeNAS server. Also, I have set up replication tasks, pointing to a second FreeNAS server.

I set up a dataset on the second server called "snapshots." However, it basically shows itself as empty. Is that normal? How do I know how much space the snapshot replications are taking up?

Also, under the "ZFS Replication" tab, I see my replication tasks listed. However, there is a "Last snapshot" column that stays blank for all replication tasks. Shouldn't that column list information about the last snapshots replicated?
 

scubaaadan

Cadet
Joined
Feb 14, 2013
Messages
2
My way of knowing whether replication is happening is to look at the processes and see if ssh is using more CPU than it typically should. For example, I believe I'm replicating right now because I see the ssh process at 10% on the sender and 15% on the receiver (i.e. higher than I expect), and I see a process called throttle (I configured the rate limit on the WAN replication). Finally, I can match the traffic pattern on the graphing page: higher than expected send (TX) traffic on one side matches the receive (RX) traffic on the other.

I wish there was better reporting for zfs replication. I'd love to know statistics and progress for the current replication task and a guess at how much time remains.

P.S. That column is blank for me too. (v8.3.0-RELEASE-p1)
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Thank you for your reply. Good to know that I'm not the only one with that column blank.

I have noticed that the traffic pattern matches in the graphs also. So, that seems like a good sign. I've just been a bit surprised that there doesn't seem to be an indication of how much space the replication takes up. So, I've taken that as a potential red flag. Also, I tried looking at the directories from the shell. I'm a little unsure of how permissions are set up and how to navigate around that. But, the directories look empty to me...
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
indivision, why not try cloning one of the snapshots on the replication target and seeing if you can actually recover your files from it. That's the real test of a backup, so that should be the one you do.
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Thank you Ben. Actually, for some reason, it hadn't occurred to me that replications would appear in the snapshots tab on that machine. So, I hadn't looked there. I had thought that it was more of a file back-up to transfer over and rebuild snapshots on the original machine.

I haven't tested by making a clone, etc. But, I can see all of the expected information there. So, it looks like it's working.
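For anyone who prefers the shell over the Snapshots tab, here is a minimal Python sketch of the same check, meant to be run on the receiving machine. It assumes Python 3.7+ is available there and that the installed `zfs list` supports `-H`, `-p`, `-r`, `-t snapshot` and `-o name,used`. The dataset and snapshot names in the sample are hypothetical and only show how the parsing behaves without a real pool.

```python
import subprocess


def parse_snapshot_list(text):
    """Parse tab-separated `zfs list -H -p -t snapshot -o name,used` output.

    Returns a list of (dataset, snapshot, used_bytes) tuples.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, used = line.split("\t")
        dataset, _, snapshot = name.partition("@")
        rows.append((dataset, snapshot, int(used)))
    return rows


def list_snapshots(dataset):
    """Run `zfs list` for one dataset and its children (needs a real pool)."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-p", "-r", "-t", "snapshot",
         "-o", "name,used", dataset],
        check=True, capture_output=True, text=True,
    )
    return parse_snapshot_list(result.stdout)


# Hypothetical sample output, only to show the parsing without a pool.
SAMPLE = (
    "tank/snapshots/data@auto-20130214.0900-2w\t1048576\n"
    "tank/snapshots/data@auto-20130214.1000-2w\t0\n"
)

rows = parse_snapshot_list(SAMPLE)
for dataset, snapshot, used in rows:
    print(dataset, snapshot, used)
```

On a real system you would call `list_snapshots()` with the name of the target dataset. Note that a snapshot's `used` value only counts blocks unique to that snapshot, so the dataset's own `used` and `usedbysnapshots` properties are the better place to look for the overall footprint.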
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
I recommend doing a test anyway. An untested backup is no backup at all.
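A rough sketch of what such a test could look like from the shell of the replication target, written as a small Python helper. The snapshot and clone names are hypothetical, and the helper only builds the `zfs clone`, `zfs get` and `zfs destroy` commands so they can be reviewed before anything is run.

```python
import subprocess


def clone_commands(snapshot, clone_name):
    """Build the commands for a restore test from one replicated snapshot."""
    if "@" not in snapshot:
        raise ValueError("expected a name of the form dataset@snapshot")
    return {
        # Create a writable clone of the snapshot.
        "clone": ["zfs", "clone", snapshot, clone_name],
        # Ask where the clone is mounted so the files can be inspected.
        "mountpoint": ["zfs", "get", "-H", "-o", "value", "mountpoint", clone_name],
        # Remove the clone again once the files have been checked.
        "cleanup": ["zfs", "destroy", clone_name],
    }


def run(cmd):
    """Run one command and return its output (needs a real pool)."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout.strip()


# Hypothetical names, shown without executing anything.
cmds = clone_commands("tank/snapshots/data@auto-20130214.0900-2w", "tank/restore-test")
for step, cmd in cmds.items():
    print(step + ":", " ".join(cmd))
```

Passing each command to `run()` in order, and opening a few files under the reported mountpoint before the cleanup step, would be the actual test.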
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
Can anyone answer the original question - how to verify replication is working - in the context of 11.0? I'm supposedly replicating a snapshot from an 11.0 box to a 9.3 box. I just set this up and there is no positive indication that anything is happening, and yet there are no error messages either. The snapshots are definitely happening, but the rep has indicated a status of "sending" since the rep was set up nearly an hour ago. There is at most a 2 GB footprint to be replicated across a gigabit Ethernet link through a single switch. I would think this would replicate in a matter of seconds.

I see no network activity. I see no snapshots on the 9.3 receiver box.

What can I look at on the 11.0 box to understand what its replication is doing or attempting to do?

What can I look at on the 9.3 box (the receiver of the replica) to understand what, if anything, it has received?

Thanks
Chris
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
It sounds like something is broken at a high level. The status should change to read "Up to date" along with an indication of what the last snapshot sent was.

I would double-check your configuration of the replication, keys, etc.

Once you get that working correctly, you could try what ben suggests above and make a clone to see if the data is there. That doesn't work too easily for me because the snapshots are of large amounts of data. In that case, there is a way to make the replicated snapshot visible by changing a flag. I don't know the commands from memory. But, you should be able to find this detailed by searching the forum if you need to go that route.
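The flag is not named in this thread. One ZFS property that fits the description is `snapdir`, and treating it as the one meant here is an assumption. With `snapdir=visible`, the `.zfs` directory shows up in directory listings, and each snapshot's files can be browsed read-only under `.zfs/snapshot/<name>`. A minimal sketch with hypothetical dataset, mountpoint and snapshot names:

```python
import posixpath


def snapdir_command(dataset, visible=True):
    """Build the `zfs set` command that shows or hides the .zfs directory."""
    value = "visible" if visible else "hidden"
    return ["zfs", "set", "snapdir=" + value, dataset]


def snapshot_path(mountpoint, snapshot):
    """Path under which ZFS exposes a snapshot's files read-only."""
    return posixpath.join(mountpoint, ".zfs", "snapshot", snapshot)


# Hypothetical names, for illustration only.
cmd = snapdir_command("tank/snapshots/data")
path = snapshot_path("/mnt/tank/snapshots/data", "auto-20130214.0900-2w")
print(" ".join(cmd))
print(path)
```

This avoids creating a clone, which helps when the snapshots cover a lot of data and only a few files need to be spot-checked.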
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
Indivision, could you be more specific as to "etc"? How much has to match between source and dest in terms of user accounts (I'm using root), permissions, etc. It's hard to believe there's a problem with the keys or ssh access because when I created the rep task on the source it automatically created the appropriate matching dataset or folder or whatever it is on the destination. That's all done through SSH, depends on the keys being set properly, right? What about permissions applied to the volume and dataset on the destination?

I read in another post or somewhere to verify that ssh is working by connecting from source to dest using the shell on the source. That works and does not ask for a password, but it vomits out a massive amount of rubbish about certs or keys whose identity can't be verified, can't find the key in a DNS server, etc., and finally asks if I really want to connect (yes/no). I have to type in the answer to get past this. WTF is that all about? Why is it looking in DNS for a key identity?
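The yes/no question is ssh's usual first-connection prompt for a host key it has not seen before. The DNS message probably comes from ssh's `VerifyHostKeyDNS` option looking for a published key fingerprint, though that part is a guess. Since replication runs unattended, a prompt like this cannot be answered, so a more telling check is one that fails instead of asking. A minimal Python sketch, assuming OpenSSH's `BatchMode` and `ConnectTimeout` options and the destination address quoted later in this thread:

```python
import subprocess


def ssh_check_command(host, user="root", port=22):
    """Build an ssh command that fails instead of prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never ask for passwords or confirmations
        "-o", "ConnectTimeout=10",   # give up after 10 seconds
        "-p", str(port),
        user + "@" + host,
        "true",                      # remote no-op; only the exit status matters
    ]


def ssh_works(host, user="root", port=22):
    """Return (ok, stderr) for one non-interactive login attempt."""
    result = subprocess.run(
        ssh_check_command(host, user, port),
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stderr.strip()


# Only the command is built here; ssh_works() would actually connect.
cmd = ssh_check_command("192.168.1.19")
print(" ".join(cmd))
```

If `ssh_works()` returns `False`, the stderr text should say whether the host key, the user key or the network is the problem.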

I stared at this thing for two hours while it said "sending" and obviously not doing anything. I even went out and back in to the gui, no change. Then I rebooted the source freenas server and when it came back it said it "failed" and started sending failure notification emails to me. Of course, there is no hint of a useful error message in any of this. Basically just says "it failed":

"The replication failed for the local ZFS Volume1-internal-sas/Exactimate while attempting to send snapshot auto-20170923.1010-1w to 192.168.1.19 "

Is there a log file or something I can look at that would have something useful in it?

Thanks
Chris
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Did you follow the steps in the documentation exactly? There's a part where you need to copy a key on one server and then paste it into the settings of the root account on the other. Did you do that?
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
I did, manually copied (copy/paste, not typing) the key from the source into the root account settings of the destination. But I'm always suspicious of steps like this because the text of the key that appears in the box contains more stuff than is obviously part of the encoded key, like "this is the key for..." or whatever. So I copied the whole thing. Seems pretty sketchy to me. What exactly needs to be copied? I will certainly try copying the key over again.

And then on the source you do something like "scan for keys" which picks up the key from the destination and remembers it on the source. I'll recheck both of those and repeat the steps.

-cb
 

indivision

Guru
Joined
Jan 4, 2013
Messages
806
Copying the whole thing should work. That's what I did.

Let me know if running through the set up fixes it or not.

Did you turn on the SSH service?

I'm just trying to think of things that I've forgotten to do in the past... Permissions could come into play. I don't know what you might have done there. But, one strategy is to get it working with very permissive permissions. Then make the permissions more restricted after you see it all working.
 

Evertb1

Guru
Joined
May 31, 2016
Messages
700
cbarber, do you see data on your targeted dataset? And what do you see in the status column of your replication task?
 

Evertb1

Guru
Joined
May 31, 2016
Messages
700
The targeted dataset should not be empty; it should contain just as much data as the replicated dataset (if the replication task worked).

I think it would benefit you to read up a bit on replication tasks and snapshots. You called your replication dataset "snapshots", and that is for you to decide, but if the replication task works, the dataset will contain the same data as the replicated dataset, based on a snapshot of that dataset.
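One way to check this from the shell is to compare the snapshot names on both machines. Below is a minimal Python sketch that takes the text output of `zfs list -H -t snapshot -o name` from the source and from the target and reports which snapshots have not arrived. The first source snapshot name is the one quoted in this thread; the second snapshot and the target dataset name `backup/Exactimate` are hypothetical.

```python
def snapshot_names(zfs_list_output):
    """Return the snapshot names (the part after '@') found in the output."""
    names = set()
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if "@" in line:
            names.add(line.split("@", 1)[1])
    return names


def missing_on_target(source_output, target_output):
    """Snapshots present on the source but not on the target, sorted by name."""
    return sorted(snapshot_names(source_output) - snapshot_names(target_output))


# Sample listings; the second snapshot and the target dataset are made up.
SOURCE = (
    "Volume1-internal-sas/Exactimate@auto-20170923.1010-1w\n"
    "Volume1-internal-sas/Exactimate@auto-20170923.1110-1w\n"
)
TARGET = "backup/Exactimate@auto-20170923.1010-1w\n"

missing = missing_on_target(SOURCE, TARGET)
print(missing)
```

An empty list means every snapshot on the source also exists on the target. It says nothing about the contents, so a restore test is still worth doing.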
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
Redid the whole thing, same result. Also discovered that even after deleting the not-working rep job, it is impossible to delete the datasets on the remote end that were created for it without rebooting at least the FreeNAS box to which those datasets belong, possibly both boxes. I don't know, that was another two hours of my life I'll never get back.

So I thought I'd try doing a rep from the 9.3 to the 11.0 (reverse of what I was originally trying to do). Not happening, but this time it's because the 9.3 box won't run a periodic snapshot job, no matter what. Yes, this is the first time any of these boxes have ever had snapshots set up on them. Never really had the need since the 9.3 is just a backup of a backup of a backup of some files and some virtual machines.

It does create a manual snapshot OK, but that's no use for replication. So at this point I'm thinking that the 9.3 box is pretty much fubar. Nothing seems to work other than basically reading and writing its disks. I'm going to step it up to 9.10 and see what happens, then maybe up to 11.

What's the deal with upping 9.x to 11? People seem to be all wound up about this. Does it work or what? And is there a separate optional update that alters the zvols? I really don't care about that, whatever it is.
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
Evertb1, thanks for chiming in. The status of the rep task said "sending" for two hours straight with obviously no network activity happening, until my patience ran out and I rebooted the FreeNAS 11 box that was running the rep task. When it came back from the reboot it said the task was failing and finally started sending failure notification emails. Nice of it to finally notice. But how would I see data on the target dataset (I assume you mean the recipient of the replicated snapshot)? This dataset is not shared. How would I look at the contents of it? Do you mean what appears in the target machine's snapshots list view? In that case no, nothing ever appeared there. But for sure there are snapshots on the 11 box that is running the rep task. Those are definitely there.

Chris
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
Thanks for clarifying the replicated snapshot thing. You're right, there's a lot of detail I don't know... and a lot of it I shouldn't have to know, from my perspective, but as always one has to dig in to make these things work. So the replica is not a replica of the stack of snapshots, but a replica of the thing itself. Good to know. Now I understand what you meant by seeing the data on the target. And no, definitely nothing there. A few hundred k bytes of whatever overhead it takes to make an empty dataset. Definitely no 2 or 3 GB copy of my original test data on the other box.
-cb
 

cbarber

Dabbler
Joined
Sep 23, 2017
Messages
17
So I upgraded the 9.3 box to 9.10.2, which allows me to use the semi-automatic rep setup mode. Nice feature, saves a bunch of clicks. And now it's all working. Periodic snaps are working on the upgraded 9.10.2 now and I suppose the likely reason they seemed to not work on this box when it was 9.3 was that I failed to open up the time window for the replication. Really bad choice of default for that. But I think it was the 9.10.2 upgrade that sorted out whatever problem the 11 box was having trying to rep to the 9.3.

Everything is replicating now, both directions. Thank you everyone for your input.
Chris
 

Evertb1

Guru
Joined
May 31, 2016
Messages
700
While I am happy that you are on your way with your replication, I feel that I need to explain some things.

There are some rules for the forum. They are not there to pester you; they are there so that others can help you help yourself. There are a lot of (very) helpful people on the forum, but your best bet for getting good help is providing lots of information about your system and your problem.

I don't agree with your statement that you should not have to know a lot about the details. FreeNAS is free in the sense that you don't have to pay for the OS, but you pay for it with some effort to learn about it. You are your own helpdesk; the forum is no helpdesk. If people notice that you invest time in learning, analyzing, etc., you will be amazed at the effort they put into helping you when you need it.

By the way: if you want some direct feedback from FreeNAS in your GUI, it is possible to display the console output in the GUI (bottom of the screen). Go to "System" -> "Advanced", check "Show console messages in the footer", and you will have direct feedback.
 