SOLVED 11.2 to 11.3 : Problem with replication

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi,

Here is my experience upgrading my FreeNAS servers over the last few days.

With some time on my hands, I chose to upgrade my FreeNAS servers (each one is in my signature). Note that Thanatos can be sacrificed at any time: I can re-sync it over LAN from Atlas without any impact. In fact, at the end of the process, I intend to destroy and re-create its pool to adjust the compression level.

Once Thanatos is up to date and stable, Atlas is the second to be touched. Should I run into problems, I can re-sync it over LAN from Thanatos without too much impact. The one that must go through the process as smoothly as possible is Hades. It is hundreds of km away, so I cannot re-sync it easily.

For my upgrades, I followed @Chris Moore's advice in this post.

At first, each server was on a different version of FreeNAS 11.2. I brought each of them to 11.2 U8. Nothing special here, and all 3 servers handled it well.

Thanatos was the first to be upgraded to 11.3 U5. It went well, and I chose not to upgrade the pool's feature flags yet, just in case...
I then upgraded Atlas. The upgrade itself went fine, but when browsing the interface, I noticed that many things were marked as "Legacy". I also noted the new SSH Key and SSH Connection entries in the System menu.

The first thing was to configure that SSH connection between the two servers. Easy and quick.

Next, I tried to update the replication task and was hit by bug NAS-10816. OK... I will have to destroy and re-create my replication tasks from scratch every time I need to change a single bit in them. Painful, but it should not be a show stopper. I did it and it seemed to work. Because that was my slow-changing dataset, there was no new data to sync, so the problem did not show. It was when I re-created the replication task for my main dataset that I hit the problem: the replication task failed, saying that it could not find a previous snapshot to start from. Because I had made sure the option to restart from scratch was unchecked, the task simply failed.

I cancelled the new replication task from Atlas to Thanatos and re-created it as Legacy. It worked and only the few new snapshots were sent. I then tried to change that replication task to SSH, but the bug prevented me from doing it. I deleted and re-created the replication task as SSH. Only then did the replication get it right.
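For anyone wondering why the task complained about a missing previous snapshot: an incremental replication needs a snapshot that exists on both sides to start from. Here is a minimal sketch of that check, not what FreeNAS actually runs internally. The function name and the sample snapshot names are made up for illustration, and it assumes each list holds the part of the name after the `@`, ordered oldest to newest.

```python
def latest_common_snapshot(source_snaps, dest_snaps):
    """Return the newest snapshot name present on both sides, or None.

    Assumption: source_snaps is ordered from oldest to newest.
    """
    dest_names = set(dest_snaps)
    # Walk the source from newest to oldest and stop at the first match.
    for name in reversed(source_snaps):
        if name in dest_names:
            return name
    return None


# Hypothetical names: two old-style snapshots plus one new-style snapshot.
source = ["auto-20201101.0000-2w", "auto-20201102.0000-2w", "auto-2020-11-03_00-00"]
dest = ["auto-20201101.0000-2w", "auto-20201102.0000-2w"]

base = latest_common_snapshot(source, dest)
if base is None:
    print("No common snapshot: only a full re-send from scratch would work")
else:
    print("Incremental replication could start from", base)
```

If the function returns None and the "from scratch" option is unchecked, a failure like the one I saw is the expected outcome.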

So the process ended up being :
--Update each server to 11.3 U5
--Create a new SSH connection between the two servers
--Re-create the snapshot tasks so they run under the new mode
--Make sure that no 2 snapshots are timestamped at the same moment. I had some tasks that created 2 snapshots at the same moment with different retention periods. That was fine until now, but it no longer is. I fixed it when re-scheduling my snapshots here.
--Let the old Legacy replication task propagate at least one of the new snapshots
--Once at least one non-Legacy snapshot has been replicated, delete the Legacy replication task
--Create a new replication task using the new SSH Connection profile and the new snapshot schedules

The setup then runs the new, updated way: no more Legacy in the config, at least for these parts (SSH Connection ; Snapshot task ; Replication task).
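To spot the "2 snapshots at the same moment" case from the list above before re-scheduling, something like the following could help. It is only a sketch: the naming pattern `auto-YYYYMMDD.HHMM-<retention>` is my assumption about the old-style snapshot names, the function name is made up, and the sample names are invented. Adjust the regular expression to whatever your snapshots are really called.

```python
import re
from collections import defaultdict

# Assumed old-style name: auto-YYYYMMDD.HHMM-<retention>, e.g. auto-20201103.0000-2w
LEGACY_NAME = re.compile(r"^auto-(\d{8}\.\d{4})-(\w+)$")


def find_timestamp_collisions(snapshot_names):
    """Group snapshot names by timestamp and keep groups with more than one name."""
    by_stamp = defaultdict(list)
    for name in snapshot_names:
        match = LEGACY_NAME.match(name)
        if match:  # names that do not fit the assumed pattern are ignored
            by_stamp[match.group(1)].append(name)
    return {stamp: names for stamp, names in by_stamp.items() if len(names) > 1}


# Invented sample: two tasks firing at midnight with different retention periods.
names = [
    "auto-20201103.0000-2w",
    "auto-20201103.0000-6m",
    "auto-20201103.0100-2w",
]

for stamp, clashing in find_timestamp_collisions(names).items():
    print(stamp, "is shared by", ", ".join(clashing))
```

Any timestamp it reports points to snapshot tasks whose schedules overlap and should be shifted apart.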

Because that ended up being a few more changes than expected, I am not sure whether I will go to TrueNAS next or keep waiting... In any case, having removed these Legacy items should help with the next migration to TrueNAS. Knowing that bug NAS-10816 is fixed there is another plus... We will see how confident I am after a week...

Hope this can be of help to someone else,
 

Heracles

Oops... The direct link to the bug does not work... Just search for bug ID 108160 in IX's Jira and you will find it.

@JoshDW19, any reason why a direct link to IX's Jira does not work?
 