Patrick_3000
I have a primary server and a backup server, both running SCALE. A few months ago, I switched from rsync to replication to back up the primary server to the backup server every hour. I prefer replication because the data lives in a top-level dataset with 8 child datasets: replication handles all of them in one task, whereas rsync needs 8 tasks, and configuring and managing 8 tasks in the SCALE UI is doable but cumbersome.
The problem is that I recently had to shut down the primary server for two days because the CPU fan failed and I had to order and install a replacement. During that time, all the snapshots used for replication expired, and when I got the primary server going again, the replication task failed since there were no snapshots on the backup server. Nothing I tried, including allowing it to replicate from scratch, got the replication task to run. The only option left was to destroy the top-level dataset on the backup server and recreate it, after which I presumably could have run the replication task as if for the first time, with a full data transfer of several terabytes.
But this points to a major flaw with replication. It appears that if the source server for the replication is down for a while and all the snapshots used for replication expire, there is no way to run the replication task once the source server is back up.
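To make the failure mode concrete, here is a minimal Python sketch of how I understand it. It relies on one fact, that an incremental ZFS send needs a snapshot that exists on both sides, and on several assumptions of mine: the snapshot names, the dates, the two lifetimes, and the idea that the same retention is applied on both servers are all made up for illustration and are not taken from my actual task settings.

```python
from datetime import datetime, timedelta

def hourly_snapshots(last, count):
    """Build {name: creation_time} for `count` hourly snapshots ending at `last`."""
    times = [last - timedelta(hours=i) for i in range(count)]
    # Illustrative naming scheme; any sortable name would do.
    return {t.strftime("auto-%Y-%m-%d_%H-%M"): t for t in times}

def prune(snapshots, now, lifetime):
    """Drop snapshots older than `lifetime`, as a retention policy would."""
    return {n: t for n, t in snapshots.items() if now - t <= lifetime}

def latest_common(source, destination):
    """Newest snapshot name present on both sides, or None if there is none."""
    common = source.keys() & destination.keys()
    return max(common, key=source.get) if common else None

# Hypothetical timeline: both sides are in sync at shutdown, then 2 days of downtime.
shutdown = datetime(2023, 9, 1, 12, 0)
restart = shutdown + timedelta(days=2)
snaps = hourly_snapshots(shutdown, 24)

for lifetime in (timedelta(days=1), timedelta(weeks=1)):
    # Assumption: the same retention is enforced on source and destination.
    source = prune(snaps, restart, lifetime)
    destination = prune(snaps, restart, lifetime)
    base = latest_common(source, destination)
    print(lifetime, "->", base or "no common snapshot, full send required")
```

In this toy model, a lifetime shorter than the downtime leaves nothing to use as an incremental base, while a lifetime longer than the downtime keeps one available.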
Consequently, I have stopped using replication to back up the primary server to the backup server and have returned to using 8 rsync tasks, one for each child dataset, as cumbersome as that is. At least this way I won't run into any problems if I have more downtime in the future. I use rsync over SSH, not with modules, so it's not going away in Cobia as I understand it.
The bottom line, however, is this: does anyone know how to set up a replication task that's robust enough to be usable if the source server is shut down temporarily and the snapshots expire?