Patrick_3000
Contributor
- Joined
- Apr 28, 2021
- Messages
- 167
I have a dataset on a SCALE server with eight child datasets. Let's call it Dataset1. It's approximately 5 TB. I have another dataset which is identical, including the child datasets, on another SCALE server. Let's call it Dataset2.
I use rsync over ssh to keep Dataset2 identical to Dataset1. Unfortunately, it requires eight rsync tasks to sync all the child datasets. I'd prefer to use replication, but I've had some problems with it in the past, and I'm wondering if there is a way to resolve those problems and start using replication again.
The problem I had with replication was that the first SCALE server (the source) had to be taken offline for a week. During that time, all the snapshots on the second SCALE server (the destination) passed their lifetime and were automatically destroyed. In retrospect, I should have set longer snapshot lifetimes. In any case, once the first SCALE server was back online, there were no snapshots left on the second SCALE server, so, at least as I understood it, I would have had to run the replication task from scratch and transfer the entire 5 TB dataset, which would have meant a lot of I/O on the hard drives. So I abandoned replication and switched to rsync.
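To illustrate the constraint described above, here is a minimal Python sketch of how one might check whether two datasets still share a snapshot. It assumes the text of `zfs list -H -t snapshot -o name,guid -s creation <dataset>` has been captured on each server, and it matches snapshots by GUID, on the understanding that an incremental send needs a snapshot present on both sides. The function names and the sample dataset names, snapshot names, and GUIDs are hypothetical and only for illustration.

```python
from typing import Optional


def parse_snapshots(zfs_list_output: str) -> list[tuple[str, str]]:
    """Parse tab-separated 'name<TAB>guid' lines into (name, guid) pairs, keeping order."""
    pairs = []
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last tab so a dataset name containing spaces stays intact.
        name, guid = line.rsplit("\t", 1)
        pairs.append((name, guid))
    return pairs


def latest_common_snapshot(source_output: str, dest_output: str) -> Optional[str]:
    """Return the newest source snapshot (the part after '@') whose GUID also
    exists on the destination, or None if the two sides share no snapshot.

    Assumes source_output is sorted oldest to newest (zfs list -s creation).
    """
    dest_guids = {guid for _, guid in parse_snapshots(dest_output)}
    common = None
    for name, guid in parse_snapshots(source_output):
        if guid in dest_guids:
            common = name.split("@", 1)[1]
    return common


# Hypothetical sample data, not taken from a real system.
source_listing = (
    "tank/Dataset1@auto-2024-01-01\t1111\n"
    "tank/Dataset1@auto-2024-01-02\t2222\n"
    "tank/Dataset1@auto-2024-01-03\t3333\n"
)
dest_listing = (
    "tank/Dataset2@auto-2024-01-01\t1111\n"
    "tank/Dataset2@auto-2024-01-02\t2222\n"
)

print(latest_common_snapshot(source_listing, dest_listing))
print(latest_common_snapshot(source_listing, ""))
```

If the function returns `None`, as it would in my case where every snapshot on the destination had expired, there is nothing for an incremental replication to start from, which matches my understanding of why a full transfer was needed.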
So here is the main thing I'm wondering: does anyone know if there is a way to resume replication without transferring the entire 5 TB of data? The datasets are identical anyway. Maybe there is a way through the command line to just transfer snapshots so that the two datasets share a common snapshot?
I also realize that I could just replicate from scratch and transfer the entire 5 TB. The problem is that if I have to do that this time, I might have to do it again in the future if something else goes wrong. In that case, I'm not sure replication is the best way to sync the datasets, and maybe I'm better off staying with rsync.