Replication Task Skipping all Snapshots

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
@Apollo and @winnielinnie you both guessed it correctly.

I deleted the snapshot 2024-03-01_00-00 and ran a scrub; now it's showing an error on 2024-03-02_00-00. The problem is that the affected data belongs to multiple files, and I will be losing all the snapshots. Is there a way to delete specific items inside a snapshot?

I have fixed the main file DOMINION_SQUARE1.db1 using the native application, but I need to figure out how to use grep on snapshots.
grep is just a program that runs on top of another text-generating executable; it searches its input for strings/characters.

What you can do is something like this:

zfs diff ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00 ServerPool/MasterDataset/Projects@auto-2024-03-02_00-00 | grep -i "DOMINION_SQUARE1.db1"

Can you provide more details on how you are using the data on this particular dataset? Are you copying files from a PC to the dataset, or is the data always sourced from the dataset?
If the files are stored and modified on the dataset itself, for instance while being accessed by a jail, then it may be impossible to recover the data after the corruption, because the problem could be compounded, unless the file happened to be in RAM and the corruption only occurred on the pool, but that may be wishful thinking.

If we know your workflow, then we might come up with a better alternative which doesn't rely on destroying your snapshots and the files they contain.

One of the questions I have is about the current state of the files. At this point, we know some of the blocks are corrupted. What is the state of the files located on the dataset?
If you know which files on the dataset are corrupted and which are not, then you will be able to act accordingly. If the data is available elsewhere and you can replace the corrupted files, then deleting the snapshots after
ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00
shouldn't be too much of a problem.
However, if your files cannot be recovered and you want to mitigate the disaster, you can still delete the snapshots from
ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00
onwards. However, the files on the dataset would most likely still be corrupted, unless by some luck the corruption only hit blocks referenced by a snapshot, blocks which your application has since overwritten in the live file.

I see two approaches to handling the replication issue and resolving the errors seen at scrub (rough command sketches follow below):
- Rollback to
ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00
and lose any changes since.

- Delete all the failing snapshots after
ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00
which are pointing at corrupted blocks, and then try to restore some of the files from your PC; but this could also leave things in an inconsistent state (i.e. a database pointing at files which are no longer there...)
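For reference, here is a rough sketch of what those two approaches could look like on the command line. The snapshot names are placeholders based on the ones in this thread; always run the destroy with -n (dry run) first:

Code:
# Approach 1: roll back -- this destroys every newer snapshot, hence -r is required:
zfs rollback -r ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00

# Approach 2: destroy only the failing snapshots using the % range syntax:
zfs destroy -nv ServerPool/MasterDataset/Projects@auto-2024-03-02_00-00%auto-LAST-BAD-SNAPSHOT   # dry run first
zfs destroy -v ServerPool/MasterDataset/Projects@auto-2024-03-02_00-00%auto-LAST-BAD-SNAPSHOT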

As I have stated, we need to know more about how those files are being used.

There is an alternative approach which you might consider, which wouldn't involve destroying the snapshots and losing the files they contain.
That would be to create "clones" of the snapshots and extract the files, but that may not be too helpful unless you want to recover files from before the corruption occurred; the state of the files would be whatever it was on the date the snapshot was taken (if the file had been modified by then).
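As a rough sketch (dataset, snapshot and mount paths below are only examples; the clone's mount path may differ on your system):

Code:
# Create a clone of a pre-corruption snapshot, copy the file out, then drop the clone:
zfs clone ServerPool/MasterDataset/Projects@auto-SOME-GOOD-SNAPSHOT ServerPool/recovery_clone
cp /mnt/ServerPool/recovery_clone/DOMINION_SQUARE1.db1 /mnt/ServerPool/MasterDataset/Projects/
zfs destroy ServerPool/recovery_clone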

I have this question: what do you know of the state of the files held in the dataset? The files in the dataset as seen by the jail, SMB, etc. are live. If you can tell whether corruption has occurred or not, it could prove useful. Then again, it is specific to your use case.

If you were to take a snapshot now and delete all the snapshots in between, taken after
ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00
(assuming the next one would be
ServerPool/MasterDataset/Projects@auto-2024-03-03_00-00),
with a command similar to this one, where the end of the range is the last automatic snapshot taken before the one created "now" above:
zfs destroy ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00%auto-2024-03-15_00-00
you could check whether the error is still showing up. If no errors remain, then you would be out of the woods. However, if the error is still there, it would imply the corrupted blocks are still referenced by the latest snapshot; but then again, ZFS would already be screaming about the live data, so I am not entirely sure.
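A quick way to check, once the snapshots are gone (pool name as above; note that the permanent error list is only refreshed by a completed scrub, so it can take a scrub or even two before stale entries disappear):

Code:
zpool status -v ServerPool   # lists files/snapshots still referencing bad blocks
zpool scrub ServerPool
zpool status -v ServerPool   # check again once the scrub has finished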


What we really need to do here is to perform
zfs diff ... snap1 snap2 > diff_file.txt
with snap1 and snap2 corresponding to two successive snapshots, such as:

zfs diff ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00 ServerPool/MasterDataset/Projects@auto-2024-03-03_00-00 > diff_file.txt


And then go through the file to see which files were modified.

You can also do the same with:
zfs diff ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00 > diff_file_from_snap.txt
which should return the differences between the live data on the dataset and its state at the time the snapshot ServerPool/MasterDataset/Projects@auto-2024-03-01_00-00 was taken.
Files that were added and then deleted within this time frame will not show up.
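If you'd rather walk the whole snapshot chain instead of picking pairs by hand, a small shell loop along these lines should do it (a sketch only, using the dataset and file name from above and assuming the dataset has no child datasets; adjust the grep pattern as needed):

Code:
#!/bin/sh
# Diff every consecutive pair of snapshots and flag the ones that touch the db1 file.
DS=ServerPool/MasterDataset/Projects
prev=""
for snap in $(zfs list -H -t snapshot -o name -s creation -r "$DS"); do
    if [ -n "$prev" ]; then
        echo "=== $prev -> $snap ==="
        zfs diff "$prev" "$snap" | grep -i "DOMINION_SQUARE1.db1"
    fi
    prev="$snap"
done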

I would say this could be enough to come up with our next strategy.
 
Last edited:

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
@Apollo I guess what has happened is that there was a file corruption in the live file system, but the file was still working in the native application. These files are accessed through a PC and saved directly to the dataset. I had set up a failure notice for when the replication to the backup system fails. When I got the error, it took a few days until we found the error on the particular db1 file; this file works correctly outside the TrueNAS environment.

There is a difference of 17 days from the date of file corruption to recovery, so I guess I may have to delete all these snapshots. It's difficult, but I may have to live with it.

Can you help me with the syntax to zfs send a specific snapshot to the backup TrueNAS machine using SSH or another method, so that I can identify the corrupt snapshots which fail and delete only those? Actually these snapshots are our lifeline: we don't roll back, but we use the Windows "previous versions" option, copy those to local storage and use them to compare our data.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
there was a file corruption in the live file system but it was still working in the native application.
Does your TrueNAS server use ECC RAM? Are you using an HBA or are your drives independently connected?


Can you help me with a syntax to zfs send specific snapshot to backup truenas machine using ssh or other methods so that I can identify the corrupt snapshots which fail and delete only those.
Such an attempt will fail if there exists corruption of data or metadata.

If you still want to try, you can try something like this:
Code:
zfs send -v -R mypool/dataset@latest-good-snap | zfs recv -v -s backup/dataset
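
If the stream dies partway through, the -s on the receive side makes it resumable; you should then be able to pick it up again with the resume token (pool/dataset names as in the example above):
Code:
zfs get -H -o value receive_resume_token backup/dataset     # run on the destination
zfs send -v -t <paste-token-here> | zfs recv -v -s backup/dataset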
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
@Apollo I guess what has happened is that there was a file corruption in the live file system, but the file was still working in the native application. These files are accessed through a PC and saved directly to the dataset. I had set up a failure notice for when the replication to the backup system fails. When I got the error, it took a few days until we found the error on the particular db1 file; this file works correctly outside the TrueNAS environment.

There is a difference of 17 days from the date of file corruption to recovery, so I guess I may have to delete all these snapshots. It's difficult, but I may have to live with it.

Can you help me with the syntax to zfs send a specific snapshot to the backup TrueNAS machine using SSH or another method, so that I can identify the corrupt snapshots which fail and delete only those? Actually these snapshots are our lifeline: we don't roll back, but we use the Windows "previous versions" option, copy those to local storage and use them to compare our data.


I have created this series of tables with different scenarios in order to understand the role and capabilities of snapshots.

I cannot render the tables with proper spacing, so I am including a .odt version using Consolas font.

   | snapshot 1 | snapshot 2         | snapshot 3         | snapshot 4         | snapshot 5                       | snapshot 6         | snapshot 7       |
---|------------|--------------------|--------------------|--------------------|----------------------------------|--------------------|------------------|
 A | file A     | file A (modified)  | file A (modified)  | file A (corrupted) | file A (overwritten from backup) | file A (modified)  | file A (current) |
 B | file B     | file B (untouched) | file B (untouched) | file B (untouched) | file B (untouched)               | file B (untouched) | file B (current) |
 C | file C     | file C (deleted)   |                    |                    |                                  |                    |                  |
 D |            | file D (added)     | file D (untouched) | file D (modified)  | file D (deleted)                 |                    |                  |
 E |            |                    |                    | file E (added)     | file E (untouched)               | file E (untouched) | file E (current) |
---|------------|--------------------|--------------------|--------------------|----------------------------------|--------------------|------------------|




Scenario 1) - Rollback of "Snapshot 1":

   | snapshot 1 |
---|------------|
 A | file A     |
 B | file B     |
 C | file C     |
 D |            |
 E |            |
---|------------|



- Everything that exists after "snapshot 1" is destroyed: "snapshot 2" through "snapshot 7" are destroyed. Every file and directory held by any of the destroyed snapshots is lost.

Only the content held by "snapshot 1" is restored.

What we have is "file A", "file B" and "file C" with the same content and state as at the time "snapshot 1" was taken.



Scenario 2) - Destroy "snapshot 1" through "snapshot 6" (only "snapshot 7" remains):



   | snapshot 7       |
---|------------------|
 A | file A (current) |
 B | file B (current) |
 C |                  |
 D |                  |
 E | file E (current) |
---|------------------|



- "file A (current)", "file B (current)", "file E (current)" remain untouched in the dataset and are as the state indicate "current".



"file A (current)": Mofified version of the file prior to "snapshot 7" is lost and current file A remain.



"file B (current)": Because "file B" hasn't been touched (ie neither mofified nor deleted). the content and state of file B (current ) is the same as it was at the time "snapshot 1" was taken. So no loss of information by not having intermediary snapshots.



"file E (current)" is preserved by "snapshot 7".



Scenario 3) - Destroy "snapshot 2" through "snapshot 5":



   | snapshot 1 | snapshot 6         | snapshot 7       |
---|------------|--------------------|------------------|
 A | file A     | file A (modified)  | file A (current) |
 B | file B     | file B (untouched) | file B (current) |
 C | file C     |                    |                  |
 D |            |                    |                  |
 E |            | file E (untouched) | file E (current) |
---|------------|--------------------|------------------|



- "file A (current)", "file B (current)", "file E (current)" remain untouched in the dataset and are as the state indicate "current".

"file C" was deleted in "snapshot 2" so it won't show up in the dataset under "snapshot 7". However, it is still possible to recover "file C" using clone of "snapshot 1" which allow access to the files. "file C" can be copyied back to the the dataset in current tiem. This is what is called restoring by cloning of snapshot.





Scenario 4) - Deleting corrupted "snapshot 4" (assuming "snapshot 4" has corrupted blocks affecting added or modified files referenced by the snapshot):

   | snapshot 1 | snapshot 2         | snapshot 3         | snapshot 5                       | snapshot 6         | snapshot 7       |
---|------------|--------------------|--------------------|----------------------------------|--------------------|------------------|
 A | file A     | file A (modified)  | file A (modified)  | file A (overwritten from backup) | file A (modified)  | file A (current) |
 B | file B     | file B (untouched) | file B (untouched) | file B (untouched)               | file B (untouched) | file B (current) |
 C | file C     | file C (deleted)   |                    |                                  |                    |                  |
 D |            | file D (added)     | file D (untouched) | file D (deleted)                 |                    |                  |
 E |            |                    |                    | file E (untouched)               | file E (untouched) | file E (current) |
---|------------|--------------------|--------------------|----------------------------------|--------------------|------------------|

- Hopefully, by only deleting the snapshot that holds corrupted data, we would resolve the replication-related issue.

"file A (current)", "file B (current)" and "file E (current)" remain untouched and are current as of "snapshot 7".

"file A (current)" is no longer the same as "file A" from earlier snapshots (which is expected), but we still retain some level of history, and "file A" can be recovered by cloning the relevant snapshot.


Scenario 5) - Destroying "snapshot 1" through "snapshot 7":

   |       |
---|-------|
 A | ????? |
 B | ????? |
 C |       |
 D |       |
 E | ????? |
---|-------|


I am not entirely sure what the state of the files would be. I have had cases where all the files were deleted once no snapshots existed any more (at least on datasets that had been replicated).

It is possible the files would otherwise remain in their current state and still be present, as they were when "snapshot 7" was taken.

Because of that, it is best to take a new snapshot, more recent than "snapshot 7".

In that case we have a safety net and a guarantee that "file A (current)", "file B (current)" and "file E (current)" will remain unaffected.



Scenario 6) - Destroying the snapshots containing corrupted blocks while mitigating risk.



Before:

Create the necessary snapshots to help preserve as much history as possible and to preserve valid files in their current state, while deleting corrupted files and replacing them with a suitable backup if possible.

If the corrupted blocks originate from "snapshot 5" and the current file still uses blocks that are corrupted, the corrupted file must be deleted; a new snapshot created after the deletion will then allow ZFS to free the bad blocks once the older snapshots are destroyed.

   | snapshot 1 | snapshot 2         | snapshot 3         | snapshot 4         | snapshot 5                 | snapshot 6                 | snapshot 7       | snapshot 8                  |
---|------------|--------------------|--------------------|--------------------|----------------------------|----------------------------|------------------|-----------------------------|
 A | file A     | file A (modified)  | file A (modified)  | file A (corrupted) | file A (corrupted earlier) | file A (corrupted earlier) | file A (deleted) | file A (copied from backup) |
 B | file B     | file B (untouched) | file B (untouched) | file B (untouched) | file B (untouched)         | file B (untouched)         | file B (current) | file B (untouched)          |
 C | file C     | file C (deleted)   |                    |                    |                            |                            |                  |                             |
 D |            | file D (added)     | file D (untouched) | file D (modified)  | file D (deleted)           |                            |                  |                             |
 E |            |                    |                    | file E (added)     | file E (untouched)         | file E (untouched)         | file E (current) | file E (untouched)          |
---|------------|--------------------|--------------------|--------------------|----------------------------|----------------------------|------------------|-----------------------------|



After:



   | snapshot 1 | snapshot 2         | snapshot 3         | snapshot 8                  |
---|------------|--------------------|--------------------|-----------------------------|
 A | file A     | file A (modified)  | file A (modified)  | file A (copied from backup) |
 B | file B     | file B (untouched) | file B (untouched) | file B (untouched)          |
 C | file C     | file C (deleted)   |                    |                             |
 D |            | file D (added)     | file D (untouched) |                             |
 E |            |                    |                    | file E (untouched)          |
---|------------|--------------------|--------------------|-----------------------------|

Being able to run zfs diff as I mentioned earlier should help give you clues about which of these scenarios best fits your use case.


I came across this option as a possible candidate, but I have absolutely no experience with it:
Link to reddit post:
https://www.reddit.com/r/zfs/comments/ud1u5q/force_send_a_corrupt_snapshot/
On FreeBSD: sysctl vfs.zfs.send.corrupt_data=1
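For what it's worth, if the backup box runs TrueNAS SCALE (Linux), the equivalent OpenZFS knob appears to be the zfs_send_corrupt_data module parameter. I have not tested either of these, so verify against the OpenZFS documentation and treat it strictly as a last resort:
Code:
# FreeBSD / TrueNAS CORE:
sysctl vfs.zfs.send.corrupt_data=1
# Linux / TrueNAS SCALE (untested here -- see the zfs(4) man page before relying on it):
echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data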
 

Attachments

  • Snapshots.odt
    41.3 KB · Views: 23

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
Does your TrueNAS server use ECC RAM? Are you using an HBA or are your drives independently connected?



Such an attempt will fail if there exists corruption of data or metadata.

If you still want to try, you can try something like this:
Code:
zfs send -v -R mypool/dataset@latest-good-snap | zfs recv -v -s backup/dataset
@winnielinnie I have an HBA and ECC RAM on an HP DL380 machine, which is unfortunately the backup machine now. The live machine is running in a VM, as this gives me the transfer performance. I know I shouldn't have trusted a VM, but I never thought it would end up with this problem.

Now, below is my syntax, but I need to run it from the backup system, use PULL, and also use SSH. Can you please help me with this?

zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | zfs recv -v -s ServerPool/MasterDataset/Projects
 

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
@Apollo I will try to delete all the snapshots that have errors. But I want to back up as many snapshots as possible, and I am unable to figure out zfs send.

I am trying this:
zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | NewSSH1 zfs recv -v -s ServerPool/MasterDataset/Projects

But it doesn't seem to work.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
The live machine is running on a VM
Oh my.


and also use SSH
You'd just pipe the send through SSH for the recv. If you're logged into the "source" server, you send from source to backup over SSH.

Code:
zfs send | ssh root@ip.add.re.ss zfs recv


You need a working keypair though, and root's ability to login via SSH. (The GUI uses its own generated SSH key, from what I understand. It's one of many things that are streamlined.)
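
If you really do want to drive it as a PULL from the backup box instead, you can flip it around and run the send on the remote side; the IP and the destination pool/dataset below are placeholders:
Code:
ssh root@ip.of.live.machine zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | zfs recv -v -s BackupPool/Projects_copy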
 

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
Oh my.



You'd just pipe the send through SSH for the recv. If you're logged into the "source" server, you send from source to backup over SSH.

Code:
zfs send | ssh root@ip.add.re.ss zfs recv


You need a working keypair though, and root's ability to login via SSH. (The GUI uses its own generated SSH key, from what I understand. It's one of many things that are streamlined.)
I was able to SSH in, but eventually got the error below saying "Broken pipe".

warning: cannot send 'ServerPool/MasterDataset/Projects@auto-2024-03-18_17-00':Broken pipe
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Did you use ssh with password authentication? That does not work for long running tasks, because ssh will force a reauthentication after some (don't remember atm, sorry) timeout, and password authentication is not available while your replication stream is running. You must use public key authentication.
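
A minimal key setup would look something like this, assuming you run it as root on the sending box and the receiving box allows root login with a key (on TrueNAS you may prefer to paste the public key through the web UI instead):
Code:
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519          # generate a key pair with no passphrase
# Get the .pub key onto the backup box: ssh-copy-id if available, or paste it into the
# root user's "SSH Public Key" field in the GUI on the receiving side.
ssh-copy-id -i ~/.ssh/id_ed25519.pub root@ip.of.backup
ssh root@ip.of.backup hostname                             # test: should print the remote hostname without a password prompt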
 

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
Did you use ssh with password authentication? That does not work for long running tasks, because ssh will force a reauthentication after some (don't remember atm, sorry) timeout, and password authentication is not available while your replication stream is running. You must use public key authentication.
@Patrick M. Hausen

Is this the syntax for ssh connection?

zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | NewSSH1 zfs recv -v -s ServerPool/MasterDataset/Projects
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
@Patrick M. Hausen

Is this the syntax for ssh connection?

zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | NewSSH1 zfs recv -v -s ServerPool/MasterDataset/Projects
What is NewSSH1?

Code:
zfs send -v -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | ssh ip-of-nas zfs recv -v -s ServerPool/MasterDataset/Projects

should work
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I was able to ssh , but finally got the below error saying Broken pipe.
"Broken pipe" can be caused by many error conditions (see my notes earlier on this post.)

you can add extra "v" in the command to get more verbose information, something like:
I am actually trying to replicate to a different dataset location, this way, we don't have to change what you already have. You can modify this at a later time.
zfs send -vv -R ServerPool/MasterDataset/Projects@auto-2024-03-18_23-00 | ssh -i /data/ssh/replication root@remote_IP zfs receive -vv -F ServerPool/MasterDataset/Projects_backup;
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
"Broken pipe" can be caused by many error conditions (see my notes earlier on this post.)
Exactly. Possibly you cannot even establish an SSH connection. Try the SSH command without any ZFS specifics and check whether you can log in at all. Also, as I wrote, you must use private/public key authentication.
 

urfrndsandy

Dabbler
Joined
May 30, 2023
Messages
32
"Broken pipe" can be caused by many error conditions (see my notes earlier on this post.)

you can add extra "v" in the command to get more verbose information, something like:
I am actually trying to replicate to a different dataset location, this way, we don't have to change what you already have. You can modify this at a later time.
@Apollo this syntax is perfect, but when I run it the replication starts from the beginning of the snapshots. I mean, it starts replicating all the snapshots, not only the one specified. What could I be missing?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
@Apollo this syntax is perfect, but when I run it the replication starts from the beginning of the snapshots. I mean, it starts replicating all the snapshots, not only the one specified. What could I be missing?
Nothing is wrong with that behavior.
As it stands, the replication is recursive, not incremental.
If this is a problem, why not use the replication task instead?
You will most likely end up with a failed replication anyway, at least until the corrupted data is dealt with.

For now, I would say let it run its course, unless you think it's a waste.

If you want incremental replication without using the replication task, you will most likely end up with an incomplete replication.
Otherwise, if you want to pick up where the replication task stopped, you need the list of snapshots at the source and at the destination, and then build the relevant command using those snapshots as references, such as:

zfs send -vv -R -I ServerPool/MasterDataset/Projects@last_snapshot_on_remote_end ServerPool/MasterDataset/Projects@latest_snapshot_on_source | ssh -i /data/ssh/replication root@remote_IP zfs receive -vv -F ServerPool/MasterDataset/Projects
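
To find the right pair of snapshots for the -I form, list them on both ends sorted by creation time and use the newest one that exists on both sides as the starting point (key path and remote IP as in the earlier command; use whichever dataset the replication actually lands in on the remote side):
Code:
# Newest snapshots first on the source:
zfs list -H -t snapshot -o name -S creation -r ServerPool/MasterDataset/Projects | head
# Same on the destination, over SSH:
ssh -i /data/ssh/replication root@remote_IP zfs list -H -t snapshot -o name -S creation -r ServerPool/MasterDataset/Projects | head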
 