Replication failing and requiring manual intervention

izomiac · Dec 18, 2021

I'm currently using TrueNAS-22.02-RC.1-1 (SCALE) on two machines, with the hardware details in my signature. Lately, I've had a couple of snapshots that TrueNAS is struggling to replicate. I keep getting the following error:

[2021/12/18 12:45:04] ERROR [replication_task__task_3] [zettarepl.replication.run] For task 'task_3' non-recoverable replication error ReplicationError('Last full ZFS replication failed to transfer all the children of the snapshot Pool/Iona@auto-2021-12-12_00-15. The snapshot Pool/Iona/Backup@auto-2021-12-12_00-15 was not transferred. Please run `zfs destroy Pool/Iona@auto-2021-12-12_00-15` on the target system and run replication again.')

Snapshot details:

Code:

root@iona:/mnt/Pool/Iona/Home/izomiac# zfs list -t snapshot | grep Iona | grep auto-2021-12-12_00-15
Pool/Iona@auto-2021-12-12_00-15                 136K  -   461K  -
Pool/Iona/Backup@auto-2021-12-12_00-15          300M  -  2.05T  -
Pool/Iona/Home@auto-2021-12-12_00-15              0B  -   341K  -
Pool/Iona/Home/izomiac@auto-2021-12-12_00-15    188K  -   149G  -
Pool/Iona/Media@auto-2021-12-12_00-15          55.6M  -  9.57T  -
Pool/Iona/Media-Private@auto-2021-12-12_00-15  67.2M  -  1.04T  -
Pool/Iona/Working@auto-2021-12-12_00-15         256K  -  66.2G  -
root@iona:/mnt/Pool/Iona/Home/izomiac# zfs send -nvRw -I Pool/Iona@auto-2021-12-05_00-15 Pool/Iona@auto-2021-12-12_00-15
[116 lines clipped referring to snapshots taken after 2021-12-12]
total estimated size is 54.4G

Replication Task:

This is the fifth time I've had to manually destroy that incomplete snapshot, and I had to destroy the monthly 2021-12-01 snapshot twice as well. The weekly 2021-12-05 snapshot succeeded without any intervention. I'm not sure when this became an issue: I had to do some major hardware/software work on Takao, and replication was limited to my most essential files until I had a chance to physically go there and fix/upgrade it in November. I get a similar error on the replication task going the other way, but less frequently, since those snapshots tend to be smaller.

My internet connection isn't great (LTE), so interruptions are inevitable, and I can't spare the bandwidth to transfer every weekly snapshot multiple times (roughly 50 GB x 5 failures = 250 GB per week, or up to 1 TB extra per month). I'd also love for my replications to be automatic rather than requiring daily intervention. Any suggestions?
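Since the error names a child snapshot that never arrived, it can help to see exactly which children are missing before destroying anything. The following is a minimal, read-only sketch, assuming the output of `zfs list -H -o name -t snapshot -r <dataset>` from each machine has been saved to a text file. The function name `list_missing`, the file names, and the target root `Backup/Iona` are hypothetical and not part of TrueNAS or zettarepl.

```shell
#!/bin/sh
# Hypothetical helper: list child snapshots that exist on the source
# but not on the target. It only compares two text files of snapshot
# names; it never touches a pool.
#
# usage: list_missing SRC_ROOT DST_ROOT SNAPNAME SRC_FILE DST_FILE
list_missing() {
    awk -v src="$1" -v dst="$2" -v snap="$3" '
        # First file (target list): remember every snapshot name.
        FILENAME == ARGV[1] { have[$1] = 1; next }
        # Second file (source list): keep only the wanted snapshot name
        # on SRC_ROOT itself or on datasets below it.
        {
            n = split($1, parts, "@")
            if (n != 2 || parts[2] != snap) next
            ds = parts[1]
            if (ds != src && index(ds, src "/") != 1) next
            # Map the source dataset path onto the target root.
            want = dst substr(ds, length(src) + 1) "@" snap
            if (!(want in have)) print want
        }
    ' "$5" "$4"
}

# Example (file names and target root are placeholders):
#   on the source:  zfs list -H -o name -t snapshot -r Pool/Iona   > src.txt
#   on the target:  zfs list -H -o name -t snapshot -r Backup/Iona > dst.txt
#   list_missing Pool/Iona Backup/Iona auto-2021-12-12_00-15 src.txt dst.txt
```

Each printed line is a snapshot the target would need for the recursive replication to count as complete. An empty result suggests the children are all present, in which case the failure probably lies elsewhere.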

NugentS · Dec 18, 2021

There is a JIRA ticket that I opened on a similar issue.

https://jira.ixsystems.com/browse/NAS-113258

There is a fix due in RC.2, if I read Jira correctly.

izomiac · Dec 18, 2021

Excellent, I'll eagerly await the next release then. Too bad I missed your ticket; the only hit on Google for the error message was an unanswered post, which seems like it's probably a different issue. OTOH, I can't exactly complain about not writing down the error message for Jira before fixing the issue; I've certainly done that a time or two, lol.

TheNiTz · Apr 7, 2022

Same issue, running the latest version as of 4/7/2022 (TrueNAS-SCALE-22.02.0.1). If I run the task manually, it succeeds; when it runs automatically, it fails after the first time.

Log Path

/var/log/jobs/3467.log

Log Excerpt

[2022/04/07 13:00:08] INFO [replication_task__task_3] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2022/04/07 13:00:08] ERROR [replication_task__task_3] [zettarepl.replication.run] For task 'task_3' non-recoverable replication error ReplicationError('Last full ZFS replication failed to transfer all the children of the snapshot SDDs@auto-2022-04-07_13-00. The snapshot HDDs/SSDreplica/ix-applications/docker/6c8ca07846c178d3f005420ec2d66f490a525702debf2ef37eae978f53e4bcf1@auto-2022-04-07_13-00 was not transferred. Please run `zfs destroy -r HDDs/SSDreplica@auto-2022-04-07_13-00` on the target system and run replication again.')

Error

[EFAULT] Last full ZFS replication failed to transfer all the children of the snapshot SDDs@auto-2022-04-07_13-00. The snapshot HDDs/SSDreplica/ix-applications/docker/6c8ca07846c178d3f005420ec2d66f490a525702debf2ef37eae978f53e4bcf1@auto-2022-04-07_13-00 was not transferred. Please run `zfs destroy -r HDDs/SSDreplica@auto-2022-04-07_13-00` on the target system and run replication again.
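The error text embeds the exact cleanup command that zettarepl suggests. As a rough sketch, assuming the log keeps the backtick-quoted format shown in this thread, the command can be pulled out of a job log for review before anyone runs it on the target. The function name `extract_destroy_cmd` is hypothetical, and the sample dataset name is a placeholder. The helper only prints the command; it never executes it.

```shell
#!/bin/sh
# Hypothetical helper (not part of TrueNAS or zettarepl): read log text
# on stdin and print the backtick-quoted `zfs destroy ...` command from
# each line that contains one. Nothing is executed.
extract_destroy_cmd() {
    sed -n 's/.*`\(zfs destroy [^`]*\)`.*/\1/p'
}

# Demo with a shortened line shaped like the errors above
# (tank/replica is a placeholder dataset name):
sample='ReplicationError(... Please run `zfs destroy -r tank/replica@auto-2022-04-07_13-00` on the target system and run replication again.)'
printf '%s\n' "$sample" | extract_destroy_cmd
# prints: zfs destroy -r tank/replica@auto-2022-04-07_13-00
```

Against a real job log this would be something like `extract_destroy_cmd < /var/log/jobs/3467.log`. Read the printed command carefully before running it, because `zfs destroy -r` removes that snapshot on every child dataset.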

TheNiTz · Apr 18, 2022

Solved: I removed the ix-applications folder from the source. After that, the replication was able to complete by itself every time.
