Replicated snapshots will not be destroyed automatically

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Hello all,

I set up a second server to act as a target for replication tasks and I just discovered that the replicated snapshots will not expire.

Hardware Info
Source
TrueNAS-SCALE-23.10.1
Supermicro X10SRi-F, Xeon 2640v4, 128 GB ECC RAM, Seasonic Focus PX-750
Target
TrueNAS-SCALE-23.10.1
Supermicro X10SLL-F, i3 4130, 16 GB ECC RAM, Seasonic Prime PX-750

I haven't upgraded the pools to the new ZFS feature flags.

Source: [screenshot]

Destination: [screenshot]

Snapshot task on source: [screenshot]

Replication task: [screenshot]

* I have to note that at first the replication tasks failed; I needed to manually append the dataset name (/dataset) to the destination path upon creation. I doubt this is the culprit, but I wanted to share it anyway.

Any ideas why the replicated snapshots are not deleted / not marked for deletion?

Thanks in advance!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That's definitely not intended, and you appear to have the replication job correctly set to have the snapshots inherit their expiry times from the source. Can you file a bug report from the Report a Bug link and include a debug file from both source and destination systems (System -> Advanced -> Save Debug)?
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
That's definitely not intended
Are you sure?

From what I understand, it's the Source server that dictates pruning, both on itself and over-the-wire on the Destination server.

The screenshots seem to depict @chuck32 is logged into two different servers (Source and Destination). The Destination server wouldn't know about the expiration / pruning policy configured on the Source server. In fact, the Destination could even be a non-TrueNAS ZFS server.

So when the time comes, zettarepl (initiated on the Source side) will prune the expired snapshots over-the-wire on the Destination side.

Isn't that how iXsystems' zettarepl works?
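
If so, the over-the-wire pruning would amount to something like this rough sketch (hypothetical host, dataset, and snapshot names; zettarepl uses its own SSH transport internally rather than literal shell commands):

Code:
# Hypothetical sketch: the Source lists and destroys expired snapshots on the
# Destination over SSH (placeholder host/dataset/snapshot names).
ssh admin@destination zfs list -t snapshot -H -o name -d 1 backuppool/mydata
ssh admin@destination zfs destroy backuppool/mydata@auto-2024-01-01_00-00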


EDIT: I use bold to draw the eyes to the important parts of my post. I swear I'm not "yelling" at anyone. :tongue:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That's a good point. Let's see if it actually prunes after the timer expires.

Edit: And it does - the snapshot pruning is only tracked on the machine where the job actually runs. Points to the minty user in the front row.
 
Last edited:

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Thank you both for your replies!

I probably have a lot to learn about snapshots and will follow up with some more questions that came up along the way. But first let's answer the open questions, and do it chronologically.

The screenshots seem to depict @chuck32 is logged into two different servers (Source and Destination). The Destination server wouldn't know about the expiration / pruning policy configured on the Source server. In fact, the Destination could even be a non-TrueNAS ZFS server.
Yes, as I laid out in my OP the destination is a separate machine. That's why the screenshots have two color themes (I do not want to mix up which machine I'm working on by accident). But both run the same version of TrueNAS.

So when the time comes, zettarepl (initiated on the Source side) will prune the expired snapshots over-the-wire on the Destination side.
Let's see if it actually prunes after the timer expires.
Edit: And it does - the snapshot pruning is only tracked on the machine where the job actually runs. Points to the minty user in the front row.
How did you know that before me ;)

I set up a test dataset with hourly snapshots and a two-hour retention time. I can confirm the snapshots got deleted on the remote machine as well. Additionally, the snapshots I manually deleted on the source were also destroyed, which is nice.

Code:
[2024/01/20 08:00:00] INFO     [MainThread] [zettarepl.zettarepl] Scheduled tasks: [<Periodic Snapshot Task 'task_17'>]
[2024/01/20 08:00:00] INFO     [MainThread] [zettarepl.snapshot.create] On <Shell(<LocalTransport()>)> creating recursive snapshot ('neptune/test-dataset', 'auto-2024-01-20_08-00')
[2024/01/20 08:00:00] INFO     [MainThread] [zettarepl.zettarepl] Created ('neptune/test-dataset', 'auto-2024-01-20_08-00')
[2024/01/20 08:00:00] INFO     [Thread-1984] [zettarepl.paramiko.replication_task__task_17] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/20 08:00:00] INFO     [Thread-1984] [zettarepl.paramiko.replication_task__task_17] Authentication (publickey) successful!
[2024/01/20 08:00:01] INFO     [replication_task__task_17] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: [('jupiter/test-dataset', 'auto-2023-12-22_22-44'), ('jupiter/test-dataset', 'auto-2023-12-23_22-44'), ('jupiter/test-dataset', 'auto-2023-12-24_22-44')]
[2024/01/20 08:00:01] INFO     [replication_task__task_17] [zettarepl.snapshot.destroy] On <Shell(<SSH Transport(admin@192.168.178.143)>)> for dataset 'jupiter/test-dataset' destroying snapshots {'auto-2023-12-23_22-44', 'auto-2023-12-22_22-44', 'auto-2023-12-24_22-44'}
[2024/01/20 08:00:01] INFO     [Thread-1986] [zettarepl.paramiko.retention] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/20 08:00:01] INFO     [Thread-1986] [zettarepl.paramiko.retention] Authentication (publickey) successful!
[2024/01/20 08:00:01] INFO     [replication_task__task_17] [zettarepl.replication.run] For replication task 'task_17': doing push from 'neptune/test-dataset' to 'jupiter/test-dataset' of snapshot='auto-2024-01-20_08-00' incremental_base='auto-2024-01-20_04-00' include_intermediate=False receive_resume_token=None encryption=False
[2024/01/20 08:00:01] INFO     [retention] [zettarepl.zettarepl] Retention destroying local snapshots: []
[2024/01/20 08:00:03] INFO     [retention] [zettarepl.zettarepl] Retention on <SSH Transport(admin@192.168.178.143)> destroying snapshots: []
[2024/01/20 08:00:03] INFO     [retention] [zettarepl.zettarepl] Retention on <LocalTransport()> destroying snapshots: []
[2024/01/20 08:00:04] INFO     [retention] [zettarepl.zettarepl] Retention destroying local snapshots: [('neptune/test-dataset', 'auto-2024-01-20_04-00')]
[2024/01/20 08:00:04] INFO     [retention] [zettarepl.snapshot.destroy] On <Shell(<LocalTransport()>)> for dataset 'neptune/test-dataset' destroying snapshots {'auto-2024-01-20_04-00'}
[2024/01/20 08:00:05] INFO     [Thread-1989] [zettarepl.paramiko.retention] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/20 08:00:05] INFO     [Thread-1989] [zettarepl.paramiko.retention] Authentication (publickey) successful!
[2024/01/20 08:00:06] INFO     [retention] [zettarepl.zettarepl] Retention on <SSH Transport(admin@192.168.178.143)> destroying snapshots: [('jupiter/test-dataset', 'auto-2024-01-20_04-00')]
[2024/01/20 08:00:06] INFO     [retention] [zettarepl.snapshot.destroy] On <Shell(<SSH Transport(admin@192.168.178.143)>)> for dataset 'jupiter/test-dataset' destroying snapshots {'auto-2024-01-20_04-00'}
[2024/01/20 08:00:06] INFO     [retention] [zettarepl.zettarepl] Retention on <LocalTransport()> destroying snapshots: []


Now onto the questions that arose during my investigation of the log files. Some explanation upfront:

syncthing-data
scanned-documents
are both under /mnt/neptune; neither is a child of the other.

dataset             snapshot schedule   replication schedule
syncthing-data      00:03 daily         12:00 daily
scanned-documents   22:47 daily         linked to snapshot

1) For some reason, the snapshot auto-2024-01-20_04-00 got marked for deletion correctly; the GUI even displayed that it would be destroyed at 06:00. When I checked after 07:00 it was still there. On the run at 08:00 it finally got destroyed. Even if there was some local / UTC time mix-up (I'm ahead one hour) it should have been destroyed at 07:00. There's no mention in the logs of the snapshot at 5, 6 or 7 am.
Curious at least. Any idea for the delay?

2) Then I realized some of the tasks seemed to run at weird times.
Code:
[2024/01/18 22:47:00] INFO     [MainThread] [zettarepl.zettarepl] Scheduled tasks: [<Periodic Snapshot Task 'task_16'>]
[2024/01/18 22:47:00] INFO     [MainThread] [zettarepl.snapshot.create] On <Shell(<LocalTransport()>)> creating recursive snapshot ('neptune/scanned-documents', 'auto-2024-01-18_22-47')
[2024/01/18 22:47:00] INFO     [MainThread] [zettarepl.zettarepl] Created ('neptune/scanned-documents', 'auto-2024-01-18_22-47')
[2024/01/18 22:47:00] INFO     [Thread-1784] [zettarepl.paramiko.replication_task__task_16] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/18 22:47:00] INFO     [Thread-1784] [zettarepl.paramiko.replication_task__task_16] Authentication (publickey) successful!
[2024/01/18 22:47:01] INFO     [replication_task__task_16] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2024/01/18 22:47:01] INFO     [Thread-1786] [zettarepl.paramiko.retention] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/18 22:47:01] INFO     [replication_task__task_16] [zettarepl.replication.run] For replication task 'task_16': doing push from 'neptune/scanned-documents' to 'jupiter/scanned-documents' of snapshot='auto-2024-01-18_22-43' incremental_base=None include_intermediate=True receive_resume_token=None encryption=False
[2024/01/18 22:47:01] INFO     [Thread-1786] [zettarepl.paramiko.retention] Authentication (publickey) successful!
[2024/01/18 22:47:01] ERROR    [retention] [zettarepl.replication.task.snapshot_owner] Failed to list snapshots with <Shell(<SSH Transport(admin@192.168.178.143)>)>: ExecException(1, "cannot open 'jupiter/scanned-documents': dataset does not exist\n"). Assuming remote has no snapshots
[2024/01/18 22:47:01] INFO     [retention] [zettarepl.zettarepl] Retention destroying local snapshots: []
[2024/01/18 22:47:03] INFO     [retention] [zettarepl.zettarepl] Retention on <SSH Transport(admin@192.168.178.143)> destroying snapshots: []
[2024/01/18 22:47:03] INFO     [retention] [zettarepl.zettarepl] Retention on <LocalTransport()> destroying snapshots: []
[2024/01/18 22:47:07] INFO     [replication_task__task_16] [zettarepl.replication.run] For replication task 'task_16': doing push from 'neptune/scanned-documents' to 'jupiter/scanned-documents' of snapshot='auto-2024-01-18_22-47' incremental_base='auto-2024-01-18_22-43' include_intermediate=True receive_resume_token=None encryption=False
[2024/01/18 22:47:09] INFO     [retention] [zettarepl.zettarepl] Retention destroying local snapshots: []
[2024/01/18 22:47:10] INFO     [Thread-1789] [zettarepl.paramiko.retention] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/18 22:47:10] INFO     [Thread-1789] [zettarepl.paramiko.retention] Authentication (publickey) successful!
[2024/01/18 22:47:11] INFO     [retention] [zettarepl.zettarepl] Retention on <SSH Transport(admin@192.168.178.143)> destroying snapshots: []
[2024/01/18 22:47:11] INFO     [retention] [zettarepl.zettarepl] Retention on <LocalTransport()> destroying snapshots: []
[2024/01/18 22:47:45] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:237] Connecting...
[2024/01/18 22:47:45] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:237] [async_exec:239] Running ['zfs', 'list', '-t', 'snapshot', '-H', '-o', 'name', '-s', 'name', '-d', '1', 'jupiter/scanned-documents'] with sudo=False
[2024/01/18 22:47:45] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:237] [async_exec:239] Reading stdout
[2024/01/18 22:47:45] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:237] [async_exec:239] Waiting for exit status
[2024/01/18 22:47:45] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:237] [async_exec:239] Success: 'jupiter/scanned-documents@auto-2024-01-18_22-43\njupiter/scanned-documents@auto-2024-01-18_22-47\n'
[2024/01/18 22:48:18] DEBUG    [IoThread_1] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:238] Connecting...
[2024/01/18 22:48:18] DEBUG    [IoThread_1] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:238] [async_exec:240] Running ['zfs', 'list', '-t', 'snapshot', '-H', '-o', 'name', '-s', 'name', '-d', '1', 'jupiter/syncthing-data'] with sudo=False
[2024/01/18 22:48:18] DEBUG    [IoThread_1] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:238] [async_exec:240] Reading stdout
[2024/01/18 22:48:18] DEBUG    [IoThread_1] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:238] [async_exec:240] Waiting for exit status
[2024/01/18 22:48:18] DEBUG    [IoThread_1] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:238] [async_exec:240] Success: 'jupiter/syncthing-data@auto-2023-10-21_21-00\njupiter/syncthing-data@auto-2023-10-22_21-00\njupiter/syncthing-data@auto-2023-10-23_21-00\njupiter/syncthing-data@auto-2023-10-24_21-00\njupiter/syncthing-data@auto-2023-10-25_21-00\njupiter/syncthing-data@auto-2023-10-27_21-00\njupiter/syncthing-data@auto-2023-10-28_21-00\njupiter/syncthing-data@auto-2023-10-29_21-00\njupiter/syncthing-data@auto-2023-10-30_21-00\njupiter/syncthing-data@auto-2023-10-31_21-00\njupiter/syncthing-data@auto-2023-11-01_21-00\njupiter/syncthing-data@auto-2023-11-02_21-00\njupiter/syncthing-data@auto-2023-11-03_21-00\njupiter/syncthing-data@auto-2023-11-04_21-00\njupiter/syncthing-data@auto-2023-11-05_21-00\njupiter/syncthing-data@auto-2023-11-06_21-00\njupiter/syncthing-data@auto-2023-11-07_21-00\njupiter/syncthing-data@auto-2023-11-08_21-00\njupiter/syncthing-data@auto-2023-11-11_21-00\njupiter/syncthing-data@auto-2023-11-13_21-00\njupiter/syncthing-data@auto-2023-11-14_21-00\njupiter/syncthing-data@auto-2023-11-15_21-00\njupiter/syncthing-data@auto-2023-11-16_21-00\njupiter/syncthing-data@auto-2023-11-17_21-00\njupiter/syncthing-data@auto-2023-11-18_21-00\njupiter/syncthing-data@auto-2023-11-19_21-00\njupiter/syncthing-data@auto-2023-11-20_21-00\njupiter/syncthing-data@auto-2023-11-21_21-00\njupiter/syncthing-data@auto-2023-11-22_21-00\njupiter/syncthing-data@auto-2023-11-23_21-00\njupiter/syncthing-data@auto-2023-11-24_21-00\njupiter/syncthing-data@auto-2023-11-25_21-00\njupiter/syncthing-data@auto-2023-11-26_21-00\njupiter/syncthing-data@auto-2023-11-27_00-00\njupiter/syncthing-data@auto-2023-11-28_00-00\njupiter/syncthing-data@auto-2023-11-29_00-00\njupiter/syncthing-data@auto-2023-11-30_00-00\njupiter/syncthing-data@auto-2023-12-01_00-00\njupiter/syncthing-data@auto-2023-12-02_00-00\njupiter/syncthing-data@auto-2023-12-03_00-00\njupiter/syncthing-data@auto-2023-12-04_00-00\njupiter/syncthing-data@auto-2023-12-05_00-00\njupiter/syncthing-data@auto-2023-12-06_00-00\njupiter/syncthing-data@auto-2023-12-07_00-00\njupiter/syncthing-data@auto-2023-12-08_00-00\njupiter/syncthing-data@auto-2023-12-09_00-00\njupiter/syncthing-data@auto-2023-12-10_00-00\njupiter/syncthing-data@auto-2023-12-11_00-00\njupiter/syncthing-data@auto-2023-12-12_00-00\njupiter/syncthing-data@auto-2023-12-13_00-00\njupiter/syncthing-data@auto-2023-12-14_00-00\njupiter/syncthing-data@auto-2023-12-15_00-00\njupiter/syncthing-data@auto-2023-12-16_00-00\njupiter/syncthing-data@auto-2023-12-17_00-00\njupiter/syncthing-data@auto-2023-12-18_00-00\njupiter/syncthing-data@auto-2023-12-19_00-00\njupiter/syncthing-data@auto-2023-12-20_00-00\njupiter/syncthing-data@auto-2023-12-21_00-00\njupiter/syncthing-data@auto-2023-12-22_00-00\njupiter/syncthing-data@auto-2023-12-23_00-00\njupiter/syncthing-data@auto-2023-12-24_00-00\njupiter/syncthing-data@auto-2023-12-25_00-00\njupiter/syncthing-data@auto-2023-12-26_00-00\njupiter/syncthing-data@auto-2023-12-27_00-03\njupiter/syncthing-data@auto-2023-12-28_00-03\njupiter/syncthing-data@auto-2023-12-29_00-03\njupiter/syncthing-data@auto-2023-12-30_00-03\njupiter/syncthing-data@auto-2023-12-31_00-03\njupiter/syncthing-data@auto-2024-01-01_00-03\njupiter/syncthing-data@auto-2024-01-02_00-03\njupiter/syncthing-data@auto-2024-01-03_00-03\njupiter/syncthing-data@auto-2024-01-04_00-03\njupiter/syncthing-data@auto-2024-01-05_00-03\njupiter/syncthing-data@auto-2024-01-06_00-03\njupiter/s
yncthing-data@auto-2024-01-07_00-03\njupiter/syncthing-data@auto-2024-01-08_00-03\njupiter/syncthing-data@auto-2024-01-09_00-03\njupiter/syncthing-data@auto-2024-01-10_00-03\njupiter/syncthing-data@auto-2024-01-11_00-03\njupiter/syncthing-data@auto-2024-01-12_00-03\njupiter/syncthing-data@auto-2024-01-13_00-03\njupiter/syncthing-data@auto-2024-01-14_00-03\njupiter/syncthing-data@auto-2024-01-15_00-03\njupiter/syncthing-data@auto-2024-01-16_00-03\njupiter/syncthing-data@auto-2024-01-17_00-03\njupiter/syncthing-data@auto-2024-01-18_00-03\njupiter/syncthing-data@manual-2023-11-20_17-25_beforededup\n'

For some reason, running the replication task for scanned-documents triggered zfs list for the unrelated dataset /syncthing-data. I verified that during two other manual runs of the replication task for scanned-documents. After I opened the snapshot task for scanned-documents, unset Recursive and set it again, that "connection" seemed to have disappeared. I read somewhere that you need to set the snapshot tasks to recursive for the remote retention policy to work.

Code:
[2024/01/19 10:29:31] DEBUG    [IoThread_15] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:239] Connecting...
[2024/01/19 10:29:31] DEBUG    [IoThread_15] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:239] [async_exec:241] Running ['zfs', 'list', '-t', 'snapshot', '-H', '-o', 'name', '-s', 'name', '-d', '1', 'jupiter/syncthing-data'] with sudo=False
[2024/01/19 10:29:31] DEBUG    [IoThread_15] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:239] [async_exec:241] Reading stdout
[2024/01/19 10:29:31] DEBUG    [IoThread_15] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:239] [async_exec:241] Waiting for exit status
[2024/01/19 10:29:31] DEBUG    [IoThread_15] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:239] [async_exec:241] Success: 'jupiter/syncthing-data@auto-2023-10-21_21-00\njupiter/syncthing-data@auto-2023-10-22_21-00\njupiter/syncthing-data@auto-2023-10-23_21-00\njupiter/syncthing-data@auto-2023-10-24_21-00\njupiter/syncthing-data@auto-2023-10-25_21-00\njupiter/syncthing-data@auto-2023-10-27_21-00\njupiter/syncthing-data@auto-2023-10-28_21-00\njupiter/syncthing-data@auto-2023-10-29_21-00\njupiter/syncthing-data@auto-2023-10-30_21-00\njupiter/syncthing-data@auto-2023-10-31_21-00\njupiter/syncthing-data@auto-2023-11-01_21-00\njupiter/syncthing-data@auto-2023-11-02_21-00\njupiter/syncthing-data@auto-2023-11-03_21-00\njupiter/syncthing-data@auto-2023-11-04_21-00\njupiter/syncthing-data@auto-2023-11-05_21-00\njupiter/syncthing-data@auto-2023-11-06_21-00\njupiter/syncthing-data@auto-2023-11-07_21-00\njupiter/syncthing-data@auto-2023-11-08_21-00\njupiter/syncthing-data@auto-2023-11-11_21-00\njupiter/syncthing-data@auto-2023-11-13_21-00\njupiter/syncthing-data@auto-2023-11-14_21-00\njupiter/syncthing-data@auto-2023-11-15_21-00\njupiter/syncthing-data@auto-2023-11-16_21-00\njupiter/syncthing-data@auto-2023-11-17_21-00\njupiter/syncthing-data@auto-2023-11-18_21-00\njupiter/syncthing-data@auto-2023-11-19_21-00\njupiter/syncthing-data@auto-2023-11-20_21-00\njupiter/syncthing-data@auto-2023-11-21_21-00\njupiter/syncthing-data@auto-2023-11-22_21-00\njupiter/syncthing-data@auto-2023-11-23_21-00\njupiter/syncthing-data@auto-2023-11-24_21-00\njupiter/syncthing-data@auto-2023-11-25_21-00\njupiter/syncthing-data@auto-2023-11-26_21-00\njupiter/syncthing-data@auto-2023-11-27_00-00\njupiter/syncthing-data@auto-2023-11-28_00-00\njupiter/syncthing-data@auto-2023-11-29_00-00\njupiter/syncthing-data@auto-2023-11-30_00-00\njupiter/syncthing-data@auto-2023-12-01_00-00\njupiter/syncthing-data@auto-2023-12-02_00-00\njupiter/syncthing-data@auto-2023-12-03_00-00\njupiter/syncthing-data@auto-2023-12-04_00-00\njupiter/syncthing-data@auto-2023-12-05_00-00\njupiter/syncthing-data@auto-2023-12-06_00-00\njupiter/syncthing-data@auto-2023-12-07_00-00\njupiter/syncthing-data@auto-2023-12-08_00-00\njupiter/syncthing-data@auto-2023-12-09_00-00\njupiter/syncthing-data@auto-2023-12-10_00-00\njupiter/syncthing-data@auto-2023-12-11_00-00\njupiter/syncthing-data@auto-2023-12-12_00-00\njupiter/syncthing-data@auto-2023-12-13_00-00\njupiter/syncthing-data@auto-2023-12-14_00-00\njupiter/syncthing-data@auto-2023-12-15_00-00\njupiter/syncthing-data@auto-2023-12-16_00-00\njupiter/syncthing-data@auto-2023-12-17_00-00\njupiter/syncthing-data@auto-2023-12-18_00-00\njupiter/syncthing-data@auto-2023-12-19_00-00\njupiter/syncthing-data@auto-2023-12-20_00-00\njupiter/syncthing-data@auto-2023-12-21_00-00\njupiter/syncthing-data@auto-2023-12-22_00-00\njupiter/syncthing-data@auto-2023-12-23_00-00\njupiter/syncthing-data@auto-2023-12-24_00-00\njupiter/syncthing-data@auto-2023-12-25_00-00\njupiter/syncthing-data@auto-2023-12-26_00-00\njupiter/syncthing-data@auto-2023-12-27_00-03\njupiter/syncthing-data@auto-2023-12-28_00-03\njupiter/syncthing-data@auto-2023-12-29_00-03\njupiter/syncthing-data@auto-2023-12-30_00-03\njupiter/syncthing-data@auto-2023-12-31_00-03\njupiter/syncthing-data@auto-2024-01-01_00-03\njupiter/syncthing-data@auto-2024-01-02_00-03\njupiter/syncthing-data@auto-2024-01-03_00-03\njupiter/syncthing-data@auto-2024-01-04_00-03\njupiter/syncthing-data@auto-2024-01-05_00-03\njupiter/syncthing-data@auto-2024-01-06_00-03\njupiter/
syncthing-data@auto-2024-01-07_00-03\njupiter/syncthing-data@auto-2024-01-08_00-03\njupiter/syncthing-data@auto-2024-01-09_00-03\njupiter/syncthing-data@auto-2024-01-10_00-03\njupiter/syncthing-data@auto-2024-01-11_00-03\njupiter/syncthing-data@auto-2024-01-12_00-03\njupiter/syncthing-data@auto-2024-01-13_00-03\njupiter/syncthing-data@auto-2024-01-14_00-03\njupiter/syncthing-data@auto-2024-01-15_00-03\njupiter/syncthing-data@auto-2024-01-16_00-03\njupiter/syncthing-data@auto-2024-01-17_00-03\njupiter/syncthing-data@auto-2024-01-18_00-03\njupiter/syncthing-data@manual-2023-11-20_17-25_beforededup\n'
[2024/01/19 10:33:56] DEBUG    [IoThread_12] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:240] Connecting...
[2024/01/19 10:33:56] DEBUG    [IoThread_12] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:240] [async_exec:242] Running ['zfs', 'list', '-t', 'snapshot', '-H', '-o', 'name', '-s', 'name', '-d', '1', 'jupiter/scanned-documents'] with sudo=False
[2024/01/19 10:33:56] DEBUG    [IoThread_12] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:240] [async_exec:242] Reading stdout
[2024/01/19 10:33:56] DEBUG    [IoThread_12] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:240] [async_exec:242] Waiting for exit status
[2024/01/19 10:33:56] DEBUG    [IoThread_12] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:240] [async_exec:242] Success: 'jupiter/scanned-documents@auto-2024-01-18_22-43\njupiter/scanned-documents@auto-2024-01-18_22-47\n'
[2024/01/19 10:34:05] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:241] Connecting...
[2024/01/19 10:34:06] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:241] [async_exec:243] Running ['zfs', 'list', '-t', 'snapshot', '-H', '-o', 'name', '-s', 'name', '-d', '1', 'jupiter/syncthing-data'] with sudo=False
[2024/01/19 10:34:06] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:241] [async_exec:243] Reading stdout
[2024/01/19 10:34:06] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:241] [async_exec:243] Waiting for exit status
[2024/01/19 10:34:06] DEBUG    [IoThread_20] [zettarepl.transport.base_ssh] [ssh:admin@192.168.178.143] [shell:241] [async_exec:243] Success: 'jupiter/syncthing-data@auto-2023-10-21_21-00\njupiter/syncthing-data@auto-2023-10-22_21-00\njupiter/syncthing-data@auto-2023-10-23_21-00\njupiter/syncthing-data@auto-2023-10-24_21-00\njupiter/syncthing-data@auto-2023-10-25_21-00\njupiter/syncthing-data@auto-2023-10-27_21-00\njupiter/syncthing-data@auto-2023-10-28_21-00\njupiter/syncthing-data@auto-2023-10-29_21-00\njupiter/syncthing-data@auto-2023-10-30_21-00\njupiter/syncthing-data@auto-2023-10-31_21-00\njupiter/syncthing-data@auto-2023-11-01_21-00\njupiter/syncthing-data@auto-2023-11-02_21-00\njupiter/syncthing-data@auto-2023-11-03_21-00\njupiter/syncthing-data@auto-2023-11-04_21-00\njupiter/syncthing-data@auto-2023-11-05_21-00\njupiter/syncthing-data@auto-2023-11-06_21-00\njupiter/syncthing-data@auto-2023-11-07_21-00\njupiter/syncthing-data@auto-2023-11-08_21-00\njupiter/syncthing-data@auto-2023-11-11_21-00\njupiter/syncthing-data@auto-2023-11-13_21-00\njupiter/syncthing-data@auto-2023-11-14_21-00\njupiter/syncthing-data@auto-2023-11-15_21-00\njupiter/syncthing-data@auto-2023-11-16_21-00\njupiter/syncthing-data@auto-2023-11-17_21-00\njupiter/syncthing-data@auto-2023-11-18_21-00\njupiter/syncthing-data@auto-2023-11-19_21-00\njupiter/syncthing-data@auto-2023-11-20_21-00\njupiter/syncthing-data@auto-2023-11-21_21-00\njupiter/syncthing-data@auto-2023-11-22_21-00\njupiter/syncthing-data@auto-2023-11-23_21-00\njupiter/syncthing-data@auto-2023-11-24_21-00\njupiter/syncthing-data@auto-2023-11-25_21-00\njupiter/syncthing-data@auto-2023-11-26_21-00\njupiter/syncthing-data@auto-2023-11-27_00-00\njupiter/syncthing-data@auto-2023-11-28_00-00\njupiter/syncthing-data@auto-2023-11-29_00-00\njupiter/syncthing-data@auto-2023-11-30_00-00\njupiter/syncthing-data@auto-2023-12-01_00-00\njupiter/syncthing-data@auto-2023-12-02_00-00\njupiter/syncthing-data@auto-2023-12-03_00-00\njupiter/syncthing-data@auto-2023-12-04_00-00\njupiter/syncthing-data@auto-2023-12-05_00-00\njupiter/syncthing-data@auto-2023-12-06_00-00\njupiter/syncthing-data@auto-2023-12-07_00-00\njupiter/syncthing-data@auto-2023-12-08_00-00\njupiter/syncthing-data@auto-2023-12-09_00-00\njupiter/syncthing-data@auto-2023-12-10_00-00\njupiter/syncthing-data@auto-2023-12-11_00-00\njupiter/syncthing-data@auto-2023-12-12_00-00\njupiter/syncthing-data@auto-2023-12-13_00-00\njupiter/syncthing-data@auto-2023-12-14_00-00\njupiter/syncthing-data@auto-2023-12-15_00-00\njupiter/syncthing-data@auto-2023-12-16_00-00\njupiter/syncthing-data@auto-2023-12-17_00-00\njupiter/syncthing-data@auto-2023-12-18_00-00\njupiter/syncthing-data@auto-2023-12-19_00-00\njupiter/syncthing-data@auto-2023-12-20_00-00\njupiter/syncthing-data@auto-2023-12-21_00-00\njupiter/syncthing-data@auto-2023-12-22_00-00\njupiter/syncthing-data@auto-2023-12-23_00-00\njupiter/syncthing-data@auto-2023-12-24_00-00\njupiter/syncthing-data@auto-2023-12-25_00-00\njupiter/syncthing-data@auto-2023-12-26_00-00\njupiter/syncthing-data@auto-2023-12-27_00-03\njupiter/syncthing-data@auto-2023-12-28_00-03\njupiter/syncthing-data@auto-2023-12-29_00-03\njupiter/syncthing-data@auto-2023-12-30_00-03\njupiter/syncthing-data@auto-2023-12-31_00-03\njupiter/syncthing-data@auto-2024-01-01_00-03\njupiter/syncthing-data@auto-2024-01-02_00-03\njupiter/syncthing-data@auto-2024-01-03_00-03\njupiter/syncthing-data@auto-2024-01-04_00-03\njupiter/syncthing-data@auto-2024-01-05_00-03\njupiter/syncthing-data@auto-2024-01-06_00-03\njupiter/
syncthing-data@auto-2024-01-07_00-03\njupiter/syncthing-data@auto-2024-01-08_00-03\njupiter/syncthing-data@auto-2024-01-09_00-03\njupiter/syncthing-data@auto-2024-01-10_00-03\njupiter/syncthing-data@auto-2024-01-11_00-03\njupiter/syncthing-data@auto-2024-01-12_00-03\njupiter/syncthing-data@auto-2024-01-13_00-03\njupiter/syncthing-data@auto-2024-01-14_00-03\njupiter/syncthing-data@auto-2024-01-15_00-03\njupiter/syncthing-data@auto-2024-01-16_00-03\njupiter/syncthing-data@auto-2024-01-17_00-03\njupiter/syncthing-data@auto-2024-01-18_00-03\njupiter/syncthing-data@manual-2023-11-20_17-25_beforededup\n'

Also, I'm wondering if the DEBUG entries are any reason for concern, but I doubt it.

This was triggered automatically; I was asleep at the time ;)
Code:
[2024/01/20 03:27:10] INFO     [retention] [zettarepl.zettarepl] Retention destroying local snapshots: [('neptune/test-dataset', 'auto-2024-01-20_01-00')]
[2024/01/20 03:27:10] INFO     [retention] [zettarepl.snapshot.destroy] On <Shell(<LocalTransport()>)> for dataset 'neptune/test-dataset' destroying snapshots {'auto-2024-01-20_01-00'}
[2024/01/20 03:27:11] INFO     [Thread-1945] [zettarepl.paramiko.retention] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/01/20 03:27:11] INFO     [Thread-1945] [zettarepl.paramiko.retention] Authentication (publickey) successful!
[2024/01/20 03:27:12] INFO     [retention] [zettarepl.zettarepl] Retention on <SSH Transport(admin@192.168.178.143)> destroying snapshots: [('jupiter/test-dataset', 'auto-2024-01-20_01-00')]
[2024/01/20 03:27:12] INFO     [retention] [zettarepl.snapshot.destroy] On <Shell(<SSH Transport(admin@192.168.178.143)>)> for dataset 'jupiter/test-dataset' destroying snapshots {'auto-2024-01-20_01-00'}
[2024/01/20 03:27:13] INFO     [retention] [zettarepl.zettarepl] Retention on <LocalTransport()> destroying snapshots: []


I would assume none of that is reason for concern and that zettarepl just occasionally (not necessarily linked to the snapshot / replication schedules) performs some tasks.

3) I also discovered that there are many snapshots on the source that will not get deleted, which made me curious, because I was certain I had set a policy before. Turns out: when you delete the snapshot task, the retention policy automatically changes to Will not be destroyed automatically.
Before that it says Will be automatically destroyed at XXX by periodic snapshot task. Maybe I'm a stickler for wording, but by periodic snapshot task implied to me: there is a periodic snapshot task and that will destroy the snapshot. Apparently you need to read it as: Will be automatically destroyed at XXX by the specific snapshot task that created this snapshot.

I checked the documentation to see if I missed that, but I couldn't find any mention of it. I discovered an interesting bit though:

Snapshot Lifetime: Enter the length of time to retain the snapshot on this system using a numeric value and a single lowercase letter for units. Examples: 3h is three hours, 1m is one month, and 1y is one year. Does not accept minute values. After the time expires, the snapshot is removed. Snapshots replicated to other systems are not affected.
I would like to see a clarification here that this is not universally true: they will be affected when you set the retention policy to Same as Source. But one could argue that whoever sets this knows what they are doing anyway. What also surprised me is that even manually deleted snapshots, which did not inherit a retention time, get synchronized, i.e. deleted on the replication target.

Thank you for your time and hopefully you can enlighten me on some of the points I stumbled upon.

Have a nice weekend!
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
Any idea for the delay?
It's another "quirk" about zettarepl.

Deletions are tethered to the snapshot task's schedule itself. There is no independent "deletion process" that runs regularly to check for expired snapshots.


Zettarepl (iXsystems' software that runs under the hood of snapshot and replication tasks) is "name-based", and uses the names of the snapshots themselves to do its magic. (Whether creating, deleting, or replicating.) In fact, zettarepl cannot operate on non-parseable names, which is why every snapshot task must have a parseable "date" within its name.¹

If you suspend or delete a Snapshot Task, you effectively "decouple" it from the existing snapshots, which has implications not only for pruning, but also for "protection" (if you have a similar-named Snapshot Task with a different expiration policy.)


* [1] Zettarepl parses the date/time format within a snapshot's name, i.e., XXXX-YY-ZZ, to determine its creation time, and then compares it against the Snapshot Task's retention policy, and asks "Is this beyond the expiration?" Zettarepl does not inspect the snapshot's metadata. ZFS snapshots do, indeed, contain metadata, such as "creation time", which is more accurate than the "name" of the snapshot itself; however, iXsystems decided to ignore this metadata, since it's easier to script against snapshot names, and it's much faster to parse names than to read metadata for hundreds, if not thousands, of snapshots.
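
To illustrate the name-based idea, here is a minimal sketch (NOT zettarepl's actual code) that derives a snapshot's "age" purely from its name, assuming the auto-YYYY-mm-dd_HH-MM schema and GNU date as found on SCALE:

Code:
#!/bin/sh
# Minimal sketch of name-based expiry (NOT zettarepl's actual code).
# Assumes the auto-YYYY-mm-dd_HH-MM naming schema and GNU date (as on SCALE).
SNAP="auto-2024-01-20_04-00"
RETENTION_SECONDS=$((2 * 3600))   # e.g. a 2-hour snapshot lifetime

# Recover the creation time purely from the name, never from ZFS metadata.
STAMP=${SNAP#auto-}                                               # 2024-01-20_04-00
CREATED=$(date -d "$(echo "$STAMP" | sed 's/_/ /; s/-\([0-9][0-9]\)$/:\1/')" +%s)

if [ $(( $(date +%s) - CREATED )) -gt "$RETENTION_SECONDS" ]; then
    echo "$SNAP is past its lifetime -> eligible for destruction"
fi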
 
Last edited:
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
Maybe I'm a stickler for wording, but by periodic snapshot task implied to me: there is a periodic snapshot task and that will destroy the snapshot. Apparently you need to read it as: Will be automatically destroyed at XXX by the specific snapshot task that created this snapshot.

Unless something has changed in SCALE, no snapshot is tied to its "creation task". Any new Periodic Snapshot Task can usurp any existing snapshot into its fold, even ones that were created long ago. "It's all about the name!"


Let's say a long time ago you created a Periodic Snapshot Task, and used the naming schema:
  • auto-YYYY-mm-dd_HH-MM

It only takes snapshots once per week, but has an expiration of 5 years.

Here are examples of some snapshots created by this task:
  • auto-2024-01-01_00-00
  • auto-2024-01-08_00-00
  • auto-2024-01-15_00-00


You expect them to survive for 5 years. After all, this is the retention policy you specified in the Periodic Snapshot Task.


Now let's say a couple years later you decide: "I don't want to have weekly snapshots anymore. I prefer daily. But I don't want these daily snapshots to pile up. I'd rather have them expire after 1 month. But I do like that I have some old, archived snapshots that were already created years back, which still have some life remaining in their expiration. They have maybe a couple more years. Maybe I'll inspect them later on to see if I want to preserve any indefinitely."


So you disable/remove the weekly Periodic Snapshot Task, and create a new daily Periodic Snapshot Task, with this naming schema:
  • auto-YYYY-mm-dd_HH-MM

It takes snapshots once per day, and has an expiration of 1 month.

Here are examples of some snapshots created by this new task:
  • auto-2026-03-04_00-00
  • auto-2026-03-11_00-00
  • auto-2026-03-18_00-00

To your surprise, you find out that all your old, archival snapshots (which you assumed were safe for 5 years) have all been destroyed! :eek:

You know why? Because when zettarepl runs, on behalf of your new "daily" Periodic Snapshot Task, it parses all matching snapshot names and determines if they have lived beyond your 1-month expiration policy.


The date-stamp portion of each name is what zettarepl looks for and considers a "match" for this daily snapshot task:
  • auto-2024-01-01_00-00
  • auto-2024-01-08_00-00
  • auto-2024-01-15_00-00

Notice how those old snapshot names, which were created by your previous "weekly" task, contain a matching pattern for your new "daily" task? Zettarepl doesn't know or care that you want to keep them. All it knows is "These match the 'daily' task's date-stamp pattern. Are they older than 1 month? Yes. DESTROY!"
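
A quick way to see that overlap for yourself (illustrative only; tank/mydata is a placeholder, and the regex mirrors the auto-YYYY-mm-dd_HH-MM schema both tasks share):

Code:
# Every snapshot matching the shared auto-YYYY-mm-dd_HH-MM schema falls under
# the new "daily" task's pruning - the old weekly ones included.
# (tank/mydata is a placeholder dataset name.)
zfs list -t snapshot -H -o name -d 1 tank/mydata \
  | grep -E '@auto-[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}$'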
 
Last edited:

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
@winnielinnie thanks for your input!

I will set up a test snapshot task tonight and test your info.

From what I gathered it seems to check out: my older snapshots did not get removed because the time part of the name did not match. So if I set up a new task with a different retention policy, it should pick up the old snapshots when the timecode matches and auto-remove them.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
I will set up a test snapshot task tonight and test your info.
Keep in mind that a "schedule" or "rule" might not necessarily overlap with existing snapshot names.

For example, "daily" snapshots taken at 6pm every day will not overlap with "daily" snapshots taken at midnight. Zettarepl will only parse and prune snapshots that end with "18-00" for the former task, while skipping snapshots that end with "00-00".
 
Last edited:

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Keep in mind that a "schedule" or "rule" might not necessarily overlap with existing snapshot names.
That's what I wanted to test. I still think it's not intuitive how it works and how it's worded.

It worked though: I set up a new snapshot task matching the time pattern and it even updated the GUI from will not be destroyed automatically to will be destroyed at xxx by periodic snapshot task.

Thanks again, this will save me a lot of headache in the future when the first long-term snapshots are due for deletion and would otherwise not be deleted automatically ;)
 

CJRoss

Contributor
Joined
Aug 7, 2017
Messages
139
Another thing to be aware of is that if you have any Pull replication tasks and that host isn't available, no snapshots will be deleted, regardless of whether they were associated with a replication task or not. zettarepl seems to be all or nothing with regard to deletion and therefore leaves everything if it can't determine whether a pulled snapshot should be deleted.
 


chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Hello all,

I need to revive this thread as there are still some issues not entirely clear to me:

1) Snapshots not following the automatic naming are not destroyed on the destination system
2) Some snapshots still do not get deleted, even when they follow the naming scheme auto-YYYY-mm-dd_HH-MM
3) I do not find the Synchronizing Destination and Source Snapshots section in my GUI

Potentially important info beforehand: I messed up which system was which and started deleting snapshots on the destination system (iirc only manual snapshots), although I had set different UI color themes for the systems to avoid exactly that. This didn't break anything; they still have a common snapshot.

Regarding 1) I set up a test dataset and played around with it. Snapshots removed on the source system get deleted on the destination system eventually, even if the HH-MM does not match the current snapshot schedule on the source. This I confirmed earlier. Snapshots I manually created (manual1 and manual2) are not marked for deletion on the destination system and thus stay there forever. I assume this has something to do with the way zettarepl works, as mentioned by @winnielinnie here.
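
I suppose the only way to get rid of those is to remove them by hand on the destination, something along these lines (dataset and snapshot names as in my test further down):

Code:
# Manual snapshots never match the name-based pruning, so remove them by hand
# on the destination (names taken from my test below):
zfs destroy destination/test-dataset@manual1
zfs destroy destination/test-dataset@manual2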

Regarding 2) this is actually a headscratcher for me. From the source system:

Code:
neptune/halimede@auto-2023-09-21_21-00   408K      -   492G  -
neptune/halimede@auto-2023-09-29_21-00   296K      -   492G  -
neptune/halimede@auto-2023-09-30_21-00   320K      -   492G  -
neptune/halimede@auto-2023-10-01_21-00  41.4M      -   492G  -


on the destination

Code:
jupiter/halimede@auto-2023-05-14_21-00   302G      -   730G  -
jupiter/halimede@auto-2023-09-21_21-00   408K      -   492G  -
jupiter/halimede@auto-2023-09-29_21-00   296K      -   492G  -
jupiter/halimede@auto-2023-09-30_21-00   320K      -   492G  -
jupiter/halimede@auto-2023-10-01_21-00  41.4M      -   492G  -


auto-2023-05-14_21-00 does not get deleted on the destination. The current snapshot schedule is timed for 23:59, which means that I will have to manually delete the older snapshots on the source at some point. Referring to 1), this should also delete the snapshots on the destination.
In this case I manually deleted auto-2023-05-14_21-00 on the source weeks ago.

I know I can manually remove the snapshot on the destination. Are the snapshots on the destination somehow dependent on that snapshot? Can I check somehow?
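
(The only ZFS-level checks I'm aware of are for holds and clones; a sketch of what I could run on the destination, in case that's the right direction:)

Code:
# Run on the destination; holds and clones are what would block a destroy.
zfs holds jupiter/halimede@auto-2023-05-14_21-00           # any user holds?
zfs get -H clones jupiter/halimede@auto-2023-05-14_21-00   # any clones based on it?
zfs destroy -nv jupiter/halimede@auto-2023-05-14_21-00     # dry run, deletes nothing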

Regarding 3) The documentation mentions

Synchronizing Destination and Source Snapshots​

Synchronizing Destination Snapshots With Source destroys any snapshots in the destination that do not match the source snapshots. TrueNAS also does a full replication of the source snapshots as if the replication task never run, which can lead to excessive bandwidth consumption.

This can be a very destructive option. Make sure that any snapshots deleted from the destination are obsolete or otherwise backed up in a different location.
The documentation was last updated "Last Modified 2023-08-24 14:58 EDT", so it should be implemented in 23.10.1 (source system) and 23.10.2 (I recently upgraded the destination system). Can anyone see this option in the advanced replication UI? Or is this an obsolete / redundant explanation of the replication from scratch option? The "TrueNAS also does a full replication of the source snapshots as if the replication task never run, which can lead to excessive bandwidth consumption." part is leading me to believe the latter.

Thank you and have a great Sunday!
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
auto-2023-05-14_21-00 does not get deleted on the destination. The current snapshot schedule is timed for 23:59, which means that I will have to manually delete the older snapshots on the source at some point. Referring to 1), this should also delete the snapshots on the destination.
In this case I manually deleted auto-2023-05-14_21-00 on the source weeks ago.

I know I can manually remove the snapshot on the destination. Are the snapshots on the destination somehow dependent on that snapshot? Can I check somehow?
If your Periodic Snapshot Task's naming schema ends with "23-59", then any existing snapshots that end with anything other than "23-59" will be skipped when zettarepl decides what to prune. (Doesn't matter if it's on the source or destination of a Replication Task.)

Remember: The "Destination" can be any ZFS server. (Even non-TrueNAS servers.) All the logic and decisions are done by zettarepl that runs on the source. (Issued and executed by your authenticated SSH connection, which again, works even with a non-TrueNAS server on the other end.) There is no logic on the other side that dictates pruning and preservation of snapshots.

* It helps to imagine zettarepl as an actual human with admin access (via SSH) to a remote server that happens to host a ZFS pool. Pretend that you're telling Mr. Zettarepl something to the effect of: "Here's SSH access to a remote ZFS server. TrueNAS or not, it hosts a ZFS pool for my backups. Here are the rules I want you to follow when you decide what to send, prune, and preserve. Use your SSH connection + zfs commands to execute these actions remotely. Thank you."
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
If your Periodic Snapshot Task's naming schema ends with "23-59", then any existing snapshots that end with anything other than "23-59" will be skipped when zettarepl decides what to prune. (Doesn't matter if it's on the source or destination of a Replication Task.)
Thanks. Yes, this I understand and tested. I'm talking specifically about manually deleting snapshots on the source.

Here are the rules I want you to follow when you decide what to send, prune, and preserve. Use your SSH connection + zfs commands to execute these actions remotely. Thank you."
Pruning on the source should be picked up then, I think.

I tested it when we first started this topic:
I set up a test dataset with hourly snapshots and a two-hour retention time. I can confirm the snapshots got deleted on the remote machine as well. Additionally, the snapshots I manually deleted on the source were also destroyed, which is nice.
and I tested it again yesterday:

I set up snapshots every 5 minutes with a 1-week retention time:

Code:
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
source/test-dataset@auto-2024-03-03_19-20  54.3M      -  67.0M  -
source/test-dataset@auto-2024-03-03_19-25   112K      -  12.9M  -
source/test-dataset@auto-2024-03-03_19-30   144K      -  13.1M  -
source/test-dataset@auto-2024-03-03_19-35     0B      -  33.5M  -

NAME                                         USED  AVAIL  REFER  MOUNTPOINT
destination/test-dataset@auto-2024-03-03_19-20  54.2M      -  67.0M  -
destination/test-dataset@auto-2024-03-03_19-25    88K      -  12.9M  -
destination/test-dataset@auto-2024-03-03_19-30   120K      -  13.0M  -
destination/test-dataset@auto-2024-03-03_19-35     0B      -  33.5M  -


I changed the snapshot task from every 5 min to 43 min past the hour and manually deleted a snapshot on the source:

Code:
[2024/03/03 19:43:01] INFO     [replication_task__task_20] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: [('destination/test-dataset', 'auto-2024-03-03_19-20')]
[2024/03/03 19:43:01] INFO     [replication_task__task_20] [zettarepl.snapshot.destroy] On <Shell(<SSH Transport(admin@192.168.178.143)>)> for dataset 'destination/test-dataset' destroying snapshots {'auto-2024-03-03_19-20'}

Code:
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
source/test-dataset@auto-2024-03-03_19-25   144K      -  12.9M  -
source/test-dataset@auto-2024-03-03_19-30   144K      -  13.1M  -
source/test-dataset@auto-2024-03-03_19-35     0B      -  33.5M  -

NAME                                         USED  AVAIL  REFER  MOUNTPOINT
destination/test-dataset@auto-2024-03-03_19-25   120K      -  12.9M  -
destination/test-dataset@auto-2024-03-03_19-30   120K      -  13.0M  -
destination/test-dataset@auto-2024-03-03_19-35     0B      -  33.5M  -

which also propagated to the destination. I created manual snapshots with a different naming convention:
Code:
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
source/test-dataset@auto-2024-03-03_19-25   144K      -  12.9M  -
source/test-dataset@auto-2024-03-03_19-30   144K      -  13.1M  -
source/test-dataset@auto-2024-03-03_19-35   112K      -  33.5M  -
source/test-dataset@manual1                 112K      -   954M  -
source/test-dataset@manual2                   0B      -   954M  -

NAME                                         USED  AVAIL  REFER  MOUNTPOINT
destination/test-dataset@auto-2024-03-03_19-25   120K      -  12.9M  -
destination/test-dataset@auto-2024-03-03_19-30   120K      -  13.0M  -
destination/test-dataset@auto-2024-03-03_19-35    88K      -  33.5M  -
destination/test-dataset@manual1                  88K      -   954M  -
destination/test-dataset@manual2                   0B      -   954M  -

Code:
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
source/test-dataset@auto-2024-03-03_19-25   144K      -  12.9M  -
source/test-dataset@auto-2024-03-03_19-30   144K      -  13.1M  -
source/test-dataset@auto-2024-03-03_19-35   112K      -  33.5M  -
source/test-dataset@manual2                 112K      -   954M  -
source/test-dataset@auto-2024-03-03_21-22     0B      -  1.02G  -

NAME                                         USED  AVAIL  REFER  MOUNTPOINT
destination/test-dataset@auto-2024-03-03_19-25   120K      -  12.9M  -
destination/test-dataset@auto-2024-03-03_19-30   120K      -  13.0M  -
destination/test-dataset@auto-2024-03-03_19-35    88K      -  33.5M  -
destination/test-dataset@manual1                  88K      -   954M  -
destination/test-dataset@manual2                  88K      -   954M  -
destination/test-dataset@auto-2024-03-03_21-22     0B      -  1.02G  -

and they did not get deleted.

It worked multiple times the way I thought it would, hence my confusion.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
There's a lot going on under-the-hood when it comes to TrueNAS, its middleware, and zettarepl specifically. Unlike basic ZFS send/recv commands (which is what I do, since I'm not a fan of how the GUI operates for simple "mirrored" backups), you should assume some unexpected behavior.

So with the "recv" side, there's the -F flag, which technically will do what you desire. It will destroy and prune on the destination based on what exists on the source, when combined with the -R flag on the "send" side. (In a sense, akin to what rsync does with the --delete flag: a mirrored backup that makes the destination match the source.)

Zettarepl doesn't operate like that. In fact, I believe it issues multiple send/recv operations based on the rules of its task in the GUI. (Think of a baton being handed off in a track field. Each "runner" is the next chronological snapshot in the accumulated list.)

Strictly using ZFS commands (bypassing the GUI/zettarepl), you can use a more "all at once" method to make a mirrored backup on the destination. Something like this:
Code:
zfs send -v -w -R -I @backup-2024-01-01 source/mydata@backup-2024-02-01 | zfs recv -v -F destination/mydata


This will destroy (on the destination) anything that doesn't exist on the source side. The dataset "mydata" on the destination will essentially be a pure replica of "mydata" on the source: properties, snapshots, clones, and children. However, it's hard to automate, and you lose out on all the bells and whistles that are afforded by the GUI.
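
If you ever try this, a cautious first step is a dry run on the receive side: -n parses the stream without applying anything, and -v prints what would be received (as far as I know it does not enumerate what -F would destroy):

Code:
# Dry run: -n means nothing is written on the destination, -v prints what
# would be received. (It does not preview what -F would destroy.)
zfs send -R -I @backup-2024-01-01 source/mydata@backup-2024-02-01 \
  | zfs recv -n -v -F destination/mydata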
 
Last edited:
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
I changed the snapshot task from every 5 min to 43 min past the hour and manually deleted a snapshot on the source:
This could be a "fix" on SCALE and/or zettarepl in general. Perhaps the original Snapshot Task now retains information about previously created snapshots under its former umbrella? I'm not sure. You'd have to ask an iX developer or check out the source code.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
This could be a "fix" on SCALE and/or zettarepl in general.
I deliberately changed the schedule to separate all existing snapshots from the prune schedule of the task (although the retention time was 1 week). As per your explanation snapshots will be recognized by their time format and I wanted to make sure the snapshots are "orphaned" from the task.

Perhaps the original Snapshot Task now retains information about previously created snapshots under its former umbrella? I'm not sure.
As far as I checked it doesn't; the GUI, however, will pick up snapshots that match the time part of the naming.

There's a lot going on under-the-hood when it comes to TrueNAS, its middleware, and zettarepl specifically. Unlike basic ZFS send/recv commands (which is what I do, since I'm not a fan of how the GUI operates for simple "mirrored" backups), you should assume some unexpected behavior.
I'll probably leave it at that explanation :) Thanks for taking the time, I learned a lot!

However, it's hard to automate, and you lose out on all the bells and whistles that are afforded by the GUI.
That's really interesting. If I understand correctly, I need to name the boundary snapshots explicitly, e.g. @backup-2024-01-01 and source/mydata@backup-2024-02-01. I may start to experiment with that information a little. A list of suitable snapshots could be generated with zfs list I guess.
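
Something like the command zettarepl itself runs in the debug log earlier in this thread should do:

Code:
# Same form as the zfs list calls in the zettarepl debug log above:
# one snapshot name per line, sorted by name (chronological for auto-* names).
zfs list -t snapshot -H -o name -s name -d 1 source/mydata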
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
A list of suitable snapshots could be generated with zfs list I guess.
The problem is you'll be "reinventing the wheel", which the GUI tries to offer you in its toolset.

The reason I backup "my way" is because I'm not a fan of the inflexible manner in which the GUI operates for ZFS replications. It's highly tied into your snapshot tasks and snapshot naming schema, and it operates with a "passing the baton" approach.

I like the simpler "just make the destination the same as the source" approach, which is accomplished by using a combination of -R -I (sender) and -F (receiver). This behaves analogous to using rsync to create one-way mirrored backups. But as I alluded to earlier, you have to manually manage and/or automate this yourself outside of the GUI. (Not recommended.)

Would be nice if they had an additional "tool" in the GUI called "Manual Simple Backup", which would be untethered from everything else and operate in the same vein as Syncoid. (AKA: a GUI wrapper for Syncoid.) Perhaps with a disclaimer that any dataset using the "Manual Simple Backup" tool cannot have a Replication Task assigned to it.
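
For reference, a typical Syncoid invocation for that kind of one-way mirrored backup looks roughly like this (hypothetical host and dataset names; double-check the flags against your installed Syncoid version):

Code:
# Rough Syncoid equivalent of a simple one-way backup (hypothetical names;
# --no-sync-snap replicates only the snapshots that already exist).
syncoid --no-sync-snap source/mydata admin@backup-host:destination/mydata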
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
The problem is you'll be "reinventing the wheel", which the GUI tries to offer you in its toolset.
Either I'm wildly off track here or the GUI does not offer what I'm after. Apart from replication from scratch, there's no option to keep the destination in sync with the source once you throw in manual snapshots, and propagation of manually deleted snapshots seems to be hit and miss.

Once you have everything set up and don't change a thing, it's nice though.
But as I alluded to earlier, you have to manually manage and/or automate this yourself outside of the GUI. (Not recommended.)
Agreed, in that case I'd rather prune manually from time to time. Ideally, though, I'd have everything automated.
 