How are Periodic Snapshots marked for deletion?

Joined
Oct 22, 2019
Messages
3,641
I could never find a definitive answer to this particular question. What exactly runs behind the scenes to mark and delete old snapshots?

Is there a script that searches for old snapshots based on their name? Or is it based on their creation timestamp?

Is it more sophisticated where it stores a table of snapshots and their expiration date, regardless of the snapshot's name and regardless of the Periodic Snapshot Task name? (In other words, renaming a snapshot will not save it from certain doom once its expiration arrives, since it is "tagged" regardless of its name?)

What happens if you change the "Naming Schema" for a Periodic Snapshot Task that already exists? Will expired snapshots eventually "slip through the cracks" and live for an eternity? :wink:

periodic-snapshot-deletion-method.png

Attached is a screenshot to illustrate my confusion.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
In my experience, yes, they will live on forever. I think the retention data is in the task, not the snapshots themselves. Not ideal, and I don't think it worked that way in 11.2 and prior. I hope I'm wrong though and have just been missing something.
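If anyone wants to see what the task itself stores, something like this should dump it from the middleware (this is the midclt client on 12.0; I'd expect the naming schema and lifetime fields to show up there rather than on the snapshots -- treat the exact field names as my guess):
Code:
# Dump the periodic snapshot task definitions from the middleware.
# I'd expect fields like naming_schema, lifetime_value and lifetime_unit here,
# i.e. the retention lives with the task, not the snapshot (field names are a guess).
midclt call pool.snapshottask.query | python3 -m json.tool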
 
Joined
Oct 22, 2019
Messages
3,641
You mean if you rename a snapshot, the task that is supposed to remove expired snapshots will skip it (since it looks for a matching name)?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
You mean if you rename a snapshot, the task that is supposed to remove expired snapshots will skip it (since it looks for a matching name)?
Yes, I believe so. I think there are some things you can change on the task where it will continue to remove expired snapshots, and some things that break it. I'm not sure exactly what though. I just know that I've had orphaned snapshots hang around after modifying a task.
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
I did exactly that (changed the naming scheme) and FreeNAS stopped deleting out-of-date snapshots with the old name.
 
Joined
Oct 22, 2019
Messages
3,641
I just know that I've had orphaned snapshots hang around after modifying a task.
I did exactly that (changed the naming scheme) and FreeNAS stopped deleting out-of-date snapshots with the old name.

I wonder if this is by design, or just a "handy accident" that works to protect snapshots?

That makes me curious whether there's a risk that a similarly named snapshot task might inadvertently have its snapshots deleted, even though they have a longer expiration date. After all, the default naming schema is the same for every newly created snapshot task.
 

rudds

Dabbler
Joined
Apr 17, 2018
Messages
34
Chiming in here to say the logic for this stuff is confusing, or buggy, or both. I'm having a similar problem: my pool snapshots are set to a 3-day retention in this "periodic snapshot task":

Screenshot 2021-03-21 093453.png


Those snapshots are synced daily to an external pool overnight with a "replication task", and I set a 6-week retention on the replicated snapshots:

Screenshot 2021-03-21 093531.png


However, I just noticed this morning that the snapshots on my external pool are also getting purged after three days, which was a bit of a rude surprise. (Thankfully, I haven't needed to retrieve anything from older backups recently.)

I had this setup working fine under 11.3, so I'm not sure at what point the logic changed or a bug was introduced, but something does seem wrong here.
 
Joined
Oct 22, 2019
Messages
3,641
I had this setup working fine under 11.3, so I'm not sure at what point the logic changed or a bug was introduced, but something does seem wrong here.
The mystery continues. Do these replicated snapshots (on the external pool) happen to have "-3d" appended to their names? Maybe it's the "-3d" which signals their destruction, even though you set an override lifetime of 6 weeks?


I'm still trying to figure this one out, and I'm not savvy enough to comb through the source code to answer my own question. I don't even know if this is a "bug", if we're using snapshots incorrectly, or if we're missing a key detail not covered in the documentation.

I guess it's safe to say there's nothing in the snapshot's metadata itself that signals to the system "it's safe to delete me now!"
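If anyone wants to double-check that on their own system, a quick filter over a snapshot's properties should come up with nothing; the pool, dataset, and snapshot names below are just examples:
Code:
# Look for anything resembling a retention or expiry property on a snapshot.
# Pool, dataset and snapshot names are examples -- substitute your own.
zfs get all tank/mydata@auto-2021-03-22_13-00 | grep -iE 'expire|lifetime|retention'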

So that leaves an under-the-hood task (maybe run on a schedule, like a built-in cron job that cannot be reviewed or modified). Maybe it does in fact look at the name of the snapshot (or its creation date?) and compare it against the appended portion, such as "-6m" or "-3d". That would fit what was noted above, where changing a Snapshot Task's naming schema left the old snapshots untouched.

Maybe it has nothing to do with any part of the snapshot's name, and there's a separate database stored somewhere that links snapshots to their expiration dates, and the cron job destroys any existing snapshots on that "list"? Perhaps renaming a snapshot inadvertently "saves it from certain destruction" because the task/database/list references snapshot names rather than some sort of unique index ID?

At this point I'm just taking shots in the dark. It's not something I can feasibly test to know for certain. Does anyone else have theories that could answer this mystery?
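The closest thing to a test I can think of would be something like this on a throwaway dataset: rename one automatic snapshot so it no longer matches the naming schema, then check after its lifetime has passed whether it outlived its siblings. All the names below are made up.
Code:
# Rename one automatic snapshot so it no longer matches the naming schema
# (dataset and snapshot names are placeholders).
zfs rename tank/test@auto-2021-03-22_13-00 tank/test@keepme-2021-03-22_13-00
# Later, after the task's lifetime has passed, see which snapshots survived.
zfs list -t snapshot -o name,creation tank/test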


Once you create a Snapshot Task, you sort of just "set it and forget it". It's not really clear what takes place behind the scenes when it comes to retention and expiration, as seen above with @rudds' issue of overriding the lifetime to 6 weeks, yet having the snapshots destroyed after 3 days.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I'm not 100% sure which method TrueNAS is employing, but I have seen reference to autosnap before (https://github.com/ansemjo/autosnap), so maybe there's some code based on that involved.

If that is the case, names are most certainly important in designating the retention period. (There's also potentially a limit on the number of snapshots, which might be defined as a property of the dataset: snapshot_limit, checked against snapshot_count, I guess. I notice that the original script uses some named properties that I don't see on my replicated datasets, so there's clearly some variation.)
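For what it's worth, those two properties can be checked directly (the dataset name below is just an example):
Code:
# Check whether a snapshot limit is configured on a dataset (name is an example).
zfs get snapshot_limit,snapshot_count pool/backups/dataset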

You could probably consider a complementary script that uses the creation time of each snapshot to hold and release snapshots on the backup system, depending on when you want them to become eligible for removal.
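A rough sketch of what I mean, untested and with placeholder names: hold everything newer than a cutoff and release anything older, so the replication's own retention can only remove snapshots you've already let go of.
Code:
#!/bin/sh
# Sketch only: hold backup snapshots younger than KEEP_DAYS so nothing can
# destroy them, and release that hold once they age past the cutoff.
# DATASET and the hold tag are placeholders -- adjust to your system.
DATASET="pool/backup/dataset"
KEEP_DAYS=42
CUTOFF=$(( $(date +%s) - KEEP_DAYS * 86400 ))

# -H: no header, -p: creation time as a unix timestamp
zfs list -H -p -t snapshot -o name,creation "$DATASET" | while read -r snap created; do
    if [ "$created" -ge "$CUTOFF" ]; then
        zfs hold manual_keep "$snap" 2>/dev/null || true    # already held is fine
    else
        zfs release manual_keep "$snap" 2>/dev/null || true # no such hold is fine
    fi
done

You'd schedule something like that on the backup box (cron or a Cron Job task) so the holds roll forward over time.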

I also note that the Advanced Replication creation settings allow the retention of the target to be set differently from the source, so I guess that's something to look at and confirm whether it works (then raise a bug report if not).
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I don't have an explanation of the fundamentals, but I can probably add to the confusion :wink:

I have a snapshot task for my VMs that does hourly snapshots with a retention period of two weeks:
Bildschirmfoto 2021-03-22 um 13.44.19.png

The resulting snapshots look like this:
Code:
root@freenas[~]# zfs list -t snap -r ssd/vms/windows-pmh-disk0
NAME                                              USED  AVAIL     REFER  MOUNTPOINT
ssd/vms/windows-pmh-disk0@auto-2021-03-08_13-00  9.07M      -     30.4G  -
ssd/vms/windows-pmh-disk0@auto-2021-03-08_14-00  6.01M      -     30.4G  -
ssd/vms/windows-pmh-disk0@auto-2021-03-08_15-00  6.66M      -     30.4G  -
[...]
ssd/vms/windows-pmh-disk0@auto-2021-03-22_11-00  15.6M      -     30.8G  -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_12-00  9.57M      -     30.8G  -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  9.81M      -     30.8G  -


Then I have a replication task that replicates one snapshot per day and keeps those for 4 weeks on another system:
Bildschirmfoto 2021-03-22 um 13.46.22.png

The snapshots on the target system look like this:
Code:
root@freenas2[~]# zfs list -t snap -r fusion/backup/vms/windows-pmh-disk0
NAME                                                        USED  AVAIL     REFER  MOUNTPOINT
fusion/backup/vms/windows-pmh-disk0@auto-2021-02-23_00-00   644M      -     44.1G  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-02-24_00-00   549M      -     44.1G  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-02-25_00-00   572M      -     44.1G  -
[...]
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-20_00-00   572M      -     44.6G  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-21_00-00   515M      -     44.6G  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00     0B      -     44.6G  -


So, don't ask me why exactly, but at least for me the system is working perfectly as I intend it to. Perhaps my config screenshots help in coming to a conclusion.

Kind regards,
Patrick
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Source system:
Code:
root@freenas[~]# zfs get all ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00
NAME                                             PROPERTY                VALUE                   SOURCE
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  type                    snapshot                -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  creation                Mon Mar 22 13:00 2021   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  used                    9.85M                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  referenced              30.8G                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  compressratio           1.00x                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  volsize                 80G                     local
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  createtxg               10889070                -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  guid                    3500501945768056423     -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  primarycache            all                     default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  secondarycache          all                     default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  defer_destroy           off                     -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  userrefs                0                       -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  objsetid                86133                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  mlslabel                none                    default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  refcompressratio        1.00x                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  written                 19.5M                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  clones                                          -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  logicalreferenced       30.6G                   -
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  context                 none                    default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  fscontext               none                    default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  defcontext              none                    default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  rootcontext             none                    default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  encryption              off                     default
ssd/vms/windows-pmh-disk0@auto-2021-03-22_13-00  org.freebsd.ioc:active  yes                     inherited from ssd


Destination system:
Code:
root@freenas2[~]# zfs get all fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00
NAME                                                       PROPERTY               VALUE                  SOURCE
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  type                   snapshot               -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  creation               Mon Mar 22  0:00 2021  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  used                   0B                     -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  referenced             44.6G                  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  compressratio          1.00x                  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  volsize                80G                    local
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  createtxg              1179316                -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  guid                   4868441139690642052    -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  primarycache           all                    default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  secondarycache         all                    default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  defer_destroy          off                    -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  userrefs               0                      -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  objsetid               53833                  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  mlslabel               none                   default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  refcompressratio       1.00x                  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  written                994M                   -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  clones                                        -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  logicalreferenced      30.6G                  -
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  context                none                   default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  fscontext              none                   default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  defcontext             none                   default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  rootcontext            none                   default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  encryption             off                    default
fusion/backup/vms/windows-pmh-disk0@auto-2021-03-22_00-00  org.truenas:managedby  217.29.46.82           inherited from fusion/backu
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So that's a clear "NO" to any information about snapshot retention being stored in the snapshot on either end.

I note from the API reference that nothing other than the creation and manipulation of replication jobs is available, so there's no option to run cleanup separately.

It seems to me that the replication task itself is doing the cleanup based on the rules defined in the task: it identifies snapshots to work on by their creation time and the naming schema defined in the task properties, and the task's settings control how each side is treated.
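In other words, the only inputs it would need are things you can see for yourself anyway (dataset name below is from Patrick's example):
Code:
# Everything a name/creation-based cleanup has to go on: the snapshot names
# and their creation times, sorted oldest first.
zfs list -t snapshot -o name,creation -s creation fusion/backup/vms/windows-pmh-disk0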
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
As far as I can deduce from some open issues in iXsystems' JIRA, they use zettarepl.

To quote:
zettarepl is a cross-platform ZFS replication solution. It provides:
  • Snapshot-based PUSH and PULL replication over SSH or high-speed unencrypted connection
  • Extensible snapshot creation and replication schedule, replication of manually created snapshots
  • Consistent recursive snapshots with possibility to exclude certain datasets
  • All modern ZFS features support including resumable replication
  • Flexible snapshot retention on both local and remote sides
  • Comprehensive logging that helps you to understand what is going on and why
  • Configuration via simple and clear YAML file
  • Full integration with FreeNAS, the World’s #1 data storage solution
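If anyone wants to confirm that on a live box, something like this should do it. The import and the log path are assumptions on my part (it's what I'd expect on 12.0), so adjust for your version.
Code:
# Confirm the zettarepl package is actually installed.
python3 -c "import zettarepl; print(zettarepl.__file__)"
# Retention/destroy decisions should show up in the zettarepl log
# (log path is an assumption -- check /var/log if yours differs).
grep -iE 'retention|destroy' /var/log/zettarepl.log | tail -20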
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
As far as I can deduce from some open issues in iXsystems' JIRA, they use zettarepl.
I agree, it seems identical to the options available in the GUI.

And indeed it has a built-in set of routines for managing the retention/cleaning of snapshots on both sides as part of the replication system.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
And indeed it has a built-in set of routines for managing the retention/cleaning of snapshots on both sides as part of the replication system.
Which is the only way that makes sense, don't you think? Picture a separate purging task based on the snapshot creation time and some retention configuration. If the replication stops for some reason and that goes undetected, the task will happily expire old snapshots until either all of them are gone or the problem is noticed and fixed.

Only deleting old data once the new data has been committed is a sane, conservative approach, IMHO.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Which is the only way that makes sense, don't you think? Picture a separate purging task based on the snapshot creation time and some retention configuration. If the replication stops for some reason and that goes undetected, the task will happily expire old snapshots until either all of them are gone or the problem is noticed and fixed.

Only deleting old data once the new data has been committed is a sane, conservative approach, IMHO.
I totally agree, but there is this:
  • hold_pending_snapshots will prevent source snapshots from being deleted by retention if replication fails for some reason
Or maybe it's more correct to say, "there is also this".
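If that option does what the name suggests and places ordinary ZFS holds on the not-yet-replicated snapshots, they should be visible like this (snapshot name is an example):
Code:
# List any user holds on a snapshot; a held snapshot cannot be destroyed
# until every hold is released. Snapshot name is an example.
zfs holds tank/mydata@auto-2021-03-22_13-00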
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
That's the other side, yes. I was primarily thinking of some purging task on the destination.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I was primarily thinking of some purging task on the destination.
Which would be stopped if the snapshots still pending are marked as held by the replication task.

Maybe there's some combination of logic with all of that which would be helpful, but there are so many moving parts.

I see you've confirmed that it behaves as expected if set correctly according to your wishes, so maybe we just leave it at that.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Which would be stopped if the snapshots still pending are marked as held by the replication task.
The point is: there are no separate tasks. The purging is done once the replication has completed. Which is good.
 