Staged snapshot schedule

Forza

Explorer
Joined
Apr 28, 2021
Messages
81
Hi!

How would I best go about creating a schedule that creates snapshots of a dataset like this:

48 hourly
31 daily
48 monthly
12 yearly

I think that I should create a separate schedule for each "frequency", but is there a better way?
 
Joined
Oct 22, 2019
Messages
3,641
I think that I should create a separate schedule for each "frequency", but is there a better way?
Same boat as you. Unfortunately, there is no such built-in feature, let alone a means to schedule regular "pruning" of snapshots based on a "smart schedule".


You have to create separate snapshot tasks, with different expiration dates, using slightly different naming schemas, all for a single dataset. :confused:

Make sure to use a distinct prefix in the name of each snapshot task so that zettarepl doesn't get confused about which ones to mark for deletion. Such names can look like this:
  • auto-hourly-%Y-%m-%d_%H-%M <--- set to expire in 2 DAYS
  • auto-daily-%Y-%m-%d_%H-%M <--- set to expire in 1 MONTH
  • auto-monthly-%Y-%m-%d_%H-%M <--- set to expire in 4 YEARS
  • auto-yearly-%Y-%m-%d_%H-%M <--- set to expire in 12 YEARS

The problem with TrueNAS forcing users to set it up like this is that, since each task is separate from the others, you have to "time" them so they start creating snapshots at around the same time. Otherwise, your less frequent snapshots (e.g., monthly and yearly) might not start until much later, after your more frequent snapshots have already been created and started expiring. :oops:

For me, this is still not ideal. I think we should be able to create a single task that will take regular snapshots, and prune them based on a "smart schedule". In other words, it takes snapshots hourly, and simply allows fewer and fewer of them to "survive" as time goes on, such that you end up with only a handful that survive each calendar year; yet your more frequent ones still exist for the last couple of days or so, and a good amount for the last few weeks or so, etc.
 

Forza

Explorer
Joined
Apr 28, 2021
Messages
81
Thanks for your suggestions.

I did something like this on one dataset:
  • Hourly for 8 days
  • Every 4th hour for 32 days
  • Daily for 370 days
  • Monthly for 16 years

I'm thinking each snapshot schedule overlaps, which should provide a consistent state like the one you describe below:

I think we should be able to create a single task that will take regular snapshots, and prune them based on a "smart schedule". In other words, it takes snapshots hourly, and simply allows fewer and fewer of them to "survive" as time goes on, such that you end up with only a handful that survive each calendar year

I also really prefer this kind of automation. I have it on my other backup servers where I am not running TrueNAS, and it is really very useful.

Another question. Is it TrueNAS itself that clears expired snapshots, or does ZFS do this? For example, if I decide next year to keep my monthly snapshots for 20 years, can I simply change the schedule to reflect this?
 
Joined
Oct 22, 2019
Messages
3,641
Is it TrueNAS itself that clears expired snapshots, or does ZFS do this? For example, if I decide next year to keep my monthly snapshots for 20 years, can I simply change the schedule to reflect this?
From what I understand, it's done through zettarepl (software written in Python by the TrueNAS developers). It parses the names of the snapshots within a dataset, and compares the timestamp string (e.g., %Y-%m-%d_%H-%M) against the expiration lifespan (set in the Periodic Snapshot Task) to determine whether a snapshot should be deleted.
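Conceptually, the check works something like this (just an illustration of the idea as I understand it, not zettarepl's actual code; the naming schema and lifetime are hypothetical values):
Code:
# Illustration only: expire a snapshot when its name matches the task's naming
# schema and the embedded timestamp is older than the task's lifetime.
from datetime import datetime, timedelta

NAMING_SCHEMA = "auto-hourly-%Y-%m-%d_%H-%M"   # as set in the Periodic Snapshot Task
LIFETIME = timedelta(days=2)                    # "set to expire in 2 DAYS"

def is_expired(snapshot_name: str, now: datetime) -> bool:
    try:
        created = datetime.strptime(snapshot_name, NAMING_SCHEMA)
    except ValueError:
        return False   # name doesn't match this task's pattern -> leave it alone
    return now - created > LIFETIME

print(is_expired("auto-hourly-2021-07-01_13-00", datetime(2021, 7, 10)))    # True
print(is_expired("saved-monthly-2021-07-01_00-00", datetime(2021, 7, 10)))  # False (pattern mismatch)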

Based on the above, there are a few ways to outright protect or extend the lifespan of snapshots (example commands are sketched after this list).
  • Rename the snapshot(s) so that it does not match the pattern specified in the task (e.g., rename auto-monthly-2021-07-01_00-00 to saved-monthly-2021-07-01_00-00)
  • Use the zfs hold feature to protect specific snapshots (see here for a feature request to include it in the GUI)
  • Change the expiration lifespan in the Periodic Snapshot Task for the dataset(s) <--- haven't tested this myself recently
  • Use a combination of the above
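A rough sketch of the first two options, driving the standard zfs CLI from a script (the dataset and snapshot names are just examples; needs the appropriate privileges):
Code:
# Example only: protect a snapshot from name-based expiry either by renaming it
# outside the task's pattern, or by placing a ZFS hold on it.
import subprocess

def rename_out_of_pattern(dataset: str, old: str, new: str) -> None:
    # e.g. auto-monthly-2021-07-01_00-00 -> saved-monthly-2021-07-01_00-00,
    # so the task's naming schema no longer matches it
    subprocess.run(["zfs", "rename", f"{dataset}@{old}", f"{dataset}@{new}"], check=True)

def hold(dataset: str, snapshot: str, tag: str = "keep") -> None:
    # a held snapshot cannot be destroyed until the hold is released
    # (undo later with: zfs release keep <dataset>@<snapshot>)
    subprocess.run(["zfs", "hold", tag, f"{dataset}@{snapshot}"], check=True)

# Pick one approach per snapshot, e.g.:
# rename_out_of_pattern("tank/playground", "auto-monthly-2021-07-01_00-00",
#                       "saved-monthly-2021-07-01_00-00")
# hold("tank/playground", "auto-monthly-2021-07-01_00-00")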
 

Forza

Explorer
Joined
Apr 28, 2021
Messages
81
An interesting twist: what happens if you rename the scheduled task? For example, I changed from auto-%Y-%m-%d_%H-%M to Hourly for 8 days-%Y%m%d_%H%M.
 
Joined
Oct 22, 2019
Messages
3,641
An interesting twist: what happens if you rename the scheduled task? For example, I changed from auto-%Y-%m-%d_%H-%M to Hourly for 8 days-%Y%m%d_%H%M.
How are you putting spaces in the snapshot name? I thought that could risk breaking things?

EDIT: Setting aside the question of spaces in the name: theoretically, if you rename the naming schema of the task, the previously created snapshots should be spared and never expire (as zettarepl is looking for a different pattern when determining what to destroy). This was confirmed by other users in the past.
 
Last edited:

Forza

Explorer
Joined
Apr 28, 2021
Messages
81
How are you putting spaces in the snapshot name? I thought that could risk breaking things?
I typed them into the GUI. It did not warn about spaces, though it warned about special characters like slashes, etc.
 
Joined
Oct 22, 2019
Messages
3,641
I would avoid spaces and use underscores in their place instead. While technically there is no hard restriction on their use in ZFS object names, I can't help but think it will break something or trip an underlying bug, especially when it comes to scripts and automation.
 

gary_1

Explorer
Joined
Sep 26, 2017
Messages
78
For me, this is still not ideal. I think we should be able to create a single task that will take regular snapshots, and prune them based on a "smart schedule". In other words, it takes snapshots hourly, and simply allows fewer and fewer of them to "survive" as time goes on, such that you end up with only a handful that survive each calendar year; yet your more frequent ones still exist for the last couple of days or so, and a good amount for the last few weeks or so, etc.

Borgbackup has a setup along those lines that's pretty flexible. https://borgbackup.readthedocs.io/en/stable/usage/prune.html

It'd be nice if there was a way to just have one snapshot task (recursive or not), set it to happen at your minimum desired interval, for example hourly. Then use a set of keep-daily, keep-weekly, keep-monthly, keep-yearly values to determine how many are kept overall.

The logic to handle that has some tricky edge cases, but it's handled well in other apps like BorgBackup.
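Roughly this kind of logic, as a sketch (not Borg's or zettarepl's actual code; the keep counts are just the numbers from the first post):
Code:
# Sketch of bucket-based retention: keep the newest snapshot of each hour/day/
# month/year, up to a limit per period, and expire everything else.
from datetime import datetime

def prune(snapshots, keep_hourly=48, keep_daily=31, keep_monthly=48, keep_yearly=12):
    rules = [
        ("%Y-%m-%d %H", keep_hourly),   # newest snapshot of each hour
        ("%Y-%m-%d", keep_daily),       # newest snapshot of each day
        ("%Y-%m", keep_monthly),        # newest snapshot of each month
        ("%Y", keep_yearly),            # newest snapshot of each year
    ]
    keep = set()
    for fmt, limit in rules:
        buckets_seen = set()
        for ts in sorted(snapshots, reverse=True):   # walk newest to oldest
            if len(buckets_seen) >= limit:
                break
            bucket = ts.strftime(fmt)
            if bucket not in buckets_seen:
                buckets_seen.add(bucket)
                keep.add(ts)
    expired = sorted(ts for ts in snapshots if ts not in keep)
    return sorted(keep), expired

# e.g. a week of hourly snapshots:
snaps = [datetime(2021, 7, day, hour) for day in range(1, 8) for hour in range(24)]
kept, expired = prune(snaps)
print(len(kept), len(expired))   # recent hours all survive, older ones thin out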

As an aside, what issues did you encounter with using the same name for snapshots with different frequencies? I hadn't considered that when I created my two snapshot tasks. One runs daily and keeps for 2 weeks, and one runs every Sunday and keeps for 1 year. Both run at 4am. So far it seems to work, unless it's just a fluke (creation order?) that the keep-for-1-year task runs after the keep-for-2-weeks task, so the lifetime of the Sunday snapshot always ends up as 1 year?
 
Last edited:

gary_1

Explorer
Joined
Sep 26, 2017
Messages
78
Quick follow-up: in another of your threads that discussed snapshot naming, there was a link to the zettarepl repo, and within that there's the following info:

zettarepl periodic snapshot tasks retention is smart enough to behave correctly in variety of situations. E.g., if task a creates snapshots every hour and stores them for one day and task b creates snapshots every two hours and stores them for two days and they share naming schema, you'll get correct behavior: at the beginning of the new day, you'll have 24 snapshots for previous day and 12 snapshots for the day before it; retention for task a won't delete snapshots that are still retained for task b.

So it sounds like multiple snapshot tasks with the same name but different frequencies should work fine. I didn't look much further to figure out the details of how/why though.
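If I had to guess, the retention check is something along these lines (purely an illustration, not zettarepl's actual code): a snapshot only gets destroyed if every task whose pattern matches it considers it expired.
Code:
# Illustration: with a shared naming schema, a snapshot survives as long as at
# least one matching task still retains it.
from datetime import datetime, timedelta

TASKS = [
    ("auto-%Y-%m-%d_%H-%M", timedelta(days=1)),   # task a: hourly, keep 1 day
    ("auto-%Y-%m-%d_%H-%M", timedelta(days=2)),   # task b: 2-hourly, keep 2 days
]

def should_destroy(name: str, now: datetime) -> bool:
    verdicts = []
    for schema, lifetime in TASKS:
        try:
            created = datetime.strptime(name, schema)
        except ValueError:
            continue                    # pattern doesn't match: this task has no say
        verdicts.append(now - created > lifetime)
    return bool(verdicts) and all(verdicts)

now = datetime(2021, 7, 3, 12, 0)
print(should_destroy("auto-2021-07-02_06-00", now))  # False: expired for a, still kept by b
print(should_destroy("auto-2021-06-30_06-00", now))  # True: expired for both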
 
Joined
Oct 22, 2019
Messages
3,641
So it sounds like multiple snapshot tasks with the same name but different frequencies should work fine.
In my opinion, they do not work fine. In fact, if you start from the principle of "snapshots are sacred and should be preserved with care", then using the same naming schema for different snapshot tasks with different expirations (on the same dataset) can risk destroying your long-term snapshots, which you might assume will be "safe" (unless you intentionally delete all of them). I suppose it's the nature of zettarepl and how it is based on parsing names (rather than using a companion database or catalog with a separate index).

[Screenshot: overlapping-names.png]


See the screenshot above? Notice that both tasks use the same naming schema? They are both auto- (not differentiated as auto-hourly-, auto-daily-, auto-weekly-, auto-monthly-, auto-yearly-, etc.)

After much time has passed and the tasks have populated the dataset with many snapshots, the user's assumption is that the snapshots created by each separate task will be preserved according to that task's expiration cycle. Yet simply pausing the long-term task (as seen above by unchecking "Enabled") will doom those snapshots to be destroyed, the only survivors being those lucky enough not to overlap in timestamp name with the more frequent snapshots in the other task(s).

The same risk applies to renaming the task (as the previously created snapshots using the older naming schema are no longer under the renamed task's "protection"). And of course the same risk applies to removing the task outright, as the previously created snapshots whose names fit a certain pattern will be destroyed by the still-existing, more frequent task.

So for example, let's say you only have two tasks for the same dataset, and they use the same naming schema, but differ in frequency and expiration.
  • tank/playground, auto-%Y-%m-%d_%H-%M | frequency: hourly | expiration: 1 week
  • tank/playground, auto-%Y-%m-%d_%H-%M | frequency: weekly | expiration: 1 year

While they are both "Enabled", there's going to be overlap with certain snapshots that the user assumes will be "protected" for 1 year. For instance, after a year passes, you will have a bunch of snapshots from the past week, yet only about 52 of them beyond that point, all the way back to a year in the past.

Yet, here comes a "whoopsie!"

Your weekly snapshots are taken on Sundays at midnight. They might look like this:
  • auto-2021-07-11_00-00
  • auto-2021-07-18_00-00
  • auto-2021-07-25_00-00
Those names also collide with your hourly snapshots. After all, if you didn't even have a weekly task, those names would have existed at some point since "00-00" is part of the hourly pattern.

Guess what happens if the user decides to "pause" his weekly snapshot task by unchecking "Enabled"? (Or rename it, or even just delete it?)

"The snapshots that my weekly task already created, of which I have a year's worth, should be safe, right?"

zettarepl still runs the hourly task (which uses the same naming schema of "auto-") and will destroy all those snapshots going back a year, because they happen to have "00-00" in their name, which fits the "hourly" pattern. These unfortunate snapshots no longer have the protection of the "unchecked" or "renamed" weekly task. :frown: (See below for how to safeguard these long-term snapshots.)


Now re-visit the scenario above, except with different naming schemas:
  • tank/playground, auto-hourly-%Y-%m-%d_%H-%M | frequency: hourly | expiration: 1 week
  • tank/playground, auto-weekly-%Y-%m-%d_%H-%M | frequency: weekly | expiration: 1 year

Can you see how regardless of whether or not the weekly task is temporarily disabled, the snapshots it already created will remain safe since there is no possible collision of names?
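To make the difference concrete, here is a tiny simulation of the two scenarios (illustration only, not zettarepl code; the dates are the ones from the example above):
Code:
# Only the hourly task is still enabled; the weekly task has been disabled.
from datetime import datetime, timedelta

NOW = datetime(2021, 7, 26)
HOURLY_SCHEMA, HOURLY_LIFETIME = "auto-%Y-%m-%d_%H-%M", timedelta(weeks=1)

def destroyed_by_hourly_task(name: str) -> bool:
    try:
        created = datetime.strptime(name, HOURLY_SCHEMA)
    except ValueError:
        return False                      # different schema: the hourly task ignores it
    return NOW - created > HOURLY_LIFETIME

# Scenario 1: shared "auto-" schema -- the old weekly snapshots get caught
for name in ["auto-2021-07-11_00-00", "auto-2021-07-18_00-00", "auto-2021-07-25_00-00"]:
    print(name, "-> destroyed" if destroyed_by_hourly_task(name) else "-> kept")
# auto-2021-07-11_00-00 -> destroyed
# auto-2021-07-18_00-00 -> destroyed
# auto-2021-07-25_00-00 -> kept (only a day old)

# Scenario 2: distinct "auto-weekly-" schema -- the hourly pattern never matches
print(destroyed_by_hourly_task("auto-weekly-2021-07-11_00-00"))  # False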


EDIT: I want to reiterate that I agree with @gary_1 and @Forza that it would be better if we could specify a single task that takes frequent snapshots, which are then "dynamically pruned" as time goes on, resulting in what some call "staged" or "smart" expiration schedules. The example you posted from Borgbackup illustrates this feature, as does other backup software.

Perhaps this is not possible with zettarepl / parsing names, as it might require a database with its own index to keep track of everything?
 
Last edited:

Forza

Explorer
Joined
Apr 28, 2021
Messages
81
it might require a database with its own index to keep track of everything?
On Linux I use Btrbk a lot. It is a script that implements this kind of "dynamic" schedule. It doesn't use a database, but it requires that all snapshots use the same datetime format. Perhaps iXsystems could look at the code and include ideas from it in TrueNAS?

Example configuration:
Code:
snapshot_preserve_min  2d                # Preserves all snapshots, including manual ones, for 2 days, no matter how many.
snapshot_preserve      48h 14d 12m *y    # Keeps 48 hourly, 14 daily, and 12 monthly snapshots, plus yearly snapshots with no limit.



 
sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Maybe using this script would be a good solution...
 

gary_1

Explorer
Joined
Sep 26, 2017
Messages
78
@winnielinnie If disabling a snapshot task causes zettarepl to not consider it when deciding which snapshots to keep, that does sound bad. I could almost understand if you deleted the snapshot task itself. These are the kinds of confusing quirks that documentation really needs to cover, and cover well.

In my case I guess I've lucked out, in that I set up the two tasks and have no plans to ever disable/delete them. However, I would have (incorrectly, it seems) assumed any existing snapshots taken would persist until their expiry date regardless.

Maybe zettarepl needs to base the "keep for" expiry off the snapshot name too, so in addition to %Y, %m, etc., it expects the name to end with -3w or 1y, etc. Then the removal is more decoupled from the generation of snapshots. This would also let people make manual snapshots that will eventually be removed if they use the naming convention.
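Something like this, purely hypothetical (this is not how zettarepl works today; the suffix format is just an idea):
Code:
# Hypothetical: read the keep-for duration from a trailing suffix in the
# snapshot name itself, e.g. "...-3w" or "...-1y".
import re
from datetime import timedelta
from typing import Optional

UNITS = {"h": "hours", "d": "days", "w": "weeks"}

def lifetime_from_name(name: str) -> Optional[timedelta]:
    m = re.search(r"-(\d+)([hdwmy])$", name)
    if not m:
        return None                           # no suffix: never auto-expire
    count, unit = int(m.group(1)), m.group(2)
    if unit == "m":
        return timedelta(days=30 * count)     # rough month
    if unit == "y":
        return timedelta(days=365 * count)    # rough year
    return timedelta(**{UNITS[unit]: count})

print(lifetime_from_name("auto-2021-07-11_00-00-3w"))  # 21 days
print(lifetime_from_name("manual-backup"))             # None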

That, or if changes to zettarepl are not likely, then perhaps TrueNAS should ensure all snapshot tasks have unique names and warn the user if they do not, as the "enable" issue you point out is certainly not obvious and could be a critical data-loss issue for some users.

I have to say this is one area of TrueNAS that feels a little rough around the edges :(

Next time I prune my snapshots due to rearranging datasets (which is on the cards before too long), I think I'll take the opportunity to prefix the snapshot names with daily/weekly, etc.
 