Snapshots not expiring

Status
Not open for further replies.

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
On January 27th I created a Periodic Snapshot Task to take a snap every hour and hold it for two months. The first snapshot occurred at 11:16AM.

However, today, March 27, at 11:16AM, I noticed that the snapshot had not expired and been removed as I was expecting.

Is there any reason this wouldn't have occurred as expected? Are months perhaps calculated by FreeNAS as "31 days" instead of 27th->27th, in which case I should expect to see these actually disappear on the 31st of the month due to the 4 day loss from February?
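To make the question concrete, here is a rough sketch of the two ways "two months" could be counted. It is only an illustration of the guess above, not FreeNAS's actual expiry logic: the helper names and the 31-day figure are assumptions.

```python
import calendar
from datetime import datetime, timedelta


def add_calendar_months(ts, months):
    """Same day-of-month N months later, clamped to the length of the target month."""
    month_index = ts.month - 1 + months
    year = ts.year + month_index // 12
    month = month_index % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def add_fixed_months(ts, months, days_per_month=31):
    """Treat every month as a fixed number of days (31 is the guess being tested)."""
    return ts + timedelta(days=months * days_per_month)


created = datetime(2015, 1, 27, 11, 16)              # first snapshot in this thread
calendar_expiry = add_calendar_months(created, 2)
fixed_expiry = add_fixed_months(created, 2)

print(calendar_expiry)  # 2015-03-27 11:16:00
print(fixed_expiry)     # 2015-03-30 11:16:00
```

If the fixed-length reading were the right one, the snapshot would simply expire a few days after the 27th rather than never.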

Attached are a screenshot of my periodic snapshot tasks, a list of the snapshots from that day, and a command showing that there are no holds on that particular snapshot.

Tasks.png

Shots.png


Specs are in my signature.
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Okay, so I did a little experimentation on this.

I set up a snapshot task on an unused dataset that would take a snapshot every 5 minutes and retain it for an hour. After the hour it began removing snapshots correctly. However, once I disabled the task, the expired snapshots were no longer deleted at expiration. Is this intended behaviour, or is this a bug that should be reported?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
All things considered, it sounds like a bug to me. Would be a relatively big one to fix, I guess, but it should definitely be documented at least.

Since there's a "Create Snapshot" button, I can't see much of a reason to keep this quirk around besides xkcd 1172 (Workflow).
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
Eh, sounds the opposite to me; the periodic snapshot task gets disabled, so the snapshots associated with that task stop expiring. Not a huge deal either way (most of the time, you create a periodic snapshot task and leave it). @Knowltey, I would be curious to see what happens if you re-enable the periodic snapshot task (presumably you just turned it off, not deleted it); namely, do associated snapshots then expire, or only new snapshots created since re-enabling the task?
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Yeah, may be the case.

On 03/06 I made some 6-month retention tasks and at the same time temporarily changed the 2-month tasks to 3 months, then changed my mind and changed them back. I think this may mean that any snapshots from before 03/06 aren't going to be auto-pruned, so I'll probably just have to prune manually every once in a while until auto-pruning begins working again.

@Tywin - I can rerun the experiment with that in mind. Unfortunately I already deleted the tasks and associated snapshots after performing it the first time, but it really only takes about an hour and a half to redo, so it's not really a big deal.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Either interpretation is valid, since the manual doesn't mention anything. So, at the very least, the docs should be updated.
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
Yes, it is fair to say the documentation should reflect reality ;)
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Part of it may be the timestamp of the snapshots as well, depending on how the periodic snapshot tasks read those. When I made the edit on 03/06, the snapshots changed from being taken 17 minutes after the hour to being taken on the hour, since that is when I happened to make the edit. So perhaps the change in the timestamp is what causes the auto-expiry to fail, if it's looking for Mirror1/Data@auto-20150127.1100-2m instead of the actual snapshot at Mirror1/Data@auto-20150127.1117-2m

That would also be rather worrisome if, for example, a reboot causes the server to miss its normal time slot and the schedule shifts to a new "after the hour" point.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Looks like it's expected behavior: https://bugs.freenas.org/issues/5819

@dlavigne (or anyone who can quickly edit the docs): Could we add a warning to Section 8.2 (Periodic Snapshot Tasks) along the lines of:

Periodic Snapshots will not be automatically deleted if their task is disabled or deleted.
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Interesting, that does make sense. Do we know whether my example of a server reboot causing a normal "x after the hour" point to be missed would stop auto-prune as well? (For example, if snapshots are normally taken on the hour, every hour, and a reboot for an update leaves the server down on the hour so that it takes a snapshot at 6 minutes after the hour when it comes back up, will the Periodic Snapshot Task then fail to expire old snapshots that were taken on the hour?)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'd expect the pruning to be run every time the task runs. But then again, my expectations have been proven wrong just now. :p
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
(Slightly off-topic, just brainstorming here while I'm thinking about it):

From the ticket above and my cursory googling, it looks like the pruning of expired snaps is handled by the periodic snapshot task itself. This is presumably because only the task knows about when particular snapshots should expire. Perhaps a cleaner way to handle it (of course this would break backward compatibility, but if we were to look at a from-scratch solution) would be to have a field in each snapshot, along with the snapshot name and whatever information is stored about it, indicating an expiry timestamp. Whatever tool is used to create the snapshot populates this field appropriately; for example a manual snapshot would leave it empty, while a periodic snapshot would add "keep duration" to the current time. Then you can have a completely independent task that runs periodically, scans the list of snapshots, and prunes any that are past their expiry time.
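A minimal sketch of that from-scratch idea, purely as a thought experiment. Nothing here is FreeNAS or ZFS code: the Snapshot class, take_snapshot and prune are hypothetical names, and the snapshots are plain in-memory objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    name: str
    created: datetime
    expires_at: Optional[datetime] = None  # None: manual snapshot, never auto-pruned


def take_snapshot(name, now, keep=None):
    """Whatever creates the snapshot fills in the expiry field at creation time."""
    expires_at = now + keep if keep is not None else None
    return Snapshot(name, now, expires_at)


def prune(snapshots, now):
    """Independent pruner: it only reads each snapshot's own expiry field."""
    kept = [s for s in snapshots if s.expires_at is None or s.expires_at > now]
    expired = [s for s in snapshots if s.expires_at is not None and s.expires_at <= now]
    return kept, expired


now = datetime(2015, 1, 27, 11, 16)
snaps = [
    take_snapshot("manual-before-upgrade", now),            # no expiry
    take_snapshot("hourly", now, keep=timedelta(hours=1)),  # periodic, kept 1 hour
]
kept, expired = prune(snaps, now + timedelta(hours=2))
print([s.name for s in kept])     # ['manual-before-upgrade']
print([s.name for s in expired])  # ['hourly']
```

The point of the design is that prune never needs to know which task created a snapshot, so disabling or deleting a task would not stop old snapshots from expiring.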
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It seems the task uses the snapshot's name. I guess this avoids keeping a database for expiring snapshots while also preventing the scenario mentioned in the ticket.
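As an illustration of why the name alone could be enough, here is a sketch that pulls the timestamp and retention out of a name like the ones quoted earlier in the thread. The pattern is inferred only from those examples, and parse_auto_snapshot is a hypothetical helper, not the function FreeNAS uses.

```python
import re
from datetime import datetime

# Pattern inferred from names quoted in this thread, e.g. "auto-20150127.1117-2m".
NAME_RE = re.compile(r"^auto-(\d{8}\.\d{4})-(\d+)([a-z])$")


def parse_auto_snapshot(full_name):
    """Split 'pool/dataset@auto-YYYYMMDD.HHMM-<count><unit>' into its parts.

    Returns (dataset, taken_at, count, unit), or None when the snapshot part
    does not follow the 'auto-' pattern (for example a manual snapshot).
    """
    dataset, sep, snap = full_name.partition("@")
    match = NAME_RE.match(snap) if sep else None
    if match is None:
        return None
    stamp, count, unit = match.groups()
    return dataset, datetime.strptime(stamp, "%Y%m%d.%H%M"), int(count), unit


print(parse_auto_snapshot("Mirror1/Data@auto-20150127.1117-2m"))
print(parse_auto_snapshot("Mirror1/Data@manual-before-upgrade"))  # None
```

Everything needed to decide on expiry (when it was taken, how long to keep it) is recoverable from the name, which would fit with no separate database being kept.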
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
I'm currently testing the name hypothesis by manually creating a snapshot named auto-20150127.1800-2m and seeing if it deletes when auto-20150327.1800-2m is created.
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Yeah, that might add a lot more overhead though. Part of the nice thing about ZFS is that snapshots take up very little room each, if any, so this quirk may just be the price to pay. As I understand from reading elsewhere, if a server reboot shifts the snapshot time, it can easily be fixed by a sneaky manual rename back to where you want it to be. That is part of the reason I am testing the rename trick: to see whether the snapshot gets deleted simply based on its name, even though it isn't actually two months old.
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
Yup, some; but there is some overhead already in e.g. storing the snapshot name (unless this is a FreeNAS thing and it is maintaining a separate database of Snapshots? I admit I'm not clear on where the division of labour is). It's kind of a corner case anyway, and as you say there are workarounds; I just like to think about what a system would look like if you had all the use-case information you have now when you were designing it :cool:
 