Stale or "stuck" snapshots on TrueNAS

VulcanRidr

Explorer
Joined
Jan 5, 2015
Messages
59
I have snapshots set up on all of my datasets. One dataset is running short on space because of the amount of churn on it, and most of the usage appears to come from snapshots. I currently keep snapshots for 8 weeks, but was looking to trim them down a bit. So I started looking at the list of snapshots, expecting the oldest to date back to 2023-09-05, which is 8 weeks before the time of this writing. However, I also found a bunch of snapshots from 2023-05-10 through 2023-05-28.

So the questions I have are:
  • Why did these snapshots not get purged?
  • Is it safe to delete them from the TrueNAS GUI?
  • How do I monitor/keep this from happening again?
Thanks,
--vr
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The snapshot tasks rely on the naming scheme, not any "time of creation" property of the snapshot itself. So if you changed anything about the naming, that would explain why they did not get purged. Also I am not sure if a snapshot missed will get purged the next day or not.

E.g. your snapshot schedule is daily and you power down your NAS for 24 hours. When the task runs again it will look for a snapshot aged 8 weeks plus one day and delete that. I am not sure if it will also look for anything older. So the one 8 weeks plus two days old due to the missed schedule might just keep lying around.
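Since retention keys off the name, a check that looks at the snapshot's `creation` property instead will catch leftovers no matter what they are called. A minimal sketch, assuming an 8-week retention; the function name `stale_snapshots` and the dataset `tank/data` in the usage lines are made up for illustration:

```shell
#!/bin/sh
# Print the names of snapshots created before a cutoff.
# Reads "name<TAB>creation-epoch" lines on stdin, as produced by
#   zfs list -H -p -t snapshot -o name,creation
# $1 = cutoff as Unix epoch seconds.
stale_snapshots() {
    awk -F '\t' -v cutoff="$1" '$2 + 0 < cutoff + 0 { print $1 }'
}

# Usage (hypothetical dataset name, 8 weeks = 8 * 7 * 86400 seconds):
#   cutoff=$(( $(date +%s) - 8 * 7 * 86400 ))
#   zfs list -H -p -t snapshot -o name,creation -r tank/data | stale_snapshots "$cutoff"
```

Run from cron, any output at all would mean something escaped the purge. This only reports; it does not delete anything.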

You can delete them from the UI, but I prefer to do it from the shell as follows.

1. Check for snapshots to delete
Code:
zfs list -t snap | awk '/auto.*2023-05/ { printf "zfs destroy %s\n", $1 }'


2. If that looks ok, then actually delete them - hit "cursor up" and append | sh at the end.
Code:
zfs list -t snap | awk '/auto.*2023-05/ { printf "zfs destroy %s\n", $1 }' | sh
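A slightly more defensive variant of the same idea, as a sketch: `zfs list -H -o name` drops the header and the extra columns, and `zfs destroy -n -v` only reports what would be destroyed. The function name `destroy_cmds` is made up for illustration, and the pattern is the one from the steps above.

```shell
#!/bin/sh
# Turn snapshot names on stdin into dry-run destroy commands.
# $1 = awk regular expression the snapshot name must match.
# "zfs destroy -n" does not delete anything; "-v" prints what would happen.
destroy_cmds() {
    awk -v pat="$1" '$1 ~ pat { printf "zfs destroy -nv %s\n", $1 }'
}

# Usage:
#   zfs list -H -t snapshot -o name | destroy_cmds '@auto.*2023-05'
#   zfs list -H -t snapshot -o name | destroy_cmds '@auto.*2023-05' | sh
```

Once the dry run lists exactly the snapshots you expect, change `-nv` to `-v` in the function to delete for real.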


HTH,
Patrick
 
Joined
Oct 22, 2019
Messages
3,641
Patrick M. Hausen said:
Also I am not sure if a snapshot missed will get purged the next day or not.

E.g. your snapshot schedule is daily and you power down your NAS for 24 hours. When the task runs again it will look for a snapshot aged 8 weeks plus one day and delete that. I am not sure if it will also look for anything older. So the one 8 weeks plus two days old due to the missed schedule might just keep lying around.

It's actually less intuitive than that: the way iXsystems implemented zettarepl can outright destroy snapshots that you would assume to be safe.

Basically, it filters every snapshot on the dataset for the "parseable" string, and then destroys any that are beyond the expiration date.

Sounds good, right?

Well, here's how you can lose snapshots that you would expect to remain safe:

That was back in July 2021. I'm not sure if it's been addressed or fixed since then. (Or if they even consider it a "problem".)

EDIT: In other words, just be very careful when you start renaming snapshot tasks and changing expiration dates, especially if you have multiple tasks on the same dataset (e.g., frequent snapshots with short lifespans plus less frequent snapshots with long-term lifespans).
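To make the name-based filtering described above concrete, here is a rough sketch. It is an illustration only, not zettarepl's actual code; the function name `expired_by_name`, the `<prefix>-YYYY-MM-DD_HH-MM` naming scheme, and the dataset names are assumptions.

```shell
#!/bin/sh
# Illustration only, not zettarepl's code: list the snapshots that a
# name-based purge would treat as expired.
# Reads full snapshot names (dataset@snapname) on stdin.
# $1 = name prefix the task parses (assumed scheme: <prefix>-YYYY-MM-DD_HH-MM)
# $2 = cutoff date as YYYY-MM-DD; ISO dates compare correctly as strings.
expired_by_name() {
    awk -v prefix="$1" -v cutoff="$2" '{
        at = index($0, "@")
        if (at == 0) next
        snap = substr($0, at + 1)
        # Names that do not start with "<prefix>-" cannot be parsed: skipped.
        if (index(snap, prefix "-") != 1) next
        day = substr(snap, length(prefix) + 2, 10)
        if (day !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) next
        if (day < cutoff) print $0
    }'
}

# Usage (hypothetical names):
#   zfs list -H -t snapshot -o name -r tank/data | expired_by_name auto 2023-09-05
```

Under these assumptions, two things follow. A snapshot whose name no longer matches the scheme is never considered, so it stays around forever. And any snapshot whose name does match is judged by the date in its name, no matter which task created it.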
 