ZFS Rollup - A script for pruning snapshots, similar to Apple's Time Machine

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
This seems to work just fine - thanks!

I now need to figure out how to use this in a way that does not affect my ability to recover from replication issues. The problem: if there are replication problems, there needs to be a common snapshot (one that exists on both the primary and the secondary server) to be able to recover manually. If I run rollup on both servers, the snapshots that are kept may not be the same ones. So for now, I'll stick to running it only on the backup server.
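For reference, this is roughly what I mean by the common-snapshot check (just an untested sketch, not part of the rollup script; the hostname and dataset below are placeholders, not my actual setup):

Code:
# Rough sketch of the common-snapshot check (untested). The hostname and
# dataset name below are placeholders.
import subprocess

DATASET = "tank/data"          # placeholder dataset
REMOTE = "backup.example.com"  # placeholder secondary server

def list_snapshots(prefix):
    # 'zfs list -d 1 -t snapshot' lists only this dataset's own snapshots,
    # sorted oldest to newest by creation time.
    cmd = prefix + ["zfs", "list", "-H", "-d", "1", "-t", "snapshot",
                    "-o", "name", "-s", "creation", DATASET]
    out = subprocess.check_output(cmd, text=True)
    return [line.split("@", 1)[1] for line in out.splitlines() if "@" in line]

local = list_snapshots([])                      # run locally
remote = set(list_snapshots(["ssh", REMOTE]))   # run on the other box via SSH

common = [name for name in local if name in remote]
if common:
    print("newest common snapshot:", common[-1])
else:
    print("no common snapshot left; recovery would need a full resend")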
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Yeah, this is getting out of my league, as I'm unfortunately not using replication. The only solution that seems satisfactory would be having the rollup script be in charge of both sender and receiver from one location or the other, i.e. it would shell into the other box to list and delete snapshots. That seems doable, but I don't know that I should take it on until I have a replication environment to test with. Donations accepted ;-)
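Roughly what I have in mind (untested sketch, hostname is a placeholder): whichever box runs the rollup decides what to prune and then issues the same destroys locally and over SSH on the other box, so both sides keep identical snapshot sets.

Code:
# Untested sketch of the idea: prune the same snapshots on both boxes from
# one location. The hostname is a placeholder; error handling is omitted.
import subprocess

REMOTE = "backup.example.com"  # placeholder for the other box

def destroy_on_both(snapshots):
    for snap in snapshots:
        # destroy locally, then issue the same destroy on the other box
        subprocess.check_call(["zfs", "destroy", snap])
        subprocess.check_call(["ssh", REMOTE, "zfs", "destroy", snap])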
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
I promise to share any money I make on this with you :smile:

All of this did make me review the clearempty.py script. It now keeps the latest snapshot and the latest NEW snapshot, which results in two daily snapshots being copied to my backup server. I tried to figure out why we keep the latest NEW, but I couldn't reconstruct that from memory or from this thread. I am considering removing that check. The script would still keep the latest snapshot; it just doesn't seem to matter whether it's NEW or not. Any comments? If not, I can send you a patch (which just removes a couple of lines).
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I can probably fix that the same way I changed the rollup script, unless that would still lead to two snapshots being saved.

Unfortunately I don't have a replication setup to test with.

Can you post the output of the command above? Are the latest NEW and the latest different snapshots?
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
On my system, the latest NEW is always also the latest snapshot, although sometimes there is no latest NEW at all.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm confused about the issue you're referring to. At first you mentioned that clearempty saves the latest and the latest NEW, but then you also say these are the same snapshot. Is the issue that too much is being saved, or is it just redundant code?
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
Sorry for not making myself clear. The script first finds the latest snapshot and excludes it from the candidate list. Then it does the same for the latest NEW. But because the latest snapshot has already been removed from the candidate list, it selects the next-latest NEW from the remaining candidates. As a result, two snapshots are kept where one would suffice, because the latest snapshot is also the latest NEW.
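In simplified form, the logic looks roughly like this (not the actual clearempty.py code, just a sketch of the behaviour and of the change I'm proposing):

Code:
# Simplified sketch of the keep logic (not the actual clearempty.py code).
# 'snapshots' is assumed to be sorted oldest -> newest, and is_new(snap)
# is assumed to report whether the snapshot contains new data.

def snapshots_to_keep(snapshots, is_new):
    keep = set()
    if snapshots:
        keep.add(snapshots[-1])            # always keep the latest snapshot

    # Current behaviour: look for the latest NEW among the *remaining*
    # candidates. If the latest snapshot is itself the latest NEW, an older
    # NEW snapshot gets kept as well, so two end up being saved.
    candidates = [s for s in snapshots if s not in keep]
    for snap in reversed(candidates):
        if is_new(snap):
            keep.add(snap)
            break

    # Proposed change: search over *all* snapshots instead, so that when the
    # latest snapshot is also the latest NEW, only one snapshot is kept:
    #
    # for snap in reversed(snapshots):
    #     if is_new(snap):
    #         keep.add(snap)
    #         break

    return keep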
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
I recently updated my FreeNAS system and I now get many errors from the clearempty.py script:

cannot destroy snapshot <snapshotname>: dataset is busy

I make snapshots every 15 minutes, so this list gets very long. I haven't looked at the system's internals for some time, but my first guess is that this is related to the various 'zfs hold' calls I see in the console window. This hold/release behavior was probably introduced relatively recently, and the clearempty.py script hasn't been updated accordingly.

Is this a known problem? Is it safe to just add a zfs release command before the snapshot gets destroyed in the clearempty.py script? Or is this hold/release mechanism used for other internal housekeeping by FreeNAS?
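Concretely, what I mean is something like this around the destroy step (untested sketch; the tags are read from 'zfs holds' output rather than hard-coded, since I don't know which tag name FreeNAS uses):

Code:
# Untested sketch of what I mean: release whatever holds exist on the
# snapshot right before destroying it.
import subprocess

def release_and_destroy(snapshot):
    # 'zfs holds -H' prints one tab-separated line per hold: name, tag, time
    out = subprocess.check_output(["zfs", "holds", "-H", snapshot], text=True)
    for line in out.splitlines():
        if line.strip():
            tag = line.split("\t")[1]
            subprocess.check_call(["zfs", "release", tag, snapshot])
    subprocess.check_call(["zfs", "destroy", snapshot])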
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
This may just be a problem because I was using an out-of-date autorepl.py script. So if the hold/release mechanism should not cause any problems with the clearempty script, please ignore my message.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I just took a look at the scripts and they don't specifically handle snapshots that have been held. I can look into handling that.

In my own case, I haven't seen any errors related to snapshots on hold, but I don't think I have any snapshots on hold either.

Thanks for the feedback.
 

kavermeer

Explorer
Joined
Oct 10, 2012
Messages
59
I'm still getting these errors whenever the script runs.

My guess is that the snapshot script puts the snapshots on hold if there is a replication task. After replication, the hold-status is cleared. I run snapshots during the day, and replicate them in the evening. What I observe is that in the morning, there are only a few lines of errors due to snapshots being on hold. This builds up during the day.

It's not difficult to clear the hold status in the cleanup script. The problem is that I don't know whether that introduces any new problems. I cannot find any explanation of the logic behind the snapshot and replication scripts, or of why this hold logic was introduced.
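The safer alternative is probably to leave held snapshots alone and only destroy the rest, something like this (again, just an untested sketch):

Code:
# Untested sketch of the safer option: skip any snapshot that still has a
# hold on it and let a later run clean it up once replication has released it.
import subprocess

def destroy_if_unheld(snapshot):
    out = subprocess.check_output(["zfs", "holds", "-H", snapshot], text=True)
    tags = [line.split("\t")[1] for line in out.splitlines() if line.strip()]
    if tags:
        print("skipping %s (held by: %s)" % (snapshot, ", ".join(tags)))
        return False
    subprocess.check_call(["zfs", "destroy", snapshot])
    return True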
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Aren't they in sync? Oh shoot. I must have forgotten to push to GitHub. I'll update that tonight.

I also need to get the tmsnap utility in there. I'll try for that tonight as well.
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
Aren't they in sync? Oh shoot. I must have forgotten to push to GitHub. I'll update that tonight.

I also need to get the tmsnap utility in there. I'll try for that tonight as well.

Heh, maybe delete the src in one of them and just stick a URL in the README.md to the other ;)

Thanks btw, they worked great and helped me clean up a slow box that was stuffed with 10k+ snapshots.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Glad it helped.

I usually keep both repos in sync, and overall I like having both up, as some people prefer one host over the other.
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
I've struggled with this process of creating snapshot tasks on a given dataset with different intervals and differing lifetimes. I've created 3 tasks for a given dataset, with intervals of 15 min, hourly, and daily. The problem is that the shorter-interval snapshots expire and get deleted locally, which then causes the remote replication to fail. Has anyone sorted this out? Any suggestions?
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Are you using this script? Or just the FreeNAS snapshot and replication settings?
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
Are you using this script? Or just the FreeNAS snapshot and replication settings?
I've used another script for backing up, called 'zfs-backup.sh', which looks for user properties to identify the datasets to replicate. It runs from a cron job and checks the snapshots on the destination and source to ensure consistency.
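For example, tagging a dataset for backup with a user property looks roughly like this (the property name 'backup:offsite' is just an illustration, not necessarily the one zfs-backup.sh actually reads):

Code:
# Illustration only: tag a dataset with a user property and let a cron job
# discover the tagged datasets. The property name 'backup:offsite' is a
# made-up example.
#
#   one-time, by hand:  zfs set backup:offsite=on tank/important
import subprocess

PROP = "backup:offsite"  # hypothetical user property

out = subprocess.check_output(
    ["zfs", "get", "-H", "-t", "filesystem", "-o", "name,value", PROP],
    text=True,
)
targets = []
for line in out.splitlines():
    if not line.strip():
        continue
    name, value = line.split("\t")[:2]
    if value == "on":       # unset user properties show up as '-'
        targets.append(name)
print("datasets to replicate:", targets)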
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
OK, I know others have run into similar issues at various times, but I think the FreeNAS replication setup is supposed to handle those cases now. My only suggestion would be that your script needs more intelligence when determining what can be deleted.

I personally don't have any experience with replication, and the only feedback I've had is that the rollup script I developed doesn't seem to be causing any problems.

Do you have a link to the source of this zfs-backup script? Or could you post the contents somewhere?
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
OK, I know others have run into similar issues at various times, but I think the FreeNAS replication setup is supposed to handle those cases now. My only suggestion would be that your script needs more intelligence when determining what can be deleted.

I personally don't have any experience with replication, and the only feedback I've had is that the rollup script I developed doesn't seem to be causing any problems.

Do you have a link to the source of this zfs-backup script? Or could you post the contents somewhere?
https://github.com/adaugherity/zfs-backup/blob/master/zfs-backup.sh

As for the built-in replication: I use that as well, to another server on the intranet. I use the zfs-backup.sh script for the off-site backup, since that requires a little more manipulation and I don't want all datasets pushed to the off-site backup.
 