
ZFS Rollup - A script for pruning snapshots, similar to Apple's TimeMachine


fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
Huh, that's weird. Regardless, it sounds like keeping at least LATEST and the most recent NEW should be compatible with what replication needs. Thanks for your research.
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
I've just shut down our backup server, so I can see what happens here tomorrow morning.

I also noticed another problem: Empty snapshots are no longer empty after replication. I noticed that running the script on the main server took much longer than on the backup server. Then it turned out that the backup server has a lot of 12.8 kB snapshots. No corresponding snapshots exist on the main server. Does anyone know why this is happening, and how I should address this?
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
I'm not running replication, so I can't be of much help there. There is the "zfs diff" command, though, which might be able to point out what has changed. Unfortunately, I have seen some snapshots on my system that report a very small size (12k is about right) while also reporting no change in file content. I've been meaning to investigate, but haven't had the time.
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
Thanks for the tip. Unfortunately:
Code:
# zfs list -r -t snapshot -o name,creation,used Internal/HWexp/Doppler
[...]
Internal/HWexp/Doppler@auto-20130429.1700-1y  Mon Apr 29 17:00 2013  12.8K
Internal/HWexp/Doppler@auto-20130429.1715-1y  Mon Apr 29 17:15 2013  12.8K
Internal/HWexp/Doppler@auto-20130429.1730-1y  Mon Apr 29 17:30 2013      0
# zfs diff Internal/HWexp/Doppler@auto-20130429.1700-1y Internal/HWexp/Doppler@auto-20130429.1715-1y
#

So, zfs diff shows no difference at all, yet the snapshot uses 12.8k and is therefore not removed by the cleanup script. Sigh...
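(Editorial aside: the pattern above can be screened for in bulk. Here is a minimal sketch, assuming tab-separated output from something like `zfs list -Hpr -t snapshot -o name,used <dataset>`; the 16 KiB cutoff is an assumption based on the 12.8K/16.0K sizes reported in this thread, not a documented ZFS constant.)

```python
# Flag snapshots whose "used" value is nonzero but tiny, i.e. the
# "empty but not empty" snapshots discussed above.
# Assumes machine-readable input: one "name<TAB>bytes" pair per line.

SMALL_USED_BYTES = 16 * 1024  # assumed cutoff; 12.8K and 16.0K both fall under it

def suspect_snapshots(zfs_list_output):
    """Return names of snapshots with a small but nonzero 'used' value."""
    suspects = []
    for line in zfs_list_output.strip().splitlines():
        name, used = line.split("\t")
        if 0 < int(used) <= SMALL_USED_BYTES:
            suspects.append(name)
    return suspects

# Sample shaped like the listing quoted above (byte counts approximate 12.8K)
sample = (
    "Internal/HWexp/Doppler@auto-20130429.1700-1y\t13107\n"
    "Internal/HWexp/Doppler@auto-20130429.1715-1y\t13107\n"
    "Internal/HWexp/Doppler@auto-20130429.1730-1y\t0\n"
)
print(suspect_snapshots(sample))
```

A filter like this would let the cleanup script at least report the suspect snapshots, even if it is not yet safe to auto-delete them.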
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
Yeah, that's what I've seen as well even without replication (I just checked and mine are all 16k).
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
Yes, I'm seeing some of those as well on the primary system (16.0K). But that's only a few; those 12.8K things on the backup system run in the thousands. It seems like all empty snapshots on the primary system end up as 12.8K snapshots on the backup system.
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
When I get the chance, I'll try playing around with zdb to see if there's any way to figure out what's going on with the 16k snapshots. My thought is that zdb should be able to point to all the referenced records, which would hopefully reveal what the actual difference is (metadata?). If that points to anything, it might help identify the 12.8k issue as well. It might be a bug that can be fixed, or just some quirk that can be detected and handled by rollup / clearempty.
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
The first thing that jumps out from zdb is right near the top of `zdb -dddd dataset@snapshot`: a line stating 'Deadlist: 16.0K (512/512 comp)'. I haven't been able to find anything documenting the "Deadlist", but the match with 16.0K is intriguing. Based on the name, I presume it's a list of blocks that are no longer referenced? In the end, I'm not sure this helps in determining why the snapshots are no longer empty, nor how (or if) this case should be detected and still deleted.
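(Editorial aside: if you want to pull that Deadlist line out of zdb output programmatically, a sketch like the following would do it. The only input format assumed is the exact line quoted above; real `zdb -dddd` output contains far more detail, and the field meanings here are guesses from the line's shape.)

```python
import re

# Extract the "Deadlist:" summary from `zdb -dddd dataset@snapshot` output.
# Pattern matches lines like: "Deadlist: 16.0K (512/512 comp)"
DEADLIST_RE = re.compile(r"Deadlist:\s+(\S+)\s+\((\d+)/(\d+)\s+comp\)")

def deadlist_summary(zdb_output):
    """Return (size, first, second) from the Deadlist line, or None.

    The two numbers are presumed to be compressed/uncompressed sizes,
    based only on the "comp" suffix -- an assumption, not documented fact.
    """
    m = DEADLIST_RE.search(zdb_output)
    if not m:
        return None
    size, a, b = m.groups()
    return size, int(a), int(b)

sample = "Deadlist: 16.0K (512/512 comp)\n"
print(deadlist_summary(sample))  # ('16.0K', 512, 512)
```

Matching the reported Deadlist size against the snapshot's `used` value could be one way to detect this quirk automatically.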
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
Did you ever find out more about this? I still see them (16.0 KB on the primary server, 12.8 KB on the backup server). It's not really a problem, but it is annoying...
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
I looked into it a bit, and started to play with zdb, but never made it very far. Like you say, it's mostly an annoyance. I think this needs a ZFS Guru.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't think a bug report is appropriate, since it's an issue with his script and not a FreeNAS issue. At least, that's my understanding of it.
 

dlavigne

Guest
It depends, is the script revealing a flaw in ZFS output or is the script itself flawed?
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
I am finally about to set up this script on our backup server, but I think there's a problem for this scenario. It seems that rollup.py would happily remove the newest snapshots, which are needed for the next synchronization run.

I run snapshots on my main server every 15 minutes, and they then get replicated to the backup server. In rollup.py, I can only specify the number of hourly backups to keep, but that does not guarantee that the newest snapshot will not be pruned.

Am I overlooking something? If not, would this be easy to change? From my understanding, the replication needs just the latest snapshot on the backup server, so there's no need to check freenas:state. That may be important when running it on a primary server, though.
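(Editorial aside: the safeguard being asked for is straightforward to express. This is illustrative logic only, not rollup.py's actual implementation; the input is assumed to be (name, creation-epoch) pairs such as those produced by `zfs get -Hpo name,value creation`.)

```python
# Sketch: never offer the newest snapshot for pruning, regardless of
# what the retention rules would otherwise select.

def prune_candidates(snapshots):
    """Return snapshot names eligible for pruning, always excluding the
    most recently created one (needed as the replication base)."""
    if len(snapshots) <= 1:
        return []
    newest_name = max(snapshots, key=lambda s: s[1])[0]
    return [name for name, _ in snapshots if name != newest_name]

# Creation timestamps taken from the zfs get output posted later in
# this thread.
snaps = [
    ("Dataset@auto-20131213.1145-1y", 1386931502),
    ("Dataset@auto-20131213.1230-1y", 1386934228),
    ("Dataset@auto-20131213.1830-1y", 1386955825),
]
print(prune_candidates(snaps))
```

Retention rules (hourly/daily/etc.) would then apply only to the candidate list, so the newest snapshot can never fall out of it.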
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
Neither script should ever destroy the most recent snapshot.

One thing I've never tested is what happens if the script tries to destroy a snapshot that is being replicated. I'm assuming the destroy would fail, but I've never tried it to be sure.
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
Had some problems with replication, so it took some time to correct that. I still don't understand the design decisions of the replication scripts. Anyway...

The rollup script, when running in test mode, claims that it will remove the most recent snapshot:

Code:
zfs list -Hr -o name -t snapshot -d 1 Dataset
[...]
Dataset@auto-20131213.1145-1y
Dataset@auto-20131213.1230-1y
Dataset@auto-20131213.1815-1y
Dataset@auto-20131213.1830-1y
./rollup.py -t -v Dataset
[...]
pruning @auto-20131213.1145-1y - - - 13088
@auto-20131213.1230-1y h - - 340288
@auto-20131213.1815-1y h - - 13088
pruning @auto-20131213.1830-1y - - - 0
Is that a bug, or am I not using the script in the right way?
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
I can't check right now, but I'll investigate later tonight.

For now can you post the output of:
Code:
zfs get -Hrpo name,property,value creation,type,used,freenas:state Dataset


That'll help figure out what's going on with this issue.
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
Oh, make sure you're using the latest version of the script too.
 

kavermeer

Member
Joined
Oct 10, 2012
Messages
59
I just checked, and I am indeed using the latest version of the script.
Code:
Dataset@auto-20131213.1145-1y creation 1386931502
Dataset@auto-20131213.1145-1y type snapshot
Dataset@auto-20131213.1145-1y used 13088
Dataset@auto-20131213.1145-1y freenas:state -
Dataset@auto-20131213.1230-1y creation 1386934228
Dataset@auto-20131213.1230-1y type snapshot
Dataset@auto-20131213.1230-1y used 340288
Dataset@auto-20131213.1230-1y freenas:state -
Dataset@auto-20131213.1815-1y creation 1386954915
Dataset@auto-20131213.1815-1y type snapshot
Dataset@auto-20131213.1815-1y used 13088
Dataset@auto-20131213.1815-1y freenas:state -
Dataset@auto-20131213.1830-1y creation 1386955825
Dataset@auto-20131213.1830-1y type snapshot
Dataset@auto-20131213.1830-1y used 0
Dataset@auto-20131213.1830-1y freenas:state -
There are many more snapshots, so let me know if you need a full list.
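(Editorial aside: the `zfs get -Hrpo name,property,value` output above is easy to turn into per-snapshot records, which is presumably the kind of structure the script builds. A minimal sketch, assuming whitespace-separated fields as pasted; with `-H`, real output is tab-separated, so a production parser should split on tabs.)

```python
from collections import defaultdict

# Group "name property value" lines into one dict per snapshot.
def parse_zfs_get(output):
    props = defaultdict(dict)
    for line in output.strip().splitlines():
        # split into at most 3 fields so values containing spaces survive
        name, prop, value = line.split(None, 2)
        props[name][prop] = value
    return dict(props)

sample = """\
Dataset@auto-20131213.1830-1y creation 1386955825
Dataset@auto-20131213.1830-1y type snapshot
Dataset@auto-20131213.1830-1y used 0
Dataset@auto-20131213.1830-1y freenas:state -
"""
parsed = parse_zfs_get(sample)
print(parsed["Dataset@auto-20131213.1830-1y"]["used"])  # 0
```

From such a structure, picking the snapshot with the highest `creation` value identifies the one replication still needs.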
 

fracai

Neophyte Sage
Joined
Aug 22, 2012
Messages
1,212
I just pushed a new version to Bitbucket and GitHub that should correct this. I either never protected the most recent snapshot or broke that behavior at some point. It's now explicitly protected.

Let me know if this doesn't work for you, or if you have any other questions or requests.
 