Snapshot retention on destination

Status
Not open for further replies.

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Hello all,

I've tried to modify the code of autosnap.py to avoid deletion of snapshots on the destination, with no luck. I've solved my problem using the ZFS "hold" feature. But now /var/log/messages gets spammed with this line:

Failed to destroy snapshot 'backup/NAS06/Datastore01@auto-20130226.0400-7d': cannot destroy snapshots in backup/NAS06/Datastore01: dataset is busy

Do you know how to stop autosnap.py from spamming the messages log? It would be better if I could block only this log entry.
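For reference, this is roughly how I am applying the hold on the destination (the tag name "keep" is just an example; the snapshot is the one from the log above):

# Place a named hold on the replicated snapshot; "zfs destroy" then refuses to remove it
zfs hold keep backup/NAS06/Datastore01@auto-20130226.0400-7d

# Show the holds on that snapshot
zfs holds backup/NAS06/Datastore01@auto-20130226.0400-7d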

Thank you!
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Because if I don't remove snapshots on the origin filesystem it will never stop growing. Right now I have a 7-day retention with a 24-hour snapshot interval. On the destination filesystem there is a cron job that puts holds on new snapshots. Later I plan to delete the oldest ones, say at 2 months.
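Roughly, the cron job on the destination does something like this (a sketch only; the dataset name and the "keep" tag are illustrative):

#!/bin/sh
# Sketch: put a hold on every auto snapshot of the replicated dataset that is not held yet
for snap in `zfs list -H -o name -t snapshot -d 1 backup/NAS06/Datastore01 | grep "@auto-"`
do
    # "zfs holds -H" prints one line per hold; empty output means no hold yet
    if [ -z "`zfs holds -H $snap`" ]; then
        zfs hold keep $snap
    fi
done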
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Ah, I see. You can't set the interval to anything above 7 days. I had to check the GUI to see why you were going to such lengths to do this manually and not via the GUI.

Maybe a more effective way to solve your problem would be to put in a ticket (support.freenas.org) requesting intervals higher than 7 days in the GUI. Perhaps 14 days, 30 days, 60 days, and 90 days?
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Maybe I didn't explain myself very well, sorry. There is no way through the GUI to have different retention policies for snapshots on the origin and destination filesystems when configuring ZFS replication. So when a snapshot is erased from the origin filesystem (due to the retention policy), it is also removed from the destination filesystem. What I want is to keep fewer snapshots on the origin filesystem (7 days is fine there) but leave those snapshots untouched on the destination filesystem. I achieve this with the "hold" feature, which prevents a snapshot from being destroyed until its hold is released.
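When the time comes (say after 2 months) the cleanup on the destination would be along these lines (the snapshot name and tag are just examples):

# A held snapshot cannot be destroyed until its hold is released
zfs release keep backup/NAS06/Datastore01@auto-20130101.0400-7d
zfs destroy backup/NAS06/Datastore01@auto-20130101.0400-7d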
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Ah. Yeah, there definitely is no way to do that from the GUI. It's a little strange that your backups will be bigger than the actual "original" data; normally they are either the same size or smaller (representing only a subset of all data).

Yeah, I don't know how you'd solve your problem. Any reason why you care about the logs? They should roll over on their own. In all honesty, I don't look at the logs unless something is actually wrong.
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Yes, the problem is that they are rotated too often. I won't be able to find any error lines if I need to review this file :). Reviewing the log file, I can see that the "Failed to destroy snapshot" lines are only spammed between the replication task start time and end time configured on the origin FreeNAS.

Ok, found something in autosnap.py:

for snapshot in snapshots_pending_delete:
    zfsproc = pipeopen('/sbin/zfs get -H freenas:state %s' % (snapshot, ), logger=log)
    output = zfsproc.communicate()[0]
    if output != '':
        fsname, attrname, value, source = output.split('\n')[0].split('\t')
        if value == '-':
            snapcmd = '/sbin/zfs destroy %s' % (snapshot)
            proc = pipeopen(snapcmd, logger=log)
            err = proc.communicate()[1]
            if proc.returncode != 0:
                log.error("Failed to destroy snapshot '%s': %s", snapshot, err)


I could try removing that last log.error line... I hope it doesn't make me lose debugging information when something else goes wrong. It would be better if I could set a different retention policy for snapshots through the GUI, or even locate where snapshots on the destination are erased in this Python script...
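Alternatively, instead of silencing the error, the script could skip held snapshots; from the shell the check would look roughly like this (the snapshot name is just the one from my log):

# Empty output from "zfs holds -H" means the snapshot has no holds and can be destroyed
if [ -z "`zfs holds -H backup/NAS06/Datastore01@auto-20130226.0400-7d`" ]; then
    zfs destroy backup/NAS06/Datastore01@auto-20130226.0400-7d
fi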
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
What do you mean by destination? The remote side of the replication? They are not erased there; this is a missing feature.

Unless you set up a periodic snapshot task on the remote side, this is not supported.
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Yes, I mean the remote side of the replication...

Thank you all for your fast responses anyway!

I see that I could also add all snapshots with a hold to the list of "Exclude snapshots" in autosnap.py, so they would be ignored by the loop for snapshot in snapshots_pending_delete. But I need to review this script a bit more to achieve that. And I know there is no support here for modifications to autosnap.py; I only wanted some orientation regarding the harm I could cause to my systems.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I commend you for working so hard to get FreeNAS to work for you. I just want to remind you that, after you get everything how you want it, if you ever upgrade FreeNAS you will need to reapply your changes. I do hope you post back a small step-by-step guide for others that may want to do the same thing (as well as a good reference for you if you upgrade FreeNAS later and forget what exactly you did). I reference some of my own posts at times to see what I did.
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
You're right. I will post a step-by-step guide with any changes made once I get autosnap.py working as I plan. For now I need to work on it and test the whole thing... so it won't be soon.

Thanks for all!
 

Txalamar

Cadet
Joined
Mar 11, 2013
Messages
7
Hello again,

I completely misunderstood how autosnap.py works. If I'm not wrong, it relies on snapshot names to determine whether they have to be removed. On my destination replica host there is a datastore that is also being snapshotted by autosnap.py. I suppose that if there were no automatic snapshots for a local datastore, autosnap.py wouldn't run and therefore wouldn't delete snapshots from received filesystems. Finally, I've achieved what I want simply by renaming the older snapshots, those that would be deleted by autosnap.py, so there is no need to change anything in the autosnap.py script. This is what I launch once a day through cron, after all filesystems have been replicated:

cat snapshot_retention.sh

#!/bin/sh

# Rename replicated snapshots older than the newest 7 so autosnap.py no longer matches them
datastores="NAS04 NAS05 NAS06"

for ds in $datastores
do
    # Count all snapshots of the replicated dataset (listed oldest first)
    snapnum=`zfs list -Hr -o name -s creation -t snapshot -d 1 backup/$ds/Datastore01 | wc -l`
    # Keep the 7 most recent snapshots untouched
    oldsnapnum=$((snapnum - 7))
    # Nothing to do if there are 7 or fewer snapshots
    [ $oldsnapnum -le 0 ] && continue

    oldsnaps=`zfs list -Hr -o name -s creation -t snapshot -d 1 backup/$ds/Datastore01 | head -n $oldsnapnum`
    for snapshot in $oldsnaps
    do
        # Only touch automatic snapshots; strip the "-7d" lifetime so autosnap.py ignores them
        if echo $snapshot | grep -q "auto"; then
            newsnapname=`echo $snapshot | sed 's/auto/manual/g' | sed 's/-7d//g'`
            zfs rename $snapshot $newsnapname
        fi
    done
done


This script renames all snapshots except the 7 most recent ones. I have configured one snapshot per day with a seven-day retention policy, and the filesystems are replicated daily.
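To double-check the result, something like this lists the renamed snapshots (the "manual" pattern comes from the sed above; the dataset name is just one of mine):

# Renamed snapshots now contain "manual" instead of "auto" and have no "-7d" suffix,
# so autosnap.py's retention loop ignores them
zfs list -H -o name -t snapshot -d 1 backup/NAS06/Datastore01 | grep "@manual-"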
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
Txalamar,

I have just set up a new replication SAN and wanted to achieve something similar to you, which in my case is:

- Primary array: hourly snaps, kept for 48 hours
- Replication array: pulls all snaps from the primary, but snaps are kept for 10 days. NB I don't actually need hourly snaps there; one every 12 hours would suffice (to remove the clutter).

Some commercial products include this in the GUI, but I had worked out that I would have to try to write a script which I could run from cron on the replication array... I will use your script as a starting point :) - thanks

FYI, in case anyone is wondering about the above config: the primary array uses multiple high-speed spindles for performance, while the replication array is a smaller number of large, slower disks (i.e. capacity is cheaper).
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
OK, I am running the following script from cron (snapshot_retention.sh "\-1h" "\-10d") every 12 hours on my replication array, which appears to be achieving what I want, i.e. retaining 10 days' worth of 12-hourly snaps on the replication array but only 48 hours' worth of hourly snaps on the primary. (The script should also be able to amend retention on the primary array, e.g. to 24 hours of hourly snaps and 4 days of 6-hourly snaps, but I want to do more testing before I let it loose on my primary array.)

Comments:
1) I am new to FreeNAS, so I would appreciate anyone pointing out the errors in my script/logic.
2) There is minimal error handling. My biggest concern is that there is no link between actions on the primary and replication arrays, e.g. if replication or cron fails for any reason, snapshots will simply roll off the primary array and I won't achieve my 10-day protection. A better solution would be to drive everything from the primary array so that hourly snapshots are not removed if the 12-hourly snaps have not been created - I have some ideas on this which I will work through.
3) Rather than write my own snapshot deletion routine as per http://forums.freenas.org/showthrea...imilar-to-Apple-s-TimeMachine&highlight=state I am using autosnap.py. However, there is a limitation in this script: it will not delete expired snaps unless it first has a snap to create. FreeNAS developers - is there a reason for this? Rather than creating a dummy snap on the replication array and tying it to the cron schedule, I have created a modified version of autosnap.py with the following lines commented out.
4) I wanted to rename retained snaps from 'auto' to 'retained', however this would prevent autosnap.py from deleting expired snaps.


# Do not proceed further if we are not going to generate any snapshots for this run
if len(mp_to_task_map) == 0:
    exit()




My script


cat ./snapshot_retention.sh

#!/bin/sh

# Script to be run from cron to retain the oldest snapshots on a replication SAN longer than on the primary array.
# Note: must not set "Recursively replicate and remove stale snapshot on remote side" on the primary SAN, as this
# removes all snapshots on the replication array which are not on the primary, unless a hold is set.
# To remove stale snapshots on the replication array (which does not have periodic snapshots) we need to call a
# modified version of autosnap.py. The modification is to remove the lines which exit the routine if there are
# no snapshots to create; deletions happen after the creation.

# Input parameters: $1 = old retention period (eg "\-2h"), $2 = new retention period (eg "\-2d"),
# [optional] $3 = recurse depth (defaults to 1), $4 = max snaps to retain (defaults to 1)

pools="Repl"                    # list of pools to be processed, space separated
datasets="home VM Installs"     # list of datasets to be processed, space separated
oldretentionperiod=$1           # original value of the snapshot retention period
newretentionperiod=$2           # required value of the snapshot retention period
recurse=${3:-1}                 # 1 = top level dataset ie no recurse, 2 = recurse into 2nd level dataset, etc
maxsnapstoretain=${4:-1}        # max number of snaps to retain

python /mnt/Repl/modified_autosnap.py   # need to remove stale snapshots before determining which to retain

for pool in $pools
do
    for dataset in $datasets
    do
        snapnum=`zfs list -Hr -o name -s creation -t snapshot -d $recurse $pool/$dataset | wc -l`

        # Can't amend the LATEST (freenas:state) snapshot or the next replication fails
        if [ $snapnum -gt 1 ] ; then
            oldsnaps=`zfs list -Hr -o name -s creation -t snapshot -d $recurse $pool/$dataset | grep "$oldretentionperiod" | head -n $maxsnapstoretain`

            for snapshot in $oldsnaps
            do
                # Only rename automatic snapshots; swap the old lifetime suffix for the new one
                if echo $snapshot | grep -q "auto"; then
                    newsnapname=`echo $snapshot | sed "s/$oldretentionperiod/$newretentionperiod/g"`
                    zfs rename $snapshot $newsnapname
                fi
            done
        fi
    done
done
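For completeness, the cron entry on the replication array looks roughly like the following (the times and the script path are just how I happen to have it set up; adjust to taste):

# Run twice a day, shortly after the replications have finished
0 1,13 * * * /bin/sh /mnt/Repl/snapshot_retention.sh "\-1h" "\-10d"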
 