"freenas:state" and other attributes


fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
While working on a script to automatically prune snapshots (similar to how TimeMachine and rsnapshot keep backups at rolling intervals) the "freenas:state" attribute came up as a possible reason to always keep a snapshot.

I think I've identified the three values that this attribute can take: empty, "LATEST", and "NEW". Are there any other states? Also, it looks like this attribute is used to indicate the state of a snapshot with regard to replication tasks. Is that correct?

Further, are there other attributes that FreeNAS sets on snapshots that would hold meaning for other FreeNAS tasks? Is there any documentation on these, or a section of the code I could look at? Normally I'd just check out the source and search, but I won't have a chance to do that for a while.
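
For anyone curious, the attribute is straightforward to read from a script; here's a minimal sketch of the kind of lookup I mean (Python shelling out to the zfs CLI, parsing simplified, and not necessarily the exact code in my script):

Code:
# Sketch: map each snapshot to its freenas:state value ("-" when unset).
# Assumes the zfs CLI is available, as on FreeNAS.
import subprocess

def snapshot_states():
    out = subprocess.check_output(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name,freenas:state"],
        text=True,
    )
    states = {}
    for line in out.splitlines():
        if line.strip():
            name, state = line.split("\t")
            states[name] = state
    return states

if __name__ == "__main__":
    for name, state in sorted(snapshot_states().items()):
        print(state, name)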

Thanks.
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
AFAIK those are the only ones; from autorepl.py:

Code:
# DESIGN NOTES
#
# A snapshot transitions its state in its lifetime this way:
#   NEW:                        A newly created snapshot by autosnap
#   LATEST:                     A snapshot marked to be the latest one
#   -:                          The replication system no longer cares about this.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Thanks, I'd meant to take a look at the source code, but things keep coming up.
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
I've been following your pruning script.

When I ran it, I found that it would not recursively delete my child snapshots that were created by a recursive periodic snapshot. Debugging the script, it seems that my child snapshots never transition in freenas:state from "NEW" to "-", even after they replicate. For instance, VRaidZwd snapshots will transition from "LATEST" to "-", while every VRaidZwd/dsAFP/TM2 snapshot stays stuck at "NEW". Have you noticed this behavior?
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm not actually using replication, so I can't easily debug this. But it sounds like it's probably a FreeNAS bug anyway. I might be able to help with some changes to the script, though.

It sounds like once a snapshot has been replicated, it should transition from 'NEW' to either '-' or 'LATEST'. Instead of keeping everything that isn't '-', the script could likely just explicitly keep 'LATEST' and the most recent 'NEW', and prune the rest according to the given rules. I originally intended to implement this behavior, but wasn't sure it was an appropriate metric.

If keeping 'LATEST', the most recent 'NEW', and anything else that wouldn't otherwise be pruned sounds good, I'll see what I can add to the script.
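
Roughly the selection logic I have in mind, as a sketch only (not the current script; the snapshot data below is made up, and the rolling-interval rules would still run on whatever isn't protected):

Code:
# Sketch of the proposed policy: always keep 'LATEST', keep only the most
# recent 'NEW' per dataset, and leave everything else to the normal rules.

def select_protected(snapshots):
    """snapshots: list of (name, state) tuples, ordered newest first."""
    protected = set()
    datasets_with_new_kept = set()
    for name, state in snapshots:
        dataset = name.split("@", 1)[0]
        if state == "LATEST":
            protected.add(name)
        elif state == "NEW" and dataset not in datasets_with_new_kept:
            protected.add(name)
            datasets_with_new_kept.add(dataset)
    return protected

# Example with made-up names: only the newest 'NEW' and the 'LATEST' survive.
snaps = [
    ("tank/data@auto-20130402", "NEW"),
    ("tank/data@auto-20130401", "NEW"),
    ("tank/data@auto-20130331", "LATEST"),
    ("tank/data@auto-20130330", "-"),
]
print(sorted(select_protected(snaps)))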
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
Thanks for your reply.
It indeed appears to be a FreeNAS bug. I submitted Ticket #2094 on this today.
This also prevents the snapshots from expiring via the built-in mechanism on the server taking the snapshots.
I will change the script tomorrow as you suggested and post it for your perusal.
Currently, your script is also the only way to deal with snapshots after replication on the remote server side, because there is no expiration mechanism on the remote.
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
Further investigation on freenas:state, periodic snapshot tasks, recursive option for snapshot tasks, and replication:

If you create a periodic snapshot task for a parent volume and set the recursive option, separate snapshots will be taken of the parent and of every child dataset.

If you set up a replication task for the parent volume with "Recursively replicate and remove stale snapshot on remote side" set, the parent volume snapshot and all the child snapshots will replicate to the remote server. However, apparently in ZFS you can't transfer snapshots from child datasets that overlap with a Recursive snapshot from a parent dataset. This seems to account for the differing sizes reported for the child snapshots between source and target in the replication process. It also accounts for the child snapshots persisting in freenas:state=NEW on the source and never being deleted there by expiration or progressing to state "-".

So I don't see the point of having the recursive option set on periodic snapshots if you intend to replicate, but this appears to be intended behavior and thus is not a 'bug'. The only way, then, is to snapshot and replicate only the root volume between servers.
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
I guess I was wrong. Replicating the root volume does not appear to replicate the child datasets. So I still think it is a bug in FreeNAS.

Posted http://forums.freenas.org/showthread.php?12111-ZFS-Periodic-Snapshots-Replication-and-freenas-state to see if anyone had any other ideas.

Modified your script so it will allow deletion of snapshots marked NEW as well as -. It's working for me on both the source server and the replication server. Thanks for writing this. Modification attached: rollup.zip
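
For anyone who doesn't want to open the attachment, the gist of the change is simply that 'NEW' is no longer treated as protected when deciding what may be deleted; roughly like this (illustrative names, not the exact code in rollup.zip):

Code:
# Illustrative only: the original logic protected anything not marked '-',
# which kept the stuck 'NEW' child snapshots forever; the modification
# protects only 'LATEST', so 'NEW' falls under the normal pruning rules.

def is_protected_original(state):
    return state != "-"        # keeps 'LATEST' and 'NEW'

def is_protected_modified(state):
    return state == "LATEST"   # 'NEW' is now prunable, like '-'

for state in ("-", "NEW", "LATEST"):
    print(state, is_protected_original(state), is_protected_modified(state))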
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
I have been working on scripts to change snapshot retention periods (forum link); however, my focus has been on replicated snaps (I want to keep limited snapshots on the primary arrays but weeklies etc. on the replicated side). One of the issues I encountered is that if I rename the LATEST snapshot on the replication array, it halts replication. I have a workaround for this, but wondered why the freenas:state attribute is not set on the replicated snapshots. Any ideas? Is this a reasonable enhancement request?

Jon
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
In reply to your question:
I'm definitely not an expert on this topic and just a beginner with FreeNAS; however, my frustration with having a couple of servers crash due to being overloaded with snapshots has resulted in a lot of searching on the forums and elsewhere.

My understanding is that snapshots on the Push/source server in Replication start out as being marked 'NEW', then after replicating they become 'LATEST', then after another snapshot of that dataset appears they become '-'. I believe the Replication system will not delete/expire something marked 'NEW' or 'LATEST' on the push/source side. It won't delete/expire 'NEW' since presumably 'NEW' still needs to be replicated.

My frustration with the PUSH side is that if you do a recursive snapshot replication of a parent volume, the child dataset snapshots that are created never progress past 'NEW' even though they get properly replicated.

My understanding of the PULL side receiving the snapshots during replication is that all the snapshots are marked '-' in terms of freenas:state on the receiving side. According to Ticket #388, there is no way to expire/auto-delete the periodic snapshots on the remote server. This ticket appears to be still open. I am not sure what the GUI option on the PUSH side, "remove stale snapshots on remote side", is supposed to do, since it doesn't appear to kill any snapshots for me on the receiving/PULL side.

What I have ended up doing is running a cron job on both sides, with the executable file residing on one of my ZFS volumes so it doesn't get wiped on a FreeNAS upgrade. The executable is the one discussed here, in Python. I have the PUSH side kill various older snapshots even if they are marked 'NEW', thus working around the bug of non-expiring child snapshots. The PULL side runs a second script to delete certain replicated children locally on the remote side, since there is no mechanism that I am aware of on the PULL side until Ticket #388 is dealt with.

If anyone has a different understanding or a fix that doesn't require this workaround, I would appreciate the education.
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
Greg,

Thanks for your reply.

GregP said:
I'm definitely not an expert on this topic and just a beginner with FreeNAS; however, my frustration with having a couple of servers crash due to being overloaded with snapshots has resulted in a lot of searching on the forums and elsewhere.

I thought ZFS supported loads of snapshots. My main issues have been seeing the wood from the trees and reducing storage requirements on the primary.

GregP said:
My understanding is that snapshots on the Push/source server in Replication start out as being marked 'NEW', then after replicating they become 'LATEST', then after another snapshot of that dataset appears they become '-'. I believe the Replication system will not delete/expire something marked 'NEW' or 'LATEST' on the push/source side. It won't delete/expire 'NEW' since presumably 'NEW' still needs to be replicated.

That's my understanding too.

GregP said:
My frustration with the PUSH side is that if you do a recursive snapshot replication of a parent volume, the child dataset snapshots that are created never progress past 'NEW' even though they get properly replicated.

Interesting; I don't have this issue (just checked, as below). Up until recently I was setting a daily recurring snapshot (5d retention) on the pool and then replicating:

SlowVol@auto-20130331.2315-5d -
SlowVol@auto-20130401.2315-5d -
SlowVol@auto-20130402.2315-5d -
SlowVol/Extra@auto-20130331.2315-5d -
SlowVol/Extra@auto-20130401.2315-5d -
SlowVol/Extra@auto-20130402.2315-5d -
SlowVol/Installs@auto-20130331.2315-5d -
SlowVol/Installs@auto-20130401.2315-5d -
SlowVol/Installs@auto-20130402.2315-5d -
SlowVol/Media@auto-20130331.2315-5d -
SlowVol/Media@auto-20130401.2315-5d -
SlowVol/Media@auto-20130402.2315-5d -
SlowVol/Scratch@auto-20130331.2315-5d -
SlowVol/Scratch@auto-20130401.2315-5d -
SlowVol/Scratch@auto-20130402.2315-5d -
SlowVol/SlowVM@auto-20130331.2315-5d -
SlowVol/SlowVM@auto-20130401.2315-5d -
SlowVol/SlowVM@auto-20130402.2315-5d -

GregP said:
My understanding of the PULL side receiving the snapshots during replication is that all the snapshots are marked '-' in terms of freenas:state on the receiving side. According to Ticket #388, there is no way to expire/auto-delete the periodic snapshots on the remote server. This ticket appears to be still open. I am not sure what the GUI option on the PUSH side, "remove stale snapshots on remote side", is supposed to do, since it doesn't appear to kill any snapshots for me on the receiving/PULL side.

GregP said:
What I have ended up doing is running a cron job on both sides, with the executable file residing on one of my ZFS volumes so it doesn't get wiped on a FreeNAS upgrade. The executable is the one discussed here, in Python. I have the PUSH side kill various older snapshots even if they are marked 'NEW', thus working around the bug of non-expiring child snapshots. The PULL side runs a second script to delete certain replicated children locally on the remote side, since there is no mechanism that I am aware of on the PULL side until Ticket #388 is dealt with.

If you set "remove stale snapshots on remote side" on the push server it keeps both servers in sync - note snapshots on replicated server but not the primary push are removed. As snaps expire on primary push they are removed on replicated server. As far as I can see Freenas:State is not set on the replication server - it is left at "-"

As per my other post (see above), my approach was to run a modified version of autosnap.py via cron on the replication server, which nicely deletes expired snaps (without "remove stale snapshots on remote side" being set on the primary PUSH server). I can't see (but my knowledge is limited) any major downside in permanently running my modified autosnap.py; perhaps it could be made a default? I preferred this approach to the totally custom Python script, as I expect autosnap.py to be fully maintained with FreeNAS upgrades and the logic determining which snaps to expire is robust. The only downside is that scripts must not remove the LATEST snap on the replication array (but this cannot be accurately determined, as freenas:state is not set there).
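
Since freenas:state stays at "-" on the PULL side, one way to approximate "don't touch the LATEST snap" is to always spare the newest snapshot of each dataset there; a rough sketch of that guard (not my actual autosnap.py change):

Code:
# Sketch: protect the most recently created snapshot of each dataset on the
# replication (PULL) server, as a stand-in for the missing 'LATEST' marker.
# Assumes the zfs CLI; -s creation sorts oldest to newest.
import subprocess

def newest_snapshot_per_dataset():
    out = subprocess.check_output(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation"],
        text=True,
    )
    newest = {}
    for name in out.splitlines():
        if name.strip():
            newest[name.split("@", 1)[0]] = name  # later lines overwrite, so newest wins
    return set(newest.values())

# A pruning pass on the PULL side would then skip anything in this set.
print(newest_snapshot_per_dataset())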

Happy to work with you to understand why my child snaps are being deleted and yours are not. As said before, wider critique on my script/logic is appreciated. I will try to add a comment on #388.



Thanks
 

ixdwhite

Guest
GregP said:
My frustration with the PUSH side is that if you do a recursive snapshot replication of a parent volume, the child dataset snapshots that are created never progress past 'NEW' even though they get properly replicated.

This is a limitation of zfs send/recv in incremental mode that the replication code knows about. The freenas:state values on child datasets are ignored because those snapshots are incorporated into the parent Recursive snapshot.

It is working as designed.

If you need freenas:state to always reflect the status of a snapshot, either don't use Recursive snapshots, or create only one Recursive snapshot and no child snapshots.
 

GregP

Dabbler
Joined
Mar 16, 2013
Messages
15
Thanks for weighing in on this topic. I detailed my confusion with an example in:
http://forums.freenas.org/showthread.php?12111-ZFS-Periodic-Snapshots-Replication-and-freenas-state
but haven't gotten a reply yet. Either my test systems have a glitch or I am missing something.
The goal is to have two servers' ZFS volumes (with several datasets on them) appear to have identical data after a replication and not eventually get choked with snapshots. Replicating just the parent volume snapshot without recursion didn't seem to do it for me.
A copy of the above reference:

Assume I have two identical FreeNAS servers called freenas1 and freenas2. I have a ZFS volume on freenas1 with three child datasets; let's say VRaidZwd has VRaidZwd/dsCIFS, VRaidZwd/dsAFP, and VRaidZwd/dsAFP/TM2. (These datasets are served onto the network using Samba and AFP.) I use the GUI to set up daily Periodic Snapshots of VRaidZwd on freenas1 with a 7 day expiration. I then set up a replication task of VRaidZwd to freenas2 and set the "Recursively replicate and remove stale snapshot on remote side" checkbox.

On freenas1, if I DON'T set the Recursive checkbox on the Periodic Snapshot task, the child datasets don't seem to replicate to freenas2 in a way that I can find or mount them.

If I DO set the Recursive checkbox on the Periodic Snapshot task on freenas1, I can find, mount, and serve read-only copies of the child datasets. But on freenas1 the child dataset snapshots never expire for deletion, because their freenas:state never changes from 'NEW'. (I thought it was supposed to change to 'LATEST' and then '-' after successful replication.) Also, on freenas2 the replicated snapshots never expire or get deleted.

I read Ticket #1455 and looked at fracai's Python script as a possible solution. I thought this was a bug in the replication process, but Ticket #2094 seems to suggest that snapshotting without recursion and replicating just the root volume VRaidZwd should somehow replicate the child datasets too. Beats me how to do it, though.

SERVER ONE ---- with Recursive Automatic Snapshot of VRaidZwd set -> NEW state doesn't change on child datasets after replication
[root@freenas] ~# zfs list -Ht snapshot -o name,freenas:state
VRaidZwd@auto-20130319.0900-7d -
VRaidZwd@auto-20130320.0900-7d LATEST
VRaidZwd/dsAFP@auto-20130319.0900-7d NEW
VRaidZwd/dsAFP@auto-20130320.0900-7d NEW
VRaidZwd/dsAFP/TM2@auto-20130319.0900-7d NEW
VRaidZwd/dsAFP/TM2@auto-20130320.0900-7d NEW
VRaidZwd/dsCIFS@auto-20130319.0900-7d NEW
VRaidZwd/dsCIFS@auto-20130320.0900-7d NEW

SERVER TWO ----- replication of child datasets happens successfully
[root@freenas2] ~# zfs list -Ht snapshot -o name,freenas:state
VRaidZWd3Tb@auto-20130319.0900-7d -
VRaidZWd3Tb@auto-20130320.0900-7d -
VRaidZWd3Tb/dsAFP@auto-20130319.0900-7d -
VRaidZWd3Tb/dsAFP@auto-20130320.0900-7d -
VRaidZWd3Tb/dsAFP/TM2@auto-20130319.0900-7d -
VRaidZWd3Tb/dsAFP/TM2@auto-20130320.0900-7d -
VRaidZWd3Tb/dsCIFS@auto-20130319.0900-7d -
VRaidZWd3Tb/dsCIFS@auto-20130320.0900-7d -

[root@freenas2 ~]# ls /mnt/VRaidZWd3Tb
.ssh dsCIFS dsAFP


If I just snapshot VRaidZwd without checking the Recursive checkbox on the Periodic Snapshot task, I can't find any child dataset data on server2 after replication.
Server ONE
VRaidZwd@auto-20130319.0900-7d -
VRaidZwd@auto-20130320.0900-7d LATEST
Server TWO
VRaidZWd3Tb@auto-20130319.0900-7d -
VRaidZWd3Tb@auto-20130320.0900-7d -

Server TWO
ls /mnt/VRaidZWd3Tb
.ssh
 

noprobs

Explorer
Joined
Aug 12, 2012
Messages
53
Greg,


What I understand is that you have to tick the recursive option on the parent snapshot, otherwise the child datasets are not snapshotted - makes sense
From the ticket, it appears ZFS does not support combining recursive parent snapshots with separate child snapshots
Replicated snapshots only get removed when they are removed on the primary system, and only if the "Recursively replicate and remove stale snapshot on remote side" flag is set

Up until recently I was using the parent pool snapshot with recursion and it was working fine. I subsequently changed to per-dataset snapshots and replication, as I wanted different retention periods for different datasets. I will try to create a config similar to yours next weekend and see if I hit your issue.
 