Snapshots

Status
Not open for further replies.

NASbox

Guru
Joined
May 8, 2012
Messages
644
Given the following snapshots:

zfs snapshot TANK/dataset@first
zfs snapshot TANK/dataset@second
zfs snapshot TANK/dataset@third

Am I correct in assuming that, if there were no changes to TANK/dataset, all three snapshots will show as "zero length"?

Even though there is no file data, there must be metadata of some sort that needs to be stored.
How much physical space will each of these snapshots consume for metadata, even though they contain no actual files?

To back up a web server (not running zfs), I was rsyncing to FreeNAS and then making a tar from the local copy after each synchronization.
I just learned a bit more about snapshots, and it occurred to me that this was a horrible waste of space, and that I should just be making snapshots.
Given that no more than 2% of the file system would change between synchronizations, a snapshot is going to be a lot smaller than a tar of the whole home directory of the web server, and will still preserve the status at different points in time. Correct?
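For reference, the per-snapshot space can be checked directly (using the TANK/dataset name from the commands above):

```shell
# List each snapshot with the space unique to it (USED) and the
# data it references (REFER). With no intervening writes, USED
# should be zero, or at most a few KB of metadata.
zfs list -t snapshot -o name,used,referenced -r TANK/dataset
```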

If I do:
zfs snapshot TANK/serverbackup@start_YYYYMMDDHHMMSS
rsync
zfs snapshot TANK/serverbackup@end_YYYYMMDDHHMMSS
zfs snapshot TANK/serverbackup@start_YYYYMMDDHHMMSS
rsync
zfs snapshot TANK/serverbackup@end_YYYYMMDDHHMMSS

where: YYYYMMDDHHMMSS represents an actual time stamp or other unique identifier
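As a sketch, the sequence above could be wrapped like this (the remote host and paths are placeholders, not real values; RUN=echo makes it a dry run):

```shell
# Sketch of the backup wrapper described above. RUN=echo prints the
# commands instead of executing them; set RUN= to run for real.
RUN=echo
STAMP="$(date +%Y%m%d%H%M%S)"   # the YYYYMMDDHHMMSS identifier
$RUN zfs snapshot "TANK/serverbackup@start_${STAMP}"
$RUN rsync -a --delete webhost:/home/ /mnt/TANK/serverbackup/
$RUN zfs snapshot "TANK/serverbackup@end_$(date +%Y%m%d%H%M%S)"
```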

After many backups, I am going to have a lot of duplicate snapshots since the end of one backup and the start of the next backup should be identical. If I have a history of 100 backups, that's going to mean I have 99 duplicate/empty and essentially useless snapshots on my pool.

Are all these redundant snapshots going to consume significant space? (Server backup is about 6GB and has several thousand files.)

Should I be pruning the redundant snapshots to save space?

If so, does it matter if I delete several "end" snapshots as a batch (will I be trapping space or creating some sort of fragmentation), or is it important to have the backup script find and delete an "end" snapshot before it creates its "start" snapshot?
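If pruning turns out to be worthwhile, here is a minimal sketch (dataset name as above; note that the USED string is "0B" on newer ZFS and "0" on older releases):

```shell
# Sketch: print a destroy command for every snapshot that consumes
# no space of its own. The "echo" is deliberate; remove it only
# after checking the list of commands it prints.
zfs list -H -t snapshot -o name,used -r TANK/serverbackup |
while IFS="$(printf '\t')" read -r snap used; do
    case "$used" in
        0|0B) echo zfs destroy "$snap" ;;
    esac
done
```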

Thanks in advance for any advice/suggestions.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,555
First, ZFS is CoW, which means data stays where it's written; fragmentation is not something to consider.

Second, snapshots don't consume any considerable space on their own. Sure, there is overhead, but not in any significant way. Instead they stop the deletion of blocks when those blocks are changed by a write action. Duplicate snapshots are not a thing. A snapshot simply records which blocks to retain. If you remove the first snapshot taken, the second snapshot will still retain the state of the pool as it looked at the time of that snapshot, but changes made after the first snapshot was taken will no longer be revertible. Blocks that aren't changed don't accumulate disk space.
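A minimal sketch of what that accounting looks like in practice (paths and names hypothetical):

```shell
# A snapshot's USED stays near zero until blocks it references are
# overwritten or deleted in the live dataset.
dd if=/dev/urandom of=/mnt/TANK/dataset/blob bs=1M count=10
zfs snapshot TANK/dataset@before
zfs list -t snapshot -o name,used TANK/dataset@before   # USED is ~0
rm /mnt/TANK/dataset/blob        # the snapshot now pins those blocks
zfs list -t snapshot -o name,used TANK/dataset@before   # USED is ~10M
```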

This is a short summary, given your questions I suggest you read up on ZFS and how the different parts work.

Read the FreeNAS manual cover to cover. Pay extra attention to http://doc.freenas.org/11/zfsprimer.html

When you’ve done that go upstream and read
https://www.freebsd.org/doc/handbook/zfs.html
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
Thanks for the reply and the references... there is some good stuff there which I am going to work my way through in detail over time. When I looked at http://doc.freenas.org/11/zfsprimer.html and https://www.freebsd.org/doc/handbook/zfs.html, though, these excellent references were silent on the question I was asking. I even started watching the excellent video "ZFS: The Last Word in File Systems"; really good/interesting, but it didn't talk about the sizes of any of the structures.

First, ZFS is CoW, which means data stays where it's written; fragmentation is not something to consider.
Pardon the use of the term fragmentation... not fragmentation in the traditional sense; what I really mean is consumption of space within the structures that hold the metadata for the pool/drives.

Second, snapshots don't consume any considerable space on their own. Sure, there is overhead, but not in any significant way.

My real question was "how much overhead?" I know there is some sort of uberblock and a tree structure that keeps track of everything, and something about blocks being marked with a "birth time" as a way of keeping track of everything, but what I don't know is the size of the snapshot metadata object. How big is it? Does it occupy a slot in a finite-size table? What is the impact on a pool of having several hundred "empty" snapshots (zero delta between the snapshot and the previous snapshot)?

Instead they stop the deletion of blocks when those blocks are changed by a write action. Duplicate snapshots are not a thing. A snapshot simply records which blocks to retain. If you remove the first snapshot taken, the second snapshot will still retain the state of the pool as it looked at the time of that snapshot, but changes made after the first snapshot was taken will no longer be revertible. Blocks that aren't changed don't accumulate disk space.
Thanks... got that concept. My question is how big is the structure that holds the metadata?

This is a short summary, given your questions I suggest you read up on ZFS and how the different parts work.

Read the FreeNAS manual cover to cover. Pay extra attention to http://doc.freenas.org/11/zfsprimer.html

When you’ve done that go upstream and read
https://www.freebsd.org/doc/handbook/zfs.html
Thanks again for these references; I'm going to dig into them more deeply as time goes by. I suspect the answer is something that could be easily answered by someone who does ZFS development, but would take a ton of time for a "user" to understand the requisite background material.

In my case, I'm looking at the practicality of keeping a ton of snapshots of my web host -- likely the average delta is about 1-2 MB/day; there may be the odd 10-20 MB day, but those are likely fairly rare. Can I keep a year or so worth of history at that rate? The actual data might add up to 2 or 3 GB, which is trivial on a modern system, but is there an impact on performance of some kind? If it's not going to screw anything up, it's likely easier to keep a ton of history than to do the housekeeping.
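For what it's worth, the aggregate cost is visible directly (dataset name from the earlier posts):

```shell
# usedbysnapshots is the total space pinned by all snapshots of the
# dataset; each snapshot's own USED column shows its individual share.
zfs list -o name,used,usedbysnapshots TANK/serverbackup
zfs list -t snapshot -o name,used,creation -r TANK/serverbackup
```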

Thanks again, and I'd appreciate if anyone could help me fill in this very specific gap.
 

mstasch

Cadet
Joined
Dec 15, 2017
Messages
5
A common pitfall with snapshots is not using the recursive parameter when generating them... I do not know how your data is organized, but if you want to snapshot a whole dataset including subdirectories, you should use
Code:
zfs snapshot -r TANK/dataset@first 
, otherwise you will get a snapshot of the first directory without any files from subdirectories.

I guess that in this case the snapshots will no longer report zero length once you have changed files in them.
 

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
A common pitfall with snapshots is not using the recursive parameter when generating them... I do not know how your data is organized, but if you want to snapshot a whole dataset including subdirectories, you should use
Code:
zfs snapshot -r TANK/dataset@first 
, otherwise you will get a snapshot of the first directory without any files from subdirectories.
I think you mean child datasets, not subdirectories. If you have a dataset nested inside a dataset and you want it snapshotted with the parent, then you use the recursive parameter. If you have a single dataset with subdirectories (no child datasets), it will all get snapshotted without the -r parameter.
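A quick sketch of the distinction (names hypothetical):

```shell
zfs create TANK/parent
mkdir /mnt/TANK/parent/subdir       # ordinary directory: part of TANK/parent
zfs create TANK/parent/child        # child dataset: a separate filesystem
zfs snapshot TANK/parent@snap       # includes subdir, misses child
zfs snapshot -r TANK/parent@snap2   # snapshots parent and child together
```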
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
Try this lecture by one of the ZFS co-authors
https://youtu.be/uJGkyMxdNFE
Thanks... I just started to watch this... looks very interesting.

A common pitfall with snapshots is not using the recursive parameter when generating them... I do not know how your data is organized, but if you want to snapshot a whole dataset including subdirectories, you should use
Code:
zfs snapshot -r TANK/dataset@first 
, otherwise you will get a snapshot of the first directory without any files from subdirectories.

I guess that in this case the snapshots will no longer report zero length once you have changed files in them.
Thanks for the heads-up... I'm aware of that, but in this application the dataset being snapped is a child dataset without any descendants of its own, so I don't have to worry about it. I've been playing with it a bit, and it's very cool. Sure beats the crap out of making a 4 GB tarball after doing an rsync to save the state.
 

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
If you are still rsyncing and any of the modified files are large, look into using the --inplace option on rsync. This limits the writes so your snapshots are smaller. I use this for database backups and it works well at keeping the size down (for replication purposes).
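As a sketch (host and paths are placeholders): with --inplace, rsync overwrites the changed blocks of an existing file instead of writing a whole new copy and renaming it, so a snapshot taken afterwards only pins the changed blocks. Note that an interrupted transfer can leave the destination file partially updated.

```shell
# Update a large database dump in place so CoW only copies the
# blocks that actually changed since the last run.
rsync -a --inplace webhost:/var/db/dump.sql /mnt/TANK/serverbackup/db/
```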

Also, for managing snapshots -- and lots of empty ones -- check out these scripts: https://forums.freenas.org/index.ph...napshots-similar-to-apples-timemachine.10304/

There is one in particular for pruning empty snapshots on his github page.
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
First, ZFS is CoW, which means data stays where it's written; fragmentation is not something to consider.
I was thinking about this statement, and if I understand correctly, a large file on a ZFS pool WILL BE fragmented if data is repeatedly appended, since the original data stays on the disk and only the changes are written. The directory structure accounts for it, but on a spinning disk the head is going to be going all over the place to pick up the pieces.

If you are still rsyncing and any of the modified files are large, look into using the --inplace option on rsync. This limits the writes so your snapshots are smaller. I use this for database backups and it works well at keeping the size down (for replication purposes).

Also, for managing snapshots -- and lots of empty ones -- check out these scripts: https://forums.freenas.org/index.ph...napshots-similar-to-apples-timemachine.10304/

There is one in particular for pruning empty snapshots on his github page.

Thanks for the tips... are you using the scripts for pruning snapshots? I noticed the last updates were about 10 months ago. That either means things are working well and very stable, or they are not being well maintained. Any idea which it is?
 

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
Thanks for the tips... are you using the scripts for pruning snapshots? I noticed the last updates were about 10 months ago. That either means things are working well and very stable, or they are not being well maintained. Any idea which it is?

Yes, I'm using the scripts and they are working fine for me on 11.0-u3.
 