Help verifying periodic snapshots

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That would be a bizarre metric, and "used" is not a fitting label for it. "Unique bytes" would be much better. "Used" implies, well ... used.


Your preference, then, would be ... what?


That seems crazy to me. It's not giving me information about what's contained in the snapshot.

Sure it is. It's especially telling you how much disk space can be reclaimed by removing the snapshot, which is probably the MOST important information about a snapshot...
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I was under the impression that ZFS snapshots were sector-based, but you're suggesting they're file-based. That changes things.

Snapshots are 'block' based. The block size defaults to 128k, or can be as large as 1M depending on the dataset; it's the "recordsize" property in ZFS. 'Sectors' implies the underlying hard disk sectors, which is not how ZFS works.

Take a 100 gig virtual disk file. Make a snapshot of it. Now change 1 byte in the file. The snapshot will show 'used' of 1 block, probably 128k (excluding metadata changes). The entire file has changed (different md5, etc.), but the snapshot tracks the underlying block changes. If you had instead taken two snapshots before changing the byte, both would show 'used' of 0, because neither contains any unique blocks: the old copy of the changed block is referenced by both snapshots, so it's not unique to either of them.
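If you want to see that for yourself, something like this should do it (dataset and file names are placeholders, and FreeBSD-style dd flags are assumed):

Code:
# create a test dataset and a 1 gig file of incompressible data
zfs create tank/blocktest
dd if=/dev/random of=/mnt/tank/blocktest/bigfile bs=1m count=1024

# snapshot it, then overwrite a single byte in place
zfs snapshot tank/blocktest@before
dd if=/dev/random of=/mnt/tank/blocktest/bigfile bs=1 count=1 conv=notrunc

# USED for @before should be roughly one record (~128k plus metadata), not 1 gig
zfs list -t snapshot -r tank/blocktest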
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
Your preference, then, would be ... what?

"Unique bytes" would be good.


Sure it is. It's especially telling you how much disk space can be reclaimed by removing the snapshot, which is probably the MOST important information about a snapshot...

It's a poor way of representing that. If I have two snapshots as given in the example above, both showing 0 bytes used (because they both reference the same data), it is implying that deleting both will free 0 bytes. That would be incorrect. In fact, how much will be freed by deleting >1 snapshot is a complete mystery hidden from the user. That's not very useful at all when dealing with hundreds or thousands of snapshots. Are admins supposed to delete one by one and then re-examine the list for the next juicy target for deletion and repeat that for thousands of iterations? That just seems like insanity.
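(That said, a dry-run destroy over a snapshot range should report the combined reclaim in one go, assuming the % range syntax and the -n flag are available in your ZFS version; the names below are placeholders:)

Code:
# -n = dry run, nothing is destroyed; -v = print what would be reclaimed
zfs destroy -nv tank/dataset@auto-20151201.0000-2w%auto-20151210.0000-2w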
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
Take a 100 gig virtual disk file. Make a snapshot of it. Now change 1 byte in the file. The snapshot will show 'used' of 1 block, probably 128k (excluding metadata changes). The entire file has changed (different md5, etc.), but the snapshot tracks the underlying block changes. If you had instead taken two snapshots before changing the byte, both would show 'used' of 0, because neither contains any unique blocks: the old copy of the changed block is referenced by both snapshots, so it's not unique to either of them.
In my case, even the first snapshot taken after hundreds of files have been changed still reports 0.
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
I assume nobody's going to change how ZFS works, so can someone help me figure out how to identify the number of bytes of difference between a snapshot and the current filesystem, without having to create a clone and check manually?

Thanks.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
It's a poor way of representing that. If I have two snapshots as given in the example above, both showing 0 bytes used (because they both reference the same data), it is implying that deleting both will free 0 bytes. That would be incorrect. In fact, how much will be freed by deleting >1 snapshot is a complete mystery hidden from the user. That's not very useful at all when dealing with hundreds or thousands of snapshots. Are admins supposed to delete one by one and then re-examine the list for the next juicy target for deletion and repeat that for thousands of iterations? That just seems like insanity.

It's not implying that deleting both would free 0 bytes. It's implying that deleting either would free 0 bytes. That's just how it works.

If it were the other way around, people would argue the opposite: "I have 100 snapshots and each one shows it's 'using' 10 gigs. I deleted one of them, but I didn't get the 10 gigs freed, and the other 99 snapshots still show 10 gigs used. Every '10 gig' snapshot I delete doesn't actually free disk space."

This is an even worse way to represent it.

In my case, even the first snapshot taken after hundreds of files have been changed still reports 0.

That simply means that there is no data unique to that snapshot. It may be referencing data that has been deleted though, if other snapshots also reference the same data.
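If what you're after is the combined number, that does exist at the dataset level. Something like this should show the breakdown (dataset name is a placeholder):

Code:
# USEDSNAP is the total space that would be freed by destroying ALL snapshots of the dataset
zfs list -o space tank/dataset

# or just that one property
zfs get usedbysnapshots tank/dataset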
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
It's not implying that deleting both would free 0 bytes. It's implying that deleting either would free 0 bytes. That's just how it works.

If it were the other way around, people would argue the opposite: "I have 100 snapshots and each one shows it's 'using' 10 gigs. I deleted one of them, but I didn't get the 10 gigs freed, and the other 99 snapshots still show 10 gigs used. Every '10 gig' snapshot I delete doesn't actually free disk space."

This is an even worse way to represent it.

I guess that depends on your administration tasks. If you're commonly deleting one snapshot for space, I can see the usefulness of the status quo. If you more often need to delete multiple snapshots, then it makes your admin life a pain. A better way for everyone would be to report both the size of the snapshot (the difference between the filesystem and the snapshot) and the number of bytes shared with other snapshots. That way everybody has the information they want.

There is precedent for the way you describe as the "worse" way. As far as I know, hard links are similar in concept. You can have two files that point to the same data on disk. Both of those files will appear to be the size of the actual data, and deleting one of them will not free any disk space at all.
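A minimal sketch of that hard-link behaviour, if anyone wants to see it for themselves (file names are arbitrary):

Code:
# two directory entries, one copy of the data
dd if=/dev/random of=original bs=1m count=100
ln original copy
ls -li original copy   # same inode, both report ~100 MB
df -h .                # note the free space
rm copy
df -h .                # nothing is freed until the LAST link is gone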


That simply means that there is no data unique to that snapshot. It may be referencing data that has been deleted though, if other snapshots also reference the same data.

I don't see how that could be the case for every single snapshot. That would mean every appended file happened to end exactly on a block boundary every time, and that every change to a modified file matched some other block somewhere else. Even with training as a statistician, I can't imagine the probabilities involved.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yes, but keep this in mind: if more than one snapshot references the deleted data, that deleted data won't show up in ANY of the snapshots' 'used' numbers.

Yep, it's a bit crappy... they could have done something like [used space for one snapshot] = [deleted data size] / [number of snapshots referencing it] :)

I was under the impression that ZFS snapshots were sector-based, but you're suggesting they're file-based. That changes things.

They are sector based. When I say "data" I mean data blocks (or "sectors", not drive sectors though) ;)
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Blocksize is dynamic, so if you have a 129k file, you'll have one 128k block followed by one 1k block (or a 4k block on ashift=12 pools). If you append another 129k, you'll add another 128k block and another 1k (or 4k) block. Either way, the original two blocks stay the same and don't affect snapshots.
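A rough way to check that (paths are placeholders; the exact USED number will vary a little with metadata and whatever old tail block gets copied):

Code:
dd if=/dev/random of=/mnt/tank/test/file bs=1k count=129    # ~129k file
zfs snapshot tank/test@a
dd if=/dev/random bs=1k count=129 >> /mnt/tank/test/file    # append ~129k

# the appended data goes into new blocks; at most the old tail block
# plus metadata ends up unique to @a, not the whole append
zfs list -t snapshot -r tank/test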
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I assume nobody's going to change how ZFS works, so can someone help me figure out how to identify the number of bytes of difference between a snapshot and the current filesystem, without having to create a clone and check manually?

Thanks.

I'm not sure I know what you want. You want to know the total deleted / changed bytes plus the total added bytes between a snapshot and the live filesystem? I know of no way to get that easily.

There's no need to create clones though, if you only want read access to the snapshots:

"Live" version of the file, post append:

Code:
root@nas test # ls -l
total 15728882
-rw-r--r--  1 root  wheel  16106127360 Dec 30 13:58 10gb


Snapshot version of the file before append:

Code:
root@nas test # ls -l .zfs/snapshot/1-10gig/
total 10485927
-rw-r--r--  1 root  wheel  10737418240 Dec 30 13:57 10gb



"zfs diff" will also show differences between two snapshots, but only represented based on files, ie, which files were deleted, added, or modified:

Code:
root@nas test # zfs diff nas2pool/test@1-10gig nas2pool/test@2-15gig
M  /mnt/nas2pool/test/10gb


This simply shows that the file was modified between snapshot @1-10gig and @2-15gig.

You could "zfs send" an incremental between two snapshots and count the bytes generated, but that will cover new / changed data only. Deleted / old data is represented as little more than a reference to the fact that it no longer exists, and will most likely take up very little space in the resulting zfs stream.
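For example, reusing the snapshots from above (the -n dry-run flag is assumed to be available in your version):

Code:
# dry-run incremental send; -v prints an estimated stream size, nothing is actually sent
zfs send -nv -i nas2pool/test@1-10gig nas2pool/test@2-15gig

# or count the real stream if -n isn't available
zfs send -i nas2pool/test@1-10gig nas2pool/test@2-15gig | wc -c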
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
I'm not sure I know what you want. You want to know the total deleted / changed bytes plus the total added bytes between a snapshot and the live filesystem? I know of no way to get that easily.

What I really want is a number representing the actual number of bytes used by a snapshot to represent the changes it contains. If I have a 10 MB file where 2 MB are modified, I want a way to know that each snapshot taken of that file since that change contains 2 MB of changes.

For those familiar with hard links in file systems like NTFS and ext3, I want snapshots to behave just like hard links.
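Newer ZFS versions apparently have a 'written' property that comes close to this: for each snapshot it reports the data written to the dataset between the previous snapshot and that one (counted in whole blocks, so a 2 MB change shows up as however many records it touched). A sketch, with placeholder names:

Code:
# per-snapshot delta from the previous snapshot
zfs list -t snapshot -o name,used,written -r tank/dataset

# delta between one particular snapshot and the live filesystem
zfs get written@auto-20151218.0000-2w tank/dataset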
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I don't see how that could be the case for every single snapshot. That would mean every appended file happened to end exactly on a block boundary every time, and that every change to a modified file matched some other block somewhere else. Even with training as a statistician, I can't imagine the probabilities involved.

I'm sorry. I've read this a few times and don't understand what you're getting at.

As mentioned, data is always added in whole blocks. Blocks can be anywhere from 512 bytes to 1 meg, though usually no smaller than 4k in modern pools. 'Changed' data is still data being added, but it will simply use the existing block size of whatever block is being modified. To use my previous example of a 129k file: if you modified the last 2k of that file, ZFS would add two blocks, a new copy of the first 128k and a new copy of the last 1k. If the original two blocks are not referenced by a snapshot, they will eventually be freed and the disk space reclaimed. If they're referenced by a snapshot, they'll be held on to. Assuming dedupe is not involved, the contents of the blocks don't matter: even if the contents are identical, additional disk space will be required.

You normally only see consistently high numbers for 'used' across large numbers of snapshots when the data is changing extremely fast. If it's changing slowly, i.e. over a period longer than 2x your snapshot interval, then it's likely that two or more snapshots will reference any data that gets deleted. Basically, 'used' counts 'short-lived' data: data that is added to the dataset, captured by a snapshot, and then deleted before the next snapshot. A lot of usage patterns don't do this, even if data is changed frequently. Take a 100 gig VM. Let's say it writes (changes) 10 gigs of logs a day, but logs are kept for 7 days. If you take daily snapshots, then all the deleted data will be referenced by more than one snapshot, and 'used' will be very low.
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
I'm sorry. I've read this a few times and don't understand what you're getting at.

You normally only see consistently high numbers for 'used' across large numbers of snapshots when the data is changing extremely fast. If it's changing slowly, i.e. over a period longer than 2x your snapshot interval, then it's likely that two or more snapshots will reference any data that gets deleted. Basically, 'used' counts 'short-lived' data: data that is added to the dataset, captured by a snapshot, and then deleted before the next snapshot. A lot of usage patterns don't do this, even if data is changed frequently. Take a 100 gig VM. Let's say it writes (changes) 10 gigs of logs a day, but logs are kept for 7 days. If you take daily snapshots, then all the deleted data will be referenced by more than one snapshot, and 'used' will be very low.

What I mean is that the first snapshot taken after the dataset has been updated - some files appended, some deleted, some modified internally (not just at the end or beginning of the file) - shows 0 bytes used. That can't be the case. There can't be any other snapshots referencing the new data. I understand that once there are two snapshots of the same data, ZFS in its own bizarre way counts each snapshot as using 0 bytes (never mind that if you have two snapshots using 0 bytes it should follow that the total use is 2 x 0), but I'm talking about when there's only one snapshot referencing the changes.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
What I mean is that the first snapshot taken after the dataset has been updated - some files appended, some deleted, some modified internally (not just at the end or beginning of the file) - shows 0 bytes used. That can't be the case. There can't be any other snapshots referencing the new data.

I'm still not understanding. When you say "there can't be any other snapshots referencing the new data", you're referring to a brand new snapshot? There won't be any deleted / changed data at that point. The previous snapshot will show 'used' for any deleted / changed data if the previous snapshot is the only one with that data.

I'd test it to satisfy your curiosity:

Create / copy a largish file to the dataset that you have regular snapshots on. After the next snapshot, but before the "next next" one, delete the file. 'used' for that snapshot should show the size of the deleted file.
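Something along these lines (paths and sizes are placeholders):

Code:
# drop ~1 gig of fresh data into the dataset
dd if=/dev/random of=/mnt/tank/dataset/testfile bs=1m count=1024

# ... wait for the next periodic snapshot to run, then delete the file
# before the snapshot after that one fires
rm /mnt/tank/dataset/testfile

# that snapshot is now the only thing referencing the deleted blocks,
# so its USED should jump to roughly 1 gig
zfs list -t snapshot -r tank/dataset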
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
I'm still not understanding. When you say "there can't be any other snapshots referencing the new data", you're referring to a brand new snapshot? There won't be any deleted / changed data at that point. The previous snapshot will show 'used' for any deleted / changed data if the previous snapshot is the only one with that data.

I'd test it to satisfy your curiosity:

Create / copy a largish file to the dataset that you have regular snapshots on. After the next snapshot, but before the "next next" one, delete the file. 'used' for that snapshot should show the size of the deleted file.

I think two different things are at work in my case. Snapshots run every day at midnight, but the time at which I rsync the PC is arbitrary, because it's not running all the time. There can be many snapshots created between runs of rsync on the PC. I wanted a way to find those "empty" snapshots (i.e., nothing changed on the dataset in the interval), or more specifically the snapshots that weren't "empty". The way ZFS reports the space used by snapshots makes that impossible.

What I think is happening is that the time between rsyncing is almost always longer than the snapshot interval, which means all snapshots will report 0 used except for the very last snapshot taken before an rsync operation. It too will turn to zero at midnight when the next snapshot runs.

I can see from what others are posting that this seems normal to them, but I am finding it difficult to express how foreign this concept of "used" space is to me ... even though I've been familiar with the basics of how filesystems work for 30 years. It contradicts every similar paradigm I've encountered and it makes finding basic information about a snapshot (like the actual space it uses) nearly impossible.
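(If the 'written' property is available in your version, a one-liner along these lines should at least flag the non-empty snapshots; the dataset name is a placeholder:)

Code:
# -H = no headers, -p = exact byte counts; keep only snapshots with a non-zero delta
zfs list -H -p -t snapshot -o name,written -r tank/dataset | awk '$2 != 0'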
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
The last (latest) snapshot BEFORE an rsync should always be 0 used. If a snapshot is created, and then no data is changed (before the rsync), then the snapshot shouldn't list anything as used. (Live data is the same as what the snapshot is referencing).

After the rsync, but before the next snapshot, there will be a divergence between the 'live' data and the 'snapshot' data. But only if the data was freshly added to the dataset will it show up in 'used'. I understand this is rarely the case for you.

When I first figured out how the 'used' thing works for snapshots, I did think to myself that it would be nice if it were reported differently, but after digging into ZFS a little further, I can understand why it is the way it is.

If a snapshot truly has no delta, i.e. there was no rsync in between, then its overhead is extremely small. You're snapping once a day for 28 weeks? If my math is right, that's around 200 snapshots, which is quite reasonable. I wouldn't worry about it.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Here are some snapshots from a typical lightly loaded business fileserver. Snapshots are taken every 30 minutes between 7am and 8pm, and kept for 2 weeks.

This is just one day, but it's pretty typical.

Code:
tank/dataset@auto-20151218.0700-2w  198K  -  559G  -
tank/dataset@auto-20151218.0730-2w  21.2M  -  559G  -
tank/dataset@auto-20151218.0800-2w  209K  -  559G  -
tank/dataset@auto-20151218.0830-2w  186K  -  559G  -
tank/dataset@auto-20151218.0900-2w  8.30M  -  559G  -
tank/dataset@auto-20151218.0930-2w  4.71M  -  559G  -
tank/dataset@auto-20151218.1000-2w  4.19M  -  559G  -
tank/dataset@auto-20151218.1030-2w  15.9M  -  559G  -
tank/dataset@auto-20151218.1100-2w  23.6M  -  559G  -
tank/dataset@auto-20151218.1130-2w  22.2M  -  559G  -
tank/dataset@auto-20151218.1200-2w  6.05M  -  559G  -
tank/dataset@auto-20151218.1230-2w  9.05M  -  559G  -
tank/dataset@auto-20151218.1300-2w  6.11M  -  559G  -
tank/dataset@auto-20151218.1330-2w  6.72M  -  559G  -
tank/dataset@auto-20151218.1400-2w  6.37M  -  559G  -
tank/dataset@auto-20151218.1430-2w  6.15M  -  559G  -
tank/dataset@auto-20151218.1500-2w  4.82M  -  559G  -
tank/dataset@auto-20151218.1530-2w  2.50M  -  559G  -
tank/dataset@auto-20151218.1600-2w  2.63M  -  559G  -
tank/dataset@auto-20151218.1630-2w  8.07M  -  559G  -
tank/dataset@auto-20151218.1700-2w  151K  -  559G  -
tank/dataset@auto-20151218.1730-2w  105K  -  559G  -
tank/dataset@auto-20151218.1800-2w  105K  -  559G  -
tank/dataset@auto-20151218.1830-2w  105K  -  559G  -
tank/dataset@auto-20151218.1900-2w  105K  -  559G  -
tank/dataset@auto-20151218.1930-2w  105K  -  559G  -
tank/dataset@auto-20151218.2000-2w  105K  -  559G  -


You can see constant change, even with only 30 minutes between snapshots. And this is only the data that was uniquely deleted within each 30-minute window.

For comparison, the dataset is 559 gigs, and usedbysnapshots is 2.06 gigs.
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
If a snapshot truly has no delta, i.e. there was no rsync in between, then its overhead is extremely small. You're snapping once a day for 28 weeks? If my math is right, that's around 200 snapshots, which is quite reasonable. I wouldn't worry about it.

I'm not worried about the space occupied by empty snapshots, but rather my inability to find the substantive snapshots. In all the cases where I've lost data to file corruption in the past, it has happened to many files at once and I wasn't certain of the date on which the loss actually occurred. In a case such as that I would need to go back through the snapshots to find the last one before all that data changed, in order to restore from the most recent set of good data. I see no way to do that with the current implementation. Perhaps I'm wrong, but it seems like the current implementation is only useful if you already know the exact moment when something first went wrong with the data.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I'm not worried about the space occupied by empty snapshots, but rather my inability to find the substantive snapshots. In all the cases where I've lost data to file corruption in the past, it has happened to many files at once and I wasn't certain of the date on which the loss actually occurred. In a case such as that I would need to go back through the snapshots to find the last one before all that data changed, in order to restore from the most recent set of good data. I see no way to do that with the current implementation. Perhaps I'm wrong, but it seems like the current implementation is only useful if you already know the exact moment when something first went wrong with the data.

Maybe I'm particularly stupid this morning, but if you lose data to file corruption, that implies that the underlying storage has gone bad. In such a case, if ZFS cannot repair the damage, then there's no point at which the data will still be intact, because the underlying storage was THE only copy of that data for all the snapshots that held it.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Maybe I'm particularly stupid this morning, but if you lose data to file corruption, that implies that the underlying storage has gone bad. In such a case, if ZFS cannot repair the damage, then there's no point at which the data will still be intact, because the underlying storage was THE only copy of that data for all the snapshots that held it.

I think he's talking about application-level corruption, i.e. Outlook corrupting a .pst or something.

Knowing how much data has changed isn't really going to help you determine which snapshot is 'good'. Corruption can be an extremely small change or a very large one. It's still going to be up to the admin to find the latest snapshot that's 'good'. I don't see how FreeNAS can help with that.

Poking through the .zfs/snapshot directory is actually a pretty quick way to check various versions of a file in question, and with snapshots exposed through Windows' Previous Versions tab it's even easier. We have to do that here with our Windows file server configured for automatic shadow copies. If a user complains that a file is corrupt or missing, I have to go poking through the shadow copies to find where something changed. It doesn't take too long, as you can 'divide and conquer': split the range in half and check, then in half again, etc.
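If it's one known file, a quick loop over the snapshot directory narrows it down fast (the path is a placeholder, and FreeBSD's md5 is assumed rather than Linux's md5sum):

Code:
cd /mnt/tank/dataset
for s in .zfs/snapshot/*; do
    printf '%s: ' "$s"
    md5 -q "$s/path/to/file" 2>/dev/null || echo "missing"
done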
 