Snapshot File-Size

Skywalker

Dabbler
Joined
Dec 30, 2022
Messages
16
Hi guys,
I hope this question can be answered quickly; I've been looking for an answer for a while now.
I store a 403 MB Word file on my TrueNAS and then create a snapshot. Afterwards I change the Word file by deleting pages, so that it is only about half the size (200 MB), and save it again.
I expected the snapshot to grow to about 200 MB, because those are the blocks I deleted. However, the snapshot size has grown to almost 403 MB. Where is the error in my thinking? I don't understand why the snapshot size not only contains the change but is as big as the original file. If I do the same test with a folder and delete a whole file there, the snapshot grows only by the size of the deleted file. But this is not what happens with a single test file (as described above).

Thanks a lot for your help!
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
If my (limited) understanding of snapshots is correct, it's because the snapshot still references the old (bigger) file. Basically it keeps the old data around so that it can bring back the full file if you want.
 
Joined
Oct 22, 2019
Messages
3,641
I expected the snapshot to grow to about 200 MB, because those are the blocks I deleted.
Think about it from a different perspective: snapshots don't really "grow", but rather "retain" or "hold onto". Once you create a snapshot, your dataset's total used capacity doesn't "keep growing" with subsequent deletes, but rather its used capacity will not shrink (until the relevant snapshots are destroyed).

As for how much data a snapshot will retain, it depends on how compressible the records are. I would assume that a Word document is highly compressible. So even if the file itself is 400 MiB or 200 MiB, it may in fact only use a fraction of the space, as the records which comprise this file have been efficiently compressed.


Where is the error in my thinking? I don't understand why the snapshot size not only contains the change but is as big as the original file.
It depends on the software being used. Does it modify the file "in place" or does it create a temporary copy of the file, which replaces the old file upon saving the new one?

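Modifications "in place" would yield what you are expecting, because the snapshot would only have to hold onto the old versions of the records that were actually rewritten. The software would be agnostic to how ZFS handles it at a lower level.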

However, if the software makes a copy when saving the new file, then the old file will still be represented by an entirely separate batch of records, in which there is no overlap with the records that make up the new file.

It is for this reason that those who use rsync must include the option --inplace, otherwise modified files will not leverage the efficiency of snapshots on a CoW filesystem, such as ZFS.
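To make the in-place versus save-as-copy difference concrete, here is a toy Python model of the accounting. It is only a sketch and not ZFS code: the function names are made up for illustration, the 128 KiB record size is assumed, and the in-place case assumes the application rewrites only the records whose contents changed.

```python
# Toy model of what a snapshot has to "hold onto" (not real ZFS code).
RECORD = 128 * 1024  # assumed record size in bytes


def records(data: bytes, size: int = RECORD) -> list[bytes]:
    """Split file contents into fixed-size records."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def retained_by_snapshot(old: bytes, new: bytes, in_place: bool) -> int:
    """Bytes that only the snapshot still references after the file changes."""
    old_recs = records(old)
    if not in_place:
        # Save-as-copy: the new file is written as all-new records and the
        # old file is unlinked, so every old record belongs to the snapshot.
        return sum(len(r) for r in old_recs)
    new_recs = records(new)
    kept = 0
    for i, rec in enumerate(old_recs):
        # In place: an old record is retained only if the live file no
        # longer holds identical data at that record offset.
        if i >= len(new_recs) or new_recs[i] != rec:
            kept += len(rec)
    return kept


# An 8-record "file" whose second half is cut off (hypothetical data).
old = b"".join(bytes([i]) * RECORD for i in range(8))
new = old[:4 * RECORD]

print(retained_by_snapshot(old, new, in_place=True) // 1024, "KiB")   # 512 KiB
print(retained_by_snapshot(old, new, in_place=False) // 1024, "KiB")  # 1024 KiB
```

In this model the in-place truncation leaves the snapshot holding only the removed half, while the save-as-copy case leaves it holding the whole original file, which matches what was observed with the Word document.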
 

Skywalker

Dabbler
Joined
Dec 30, 2022
Messages
16
Thank you Winnielinnie for your explanation! Got it :smile:
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For an Office document, expect major changes because the files are .zip archives in disguise. Of course, that depends on what you're removing from the document.
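A rough way to see why a compressed container changes so much: the Python sketch below compresses a made-up "document" with zlib (standing in for the deflate compression used inside .zip files), deletes the first tenth of the text, compresses again, and compares the two outputs in fixed-size blocks. The word list, the 4 KiB block size, and the helper names are assumptions for illustration only.

```python
import random
import zlib

BLOCK = 4096  # comparison granularity; an arbitrary choice for this sketch


def fake_document(n_words: int, seed: int = 0) -> bytes:
    """Build repeatable filler text from a small, made-up vocabulary."""
    rng = random.Random(seed)
    vocab = ["snapshot", "dataset", "record", "block", "pool",
             "file", "copy", "write", "page", "word"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode()


def changed_fraction(old: bytes, new: bytes, size: int = BLOCK) -> float:
    """Fraction of the old data's blocks that differ at the same offset in new."""
    n_blocks = (len(old) + size - 1) // size
    changed = sum(
        old[i * size:(i + 1) * size] != new[i * size:(i + 1) * size]
        for i in range(n_blocks)
    )
    return changed / n_blocks


original = fake_document(200_000)
edited = original[len(original) // 10:]  # "delete the first pages"

old_zip = zlib.compress(original)
new_zip = zlib.compress(edited)

fraction = changed_fraction(old_zip, new_zip)
print(f"{fraction:.0%} of the compressed blocks changed")
```

Even though 90% of the text is untouched, the compressed bytes no longer line up at the same offsets, so practically every block of the saved file is new data from the filesystem's point of view.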
 
Joined
Oct 22, 2019
Messages
3,641
Good catch, that's true.

And for some reason I intuited that Word documents are highly compressible, when in fact they are not, because they are already compressed.
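A quick sketch of the "already compressed" point, using Python's zlib as a stand-in for both the compression inside the document and the compression a dataset might apply on top. The filler text and names are invented for the example, and real ZFS compressors such as lz4 will give different numbers.

```python
import random
import zlib


def fake_text(n_words: int, seed: int = 0) -> bytes:
    """Build repeatable filler text from a small, made-up vocabulary."""
    rng = random.Random(seed)
    vocab = ["snapshot", "dataset", "record", "block", "pool", "file",
             "copy", "write", "page", "word", "office", "archive"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode()


raw = fake_text(100_000)
once = zlib.compress(raw)    # what the document format already does
twice = zlib.compress(once)  # what compressing it again would add

print("raw bytes:        ", len(raw))
print("compressed once:  ", len(once))
print("compressed twice: ", len(twice))
```

The first pass shrinks the text substantially, while the second pass barely changes the size, which is why a .docx file takes up roughly its full size on disk regardless of dataset compression.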
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Well, docx, xlsx, etc. are like that (Office 2007 and later). IIRC doc, xls, etc. are binary formats (Office 2003 and earlier).

Not that I'm advocating using the old formats. There are reasons why we were all glad to ditch them back in the day.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
And the "new" ones have now been around for longer than the "old" ones were before being replaced.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Yeah, as I wrote that I realized that some of my work experience with those things is probably older than some of the users on the forums. Then again, in corporate / government space there are probably still a lot of those files lying around.
 