Help with understanding snapshots

Casvt · Aug 15, 2021

Hey there,

I understand the basic idea of a snapshot: look at what has changed (files removed/edited/added) since the previous snapshot and save those changes. If you put 1 TB of data in that dataset, the snapshot will be 1 TB big. When you add one file after that, the next snapshot will be about 1 MB big. But I have a few questions. They're in the form of a story, with questions about the situation at each point.

Setup
At 11:59 I create a dataset with a storage capacity of 1 TB and set up periodic snapshots to be taken every hour. They are removed after three hours, and taking empty snapshots is allowed.
Story 1:
At 12:00 the first snapshot is taken. It is empty because there aren't any files in the dataset yet.
At 12:30 I put a single file in the dataset/folder (it doesn't matter how, but let's say via SMB).
At 13:00 the second snapshot is taken, which is about 1 MB big (because, as I understand it, the difference between 12:00 and 13:00 is that one file, so the file's size shows up as the snapshot's size).
At 13:30 I edit the file and add a few words to it.
At 14:00 the third snapshot is taken, which is about 0.01 MB big (the only difference between 13:00 and 14:00 is those few words, so that is the size of the snapshot).
At 14:30 I accidentally delete the file.

Question 1: If I want to recover the file, would I need to wait 30 minutes until 15:00 for a new snapshot to be taken, so that I can then recover the file by "using/uploading" that snapshot? (The difference between 14:00 and 15:00 is that file, so the file would be "stored/saved" in that snapshot, making it 1.01 MB big.)
Question 2: If I "upload" the 14:00 snapshot, what would happen? It only contains the changes to the file (those few words), not the file itself (1 MB). That means I would recover the changed contents, but not the file itself. So instead of waiting for 15:00 and doing what I described in Question 1, could I "upload" the 13:00 snapshot and AFTER that the 14:00 snapshot? That wouldn't work if you work on the file for longer than three hours, though, because by then the snapshot with the original file has already been deleted. You only have the changes made in the last three hours available to recover.
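For comparison, here is a toy Python model of the Story 1 timeline. It assumes the copy-on-write behavior ZFS snapshots are generally described as having: a snapshot freezes references to the dataset's existing blocks at that moment, rather than recording a diff against the previous snapshot. The file name `notes.txt`, the tiny block size and all helper names are hypothetical, and this is a sketch of the idea, not how ZFS is implemented.

```python
# Toy copy-on-write model (illustration only, not ZFS code).
# Assumption: file contents live in immutable blocks, and a snapshot is a
# frozen copy of the "file name -> blocks" mapping at that moment.

BLOCK_SIZE = 4  # hypothetical, tiny so the example stays readable

def to_blocks(text):
    """Split file contents into immutable fixed-size blocks."""
    return tuple(text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE))

def read(view, name):
    """Reassemble a file from the blocks a view (live or snapshot) points to."""
    return "".join(view[name])

live = {}       # live dataset: file name -> tuple of blocks
snapshots = {}  # snapshot name -> frozen copy of the live mapping

def take_snapshot(name):
    # Copies the mapping (references to existing blocks), not the file data.
    snapshots[name] = dict(live)

take_snapshot("12:00")                                           # empty dataset
live["notes.txt"] = to_blocks("original text")                   # 12:30 add
take_snapshot("13:00")
live["notes.txt"] = to_blocks("original text plus a few words")  # 13:30 edit
take_snapshot("14:00")
del live["notes.txt"]                                            # 14:30 delete

for name, view in snapshots.items():
    print(name, {f: read(view, f) for f in view})
# 12:00 {}
# 13:00 {'notes.txt': 'original text'}
# 14:00 {'notes.txt': 'original text plus a few words'}
```

Under that assumption, every snapshot reads back as a complete view of the dataset at that moment, including the deleted file in the 14:00 snapshot.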

Story 2:
At 12:00 the first snapshot is taken. It is empty because there aren't any files in the dataset yet.
At 12:30 I put 1 TB of data in the dataset/folder. The dataset/folder is now full.

Question 3: What would happen at 13:00? The snapshot would be 1 TB big (the difference between 12:00 and 13:00 is 1 TB of data), but the drive is already at 100%, so the snapshot wouldn't "fit" on the drive anymore. Does the snapshot fail? What would the snapshots of 14:00, 15:00 and 16:00 look like? If what I'm saying is correct, does that mean you can only fill a dataset with files up to 50% of the drive's capacity (assuming it starts empty) if you want to snapshot all the files? Because the snapshot would then also be 500 GB big (in this case), which would make the dataset 100% full (500 GB of files + 500 GB of snapshot = 1 TB = dataset capacity).
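A similar toy sketch of the space question, assuming ZFS-style accounting where a snapshot initially consumes no extra space and only starts to "own" blocks once the live dataset stops referencing them (for example after an overwrite or delete). Representing 1 TB as 1000 blocks of 1 GB is a hypothetical simplification, the `usage` helper is made up for illustration, and the three-hour retention is ignored for brevity.

```python
# Toy space accounting (illustration only; 1 block = 1 GB is hypothetical).

def usage(live, snaps):
    """Return (GB allocated in total, {snapshot: GB only that snapshot holds})."""
    allocated = set(live).union(*snaps.values())
    unique = {}
    for name, blocks in snaps.items():
        # Blocks still referenced by the live dataset or another snapshot are
        # shared, so they are not charged to this snapshot.
        others = set(live).union(*(b for n, b in snaps.items() if n != name))
        unique[name] = len(set(blocks) - others)
    return len(allocated), unique

live = set()
snaps = {"12:00": frozenset(live)}          # empty dataset
live = {f"blk{i}" for i in range(1000)}     # 12:30: write 1 TB
for hour in ("13:00", "14:00", "15:00", "16:00"):
    snaps[hour] = frozenset(live)           # nothing changes after 12:30

total, unique = usage(live, snaps)
print(total)   # 1000 -> still 1 TB allocated, not 2 TB
print(unique)  # every snapshot holds 0 GB of its own
```

In this model the data and all the snapshots share the same 1000 blocks, so nothing is counted twice.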

No story but another question:
You can use replication to copy snapshots to other drives (local) or even to other machines (remote). But I thought snapshots were "fixed" to the drive? The documentation explicitly says that you can't save snapshots on other drives, yet with replication you suddenly can? But only copy, not move? When you access a dataset (via SMB on Windows 10), you can right-click -> Properties -> Previous Versions to "recover/upload" snapshots. But how would you do this for snapshots that are on a different drive or remote system (because of replication)? Let's say you have periodic snapshots taken every hour and kept for two hours, but also sent to a remote system (replication) where they are kept for a week. How would you "recover/upload" a snapshot that is only available on the remote system (because it was taken two days ago, for example)?
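The replication part may be easier to picture with another toy model. It assumes the general `zfs send`/`zfs receive` pattern: a first full copy of a snapshot, then incremental streams carrying only what changed between two snapshots. The snapshot names `mon`/`tue`, the block IDs and both helpers are hypothetical, and the real stream format is far more involved.

```python
# Toy model of snapshot replication (illustration only, not the real stream format).

def incremental_stream(older, newer):
    """Changes between two snapshots: (blocks to add, blocks to free)."""
    return set(newer) - set(older), set(older) - set(newer)

def receive_incremental(base, stream):
    """Rebuild the newer snapshot on the receiver from its copy of the older one."""
    added, freed = stream
    return frozenset((set(base) - freed) | added)

local = {"mon": frozenset({"b1", "b2", "b3"})}
local["tue"] = frozenset({"b1", "b2", "b4"})   # b3 was overwritten by b4

remote = {"mon": local["mon"]}                 # first run: full copy of "mon"
stream = incremental_stream(local["mon"], local["tue"])
remote["tue"] = receive_incremental(remote["mon"], stream)

del local["mon"]                      # short local retention expires "mon"...
print(sorted(remote["mon"]))          # ['b1', 'b2', 'b3'] ...the remote copy is still complete
print(remote["tue"] == local["tue"])  # True
```

In the sketch, the receiving side ends up with its own complete copy of each snapshot, so expiring a snapshot on the source does not affect the remote copy.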

Second-last question:
Let's say my dataset is filled with 1 TB of data (100% full) and the last three snapshots are empty (because they were taken overnight, or because no one changed anything in the dataset for three hours). I accidentally remove all the files, so the dataset is now empty. I wait an hour (or rather until the next HH:00) and a new snapshot is taken, which is 1 TB big. Luckily all the files are saved in that snapshot, but the drive is still 100% full because of the 1 TB snapshot. So even though my files are technically saved, I can't recover them because the dataset is full of the snapshot. What could you do in this case? Is there some way to recover the files and delete the snapshot at the same time? So that 1 TB snapshot -> [recovering: 750 GB snapshot + 250 GB data -> 500 GB snapshot + 500 GB data -> 250 GB snapshot + 750 GB data -> 0 GB snapshot + 1 TB data] -> 1 TB data.
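The same toy accounting can be applied to this scenario, under the same copy-on-write assumption. The snapshot times, the 1 GB block granularity and the `allocated` helper are hypothetical; the "rollback" line only mimics the idea behind `zfs rollback` (the live dataset goes back to referencing the snapshot's blocks) and is not its implementation.

```python
# Toy model of "delete everything, then roll back" (illustration only).

def allocated(live, snaps):
    """GB in use: every block referenced by the live dataset or any snapshot."""
    return len(set(live).union(*snaps.values()))

live = {f"blk{i}" for i in range(1000)}   # 1 TB as 1000 x 1 GB blocks
snaps = {t: frozenset(live) for t in ("01:00", "02:00", "03:00")}  # quiet night

live = set()                     # accidental delete of every file
print(allocated(live, snaps))    # 1000: the snapshots still hold the blocks

snap_taken_after = frozenset(live)  # a snapshot of the now-empty dataset
print(len(snap_taken_after))     # 0

live = set(snaps["03:00"])       # "rollback": live points at those blocks again
print(allocated(live, snaps))    # 1000: no new space was needed
```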

Last question:
As far as I know, you can only take snapshots (automatic or manual) of datasets, not of folders or files inside datasets. But if a snapshot stores the changes made to the complete dataset, why can't you "recover/upload" a snapshot for just one file? The snapshot HAS the data that changed in that file, so why can't it grab the specific data it has for that file and use it to recover just that file rather than the complete dataset?
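A last toy sketch, for single-file recovery. It assumes a snapshot behaves as a read-only view of every file in the dataset; ZFS does expose each snapshot as a read-only directory tree under the dataset's `.zfs/snapshot/` directory, from which individual files can be copied, while `zfs rollback` reverts the whole dataset. The file names and the `restore_file` helper are hypothetical.

```python
# Toy model: restore one file from a snapshot, leave everything else as it is.

snapshot_1300 = {"a.txt": "old A", "b.txt": "old B"}  # frozen view (hypothetical files)
live = {"a.txt": "new A", "c.txt": "new C"}           # b.txt was deleted since 13:00

def restore_file(snapshot, live, name):
    """Copy one file's contents from the snapshot view into the live dataset."""
    live[name] = snapshot[name]

restore_file(snapshot_1300, live, "b.txt")
print(live)  # {'a.txt': 'new A', 'c.txt': 'new C', 'b.txt': 'old B'}
```

Only `b.txt` comes back; the newer `a.txt` and `c.txt` are untouched.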

I know this is a long post but I hope you can help answer my questions!
