Data safety strategy

Revir

Dabbler
Joined
Feb 25, 2020
Messages
16
Hi,
I have two TrueNAS boxes, Master and Backup, on the same LAN but in different buildings. Backup always has more disk space than is used on Master. Backup will pull backups from Master. Master only hosts iSCSI targets used by a Windows file server, SQL Server, etc.; there are no direct shares on TrueNAS.
What is the best backup strategy? I'm not sure about snapshots: when I create an initial snapshot on a new volume it shows as zero-sized, so I guess I have to do an initial full copy with rsync first. After that I hope I can just create snapshots on Master that Backup can pull. Am I right?
 
Joined
Oct 22, 2019
Messages
3,641
Revir said: I'm not sure about snapshots, it seems like when I've created an initial snapshot on a new volume it's zero sized so I guess I have to do an initial total copy with RSYNC first.
The snapshot, on its own, consumes almost nothing: since you haven't yet started deleting or modifying data on the filesystem, it overlaps perfectly with the exact same records as the live filesystem. (If you were to destroy this snapshot right now, for whatever reason, you would still have all your files.)

So even though it's taking up almost nothing, it's still referring to a lot of data. Sending the snapshot to the backup pool as a full (not incremental) replication will take some time, because all the data referenced by the snapshot has to be sent over. However, once you start doing incremental sends, only the differences since the previous snapshot will be sent over to "complete" the replication.
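
As a rough illustration of the full-then-incremental pattern, the Python sketch below only builds the command strings that Backup could run to pull a snapshot from Master. The host name `master`, the dataset names and the snapshot names are hypothetical, and piping `zfs send` through SSH is just one assumed transport. `zfs send`, `zfs send -i` and `zfs receive` are standard ZFS commands; a TrueNAS replication task would normally handle all of this for you.

```python
import shlex
from typing import Optional


def pull_replication_command(
    source_host: str,
    source_dataset: str,
    target_dataset: str,
    new_snapshot: str,
    previous_snapshot: Optional[str] = None,
) -> str:
    """Build the shell pipeline Backup would run to pull one snapshot.

    Without previous_snapshot this is a full send (everything the snapshot
    references). With it, only the differences between the two snapshots
    are sent.
    """
    send = ["zfs", "send"]
    if previous_snapshot is not None:
        send += ["-i", f"{source_dataset}@{previous_snapshot}"]
    send.append(f"{source_dataset}@{new_snapshot}")

    remote = shlex.join(send)
    receive = shlex.join(["zfs", "receive", target_dataset])
    return f"ssh {shlex.quote(source_host)} {shlex.quote(remote)} | {receive}"


# Hypothetical names, for illustration only.
full = pull_replication_command(
    "master", "tank/iscsi/sql", "backup/iscsi/sql", "snap1"
)
incremental = pull_replication_command(
    "master", "tank/iscsi/sql", "backup/iscsi/sql", "snap2",
    previous_snapshot="snap1",
)
print(full)
print(incremental)
```

The first command transfers everything `snap1` refers to; the second transfers only what changed between `snap1` and `snap2`, which is why the common snapshot has to still exist on both sides.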
 

Revir

I'm so thankful for your reply, that first snapshot is kind of magic! The key words are "data being referenced"; that's the way to look at it. So that zero-size snapshot isn't worthless at all: when I replicate it, the whole original volume will be copied. Great!
Then a second thought. These iSCSI targets are constantly being written to; will a snapshot actually be a true, safe copy of the state of the volume? What if I'm in the middle of an SQL transaction, will ZFS still be able to handle the situation?
 
Joined
Oct 22, 2019
Messages
3,641
ZFS is "copy-on-write" ("CoW"). Once a record is written (whether an entire file or a partial modification of a file), it's immutable, regardless of current or future activity. The record can either exist or be destroyed. Nothing else. This is unlike traditional filesystems, which typically overwrite data in place. (I like to say ZFS works at the "record-level", since "records" are a particular term with ZFS.)

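Taking a snapshot does not write new data. It simply captures the entire filesystem, as it is, at that exact moment in time. For a zvol behind iSCSI, that makes the snapshot crash-consistent: it looks the way the disk would after a sudden power loss, and it is up to the database's own recovery to make sense of any transaction that was in flight. The same records can be (and almost always are) referred to by multiple snapshots, so only the "differences" take up space. Snapshots that have overlapping records take up no extra space.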

You could take 1,000 snapshots, without ever having done anything in the filesystem, and they would all consume almost no space.

Or you could have a live filesystem with a bunch of data, take a snapshot of this state, then later delete 10 GB worth of data from the live filesystem. The snapshot will now "consume" about 10 GB, since it's still referring to the "deleted" data that no longer exists in the live filesystem or any other snapshots.

If you want to reclaim that 10 GB of space, you'd need to destroy every snapshot that still refers to those 10 GB worth of "deleted" data.
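
The accounting described above can be sketched with a toy Python model. This is not how ZFS is implemented; it only assumes that a dataset is a set of record IDs, that each record costs one unit of space, and that a record is freed once nothing references it. The class and method names are made up for illustration.

```python
from typing import Dict, FrozenSet, Set


class ToyDataset:
    """Toy model: a dataset is a set of record IDs, one unit of space each."""

    def __init__(self) -> None:
        self.live: Set[int] = set()
        self.snapshots: Dict[str, FrozenSet[int]] = {}

    def snapshot(self, name: str) -> None:
        # A snapshot writes no data; it only pins the current set of records.
        self.snapshots[name] = frozenset(self.live)

    def used_by_snapshot(self, name: str) -> int:
        # Space freed by destroying only this snapshot: records that it
        # references and that nothing else references.
        others = set(self.live)
        for other, records in self.snapshots.items():
            if other != name:
                others |= records
        return len(self.snapshots[name] - others)

    def total_space(self) -> int:
        # A record occupies space as long as anything still references it.
        referenced = set(self.live)
        for records in self.snapshots.values():
            referenced |= records
        return len(referenced)


ds = ToyDataset()
ds.live = set(range(100))        # 100 records of live data
ds.snapshot("a")
ds.snapshot("b")
print(ds.used_by_snapshot("a"))  # 0: fully overlaps the live filesystem

ds.live -= set(range(10))        # "delete" 10 records from the live filesystem
print(ds.used_by_snapshot("a"))  # 0: snapshot "b" still holds the same records
print(ds.total_space())          # 100: nothing has been reclaimed

del ds.snapshots["a"]
print(ds.used_by_snapshot("b"))  # 10: "b" is now the only holder
del ds.snapshots["b"]
print(ds.total_space())          # 90: the deleted records are finally freed
```

The two-snapshot case shows why space only comes back once the last snapshot referring to the deleted records is gone.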
 

Revir

I realized that at about the same time as I submitted the post: it's up to SQL Server to handle it. When I read about SLOG and synchronous writes, I hoped SQL Server and ZFS could somehow solve that issue together, but of course it's up to SQL Server to determine when to write data to disk. Maybe a SLOG for SQL too...
 
Joined
Oct 22, 2019
Messages
3,641
With databases, make sure the dataset or zvol uses a record size that matches what the database software uses (the recordsize property for datasets, volblocksize for zvols), if you want optimal performance. I'd have to defer to someone more versed in that area.
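
To give a feel for why the sizes should match, here is a back-of-the-envelope Python sketch. It assumes an 8 KiB database page (the page size SQL Server uses), aligned writes, and that copy-on-write rewrites every block a page touches in full. It ignores compression, metadata and caching, so treat the numbers as a rough upper-level intuition, not a benchmark. The function names are made up for illustration.

```python
import math

KIB = 1024


def bytes_rewritten(page_size: int, block_size: int) -> int:
    """Bytes rewritten when the database updates one aligned page.

    Copy-on-write works on whole blocks, so every block touched by the
    page is written out again in full.
    """
    blocks_touched = math.ceil(page_size / block_size)
    return blocks_touched * block_size


def write_amplification(page_size: int, block_size: int) -> float:
    """How many bytes hit the pool per byte the database changed."""
    return bytes_rewritten(page_size, block_size) / page_size


page = 8 * KIB  # assumed database page size
for block in (8 * KIB, 16 * KIB, 128 * KIB):
    factor = write_amplification(page, block)
    print(f"{block // KIB:>3} KiB block: {factor:.0f}x")
```

With these assumptions, an 8 KiB block gives 1x, 16 KiB gives 2x, and the 128 KiB default recordsize gives 16x for every single-page update. The same extra data also ends up in snapshots and incremental sends.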
 

Revir

So, third question: some talk about rsync and others about ZFS replication. What's the difference? And, as you stated earlier, the first rsync/replication run should be non-incremental, then the rest should be incremental, right?
 
Joined
Oct 22, 2019
Messages
3,641
Rsync works on files. ZFS works on records. Two completely different things.

If you want to leverage speed, snapshots, and efficiency[1], stick with ZFS-to-ZFS where applicable, such as one TrueNAS server to another.

Rsync comes in handy when backing up from a non-ZFS source (such as a client PC's folder) to the TrueNAS server.

[1] Take, for example, renaming and moving files. The next Rsync pass will think "files were deleted, and new files were created," when in reality all you did was rename and move things around. The only real difference in the filesystem is some tiny metadata, such as pointers, file locations, and filenames. So Rsync ends up wasting time deleting files on the destination (because they were moved or renamed) and re-transferring the same files because they are "new" according to Rsync. This is also bad for snapshot efficiency: the re-transferred files (even though they are the "same" files as before, just renamed or in a different folder) will be written as new records by ZFS, while the "deleted" files on the destination (which were never really deleted, just renamed or moved) will needlessly keep consuming space in previously created ZFS snapshots on the server.

However, ZFS, as opposed to Rsync, will only transfer the real changes: the tiny bit of metadata that records the new filenames and file locations. (You can even see this for yourself when you use "zfs diff" to compare two snapshots. There will be an "R" symbol to denote files that were renamed or moved.) Such metadata takes up so little space that it's not even worth mentioning.
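
The rename example can be illustrated with a small Python toy. It is a deliberate simplification: the first function mimics a sync tool that compares paths, the second one pretends that only data not already present on the destination is sent. Real ZFS does not compare content like this; it knows which records were written after the previous snapshot. File names and sizes are invented.

```python
from typing import Dict, Set

# path -> content (a stand-in for file data)
Tree = Dict[str, bytes]


def path_based_transfer(source: Tree, dest: Tree) -> int:
    """Bytes a naive path-comparing sync would send (new or changed paths)."""
    return sum(
        len(data) for path, data in source.items() if dest.get(path) != data
    )


def record_based_transfer(source: Tree, dest: Tree) -> int:
    """Bytes sent if only data not already on the destination is sent."""
    already_there: Set[bytes] = set(dest.values())
    return sum(len(data) for data in set(source.values()) - already_there)


before: Tree = {
    "photos/a.raw": b"x" * 1000,
    "photos/b.raw": b"y" * 2000,
}
# Same data, but moved to another folder.
after: Tree = {
    "archive/2020/a.raw": before["photos/a.raw"],
    "archive/2020/b.raw": before["photos/b.raw"],
}

print(path_based_transfer(after, before))    # 3000: everything looks "new"
print(record_based_transfer(after, before))  # 0: no data changed, only names
```

In the path-based case the destination also has to delete the two old paths, and if the destination is a ZFS dataset with snapshots, those old copies stay pinned in the snapshots while the re-sent copies are written as new records.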
 

Revir

Yeah, I noticed rsync was problematic on Windows shares. I want a "bit by bit" copy of one server to the other, so ZFS replication is right for me, I guess? Maybe it's even a must, because ZFS doesn't, and shouldn't, have any knowledge of what's inside the iSCSI targets; it has to work at the bit level or, as you better call it, the record level.
 
Joined
Oct 22, 2019
Messages
3,641
Revir said: Yeah, I noticed RSync was problematic on Windows shares. I want a "bit by bit" copy of one server to the other so then ZFS replication is right for me I guess?
Only the differences will be transferred, yes. In the case of a database, even if it's a single file, only the changed records of the database file will be transferred.

Just make sure to match the record size to the page size when using databases.
 