Occasional (2 months) pool backup options

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
Hi everybody,
I have a pool with mirroring, so 1 drive is fully replicated on the other. While this is not a backup, it helps with availability.

I also run backups every 2 months. I do this by manually connecting an HDD (a giant pain, because I have to deal with screws and move stuff around), plugging it in, then using "Replication" in the UI to perform what I would consider a backup (see the config in the attached files).
Or at least that's what I thought "Replication" would do.

- One time I performed a "sample test" and discovered missing files
- Now since I don't trust my backup mechanisms anymore, I used zfs-check and discovered missing files
- I also discovered _extraneous files_: files that shouldn't be there but are

What are my options for:
- Performing a backup the easiest way. The destination HDD should become the "same" as the source HDD (files that no longer exist on the source should be deleted from the destination, new/updated files should be copied over). If possible I'd like all snapshots to also be copied, so that the states are similar
- Avoiding a full rewrite of the HDDs every time
- Sometimes pruning snapshots before a certain date on both source and destination
- Testing that the backup worked. Is zfs-check a reasonable tool? Should I use rsync instead?

I'd appreciate some help, since not trusting my own backups is scary.
 

Attachments

  • truenas-backup-config.jpg (119.1 KB)
Joined
Oct 22, 2019
Messages
3,641
I do this by manually connecting an HDD (a giant pain, because I have to deal with screws and move stuff around), plugging it in, then using "Replication" in the UI to perform what I would consider a backup.
Is BMARS on an external USB drive? For these occasional, infrequent backups, it might be feasible to use a USB enclosure (with external power and UAS support), rather than having to physically install and remove an HDD from your chassis every time.


- One time I performed a "sample test" and discovered missing files
What sample test? From what type of backup? (Using the Replication Tasks in the GUI?)


- Now since I don't trust my backup mechanisms anymore, I used zfs-check and discovered missing files
- I also discovered _extraneous files_: files that shouldn't be there but are
This sounds like a third-party tool, not part of upstream zfs-utils.


- Performing a backup the easiest way. The destination HDD should become the "same" as the source HDD (files that no longer exist on the source should be deleted from the destination, new/updated files should be copied over). If possible I'd like all snapshots to also be copied, so that the states are similar
You would have to use the "Full Filesystem Replication" option (which invokes "-R" in the background).

You would have to include all snapshots, even intermediary ones, with parseable "datestamped" names. Otherwise, if there exist other snapshots besides "francesco-DATE", they will be skipped.


- Sometimes pruning snapshots before a certain date on both source and destination
Let the source side handle this, and use the same retention policy on the destination.

For TrueNAS, the retention policy is taken from a "Periodic Snapshot Task". There is no standalone pruning service that runs independently of this on its own schedule with a user-configured "expiration rule".

Or you can prune manually, keeping in mind not to destroy a base snapshot on the source (one shared with the destination); otherwise you won't be able to do an incremental backup.
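One way to prune safely is to first identify the newest snapshot common to both sides and leave it alone. A small sketch of that idea (the snapshot lists below are hypothetical; on a real system they would come from `zfs list -H -o name -t snapshot POOL/dataset`):

```shell
# Hypothetical snapshot lists, one name per line, oldest first.
# Real lists: zfs list -H -o name -t snapshot POOL/dataset
src_snaps="auto-202304
auto-202305
francesco-202306"
dst_snaps="auto-202303
auto-202304
francesco-202306"

# Names appearing in both lists are shared between the two sides;
# the newest shared one is the incremental base. Never destroy it
# on either side, or the next replication starts from scratch.
base=$(printf '%s\n' "$src_snaps" "$dst_snaps" | sort | uniq -d | tail -n 1)
echo "incremental base to keep: $base"
```

Note this only works if datestamped names sort chronologically, which is another reason parseable naming matters.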


- Testing that the backup worked. Is zfs-check a reasonable tool? Should I use rsync instead?
If two snapshots share the same GUID, then they are the same. If all snapshots that exist on the source also exist on the destination, and their GUIDs are the same on both sides, you essentially have everything on your backup.

Using a file-based tool to verify every time is inefficient and not helpful, especially for (deleted) files that only "exist" in a snapshot.
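To make the GUID check concrete, here's a minimal sketch. The GUID values below are made up; on a real system you would obtain each one with `zfs get -H -o value guid POOL/dataset@snapshot`:

```shell
# Hypothetical GUIDs standing in for real `zfs get guid` output, e.g.:
#   zfs get -H -o value guid sourcepool/data@backup-snap
src_guid="5367583431013952673"
dst_guid="5367583431013952673"

# Matching GUIDs mean the destination snapshot IS the source snapshot:
# same data, block for block, with no per-file comparison needed.
if [ "$src_guid" = "$dst_guid" ]; then
    echo "snapshot verified: identical on both sides"
else
    echo "snapshot mismatch: destination is NOT a replica of the source"
fi
```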
 

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
Is BMARS on an external USB drive? For these occasional, infrequent backups, it might be feasible to use a USB enclosure (with external power and UAS support), rather than having to physically install and remove an HDD from your chassis every time.



What sample test? From what type of backup? (Using the Replication Tasks in the GUI?)



This sounds like a third-party tool, not part of upstream zfs-utils.



You would have to use the "Full Filesystem Replication" option (which invokes "-R" in the background).

You would have to include all snapshots, even intermediary ones, with parseable "datestamped" names. Otherwise, if there exist other snapshots besides "francesco-DATE", they will be skipped.



Let the source side handle this, and use the same retention policy on the destination.

For TrueNAS, the retention policy is taken from a "Periodic Snapshot Task". There is no standalone pruning service that runs independently of this on its own schedule with a user-configured "expiration rule".

Or you can prune manually, keeping in mind not to destroy a base snapshot on the source (one shared with the destination); otherwise you won't be able to do an incremental backup.



If two snapshots share the same GUID, then they are the same. If all snapshots that exist on the source also exist on the destination, and their GUIDs are the same on both sides, you essentially have everything on your backup.

Using a file-based tool to verify every time is inefficient and not helpful, especially for (deleted) files that only "exist" in a snapshot.
It's not on an external drive. I asked about that a long time ago and was treated like a witch, so I plug the disk in as a normal internal drive (my server has 24 or 32 bays, I can't remember, but I definitely have plenty to spare).

For backups I use the Replication Task from the GUI. The "full filesystem replication" option seems to rewrite from scratch every time, though. Is this correct, or am I wrong?
The other issue is that there seems to be a bug where full filesystem replication doesn't copy intermediary snapshots; at least that's what I read on this forum. I'll do some more research; I'm not home right now.

The "sample test" was opening a couple of directories and verify that some files I consider important were there. I did not find one of them, that's when the alarm bells went off.

zfs-check is a third-party tool. I did not find any official utility to verify that the data was correctly copied, and I don't feel comfortable trusting the system after that sample test.

Finally, I did not include "auto" timestamps, I see that could have caused problems. My approach at snapshots is roughly:
- auto snapshots for peace of mind; these are all discardable
- make a manual snapshot before and after important events (those are named francesco-date)

My problem with that is that manual snapshots are never automatically pruned, and sometimes I want to prune auto snapshots early. What happens if I delete snapshots and then run the replication task? Will that delete the snapshots on the destination too?

Sorry for all the questions, I'm very grateful for your answer and perspective
 
Joined
Oct 22, 2019
Messages
3,641
It's not on an external drive. I asked about that a long time ago and was treated like a witch
Yeah, it can get a bit zealous in here. But "something" is always better than nothing. Jumping through hoops or giving up on even the simplest of backup plans (because you can't achieve a "perfect, holy, sanctioned" setup) is more harmful than just using a USB enclosure that at least gives you some sort of backup.

Something like this will suit the occasional ZFS backup. Externally powered and supports UAS:


For backups I use the Replication Task from the GUI. The "full filesystem replication" option seems to rewrite from scratch every time, though. Is this correct, or am I wrong?
It only means that it includes "everything" in terms of the source dataset's properties, recursive children datasets, and recursive snapshots. It also deletes extraneous snapshots on the destination that no longer exist on the source.

To "start all over from scratch" happens if there is no "base snapshot" shared by the source and destination, of which to do an incremental send.

This can occur if pruning is not properly handled or kept in harmony between the source and destination.



The other issue is that there seems to be a bug where full filesystem replication doesn't copy intermediary snapshots; at least that's what I read on this forum.
That's correct, and I'm the one who brought this to attention and followed up on a bug report of this issue. The reason you now see it labeled as "(Almost) Full Filesystem Replication" is because they made it clear that this "bug" is actually intentional and will not be fixed in TrueNAS Core. So I explained that, at minimum, the tooltip and label were misleading. At least they fixed that much.

However, this issue of intermediary snapshots is not exclusive to that option. The reason it's important to note is because you would assume that option will include intermediary snapshots. No option overcomes this. Skipping intermediary snapshots is hardcoded into iXsystems' zettarepl, which is the tool that runs in the background for Replication Tasks.

In order to "capture" all intermediary snapshots, you need to include all patterns in the included "naming schemas".



The "sample test" was opening a couple of directories and verify that some files I consider important were there. I did not find one of them, that's when the alarm bells went off.
After a ZFS replication? Using Windows Explorer via an SMB share? Using the command line on the server itself? How recently were those missing files created, relative to the latest snapshot?



zfs-check is a third-party tool. I did not find any official utility to verify that the data was correctly copied, and I don't feel comfortable trusting the system after that sample test.
That's not really how ZFS works. A snapshot is a snapshot. There is no "partially sent" snapshot that will share the same GUID as the source snapshot. It's either-or: either it was sent to the destination, in which case it's an exact replica of the source filesystem at the time of snapshot creation, or it failed to send.



Finally, I did not include "auto" timestamps, I see that could have caused problems. My approach at snapshots is roughly:
- auto snapshots for peace of mind; these are all discardable
- make a manual snapshot before and after important events (those are named francesco-date)
Refer to the bug report and my forum post. This is why I'm not a fan of the (lack of) flexibility of using Replication Tasks in the GUI.



What happens if I delete snapshots and then run the replication task? Will that delete the snapshots on the destination too?
If you delete snapshots on the source, then the next time you replicate to the destination, it will likewise remove the missing snapshots, so long as "Full Filesystem Replication" is enabled.
 
Last edited:

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
It only means that it includes "everything" in terms of the source dataset's properties, recursive children datasets, and recursive snapshots. It also deletes extraneous snapshots on the destination that no longer exist on the source.

To "start all over from scratch" happens if there is no "base snapshot" shared by the source and destination, of which to do an incremental send.

This can occur if pruning is not properly handled or kept in harmony between the source and destination.




That's correct, and I'm the one who brought this to attention and followed up on a bug report of this issue. The reason you now see it labeled as "(Almost) Full Filesystem Replication" is because they made it clear that this "bug" is actually intentional and will not be fixed in TrueNAS Core. So I explained that, at minimum, the tooltip and label were misleading. At least they fixed that much.

However, this issue of intermediary snapshots is not exclusive to that option. The reason it's important to note is because you would assume that option will include intermediary snapshots. No option overcomes this. Skipping intermediary snapshots is hardcoded into iXsystems' zettarepl, which is the tool that runs in the background for Replication Tasks.

In order to "capture" all intermediary snapshots, you need to include all patterns in the included "naming schemas".




After a ZFS replication? Using Windows Explorer via an SMB share? Using the command line on the server itself? How recently were those missing files created, relative to the latest snapshot?




That's not really how ZFS works. A snapshot is a snapshot. There is no "partially sent" snapshot that will share the same GUID as the source snapshot. It's either-or: either it was sent to the destination, in which case it's an exact replica of the source filesystem at the time of snapshot creation, or it failed to send.




Refer to the bug report and my forum post. This is why I'm not a fan of the (lack of) flexibility of using Replication Tasks in the GUI.




If you delete snapshots on the source, then the next time you replicate to the destination, it will likewise remove the missing snapshots, so long as "Full Filesystem Replication" is enabled.
This is very thorough, thank you.
So that I can experiment, I need to figure out a pattern that includes the "auto" snapshots, delete all the "named" snapshots that don't match a pattern, and then proceed.

The sample test was me as root in the system console. I wouldn't trust Samba for these things; the cache has caused me headaches over the years!

I'm OK with losing some snapshots as long as I don't lose the data. I wish I had more details about the sample test, but it was a year ago, which is when I started being concerned about my backups.

But with Google becoming unreliable (with bans) and pCloud having hidden bandwidth limitations that seriously hinder backups, it seems like my home system is the most reliable option so far. There is a far higher chance of Google randomly banning you than of my house catching fire and losing all 3 HDDs at the same time. I haven't figured out a way to have an offsite backup yet; that's harder.
 
Joined
Oct 22, 2019
Messages
3,641
I edited my reply to include something about “using a USB enclosure” at the top of my post.
 

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
I edited my reply to include something about “using a USB enclosure” at the top of my post.
That's a really cool idea. My concern is backup speed: I'm afraid I only have USB 2 available, so the difference would be heavy (write speed is 100 MB/s otherwise).
 

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
So, yesterday I decided to follow @winnielinnie's suggestions: I formatted my BJUPITER pool (my second pool, with family photos) and executed a "Replication Task" (GUI) with "replicate from scratch" and "almost full filesystem replication". I also included all the options from my "periodic snapshot tasks", as well as the format for my "francesco-" snapshots.

I executed zfs-check to spot any inconsistencies, and it found 1809.
A quick check on 2 of those inconsistencies shows that the files are indeed missing:

(as root):

Code:
zeus# ls -la /mnt/BJUPITER/batcave/Photos/IMG_20230620_082904.jpg
ls: /mnt/BJUPITER/batcave/Photos/IMG_20230620_082904.jpg: No such file or directory
zeus# ls -la /mnt/JUPITER/batcave/Photos/IMG_20230620_082904.jpg
-rwxrwxr-x+ 1 francesco  family  3493898 Jun 24 04:09 /mnt/JUPITER/batcave/Photos/IMG_20230620_082904.jpg


These are also the files I care the most (family photos), so I'm highly concerned by this.

I have no idea what's going on.
The other pool seems to be fine, but I'm obviously a bit skeptical.

I can see no AUTO snapshots have been copied, but the "francesco-" ones have been copied over.

I'll attempt to take a new snapshot and run the replication for JUPITER again.

EDIT:
I did not realize that the photo is very new, so it could be that I never snapshotted it, which would explain the issue given the missing "auto" snapshots (there was one "auto" snapshot taken yesterday).

EDIT 2:
This one is much older and was still missing:

Code:
zeus# ls /mnt/BJUPITER/bakpcloud/Sync/Family/Photos/FatteDaErica/2023-03-31/100_0095.JPG
ls: /mnt/BJUPITER/bakpcloud/Sync/Family/Photos/FatteDaErica/2023-03-31/100_0095.JPG: No such file or directory
zeus# ls /mnt/JUPITER/bakpcloud/Sync/Family/Photos/FatteDaErica/2023-03-31/100_0095.JPG
/mnt/JUPITER/bakpcloud/Sync/Family/Photos/FatteDaErica/2023-03-31/100_0095.JPG


After running a new replication task, a missing 950 MB was copied over, so I'm running another zfs-check to verify that the files are now there.
It might not be an official utility, but zfs-check has been very helpful!
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
ZFS replications deal with snapshots. There is no file-based backup or sync.

First off, what is the latest snapshot on each side, and what are their GUIDs? I'll assume you're not using snapshots to "hold data", but rather that your live filesystem is what you expect to find on the source pool/dataset at any given time.
Code:
zfs list -H -o name -t snapshot JUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid
zfs list -H -o name -t snapshot BJUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid

I'm assuming "bakpcloud" is a dataset with no children nested below it? (Only folders nested below?)

What is your pool's dataset hierarchy?
Code:
zfs list -H -t filesystem -r -o name JUPITER
zfs list -H -t filesystem -r -o name BJUPITER
 
Last edited:

Fire-Dragon-DoL

Contributor
Joined
Dec 22, 2018
Messages
103
ZFS replications deal with snapshots. There is no file-based backup or sync.

First off, what is the latest snapshot on each side, and what are their GUIDs? I'll assume you're not using snapshots to "hold data", but rather that your live filesystem is what you expect to find on the source pool/dataset at any given time.
Code:
zfs list -H -o name -t snapshot JUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid
zfs list -H -o name -t snapshot BJUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid

I'm assuming "bakpcloud" is a dataset with no children nested below it? (Only folders nested below?)

What is your pool's dataset hierarchy?
Code:
zfs list -H -t filesystem -r -o name JUPITER
zfs list -H -t filesystem -r -o name BJUPITER
I started on the analysis, then realized that bakpcloud (and bakgdrive) are both encrypted. It turns out that when I formatted the pool and then replicated, I did not run "unlock" on BJUPITER (the backup drive). Unlocking it shows the files again, so zfs-check was probably detecting this.

I'll follow your analysis but this might have solved the issue. I'm re-running zfs-check to confirm.

Code:
# zfs list -H -o name -t snapshot JUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid
zfs list -H -o name -t snapshot BJUPITER/bakpcloud | grep francesco | tail -n 1 | xargs zfs get -H guid
JUPITER/bakpcloud@francesco-202306241607        guid    5367583431013952673     -
BJUPITER/bakpcloud@francesco-202306241607       guid    5367583431013952673     -


The dataset structure is the following:

Code:
JUPITER
JUPITER/bakgdrive
JUPITER/bakpcloud
JUPITER/batcave
JUPITER/batcave/Photos
JUPITER/mymedia3
JUPITER/mymedia3/francesco
JUPITER/mymedia3/francesco_XXXXXXXX
JUPITER/mymedia3/frasmb
BJUPITER
BJUPITER/bakgdrive
BJUPITER/bakpcloud
BJUPITER/batcave
BJUPITER/batcave/Photos
BJUPITER/mymedia3
BJUPITER/mymedia3/francesco
BJUPITER/mymedia3/francesco_XXXXXXXX
BJUPITER/mymedia3/frasmb


I replaced some text with XXXXXXXX because it was essentially revealing my email address :eek:
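For anyone who lands here with the same symptom: a replicated encrypted dataset shows an empty mountpoint until its key is loaded, which looks exactly like "missing files". A quick way to spot this on the backup pool (pool name from this thread; these commands need to run on the NAS itself):

```shell
# "available" means the key is loaded; "unavailable" means locked,
# and the dataset's contents will not be visible under its mountpoint.
zfs get -r -t filesystem -o name,value keystatus BJUPITER

# If locked, load the key(s) and mount (prompts for the passphrase/key):
# zfs load-key -r BJUPITER && zfs mount -a
```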
 
Joined
Oct 22, 2019
Messages
3,641
As a sanity check (and you can filter out names like you've already done):
Code:
zfs list -r -t filesystem -o name,encryption,keyformat,encroot JUPITER
zfs list -r -t filesystem -o name,encryption,keyformat,encroot BJUPITER
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
For several years, I have run a weekly data backup to an external USB drive. I'm backing up about 4TB of data.

My server, being older, does not have USB 3 so I added a PCI-USB 3 card. I did my homework and chose an add-in card based upon a chipset that supports Linux and FreeBSD - you have to choose carefully. My external drive is a WD Elements drive. I originally started with a 2TB WD Elements drive, then moved to an 8TB Elements drive when I outgrew 2TB.

To actually copy the files, I use RSYNC:
Code:
rsync -avh --stats --delete /mnt/data1/ /mnt/usb-backup3/

This will copy all the files from "data1" to "usb-backup3". The --delete parameter deletes files on the destination that no longer exist on the source. The --stats parameter prints a summary of the transfer; it provides less info than --progress but is more compact and easier to read.
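Because --delete is destructive, it can be worth previewing what a run would do first. A self-contained sketch using throwaway temp directories (not the real pool paths):

```shell
# Create two throwaway directories to demonstrate a dry run.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/keep.txt" "$dst/stale.txt"

# -n (--dry-run) only reports what WOULD happen; nothing is modified,
# so "deleting stale.txt" appears in the output but the file survives.
rsync -avhn --delete "$src/" "$dst/"

# stale.txt is still on the destination because -n made no changes.
ls "$dst"
```

Once the dry-run output looks right, drop the -n and run it for real.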

When I originally prepared the external disk, I formatted the disk with "0" swap space - that way I can mount or unmount the volume at will without affecting the rest of the running system. I don't know if this is still an issue with newer versions of TrueNAS, but it was necessary several years ago.

@Arwen has some old posts about backing up via USB.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Arwen has a resource about how she performs externally attached disk backups.
She did not intend it to be one-size-fits-all, just a starting point or ideas for others.

Gee, talking about myself in the third person is kinda creepy. Have to wait to late October to do that again :smile:.
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Ha-ha...

Actually, I found @Arwen's resources to be very helpful. And yes, they were a starting point - but that is how one learns. :wink:
 