btrfs dup equivalent backup on truenas?

philipt

Cadet
Joined
Jul 30, 2022
Messages
2
Hi guys, first time poster. I've been using TrueNAS CORE for a couple of weeks now and I think it's time to set up a working backup regime.

What I want to do is rotate two USB devices to an off-site location that I visit a couple of times a year. I have two 4GB devices that I bought when I migrated from OMV to TrueNAS, and I think they'll do this job perfectly. A friend of mine recommended using btrfs with duplication, but if I understand correctly I would then need to do the backup over the network from a workstation that's running Linux. My friend does not know that much about ZFS (neither do I), so is it possible that this feature is available in ZFS as well?

If I go with plugging the drive into my TrueNAS machine, it would work for me to have a drive plugged in with a weekly regime of automatic backups to it. When I go to the off-site location I'll just bring the plugged-in drive with me, and when I return I'll plug in the drive that I bring back.

Does ZFS have a function that's similar to btrfs dup? I tried googling a little bit, but since I'm a novice the guides and questions I found didn't really seem to address what I want to do. The backup I want to make is of a single 300GB dataset that's unencrypted.

Thank you in advance!
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I just googled "btrfs dup" and read a few lines. Is this helpful?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
That's a good one :grin:

Seriously, ZFS pioneered everything that is in BTRFS, and the latter is still playing catch-up. But of course ZFS cannot be the default filesystem in Linux, because "not invented here".

The link posted above is correct. Snapshots and replication are your friends. I wonder how you will replicate 300 G of data to a 4 G USB drive, though ...
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@philipt - I assume you meant 4TB drives.

Yes, removable USB drives can be used for backups. However, the USB connector and protocol are not as reliable for long-term usage. So, you might consider disconnecting the "4TB" USB-attached drive(s) between backups, just to prevent software disconnect issues.
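In TrueNAS you would do the disconnect/reconnect through the GUI, but as a rough sketch of what happens at the shell (assuming the backup pool is named "backup", which is just a placeholder here):

# Export the pool cleanly before unplugging the USB drive
zpool export backup

# After reconnecting the drive, import the pool again
zpool import backup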

Next, ZFS has a better version of "BTRFS dup". Assuming your friend meant this:
Wikipedia - BTRFS - Chunk & device tree - Dup

Basically, ZFS was designed with redundant metadata by default; metadata meaning the directory entries and similarly important blocks. This is to ensure that a failed directory-entry block does not cause loss of data. If ZFS detects that one of the "copies" of a directory entry is bad, it will read from the other one, repair the bad one, and supply the good blocks to the requestor.

Further, critical metadata has 3 "copies", which is meant to prevent loss of the entire ZFS pool (unless you have really bad disks).

If your vDev is a single disk, by default you get 1 copy of data, 2 copies of metadata and 3 copies of critical metadata.


However, if you have the space and truly want a "BTRFS dup"-like feature, ZFS supports that through the dataset property "copies=". The default number of copies is 1 (1 copy of data, 2 copies of metadata, 3 copies of critical metadata). You can change that to "copies=2", which gives 2 copies of data, 3 copies of metadata, and 3 copies of critical metadata. If you are not using encryption, you can also use "copies=3", which is the maximum and sets all 3 categories to 3 copies.
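For reference, here is roughly what that looks like from the shell; "tank/backupdata" is a placeholder dataset name, and on TrueNAS you would normally set this in the dataset options in the GUI instead:

# Create a dataset that keeps 2 copies of every data block
zfs create -o copies=2 tank/backupdata

# Or change it later; only data written after the change gets the extra copies
zfs set copies=3 tank/backupdata

# Verify the current setting
zfs get copies tank/backupdata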

Of course, if you change to "copies=2" you will need more disk space for your data, twice as much. Some of that can be offset with ZFS compression, compared to storing the data uncompressed.

One word of warning: previously written data in ZFS does not take advantage of changed dataset properties. Thus, if you want "copies=2" on everything, you need to start with an empty ZFS pool. Set the top-level dataset to "copies=2", then make your backup. Plus, check your backup method and destination after the backup to make sure it does what you think it should.
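One way to guarantee that, sketched from the shell with placeholder names ("backup" for the pool, "da0" for the USB disk): the -O flag sets the property on the pool's root dataset at creation time, so everything written afterwards inherits it.

# Create the backup pool with copies=2 from the very start
zpool create -O copies=2 backup da0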

Last, as you bring your backup disks back from off-site, make sure to run a ZFS scrub on them. That will check for any bad blocks, and inform you of any found and what they affect (file(s) or metadata). If the problem was just 1 copy of the metadata, then ZFS will automatically repair it, and let you know.
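Roughly, again assuming the pool is named "backup" (TrueNAS can also schedule scrubs per pool in the GUI):

# Start a scrub, then check its progress and any errors found
zpool scrub backup
zpool status -v backup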
 

philipt

Cadet
Joined
Jul 30, 2022
Messages
2

Wow, thank you for that extensive write-up. I am so grateful for the help, and it set me on my path. I did some experimentation with copies, disconnecting and reimporting the drive, and I managed to set up the encryption as well. I'm not sure I would have figured all that out by reading the documentation.

So what I did was set up a striped single-disk (USB drive) pool with 2 copies and encryption, and then I made a dataset with 3 copies (because space is not an issue). The plan is now to set up some kind of task that backs up everything from the dataset that needs the backup, then disconnect the drive and reimport it every so often to back up again. I'll keep doing that with the second backup drive until I go off-site and switch the location of the drives.

I have one (at least) more question, and that's about the backup of the data. I'm looking at the tasks because I was sure that rsync was going to be the way to go, but looking at the list of tasks there's also something called Replication Tasks. Rsync Tasks seem to only cover local machine to remote machine, not different datasets within the same machine. Is Replication Tasks the way to go, or am I looking at something different than those two?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Is Replication Tasks the way to go, or am I looking at something different than those two?
Replication Tasks:
Sync Task:

Those should be the relevant pages of the official documentation.
My take is that "Replication" is ZFS replication, while a sync task is just simple file replication (no snapshots), but don't take my word for it, as I am not clear myself on the differences.
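If that take is right, a local ZFS replication boils down to something like the following at the shell (dataset names are placeholders; a TrueNAS Replication Task automates this along with the periodic snapshots):

# Snapshot the source dataset, then send the snapshot into the backup pool
zfs snapshot tank/data@backup-1
zfs send tank/data@backup-1 | zfs receive backup/data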
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@philipt - Some cautionary notes about ZFS encryption. By default, all child datasets of an encrypted dataset are encrypted too. So, if you want both an encrypted ZFS dataset (maximum 2 copies, because the ZFS encryption checksum takes up the 3rd copy's slot) and un-encrypted datasets, simply make them like so:

my_pool/crypt_ds
my_pool/non_crypt_ds

Then both trees of datasets can have children that either inherit the encryption or stay un-encrypted.
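As a rough sketch from the shell (on TrueNAS you would do this through the dataset creation dialog; "my_pool" is just the example name from above):

# Encrypted branch; children inherit the encryption unless overridden
zfs create -o encryption=on -o keyformat=passphrase my_pool/crypt_ds

# Un-encrypted branch alongside it
zfs create my_pool/non_crypt_ds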

Next, be very careful in storing any ZFS encryption credentials. Those can be key files or passphrases, both of which could be printed out and locked in a secure location. We have seen a few people use TrueNAS encryption, lose their credentials, and desperately want to "hack in". Except we can't help them do so. Thus, they performed self-inflicted ransomware on their own data. I would like to help people avoid that fate.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Is Replication Tasks the way to go, or am I looking at something different than those two?
ZFS replication works locally as well as remotely. If both ends have ZFS, replication is vastly superior to rsync, because rsync needs to traverse the entire file system and compare all files to detect the changes, while ZFS has this information immediately: all changes between two snapshots are already registered in metadata.
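That is what makes the incremental runs cheap. Continuing the sketch from earlier in the thread (placeholder names again), only the blocks that changed between the two snapshots are transferred:

# First run sends everything once
zfs snapshot tank/data@snap1
zfs send tank/data@snap1 | zfs receive backup/data

# Later runs send only the delta between snap1 and snap2
zfs snapshot tank/data@snap2
zfs send -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data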
 