This is impossible, isn't it? (Client --> TrueNAS --> Backups)

Joined
Oct 22, 2019
Messages
3,641
I have finally accepted defeat. I naively believed that ZFS could serve as a replacement for third-party sync / backup tools by leveraging snapshots and replication. However, after trying every possible combination, it appears that this isn't the case, and other users (elsewhere) have come to accept the same thing.

---

Currently, I have a simple setup: clients (non-ZFS) rsync to TrueNAS (ZFS), and TrueNAS replicates to backups (ZFS), roughly as sketched below.
  1. Client (ext4, XFS, Btrfs) regularly rsyncs to a dataset on TrueNAS
  2. TrueNAS (ZFS) takes periodic snapshots of this dataset
  3. TrueNAS (ZFS) replicates this dataset's snapshots to another backup pool (ZFS)
  4. Client continues to rsync regularly over the network, life is good
---
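In rough command terms (pool, dataset, and host names are made up for illustration), that flow is something like:

  # 1. On the client: push data into its dataset on TrueNAS
  rsync -a --delete /home/me/data/ truenas:/mnt/tank/laptop/

  # 2. On TrueNAS: take a periodic snapshot of that dataset
  zfs snapshot tank/laptop@auto-202108190000

  # 3. On TrueNAS: replicate the snapshot incrementally to the backup pool
  zfs send -i tank/laptop@auto-202108181200 tank/laptop@auto-202108190000 | zfs recv backup/laptop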

The problem: Unlike ZFS, rsync (and practically every backup / sync tool out there) is not record-based; it's file-based. This means that moving files and folders around, renaming large files, renaming folders, re-organizing directories, etc., will cause the next rsync to inefficiently delete and transfer everything all over again. However, if it were pure ZFS, only metadata would have changed since the previous snapshot, and thus a replication would take mere seconds to complete.

So with a new laptop in hand, I had the idea to make my data drive ZFS, with the following setup in mind (sketched below):
  1. Client (ZFS) takes periodic snapshots and replicates to TrueNAS (ZFS)
  2. TrueNAS (ZFS) replicates this dataset's snapshots to another backup pool (ZFS)
  3. Client continues to replicate regularly over the network, life is good?
---
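As a rough sketch (again with made-up names):

  # On the client (ZFS): snapshot, then replicate the increment to TrueNAS
  zfs snapshot datapool/home@auto-202108190000
  zfs send -i datapool/home@auto-202108181200 datapool/home@auto-202108190000 | ssh truenas zfs recv tank/laptop

  # On TrueNAS: pass the received snapshots on to the backup pool
  zfs send -i tank/laptop@auto-202108181200 tank/laptop@auto-202108190000 | zfs recv backup/laptop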

I discovered the problem lies in step #2.

I have a task / snapshot on TrueNAS (aptly named "backup_YYYYmmddHHMM") which I use to essentially replicate all the datasets on the pool to a separate offsite pool. (The simplicity makes it intuitive, as a single "backup" snapshot is a point in time for all datasets in the pool that are safely backed up elsewhere.)

The issue is that replications appear to only work one way. Everything moves this direction or that direction, but not both.

Because the dataset on TrueNAS has a snapshot "backup_202108190000" (used as an all-inclusive pool-wide backup to be saved elsewhere), the next time I try to replicate from the client, it complains that the "destination has been modified." The only way around this is to force a rollback (with the recv -F option), which will destroy the backup snapshot and thus kill any hope of doing future incremental replications from TrueNAS to my offsite backup pool.
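Roughly what that looks like on the command line (dataset names are placeholders):

  # TrueNAS creates its own snapshot on the received dataset
  zfs snapshot tank/laptop@backup_202108190000

  # The next incremental from the client is refused, because the newest snapshot
  # on tank/laptop is now @backup_202108190000, not the one the client last sent
  zfs send -i datapool/home@auto-202108190000 datapool/home@auto-202108191200 | ssh truenas zfs recv tank/laptop

  # Forcing it goes through, but it rolls the destination back and destroys
  # @backup_202108190000, breaking the incremental chain to the offsite pool
  zfs send -i datapool/home@auto-202108190000 datapool/home@auto-202108191200 | ssh truenas zfs recv -F tank/laptop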

---

Likewise, if I try to do a "two-way" replication, where I replicate from TrueNAS to my client PC first and then replicate from my client PC to TrueNAS with the regular snapshots, it will undo any changes made on my client PC since the last time I ran a backup replication to the offsite pool!

---

I don't quite understand why ZFS cannot accept snapshots from a client while also making its own snapshots locally on the same dataset, as long as they all share common base snapshots.

CLIENT                                              TRUENAS
@auto001 ---> zfs send (full) ----------------->    @auto001
@auto002 ---> zfs send -i @auto001 @auto002 --->    @auto002
                                                    @backup001 (created locally, replicate from here everything to offsite pool)
@auto003 ---> zfs send -i @auto002 @auto003 --->    @auto003
@auto004 ---> zfs send -i @auto003 @auto004 --->    @auto004
@auto005 ---> zfs send -i @auto004 @auto005 --->    @auto005
                                                    @backup002 (created locally, replicate from here everything to offsite pool)
@auto006 ---> zfs send -i @auto005 @auto006 --->    @auto006
And so on...                                        And so on...

If you look at the table above, apparently an incremental send from @auto002 to @auto003 is not allowed (because TrueNAS's newest snapshot at that point is @backup001, which the client doesn't have), and neither is an incremental send from @auto005 to @auto006 (same story with @backup002).
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Unlike ZFS, rsync (and practically every backup / sync tool out there) is not record-based; it's file-based. This means that moving files and folders around, renaming large files, renaming folders, re-organizing directories, etc., will cause the next rsync to inefficiently delete and transfer everything all over again.

Just don't use rsync, then?

Here, with Nextcloud, I do not have any problem. The Nextcloud agent will sync only what needs to be synced, and then TrueNAS and ZFS do the rest. First, TrueNAS is copy No1, and ZFS sends that to the two others, providing me with copies 2 and 3.

Nextcloud's sync is also two-way: if I upload something from one device, it will be downloaded and synced to my other devices (at least if I chose to sync that folder...).

I don't quite understand why ZFS cannot operate by accepting snapshots from different clients to the same dataset as long as they all share common base snapshots?

For the simple reason that none of these clients' originals are the same.

Take client No1: its ZFS hierarchy and structure are in certain sectors; it puts data in other blocks and ends up in state No1. That state No1 is then replicated to your ZFS target. When some changes happen on client No1, the modified blocks result in state No2. The delta between the two can then be pushed to the ZFS target. But when client No2 joins the game, its own ZFS structure is in different blocks, just as the data (even if they are the same files) also ends up in different blocks. As such, the first state for client No2 has nothing in common with either state No1 or state No2 from client No1. The consequence is that you cannot express any state on client X as a function of, or modification relative to, a state on client No1.

Even if you start client No2 from a snapshot related to client No1, there is no way the modified blocks will be the same on the two. Disks are spinning, and whenever one has something to write, it will land anywhere but the very same place the other used for its own changes... so they will be on different blocks.
 
Joined
Oct 22, 2019
Messages
3,641
Take client No1: its ZFS hierarchy and structure are in certain sectors; it puts data in other blocks and ends up in state No1. That state No1 is then replicated to your ZFS target. When some changes happen on client No1, the modified blocks result in state No2. The delta between the two can then be pushed to the ZFS target. But when client No2 joins the game, its own ZFS structure is in different blocks, just as the data (even if they are the same files) also ends up in different blocks. As such, the first state for client No2 has nothing in common with either state No1 or state No2 from client No1. The consequence is that you cannot express any state on client X as a function of, or modification relative to, a state on client No1.
Looks like there's no way around it. I was hoping to ditch file-based syncs and leverage ZFS's snapshots and record-based replication.


Just don't use rsync, then?
Rsync was only one of many possibilities, but they all suffer from the same limitation: transfers are based on folders, files, and file names, whereas ZFS replications are based on records. This means that regardless of whether it's rsync or a commercial sync program, re-organizing and moving files and folders around will fool the software into thinking "these files were deleted from the source and will be deleted from the target, and these files are brand new and will be copied to the target," when in fact there are no deleted files nor new files, just renamed, re-arranged, and re-organized ones.

However, for ZFS in the above scenario, the only new "records" are from metadata changes (locations, directories, file names, etc.), and thus the replication would be extremely efficient and fast.

I thought it would be possible to replace rsync with replications from client to TrueNAS (and then from TrueNAS to backups).

Even without specifying "rsync" in a web search, you'll come across other users who outright want to substitute ZFS replication for rsync. :wink:

Even the two-way alternatives (such as Unison) are still file-based and suffer from the same limitations as rsync.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Joined
Oct 22, 2019
Messages
3,641
In case you missed it...
I didn't miss it. Setting up Nextcloud is overkill for something like this, plus it doesn't leverage the features of ZFS in terms of snapshots and replications. There's nothing fancy happening on the dataset that necessitates any type of collaborative suite, extra layer, or plugin/jail.

Might end up using a script on the client that takes snapshots and replicates to TrueNAS; then from TrueNAS to my backups I'll use another script that grabs the most recent snapshot on that particular dataset and uses it to complete an incremental backup with the -I flag (which includes all intermediary snapshots).

It'll be separate from the script I use for the rest of my pool.
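A minimal sketch of what I have in mind (the hostnames "truenas" and "offsite", and the pool/dataset names, are placeholders; no error handling or snapshot pruning):

  # On the client: snapshot, then send the increment to TrueNAS
  prev=$(zfs list -H -d 1 -t snapshot -o name -s creation datapool/home | tail -1)
  now=datapool/home@auto-$(date +%Y%m%d%H%M)
  zfs snapshot "$now"
  zfs send -i "$prev" "$now" | ssh truenas zfs recv tank/laptop

  # On TrueNAS: send everything since the newest snapshot the backup pool
  # already has, including intermediary snapshots, using -I
  last=$(ssh offsite zfs list -H -d 1 -t snapshot -o name -s creation backup/laptop | tail -1 | cut -d@ -f2)
  newest=$(zfs list -H -d 1 -t snapshot -o name -s creation tank/laptop | tail -1 | cut -d@ -f2)
  zfs send -I "@$last" "tank/laptop@$newest" | ssh offsite zfs recv backup/laptop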
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Setting up Nextcloud is overkill for something like this,

You do not need to use all of its features. Just do file replication if that is all you need... @danb35 published a script for an easy install of it on TrueNAS. So many people here have done it; you surely can do it too.

it doesn't leverage the features of ZFS in terms of snapshots and replications

You may not see it, but it does here: it sure benefits from both. Thanks to these snapshots and the replication, I have my complete backup of everything and an easy-to-do restore plan.

There's nothing fancy happening

The best solutions are the simplest ones; not the fanciest, most complicated, most distorted, ...

Might end up using a script

That script will not work without wasting a ton of resources and putting your data at maximum risk.

Each of your independent clients will need to replicate to a different target. The server will then need to mount each of them, merge them, and then create a new set of data containing that merge. Then that single merge can be replicated. Now have fun managing concurrent changes, duplicates, collisions, and more when you do your merge at the file level on top of a mechanism that is block-level.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
rsync also does not have to be slow. Yes, it does an exhaustive search (comparing every file and traversing every directory), but if you throw some hardware / software at it, the process can be pretty quick, using either a persistent metadata-only L2ARC or an sVDEV. Both approaches address the biggest issue I've had with rsync: the incredible amount of directory I/O.

As a bonus, browsing, indexing, or otherwise traversing directories for the user is also much faster.
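For the metadata-only L2ARC route, the gist is roughly this (device and pool names are just examples; persistent L2ARC needs OpenZFS 2.0 or later):

  # Add a flash cache device and restrict it to caching metadata only
  zpool add tank cache nvme0n1
  zfs set secondarycache=metadata tank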
 
Joined
Oct 22, 2019
Messages
3,641
rsync also does not have to be slow. Yes, it does an exhaustive search (comparing every file and traversing every directory), but if you throw some hardware / software at it, the process can be pretty quick, using either a persistent metadata-only L2ARC or an sVDEV. Both approaches address the biggest issue I've had with rsync: the incredible amount of directory I/O.

As a bonus, browsing, indexing, or otherwise traversing directories for the user is also much faster.
That's not the main reason I'm trying to move away from rsync (to a pure ZFS approach).

The real problem is that rsync is not block-based (i.e., record-based). Moving stuff around, renaming, reorganizing, etc., tricks rsync into thinking that many files and folders (even large ones) have been deleted and newly created, and thus they will need to be copied to and deleted from the destination.

I know about rsync's "fuzzy" option, but it's rudimentary and doesn't work well beyond one or two folders.

---

Say you have a 1GB file named "customdistro.iso" under a folder named "Downloads". But later on during some spring cleaning and re-organizing, you rename the file to "Custom Distro.iso" and place it into a folder named "ISOs" (maybe even later on you rename this folder to "Distros").

This seemingly minor change will have rsync delete the 1GB file from the destination, and then re-upload it again since rsync mistakenly treats it as a "new" file.

While on pure ZFS, only kilobytes of metadata have changed and need to be transferred. Barely anything at all.
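For illustration (paths made up), the same rename as seen by both approaches:

  # rsync: the rename looks like a deletion plus a brand-new file,
  # so the whole ~1GB gets re-transferred
  rsync -a --delete /data/ truenas:/mnt/tank/laptop/

  # ZFS: only metadata records changed, so the incremental stream
  # between the two snapshots is tiny
  zfs send -i datapool/data@before-rename datapool/data@after-rename | ssh truenas zfs recv tank/laptop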

---

Each of your independent clients will need to replicate to a different target.
Perhaps my use of the word "client" was too ambiguous. Each client/computer has its own dataset; nothing is shared between them. So currently, they rsync to their own dataset, and that particular dataset gets snapshots and replications to a backup pool/dataset (of the same name). There is no collision nor merging.

The only difference would be the part highlighted in the quote above: they would ZFS snapshot/send/recv to their own dataset.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
I'm in violent agreement that using ZFS as a backup protocol is far superior to rsync. The latter is not an efficient way to transmit changes, and ZFS send is the way to go.

Additionally, rsync gives you a single snapshot in time, while ZFS gives you lots over time, which can be super helpful in the event of ransomware attacks; right up to the storage capacity limit of the file system.

But I'm sticking to rsync for some of my offline backups because that allows me to natively read them: no special install required and 100% known file-system compatibility. I gradually want to convert them to ZFS too, but it's one thing at a time here…

…sort of like rsync. If only I could function at a block-storage / metadata level like ZFS… :smile:
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Likewise, if I try to do a "two-way" replication, where I replicate from TrueNAS to my client PC first and then replicate from my client PC to TrueNAS with the regular snapshots, it will undo any changes made on my client PC since the last time I ran a backup replication to the offsite pool!

Don’t snapshot your replica dataset on the backup server and it should work fine.

You can select multiple snapshot schedules to run replications, and you can enter multiple snapshot names in the replication. One does not need to be made by the other for it to work.
 
Joined
Oct 22, 2019
Messages
3,641
Don’t snapshot your replica dataset on the backup server and it should work fine.
This is the route I'm likely to take. The issue is that I had been using an "over-arching" recursive snapshot for all datasets in the pool (to then be used to back up elsewhere). It worked well, since it's a "one and done" process and works across everything in the pool: rather than many tasks, it's just one task.
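In essence it boils down to a single recursive snapshot over the whole pool (pool name is a placeholder):

  zfs snapshot -r tank@backup_$(date +%Y%m%d%H%M)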
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
This is the route I'm likely to take. The issue is that I had been using an "over-arching" recursive snapshot for all datasets in the pool (to then be used to back up elsewhere). It worked well, since it's a "one and done" process and works across everything in the pool: rather than many tasks, it's just one task.

Have you considered using the "exclude" field in the snapshot task?

I just tweaked my over-arching recursive snapshot to exclude my tank/replicas dataset... seems to have fixed a similar problem for me. It now takes snapshots of everything except the replicas...

It helps that I store all of my replicas under <pool>/replicas/<hostname> on the various hosts I replicate to (on-site, offsite, etc.).

When you enable multi-dataset replication (ticking boxes in the replication task), it works quite nicely to just specify the remote pool's replicas/<hostname> dataset as the target, and TrueNAS will build the dataset hierarchy.
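For illustration (made-up hostnames), a receiving host then ends up with datasets like:

  tank/replicas/laptop        <- received from the laptop
  tank/replicas/workstation   <- received from another client
  tank/replicas/nas           <- received from the main TrueNAS box

with tank/replicas excluded from that host's own recursive snapshot task.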
 