Compare contents of two data sets

freenastier · Apr 29, 2018

I am looking for a way to compare the contents of two data-sets. What makes it more complex is that the two data-sets are not on the same system.

I have found the following ZFS command in the Oracle documentation:

Code:

zfs diff tank/cindy@snap1 tank/cindy@snap2

Unfortunately the above command compares snapshots, not data-sets.
Is there an approach that can determine the difference between two data-sets?

My goal is for both machines to hold identical data-sets. Currently the data-sets are similarly named on both machines, but their contents might differ.

After experimenting and thinking this through, the best approach I could come up with is as follows:

1. Create snapshots of the data-sets on both machines.
2. Use replication to send them to the other machine.
3. Execute the 'zfs diff' command on the local and remote snapshot to compare.

Would this work or are there better alternatives?
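A hedged note on the last step: as far as I know, 'zfs diff' only compares a snapshot with a later snapshot (or with the current contents) of the same data-set, so it probably cannot compare a replicated copy against an unrelated local data-set directly. If both machines still hold a snapshot from the last successful replication, one alternative is to ask each machine what has changed since that snapshot. The sketch below assumes this; the function name changes_since and the data-set and snapshot names in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: changes_since is a hypothetical helper, not a ZFS command.

# List what changed in a data-set since a snapshot of that same data-set.
# $1 = data-set, $2 = name of the last snapshot both machines still share
changes_since() {
    # zfs diff <snapshot> <filesystem> reports files added, removed,
    # modified, or renamed between the snapshot and the live data-set.
    zfs diff "$1@$2" "$1"
}
```

Run on each machine, for example as changes_since tank/data last_replicated. If the machine that is about to be overwritten reports nothing, it should hold no changes since the shared snapshot.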

DrKK · Apr 29, 2018

Sir, unless I am misunderstanding you, you are describing the whole purpose of zfs replication, which is built into the FreeNAS GUI under "storage". Your step 1 is necessary to create a replication task, but there should be no need for your step 3.

You appear to have some confusion between the notion of a "dataset" and a "snapshot". The snapshot *IS* your dataset; the only difference is that which has occurred after the snapshot is not part of the snapshotted dataset. If you would like the two locations to be nearly real time up-to-date with each other, you may select a periodic snapshot task on a very short timescale.

There is nothing more to it than that.

Am I misunderstanding you?

freenastier · Apr 29, 2018

DrKK said:
Sir, unless I am misunderstanding you, you are describing the whole purpose of zfs replication, which is built into the FreeNAS GUI under "storage". Your step 1 is necessary to create a replication task, but there should be no need for your step 3.

I understand that is the purpose of replication, but unfortunately my two machines have been isolated from each other for some time and now contain changes on both sides.
Before I can safely start replicating the data-sets again, I must first make sure that neither data-set contains data that is missing from the other system.

Say I have data-set A1 on system 1 and data-set A2 on system 2. I would like to replicate A1 to A2, but before I can do that I must check that A2 does not contain data that is not yet present in A1; otherwise that data would be lost when a snapshot of A1 is replicated to A2.
This is a one-time procedure to ensure both systems are in sync with each other. From that moment on I can rely on regular replication again.

DrKK said:
You appear to have some confusion between the notion of a "dataset" and a "snapshot". The snapshot *IS* your dataset; the only difference is that which has occurred after the snapshot is not part of the snapshotted dataset. If you would like the two locations to be nearly real time up-to-date with each other, you may select a periodic snapshot task on a very short timescale.

I understand what you are saying. The problem is that my data-sets reside on two machines which should act as a backup of each other. Some data-sets of machine 1 are replicated to machine 2 and a few other data-sets from machine 2 are replicated to machine 1. Due to circumstances this procedure has failed for a while and there have probably been some changes on both sides.

DrKK said:
Am I misunderstanding you?

I hope I have explained myself more clearly, but please let me know if you still have not understood me.

sretalla · Apr 29, 2018

You could run rsync in both directions with the -n switch (a so-called dry run), which forces it to make no changes and only report what would have been done.

For a copy with only newer files to be sent I tend to use rsync -auv /source/dir/ user@{IP_Address}:/dest/dir/ but you would add n (making it -auvn) to avoid actual copying. Then reverse the source and dest position to have it compare the other way.

While this could show and even correct the differences on both sides, it won't help you switch back to replication. You could continue using just rsync if it gets the job done, but you won't benefit from how much ZFS replication simplifies the process.
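The two dry runs described above could be wrapped roughly as follows. This is a minimal sketch: the function name compare_both_ways is hypothetical, and the paths and host in the usage note are placeholders rather than values from this thread.

```shell
#!/bin/sh
# Sketch only: compare_both_ways is a hypothetical helper.
# rsync flags: -a archive, -u skip files that are newer on the receiver,
# -v verbose, -n dry run (report only, change nothing).

# $1 = local directory, $2 = remote spec such as user@host:/path
compare_both_ways() {
    # Strip one trailing slash so exactly one is added back below.
    a="${1%/}"
    b="${2%/}"
    echo "== would be sent from $a to $b =="
    rsync -auvn "$a/" "$b/"
    echo "== would be sent from $b to $a =="
    rsync -auvn "$b/" "$a/"
}
```

It could be called as compare_both_ways /mnt/tank/data user@host2:/mnt/tank/data. Both directions are needed because, without --delete, a run does not list files that exist only on the receiving side.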
