TrueNAS Replication

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Hi everyone,

I'm having a hard time properly setting up TrueNAS Replication and I was hoping to get some assistance / guidance from you good folks ;)

First a little background and what I'm trying to achieve:
  • My master-TrueNAS server is currently running TrueNAS Core (latest version). I plan to replace (fresh install) this OS with TrueNAS Scale (latest version), once I feel comfortable enough to set it up.
  • Currently I'm using loose HDDs as an irregular offline backup strategy (often multiple months old)
  • I recently bought an old SuperMicro server that I plan to use for the following purposes:
    • Replace my current offline backup strategy of loose HDDs (the idea is to keep it unplugged from the power unless I'm making a new backup, like once a week or month)
    • Mirror the configuration of my main TrueNAS server as much as possible, so that it can function as a "spare" in case my main TrueNAS server is offline / broken.
    • Test environment for new / changed functionality.
I've set up my backup-TrueNAS server with TrueNAS Scale and configured all users, groups and datasets. Now I'm trying to master TrueNAS replication, but it seems quite hard to me...
First I spent way too much time setting up an SSH Connection between my backup-TrueNAS (Scale) and master-TrueNAS (Core). As it is bad practice to enable SSH for the root account, I wanted to use a purposely-set-up user (ssh-replicator) for this. Although I did succeed in the end, it gave me more grey hairs than I wished for ;)

So over the weekend I was finally able to successfully run my first replication task, but I do notice some behavior that introduces some new questions:
  1. As a test I removed an old snapshot on the backup-TrueNAS, and the cronjob created some new snapshots on the master-TrueNAS. However, when I now re-run my replication task, it says "No snapshots to send for replication task 'task_1' on dataset 'hgstpool/datads'" and then does nothing. I was expecting it to transfer the new snapshots and re-upload the removed old snapshot to bring the dataset back in sync.
    These are my (relevant) settings for my replication:
    [Attached screenshot: replication1.jpg - replication task settings]

    Does anyone know what I'm doing wrong, what I'm missing or what I'm misunderstanding?
  2. I noticed that the replication also overwrote the ACLs that I had created on the backup-TrueNAS with the ACLs of the master-TrueNAS, which completely breaks the permissions on the backup-TrueNAS, as it has completely different GIDs for those groups. What is the recommended way to keep these in sync?
    I also noticed that TrueNAS Core non-builtin GIDs start at 1000, while on TrueNAS Scale they start at 3000, which complicates things even more (see the quick check after this list). Should I manually start editing the /etc/group file to make the GIDs match? Or should I recreate all my groups on both master-TrueNAS and backup-TrueNAS so that I can configure them in the GUI with matching GIDs? (That would be a lot of work.)
    Or is there perhaps a way to disable overwriting the ACLs during the replication?
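For illustration, a quick way to compare the GIDs on both boxes (the group name here is just a placeholder for one of my actual groups):

    # on master-truenas (Core) - non-builtin groups start at GID 1000
    getent group mygroup        # e.g. mygroup:*:1000:someuser
    # on backup-truenas (Scale) - non-builtin groups start at GID 3000
    getent group mygroup        # e.g. mygroup:x:3000:someuser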
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Mastakilla said: "Or is there perhaps a way to disable overwriting the ACLs during the replication?"
You checked "Include Dataset properties". I would imagine that includes the ACL. You could uncheck that option and replicate again (you would probably need to check "Replication from scratch").
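If you want to verify which properties the destination dataset actually received with the stream, something like this should show them (the destination dataset name is a placeholder):

    # list only the properties whose source is the received replication stream
    zfs get -s received all backuppool/datads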

Mastakilla said: "As a test I've removed an old snapshot on the backup-TrueNAS and created (at least the cronjob did) some new snapshots on the master-TrueNAS. However, when I now re-run my replication task, it says "No snapshots to send for replication task 'task_1' on dataset 'hgstpool/datads'" and then does nothing. I was expecting it to transfer the new snapshots and re-upload the removed old snapshot to make the dataset in sync once more."
Take this with a grain of salt, as I'm still in the process of trying to understand snapshots / replication tasks myself. I mainly typed this up as an exercise for myself:

From the documentation, emphasis by me:
Snapshots are one of the most powerful features of ZFS. A snapshot provides a read only point-in-time copy of a file system or volume. This copy does not consume extra space in the ZFS pool. The snapshot only records the differences between storage block references whenever the data is modified.

Mastakilla said: "to make the dataset in sync once more."
Your live dataset (the data that is actually there) is in sync, independent of old snapshots. On your first full replication, all data got transferred along with various historic points in time you can visit (the snapshots). When you modify data and the replication runs once again, you need a common snapshot as an incremental base: all modifications can be expressed as the difference in data blocks compared to that snapshot. Older snapshots are not needed for that, and hence they do not get copied over again. Be careful though: you actually lost data here; at least you cannot retrieve versions of files that were deleted along with the snapshot you deleted.
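On the command line the same idea looks roughly like this (pool, dataset and snapshot names are placeholders):

    # the incremental base must exist on BOTH sides; check with:
    zfs list -t snapshot -o name,creation hgstpool/datads
    # first replication: full send up to snap1
    zfs send hgstpool/datads@snap1 | ssh backup zfs receive backuppool/datads
    # later runs: only the blocks that changed between the common
    # snapshot snap1 and the newer snap2 get transferred
    zfs send -i hgstpool/datads@snap1 hgstpool/datads@snap2 | ssh backup zfs receive backuppool/datads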

If no snapshots got transferred at all after you created new snapshots on the source system, I assume you allowed taking empty snapshots? An empty snapshot contains no changed data, so there would be nothing to send.

I assume that in order for the older snapshots to get sent again, you would need to check "Full Filesystem Replication".
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
Thanks for your response!

I've been trying some more with a completely new destination dataset and a new replication task. I'm afraid that unchecking "Include Dataset properties" does not prevent the source ACL from overwriting the destination ACL. (Perhaps that makes sense: file ACLs live in the file metadata inside the dataset, not in the dataset properties, so the replication stream carries them regardless.)
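For what it's worth, this is roughly how the ACLs on the two sides can be compared (mount paths are placeholders; the right tool depends on the dataset's acltype):

    # on master-truenas (Core, NFSv4 ACLs):
    getfacl /mnt/hgstpool/datads
    # on backup-truenas (Scale): getfacl for POSIX ACLs,
    # or nfs4xdr_getfacl for NFSv4-style ACLs:
    nfs4xdr_getfacl /mnt/backuppool/datads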

I know how snapshots work and empty snapshots are disabled on the source, so that can't be the reason of the new snapshots not being transferred I'm afraid.

I've also discovered a new issue:
My SSH Connection only works temporarily: /etc/local/sudoers gets overwritten as soon as you edit a user, so the passwordless sudo for my replication user breaks whenever that happens.
How do other people do this then? I hope I'm not supposed to use my root user for the replication??? (This would mean I would have to enable SSH for my root account, and I'd prefer not to do that.)
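For context, the sudoers entry that keeps getting lost looks roughly like this (the zfs path is /sbin/zfs on Core; it may differ on Scale):

    # in /etc/local/sudoers - gets overwritten whenever a user is edited,
    # which is why this entry keeps disappearing
    ssh-replicator ALL=(ALL) NOPASSWD: /sbin/zfs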
 

Mastakilla

Patron
Joined
Jul 18, 2019
Messages
203
As I would really like to make some progress, I've been trying the following:
  • I've created a test-ds dataset on both master-truenas and backup-truenas
  • I've created a test-user and test-group with the same UID and GID on both master-truenas and backup-truenas. test-user has test-group as secondary group and nogroup as primary group.
  • I've created an ACL giving test-group full control on both master-truenas and backup-truenas
  • To avoid requiring passwordless sudo for my replication user, I've tried unchecking "Enable passwordless sudo for zfs commands" when creating the SSH Connection, and also unchecking "Use Sudo For ZFS Commands" when creating the Replication task.
    • This causes the Semi-automatic method for creating an SSH Connection to finally work. So apparently, when setting up an SSH Connection from TrueNAS Scale to TrueNAS Core, you're not allowed to enable this. Not sure if I should file a bug report for this, as the documentation doesn't mention it at all, and the error could also be clearer.
    • The help for this setting says you must use "zfs allow" instead, but nowhere is it clearly described how to do this exactly. I found the following forum topic on the subject, "https://www.truenas.com/community/t...allow-equivalent-in-the-gui.94429/post-724288", but when actually trying this ('zfs allow ssh-test create,destroy,receive,mount,refreservation,rollback hgstpool/test-ds' and 'zfs allow ssh-test create,destroy,receive,mount,refreservation,rollback hgstpool'), my Replication task still gives the error (see my guess after the error output):
      warning: cannot send 'hgstpool/test-ds@auto-2024-03-01_10-27': permission denied
      cannot receive: failed to read from stream.
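      My current reading of "zfs allow" is that the send side needs its own delegations (send, snapshot, hold), which my commands above did not include; this is a guess from the man page, not something I've confirmed:

        # on master-truenas (send side):
        zfs allow ssh-test send,snapshot,hold hgstpool/test-ds
        # on backup-truenas (receive side):
        zfs allow ssh-test create,destroy,mount,receive,rollback hgstpool/test-ds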

Does anyone know what I'm doing wrong? Thanks!!

Fyi:
I've also noticed that running the replication task (which aborts before actually starting the replication) REMOVES the destination dataset. This is with "Replication from scratch", "Include Dataset Properties" and "Full Filesystem Replication" all unchecked!
This explains why the source ACL overwrites the destination ACL I think...
 