Is My Data Lost? A ZFS Encryption / Replication Remote Keys Unable to Be Loaded Question

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
OK, so I know this is another "Is my data lost?" question, and I am sorry to be posting it. I have two questions and appreciate any help you can give.

First, let me describe my situation. I have two machines, one local, one remote. I also recently migrated from one pool to another in both systems as I wanted to change my vdev layouts. I fear in doing this I have introduced some irrecoverable error.

Local TrueNAS Scale 22.12.2
-FrickNASty (still available but would like to delete to add vdevs to pool2) (pool)
----Encrypted Dataset (passphrase)
--------PhotoVideo (inherits encryption)
--------Dataset2 (inherits encryption)

-Roshar(new pool, different vdev layout, in process of transferring over)
----Encrypted Dataset (passphrase, newly created but same passphrase)
--------PhotoVideo (copied over from FrickNASty/Encrypted/PhotoVideo using zfs send -Rw | zfs recv -Fuv; originally individually locked (as in, not inherited), but after I loaded the keys and unlocked it, I set it to unlock with the parent and to inherit the parent's encryption)
--------Dataset2 (originally individually locked (as in, not inherited), but after I loaded the keys and unlocked it, I set it to unlock with the parent and to inherit the parent's encryption)

Remote TrueNAS Core 13.0U5
-BrownNASbackup (no longer available, old config, but potentially relevant) (pool)
----RemoteBackups (passphrase encrypted dataset)
--------PhotoVideo (ZFS replication task target from FrickNASty/Encrypted/PhotoVideo)

-Yolen (new pool with new vdev configuration) (pool)
----RemoteBackups (passphrase encrypted dataset, same passphrase as everything else)
--------PhotoVideo (copied over the dataset and all snapshots using "zfs send -Rw BrownNASbackup/RemoteBackups/PhotoVideo | zfs recv -Fuv Yolen/RemoteBackups/PhotoVideo", then unlocked it successfully using the passphrase, then via the GUI set it to unlock with the parent and inherit the parent's encryption)
--------Dataset2 (ZFS replication task target)


1) For the Yolen/RemoteBackups/PhotoVideo that was originally created from zfs replication target and copied over from BrownNASbackup to Yolen, have I lost the ability to access this data since I changed it to inherit parent encryption and I presume from my options that the IVs were not sent?

I copied over the dataset and all snapshots using the following:
zfs send -Rw BrownNASbackup/RemoteBackups/PhotoVideo | zfs recv -Fuv Yolen/RemoteBackups/PhotoVideo
Then I unlocked it successfully using the passphrase, and then via the GUI set it to unlock with the parent and inherit the parent's encryption.

if I do zfs get encryptionroot Yolen/RemoteBackups/PhotoVideo it says the root is Yolen/RemoteBackups
So I zfs load-key Yolen/RemoteBackups and enter passphrase and it seems like it loads the key
And then if I zfs load-key Yolen/RemoteBackups/PhotoVideo it says error: keys must be loaded for encryption root.
I verify with zfs mount Yolen/RemoteBackups that I can mount it. Great.
Then I do zfs mount Yolen/RemoteBackups/PhotoVideo and I get a "Permission denied" error.
OK, so zfs get keystatus Yolen/RemoteBackups/PhotoVideo shows "keystatus: available".
OK, so let's load it: key load error, keys must be loaded for encryption root. Boo.
zfs get keystatus Yolen/RemoteBackups shows the key is available, and then "load-key" shows that the key is already loaded.
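
For reference, the checks above can be collected into one read-only helper. This is a minimal sketch, assuming a POSIX shell on the NAS; `enc_status` is a hypothetical function name, and it only calls `zfs get`, so it does not change anything on the pool.

```shell
# Print the encryption-related state of a dataset without changing anything.
# Usage: enc_status <dataset>   (enc_status is a hypothetical helper name)
enc_status() {
    ds=$1
    # -H drops the header line, -o value prints only the property value.
    root=$(zfs get -H -o value encryptionroot "$ds") || return 1
    printf 'dataset:        %s\n' "$ds"
    printf 'encryptionroot: %s\n' "$root"
    printf 'keystatus:      %s\n' "$(zfs get -H -o value keystatus "$ds")"
    printf 'mounted:        %s\n' "$(zfs get -H -o value mounted "$ds")"
    # If the dataset inherits its key, also show the state of the root itself.
    # An encryptionroot of "-" means the dataset is not encrypted.
    if [ "$root" != "$ds" ] && [ "$root" != "-" ]; then
        printf 'root keystatus: %s\n' "$(zfs get -H -o value keystatus "$root")"
    fi
}

# Example call:
#   enc_status Yolen/RemoteBackups/PhotoVideo
```

Seeing the child's keystatus, its mounted state, and the root's keystatus side by side makes it easier to describe exactly which combination triggers the "Permission denied" error.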

I used the same passphrase throughout for all encrypted datasets. I think my error was changing the dataset to inherit encryption after sending with the -R flag. Is there any way to change this? It was created from the local FrickNASty/PhotoVideo, which I still have access to.

I am bringing up this whole mounting error because ZFS replication from FrickNASty to Yolen fails only on this dataset and not any others. It actually causes a kernel panic and my machine resets. I can also trigger the same kernel panic and reset by trying to unlink the zfs encryption from the parent and use the same passphrase again on the PhotoVideo dataset, so I think this is where the issue is.

There are older snapshots on Yolen/RemoteBackups/PhotoVideo that I would like to keep, which is why I am not just starting over with the replication task. I want to resume the replication from the last in-sync auto snapshot; both pools still have many snapshots in common. I rolled Yolen/RemoteBackups/PhotoVideo back to this last snapshot, but the replication task fails, I believe because of these encryption differences.
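
Resuming from a common snapshot would look roughly like the pipeline below. This is only a sketch that prints the commands instead of running them, which seems prudent given the kernel panics. `build_incremental`, the host name `yolen-host`, and the newer snapshot name are hypothetical placeholders, not names from the actual systems. It leaves out the -F flag on the receiving side, since -F can roll back or destroy data on the target.

```shell
# Print (do not run) a raw incremental send/receive pipeline.
# Usage: build_incremental <src_dataset> <common_snap> <latest_snap> <dst_host> <dst_dataset>
# build_incremental is a hypothetical helper; it only formats command lines.
build_incremental() {
    src=$1 common=$2 latest=$3 host=$4 dst=$5
    # Dry run: -n sends nothing, -v reports what the stream would contain.
    printf 'zfs send -nv -w -I %s@%s %s@%s\n' "$src" "$common" "$src" "$latest"
    # Real transfer: raw stream (-w) with all intermediate snapshots (-I);
    # the receiver leaves the target unmounted (-u) and is verbose (-v).
    printf 'zfs send -w -I %s@%s %s@%s | ssh %s zfs recv -uv %s\n' \
        "$src" "$common" "$src" "$latest" "$host" "$dst"
}

# Example call (host and newer snapshot name are placeholders):
#   build_incremental FrickNASty/Encrypted/PhotoVideo auto-2023-04-07_00-00 \
#       newer-snapshot yolen-host Yolen/RemoteBackups/PhotoVideo
```

Printing the commands first gives a chance to review the exact source and target names before anything touches the backup.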

Is there anything I can do to fix this and save Yolen/RemoteBackups/PhotoVideo?

2). In moving over from FrickNASty to Roshar, have I introduced the same issue?

I copied over from FrickNASty/Encrypted/PhotoVideo using "zfs send -Rw | zfs recv -Fuv" into Roshar/Encrypted/PhotoVideo. After the zfs send recv, it was locked as expected, so I unlocked it, and then set it to unlock with the parent (inherit parental encryption) which was a newly created dataset with the same passphrase as FrickNASty/Encrypted.

I can access Roshar/Encrypted/PhotoVideo, however, and it behaves as expected when I unlock the parent. But is this just because I haven't unmounted and deleted FrickNASty/Encrypted, and it is still loading the IVs from there? I could always unmount the FrickNASty pool altogether, then reboot and see whether I can access the datasets as expected.

How can I fix this?

@morganL ,
this is the root issue of the zfs send recv thread I had started earlier that you had commented on.
 

fricker_greg

Come on dude. :tongue::grin::tongue::grin:

I need to collect my laughter. I'll try reading your post again.

Thought it was a good name playing off my last name back when I only had one server and was much younger. Then I named a backup after where it was stored. But it was a bad system.

Now I have this whole schema of naming servers based on worlds in a book series with individual VMs being named after characters on those worlds. But that's all aside. I almost left it off but it got too confusing for me to keep changing FrickNASty to local-truenas-1, so you get the real names.
 
Joined
Oct 22, 2019
Messages
3,641
-BrownNASbackup (no longer available, old config, but potentially relevant) (pool)
To get one thing out of the way, what do you mean by this?

The BrownNASbackup pool shows up in the Dashboard and GUI, but doesn't actually exist?
 

fricker_greg

Thanks for bringing that up and reading, let me clarify that more.

I destroyed the BrownNASbackup pool. It doesn't show up in the GUI or in zpool list because I destroyed it, wiped the drives, and used them to make Yolen. I did a little drive shuffle with a few extra drives and then removed those drives, but that's beside the point: there were no errors on scrub, and the current Yolen pool is not degraded. I mostly describe that to say that the datasets under Yolen/RemoteBackups were all zfs send/recv'd into it rather than created there natively.
 
Joined
Oct 22, 2019
Messages
3,641
And then if I zfs load-key Yolen/RemoteBackups/PhotoVideo it says error: keys must be loaded for encryption root.
You only ever lock/unlock encryptionroots. If a dataset is part of an encryptionroot, you don't do anything key-related with it. (Unless you break its inheritance, in which case it becomes its "own" encryptionroot.)

I think you went too much into the weeds, when you should have remained within the GUI to unlock/lock the datasets. TrueNAS doesn't allow you to manually mount/unmount a dataset from the GUI. And once you unlock an encryptionroot, everything is automatically mounted.
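
At the command line, the same rule could be sketched as follows: look up the encryptionroot, load the key there if it is not loaded yet, then mount the child. `unlock_via_root` is a hypothetical helper name, and on TrueNAS the GUI unlock remains the safer route; this is only to illustrate the idea.

```shell
# Load the key on a dataset's encryption root (never on the child), then mount.
# Usage: unlock_via_root <dataset>   (hypothetical helper name)
unlock_via_root() {
    ds=$1
    root=$(zfs get -H -o value encryptionroot "$ds") || return 1
    # An encryptionroot of "-" means the dataset is not encrypted at all.
    if [ "$root" = "-" ]; then
        echo "$ds is not encrypted" >&2
        return 1
    fi
    # load-key fails if the key is already loaded, so check keystatus first.
    if [ "$(zfs get -H -o value keystatus "$root")" != "available" ]; then
        zfs load-key "$root" || return 1
    fi
    # Mount the child only if it is not mounted yet.
    if [ "$(zfs get -H -o value mounted "$ds")" != "yes" ]; then
        zfs mount "$ds"
    fi
}

# Example call:
#   unlock_via_root Yolen/RemoteBackups/PhotoVideo
```

The point of the sketch is that `zfs load-key` is only ever called on the encryptionroot; the child dataset is never given its own key operation.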
 
Joined
Oct 22, 2019
Messages
3,641
Are you doing replications (send/recv) from Local to Remote, and then manually doing things on the Remote afterwards? You can't start manually toying with any of the target (received) datasets on the Remote, which serve as backups. That's not how snapshots and datasets work. (Otherwise, any changes you make will be wiped and undone the next time you send/recv or replicate.)
 

fricker_greg

You only ever lock/unlock encryptionroots. If a dataset is part of an encryptionroot, you don't do anything key-related with it. (Unless you break its inheritance, in which it becomes its "own" encryptionroot.)

I think you went too much into the weeds, when you should have remained within the GUI to unlock/lock the datasets. TrueNAS doesn't allow you to manually mount/unmount a dataset from the GUI. And once you unlock an encryptionroot, everything is automatically mounted.
I bring this up only to provide more info rather than just what it shows in the GUI.

In the GUI, PhotoVideo appears as if it is encrypted with inherit parent properties but then anytime you try to zfs send to it, it causes a kernel panic.
 

fricker_greg

Are you doing replications (send/recv) from Local to Remote, and then manually doing things on the Remote afterwards? You can't start manually toying with any of the target (received) datasets on the Remote, which serve as backups. That's not how snapshots and datasets work. (Otherwise, any changes you make will be wiped and undone the next time you send/recv or replicate.)
So, Yes and No.

I did replication from local to remote, and I did not change any of the data in the remote dataset. I am just moving that dataset to a new pool so that I can continue to zfs send/recv to it. Further, to try to mitigate this problem, I rolled back to an earlier common snapshot (there are many) to get a known common base onto which to send incrementals.
 
Joined
Oct 22, 2019
Messages
3,641
But the first time you did a full send/recv of "PhotoVideo@snapshotname" from Local to Remote, it finished 100%?
 

fricker_greg

Yes, back in 2017 when I first set that up. I have been applying incremental snapshots to it ever since without issue. I think in 2019, when encrypted datasets became available, I created the RemoteBackups encrypted dataset, placed the snapshots from PhotoVideo into it, and have been successfully putting incrementals on top of it since then. It only became an issue when I switched pools, which I think was due to the encryption property being copied wrong. I think it was not an issue back then because it was an unencrypted dataset going into an encrypted dataset, but I do not know exactly which zfs send/recv command I used. I can see all snapshots going back to the beginning and as recent as April, when I started having this issue. Scrubs were done regularly without issue, there are no errors on the drives, and the scrub that finished within the last week found no errors.
 
Joined
Oct 22, 2019
Messages
3,641
I meant the "first time" as in with these new pools.

You did a full replication from BrownNASbackup/RemoteBackups/PhotoVideo@snap -> Yolen/RemoteBackups/PhotoVideo@snap, and this worked?

But the server crashes/reboots happen only with successive incremental replications? (From Roshar? FrickNASty?)
 

fricker_greg

Oh, sorry, I misunderstood.

Yes, I did zfs send -Rw BrownNASbackup/RemoteBackups/PhotoVideo@snap-latest | zfs recv -Fuv Yolen/RemoteBackups/PhotoVideo

and it ran to completion. I was able to see all the snapshots it contained and verify that the first and last snapshots matched and that both sides had the same number of snapshots, around 1,500.

And that's correct, the crashes of Yolen only happen when trying to incrementally replicate from FrickNASty (which is where it used to replicate from successfully, haven't finished the transfer over to Roshar yet).
 
Joined
Oct 22, 2019
Messages
3,641
That's why. You can't do that.

Even though you think "It's basically the same dataset and snapshots", it's not.

Once you replicate from one to the other, you must keep doing so from the same source to the same target. You can't "shift over" to Roshar or FrickNASty as the new source for replications just because "It has the same dataset name and snapshot names, and they all originated from the same birthplace".

You'll have to do the first full replication from Roshar -> Yolen, and any subsequent incremental replications must come from Roshar -> Yolen.
 

fricker_greg

Why would that be the case? If it is the same dataset, not just a new dataset with the same name, but the same dataset with all of its raw properties sent, what does it matter what route it takes to get there? How would you move backups from one pool to another during an upgrade if this were the case - I can imagine Solaris would have encountered such a condition?
 

fricker_greg

Though I wonder if I am getting into the weeds. Perhaps the better question, as a way of rewording my question #1, is: why can't I access the Yolen/RemoteBackups/PhotoVideo dataset, and why does attempting to unlock it cause a kernel panic?
 
Joined
Oct 22, 2019
Messages
3,641
If it is the same dataset, not just a new dataset with the same name, but the same dataset with all of its raw properties sent, what does it matter what route it takes to get there?
Sent ("created") from a different source. You might be able to get away with it if you use an existing snapshot, rather than make a new snapshot for the first-time replication (creation). But you already deleted BrownNASbackup.

I can draw something out that might explain this better.
 

fricker_greg

I only ever used existing snapshots during the transfer, and I have added no data to the dataset since then.
 
Joined
Oct 22, 2019
Messages
3,641
What are the GUIDs of the latest (same) snapshots at each end?

Code:
zfs get guid FrickNASty/Encrypted/PhotoVideo@snap-latest
zfs get guid Roshar/Encrypted/PhotoVideo@snap-latest
zfs get guid Yolen/RemoteBackups/PhotoVideo@snap-latest
 

fricker_greg

I've not yet moved it over to Roshar, but FrickNASty and Yolen have the same GUID.

Yolen/RemoteBackups/PhotoVideo@auto-2023-04-07_00-00 guid 2118776593680550401
FrickNASty/Encrypted/PhotoVideo@auto-2023-04-07_00-00 guid 2118776593680550401
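
With around 1,500 snapshots, matching GUIDs by eye gets tedious. Here is a rough sketch for finding the newest snapshot that both sides share, assuming each side's list has been saved to a file beforehand. `newest_common` and the file names are hypothetical, and the comparison is purely on the guid column.

```shell
# Print the newest source snapshot whose GUID also exists on the destination.
# Usage: newest_common <src_list> <dst_list>   (hypothetical helper name)
# Each file holds "name<TAB>guid" lines, oldest first, for example from:
#   zfs list -H -t snapshot -d 1 -s creation -o name,guid <dataset>
newest_common() {
    awk -F '\t' '
        NR == FNR { seen[$2] = 1; next }   # first file read: destination GUIDs
        ($2 in seen) { last = $1 }         # second file read: latest source match
        END { if (last != "") print last; else exit 1 }
    ' "$2" "$1"
}

# Example (file names are placeholders; run the first command on the local
# machine and the second on the remote one, then copy the file over):
#   zfs list -H -t snapshot -d 1 -s creation -o name,guid FrickNASty/Encrypted/PhotoVideo > src.txt
#   zfs list -H -t snapshot -d 1 -s creation -o name,guid Yolen/RemoteBackups/PhotoVideo > dst.txt
#   newest_common src.txt dst.txt
```

The function returns a non-zero status when no GUID is shared, which would mean there is no common base for an incremental send.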
 