Confused by native ZFS encryption, some observations, many questions

Phil1295

Explorer
Joined
Sep 20, 2020
Messages
79
@winnielinnie
Thank you for your guides and explanations
I am using the subroot dataset trick successfully for Replication Tasks of zfs native encrypted pools

Replication task exp:
  • source: mainpool/tank (zfs native encrypted, inheritance enabled)
  • target: offsitepool/tank
  • recursive replication
  • preserve properties: checked (this forces a raw send with the source encryption; currently, ZFS has no way to send all properties except encryption, so we cannot apply a custom encryption on the target while preserving the other properties of the dataset)
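For reference, a replication task configured this way corresponds roughly to a raw recursive send. This is only a sketch, using the pool/dataset names from this example; a real TrueNAS task manages snapshot naming and increments for you:

```shell
# -w (raw) sends the records as stored on disk, still encrypted, and
# preserves the encryption properties; -R replicates recursively:
zfs snapshot -r mainpool/tank@backup-1
zfs send -w -R mainpool/tank@backup-1 | zfs recv offsitepool/tank

# Later incremental runs send only the snapshots made since:
zfs snapshot -r mainpool/tank@backup-2
zfs send -w -R -I @backup-1 mainpool/tank@backup-2 | zfs recv offsitepool/tank
```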

I noticed the following things:
  • offsitepool must be encrypted, else the replication task fails. I used the same encryption key as the source pool (optional)
  • the dataset 'tank' must not already exist on the target offsitepool, else the replication task fails; it will be created by the replication task
  • after replication, we can unlock the target dataset 'tank' using the same key as the source
  • we can change the target dataset 'tank' encryption and make it inherit from offsitepool
  • we can change the encryption of offsitepool to use a passphrase instead of a key
  • we can now lock the target offsitepool
  • now, if we run the replication task again (incremental), the target datasets revert to the same encryption key as the source, even though only incremental snapshots were sent!

That last step is really strange. Is it expected? Is the encryption mode (key vs passphrase) switchable on the fly by a replication task? How can the replication task do this, since the target was locked and supposedly cannot be accessed without the passphrase?
 
Joined
Oct 22, 2019
Messages
3,641
That last step is really strange. Is it expected?

That's how it works. It has to do with encryptionroots, which is effectively reassigned when you make a dataset "inherit" its encryption properties. Every time you run a replication, it "undoes" the inherit, and tank becomes its own encryptionroot again, losing offsitepool as its encryptionroot.
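This is visible from the command line. A sketch, assuming the dataset names from this thread; `zfs change-key -i` and `zfs get encryptionroot` are the standard OpenZFS commands involved:

```shell
# After manually telling the target to inherit offsitepool's encryption:
zfs change-key -i offsitepool/tank
zfs get encryptionroot offsitepool/tank    # shows: offsitepool

# After the next raw replication run, the inherit is undone and the
# dataset becomes its own encryptionroot again:
zfs get encryptionroot offsitepool/tank    # shows: offsitepool/tank
```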

---

How can the replication task do this, since the target was locked and supposedly cannot be accessed without the passphrase?

Raw sends don't require any decryption or re-encryption. The records are sent "as is".

---

The terminology in here might clear up some of the mystery.

Start reading from the part that says:
* I don't always use the official terminology, and when it comes to native ZFS encryption I try my best to make it clearly distinct "what is what". I prefer terms that reflect what something actually "does", such as a "User Key" being the user's responsibility.
 

Phil1295

Explorer
Joined
Sep 20, 2020
Messages
79
That's how it works. It has to do with encryptionroots, which is effectively reassigned when you make a dataset "inherit" its encryption properties. Every time you run a replication, it "undoes" the inherit, and tank becomes its own encryptionroot again, losing offsitepool as its encryptionroot.

I really did read all your topics when I migrated from GELI a few months ago, and I use the subroot dataset tank as well.

However, this is still not clear in my case, because:
- I changed the user key on the target
- I locked the target
- After that, I sent an INCREMENTAL snapshot; only the latest snapshots were sent
- That last step reset the user key on the target to the source's, despite the target dataset being locked

I do understand that ZFS sends raw blocks, that's fine. However, how can it change the user key of a locked target dataset?
After it changed the user key, all the data is properly there and readable after unlocking with the new key!
 
Joined
Oct 22, 2019
Messages
3,641
- That last step reset the user key on the target to the source's, despite the target dataset being locked

I do understand that ZFS sends raw blocks, that's fine. However, how can it change the user key of a locked target dataset?
After it changed the user key, all the data is properly there and readable after unlocking with the new key!

Think of it as changing the "hat" on top of a person's head. No part of the person's body is modified. Their original hat has a scrambled version of the Master Key. If you use the original keystring/passphrase against it, it will correctly reveal the Master Key.

But then later you place a new hat on their head. (The needed Master Key is always the same.) But this time the new hat has a different scrambled version of the Master Key. If you try the original keystring/passphrase in an attempt to reveal the Master Key, it will fail. If you try the new keystring/passphrase, it will succeed and reveal the Master Key. Now you can encrypt/decrypt (read/write) existing or new records.

What is happening with the raw send (which always implies all dataset properties) is that the hat is being replaced, even if you had placed a new hat on their head on the destination side.

So what I do is I leave the backup destination alone. There's no need to make it inherit offsitepool's encryption properties.

If you want it always based on a passphrase, you need to change that on the source (mainpool/tank). Then just let the raw sends do their thing.

---

Is there a reason why you can't make mainpool/tank its own encryptionroot with a passphrase (breaking inheritance from mainpool), and just go from there?

If you ever need to switch to the backup, it will use the same passphrase. It really doesn't matter what mainpool and offsitepool use. They can use a keystring or a passphrase, whichever you prefer. If you need to use iocage and the .system dataset on mainpool (the top-level dataset itself), it needs to be a keystring (auto-unlock at boot), in which case you will need to export the keystring and keep it safe.
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
Another way to look at it is the Master Key is a one-time permanent thing, always used to encrypt/decrypt the records. It can never change on the dataset. You don't ever use it directly (as it would require memorizing a random sequence of a whopping two-hundred-fifty-six 1's and 0's.) It is randomly generated during creation of a new encrypted dataset.

But if this Master Key isn't going to exist anywhere on nonvolatile storage, where can it exist? It needs to be used, after all. It sits nice and comfy inside the dataset's metadata itself. But it's a scrambled version of it that lives there. (Otherwise, anyone could just read the metadata, and voilà, they'd have the real Master Key, and encryption would be pointless.)

---

So we create a User Key (based on keystring or passphrase).

To create a new scrambled master key: User Key + Master Key + PBKDF (function) = Scrambled Master Key

The Scrambled Master Key lives nice and comfy in the dataset's header.

The way to reverse the Scrambled Master Key to reveal the real Master Key is to provide the correct User Key.

To reveal the real master key: User Key + Scrambled Master Key + PBKDF (function) = Master Key

While a dataset is unlocked (which means the real Master Key is available in RAM), you can create a new Scrambled Master Key as many times as you want, replacing it on the dataset's header. This is not possible if the dataset is currently locked (no Master Key available for the function to create a new Scrambled Master Key.)
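As a toy model of that wrapping step: this is NOT ZFS's real on-disk scheme (OpenZFS uses AES-256-GCM key wrapping with its own format), but a sketch with `openssl` showing that re-wrapping replaces the stored blob while the master key itself never changes:

```shell
# Analogue of the Master Key: random, generated once, never changes.
master_key=$(openssl rand -hex 32)

# "User Key + Master Key + PBKDF = Scrambled Master Key":
echo "$master_key" | openssl enc -aes-256-cbc -pbkdf2 \
    -pass pass:old-passphrase -out wrapped.bin

# Changing the User Key = unwrap with the old one, re-wrap with a new
# one. The master key itself is untouched:
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:old-passphrase -in wrapped.bin |
    openssl enc -aes-256-cbc -pbkdf2 -pass pass:new-passphrase -out wrapped2.bin

# Unwrapping with the new User Key reveals the same master key:
recovered=$(openssl enc -d -aes-256-cbc -pbkdf2 \
    -pass pass:new-passphrase -in wrapped2.bin)
[ "$recovered" = "$master_key" ] && echo "same master key"
```

Note that the unwrap step above is exactly why a loaded (unlocked) key is needed to create a new wrapped blob.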

---

So now look at your source and destination for tank.

mainpool/tank and offsitepool/tank will always require the same Master Key in order to access the records. They can differ in Scrambled Master Keys, which implies you are using a different User Key on either of them.

In order to use the User Key, you either need to memorize your passphrase, memorize a 64-character HEX string (not feasible), or export the HEX string. If you cannot use the User Key, you cannot ever descramble the Scrambled Master Key, and hence you cannot ever reveal the real Master Key, and hence your data is gone forever.

---

You can visualize the replication process as replacing the original encryption header metadata, "swapping out" the one you manually changed on the destination side. (You don't need unlocked access to the underlying protected data to "swap the hats on the person's head".) The destination's encryption header (including the Scrambled Master Key) has been swapped to match the source's. The required Master Key remains the same, always and forever.

As far as I'm aware, when using raw streams, you cannot exclude any properties, including the encryption properties.

---

Does it make more sense when viewing it from this angle?
 
Last edited:

Phil1295

Explorer
Joined
Sep 20, 2020
Messages
79
Think of it as changing the "hat" on top of a person's head. No part of the person's body is modified. Their original hat has a scrambled version of the Master Key. If you use the original keystring/passphrase against it, it will correctly reveal the Master Key.

I love the hat example :smile:
Yes, it is clear now. What confused me is the statement I read in the ZFS docs, and that you also make, that the user key can never be changed while a dataset is locked. So it seems that is not entirely true anyway. I now understand the logic, since the encrypted master key resides in a property that can be overridden.

As far as I'm aware, when using raw streams, you cannot exclude any properties, including the encryption properties.

No, there isn't. It was a design choice by the developers, which is not entirely understandable from a user perspective. Technically it is completely possible. They just decided to tie the `--props` option to encryption, because encryption is a property, "to avoid users using `--props` and ending up with encryption not preserved if encryption were managed by a separate flag". The end consequence is that you cannot use `--props` without `--raw` on an encrypted dataset.

There is an ongoing issue discussion open on the ZFS GitHub, where I commented a few days ago.

The only fix would be a change in the source code to add an optional flag to not preserve the source encryption and allow applying a target encryption, something like: `zfs send --props --no-preserve-encryption ... | zfs recv -x [target_encryption]`

That way, the encryption property is still part of the `--props` flag as the devs want it to be, yet we can omit that property separately. Tools like rsync, cp, mv... are good examples of flag aliases like `-a` that include many "properties", yet allow excluding an included property with additional flags.

Thank you again for your explanation about the user key being able to be "changed" on a locked dataset. Perhaps a future development will lift this limitation; it is probably just not the current logic or priority for the developers.
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
What confused me is the statement I read in zfs docs, and that you also say, that userkey can never be changed while dataset is locked.
It cannot. The reason it appears that you can is because you're only looking at "half" of the process: the destination side.

To prove this point, try to change the User Key on the destination dataset while it is locked. You can even use the replication task. But the catch is that you cannot unlock the source dataset first. :wink: You see? You need the source dataset unlocked in order to change its encryption properties (i.e., its "hat"), to even be able to replace the hat on the locked destination dataset.

If both datasets are locked, there is no changing the User Key, even when running a replication. It requires at minimum the source dataset to be unlocked in order for you to change its User Key in the first place.
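In command-line terms, the same constraint looks roughly like this (a sketch; `zfs change-key` refuses to operate unless the dataset's key is loaded):

```shell
# While the dataset is locked, re-wrapping the master key is impossible:
zfs change-key -o keyformat=passphrase offsitepool/tank   # fails: key not loaded

# Load the key first (i.e., unlock), then the change succeeds:
zfs load-key offsitepool/tank
zfs change-key -o keyformat=passphrase offsitepool/tank
```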

---------

The way I do it: I just keep incrementally replicating the pseudo-roots to a backup pool. I don't tell the destination pseudo-root datasets to inherit anything. If and when the time comes that I need to reverse the order or recover from an emergency, I will nest these (backed-up) pseudo-root datasets underneath my new pool (i.e., a new top-level root dataset) and continue as normal.

My pseudo-roots are all passphrase-protected. My top-level root datasets use HEX strings (exported and backed up safely). At no point do any of my pseudo-roots inherit the encryption properties of my top-level root dataset.

This means that on my primary pool, rebooting automatically re-locks my pseudo-roots (so I need to manually unlock them with my passphrases again), while the top-level root dataset is automatically unlocked because it's using a HEX string (saved on the boot device, the default TrueNAS behavior). Whoever has unlocked access to my top-level root dataset has nothing. There is zero data saved on it. It's just a glorified "place-holder". :tongue:

The only reason why I cannot use a passphrase for my top-level root dataset is because of the System Dataset and iocage.

---------

Another tip is you can view the 64-character HEX string of your primary pool's top-level root dataset, and manually copy+paste it as the keystring for the backup pool's top-level root dataset. This can minimize how many keystrings you need to "keep safe".

If I export all pools entirely, I can copy+paste the same HEX string for either my primary pool's or backup pool's top-level root dataset upon importing them. (The exported .json file can be viewed in any text editor.)
 
Last edited:

Phil1295

Explorer
Joined
Sep 20, 2020
Messages
79
Another tip is you can view the 64-character HEX string of your primary pool's top-level root dataset, and manually copy+paste it as the keystring for the backup pool's top-level root dataset. This can minimize how many keystrings you need to "keep safe".

If I export all pools entirely, I can copy+paste the same HEX string for either my primary pool's or backup pool's top-level root dataset upon importing them. (The exported .json file can be viewed in any text editor.)

That's the way I do it. The pool's top-level root dataset uses a HEX key, the "pseudo-root" parent dataset inherits that HEX key, as do all the main datasets under the pseudo-root. The HEX key is now the same on the target.

I do not keep the server running 24/7; it is usually down at night. I don't want the added security of a passphrase because of the inconvenience of entering it on restart. It would have been great if the devs agreed to implement the use of a custom keyfile that can live on a USB key, like the Netgear NAS devices. I understand it's useless for enterprises, but for a small business or home it can be fine to just remove the USB key while the server stays on, and physically insert it when a manual restart is done.

I guess it is easy to set up a cron job for this task and have it started at the end of the boot process. The script would loop every 10 seconds until the USB key is mounted.

However, I am not sure whether TrueNAS would automatically mount the USB disk again if the key was removed manually, even without an Export/Destroy.
In that case, I could add a few lines to the boot script to mount/import the USB key...
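Such a boot script could be sketched like this. Everything here is hypothetical: the key-file path, USB device node, mount point, and pool name are made-up examples, and this is not a built-in TrueNAS feature:

```shell
#!/bin/sh
# Hypothetical post-init script: poll for the USB key, mount it if
# needed, then unlock the pool.
KEYFILE="/mnt/usbkey/mainpool.key"   # made-up path
USBDEV="/dev/da1p1"                  # made-up device node
POOL="mainpool"

tries=0
while [ ! -f "$KEYFILE" ] && [ "$tries" -lt 30 ]; do
    # Try to mount the USB key ourselves, in case it is not auto-mounted:
    mount "$USBDEV" /mnt/usbkey 2>/dev/null
    sleep 10                         # loop every 10 seconds, as above
    tries=$((tries + 1))
done

if [ -f "$KEYFILE" ]; then
    zfs load-key -L "file://$KEYFILE" "$POOL" && zfs mount -a
fi
```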

I will look at this option just out of curiosity. I already saw a few threads asking for this feature, but the devs didn't want it because of the narrow use case and their view of servers running 24/7.
 