winnielinnie (MVP, joined Oct 22, 2019, 3,641 messages)
I have been playing around with TrueNAS Core 12.0 in a virtual machine, specifically to get a "hands on" feel for native ZFS encryption.
Block device encryption is easier for me to grasp intuitively: a block device is encrypted, and therefore nothing is accessible or known (not even file system metadata) until the block device is unlocked. Until then, anything above the block device layer is random garbage. With native ZFS encryption, it's not as straightforward: you can change options (compression, atime, etc.) without unlocking the dataset or even its parent dataset (or encryptionroot). That baffles my little mind. How is that even possible?
Prior to this, I held the following assumptions where GELI and native ZFS encryption differ:
- GELI encrypts partitions / block devices, which are then used as the bottom layer that ZFS resides above (i.e., "all or nothing" encryption).
- Native ZFS encryption is per pool / per dataset, irrespective of the block devices underneath (i.e., it is possible to have a "mix" of non-encrypted and encrypted datasets in the same pool).
- Native ZFS encryption does not encrypt the ZFS metadata (size, usage, properties, etc.), nor does it require unlocking all datasets upon importing a pool.
Here's where things get tricky and confusing. I'm going to withhold copy-pasting text from the terminal for now to keep this from becoming too technical and straining on the eyes:
1. I read about native ZFS encryption in multiple places online (Reddit, Oracle docs, wikis, etc.), and there seem to be contradictory claims, such as that if you encrypt the root dataset you can never create a non-encrypted child dataset. However, I tried this myself and I was in fact able to create a child dataset without encryption by unchecking the "inherit" option. (An icon with an [X] padlock appears next to the child dataset to signify it is non-encrypted.)
2. If the root dataset is encrypted, what is the actual relationship between it and all its child datasets that inherit its encryption properties? Is there one crypto process that handles all read-writes to these child datasets? (UPDATE: Answered in thread.)
3. For fun, I made a bunch of nested datasets that all use different encryption properties (256, 128, etc.); some use keyfiles, some use passphrases, some are outright non-encrypted. For every "break" in a nested child dataset that does not inherit its parent's encryption properties, is this another layer of encryption on top of encryption? (i.e., does it require multiple crypto processes for read-writes, one for each different cipher / non-inherited encrypted child?) (UPDATE: Answered in thread.)
4. I read about "encryptionroot" as a ZFS property, which seems to indicate that even if you have many, many child datasets, if they all share (inherit) the same "encryptionroot", this is treated as a single crypto process, and they are all immediately accessible upon unlocking their "encryptionroot" dataset? (i.e., you do not need to "unlock child datasets" if they inherit from the dataset you are currently unlocking.) (UPDATE: Answered in thread.)
5. Related to question 1, even if there is a non-encrypted child dataset within an encrypted root dataset, can it still be mounted before unlocking the root dataset? (UPDATE: Explored further in thread.)
6. When no passphrase is used to protect the master key, what is the path where the .json file is stored? (It contains the 64-character strings used as keys to decrypt the master key.) I know you can back up the entire pool's "key database" and save it somewhere on a USB stick or client PC, but I'm curious to know where it resides on the local TrueNAS system.
7. I noticed that the swap for each pool is encrypted using GELI; 2GB partitions by default. Is there a technical reason or limitation why the swap cannot be encrypted with native ZFS encryption? Why would the NAS desperately need 2GB of swap space before any pool is imported / unlocked?
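To pin down what I mean by "sharing an encryptionroot", here is a toy model in Python. It is only a sketch of my understanding of the inheritance rule, not how ZFS is implemented. The `SETTINGS` table, the mode names ("own", "inherit", "off"), and the dataset `mainpool/scratch` are all made up for illustration; the other dataset names come from my test pool.

```python
from typing import Dict, Optional

# Hypothetical per-dataset settings (assumption, not real ZFS output):
#   "own"     = inheritance broken, dataset has its own key
#   "inherit" = dataset inherits encryption from its parent
#   "off"     = dataset is not encrypted
SETTINGS: Dict[str, str] = {
    "mainpool": "own",
    "mainpool/dogs": "inherit",
    "mainpool/dogs/puppies": "inherit",
    "mainpool/cats/kittens": "own",
    "mainpool/cats/kittens/runts": "own",
    "mainpool/scratch": "off",  # hypothetical non-encrypted child
}


def encryption_root(name: str, settings: Dict[str, str]) -> Optional[str]:
    """Walk up the dataset path until a dataset with its own key is found.

    Returns the name of that dataset, or None if the dataset is unencrypted.
    """
    current = name
    while True:
        mode = settings[current]
        if mode == "own":
            return current
        if mode == "off":
            return None
        # mode == "inherit": move to the parent dataset
        if "/" not in current:
            raise ValueError(f"{current} has no parent to inherit from")
        current = current.rsplit("/", 1)[0]


if __name__ == "__main__":
    for dataset in SETTINGS:
        print(dataset, "->", encryption_root(dataset, SETTINGS))
```

In this model, unlocking `mainpool` would cover `mainpool/dogs` and `mainpool/dogs/puppies` as well, because all three resolve to the same root, while each dataset marked "own" resolves to itself.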
Where does the data scrambling begin?
Where does it end?
How is it possible to use zfs send / recv to only transfer the differences if the dataset is still locked and hence there should be nothing known about the file and folder structure underneath?
Does using separate encryption options for child / nested datasets (different cipher, different keyfile, different passphrase) mean that there are multiple crypto processes, and hence double, triple, etc. the load for the CPU? (UPDATE: Answered in thread.)
Does using separate encryption options for child / nested datasets (different cipher, different keyfile, different passphrase) work like the equivalent of encrypted block devices within block devices? As in, in order to even access the ones buried underneath, you need to first unlock the parents? (Peeling layers of an onion.) (UPDATE: Answered in thread.)
If the above are true, then what is the point of breaking encryption inheritance with child datasets? It would nullify any benefit of granular control over what gets encrypted and who has access to what: no trusted person could unlock their "assigned" dataset at will without you first unlocking the parent dataset, even if they know the passphrase for their own dataset.
Let me know if I can clear up my questions. I'm trying to keep these questions and observations "fresh" in my mind. I don't plan on creating crazy and risky dataset structures with all types of different encrypted child datasets willy-nilly. The reason I try to figure these things out is that I believe pushing something to its extremes gives you a better overall understanding of how things really work underneath.
Should I assume there are three different crypto processes? (Since there are three "encryptionroots").
The following all share the same "encryptionroot": mainpool, mainpool/dogs, and mainpool/dogs/puppies
This means that for all three datasets (reads and writes), there is only one crypto process?
However, for mainpool, mainpool/cats/kittens, and mainpool/cats/kittens/runts (reads and writes), there are three separate crypto processes (since they are all different "encryptionroots")?
So is it really each "encryptionroot" that ultimately dictates access for its child datasets? Would the analogy be that each "encryptionroot" is the equivalent of an encrypted block device? (Ignoring the fact that metadata is always in plaintext.)
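One way to count the distinct encryptionroots on a live system would be to group the output of `zfs get -H -o name,value encryptionroot`. Below is a minimal sketch in Python, assuming that output is tab-separated and that unencrypted datasets report `-` as the value. The `SAMPLE` text is written by hand from the dataset names above, not captured from a real pool, and it omits any datasets I did not mention.

```python
from collections import defaultdict
from typing import Dict, List

# Hand-written sample (assumption): name<TAB>encryptionroot, one dataset per line,
# mimicking `zfs get -H -o name,value encryptionroot` for the datasets in this post.
SAMPLE = (
    "mainpool\tmainpool\n"
    "mainpool/dogs\tmainpool\n"
    "mainpool/dogs/puppies\tmainpool\n"
    "mainpool/cats/kittens\tmainpool/cats/kittens\n"
    "mainpool/cats/kittens/runts\tmainpool/cats/kittens/runts\n"
)


def group_by_encryption_root(output: str) -> Dict[str, List[str]]:
    """Map each encryptionroot to the list of datasets that report it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue
        name, root = line.split("\t")
        if root == "-":
            # Assumed marker for datasets with no encryptionroot (unencrypted)
            continue
        groups[root].append(name)
    return dict(groups)


if __name__ == "__main__":
    for root, members in group_by_encryption_root(SAMPLE).items():
        print(f"{root}: {len(members)} dataset(s): {', '.join(members)}")
```

For the sample above this yields three groups, one per encryptionroot, which is the count my question about "three crypto processes" is based on. Whether each group really corresponds to a separate crypto process is exactly what I am asking.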