Unencrypted dataset size is smaller than the same data encrypted

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Hi there, I had an encrypted dataset and moved the files to an unencrypted one. The total data size is about 30% less when unencrypted. Is this normal? Should I be worried about something not getting moved? Everything else indicates all the data is there (spot checking, file count, no errors during the move).

Thanks!
 
Joined
Oct 22, 2019
Messages
3,641
There's not enough information to go by.

Can you expand?

How did you copy the data? What is your pool layout? Is this data moving within the same pool or across pools? How are snapshots involved? Anything else you can share that might help?

There's a good deal of useful information that can be had by using the zfs and zpool commands to grab dataset and pool properties.
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Sure, thanks for the response.
These datasets are in the same pool, so the data was moved within one pool. (I have two pools, one for apps and one for data, but both of these datasets are in the data pool.)
I moved the data using the mv command from the system shell.
There are 28 snapshots on the encrypted dataset.
There are 0 snapshots on the unencrypted dataset.
I am not sure what else might be useful here, thanks again.
 
Joined
Oct 22, 2019
Messages
3,641
I moved the data using the mv command from the system shell.
Why "mv" and not "cp" (or even better "rsync"? or even yet better "zfs send/recv?") If something goes wrong with "mv" or it's interrupted, you can be left in a state of limbo.

There are 28 snapshots on the encrypted dataset.
There are 0 snapshots on the unencrypted dataset.
This is likely why you're seeing a difference in used space.
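
If you want to confirm this, you can check how much of each dataset's used space is held by snapshots (the pool name here is an example; substitute your own):
Code:
# "usedbysnapshots" shows space retained only by snapshots
zfs list -r -o name,used,usedbydataset,usedbysnapshots mypool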
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Thanks. I actually made a typo and just updated my last post: it had 28 snapshots, so your explanation seems even more likely.
IIRC, I did some reading and mv was recommended to preserve file properties (creation dates, etc.), though I'm not sure if zfs send/recv would do this. I also wanted to be sure that nothing would end up encrypted in the new dataset. Thinking back, I agree rsync would probably have been better.
Thank you.
 
Joined
Oct 22, 2019
Messages
3,641
mv was recommended to preserve file properties (create dates etc), although I am not sure if zfs send/recv would do this.
"mv's" advantage over "cp" or "rsync" is irrelevant when traversing beyond filesystems. (Each dataset is its own filesystem.)

Regardless, "cp -a" ("archive mode") will preserve all properties, as will "rsync -a".

"zfs send/recv" replicates the filesystem itself (which means you're getting an exact recreation of the filesystem, including timestamps and metadata.)


I also wanted to be sure that nothing would have been encrypted in the new dataset.
You can specify what properties to exclude when replicating a dataset for the first time. However...


...careful when mixing encrypted and unencrypted datasets in the same pool. Is your top-level root dataset encrypted? If so, it's highly discouraged to use any unencrypted datasets in that pool. (It can even lead to unpredictable behavior.)
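
You can check the encryption status of every dataset in the pool (substitute your pool name for "mypool"):
Code:
# Lists the encryption property for the root dataset and all children
zfs get -r -t filesystem encryption mypool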
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Ok, this is a good reminder; thanks for these details. (I'm sure I will refer back to this post.) The pool is not encrypted, only the dataset.
In light of these details, what is the recommended way to migrate from an encrypted dataset to an unencrypted one?

Thanks again
 
Joined
Oct 22, 2019
Messages
3,641
what is the recommended way to migrate from an encrypted dataset to an unencrypted one?
You have to be extra careful using the command-line.

If this encrypted dataset has no children, you can replicate it into a new unencrypted dataset by first creating a new snapshot, and then sending that snapshot while overriding the encryption property on the receiving side.


First switch to the root user if you are not already root (not sure how SCALE handles this):
Code:
sudo su


Create the migration snapshot:
Code:
zfs snap mypool/cryptdata@migrate-`date +%Y-%m-%d_%H-%M`


Send the snapshot to a new dataset (which does not exist yet). The new dataset will be created by this replication.
Code:
zfs send -v -L -e -c mypool/cryptdata@migrate-2023-07-03_14-38 | zfs recv -v -o encryption=off mypool/plaindata


You will now have a new unencrypted dataset named "plaindata" which will be as up-to-date as the moment you created the new "migrate" snapshot.

A quick breakdown:
  • mypool = the name of your pool (also the name of your root dataset)
  • cryptdata = the name of the currently encrypted dataset
  • plaindata = the name of the new unencrypted dataset (which doesn't exist until you run the send/recv)
  • @migrate-`date +%Y-%m-%d_%H-%M` = the new snapshot name which will be timestamped (with the "zfs snap" command)
  • @migrate-2023-07-03_14-38 = an example of how the actual snapshot name will appear
  • -v = be verbose
  • -L = large blocks support
  • -e = more efficient embedded stream
  • -c = more efficient compressed stream (don't "decompress" records that are already compressed)
  • -o = override a property on the receiving side (in this case, disable encryption)

If everything looks good, you can now destroy the old dataset.

However, keep in mind that you will not have any snapshots sent over, except for the most recent "migrate" snapshot.

The above method will not send any "nested children" living underneath your source dataset.

The above method requires that you (temporarily) have enough free space for both datasets, since nothing is deleted automatically.
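
Before destroying the old dataset, it's worth a sanity check that the copy is complete. A minimal sketch, assuming the same example names and the default SCALE mountpoints under /mnt:
Code:
# Compare space usage of both datasets
zfs list -o name,used,usedbydataset mypool/cryptdata mypool/plaindata

# Dry-run rsync with checksums; it itemizes any differences,
# so no output means the two trees match
rsync -n -a -c -i --delete /mnt/mypool/cryptdata/ /mnt/mypool/plaindata/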



EDIT: I should add that it's recommended to do the above in a new "tmux" session, so that you can close the window without interrupting the send/recv process. It's also advisable to use a "resume token" in case the process is interrupted or aborts.
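
A sketch of what that looks like, reusing the example names above (the resume token value is a placeholder you'd copy from the actual output):
Code:
# Start a named tmux session first
tmux new -s migrate

# "-s" on the receiving side saves a resumable state if the stream aborts
zfs send -v -L -e -c mypool/cryptdata@migrate-2023-07-03_14-38 | zfs recv -s -v -o encryption=off mypool/plaindata

# If it's interrupted, fetch the resume token from the target dataset...
zfs get -H -o value receive_resume_token mypool/plaindata

# ...and restart the stream from where it left off
zfs send -t <token-from-above> | zfs recv -s -v mypool/plaindata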
 

grigory

Dabbler
Joined
Dec 22, 2022
Messages
32
Thanks for this; this approach seems more comprehensive. I appreciate you walking through the steps.
ZFS is new to me and I'm still learning. It always feels safer to do the familiar thing, but I will definitely use this approach next time.
 
Joined
Oct 22, 2019
Messages
3,641
It goes without saying, nothing is a substitute for having up-to-date backups.

You don't want to find yourself in a situation where you've lost everything because of a typo, fat-fingering the wrong syntax in a command, or accidentally deleting the wrong target.
 