Hello,
I want to "convert" a directory that contains a bunch of data (multiple TBs) into its own dataset. I'd like to do this in a way that has the least possible impact on my replication backups. That is, ideally, after moving the data into its own dataset, those multiple TBs will not need to be re-copied during my next replication backup. I'm not sure this is even possible, but I'd like to float an idea out there and see what others with more experience think of it.
So right now, I have the following datasets:
masterData
masterData/lion
masterData/tiger
And the masterData dataset has a directory in it called "bear" with many TBs of data. Ultimately, I want the contents of the bear directory to belong to its own dataset called masterData/bear. Naturally, bear's data should no longer be stored in the masterData dataset.
I found this thread, which explains a possible strategy:
Code:
# Create a snapshot of masterData
zfs snapshot masterData@mybearsnap

# Clone the snapshot (should be quick since it uses copy-on-write, but is this
# handled smartly in a subsequent recursive replication stream of masterData?)
zfs clone masterData@mybearsnap masterData/newbear

# Remove all directories that are part of masterData but not part of bear
rm -rf /mnt/masterData/newbear/otherDir /mnt/masterData/newbear/otherDir2

# Move the contents of the bear subdirectory into the root of the newbear dataset
mv /mnt/masterData/newbear/bear/* /mnt/masterData/newbear/
mv /mnt/masterData/newbear/bear/.* /mnt/masterData/newbear/  # note: in some shells .* also matches . and ..

# No need for this empty directory
rmdir /mnt/masterData/newbear/bear

# Remove the contents of the existing bear directory on masterData
# (obviously make sure it's not in use first)
rm -rf /mnt/masterData/bear

# Rename the newbear dataset to bear
zfs rename masterData/newbear masterData/bear

# Remove the snapshot. Note: as written this will fail with "snapshot has
# dependent clones" while the clone relationship exists; zfs promote
# masterData/bear would move the snapshot (and the dependency) onto the
# clone rather than eliminate it.
zfs destroy masterData@mybearsnap
On to my questions:
- Does this seem like a workable solution?
- This post mentions inefficiency, which has me concerned. If I understand correctly, since I remove the old snapshot in the last step, this shouldn't leave a bunch of deleted-but-still-referenced files around. Is that understanding correct, or is this inefficient in some way I'm not noticing (i.e., does it waste a bunch of space that will never be recovered, and if so, why)?
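For whatever it's worth, ZFS exposes the space accounting needed to answer this empirically. A sketch of how one might check where the space actually lives before and after the final destroy (dataset names match the example above):

```shell
# Per-dataset space breakdown: space pinned by snapshots (usedbysnapshots)
# vs. space used by the dataset's own current data (usedbydataset)
zfs list -r -o name,used,usedbydataset,usedbysnapshots,refer masterData

# Show each dataset's origin snapshot; anything other than "-" means the
# dataset is a clone that still shares (and pins) blocks with that snapshot
zfs get -r origin masterData
```

If `usedbysnapshots` on masterData stays large after the cleanup steps, that would be the wasted space the post is worried about.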
- As mentioned, I back up my volumes using recursive replication of the masterData dataset to a remote server. My bandwidth is metered on a monthly basis, so I don't want this dataset creation to result in the transfer of the many TBs of data contained in the bear dataset. Given that snapshot clones are COW (see step two), is this accounted for in the replication stream such that no new data will need to be transferred?
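One way to sanity-check this before burning any metered bandwidth: `zfs send` has a dry-run mode that prints an estimated stream size without sending anything. A sketch, where `backup-prev` and `backup-next` are placeholders for whatever snapshot names your replication tasks actually use:

```shell
# Take the next recursive snapshot as usual
zfs snapshot -r masterData@backup-next

# Dry run (-n) with verbose output (-v): print the estimated size of the
# incremental (-i) recursive replication stream (-R) from the previous
# backup snapshot, without actually sending any data
zfs send -R -n -v -i masterData@backup-prev masterData@backup-next
```

If the estimate comes back in the multi-TB range, the answer to the question is no, and the scheme needs rethinking before the real replication runs.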