SOLVED Moving all files to a "sub dataset"

Joined
Sep 18, 2019
Messages
9
On a FreeNAS system I inherited, the customer put all his files in the root dataset, which was eventually shared over SMB.

To stop the jails dataset from showing up among the projects, I would like to move the work files to a sub-dataset (named "projects") and share that one instead.

My problem is that I'm talking of 70 TiB of data.

Is there any combination of "snapshot -> clone -> promote" that would let me avoid actually moving/copying this amount of data?

If not, I'll probably live with it instead of stopping a lot of co-workers for an unknown amount of time :/

Thank you for any pointers!
 
Joined
Sep 18, 2019
Messages
9
I do not believe this is a huge factor, but I forgot to add this:

Build FreeNAS-11.1-U7
Platform Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Memory 65465MB
 

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
You could do a zfs send/receive if you have enough free space. Then do an incremental to bring it up to date while your share is offline to minimize down time.
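
For reference, that approach could look roughly like this (pool and dataset names are made up; `tank` stands for the pool whose root dataset holds the files):

```shell
# Initial full copy -- the share can stay online during this
zfs snapshot tank@migrate-1
zfs send tank@migrate-1 | zfs receive tank/projects

# Later: take the share offline, then send only what changed since
zfs snapshot tank@migrate-2
zfs send -i tank@migrate-1 tank@migrate-2 | zfs receive -F tank/projects
```

The `-F` on the incremental receive rolls the target back to the last received snapshot in case anything touched it in between.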
 
Joined
Sep 18, 2019
Messages
9
Thanks. I'm afraid I won't have enough room for so much. I think I'll have to plan a maintenance weekend where I'll move projects one after another, freeing the root dataset each time.

I realize this is more of a ZFS question than a FreeNAS question, actually.

I hoped someone would have a magical way to do this without "really" moving the files.

Which makes me wonder whether I should move them with rsync or cp.

My experience with rsync is that it's horrendously slow on the kind of data I have (lots of folders, each with ~20k PNGs of ~1.6 MB).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
Were I in this situation, I'd probably just use mv.

I'm not sure if this behavior has changed recently, but using mv on a dataset with Windows permissions will give "operation not supported" errors, since internally it tries to chmod.

Is there any combination of "snapshot -> clone -> promote" that would let me avoid to really move/copy this amount of data ?

You could clone the root dataset, but promote won't do any good: you won't be able to destroy the root dataset after promoting the clone, since the root will always have children. Thus, the original snapshot would become permanent. If this is acceptable, then clone to a new dataset and delete the files that are not needed in root.
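
In commands, that option would look something like this (names are examples; the clone is near-free because it shares blocks with the snapshot, but the snapshot can then never be destroyed):

```shell
zfs snapshot tank@split
zfs clone tank@split tank/projects   # near-instant, no data copied

# Then prune each side: remove jails/etc. from inside tank/projects,
# and remove the work files from tank itself.
# tank@split stays pinned for as long as the clone exists.
```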
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Incremental rsync of each folder. Write a script that walks the dataset root and copies over the content to the new dataset and then removes the source folder. You will have to turn off snapshots and remove all of them to gain the space needed if utilization is above 40%. If it’s below you can just zfs send.
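
A rough, untested sketch of such a script (paths and the excluded names are assumptions -- adapt and dry-run before touching real data):

```shell
#!/bin/sh
# Walk the top-level folders of the root dataset and move them, one at
# a time, into the new sub-dataset, freeing source space as we go.
SRC=/mnt/tank            # root dataset mountpoint (example path)
DST=/mnt/tank/projects   # new sub-dataset (example path)

for dir in "$SRC"/*/; do
    name=$(basename "$dir")
    # Skip the destination itself and any system datasets (adjust this list)
    case "$name" in projects|jails|iocage) continue ;; esac

    # Copy one project, then delete the source only if rsync succeeded
    rsync -aH "$dir" "$DST/$name/" && rm -rf "$dir"
done
```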

And then the common sense (that isn’t that common) disclaimers. Don’t run untested scripts on prod data. Make sure backups are up to date if there are any. Test, then test and then test some more. Don’t take shortcuts, make every step part of a scripted process.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
but using mv on a dataset with windows permissions will give operation not supported errors since internally it tries to chmod.
Good point--I avoid using Windows permissions, so I don't run into this.
 
Joined
Sep 18, 2019
Messages
9
You could clone the root dataset, but promote won't do any good: you won't be able to destroy the root dataset after promoting the clone, since the root will always have children. Thus, the original snapshot would become permanent. If this is acceptable, then clone to a new dataset and delete the files that are not needed in root.

Thank you Philip. I also found this possibility, which would work for the time being but feels downright ugly. I probably couldn't sleep at night anymore, knowing it's there, somewhere, waiting for me ;) What I was hoping for was a way to promote a sub-dataset to root dataset (while everything is unmounted) and re-mount everything after that. But that is not a feature of ZFS, I'm afraid (not sure it's a feature of any modern filesystem, tbh).

Incremental rsync of each folder. Write a script that walks the dataset root and copies over the content to the new dataset and then removes the source folder. You will have to turn off snapshots and remove all of them to gain the space needed if utilization is above 40%. If it’s below you can just zfs send.

Thank you Garm. This seems to be the hard way I'll have to follow.

Something else popped in my mind, though:

1) Un-share the projects folder in the root dataset
Create a subfolder "old_projects", move everything into it and re-share (all this takes roughly 30 seconds)

At this point, users do not see a difference except that jails folder and other datasets cannot be seen anymore (I'm not sure how intelligently snapshots handle this, but I'll find out)

2) Create a sub-dataset "projects" which, thanks to the preceding actions, won't appear to users (who would be quick to wonder and test >_<)
Create the project folders in the new sub-dataset and mount_nullfs the real project folders onto them
Again, un-share/re-share against the new dataset

At this point the system is running, everything seems to work, and nothing has really changed.

3) Each time I can organize downtime, I unmount a project, rsync it to the sub-dataset, and delete it from the root dataset.
Or I just have the project leader tell every co-worker/freelancer not to touch the project until further notice while I'm moving the files, and count on the fact that people are responsible and can be trusted and... just kidding.

Does it sound like evil and catastrophic sorcery? Yes.
Is there an elegant solution to my problem that doesn't involve a week of downtime (considering %used > 60% and rsync taking ages on lots of huge folders of ~20k files)? Doesn't seem like it :/
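
Step 2 of the plan above might look like this (paths are examples):

```shell
# Create the hidden sub-dataset and loop-mount the real project
# folders into it with nullfs (no data is copied).
zfs create tank/projects
mkdir -p /mnt/tank/projects/projectA
mount_nullfs /mnt/tank/old_projects/projectA /mnt/tank/projects/projectA
# ...repeat for each project, then point the SMB share at
# /mnt/tank/projects instead of the root dataset.
```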
 
Joined
Sep 18, 2019
Messages
9
3) Each time I can organize downtime, I unmount a project, rsync it to the sub-dataset, and delete it from the root dataset.

Actually, I refined point 3) :

I put the mount_nullfs mounts in a subfolder of my new sub-dataset and share that
I rsync a project to the root of my projects sub-dataset
When the project and its mirror are nearly identical:
  • I run a last rsync
  • unmount the project
  • move the rsync mirror to the shared folder
  • delete the original project from the root dataset
When all is done (all projects have been moved), I can move the projects one folder lower and redo the share.

I believe with this, I achieve minimal downtime.
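
The per-project cutover described above, sketched in commands (paths are examples; downtime is only the final delta plus a rename within the same dataset):

```shell
SRC=/mnt/tank/old_projects/projA       # original, on the root dataset
MIRROR=/mnt/tank/projects/projA        # rsync mirror at the sub-dataset root
SHARE=/mnt/tank/projects/shared        # subfolder holding the nullfs mounts

rsync -aH --delete "$SRC/" "$MIRROR/"  # last catch-up pass
umount "$SHARE/projA"                  # drop the nullfs mount
mv "$MIRROR" "$SHARE/projA"            # same dataset, so instant rename
rm -rf "$SRC"                          # free the space on the root dataset
```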

During all this, I maintain a mirror on a backup server that connects to the SMB share (in order to always see what other users see and back up the current data).

Once I'm done with all the crap, of course, I'll use snapshots and replication as intended on FreeNAS.

Let this thread be a tribute to "never put your files directly in the root dataset" (also, as I said, I inherited the crap).
 