Best way to duplicate data inside an existing dataset?


Pistolwhip

Dabbler
Hi all,

So, I've got what would seem to be a fairly simple operation to complete on a system I inherited at a new job.

Put simply, I need to duplicate a very large (8 TB) directory structure that exists under a dataset into a new folder under that same dataset.

A mockup looks like this:

Code:
tank/
└── Array/
    ├── Data/
    │   └── directoryTree (copy source)
    └── DataCopy/
        └── directoryTree (copy destination)


where Array is the ZFS dataset, and directoryTree is what I need to copy from Data to DataCopy.

Should be quite simple with rsync inside screen/tmux, right?
Code:
rsync -avhP /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/
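
For reference, the way I've been launching it looks roughly like this (the tmux session name is just an example):

Code:
# start a named tmux session so the copy survives a dropped SSH connection
tmux new -s copyjob
# run the copy inside the session
rsync -avhP /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/
# detach with Ctrl-b d; reattach later with:
tmux attach -t copyjob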


However, upon execution it runs for about an hour and then silently hangs. Running with additional verbosity and then interrupting the operation shows that, for some reason, it is failing with: failed: Operation not permitted. (1)
I've checked all permissions and tried running as both the owning user and root; same behavior. As far as I know, the system is pure vanilla FreeNAS with no modifications, and it was updated to 11.1 about two weeks ago.
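
For anyone who wants to reproduce the diagnosis, a rough sketch of how I captured the failure output (the log path is just an example):

Code:
# re-run with per-file itemization and keep a log of everything rsync prints
rsync -avhP --itemize-changes /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/ 2>&1 | tee /tmp/rsync-copy.log
# then search the log for the failing entries
grep -i 'not permitted' /tmp/rsync-copy.log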
Is there a FreeNAS/ZFS-specific method for an operation like this that I'm not aware of and that would work better?

System specs:

MB: X10SLR-F-0
CPU: E5-1650
RAM: 64GB DDR4
HBAs: all m1015
HDD: 22x 6TB ST6000VN0001 (Seagate Enterprise)

ZFS pool layout:
Code:
tank
├── raidz3-0
│   └── 11x HDD
└── raidz3-1
    └── 11x HDD


Total utilization is at about 35%.

I realize the HW needs some work. It definitely needs more RAM, and the pool layout leaves much to be desired. I'm in the process of fixing this, but didn't think it would interfere with simple file operations. I've included it for completeness in the event this is not the case.
 

joeschmuck

Old Man
Moderator
I can't tell you if this is the problem; however, FreeNAS 11.1 has a memory leak. 11.1-U1 should have the memory leak fixed. There is no memory leak in 11.0-U4.

EDIT: Why don't you just use a cp command instead of rsync? It sure seems like a much easier way to do things.
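
Something along these lines ought to do it (paths taken from your example; on FreeBSD, -a is archive mode, equivalent to -RpP, so it recurses and preserves ownership, modes, and timestamps):

Code:
# recursive copy that preserves ownership, permissions, and timestamps
cp -a /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/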
 

Pistolwhip

Dabbler
Thanks for the tip about the memory leak; I wasn't aware of that at all, and will update before attempting the xfer again.
As for why rsync... uh, mostly habit? Most of my work is with large clusters of remote systems, so I've just gotten used to defaulting to it. Plus the rectification features are somewhat nice for partials/interruptions. I'll give good ol' cp a shot too if it fails again post-update.

As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
 

danb35

Hall of Famer
As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
Ah, no, I'd thought it would be going to a different dataset.
 

joeschmuck

Old Man
Moderator
I'd vote for replication--perhaps not as simple as cp, but this is what it's designed for.
As for why rsync... uh, mostly habit? Most of my work is with large clusters of remote systems, so I've just gotten used to defaulting to it. Plus the rectification features are somewhat nice for partials/interruptions.
And I learn more again today :cool:
 

Chris Moore

Hall of Famer
As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
I understand you want a copy, but may I ask why? One thing about ZFS is that you can take a point-in-time snapshot of the file system and, using the command line, mount that snapshot read-only at another mount point. It is like making a backup that only costs you the storage space of the differences: as the original continues to change over time, the space used by the snapshot grows only by the amount of changed data, because both the original blocks (as of snapshot creation) and the new blocks are kept.
I suppose it depends on why you want a copy, but this might be much MUCH faster (almost instant) and cost you almost no space on disk, at least to start.
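
A minimal sketch of what that looks like (the snapshot name is just an example):

Code:
# take a point-in-time snapshot of the dataset
zfs snapshot tank/Array@before-testing
# the snapshot is then browsable read-only under the hidden .zfs directory
ls /mnt/tank/Array/.zfs/snapshot/before-testing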
 

Pistolwhip

Dabbler
That's super cool; I didn't know you could use snapshots that way!
The reason we need a copy is that the data is going to be heavily modified and re-sorted for some testing/experimentation efforts that might wind up being rather destructive. So it needs to be RW, and we don't want to use the original dataset since it needs to remain intact to support its current uses.

Ideally I'd have a second SAN for this type of stuff, but that's still in the works (and would allow me to have, you know, actual redundancy/backups).
 

Chris Moore

Hall of Famer
That's super cool; I didn't know you could use snapshots that way!
The reason we need a copy is that the data is going to be heavily modified and re-sorted for some testing/experimentation efforts that might wind up being rather destructive. So it needs to be RW, and we don't want to use the original dataset since it needs to remain intact to support its current uses.

Ideally I'd have a second SAN for this type of stuff, but that's still in the works (and would allow me to have, you know, actual redundancy/backups).
You could still make a snapshot and do a ZFS send and receive to copy this dataset into another dataset in the local storage pool. You could mount the 'copy' dataset under a new share name if you wanted. This would still give you a snapshot to roll back to if you needed to go back to a 'restore point', but you could have two fully interactive shares living next to each other in the same zpool.

Here is a fun little tutorial on how to do a send and receive:
http://blog.fosketts.net/2016/08/18/migrating-data-zfs-send-receive/

You just need to create a new dataset to be the target instead of creating a subdirectory inside your existing dataset.
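
Roughly like this (the dataset and snapshot names are just examples; zfs receive creates the target dataset as part of the operation):

Code:
# snapshot the source dataset
zfs snapshot tank/Array@copy1
# replicate it into a new dataset on the same pool
zfs send tank/Array@copy1 | zfs receive tank/ArrayCopy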
 

Pistolwhip

Dabbler
Quick update for future searchers:

Seems this issue has something to do with rsync. Repeating the copy attempt using plain old cp seems to work fine (it's about 90% complete as of this writing, whereas rsync never managed to hit 5%). Still not sure exactly what's happening there.
 

Chris Moore

Hall of Famer
Quick update for future searchers:

Seems this issue has something to do with rsync. Repeating the copy attempt using plain old cp seems to work fine (it's about 90% complete as of this writing, whereas rsync never managed to hit 5%). Still not sure exactly what's happening there.
I have a cron job that does an rsync between two pools in the same NAS for backup purposes once a week, and that works perfectly. I don't see why it would fail just because the source and destination are both in the same pool.

You might want to look up the BSD version of rsync to ensure that your -avhP flags are actually supported. I ran into an issue about a year ago where I found that the Linux and BSD versions of rsync had some different switches, and the rsync in that scenario was failing because it was trying to use a switch that didn't exist in BSD.
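
A quick way to check what you actually have (nothing here is FreeNAS-specific):

Code:
# show which rsync binary is in use and its version
which rsync
rsync --version
# the supported switches are documented in the man page
man rsync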
 