Best way to duplicate data inside an existing dataset?


Pistolwhip

Dabbler
Hi all,

So, I've got what would seem to be a fairly simple operation to complete on a system I inherited at a new job.

Put simply, I need to duplicate a very large (8 TB) directory structure that exists under a dataset into a new folder under that same dataset.

A mockup looks like this:

Code:
tank/
└── Array/
    ├── Data/
    │   └── directoryTree (copy source)
    └── DataCopy/
        └── directoryTree (copy destination)


where Array is the ZFS dataset, and directoryTree is what I need to copy from Data to DataCopy.

Should be quite simple with rsync inside screen/tmux, right?
Code:
rsync -avhP /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/
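
For reference, the way I've been launching it looks roughly like this (the tmux session name is just an example):

Code:
# start a named tmux session so the copy survives a dropped SSH connection
tmux new -s copyjob
# run the copy inside the session
rsync -avhP /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/
# detach with Ctrl-b d; reattach later with:
tmux attach -t copyjob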


However, upon execution it runs for about an hour and then silently hangs. Running with additional verbosity and then interrupting the operation shows that, for some reason, it is failing with: failed: Operation not permitted. (1)
I've checked all permissions and tried running as both the owning user and root; same behavior. As far as I know, the system is pure vanilla FreeNAS with no modifications, and it was updated to 11.1 about two weeks ago.
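
For anyone who wants to reproduce the diagnosis, a rough sketch of how I captured the failure output (the log path is just an example):

Code:
# re-run with per-file itemization and keep a log of everything rsync prints
rsync -avhP --itemize-changes /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/ 2>&1 | tee /tmp/rsync-copy.log
# then search the log for the failing entries
grep -i 'not permitted' /tmp/rsync-copy.log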
Is there a FreeNAS/ZFS-specific method for an operation like this that I'm not aware of and that would work better?

System specs:

MB: X10SLR-F-0
CPU: E5-1650
RAM: 64GB DDR4
HBAs: all m1015
HDD: 22x 6TB ST6000VN0001 (Seagate Enterprise)

ZFS pool layout:
Code:
tank
├── raidz3-0
│   └── 11x HDD
└── raidz3-1
    └── 11x HDD


Total utilization is at about 35%.

I realize the HW needs some work. It definitely needs more RAM, and the pool layout leaves much to be desired. I'm in the process of fixing this, but didn't think it would interfere with simple file operations. I've included it for completeness in the event this is not the case.
 

joeschmuck

Old Man
Moderator
I can't tell you if this is the problem; however, FreeNAS 11.1 has a memory leak. 11.1-U1 should have the memory leak fixed. There is no memory leak in 11.0-U4.

EDIT: Why don't you just use a cp command instead of rsync? It sure seems like a much easier way to do things.
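
Something along these lines ought to do it (paths taken from your example; on FreeBSD, -a is archive mode, equivalent to -RpP, so it recurses and preserves ownership, modes, and timestamps):

Code:
# recursive copy that preserves ownership, permissions, and timestamps
cp -a /mnt/tank/Array/Data/directoryTree /mnt/tank/Array/DataCopy/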
 

Pistolwhip

Dabbler
Thanks for the tip about the memory leak; I wasn't aware of that at all, and will update before attempting the xfer again.
As for why rsync... uh, mostly habit? Most of my work is with large clusters of remote systems, so I've just gotten used to defaulting to it. Plus the rectification features are somewhat nice for partials/interruptions. I'll give good ol' cp a shot too if it fails again post-update.

As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
 

danb35

Hall of Famer
As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
Ah, no, I'd thought it would be going to a different dataset.
 

joeschmuck

Old Man
Moderator
I'd vote for replication--perhaps not as simple as cp, but this is what it's designed for.
As for why rsync... uh, mostly habit? Most of my work is with large clusters of remote systems, so I've just gotten used to defaulting to it. Plus the rectification features are somewhat nice for partials/interruptions.
And I learn more again today :cool:
 

Chris Moore

Hall of Famer
As for replication, can you do that inside the same dataset? The copy is between two subdirectories in the same dataset.
I understand you want a copy, but may I ask why? One thing about ZFS is that you can take a point-in-time snapshot of the file system and, using the command line, mount that snapshot read-only at another mount point. It is like making a backup that only costs you the storage space of the differences: as the original continues to change over time, the space used by the snapshot grows only by the amount of changed data, because both the original blocks (as of snapshot creation) and the new blocks are kept.
I suppose it depends on why you want a copy, but this might be much MUCH faster (almost instant) and cost you almost no space on disk, at least to start.
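
A minimal sketch of what that looks like (the snapshot name is just an example):

Code:
# take a point-in-time snapshot of the dataset
zfs snapshot tank/Array@before-testing
# the snapshot is then browsable read-only under the hidden .zfs directory
ls /mnt/tank/Array/.zfs/snapshot/before-testing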
 

Pistolwhip

Dabbler
That's super cool; I didn't know you could use snapshots that way!
The reason we need a copy is that the data is going to be heavily modified and re-sorted for some testing/experimentation efforts that might wind up being rather destructive. So it needs to be RW, and we don't want to use the original dataset since it needs to remain intact to support its current uses.

Ideally I'd have a second SAN for this type of stuff, but that's still in the works (and would allow me to have, you know, actual redundancy/backups).
 

Chris Moore

Hall of Famer
That's super cool; I didn't know you could use snapshots that way!
The reason we need a copy is that the data is going to be heavily modified and re-sorted for some testing/experimentation efforts that might wind up being rather destructive. So it needs to be RW, and we don't want to use the original dataset since it needs to remain intact to support its current uses.

Ideally I'd have a second SAN for this type of stuff, but that's still in the works (and would allow me to have, you know, actual redundancy/backups).
You could still make a snapshot and do a ZFS send and receive to copy this dataset into another dataset in the local storage pool. You could mount the 'copy' dataset under a new share name if you wanted. This would still give you a snapshot to roll back to if you needed to go back to a 'restore point', but you could have two fully interactive shares living next to each other in the same zpool.

Here is a fun little tutorial on how to do a send and receive:
http://blog.fosketts.net/2016/08/18/migrating-data-zfs-send-receive/

You just need to create a new dataset to be the target instead of creating a subdirectory inside your existing dataset.
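
Roughly like this (the dataset and snapshot names are just examples; zfs receive creates the target dataset as part of the operation):

Code:
# snapshot the source dataset
zfs snapshot tank/Array@copy1
# replicate it into a new dataset on the same pool
zfs send tank/Array@copy1 | zfs receive tank/ArrayCopy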
 

Pistolwhip

Dabbler
Quick update for future searchers:

Seems this issue has something to do with rsync. Repeating the copy attempt using plain old cp seems to work fine (it's about 90% complete as of this writing, whereas rsync never managed to hit 5%). Still not sure exactly what's happening there.
 

Chris Moore

Hall of Famer
Quick update for future searchers:

Seems this issue has something to do with rsync. Repeating the copy attempt using plain old cp seems to work fine (it's about 90% complete as of this writing, whereas rsync never managed to hit 5%). Still not sure exactly what's happening there.
I have a cron job that does an rsync between two pools in the same NAS for backup purposes once a week, and that works perfectly. I don't see why it would fail just because the source and destination are both in the same pool.

You might want to look up the BSD version of rsync to ensure that your -avhP flags are actually supported. I ran into an issue about a year ago where I found that the Linux and BSD versions of rsync had some different switches, and the rsync in that scenario was failing because it was trying to use a switch that didn't exist in BSD.
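
A quick way to check what you actually have (nothing here is FreeNAS-specific):

Code:
# show which rsync binary is in use and its version
which rsync
rsync --version
# the supported switches are documented in the man page
man rsync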
 