data copy to a new server (10TB)

LucP

Dabbler
Joined
Apr 20, 2019
Messages
13
I tried replication instead of TeraCopy to copy ~10TB of files to a new server and I get errors... Any suggestions would be greatly appreciated. It looks like it is trying to unmount the source and destination volumes (?!) and cannot. This is the first time I have attempted to use replication, so it is possible/likely I am doing something wrong. TrueNAS 13.0-U4 on both servers (HP G10 MicroServers with AMD quad-core CPUs), 32GB ECC RAM and a 128GB boot SSD on both, RAIDZ3 for both source and destination volumes, both servers connected to the same switch. Using root with no encryption for SSH (home network; personal but not critical data).

[2023/04/05 09:10:20] INFO [Thread-44] [zettarepl.paramiko.replication_task__task_1] Connected (version 2.0, client OpenSSH_8.8-hpn14v15)
[2023/04/05 09:10:20] INFO [Thread-44] [zettarepl.paramiko.replication_task__task_1] Authentication (publickey) successful!
[2023/04/05 09:10:20] INFO [replication_task__task_1] [zettarepl.retention.calculate] Not destroying 'auto-2023-04-05_00-00' as it is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M'
[2023/04/05 09:10:20] INFO [replication_task__task_1] [zettarepl.retention.calculate] Not destroying 'auto-2023-04-05_00-00' as it is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M'
[2023/04/05 09:10:20] INFO [replication_task__task_1] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2023/04/05 09:10:20] WARNING [replication_task__task_1] [zettarepl.replication.run] No incremental base for replication task 'task_1' on dataset 'zraid3_9x8T/av', destroying destination dataset
[2023/04/05 09:10:21] ERROR [replication_task__task_1] [zettarepl.replication.run] For task 'task_1' unhandled replication error ExecException(1, "cannot unmount '/mnt/zr3-5x8Toshiba/a/h': pool or dataset is busy\ncannot unmount '/mnt/zr3-5x8Toshiba/a': pool or dataset is busy\n")
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/zettarepl/replication/run.py", line 181, in run_replication_tasks
retry_stuck_replication(
File "/usr/local/lib/python3.9/site-packages/zettarepl/replication/stuck.py", line 18, in retry_stuck_replication
return func()
File "/usr/local/lib/python3.9/site-packages/zettarepl/replication/run.py", line 182, in <lambda>
lambda: run_replication_task_part(replication_task, source_dataset, src_context, dst_context,
File "/usr/local/lib/python3.9/site-packages/zettarepl/replication/run.py", line 279, in run_replication_task_part
run_replication_steps(step_templates, observer)
File "/usr/local/lib/python3.9/site-packages/zettarepl/replication/run.py", line 542, in run_replication_steps
step_template.dst_context.shell.exec(["zfs", "destroy", "-r", step_template.dst_dataset])
File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/interface.py", line 92, in exec
return self.exec_async(args, encoding, stdout).wait(timeout)
File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/local.py", line 80, in wait
raise ExecException(self.process.returncode, stdout)
zettarepl.transport.interface.ExecException: cannot unmount '/mnt/zr3-5x8Toshiba/a/h': pool or dataset is busy
cannot unmount '/mnt/zr3-5x8Toshiba/a': pool or dataset is busy
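
For what it's worth, the log shows zettarepl running `zfs destroy -r` on the destination dataset, which first has to unmount it; "pool or dataset is busy" means something (an SMB session, a shell sitting in the directory, a jail) still holds the mountpoint open. A hedged diagnostic sketch for finding the holder, assuming the mountpoint from the error above (`fstat` is the FreeBSD/TrueNAS CORE tool, `lsof` a Linux fallback; neither is guaranteed to be installed):

```shell
#!/bin/sh
# Mountpoint taken from the "pool or dataset is busy" error above
mnt="/mnt/zr3-5x8Toshiba/a"

# List any processes with open files under the mountpoint; those are what
# make "zfs unmount" / "zfs destroy -r" fail with "busy".
if command -v fstat >/dev/null 2>&1; then
    fstat -f "$mnt" || true                                   # FreeBSD / CORE
elif command -v lsof >/dev/null 2>&1; then
    lsof +D "$mnt" 2>/dev/null || echo "no open files under $mnt"
else
    echo "neither fstat nor lsof available on this system"
fi
```

Once the holders are gone (e.g. the SMB share on the destination is disabled), the unmount should succeed.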
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Does the target dataset already exist on the target system (in which case it's likely mounted)? Can you just specify an additional level for the backup, one that doesn't exist yet?
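
In shell terms, the suggestion amounts to something like the following sketch (the dataset names are placeholders, not taken from the thread):

```shell
# The parent dataset already exists, is mounted, and is shared over SMB, so
# zettarepl cannot unmount or destroy it. Pointing the replication task at a
# child dataset that does not exist yet lets zettarepl create, and fully own,
# that dataset instead of fighting the mounted share.
parent="zr3-5x8Toshiba/a"   # existing, mounted, SMB-shared (placeholder name)
target="$parent/repl"       # extra level that does not exist yet (placeholder)
echo "set the replication task's destination dataset to: $target"
```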
 

LucP

Dabbler
Joined
Apr 20, 2019
Messages
13
Target is empty: just the dataset created, and visible over the network (SMB shared). I ran some benchmarks after mounting it as a network drive to confirm that it was set up correctly.
 

LucP

Dabbler
Joined
Apr 20, 2019
Messages
13
Pretty much the same settings, but the replication is running on the source, not on the destination server, and it seems to work fine. Other deltas from before:
The source dataset is not shared and is on a slower system running RAIDZ2 (an HP MicroServer N40L with 16GB RAM, dual-core AMD CPU, and 5 × 4TB disks in RAIDZ2) instead of a G10 with 32GB RAM and a quad-core AMD CPU.
Same as before: new SSH connection, no encryption, running as root, a couple of snapshots present on the source, and an empty destination dataset.

TeraCopy copied 1.5TB of files with no issues between the two G10 servers where replication failed (~16 hours including the verify step).

Utterly confused :( I will try a couple of setup variations to understand what works and what doesn't with replication under 13.0-U4.
 

LucP

Dabbler
Joined
Apr 20, 2019
Messages
13
N40L / 16GB / TrueNAS 13.0-U3.1 -> G10 / 32GB / TrueNAS 13.0-U4 (push) seems to work as expected (transfers to two separate servers, RAIDZ2 to RAIDZ3). The destination dataset was empty, but initially I did get "destination not empty, data overwrite prohibited" errors, maybe because I had copied and then deleted a couple of files by hand before. I deleted and re-created the dataset on the G10 (so it was guaranteed empty), and after that the one-time replication to the empty destination started to work.

G10#1 to G10#2, or G10#1 to G10#1 (same-server replication): no go, regardless of push or pull. I used "auto-%Y-%m-%d_%H-%M" as the snapshot naming schema and created a couple of snapshots on the source, but no success so far; the messages range from missing snapshots to I/O errors (which are not reflected anywhere else). I will try again after the replication currently in progress is complete.
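
As a side note on the naming schema: "auto-%Y-%m-%d_%H-%M" is a strftime-style pattern, so a snapshot name that matches it can be generated or sanity-checked with date(1). A minimal sketch, in case the "missing snapshots" messages come from snapshot names that don't fit the task's schema (an assumption on my part):

```shell
# Generate a snapshot name matching the schema "auto-%Y-%m-%d_%H-%M".
# Replication tasks only consider snapshots whose names fit their schema,
# so a manually created snapshot with a different name would be ignored.
snapname="auto-$(date +%Y-%m-%d_%H-%M)"
echo "$snapname"
```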
 