jmruc
Cadet
- Joined
- Dec 26, 2022
- Messages
- 4
I have a PULL replication task on TrueNAS-13.0-U3.1 that runs for a few minutes and then gets completely stuck, making no further progress. It still hasn't finished the initial transfer of a 60 GB dataset. From the network graph and the used storage, I can see that data moves for anywhere from several seconds to several minutes before it stalls. If I leave it for a while, I get a "Stopping stuck replication process" warning and the task is marked as failed. How do I diagnose what the problem is?
The full log is:
Code:
[2022/12/27 12:38:48] INFO [Thread-5] [zettarepl.paramiko.replication_task__task_1] Connected (version 2.0, client OpenSSH_8.4p1)
[2022/12/27 12:38:48] INFO [Thread-5] [zettarepl.paramiko.replication_task__task_1] Authentication (publickey) successful!
[2022/12/27 12:38:53] INFO [replication_task__task_1] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2022/12/27 12:38:53] INFO [replication_task__task_1] [zettarepl.replication.run] Resuming replication for destination dataset 'tank/backup/releases/photoprism/volumes/pvc-8af53ee9-0479-47af-95fb-151da4fcb107'
[2022/12/27 12:38:53] INFO [replication_task__task_1] [zettarepl.replication.run] For replication task 'task_1': doing pull from 'hoard/ix-applications/releases/photoprism/volumes/pvc-8af53ee9-0479-47af-95fb-151da4fcb107' to 'tank/backup/releases/photoprism/volumes/pvc-8af53ee9-0479-47af-95fb-151da4fcb107' of snapshot=None incremental_base=None receive_resume_token='1-1a5f1d5005-158-789c6d905d4ec3300cc783c4d784c419b8406852124adf38c4dea3247558b6768e92741ac7e08923c04590380b4f1c81b6a031c12c5b7fcb3fdb924d8ec864a35c0d71f6939f0f8166093613226e8fa79e7fdcb9047924973b7ebac7cd638634e8f6f9faf3e4c07cc687de3784cc9e64f5f6f2f1febac767135feb0e0859a08e4de1b75487d07aabb3c7752a22b4a013a4222c3063883e75c506dbbe1b4b1b4befb49337003565a2aaa9a8b4a3b5748672c91b2d9c359c55f77f9652a3edaa0f741e7b9823b6aa6465a9f8e05c71a19854acfcbee382fcfecd621722a4842bb2b32ff2a7492d' encryption=False
[2022/12/27 12:38:53] INFO [replication_task__task_1] [zettarepl.paramiko.replication_task__task_1.sftp] [chan 73] Opened sftp connection (server version 3)
[2022/12/27 12:38:53] INFO [replication_task__task_1] [zettarepl.transport.ssh_netcat] Automatically chose connect address '192.168.68.68'
[2022/12/27 13:58:54] WARNING [replication_task__task_1.monitor] [zettarepl.replication.process_runner] Stopping stuck replication process
[2022/12/27 13:58:54] WARNING [replication_task__task_1] [zettarepl.replication.run] For task 'task_1' at attempt 1 recoverable replication error StuckReplicationError('Replication has stuck')
[2022/12/27 13:58:54] ERROR [replication_task__task_1] [zettarepl.replication.run] Failed replication task 'task_1' after 1 retries
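To put a number on how long the task sat silent before being stopped, the timestamps in a log like this can be compared. Below is a minimal Python sketch, assuming every entry begins with a `[YYYY/MM/DD HH:MM:SS] LEVEL` prefix as in the excerpt above. The names `parse_entries` and `longest_gap` are made up for illustration and are not part of zettarepl or TrueNAS.

```python
import re
from datetime import datetime

# Assumed entry prefix, based on the log excerpt: "[2022/12/27 12:38:48] INFO"
STAMP = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (\w+)")


def parse_entries(text):
    """Return a list of (timestamp, level) for each log entry found in text."""
    return [
        (datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S"), m.group(2))
        for m in STAMP.finditer(text)
    ]


def longest_gap(text):
    """Return (gap, start, end) for the longest silence between consecutive
    log entries, or None if there are fewer than two entries."""
    entries = parse_entries(text)
    gaps = [(b[0] - a[0], a[0], b[0]) for a, b in zip(entries, entries[1:])]
    return max(gaps, default=None)


if __name__ == "__main__":
    # Shortened sample built from the timestamps in the log above.
    sample = (
        "[2022/12/27 12:38:53] INFO [x] Automatically chose connect address\n"
        "[2022/12/27 13:58:54] WARNING [x] Stopping stuck replication process\n"
    )
    gap, start, end = longest_gap(sample)
    print(f"Longest silence: {gap} (from {start} to {end})")
```

For the log above, this shows the gap between the last INFO entry at 12:38:53 and the "Stopping stuck replication process" warning at 13:58:54. Since the regex searches the whole text, it also works if the log was pasted as a single line.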
The only way to make any progress is to restart the machine and start the task again, or to wait a long time until it fails and then start it again.
The TrueNAS-13.0-U3.1 machine has 8 GB RAM, but CPU usage never goes above 3-4%, services memory never exceeds 2 GB, and the ZFS cache never exceeds 200 MB.
The remote machine is running TrueNAS-SCALE-22.12.0 and has an AMD Ryzen 5 PRO 4650G processor with 16 GB ECC memory. It also shows no increase in CPU, memory, cache, disk reads, or networking.
Here's the replication task config: