Rsync doesn't progress past 2.8 TB

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Hello,

I have two TrueNAS boxes. One is a ProLiant DL380p Gen8 server running TrueNAS-12.0-U8.1 with 6 TB usable. The other is a ProLiant DL380e Gen8 server running TrueNAS-13.0-U3.1 with 7 TB usable. When I set up an rsync task to back up one pool from the first server to the other, it never seems to get past 2.8 TB. I have tried restarting the task multiple times with no luck.

Any ideas?
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Now it just sits there saying it is transferring files, but an hour later it just says that it has been pulled and fails.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
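As an illustration of the smaller-batches idea, a rough shell sketch; the dataset names, mountpoints, and destination host are placeholders, not from this thread:

# Run one rsync per top-level dataset so the in-memory file list stays small.
# Dataset names, paths, and the destination host are placeholders.
for ds in media documents photos; do
    rsync -aH --partial --info=progress2 \
        "/mnt/tank/${ds}/" "root@192.168.1.50:/mnt/backup/${ds}/" \
        || echo "transfer of ${ds} failed" >&2
done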
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
Thanks, I'll try it.
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
I tried copying over one dataset that is 400 GB in size, and it still failed.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
Is this via the GUI or a custom rsync script run from the command line?

Does the failed task yield a logfile that you can view?

What options are you using in your rsync transfer?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I'd try it from the command line, like @winnielinnie mentioned. You might even add the -v option to have it print each file as it's transferred, or use --progress.
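A minimal sketch of such a manual run; the source path, destination path, and remote host are placeholders:

# -a preserves permissions/ownership/times, -v lists each file as it goes,
# --progress shows per-file transfer progress. Paths and host are placeholders.
rsync -av --progress /mnt/tank/somedataset/ root@192.168.1.50:/mnt/backup/somedataset/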
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I'd try it from the command line, like @winnielinnie mentioned. You might even add the -v option to have it print each file as it's transferred, or use --progress.

winnielinnie said:
Is this via the GUI or a custom rsync script run from the command line?

Does the failed task yield a logfile that you can view?

What options are you using in your rsync transfer?
I just set up a log file for it. I'm waiting for it to fail.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
why rsync instead of replication? replication will be better in pretty much all ways.
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
artlessknave said:
why rsync instead of replication? replication will be better in pretty much all ways.
I'll try it.

This came out at the end of the log file for the rsync task:

2023/01/07 15:14:49 [32867] ERROR: out of memory in receive_sums [sender]
2023/01/07 15:14:49 [32867] rsync error: error allocating core memory buffers (code 22) at util2.c(105) [sender=3.1.3]
2023/01/07 15:14:49 [32867] rsync error: error allocating core memory buffers (code 22) at io.c(1674) [receiver=3.2.5]
2023/01/07 16:20:54 [32851] rsync error: error allocating core memory buffers (code 22) at io.c(1674) [generator=3.2.5]
So yes, it is a memory issue.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It may be possible to work around the rsync memory issue by adding swap; usually swap equal to the memory size is a good rule of thumb. It won't be fast, but if you can't use ZFS replication, it is better than nothing.
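As a rough illustration only, file-backed swap on a FreeBSD-based system such as TrueNAS CORE can be added along these lines; the file path, size, and md unit number are placeholders, and a swap file added this way does not persist across reboots unless you configure it to:

# Create a 16 GB file, back an md(4) device with it, and enable it as swap.
# Path, size, and unit number are placeholders.
dd if=/dev/zero of=/usr/swap0 bs=1m count=16384
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 0
swapon /dev/md0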
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
artlessknave said:
why rsync instead of replication? replication will be better in pretty much all ways.

Replication blows if you want to do stuff like sneakernet forwarding or other partial processing that replication is incapable of doing. rsync FTW. Replication is nice anywhere you need verbatim dataset copies.
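For reference, a bare-bones sketch of the kind of verbatim copy replication performs; pool, dataset, snapshot, and host names are placeholders, and the TrueNAS replication tasks wrap this zfs send/recv plumbing for you:

# Snapshot the source dataset, then stream it verbatim to the other box.
# Pool, dataset, snapshot, and host names are placeholders.
zfs snapshot tank/media@backup-2023-01-07
zfs send tank/media@backup-2023-01-07 | ssh root@192.168.1.50 zfs recv -F backuppool/media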
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
jgreco said:
Replication blows if you want to do stuff like sneakernet forwarding or other partial processing that replication is incapable of doing. rsync FTW. Replication is nice anywhere you need verbatim dataset copies.

this is why i asked for the reasoning and didn't say "better in all ways" :cool:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ok, you weaseled your way out of that one. ;-) Another one: if you happen to have multiple upstream links and you're trying to send a bunch of data, you can manually send half one way and half the other way.
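A rough sketch of that split with rsync, assuming two hostnames that reach the same destination over different links; all names and paths are placeholders:

# Send half the top-level directories over each link, in parallel.
# host-a and host-b are assumed to route via different upstream links.
rsync -a /mnt/tank/bulk/groupA/ root@host-a:/mnt/backup/bulk/groupA/ &
rsync -a /mnt/tank/bulk/groupB/ root@host-b:/mnt/backup/bulk/groupB/ &
wait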
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Now, when I try to set up a replication task to push files from the main server to the remote one, it gives me this while trying to connect to the remote server over SSH:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 654, in _handle_ssh_exceptions
    yield
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 409, in list_datasets
    datasets = await self.middleware.run_in_thread(list_datasets, shell)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/dataset/list.py", line 13, in list_datasets
    return [dataset["name"] for dataset in list_datasets_with_properties(shell, dataset, recursive)]
  File "/usr/local/lib/python3.9/site-packages/zettarepl/dataset/list.py", line 30, in list_datasets_with_properties
    output = shell.exec(args)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/interface.py", line 89, in exec
    return self.exec_async(args, encoding, stdout).wait(timeout)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/interface.py", line 93, in exec_async
    async_exec.run()
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/base_ssh.py", line 27, in run
    client = self.shell.get_client()
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/base_ssh.py", line 123, in get_client
    client.connect(
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 435, in connect
    self._auth(
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 764, in _auth
    raise saved_exception
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 664, in _auth
    self._transport.auth_publickey(username, pkey)
  File "/usr/local/lib/python3.9/site-packages/paramiko/transport.py", line 1580, in auth_publickey
    return self.auth_handler.wait_for_response(my_event)
  File "/usr/local/lib/python3.9/site-packages/paramiko/auth_handler.py", line 250, in wait_for_response
    raise e
paramiko.ssh_exception.AuthenticationException: Authentication failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 138, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/replication.py", line 642, in list_datasets
    return await self.middleware.call("zettarepl.list_datasets", transport, ssh_credentials)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 409, in list_datasets
    datasets = await self.middleware.run_in_thread(list_datasets, shell)
  File "/usr/local/lib/python3.9/contextlib.py", line 199, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 657, in _handle_ssh_exceptions
    raise CallError(repr(e).replace("[Errno None] ", ""), errno=errno.EACCES)
middlewared.service_exception.CallError: [EACCES] AuthenticationException('Authentication failed.')

But for some reason I can use SSH to connect to the remote server with the same credentials from my local machine.
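One hedged way to narrow that down, since the traceback fails inside paramiko's public-key authentication rather than a password login: test a key-based login using the key pair configured for the replication task. The key path and host below are placeholders:

# Force public-key auth with the replication key; success means the remote box
# accepts that key pair, failure points at the key/user setup rather than the network.
ssh -i /path/to/replication_key -o PasswordAuthentication=no root@192.168.1.50 echo ok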

Thank you all btw!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Hmm, it turns out my largest ZFS dataset that I rsync (mostly video media, but music, photos, and ebooks too) has only 17,230 files. It may be 2 TB used, yet most of that is larger videos, like 1080p movies and some TV shows. The rest of the large files would be 480p videos.
 