Rsync doesn't progress past 2.8 TB

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Hello,

I have two TrueNAS boxes. One is a ProLiant DL380p Gen8 server running TrueNAS-12.0-U8.1 with 6 TB usable. The other is a ProLiant DL380e Gen8 server running TrueNAS-13.0-U3.1 with 7 TB usable. When I set up an rsync task to back up one pool from the first server to the other, it never seems to get past 2.8 TB. I have tried restarting the task multiple times with no luck.

Any ideas?
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Now it just sits there saying it is transferring files, but an hour later it just says that it has been pulled and fails.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
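As an illustration of the smaller-batches idea, a rough shell sketch; the dataset names, mountpoints, and destination host are placeholders, not from this thread:

# Run one rsync per top-level dataset so the in-memory file list stays small.
# Dataset names, paths, and the destination host are placeholders.
for ds in media documents photos; do
    rsync -aH --partial --info=progress2 \
        "/mnt/tank/${ds}/" "root@192.168.1.50:/mnt/backup/${ds}/" \
        || echo "transfer of ${ds} failed" >&2
done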
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
Thanks, I'll try it.
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I can't offer much, except:

- If I recall correctly, rsync needs to build its catalog of files in memory; not enough memory results in slowdowns or failure.
- If your source server's files can be divided up by directories or ZFS datasets, you might try transferring the files in smaller batches.

For my home client backups, I do the rsync transfers by ZFS dataset. Not that any of mine are as large, but the media dataset is 2 TB, and every time I've backed it up with rsync, it has worked.
I tried copying over one dataset that is 400 GB in size, and it still failed.
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
Is this via the GUI or a custom rsync script run from the command line?

Does the failed task yield a logfile that you can view?

What options are you using in your rsync transfer?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I'd try it from the command line, like @winnielinnie mentioned. You might even add the -v option to have it print each file as it's transferred, or use --progress.
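A minimal sketch of such a manual run; the source path, destination path, and remote host are placeholders:

# -a preserves permissions/ownership/times, -v lists each file as it goes,
# --progress shows per-file transfer progress. Paths and host are placeholders.
rsync -av --progress /mnt/tank/somedataset/ root@192.168.1.50:/mnt/backup/somedataset/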
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Arwen said:
I'd try it from the command line, like @winnielinnie mentioned. You might even add the -v option to have it print each file as it's transferred, or use --progress.

winnielinnie said:
Is this via the GUI or a custom rsync script run from the command line?

Does the failed task yield a logfile that you can view?

What options are you using in your rsync transfer?
I just set up a log file for it. I'm waiting for it to fail.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
why rsync instead of replication? replication will be better in pretty much all ways.
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
artlessknave said:
why rsync instead of replication? replication will be better in pretty much all ways.
I'll try it.

This came out at the end of the log file for the rsync task:

2023/01/07 15:14:49 [32867] ERROR: out of memory in receive_sums [sender]
2023/01/07 15:14:49 [32867] rsync error: error allocating core memory buffers (code 22) at util2.c(105) [sender=3.1.3]
2023/01/07 15:14:49 [32867] rsync error: error allocating core memory buffers (code 22) at io.c(1674) [receiver=3.2.5]
2023/01/07 16:20:54 [32851] rsync error: error allocating core memory buffers (code 22) at io.c(1674) [generator=3.2.5]
So yes, it is a memory issue.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It may be possible to work around the rsync memory issue by adding swap; usually swap equal to the memory size is a good rule of thumb. It won't be fast, but if you can't use ZFS replication, it is better than nothing.
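As a rough illustration only, file-backed swap on a FreeBSD-based system such as TrueNAS CORE can be added along these lines; the file path, size, and md unit number are placeholders, and a swap file added this way does not persist across reboots unless you configure it to:

# Create a 16 GB file, back an md(4) device with it, and enable it as swap.
# Path, size, and unit number are placeholders.
dd if=/dev/zero of=/usr/swap0 bs=1m count=16384
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 0
swapon /dev/md0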
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
artlessknave said:
why rsync instead of replication? replication will be better in pretty much all ways.

Replication blows if you want to do stuff like sneakernet forwarding or other partial processing that replication is incapable of doing. rsync FTW. Replication is nice anywhere you need verbatim dataset copies.
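For reference, a bare-bones sketch of the kind of verbatim copy replication performs; pool, dataset, snapshot, and host names are placeholders, and the TrueNAS replication tasks wrap this zfs send/recv plumbing for you:

# Snapshot the source dataset, then stream it verbatim to the other box.
# Pool, dataset, snapshot, and host names are placeholders.
zfs snapshot tank/media@backup-2023-01-07
zfs send tank/media@backup-2023-01-07 | ssh root@192.168.1.50 zfs recv -F backuppool/media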
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
jgreco said:
Replication blows if you want to do stuff like sneakernet forwarding or other partial processing that replication is incapable of doing. rsync FTW. Replication is nice anywhere you need verbatim dataset copies.

this is why i asked for the reasoning and didn't say "better in all ways" :cool:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ok, you weaseled your way out of that one. ;-) Another one: if you happen to have multiple upstream links and you're trying to send a bunch of data, you can manually send half one way and half the other way.
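A rough sketch of that split with rsync, assuming two hostnames that reach the same destination over different links; all names and paths are placeholders:

# Send half the top-level directories over each link, in parallel.
# host-a and host-b are assumed to route via different upstream links.
rsync -a /mnt/tank/bulk/groupA/ root@host-a:/mnt/backup/bulk/groupA/ &
rsync -a /mnt/tank/bulk/groupB/ root@host-b:/mnt/backup/bulk/groupB/ &
wait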
 

Thedriveguy

Explorer
Joined
Apr 24, 2022
Messages
51
Now, when I try to set up a replication task to push files from the main server to the remote one, it gives me this while trying to connect to the remote server over SSH:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 654, in _handle_ssh_exceptions
    yield
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 409, in list_datasets
    datasets = await self.middleware.run_in_thread(list_datasets, shell)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/dataset/list.py", line 13, in list_datasets
    return [dataset["name"] for dataset in list_datasets_with_properties(shell, dataset, recursive)]
  File "/usr/local/lib/python3.9/site-packages/zettarepl/dataset/list.py", line 30, in list_datasets_with_properties
    output = shell.exec(args)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/interface.py", line 89, in exec
    return self.exec_async(args, encoding, stdout).wait(timeout)
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/interface.py", line 93, in exec_async
    async_exec.run()
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/base_ssh.py", line 27, in run
    client = self.shell.get_client()
  File "/usr/local/lib/python3.9/site-packages/zettarepl/transport/base_ssh.py", line 123, in get_client
    client.connect(
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 435, in connect
    self._auth(
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 764, in _auth
    raise saved_exception
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 664, in _auth
    self._transport.auth_publickey(username, pkey)
  File "/usr/local/lib/python3.9/site-packages/paramiko/transport.py", line 1580, in auth_publickey
    return self.auth_handler.wait_for_response(my_event)
  File "/usr/local/lib/python3.9/site-packages/paramiko/auth_handler.py", line 250, in wait_for_response
    raise e
paramiko.ssh_exception.AuthenticationException: Authentication failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 138, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/replication.py", line 642, in list_datasets
    return await self.middleware.call("zettarepl.list_datasets", transport, ssh_credentials)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 409, in list_datasets
    datasets = await self.middleware.run_in_thread(list_datasets, shell)
  File "/usr/local/lib/python3.9/contextlib.py", line 199, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zettarepl.py", line 657, in _handle_ssh_exceptions
    raise CallError(repr(e).replace("[Errno None] ", ""), errno=errno.EACCES)
middlewared.service_exception.CallError: [EACCES] AuthenticationException('Authentication failed.')

But for some reason I can use SSH to connect to the remote server with the same credentials from my local machine.
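One hedged way to narrow that down, since the traceback fails inside paramiko's public-key authentication rather than a password login: test a key-based login using the key pair configured for the replication task. The key path and host below are placeholders:

# Force public-key auth with the replication key; success means the remote box
# accepts that key pair, failure points at the key/user setup rather than the network.
ssh -i /path/to/replication_key -o PasswordAuthentication=no root@192.168.1.50 echo ok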

Thank you all btw!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Hmm, it turns out my largest ZFS dataset that I rsync (mostly video media, but music, photos, and ebooks too) has only 17,230 files. It may be 2 TB used, yet most of that is larger videos, like 1080p movies and some TV shows. The rest of the large files would be 480p videos.
 