Hello everyone,
I'm seeing some strange behavior and issues with my replication tasks.
I'm running two TrueNAS instances: the first (A) on TrueNAS-13.0-U5.3, the second (B) on TrueNAS-SCALE-22.12.3.3. I replicate from A to B.
Two replication tasks run on A; each one replicates one of the two pools (the complete pool, recursively) to B. Each pool contains multiple zvols used as disks for VMs.
Replication task 1 is still running without any issues.
Task 2 has had multiple issues for about 1.5 weeks. It all started when I did a manual rollback from B to A for a single zvol in pool 2. The rollback itself worked, but errors occurred afterwards. For the rollback I deactivated the pool 2 replication task for one night and created a dedicated replication task for just the one zvol of the VM I wanted to test the rollback with. The next day, after that snapshot had been replicated, I deactivated the dedicated task and reactivated the pool 2 task. After that the errors started.
The task created a new parent dataset "proxmox" on destination B and renamed the old dataset "proxmox" to "proxmoxrecv-4054437-1". That means I now have two datasets with all child zvols, which requires double the space.
This is issue one.
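To show what I mean by double the space, this is roughly how I'd compare the usage of the two copies from the shell on B (dataset names taken from the error messages further down; purely read-only, nothing gets changed):

# On destination B: space used by the new parent dataset vs. the old renamed one
zfs list -r -o name,used,refer,usedbysnapshots HDD_Pool/Backup/SSD_Pool2/proxmox
zfs list -r -o name,used,refer,usedbysnapshots HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1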
The second issue is that I received TrueNAS alert mails with the content "SSD_Pool2/proxmox@auto-2024-03-01_01-00-1H because it was created after the destination snapshot (auto-2024-03-01_00-00) skipping snapshot" and the hint "Please run `zfs destroy -r HDD_Pool/Backup/SSD_Pool2/proxmox@auto-2024-03-01_00-00` on the target system and
run replication again..", which I did. The same behavior appeared on the next two days, and both times I executed the command.
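Before destroying anything else on the target, I would rather compare the snapshot lists on both sides to see which common incremental base is still there. A rough sketch of what I'd run (dataset paths as in the alert; read-only diagnostics only):

# On source A: snapshots of the source dataset and its children, oldest first
zfs list -r -t snapshot -o name,creation -s creation SSD_Pool2/proxmox
# On destination B: the same for the received copy
zfs list -r -t snapshot -o name,creation -s creation HDD_Pool/Backup/SSD_Pool2/proxmox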
The current behavior is that the system still tries to re-replicate the complete pool 2, which takes a huge amount of time (days). It tried this once and failed with a similar error:
"Replication "pool2" failed: Last full ZFS replication failed to transfer all the children of the snapshot SSD_Pool2/proxmox@auto-2024-02-24_00-00. The snapshot HDD_Pool/Backup/SSD_Pool2/proxmox/vm-205-disk-0@auto-2024-02-24_00-00 was not transferred. Please run `zfs destroy -r HDD_Pool/Backup/SSD_Pool2/proxmox@auto-2024-02-24_00-00` on the target system and run replication again.."
Because I have already tried that command three times and it didn't really help, I don't want to experiment any further.
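I'm also not sure whether the failed full replication left a partially received state behind on B. If it's useful, I could check for resume tokens with something like this (again only a read-only check, assuming the receive was started in resumable mode):

# On destination B: a non-empty value would indicate an interrupted, resumable receive
zfs get -r receive_resume_token HDD_Pool/Backup/SSD_Pool2/proxmox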
Another issue: when I'm on destination B (TrueNAS SCALE) and, under Datasets, click on either of the two replicated datasets "proxmox" or "proxmoxrecv-4054437-1", I get one of two messages. Both messages appear on both datasets, sometimes the one, sometimes the other:
For the "old renamed" dataset:
[ENOENT] Path /mnt/HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1 not found
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
quotas = resource.userspace(quota_props)
File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
quotas = resource.userspace(quota_props)
File "libzfs.pyx", line 3532, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1: dataset is busy
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 762, in get_quota
raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 204, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1344, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1378, in nf
return await func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 4112, in get_quota
quota_list = await self.middleware.call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1395, in call
return await self._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1352, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1358, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1273, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1258, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1
Error: Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 204, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1355, in _call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1258, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1382, in nf
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1252, in nf
res = f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem.py", line 363, in stat
raise CallError(f'Path {_path} not found', errno.ENOENT)
middlewared.service_exception.CallError: [ENOENT] Path /mnt/HDD_Pool/Backup/SSD_Pool2/proxmoxrecv-4054437-1 not found
For the "new" dataset:
[EFAULT] Failed retreiving USER quotas for HDD_Pool/Backup/SSD_Pool2/proxmox
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
quotas = resource.userspace(quota_props)
File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
quotas = resource.userspace(quota_props)
File "libzfs.pyx", line 3532, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for HDD_Pool/Backup/SSD_Pool2/proxmox: dataset is busy
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 762, in get_quota
raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for HDD_Pool/Backup/SSD_Pool2/proxmox
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 204, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1344, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1378, in nf
return await func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 4112, in get_quota
quota_list = await self.middleware.call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1395, in call
return await self._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1352, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1358, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1273, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1258, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for HDD_Pool/Backup/SSD_Pool2/proxmox
[ENOENT] Path not found.
Error: Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 204, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1355, in _call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1258, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1382, in nf
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem_/acl_linux.py", line 384, in getacl
raise CallError('Path not found.', errno.ENOENT)
middlewared.service_exception.CallError: [ENOENT] Path not found.
Regarding the "quotas not found" there is something 'interesting'. In the dataset tree overview it tells "User Quotas: Quotas set for 1 user " and "Group Quotas: Quotas set for 1 group". For both datasets. But when I click on it to see the list, its empty and tells "No User Quotas"/"No Group Quotas".
What is wrong? Has anybody seen this behavior before?
Best