SOLVED Replication Authentication fails after upgrading receiving server to Core 13

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
Push replication from 12.8 to 12.8 works fine, but it fails from 12.8 to 13. I rediscovered the SSH key for the receiver, but that did not help. Reverting the receiver to 12.8 makes it work again. Suggestions?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
This is covered in the Release Notes for 13. In particular, you have to add an Aux parameter to the 13 system's SSH service to account for the change in ciphers between versions.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
I have yet to achieve a successful 13-to-13 replication ...
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Working just fine, here.
 

mbrante

Cadet
Joined
May 18, 2016
Messages
5
Hello, after migrating one box to 13, the replication task always fails, and after upgrading both boxes to 13, replication is still failing.

I added "PubkeyAcceptedAlgorithms +ssh-rsa" (just in case) on both sides, but the problem persists.

The SSH connection is lost after a few minutes.

I can see "message authentication code incorrect" in the SSH auth logs.

Any advice?

Thanks in advance
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@mbrante, create new keys on both sides, and populate them in the SSH Connections on both sides.
 

mbrante

Cadet
Joined
May 18, 2016
Messages
5
Thanks Samuel, I deleted and re-created the SSH keys on both sides. The "message authentication code incorrect" entry has disappeared from the SSH log, but replication is still failing with an SSH disconnection.

For task 'task_1' at attempt 1 recoverable replication error RecoverableReplicationError("Connection to x.x.x.x closed by remote host.\nwarning: cannot send 'pool/dataset@auto-2022-09-25_00-22': signal received")

This log message has been the same since the first failed replication.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Did you manually test the new keys before replicating by trying to SSH both ways?
 

mbrante

Cadet
Joined
May 18, 2016
Messages
5
Thanks again for your reply,

And yes, the task actually starts normally (SSH auth works fine), but after a while SSH simply disconnects (Connection to R.R.R.R closed by remote host).

This is a portion of the last zfs replication log:

.R.R] [shell:49] [async_exec:3345] Reading stdout
[2022/11/21 07:48:55] DEBUG [Thread-111] [zettarepl.paramiko.replication_task__task_1] [chan 9] EOF received (9)
[2022/11/21 07:48:55] DEBUG [replication_task__task_1.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@R.R.R.R] [shell:49] [async_exec:3345] Waiting for exit status
[2022/11/21 07:48:55] DEBUG [Thread-111] [zettarepl.paramiko.replication_task__task_1] [chan 9] EOF sent (9)
[2022/11/21 07:48:55] DEBUG [replication_task__task_1.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@R.R.R.R] [shell:49] [async_exec:3345] Success: 'STGO-01/XXXX-DCVLP\tused\t4335714064\t-\n'
[2022/11/21 07:48:55] DEBUG [replication_task__task_1.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:3346] Running ['ps', '-o', 'command', '-p', '61972']
[2022/11/21 07:48:55] DEBUG [replication_task__task_1.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:3346] Success: 'COMMAND\nzfs: sending XXXX-01/XXXX-DCVLP@auto-2022-09-25_00-22 (0%: 2684280488/34085758329648) (zfs)\n'
[2022/11/21 07:48:58] DEBUG [replication_task__task_1.async_exec_tee.wait] [zettarepl.transport.local] [shell:1] [async_exec:3326] Error 1: None
[2022/11/21 07:48:58] DEBUG [replication_task__task_1.process] [zettarepl.transport.local] [shell:1] [async_exec:3325] Error 1: "Connection to R.R.R.R closed by remote host.\nwarning: cannot send 'XXXX-01/XXXX-DCVLP@auto-2022-09-25_00-22': signal received\n"
[2022/11/21 07:48:58] DEBUG [replication_task__task_1.monitor] [zettarepl.transport.local] [shell:1] [async_exec:3326] Stopping
[2022/11/21 07:48:58] WARNING [replication_task__task_1] [zettarepl.replication.run] For task 'task_1' at attempt 1 recoverable replication error RecoverableReplicationError("Connection to R.R.R.R closed by remote host.\nwarning: cannot send 'XXXX-01/XXXX-DCVLP@auto-2022-09-25_00-22': signal received")
[2022/11/21 07:48:58] ERROR [replication_task__task_1] [zettarepl.replication.run] Failed replication task 'task_1' after 1 retries
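For anyone reading a log like the one above, here is a rough Python sketch for pulling out the WARNING/ERROR entries and the transfer progress reported by the `zfs: sending` line. The line layout is only inferred from this excerpt (timestamp in brackets, then the level), and the helper names are made up for illustration; nothing here is part of zettarepl itself.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "[YYYY/MM/DD HH:MM:SS] LEVEL rest-of-line"
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (?P<level>[A-Z]+) (?P<rest>.*)$"
)
# Progress as printed by the sending process: "(0%: sent_bytes/total_bytes)"
PROGRESS_RE = re.compile(r"\((\d+)%: (\d+)/(\d+)\)")


def parse_line(line):
    """Return (timestamp, level, rest), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
    return ts, m.group("level"), m.group("rest")


def problems(lines, levels=("WARNING", "ERROR")):
    """Keep only the parsed entries whose level is in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[1] in levels]


def progress(line):
    """Return (sent_bytes, total_bytes) if the line carries a progress report."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return int(m.group(2)), int(m.group(3))


# Two lines copied from the log above (raw strings keep the literal "\n").
SAMPLE = [
    r"[2022/11/21 07:48:55] DEBUG [replication_task__task_1.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:3346] Success: 'COMMAND\nzfs: sending XXXX-01/XXXX-DCVLP@auto-2022-09-25_00-22 (0%: 2684280488/34085758329648) (zfs)\n'",
    r"[2022/11/21 07:48:58] ERROR [replication_task__task_1] [zettarepl.replication.run] Failed replication task 'task_1' after 1 retries",
]

if __name__ == "__main__":
    for ts, level, rest in problems(SAMPLE):
        print(ts.isoformat(), level, rest)
    sent, total = progress(SAMPLE[0])
    print(f"sent {sent / 2**30:.2f} GiB of {total / 2**40:.2f} TiB")
```

On the excerpt above, the progress line works out to about 2.5 GiB sent out of about 31 TiB when the remote host closed the connection.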
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
No, I mean did you try SSH from the Shell of one system to the other, and vice versa, to verify you have the correct keys on both sides? Usually this sort of SSH stoppage is because one or both sides has the wrong key in authorized_keys.
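A minimal sketch of such a manual test, written as a small Python helper that builds the `ssh` command line. The key path and host below are placeholders, not values from this thread. `-i` with `IdentitiesOnly=yes` makes the client offer only the named key, and `BatchMode=yes` makes it fail instead of falling back to a password prompt, so a successful login shows that this particular key is accepted by the other side.

```python
import shlex


def build_ssh_test_command(host, key_path, user="root", port=22):
    """Build an argv list for a non-interactive SSH login test with one specific key."""
    return [
        "ssh",
        "-v",                        # verbose: shows which key is offered and accepted
        "-p", str(port),
        "-i", key_path,              # private key to test (placeholder path)
        "-o", "IdentitiesOnly=yes",  # do not try other keys from the agent or defaults
        "-o", "BatchMode=yes",       # never prompt; fail if key auth does not work
        f"{user}@{host}",
        "true",                      # trivial remote command, exits right after login
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute the real host and key file on each system.
    cmd = build_ssh_test_command("R.R.R.R", "/path/to/replication_key")
    print(shlex.join(cmd))
```

Run the printed command from the Shell on one system, then repeat in the other direction with that system's own key.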
 

mbrante

Cadet
Joined
May 18, 2016
Messages
5
I have now added an SSH keypair (root/.ssh/id_rsa) to the remote machine in order to test SSH RSA auth.
This works fine, but in theory the replication task uses another pair of keys (generated by the UI under System SSH keypairs).

Maybe the replication task uses both keypairs in the process? The task is running now; I will let it run for a few minutes to see what happens.

Thanks
 

mbrante

Cadet
Joined
May 18, 2016
Messages
5
UPDATE: Task failed again, exact same result.

I will continue investigating, thanks!
 