Fssh_ssh_dispatch_run_fatal: ... message authentication code incorrect

Status
Not open for further replies.

Anon93873

Dabbler
Joined
Dec 28, 2016
Messages
19
All,

I'm looking for more experienced minds to take a look at these circumstances. After almost a year of ZFS replication offsite between two hosts, I'm seeing an unusual error from ssh during encrypted ZFS replication. It is:
Fssh_ssh_dispatch_run_fatal: Connection to IP_ADDRESS port PORT_NUMBER: message authentication code incorrect

Of course, I've redacted the IP address and port number. What I can report:
  • Since January 2017, SSH keys have been unchanged on these two systems and I've observed no authentication problems between them.
  • I upgraded to FreeNAS 11.0-U4 immediately upon release, which appears to have been September 25th.
  • SSH keys are 4096 bits, and CPUs are barely burdened during the transfer (1-4% total utilization across all cores)
  • The last ZFS snapshot appearing on the destination is dated October 19th
  • I have two snapshot tasks on the source: a daily task, retained for a week, and a weekly task, retained for 3 months.
  • Daily snapshots average 500MB. Weekly snapshots average 15GB.
  • Link speed of source: 80 megabits/sec symmetric; destination: 110 megabits/sec downstream, 30 megabits/sec upstream
  • I can only observe this error when I run zfs send and pipe it to SSH manually
  • Sometimes this error will appear within minutes of starting zfs send, other times it will take up to four hours to happen
  • I have defined no additional options in the FreeNAS configuration for SSH
  • I noticed this error after receiving a flurry of "replication failed" emails
  • The problem occurs in both directions.
  • Replication of smaller snapshots (kilobytes to single-digit megabytes) is uneventful.
  • I have had problems in the past with connection persistence between the hosts - but this is new: I've never observed the SSH error, only a broken pipe, several restarts, and finally a successful replication.
  • Turning off encryption on the replication task results in endless restarting of the replication task, presumably due to a broken connection. I'm not familiar with how to manually initiate a zfs send without piping it over SSH.
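As a rough check on the figures in the list above, the best-case transfer time for each snapshot size can be estimated from the quoted link speeds. This is a minimal sketch under stated assumptions: decimal units, no protocol or encryption overhead, and each direction limited by the slower of the sender's upstream and the receiver's downstream. The function and variable names are illustrative only.

```python
# Rough transfer-time estimate from the figures quoted in the post.
# Assumptions (not from the post): decimal units (1 GB = 10**9 bytes,
# 1 megabit = 10**6 bits) and no protocol overhead or contention.

def transfer_seconds(size_bytes: float, link_mbps: float) -> float:
    """Seconds to move size_bytes over a link running at link_mbps."""
    return size_bytes * 8 / (link_mbps * 1_000_000)

def bottleneck_mbps(sender_up_mbps: float, receiver_down_mbps: float) -> float:
    """A transfer can go no faster than the slower of the two ends."""
    return min(sender_up_mbps, receiver_down_mbps)

# Link speeds quoted in the post (megabits/sec).
SOURCE_UP, SOURCE_DOWN = 80, 80   # symmetric
DEST_UP, DEST_DOWN = 30, 110

forward = bottleneck_mbps(SOURCE_UP, DEST_DOWN)   # source -> destination
reverse = bottleneck_mbps(DEST_UP, SOURCE_DOWN)   # destination -> source

WEEKLY = 15 * 10**9   # 15 GB weekly snapshot
DAILY = 500 * 10**6   # 500 MB daily snapshot

for label, size in (("daily", DAILY), ("weekly", WEEKLY)):
    print(f"{label}: {transfer_seconds(size, forward) / 60:.1f} min forward, "
          f"{transfer_seconds(size, reverse) / 60:.1f} min reverse")
```

Under these assumptions a 15 GB weekly snapshot needs about 25 minutes from source to destination and about 67 minutes in the other direction. Real transfers will take longer, so these are lower bounds and not predictions.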
Your insight is appreciated.
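For readers unfamiliar with the error text: a message authentication code (MAC) is a keyed checksum the receiver recomputes over each packet, and a mismatch means the bytes that arrived are not the bytes that were sent. The sketch below illustrates that general idea with Python's standard hmac module. It is not SSH's packet format, key exchange, or negotiated algorithm, and the key and payload are hypothetical stand-ins. It does not identify what altered the data in this case.

```python
import hashlib
import hmac
import os

# Illustration only: this is NOT SSH's packet format or key exchange.
# It shows the general idea behind a message authentication code (MAC):
# the receiver recomputes the tag and rejects data that was altered.

def make_tag(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches the payload (constant-time compare)."""
    return hmac.compare_digest(make_tag(key, payload), tag)

key = os.urandom(32)         # hypothetical shared session key
payload = os.urandom(4096)   # stand-in for a chunk of the zfs send stream
tag = make_tag(key, payload)

# Flip a single bit to simulate corruption somewhere between sender and receiver.
damaged = bytearray(payload)
damaged[100] ^= 0x01

print(verify(key, payload, tag))          # intact chunk
print(verify(key, bytes(damaged), tag))   # damaged chunk
```

The intact chunk verifies and the damaged one does not. Because a larger transfer pushes more bytes through the same path, it has more chances to hit a single corrupted bit, which is at least consistent with small snapshots replicating uneventfully.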
 

Anon93873

Nope, just checking to see if you made any progress.

I come here when I’ve exhausted all my abilities and resources. It’s truly a last-ditch effort when I post here.

Right now, I’m treating ZFS replication as broken for my implementation.
 

dlavigne

Guest
In that case, it is worth creating a report at bugs.freenas.org. Include a debug (System -> Advanced -> Save Debug) which will hide the ticket until a dev has a chance to review it. Post the issue # here.
 

Anon93873

I've posted bug number 27244 with a Debug capture from the sending system.

To clarify, what I called the "sending" system was also in the process of attempting to receive ZFS replications last night. They were both failing in that attempt. I can also reproduce the error and upload a Debug session from the other system if that's helpful, but didn't want to clog you with unnecessary duplication.
 

dlavigne

Guest
Doesn't hurt to add that debug as well to the report and indicate it's from the other system.
 

Anon93873

I've reverted both machines to FreeNAS 9.10.2-U5. Replication from one to the other works flawlessly, but fails in the reverse direction with the same error. I reverted by using the System > Boot menu. The boot environment I reverted to is dated 15 June 2017. I will update the bug report with this same text.
 

Anon93873

I've been using FreeNAS 9.10.2-U5 on both machines without issue. I'm happy to help the developers diagnose as they need.
 

Anon93873

Update: On both machines, I deleted all boot environments newer than 9.10.2-U5, then updated to 11.1-RELEASE to ensure the settings remained unchanged. I experienced the same problem and reverted to 9.10.2-U5.
 